Last data update: 2014.03.03

R: Standardized Differences for Stratified Comparisons
xBalanceR Documentation

Standardized Differences for Stratified Comparisons

Description

Given covariates, a treatment variable, and a stratifying factor, calculates standardized mean differences along each covariate, with and without the stratification and tests for conditional independence of the treatment variable and the covariates within strata.

Usage

xBalance(fmla, strata = list(unstrat = NULL), data, report = c("std.diffs",
  "z.scores", "adj.means", "adj.mean.diffs", "adj.mean.diffs.null.sd",
  "chisquare.test", "p.values", "all")[1:2], stratum.weights = harmonic,
  na.rm = FALSE, covariate.scaling = NULL, normalize.weights = TRUE,
  impfn = median, post.alignment.transform = NULL)

Arguments

fmla

A formula containing an indicator of treatment assignment on the left hand side and covariates at right.

strata

A list of right-hand-side-only formulas containing the factor(s) identifying the strata, with NULL entries interpreted as no stratification; or a factor with length equal to the number of rows in data; or a data frame of such factors. See below for examples.

data

A data frame in which fmla and strata are to be evaluated.

report

Character vector listing measures to report for each stratification; a subset of c("adj.means", "adj.mean.diffs", "adj.mean.diffs.null.sd", "chisquare.test", "std.diffs", "z.scores", "p.values", "all"). P-values reported are two-sided for the null-hypothesis of no effect. The option "all" requests all measures.

stratum.weights

Weights to be applied when aggregating across strata specified by strata, defaulting to weights proportional to the harmonic mean of treatment and control group sizes within strata. This can be either a function used to calculate the weights or the weights themselves; if strata is a data frame, then it can be such a function, a list of such functions, or a data frame of stratum weighting schemes corresponding to the different stratifying factors of strata. See details.

na.rm

Whether to remove rows with NAs on any variables mentioned on the RHS of fmla (i.e. listwise deletion). Defaults to FALSE, wherein rows aren't deleted but for each variable with NAs a missing-data indicator variable is added to the variables on which balance is calculated and medians are imputed for the variable with missing data (in RItools versions 0.1-9 and before the default imputation was the mean, in RItools versions 0.1-11 and henceforth the default is the median). See the example below.

covariate.scaling

A scale factor to apply to covariates in calculating std.diffs. If NULL, xBalance pools standard deviations of each variable in the treatment and control group (defining these groups according to whether the LHS of formula is greater than or equal to 0). Also, see details.

normalize.weights

If TRUE, then stratum weights are normalized so as to sum to 1. Defaults to TRUE.

impfn

A function to impute missing values when na.rm=FALSE. Currently median. To impute means use mean.default.

post.alignment.transform

Optional transformation applied to covariates just after their stratum means are subtracted off.

Details

In the unstratified case, the standardized difference of covariate means is the mean in the treatment group minus the mean in the control group, divided by the S.D. (standard deviation) in the same variable estimated by pooling treatment and control group S.D.s on the same variable. In the stratified case, the denominator of the standardized difference remains the same but the numerator is a weighted average of within-stratum differences in means on the covariate. By default, each stratum is weighted in proportion to the harmonic mean 1/[(1/a + 1/b)/2]=2*a*b/(a+b) of the number of treated units (a) and control units (b) in the stratum; this weighting is optimal under certain modeling assumptions (discussed in Kalton 1968, Hansen and Bowers 2008). This weighting can be modified using the stratum.weights argument; see below.

When the treatment variable, the variable specified by the left-hand side of fmla, is not binary, xBalance calculates the covariates' regressions on the treatment variable, in the stratified case pooling these regressions across strata using weights that default to the stratum-wise sum of squared deviations of the treatment variable from its stratum mean. (Applied to binary treatment variables, this recipe gives the same result as the one given above.) In the numerator of the standardized difference, we get a “pooled S.D.” from separating units into two groups, one in which the treatment variable is 0 or less and another in which it is positive. If report includes "adj.means", covariate means for the former of these groups are reported, along with the sums of these means and the covariates' regressions on either the treatment variable, in the unstratified (“pre”) case, or the treatment variable and the strata, in the stratified (“post”) case.

stratum.weights can be either a function or a numeric vector of weights. If it is a numeric vector, it should be non-negative and it should have stratum names as its names. (i.e., its names should be equal to the levels of the factor specified by strata.) If it is a function, it should accept one argument, a data frame containing the variables in data and additionally Tx.grp and stratum.code, and return a vector of non-negative weights with stratum codes as names; for an example, do getFromNamespace("harmonic", "RItools").

If covariate.scaling is not NULL, no scaling is applied. This behavior is likely to change in future versions. (If you want no scaling, set covariate.scaling=1, as this is likely to retain this meaning in the future.)

adj.mean.diffs.null.sd returns the standard deviation of the Normal approximated randomization distribution of the strata-adjusted difference of means under the strict null of no effect.

Value

An object of class c("xbal", "list"). There are plot, print, and xtable methods for class "xbal"; the print method is demonstrated in the examples.

Note

Evidence pertaining to the hypothesis that a treatment variable is not associated with differences in covariate values is assessed by comparing the differences of means (or regression coefficients), without standardization, to their distributions under hypothetical shuffles of the treatment variable, a permutation or randomization distribution. For the unstratified comparison, this reference distribution consists of differences (more generally, regression coefficients) when the treatment variable is permuted without regard to strata. For the stratified comparison, the reference distribution is determined by randomly permuting the treatment variable within strata, then re-calculating the treatment-control differences (regressions of each covariate on the permuted treatment variable). Significance assessments are based on the large-sample Normal approximation to these reference distributions.

Author(s)

Ben Hansen and Jake Bowers and Mark Fredrickson

References

Hansen, B.B. and Bowers, J. (2008), “Covariate Balance in Simple, Stratified and Clustered Comparative Studies,” Statistical Science 23.

Kalton, G. (1968), “Standardization: A technique to control for extraneous variables,” Applied Statistics 17, 118–136.

Examples

data(nuclearplants)
##No strata, default output
xBalance(pr~ date + t1 + t2 + cap + ne + ct + bw + cum.n,
         data=nuclearplants)

##No strata, all output
xBalance(pr~ date + t1 + t2 + cap + ne + ct + bw + cum.n,
         data=nuclearplants,
         report=c("all"))

##Stratified, all output
xBalance(pr~.-cost-pt, strata=factor(nuclearplants$pt),
         data=nuclearplants,
         report=c("adj.means", "adj.mean.diffs",
                  "adj.mean.diffs.null.sd",
                  "chisquare.test", "std.diffs",
                  "z.scores", "p.values"))

##Comparing unstratified to stratified, just adjusted means and
#omnibus test
xBalance(pr~ date + t1 + t2 + cap + ne + ct + bw + cum.n,
         strata=list(unstrat=NULL, pt=~pt),
         data=nuclearplants,
         report=c("adj.means", "chisquare.test"))

##Comparing unstratified to stratified, just adjusted means and
#omnibus test
xBalance(pr~ date + t1 + t2 + cap + ne + ct + bw + cum.n,
         strata=data.frame(unstrat=factor('none'),
           pt=factor(nuclearplants$pt)),
         data=nuclearplants,
         report=c("adj.means", "chisquare.test"))

##Missing data handling.
testdata<-nuclearplants
testdata$date[testdata$date<68]<-NA

##na.rm=FALSE by default
xBalance(pr ~ date, data = testdata, report="all")
xBalance(pr ~ date, data = testdata, na.rm = TRUE,report="all")

##To match versions of RItools 0.1-9 and older, impute means
#rather than medians.
##Not run, impfn option is not implemented in the most recent version
## Not run: xBalance(pr ~ date, data = testdata, na.rm = FALSE,
           report="all", impfn=mean.default)
## End(Not run)

##Comparing unstratified to stratified, just one-by-one wilcoxon
#rank sum tests and omnibus test of multivariate differences on
#rank scale.
xBalance(pr~ date + t1 + t2 + cap + ne + ct + bw + cum.n,
         strata=data.frame(unstrat=factor('none'),
           pt=factor(nuclearplants$pt)),
         data=nuclearplants,
         report=c("adj.means", "chisquare.test"),
	 post.alignment.transform=rank)

Results


R version 3.3.1 (2016-06-21) -- "Bug in Your Hair"
Copyright (C) 2016 The R Foundation for Statistical Computing
Platform: x86_64-pc-linux-gnu (64-bit)

R is free software and comes with ABSOLUTELY NO WARRANTY.
You are welcome to redistribute it under certain conditions.
Type 'license()' or 'licence()' for distribution details.

R is a collaborative project with many contributors.
Type 'contributors()' for more information and
'citation()' on how to cite R or R packages in publications.

Type 'demo()' for some demos, 'help()' for on-line help, or
'help.start()' for an HTML browser interface to help.
Type 'q()' to quit R.

> library(RItools)
Loading required package: SparseM

Attaching package: 'SparseM'

The following object is masked from 'package:base':

    backsolve

> png(filename="/home/ddbj/snapshot/RGM3/R_CC/result/RItools/xBalance.Rd_%03d_medium.png", width=480, height=480)
> ### Name: xBalance
> ### Title: Standardized Differences for Stratified Comparisons
> ### Aliases: xBalance
> ### Keywords: design nonparametric
> 
> ### ** Examples
> 
> data(nuclearplants)
> ##No strata, default output
> xBalance(pr~ date + t1 + t2 + cap + ne + ct + bw + cum.n,
+          data=nuclearplants)
      strata  unstrat                
      stat   std.diff    z           
vars                                 
date          -0.1147 -0.3052        
t1            0.1063  0.2830         
t2            1.0327  2.4674  *      
cap           0.3401  0.8948         
ne            -0.1631 -0.4334        
ct            -0.3080 -0.8121        
bw            0.0451  0.1202         
cum.n         -0.0976 -0.2598        
> 
> ##No strata, all output
> xBalance(pr~ date + t1 + t2 + cap + ne + ct + bw + cum.n,
+          data=nuclearplants,
+          report=c("all"))
      strata  unstrat                                                              
      stat       pr=0     pr=1 adj.diff adj.diff.null.sd std.diff     z            
vars                                                                               
date         68.6182  68.5000  -0.1182          0.3872   -0.1147  -0.3052          
t1           13.6364  14.0000  0.3636           1.2852   0.1063   0.2830           
t2           59.3182  69.1000  9.7818           3.9644   1.0327   2.4674   *       
cap          805.1818 869.8000 64.6182          72.2187  0.3401   0.8948           
ne           0.2727   0.2000   -0.0727          0.1678   -0.1631  -0.4334          
ct           0.4545   0.3000   -0.1545          0.1903   -0.3080  -0.8121          
bw           0.1818   0.2000   0.0182           0.1512   0.0451   0.1202           
cum.n        8.7273   8.1000   -0.6273          2.4140   -0.0976  -0.2598          
---Overall Test---
        chisquare df p.value
unstrat      11.5  8   0.177
---
Signif. codes:  0 '***' 0.001 '** ' 0.01 '*  ' 0.05 '.  ' 0.1 '   ' 1 
> 
> ##Stratified, all output
> xBalance(pr~.-cost-pt, strata=factor(nuclearplants$pt),
+          data=nuclearplants,
+          report=c("adj.means", "adj.mean.diffs",
+                   "adj.mean.diffs.null.sd",
+                   "chisquare.test", "std.diffs",
+                   "z.scores", "p.values"))
      strata    strat                                                              
      stat       pr=0     pr=1 adj.diff adj.diff.null.sd std.diff     z            
vars                                                                               
date         68.4924  68.5903  0.0979           0.3350   0.0950   0.2923           
t1           13.3953  14.3488  0.9535           1.2042   0.2787   0.7918           
t2           59.1802  68.5523  9.3721           4.0583   0.9894   2.3094   *       
cap          806.4535 873.0233 66.5698          72.4498  0.3504   0.9188           
ne           0.2442   0.2209   -0.0233          0.1609   -0.0522  -0.1445          
ct           0.4419   0.3314   -0.1105          0.1895   -0.2201  -0.5828          
bw           0.1977   0.1512   -0.0465          0.1506   -0.1154  -0.3088          
cum.n        8.7209   7.9012   -0.8198          2.4130   -0.1275  -0.3397          
---Overall Test---
      chisquare df p.value
strat      10.8  8   0.215
---
Signif. codes:  0 '***' 0.001 '** ' 0.01 '*  ' 0.05 '.  ' 0.1 '   ' 1 
> 
> ##Comparing unstratified to stratified, just adjusted means and
> #omnibus test
> xBalance(pr~ date + t1 + t2 + cap + ne + ct + bw + cum.n,
+          strata=list(unstrat=NULL, pt=~pt),
+          data=nuclearplants,
+          report=c("adj.means", "chisquare.test"))
      strata  unstrat                         pt                  
      stat       pr=0     pr=1              pr=0     pr=1         
vars                                                              
date         68.6182  68.5000           68.4924  68.5903          
t1           13.6364  14.0000           13.3953  14.3488          
t2           59.3182  69.1000  *        59.1802  68.5523  *       
cap          805.1818 869.8000          806.4535 873.0233         
ne           0.2727   0.2000            0.2442   0.2209           
ct           0.4545   0.3000            0.4419   0.3314           
bw           0.1818   0.2000            0.1977   0.1512           
cum.n        8.7273   8.1000            8.7209   7.9012           
---Overall Test---
        chisquare df p.value
unstrat      11.5  8   0.177
pt           10.8  8   0.215
---
Signif. codes:  0 '***' 0.001 '** ' 0.01 '*  ' 0.05 '.  ' 0.1 '   ' 1 
> 
> ##Comparing unstratified to stratified, just adjusted means and
> #omnibus test
> xBalance(pr~ date + t1 + t2 + cap + ne + ct + bw + cum.n,
+          strata=data.frame(unstrat=factor('none'),
+            pt=factor(nuclearplants$pt)),
+          data=nuclearplants,
+          report=c("adj.means", "chisquare.test"))
      strata  unstrat                         pt                  
      stat       pr=0     pr=1              pr=0     pr=1         
vars                                                              
date         68.6182  68.5000           68.4924  68.5903          
t1           13.6364  14.0000           13.3953  14.3488          
t2           59.3182  69.1000  *        59.1802  68.5523  *       
cap          805.1818 869.8000          806.4535 873.0233         
ne           0.2727   0.2000            0.2442   0.2209           
ct           0.4545   0.3000            0.4419   0.3314           
bw           0.1818   0.2000            0.1977   0.1512           
cum.n        8.7273   8.1000            8.7209   7.9012           
---Overall Test---
        chisquare df p.value
unstrat      11.5  8   0.177
pt           10.8  8   0.215
---
Signif. codes:  0 '***' 0.001 '** ' 0.01 '*  ' 0.05 '.  ' 0.1 '   ' 1 
> 
> ##Missing data handling.
> testdata<-nuclearplants
> testdata$date[testdata$date<68]<-NA
> 
> ##na.rm=FALSE by default
> xBalance(pr ~ date, data = testdata, report="all")
            strata unstrat                                                           
            stat      pr=0    pr=1 adj.diff adj.diff.null.sd std.diff    z           
vars                                                                                 
date               68.9023 68.9760  0.0737           0.2876   0.0963  0.2564         
date.NATRUE        0.2273  0.4000   0.1727           0.1742   0.3780  0.9914         
---Overall Test---
        chisquare df p.value
unstrat      1.15  2   0.563
---
Signif. codes:  0 '***' 0.001 '** ' 0.01 '*  ' 0.05 '.  ' 0.1 '   ' 1 
> xBalance(pr ~ date, data = testdata, na.rm = TRUE,report="all")
     strata unstrat                                                         
     stat      pr=0   pr=1 adj.diff adj.diff.null.sd std.diff    z          
vars                                                                        
date         68.947 69.127   0.180            0.420    0.195   0.427        
---Overall Test---
        chisquare df p.value
unstrat     0.183  1   0.669
---
Signif. codes:  0 '***' 0.001 '** ' 0.01 '*  ' 0.05 '.  ' 0.1 '   ' 1 
> 
> ##To match versions of RItools 0.1-9 and older, impute means
> #rather than medians.
> ##Not run, impfn option is not implemented in the most recent version
> ## Not run: 
> ##D xBalance(pr ~ date, data = testdata, na.rm = FALSE,
> ##D            report="all", impfn=mean.default)
> ## End(Not run)
> 
> ##Comparing unstratified to stratified, just one-by-one wilcoxon
> #rank sum tests and omnibus test of multivariate differences on
> #rank scale.
> xBalance(pr~ date + t1 + t2 + cap + ne + ct + bw + cum.n,
+          strata=data.frame(unstrat=factor('none'),
+            pt=factor(nuclearplants$pt)),
+          data=nuclearplants,
+          report=c("adj.means", "chisquare.test"),
+ 	 post.alignment.transform=rank)
      strata  unstrat                         pt                  
      stat       pr=0     pr=1              pr=0     pr=1         
vars                                                              
date         68.6182  68.5000           68.4924  68.5903          
t1           13.6364  14.0000           13.3953  14.3488          
t2           59.3182  69.1000  *        59.1802  68.5523  .       
cap          805.1818 869.8000          806.4535 873.0233         
ne           0.2727   0.2000            0.2442   0.2209           
ct           0.4545   0.3000            0.4419   0.3314           
bw           0.1818   0.2000            0.1977   0.1512           
cum.n        8.7273   8.1000            8.7209   7.9012           
---Overall Test---
        chisquare df p.value
unstrat     10.29  8   0.245
pt           7.31  8   0.503
---
Signif. codes:  0 '***' 0.001 '** ' 0.01 '*  ' 0.05 '.  ' 0.1 '   ' 1 
> 
> 
> 
> 
> 
> dev.off()
null device 
          1 
>