R: Standardized Differences for Stratified Comparisons
xBalance
R Documentation
Standardized Differences for Stratified Comparisons
Description
Given covariates, a treatment variable, and a stratifying factor,
calculates standardized mean differences along each covariate,
with and without the stratification and tests for conditional
independence of the treatment variable and the covariates within
strata.
A formula containing an indicator of treatment
assignment on the left hand side and covariates at right.
strata
A list of right-hand-side-only formulas containing
the factor(s) identifying the strata, with NULL entries
interpreted as no stratification; or a factor with length equal
to the number of rows in data; or a data frame of such
factors. See below for examples.
data
A data frame in which fmla and strata
are to be evaluated.
report
Character vector listing measures to report for each
stratification; a subset of c("adj.means",
"adj.mean.diffs", "adj.mean.diffs.null.sd", "chisquare.test",
"std.diffs", "z.scores", "p.values", "all"). P-values reported
are two-sided for the null-hypothesis of no effect. The option
"all" requests all measures.
stratum.weights
Weights to be applied when aggregating
across strata specified by strata, defaulting to weights
proportional to the harmonic mean of treatment and control group
sizes within strata. This can be either a function used to
calculate the weights or the weights themselves; if
strata is a data frame, then it can be such a function, a
list of such functions, or a data frame of stratum weighting
schemes corresponding to the different stratifying factors of
strata. See details.
na.rm
Whether to remove rows with NAs on any variables
mentioned on the RHS of fmla (i.e. listwise deletion).
Defaults to FALSE, wherein rows aren't deleted but for
each variable with NAs a missing-data indicator variable
is added to the variables on which balance is calculated and
medians are imputed for the variable with missing data (in
RItools versions 0.1-9 and before the default imputation was the
mean, in RItools versions 0.1-11 and henceforth the default is
the median). See the example below.
covariate.scaling
A scale factor to apply to covariates in
calculating std.diffs. If NULL, xBalance
pools standard deviations of each variable in the treatment and
control group (defining these groups according to whether the
LHS of formula is greater than or equal to 0). Also, see
details.
normalize.weights
If TRUE, then stratum weights are
normalized so as to sum to 1. Defaults to TRUE.
impfn
A function to impute missing values when
na.rm=FALSE. Currently median. To impute
means use mean.default.
post.alignment.transform
Optional transformation applied to
covariates just after their stratum means are subtracted off.
Details
In the unstratified case, the standardized difference of covariate
means is the mean in the treatment group minus the mean in the
control group, divided by the S.D. (standard deviation) in the
same variable estimated by pooling treatment and control group
S.D.s on the same variable. In the stratified case, the
denominator of the standardized difference remains the same but
the numerator is a weighted average of within-stratum differences
in means on the covariate. By default, each stratum is weighted
in proportion to the harmonic mean 1/[(1/a +
1/b)/2]=2*a*b/(a+b) of the number of treated units (a) and
control units (b) in the stratum; this weighting is optimal under
certain modeling assumptions (discussed in Kalton 1968, Hansen and
Bowers 2008). This weighting can be modified using the
stratum.weights argument; see below.
When the treatment variable, the variable specified by the
left-hand side of fmla, is not binary, xBalance
calculates the covariates' regressions on the treatment variable,
in the stratified case pooling these regressions across strata
using weights that default to the stratum-wise sum of squared
deviations of the treatment variable from its stratum mean.
(Applied to binary treatment variables, this recipe gives the same
result as the one given above.) In the numerator of the
standardized difference, we get a “pooled S.D.” from separating
units into two groups, one in which the treatment variable is 0 or
less and another in which it is positive. If report
includes "adj.means", covariate means for the former of these
groups are reported, along with the sums of these means and the
covariates' regressions on either the treatment variable, in the
unstratified (“pre”) case, or the treatment variable and the
strata, in the stratified (“post”) case.
stratum.weights can be either a function or a numeric
vector of weights. If it is a numeric vector, it should be
non-negative and it should have stratum names as its names. (i.e.,
its names should be equal to the levels of the factor specified by
strata.) If it is a function, it should accept one
argument, a data frame containing the variables in data and
additionally Tx.grp and stratum.code, and return a
vector of non-negative weights with stratum codes as names; for an
example, do getFromNamespace("harmonic", "RItools").
If covariate.scaling is not NULL, no scaling is
applied. This behavior is likely to change in future versions.
(If you want no scaling, set covariate.scaling=1, as this
is likely to retain this meaning in the future.)
adj.mean.diffs.null.sd returns the standard deviation of
the Normal approximated randomization distribution of the
strata-adjusted difference of means under the strict null of no
effect.
Value
An object of class c("xbal", "list"). There are
plot, print, and xtable methods for class
"xbal"; the print method is demonstrated in the
examples.
Note
Evidence pertaining to the hypothesis that a treatment
variable is not associated with differences in covariate values
is assessed by comparing the differences of means (or regression
coefficients), without standardization, to their distributions
under hypothetical shuffles of the treatment variable, a
permutation or randomization distribution. For the unstratified
comparison, this reference distribution consists of differences
(more generally, regression coefficients) when the treatment
variable is permuted without regard to strata. For the
stratified comparison, the reference distribution is determined
by randomly permuting the treatment variable within strata, then
re-calculating the treatment-control differences (regressions of
each covariate on the permuted treatment variable). Significance
assessments are based on the large-sample Normal approximation
to these reference distributions.
Author(s)
Ben Hansen and Jake Bowers and Mark Fredrickson
References
Hansen, B.B. and Bowers, J. (2008), “Covariate
Balance in Simple, Stratified and Clustered Comparative
Studies,” Statistical Science23.
Kalton, G. (1968), “Standardization: A technique to control for
extraneous variables,” Applied Statistics17,
118–136.
Examples
data(nuclearplants)
##No strata, default output
xBalance(pr~ date + t1 + t2 + cap + ne + ct + bw + cum.n,
data=nuclearplants)
##No strata, all output
xBalance(pr~ date + t1 + t2 + cap + ne + ct + bw + cum.n,
data=nuclearplants,
report=c("all"))
##Stratified, all output
xBalance(pr~.-cost-pt, strata=factor(nuclearplants$pt),
data=nuclearplants,
report=c("adj.means", "adj.mean.diffs",
"adj.mean.diffs.null.sd",
"chisquare.test", "std.diffs",
"z.scores", "p.values"))
##Comparing unstratified to stratified, just adjusted means and
#omnibus test
xBalance(pr~ date + t1 + t2 + cap + ne + ct + bw + cum.n,
strata=list(unstrat=NULL, pt=~pt),
data=nuclearplants,
report=c("adj.means", "chisquare.test"))
##Comparing unstratified to stratified, just adjusted means and
#omnibus test
xBalance(pr~ date + t1 + t2 + cap + ne + ct + bw + cum.n,
strata=data.frame(unstrat=factor('none'),
pt=factor(nuclearplants$pt)),
data=nuclearplants,
report=c("adj.means", "chisquare.test"))
##Missing data handling.
testdata<-nuclearplants
testdata$date[testdata$date<68]<-NA
##na.rm=FALSE by default
xBalance(pr ~ date, data = testdata, report="all")
xBalance(pr ~ date, data = testdata, na.rm = TRUE,report="all")
##To match versions of RItools 0.1-9 and older, impute means
#rather than medians.
##Not run, impfn option is not implemented in the most recent version
## Not run: xBalance(pr ~ date, data = testdata, na.rm = FALSE,
report="all", impfn=mean.default)
## End(Not run)
##Comparing unstratified to stratified, just one-by-one wilcoxon
#rank sum tests and omnibus test of multivariate differences on
#rank scale.
xBalance(pr~ date + t1 + t2 + cap + ne + ct + bw + cum.n,
strata=data.frame(unstrat=factor('none'),
pt=factor(nuclearplants$pt)),
data=nuclearplants,
report=c("adj.means", "chisquare.test"),
post.alignment.transform=rank)
Results
R version 3.3.1 (2016-06-21) -- "Bug in Your Hair"
Copyright (C) 2016 The R Foundation for Statistical Computing
Platform: x86_64-pc-linux-gnu (64-bit)
R is free software and comes with ABSOLUTELY NO WARRANTY.
You are welcome to redistribute it under certain conditions.
Type 'license()' or 'licence()' for distribution details.
R is a collaborative project with many contributors.
Type 'contributors()' for more information and
'citation()' on how to cite R or R packages in publications.
Type 'demo()' for some demos, 'help()' for on-line help, or
'help.start()' for an HTML browser interface to help.
Type 'q()' to quit R.
> library(RItools)
Loading required package: SparseM
Attaching package: 'SparseM'
The following object is masked from 'package:base':
backsolve
> png(filename="/home/ddbj/snapshot/RGM3/R_CC/result/RItools/xBalance.Rd_%03d_medium.png", width=480, height=480)
> ### Name: xBalance
> ### Title: Standardized Differences for Stratified Comparisons
> ### Aliases: xBalance
> ### Keywords: design nonparametric
>
> ### ** Examples
>
> data(nuclearplants)
> ##No strata, default output
> xBalance(pr~ date + t1 + t2 + cap + ne + ct + bw + cum.n,
+ data=nuclearplants)
strata unstrat
stat std.diff z
vars
date -0.1147 -0.3052
t1 0.1063 0.2830
t2 1.0327 2.4674 *
cap 0.3401 0.8948
ne -0.1631 -0.4334
ct -0.3080 -0.8121
bw 0.0451 0.1202
cum.n -0.0976 -0.2598
>
> ##No strata, all output
> xBalance(pr~ date + t1 + t2 + cap + ne + ct + bw + cum.n,
+ data=nuclearplants,
+ report=c("all"))
strata unstrat
stat pr=0 pr=1 adj.diff adj.diff.null.sd std.diff z
vars
date 68.6182 68.5000 -0.1182 0.3872 -0.1147 -0.3052
t1 13.6364 14.0000 0.3636 1.2852 0.1063 0.2830
t2 59.3182 69.1000 9.7818 3.9644 1.0327 2.4674 *
cap 805.1818 869.8000 64.6182 72.2187 0.3401 0.8948
ne 0.2727 0.2000 -0.0727 0.1678 -0.1631 -0.4334
ct 0.4545 0.3000 -0.1545 0.1903 -0.3080 -0.8121
bw 0.1818 0.2000 0.0182 0.1512 0.0451 0.1202
cum.n 8.7273 8.1000 -0.6273 2.4140 -0.0976 -0.2598
---Overall Test---
chisquare df p.value
unstrat 11.5 8 0.177
---
Signif. codes: 0 '***' 0.001 '** ' 0.01 '* ' 0.05 '. ' 0.1 ' ' 1
>
> ##Stratified, all output
> xBalance(pr~.-cost-pt, strata=factor(nuclearplants$pt),
+ data=nuclearplants,
+ report=c("adj.means", "adj.mean.diffs",
+ "adj.mean.diffs.null.sd",
+ "chisquare.test", "std.diffs",
+ "z.scores", "p.values"))
strata strat
stat pr=0 pr=1 adj.diff adj.diff.null.sd std.diff z
vars
date 68.4924 68.5903 0.0979 0.3350 0.0950 0.2923
t1 13.3953 14.3488 0.9535 1.2042 0.2787 0.7918
t2 59.1802 68.5523 9.3721 4.0583 0.9894 2.3094 *
cap 806.4535 873.0233 66.5698 72.4498 0.3504 0.9188
ne 0.2442 0.2209 -0.0233 0.1609 -0.0522 -0.1445
ct 0.4419 0.3314 -0.1105 0.1895 -0.2201 -0.5828
bw 0.1977 0.1512 -0.0465 0.1506 -0.1154 -0.3088
cum.n 8.7209 7.9012 -0.8198 2.4130 -0.1275 -0.3397
---Overall Test---
chisquare df p.value
strat 10.8 8 0.215
---
Signif. codes: 0 '***' 0.001 '** ' 0.01 '* ' 0.05 '. ' 0.1 ' ' 1
>
> ##Comparing unstratified to stratified, just adjusted means and
> #omnibus test
> xBalance(pr~ date + t1 + t2 + cap + ne + ct + bw + cum.n,
+ strata=list(unstrat=NULL, pt=~pt),
+ data=nuclearplants,
+ report=c("adj.means", "chisquare.test"))
strata unstrat pt
stat pr=0 pr=1 pr=0 pr=1
vars
date 68.6182 68.5000 68.4924 68.5903
t1 13.6364 14.0000 13.3953 14.3488
t2 59.3182 69.1000 * 59.1802 68.5523 *
cap 805.1818 869.8000 806.4535 873.0233
ne 0.2727 0.2000 0.2442 0.2209
ct 0.4545 0.3000 0.4419 0.3314
bw 0.1818 0.2000 0.1977 0.1512
cum.n 8.7273 8.1000 8.7209 7.9012
---Overall Test---
chisquare df p.value
unstrat 11.5 8 0.177
pt 10.8 8 0.215
---
Signif. codes: 0 '***' 0.001 '** ' 0.01 '* ' 0.05 '. ' 0.1 ' ' 1
>
> ##Comparing unstratified to stratified, just adjusted means and
> #omnibus test
> xBalance(pr~ date + t1 + t2 + cap + ne + ct + bw + cum.n,
+ strata=data.frame(unstrat=factor('none'),
+ pt=factor(nuclearplants$pt)),
+ data=nuclearplants,
+ report=c("adj.means", "chisquare.test"))
strata unstrat pt
stat pr=0 pr=1 pr=0 pr=1
vars
date 68.6182 68.5000 68.4924 68.5903
t1 13.6364 14.0000 13.3953 14.3488
t2 59.3182 69.1000 * 59.1802 68.5523 *
cap 805.1818 869.8000 806.4535 873.0233
ne 0.2727 0.2000 0.2442 0.2209
ct 0.4545 0.3000 0.4419 0.3314
bw 0.1818 0.2000 0.1977 0.1512
cum.n 8.7273 8.1000 8.7209 7.9012
---Overall Test---
chisquare df p.value
unstrat 11.5 8 0.177
pt 10.8 8 0.215
---
Signif. codes: 0 '***' 0.001 '** ' 0.01 '* ' 0.05 '. ' 0.1 ' ' 1
>
> ##Missing data handling.
> testdata<-nuclearplants
> testdata$date[testdata$date<68]<-NA
>
> ##na.rm=FALSE by default
> xBalance(pr ~ date, data = testdata, report="all")
strata unstrat
stat pr=0 pr=1 adj.diff adj.diff.null.sd std.diff z
vars
date 68.9023 68.9760 0.0737 0.2876 0.0963 0.2564
date.NATRUE 0.2273 0.4000 0.1727 0.1742 0.3780 0.9914
---Overall Test---
chisquare df p.value
unstrat 1.15 2 0.563
---
Signif. codes: 0 '***' 0.001 '** ' 0.01 '* ' 0.05 '. ' 0.1 ' ' 1
> xBalance(pr ~ date, data = testdata, na.rm = TRUE,report="all")
strata unstrat
stat pr=0 pr=1 adj.diff adj.diff.null.sd std.diff z
vars
date 68.947 69.127 0.180 0.420 0.195 0.427
---Overall Test---
chisquare df p.value
unstrat 0.183 1 0.669
---
Signif. codes: 0 '***' 0.001 '** ' 0.01 '* ' 0.05 '. ' 0.1 ' ' 1
>
> ##To match versions of RItools 0.1-9 and older, impute means
> #rather than medians.
> ##Not run, impfn option is not implemented in the most recent version
> ## Not run:
> ##D xBalance(pr ~ date, data = testdata, na.rm = FALSE,
> ##D report="all", impfn=mean.default)
> ## End(Not run)
>
> ##Comparing unstratified to stratified, just one-by-one wilcoxon
> #rank sum tests and omnibus test of multivariate differences on
> #rank scale.
> xBalance(pr~ date + t1 + t2 + cap + ne + ct + bw + cum.n,
+ strata=data.frame(unstrat=factor('none'),
+ pt=factor(nuclearplants$pt)),
+ data=nuclearplants,
+ report=c("adj.means", "chisquare.test"),
+ post.alignment.transform=rank)
strata unstrat pt
stat pr=0 pr=1 pr=0 pr=1
vars
date 68.6182 68.5000 68.4924 68.5903
t1 13.6364 14.0000 13.3953 14.3488
t2 59.3182 69.1000 * 59.1802 68.5523 .
cap 805.1818 869.8000 806.4535 873.0233
ne 0.2727 0.2000 0.2442 0.2209
ct 0.4545 0.3000 0.4419 0.3314
bw 0.1818 0.2000 0.1977 0.1512
cum.n 8.7273 8.1000 8.7209 7.9012
---Overall Test---
chisquare df p.value
unstrat 10.29 8 0.245
pt 7.31 8 0.503
---
Signif. codes: 0 '***' 0.001 '** ' 0.01 '* ' 0.05 '. ' 0.1 ' ' 1
>
>
>
>
>
> dev.off()
null device
1
>