The estimated model, usually an lm or glm class object
cluster
A vector, matrix, or data.frame of cluster variables,
where each column is a separate variable. If the vector 1:nrow(data)
is used, the function effectively produces a regular
heteroskedasticity-robust matrix. Alternatively, a formula specifying the
cluster variables to be used (see Details).
parallel
Scalar or list. If a list, use the list as a list
of connected processing cores/clusters. A scalar indicates no
parallelization. See the parallel package.
use_white
Logical or NULL. See description below.
df_correction
Logical or numeric. TRUE computes degrees
of freedom corrections, FALSE uses no corrections. A vector of length
2^D - 1 will directly set the degrees of freedom corrections.
leverage
Integer. EXPERIMENTAL Uses Mackinnon-White HC3-style leverage
adjustments. Known to work in the non-clustering case,
e.g., it reproduces HC3 if df_correction = FALSE. Set to 3 for HC3-style
and 2 for HC2-style leverage adjustments.
force_posdef
Logical. Force the eigenvalues of the variance-covariance
matrix to be positive.
stata_fe_model_rank
Logical. If TRUE, add 1 to model rank K
to emulate Stata's fixed effect model rank for degrees of freedom adjustments.
debug
Logical. Print internal values useful for debugging to
the console.
Details
This function implements multi-way clustering using the method
suggested by Cameron, Gelbach, & Miller (2011),
which involves clustering on 2^D - 1 dimensional combinations, e.g.,
if we're cluster on firm and year, then we compute for firm,
year, and firm-year. Variance-covariance matrices with an odd
number of cluster variables are added, and those with an even
number are subtracted.
The cluster variable(s) are specified by passing the entire variable(s)
to cluster (cbind()'ed as necessary). The cluster variables should
be of the same number of rows as the original data set; observations
omitted or excluded in the model estimation will be handled accordingly.
Alternatively, you can use a formula to specify which variables from the
original data frame to use as cluster variables, e.g., ~ firmid + year.
Ma (2014) suggests using the White (1980)
variance-covariance matrix as the final, subtracted matrix when the union
of the clustering dimensions U results in a single observation per group in U;
e.g., if clustering by firm and year, there is only one observation
per firm-year, we subtract the White (1980) HC0 variance-covariance
from the sum of the firm and year vcov matrices. This is detected
automatically (if use_white = NULL), but you can force this one way
or the other by setting use_white = TRUE or FALSE.
Some authors suggest avoiding degrees of freedom corrections with
multi-way clustering. By default, the function uses corrections
identical to Petersen (2009) corrections. Passing a numerical
vector to df_correction (of length 2^D - 1) will override
the default, and setting df_correction = FALSE will use no correction.
Cameron, Gelbach, & Miller (2011)
futher suggest a method for forcing
the variance-covariance matrix to be positive semidefinite by correcting
the eigenvalues of the matrix. To use this method, set force_posdef = TRUE.
Do not use this method unless absolutely necessary! The eigen/spectral
decomposition used is not ideal numerically, and may introduce small
errors or deviations. If force_posdef = TRUE, the correction is applied
regardless of whether it's necessary.
The defaults deliberately match the Stata default output for one-way and
Mitchell Petersen's two-way Stata code results. To match the
SAS default output (obtained using the class & repeated subject
statements, see Arellano, 1987)
simply turn off the degrees of freedom correction.
Parallelization is available via the parallel package by passing
the "cluster" list (usually called cl) to the parallel argument.
Value
a K x K variance-covariance matrix of type 'matrix'
Arellano, M. (1987). PRACTITIONERS' CORNER:
Computing Robust Standard Errors for Within-groups Estimators.
Oxford Bulletin of Economics and Statistics, 49(4), 431–434.
Sexpr[results=rd,stage=build]{tools:::Rd_expr_doi("10.1111/j.1468-0084.1987.mp49004006.x")}
Cameron, A. C., Gelbach, J. B., & Miller, D. L. (2011).
Robust inference with multiway clustering. Journal of Business & Economic Statistics, 29(2).
Sexpr[results=rd,stage=build]{tools:::Rd_expr_doi("10.1198/jbes.2010.07136")}
Ma, Mark (Shuai), Are We Really Doing What We Think We Are Doing? A Note on Finite-Sample
Estimates of Two-Way Cluster-Robust Standard Errors (April 9, 2014).
MacKinnon, J. G., & White, H. (1985).
Some heteroskedasticity-consistent covariance matrix estimators with improved finite
sample properties. Journal of Econometrics, 29(3), 305–325.
Sexpr[results=rd,stage=build]{tools:::Rd_expr_doi("10.1016/0304-4076(85)90158-7")}
Petersen, M. A. (2009). Estimating standard errors
in finance panel data sets: Comparing approaches. Review of Financial Studies, 22(1), 435–480.
Sexpr[results=rd,stage=build]{tools:::Rd_expr_doi("10.1093/rfs/hhn053")}
White, H. (1980). A heteroskedasticity-consistent covariance matrix estimator and a direct
test for heteroskedasticity. Econometrica: Journal of the Econometric Society, 817–838.
Sexpr[results=rd,stage=build]{tools:::Rd_expr_doi("10.2307/1912934")}
See Also
The coeftest and waldtest functions
from lmtest provide hypothesis testing, sandwich provides other
variance-covariance matrices such as vcovHC and vcovHAC,
and the felm function from lfe also implements multi-way standard
error clustering. The cluster.boot function provides clustering using the bootstrap.
Examples
library(lmtest)
data(petersen)
m1 <- lm(y ~ x, data = petersen)
# Cluster by firm
vcov_firm <- cluster.vcov(m1, petersen$firmid)
coeftest(m1, vcov_firm)
# Cluster by year
vcov_year <- cluster.vcov(m1, petersen$year)
coeftest(m1, vcov_year)
# Cluster by year using a formula
vcov_year_formula <- cluster.vcov(m1, ~ year)
coeftest(m1, vcov_year_formula)
# Double cluster by firm and year
vcov_both <- cluster.vcov(m1, cbind(petersen$firmid, petersen$year))
coeftest(m1, vcov_both)
# Double cluster by firm and year using a formula
vcov_both_formula <- cluster.vcov(m1, ~ firmid + year)
coeftest(m1, vcov_both_formula)
# Replicate Mahmood Arai's double cluster by firm and year
vcov_both <- cluster.vcov(m1, cbind(petersen$firmid, petersen$year), use_white = FALSE)
coeftest(m1, vcov_both)
# For comparison, produce White HC0 VCOV the hard way
vcov_hc0 <- cluster.vcov(m1, 1:nrow(petersen), df_correction = FALSE)
coeftest(m1, vcov_hc0)
# Produce White HC1 VCOV the hard way
vcov_hc1 <- cluster.vcov(m1, 1:nrow(petersen), df_correction = TRUE)
coeftest(m1, vcov_hc1)
# Produce White HC2 VCOV the hard way
vcov_hc2 <- cluster.vcov(m1, 1:nrow(petersen), df_correction = FALSE, leverage = 2)
coeftest(m1, vcov_hc2)
# Produce White HC3 VCOV the hard way
vcov_hc3 <- cluster.vcov(m1, 1:nrow(petersen), df_correction = FALSE, leverage = 3)
coeftest(m1, vcov_hc3)
# Go multicore using the parallel package
## Not run:
library(parallel)
cl <- makeCluster(4)
vcov_both <- cluster.vcov(m1, cbind(petersen$firmid, petersen$year), parallel = cl)
stopCluster(cl)
coeftest(m1, vcov_both)
## End(Not run)