R Graphical Manual

Browse All

Last data update: 2014.03.03

R: Multi-way standard error clustering

cluster.vcov

R Documentation

Multi-way standard error clustering

Description

Return a multi-way cluster-robust variance-covariance matrix

Usage

cluster.vcov(model, cluster, parallel = FALSE, use_white = NULL,
  df_correction = TRUE, leverage = FALSE, force_posdef = FALSE,
  stata_fe_model_rank = FALSE, debug = FALSE)

Arguments

`model`	The estimated model, usually an `lm` or `glm` class object
`cluster`	A `vector`, `matrix`, or `data.frame` of cluster variables, where each column is a separate variable. If the vector `1:nrow(data)` is used, the function effectively produces a regular heteroskedasticity-robust matrix. Alternatively, a `formula` specifying the cluster variables to be used (see Details).
`parallel`	Scalar or list. If a list, use the list as a list of connected processing cores/clusters. A scalar indicates no parallelization. See the parallel package.
`use_white`	Logical or `NULL`. See description below.
`df_correction`	Logical or `numeric`. `TRUE` computes degrees of freedom corrections, `FALSE` uses no corrections. A vector of length 2^D - 1 will directly set the degrees of freedom corrections.
`leverage`	Integer. EXPERIMENTAL Uses Mackinnon-White HC3-style leverage adjustments. Known to work in the non-clustering case, e.g., it reproduces HC3 if `df_correction = FALSE`. Set to 3 for HC3-style and 2 for HC2-style leverage adjustments.
`force_posdef`	Logical. Force the eigenvalues of the variance-covariance matrix to be positive.
`stata_fe_model_rank`	Logical. If `TRUE`, add 1 to model rank K to emulate Stata's fixed effect model rank for degrees of freedom adjustments.
`debug`	Logical. Print internal values useful for debugging to the console.

Details

This function implements multi-way clustering using the method suggested by Cameron, Gelbach, & Miller (2011), which involves clustering on 2^D - 1 dimensional combinations, e.g., if we're cluster on firm and year, then we compute for firm, year, and firm-year. Variance-covariance matrices with an odd number of cluster variables are added, and those with an even number are subtracted.

The cluster variable(s) are specified by passing the entire variable(s) to cluster (cbind()'ed as necessary). The cluster variables should be of the same number of rows as the original data set; observations omitted or excluded in the model estimation will be handled accordingly.

Alternatively, you can use a formula to specify which variables from the original data frame to use as cluster variables, e.g., ~ firmid + year.

Ma (2014) suggests using the White (1980) variance-covariance matrix as the final, subtracted matrix when the union of the clustering dimensions U results in a single observation per group in U; e.g., if clustering by firm and year, there is only one observation per firm-year, we subtract the White (1980) HC0 variance-covariance from the sum of the firm and year vcov matrices. This is detected automatically (if use_white = NULL), but you can force this one way or the other by setting use_white = TRUE or FALSE.

Some authors suggest avoiding degrees of freedom corrections with multi-way clustering. By default, the function uses corrections identical to Petersen (2009) corrections. Passing a numerical vector to df_correction (of length 2^D - 1) will override the default, and setting df_correction = FALSE will use no correction.

Cameron, Gelbach, & Miller (2011) futher suggest a method for forcing the variance-covariance matrix to be positive semidefinite by correcting the eigenvalues of the matrix. To use this method, set force_posdef = TRUE. Do not use this method unless absolutely necessary! The eigen/spectral decomposition used is not ideal numerically, and may introduce small errors or deviations. If force_posdef = TRUE, the correction is applied regardless of whether it's necessary.

The defaults deliberately match the Stata default output for one-way and Mitchell Petersen's two-way Stata code results. To match the SAS default output (obtained using the class & repeated subject statements, see Arellano, 1987) simply turn off the degrees of freedom correction.

Parallelization is available via the parallel package by passing the "cluster" list (usually called cl) to the parallel argument.

Value

a K x K variance-covariance matrix of type 'matrix'

Author(s)

Nathaniel Graham npgraham1@gmail.com

References

Arellano, M. (1987). PRACTITIONERS' CORNER: Computing Robust Standard Errors for Within-groups Estimators. Oxford Bulletin of Economics and Statistics, 49(4), 431–434. Sexpr[results=rd,stage=build]{tools:::Rd_expr_doi("10.1111/j.1468-0084.1987.mp49004006.x")}

Cameron, A. C., Gelbach, J. B., & Miller, D. L. (2011). Robust inference with multiway clustering. Journal of Business & Economic Statistics, 29(2). Sexpr[results=rd,stage=build]{tools:::Rd_expr_doi("10.1198/jbes.2010.07136")}

Ma, Mark (Shuai), Are We Really Doing What We Think We Are Doing? A Note on Finite-Sample Estimates of Two-Way Cluster-Robust Standard Errors (April 9, 2014).

MacKinnon, J. G., & White, H. (1985). Some heteroskedasticity-consistent covariance matrix estimators with improved finite sample properties. Journal of Econometrics, 29(3), 305–325. Sexpr[results=rd,stage=build]{tools:::Rd_expr_doi("10.1016/0304-4076(85)90158-7")}

Petersen, M. A. (2009). Estimating standard errors in finance panel data sets: Comparing approaches. Review of Financial Studies, 22(1), 435–480. Sexpr[results=rd,stage=build]{tools:::Rd_expr_doi("10.1093/rfs/hhn053")}

White, H. (1980). A heteroskedasticity-consistent covariance matrix estimator and a direct test for heteroskedasticity. Econometrica: Journal of the Econometric Society, 817–838. Sexpr[results=rd,stage=build]{tools:::Rd_expr_doi("10.2307/1912934")}

Examples

library(lmtest)
data(petersen)
m1 <- lm(y ~ x, data = petersen)

# Cluster by firm
vcov_firm <- cluster.vcov(m1, petersen$firmid)
coeftest(m1, vcov_firm)

# Cluster by year
vcov_year <- cluster.vcov(m1, petersen$year)
coeftest(m1, vcov_year)

# Cluster by year using a formula
vcov_year_formula <- cluster.vcov(m1, ~ year)
coeftest(m1, vcov_year_formula)

# Double cluster by firm and year
vcov_both <- cluster.vcov(m1, cbind(petersen$firmid, petersen$year))
coeftest(m1, vcov_both)

# Double cluster by firm and year using a formula
vcov_both_formula <- cluster.vcov(m1, ~ firmid + year)
coeftest(m1, vcov_both_formula)

# Replicate Mahmood Arai's double cluster by firm and year
vcov_both <- cluster.vcov(m1, cbind(petersen$firmid, petersen$year), use_white = FALSE)
coeftest(m1, vcov_both)

# For comparison, produce White HC0 VCOV the hard way
vcov_hc0 <- cluster.vcov(m1, 1:nrow(petersen), df_correction = FALSE)
coeftest(m1, vcov_hc0)

# Produce White HC1 VCOV the hard way
vcov_hc1 <- cluster.vcov(m1, 1:nrow(petersen), df_correction = TRUE)
coeftest(m1, vcov_hc1)

# Produce White HC2 VCOV the hard way
vcov_hc2 <- cluster.vcov(m1, 1:nrow(petersen), df_correction = FALSE, leverage = 2)
coeftest(m1, vcov_hc2)

# Produce White HC3 VCOV the hard way
vcov_hc3 <- cluster.vcov(m1, 1:nrow(petersen), df_correction = FALSE, leverage = 3)
coeftest(m1, vcov_hc3)

# Go multicore using the parallel package
## Not run: 
library(parallel)
cl <- makeCluster(4)
vcov_both <- cluster.vcov(m1, cbind(petersen$firmid, petersen$year), parallel = cl)
stopCluster(cl)
coeftest(m1, vcov_both)

## End(Not run)