R: Compute conditional F statistic for weak instruments in an...
condfstat
R Documentation
Compute conditional F statistic for weak instruments in an IV-estimation
with multiple endogenous variables
Description
When using multiple instruments for multiple endogenous variables, the
ordinary individual t-tests for the instruments in the first stage do not
always reveal a weak set of instruments. Conditional F statistics can be
used for such testing.
Usage
condfstat(object, type = "default", quantiles = 0, bN = 100L)
Arguments
object
object of class "felm", a result of a call to
felm.
type
character. Error structure. Passed to waldtest. If
NULL, both iid and robust Fs are returned.
quantiles
numeric. Quantiles for bootstrap.
bN
integer. Number of bootstrap samples.
Details
IV coefficient estimates are not normally distributed, in particular they do
not have the right expectation. They follow a quite complicated
distribution which is fairly close to normal if the instruments are good.
The conditional F-statistic is a measure of how good the instruments are.
If the F is large, the instruments are good, and any bias due to the
instruments is small compared to the estimated standard errors, and also
small relative to the bias in OLS. See Sanderson and Windmeijer
(2014) and Stock and Yogo (2004). If F is small, the bias can be
large compared to the standard error.
If any(quantiles > 0.0), a bootstrap with bN samples will be
performed to estimate quantiles of the endogenous parameters which includes
the variance both from the 1st and 2nd stage. The result is returned in an
array attribute quantiles of the value returned by condfstat.
The argument quantiles can be a vector to estimate more than one
quantile at once. If quantiles=NULL, the bootstrapped estimates
themselves are returned. The bootstrap is normally much faster than running
felm over and over again. This is so because all exogenous variables
are projected out of the equations before doing the bootstrap.
Value
A p x k matrix, where k is the number of endogenous variables. Each
row are the conditional F statistics on a residual equation as described in
Sanderson and Windmeijer (2014), for a certain error structure. The
default is to use iid, or cluster if a cluster was specified to
felm. The third choice is 'robust', for heteroskedastic
errors. If type=NULL, iid and robust Fs are returned, and cluster, if
that was specified to felm.
Note that for these F statistics it is not the p-value that matters, it is
the F statistic itself which (coincidentally) pops up in the denominator for
the asymptotic bias of the IV estimates, and thus a large F is beneficial.
Note
Please note that condfstat does not work with the old syntax
for IV in felm(...,iv=). The new multipart syntax must be
used.
Stock, J.H. and M. Yogo (2004) Testing for weak instruments in linear
IV regression, http://ssrn.com/abstract=1734933 in
Identification and inference for econometric models: Essays in honor
of Thomas Rothenberg, 2005.
Examples
z1 <- rnorm(4000)
z2 <- rnorm(length(z1))
u <- rnorm(length(z1))
# make x1, x2 correlated with errors u
x1 <- z1 + z2 + 0.2*u + rnorm(length(z1))
x2 <- z1 + 0.94*z2 - 0.3*u + rnorm(length(z1))
y <- x1 + x2 + u
est <- felm(y ~ 1 | 0 | (x1 | x2 ~ z1 + z2))
summary(est)
## Not run:
summary(est$stage1, lhs='x1')
summary(est$stage1, lhs='x2')
## End(Not run)
# the joint significance of the instruments in both the first stages are ok:
t(sapply(est$stage1$lhs, function(lh) waldtest(est$stage1, ~z1|z2, lhs=lh)))
# everything above looks fine, t-tests for instruments,
# as well as F-tests for excluded instruments in the 1st stages.
# The conditional F-test reveals that the instruments are jointly weak
# (it's close to being only one instrument, z1+z2, for both x1 and x2)
condfstat(est, quantiles=c(0.05, 0.95))