Instrumental-Variable Regression


Fit instrumental-variable regression by two-stage least squares. This is equivalent to direct instrumental-variables estimation when the number of instruments is equal to the number of predictors.
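
The equivalence stated above can be checked numerically. The following is a minimal sketch in base R on simulated data; it does not call ivreg, and the variable names (x, z, u) and coefficient values are illustrative assumptions rather than anything taken from the package.

```r
## Simulated, just-identified setup: one endogenous regressor, one instrument.
set.seed(1)
n <- 500
z <- rnorm(n)                       # instrument
u <- rnorm(n)                       # structural error
x <- 0.8 * z + 0.5 * u + rnorm(n)   # regressor correlated with u (endogenous)
y <- 1 + 2 * x + u

X <- cbind("(Intercept)" = 1, x = x)   # regressors
Z <- cbind("(Intercept)" = 1, z = z)   # instruments; the intercept instruments itself

## Two-stage least squares: project X on Z, then regress y on the projection.
Xhat   <- Z %*% solve(crossprod(Z), crossprod(Z, X))
b_2sls <- solve(crossprod(Xhat), crossprod(Xhat, y))

## Direct instrumental-variables estimator: (Z'X)^{-1} Z'y.
b_iv <- solve(crossprod(Z, X), crossprod(Z, y))

## With as many instruments as regressors, the two estimators coincide.
all.equal(drop(b_2sls), drop(b_iv), check.attributes = FALSE)
```

With more instruments than regressors, Z'X is no longer square and only the two-stage form is defined.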


ivreg(formula, instruments, data, subset, na.action, weights, offset,
  contrasts = NULL, model = TRUE, y = TRUE, x = FALSE, ...)


formula, instruments

formula specification(s) of the regression relationship and the instruments. Either instruments is missing and formula has three parts as in y ~ x1 + x2 | z1 + z2 + z3 (recommended) or formula is y ~ x1 + x2 and instruments is a one-sided formula ~ z1 + z2 + z3 (only for backward compatibility).


data

an optional data frame containing the variables in the model. By default the variables are taken from the environment of the formula.


subset

an optional vector specifying a subset of observations to be used in fitting the model.


na.action

a function that indicates what should happen when the data contain NAs. The default is set by the na.action option.


weights

an optional vector of weights to be used in the fitting process.


offset

an optional offset that can be used to specify an a priori known component to be included during fitting.


contrasts

an optional list. See the contrasts.arg of model.matrix.default.

model, x, y

logicals. If TRUE, the corresponding components of the fit (the model frame, the model matrices, the response) are returned.


...

further arguments passed to the underlying fitting function.


ivreg is the high-level interface to the work-horse function. A set of standard methods (including print, summary, vcov, anova, hatvalues, predict, terms, model.matrix, bread, estfun) is available and described in summary.ivreg.

Regressors and instruments for ivreg are most easily specified in a formula with two parts on the right-hand side, e.g., y ~ x1 + x2 | z1 + z2 + z3, where x1 and x2 are the regressors and z1, z2, and z3 are the instruments. Note that exogenous regressors have to be included as instruments for themselves. For example, if there is one exogenous regressor ex and one endogenous regressor en with instrument in, the appropriate formula would be y ~ ex + en | ex + in. Equivalently, this can be specified as y ~ ex + en | . - en + in, i.e., by providing an update formula with a . in the second part of the formula. The latter is typically more convenient if there is a large number of exogenous regressors.
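
The update-formula shorthand can be illustrated with update() from base R, which expands a . in the same spirit. This is only an analogy, not a description of how ivreg processes the formula internally. The instrument is named iv here (a hypothetical name) because in is a reserved word in R and cannot be used as a variable name in real code.

```r
## Regressors from the first part of the formula (one-sided for clarity).
regressors <- ~ ex + en

## The "." stands for those regressors: drop the endogenous regressor `en`
## and add its instrument `iv`.
instruments <- update(regressors, ~ . - en + iv)

## Terms of the expanded instrument set: the exogenous regressor plus the instrument.
attr(terms(instruments), "term.labels")
```

The exogenous regressor ex is carried over automatically, which is why this form scales well when there are many exogenous regressors.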


ivreg returns an object of class "ivreg", with the following components:


parameter estimates.


a vector of residuals.


a vector of predicted means.


either the vector of weights used (if any) or NULL (if none).


either the offset used (if any) or NULL (if none).


number of observations.


number of observations with non-zero weights.


the numeric rank of the fitted linear model.


residual degrees of freedom for fitted model.


unscaled covariance matrix for the coefficients.


residual standard error.


the original function call.


the model formula.


a list with elements "regressors" and "instruments" containing the terms objects for the respective components.


levels of the categorical regressors.


the contrasts used for categorical regressors.


the full model frame (if model = TRUE).


the response vector (if y = TRUE).


a list with elements "regressors", "instruments", "projected", containing the model matrices from the respective components (if x = TRUE). "projected" is the matrix of regressors projected on the image of the instruments.
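
Several of the components listed above can be reproduced by hand, which may help in reading them. The sketch below uses base R and simulated data. It applies the textbook two-stage least squares formulas and is assumed, not verified here, to match what ivreg stores. All object names are illustrative.

```r
## Simulated data; names and coefficient values are illustrative.
set.seed(2)
n  <- 200
ex <- rnorm(n)                        # exogenous regressor
z1 <- rnorm(n)                        # excluded instruments
z2 <- rnorm(n)
u  <- rnorm(n)                        # structural error
en <- z1 - z2 + 0.5 * u + rnorm(n)    # endogenous regressor
y  <- 1 + ex - en + u

X <- cbind("(Intercept)" = 1, ex = ex, en = en)            # regressors
Z <- cbind("(Intercept)" = 1, ex = ex, z1 = z1, z2 = z2)   # instruments; ex instruments itself

## "projected": each regressor column projected on the column space of Z.
projected <- lm.fit(Z, X)$fitted.values

## Columns that serve as their own instruments are unchanged by the projection.
all.equal(as.numeric(projected[, 2]), ex)

## Parameter estimates: least squares of y on the projected regressors.
b <- lm.fit(projected, y)$coefficients

## Fitted values and residuals use the original regressors, not the projection.
fit <- drop(X %*% b)
res <- y - fit

## Residual degrees of freedom, residual standard error, covariance matrices.
df_res       <- n - ncol(X)
sigma        <- sqrt(sum(res^2) / df_res)
cov_unscaled <- solve(crossprod(projected))
vc           <- sigma^2 * cov_unscaled
sqrt(diag(vc))                        # standard errors of the estimates
```

Computing the residuals from the projected regressors instead would give a different sum of squares and therefore different standard errors, which is why a naive second-stage lm() fit does not report the usual 2SLS standard errors.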


> ## data
> data("CigarettesSW", package = "AER")
> CigarettesSW$rprice <- with(CigarettesSW, price/cpi)
> CigarettesSW$rincome <- with(CigarettesSW, income/population/cpi)
> CigarettesSW$tdiff <- with(CigarettesSW, (taxs - tax)/cpi)
> ## model 
> fm <- ivreg(log(packs) ~ log(rprice) + log(rincome) | log(rincome) + tdiff + I(tax/cpi),
+   data = CigarettesSW, subset = year == "1995")
> summary(fm)

Call:
ivreg(formula = log(packs) ~ log(rprice) + log(rincome) | log(rincome) + 
    tdiff + I(tax/cpi), data = CigarettesSW, subset = year == 
    "1995")

Residuals:
       Min         1Q     Median         3Q        Max 
-0.6006931 -0.0862222 -0.0009999  0.1164699  0.3734227 

Coefficients:
             Estimate Std. Error t value Pr(>|t|)    
(Intercept)    9.8950     1.0586   9.348 4.12e-12 ***
log(rprice)   -1.2774     0.2632  -4.853 1.50e-05 ***
log(rincome)   0.2804     0.2386   1.175    0.246    
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 0.1879 on 45 degrees of freedom
Multiple R-Squared: 0.4294,	Adjusted R-squared: 0.4041 
Wald test: 13.28 on 2 and 45 DF,  p-value: 2.931e-05 

> summary(fm, vcov = sandwich, df = Inf, diagnostics = TRUE)

Call:
ivreg(formula = log(packs) ~ log(rprice) + log(rincome) | log(rincome) + 
    tdiff + I(tax/cpi), data = CigarettesSW, subset = year == 
    "1995")

Residuals:
       Min         1Q     Median         3Q        Max 
-0.6006931 -0.0862222 -0.0009999  0.1164699  0.3734227 

Coefficients:
             Estimate Std. Error z value Pr(>|z|)    
(Intercept)    9.8950     0.9288  10.654  < 2e-16 ***
log(rprice)   -1.2774     0.2417  -5.286 1.25e-07 ***
log(rincome)   0.2804     0.2458   1.141    0.254    

Diagnostic tests:
                 df1 df2 statistic p-value    
Weak instruments   2  44   228.738  <2e-16 ***
Wu-Hausman         1  44     3.823  0.0569 .  
Sargan             1  NA     0.333  0.5641    
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 0.1879 on Inf degrees of freedom
Multiple R-Squared: 0.4294,	Adjusted R-squared: 0.4041 
Wald test: 34.51 on 2 DF,  p-value: 3.214e-08 

> ## ANOVA
> fm2 <- ivreg(log(packs) ~ log(rprice) | tdiff, data = CigarettesSW, subset = year == "1995")
> anova(fm, fm2)
Analysis of Variance Table

Model 1: log(packs) ~ log(rprice) + log(rincome) | log(rincome) + tdiff + 
    I(tax/cpi)
Model 2: log(packs) ~ log(rprice) | tdiff
  Res.Df    RSS Df Sum of Sq      F Pr(>F)
1     45 1.5880                           
2     46 1.6668 -1 -0.078748 1.3815  0.246
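
The "Weak instruments" row in the diagnostics above has df1 = 2 and df2 = 44, which is what an F test of the two excluded instruments in the first-stage regression would give for 48 observations and four first-stage coefficients. A minimal sketch of such a first-stage F test in base R follows. It uses simulated data with illustrative names and is an assumption about the general technique, not a reproduction of the package's code.

```r
## Simulated data; names and coefficient values are illustrative.
set.seed(3)
n  <- 300
ex <- rnorm(n)                         # exogenous regressor
z1 <- rnorm(n)                         # excluded instruments
z2 <- rnorm(n)
en <- 0.5 * ex + z1 - z2 + rnorm(n)    # endogenous regressor

## First stage with and without the excluded instruments.
first_full  <- lm(en ~ ex + z1 + z2)
first_restr <- lm(en ~ ex)

## F test of the joint hypothesis that the coefficients on z1 and z2 are zero.
wi <- anova(first_restr, first_full)
wi$Df[2]       # numerator degrees of freedom: number of excluded instruments
wi$Res.Df[2]   # denominator degrees of freedom: n minus first-stage coefficients
wi$F[2]        # F statistic; a small value would suggest weak instruments
```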
