npregiv computes a nonparametric estimate of an instrumental
regression function phi defined by conditional moment
restrictions stemming from a structural econometric model,
E[Y - phi(Z,X) | W] = 0, involving endogenous variables Y and Z,
exogenous variables X, and instruments W. The function phi is the
solution of an ill-posed inverse problem.
When method="Tikhonov", npregiv uses the approach of
Darolles, Fan, Florens and Renault (2011) modified for local
polynomial kernel regression of any order (Darolles et al. use local
constant kernel weighting which corresponds to setting p=0; see
below for details). When method="Landweber-Fridman",
npregiv uses the approach of Horowitz (2011) again using local
polynomial kernel regression (Horowitz uses B-spline weighting).
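Written out, the moment restriction above is equivalent to a linear integral equation in phi. The operator T and the function r below are notation introduced here for illustration only; they are not part of npregiv's interface.

```latex
\begin{aligned}
E[\,Y - \varphi(Z,X) \mid W\,] = 0
  \;&\Longleftrightarrow\; E[\,\varphi(Z,X) \mid W\,] = E[\,Y \mid W\,] \\
  \;&\Longleftrightarrow\; T\varphi = r,
  \qquad (T\varphi)(w) = E[\,\varphi(Z,X) \mid W = w\,],
  \quad r(w) = E[\,Y \mid W = w\,].
\end{aligned}
```

Because T is a conditional expectation operator, its inverse is typically not continuous, so small errors in the estimate of r can translate into large errors in phi. This is the sense in which the problem is ill-posed and why a regularization method must be chosen.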
y
a one (1) dimensional numeric or integer vector of dependent data, each
element i corresponding to each observation (row) i of
z.
z
a p-variate data frame of endogenous regressors. The data
types may be continuous, discrete (unordered and ordered factors),
or some combination thereof.
w
a q-variate data frame of instruments. The data types may be
continuous, discrete (unordered and ordered factors), or some
combination thereof.
x
an r-variate data frame of exogenous regressors. The data
types may be continuous, discrete (unordered and ordered factors),
or some combination thereof.
zeval
a p-variate data frame of endogenous regressors on which the
regression will be estimated (evaluation data). By default, evaluation
takes place on the data provided by z.
weval
a q-variate data frame of instruments on which the regression
will be estimated (evaluation data). By default, evaluation
takes place on the data provided by w.
xeval
an r-variate data frame of exogenous regressors on which the
regression will be estimated (evaluation data). By default,
evaluation takes place on the data provided by x.
p
the order of the local polynomial regression (defaults to
p=1, i.e. local linear).
nmulti
integer number of times to restart the process of finding extrema of
the cross-validation function from different (random) initial
points.
random.seed
an integer used to seed R's random number generator. This ensures
replicability of the numerical search. Defaults to 42.
optim.method
method used by optim for minimization of
the objective function. See ?optim for references. Defaults
to "Nelder-Mead".
The default method is an implementation of that of Nelder and Mead
(1965), that uses only function values and is robust but relatively
slow. It will work reasonably well for non-differentiable
functions.
Method "BFGS" is a quasi-Newton method (also known as a
variable metric algorithm), specifically that published
simultaneously in 1970 by Broyden, Fletcher, Goldfarb and Shanno.
This uses function values and gradients to build up a picture of the
surface to be optimized.
Method "CG" is a conjugate gradients method based
on that by Fletcher and Reeves (1964) (but with the option of
Polak-Ribiere or Beale-Sorenson updates). Conjugate gradient
methods will generally be more fragile than the BFGS method, but as
they do not store a matrix they may be successful in much larger
optimization problems.
optim.maxattempts
maximum number of attempts taken trying to achieve successful
convergence in optim. Defaults to 100.
optim.abstol
the absolute convergence tolerance used by optim. Only useful
for non-negative functions, as a tolerance for reaching
zero. Defaults to .Machine$double.eps.
optim.reltol
relative convergence tolerance used by optim. The algorithm
stops if it is unable to reduce the value by a factor of 'reltol *
(abs(val) + reltol)' at a step. Defaults to
sqrt(.Machine$double.eps), typically about 1e-8.
optim.maxit
maximum number of iterations used by optim. Defaults
to 500.
alpha
a numeric scalar that, if supplied, is used rather than numerically
solving for alpha, when using method="Tikhonov".
alpha.min
minimum of search range for alpha, the Tikhonov
regularization parameter, when using method="Tikhonov".
alpha.max
maximum of search range for alpha, the Tikhonov
regularization parameter, when using method="Tikhonov".
alpha.tol
the search tolerance for optimize when solving for
alpha, the Tikhonov regularization parameter,
when using method="Tikhonov".
iterate.max
an integer indicating the maximum number of iterations permitted
before termination occurs when using method="Landweber-Fridman".
iterate.diff.tol
the search tolerance for the difference in the stopping rule from
iteration to iteration when using method="Landweber-Fridman"
(disable by setting to zero).
constant
the constant to use when using method="Landweber-Fridman".
method
the regularization method employed (defaults to
"Landweber-Fridman", see Horowitz (2011); see Darolles,
Fan, Florens and Renault (2011) for details for
"Tikhonov").
penalize.iteration
a logical value indicating whether to
penalize the norm by the number of iterations or not (default
TRUE).
smooth.residuals
a logical value (defaults to TRUE) indicating whether to
optimize bandwidths for the regression of y-phi(z)
on w (TRUE) or for the regression of phi(z) on
w (FALSE) during iteration.
start.from
a character string indicating whether to start from
E(Y|z) (default, "Eyz") or from E(E(Y|w)|z) (this can
be overridden by providing starting.values below).
starting.values
a value indicating whether to commence
Landweber-Fridman assuming
phi[-1]=starting.values (proper
Landweber-Fridman) or instead begin from E(y|z) (defaults to
NULL, see details below).
stop.on.increase
a logical value (defaults to TRUE) indicating whether to halt
iteration if the stopping criterion (see below) increases over the
course of one iteration (i.e. it may still be above the iteration
tolerance but has increased).
...
additional arguments supplied to npksum.
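To make the optimizer-related arguments above concrete, here is a minimal base-R sketch of how a method name, tolerances and a cap on attempts can be passed to optim. The objective rosenbrock is a hypothetical stand-in for a cross-validation function, and the retry loop is only an assumption about how a "maximum attempts" setting might be used; it is not taken from npregiv's source.

```r
## Toy objective standing in for a cross-validation function (hypothetical).
rosenbrock <- function(b) (1 - b[1])^2 + 100 * (b[2] - b[1]^2)^2

## Local variables named after the arguments documented above (illustrative).
optim.method      <- "Nelder-Mead"
optim.maxattempts <- 10
ctrl <- list(abstol = .Machine$double.eps,
             reltol = sqrt(.Machine$double.eps),
             maxit  = 500)

## Retry from fresh random starting values until optim reports
## convergence (code 0) or the attempt budget is exhausted.
set.seed(42)
attempt <- 0
repeat {
  attempt <- attempt + 1
  fit <- optim(par = runif(2, -2, 2), fn = rosenbrock,
               method = optim.method, control = ctrl)
  if (fit$convergence == 0 || attempt >= optim.maxattempts) break
}

## Convergence codes of the three methods from a common starting point.
codes <- sapply(c("Nelder-Mead", "BFGS", "CG"), function(m)
  optim(c(-1.2, 1), rosenbrock, method = m, control = ctrl)$convergence)
print(codes)
```

A code of 0 indicates successful convergence and 1 indicates that maxit was reached, so the printed vector gives a quick view of how the methods behave under identical tolerances.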
Details
Tikhonov regularization requires computation of weight matrices of
dimension n x n which can be computationally costly
in terms of memory requirements and may be unsuitable for large
datasets. Landweber-Fridman will be preferred in such settings as it
does not require construction and storage of these weight matrices
while it also avoids the need for numerical optimization methods to
determine alpha.
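As a rough guide to the memory cost, a single n x n matrix of doubles occupies 8 n^2 bytes: about 18 MB at n = 1500, but about 80 GB at n = 100000. The sketch below illustrates Tikhonov regularization on a small discretized problem, with alpha chosen by optimize over a search interval. The operator K, the data r and the generalized cross-validation criterion gcv are all assumptions made for illustration; npregiv's own weight matrices and its criterion for alpha may differ.

```r
set.seed(42)
n <- 100
s <- seq(0, 1, length.out = n)

## Hypothetical n x n smoothing operator: row-normalised Gaussian kernel weights.
K <- outer(s, s, function(a, b) dnorm(a - b, sd = 0.1))
K <- K / rowSums(K)

## Noisy right-hand side r = K phi + error for a known phi.
phi.true <- s^2
r <- as.vector(K %*% phi.true) + rnorm(n, sd = 0.01)

## Tikhonov solution (alpha I + K'K)^{-1} K'r for a given alpha.
tikhonov <- function(alpha)
  as.vector(solve(alpha * diag(n) + crossprod(K), crossprod(K, r)))

## Generalised cross-validation score (a swapped-in criterion,
## not necessarily the one used by npregiv).
gcv <- function(alpha) {
  H <- K %*% solve(alpha * diag(n) + crossprod(K), t(K))
  n * sum((r - H %*% r)^2) / (n - sum(diag(H)))^2
}

## Search for alpha on [alpha.min, alpha.max] to tolerance alpha.tol.
alpha.min <- 1e-8
alpha.max <- 1
alpha.tol <- 1e-8
alpha.hat <- optimize(gcv, interval = c(alpha.min, alpha.max),
                      tol = alpha.tol)$minimum
phi.hat <- tikhonov(alpha.hat)
```

Every evaluation of gcv forms and solves an n x n system, which is the cost that grows quickly with the sample size.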
method="Landweber-Fridman" uses an optimal stopping rule based
upon ||E(y|w)-E(phi(z,x)|w)||^2. However, if insufficient training is
conducted (that is, too few restarts of the bandwidth search), the
estimates can be overly noisy. To best guard against this
eventuality, set nmulti to a larger number than the default
nmulti=0 for npreg.
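The effect of restarting a numerical search from several random initial points can be sketched in base R as follows. The objective cv.toy and the helper multistart are hypothetical and only mimic the role played by nmulti and random.seed; they are not the search routine used by the package.

```r
## Hypothetical multimodal objective standing in for a cross-validation function.
cv.toy <- function(h) sin(5 * h) + 0.1 * (h - 2)^2

## Run optim from nmulti random starting points and keep the best result.
multistart <- function(fn, nmulti, lower, upper, seed = 42) {
  set.seed(seed)  # fixed seed makes the sequence of starts replicable
  best <- NULL
  for (i in seq_len(nmulti)) {
    fit <- optim(runif(1, lower, upper), fn, method = "BFGS")
    if (is.null(best) || fit$value < best$value) best <- fit
  }
  best
}

few  <- multistart(cv.toy, nmulti = 1,  lower = 0, upper = 4)
many <- multistart(cv.toy, nmulti = 10, lower = 0, upper = 4)
print(c(few = few$value, many = many$value))
```

Because both calls share the same seed, the first starting point is identical, so the search with more restarts can only match or improve on the objective value found with a single start.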
When using method="Landweber-Fridman", iteration will terminate
when either the change in the value of
||(E(y|w)-E(phi(z,x)|w))/E(y|w)||^2 from iteration to iteration is
less than iterate.diff.tol, or iterate.max is reached, or
||(E(y|w)-E(phi(z,x)|w))/E(y|w)||^2 stops falling in value and
starts rising.
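The three termination conditions can be illustrated with a small base-R sketch of a Landweber-type iteration on a discretized problem. The operator K, the data r, the starting value, the way the penalty multiplies the norm by the iteration count, and the convergence labels are all assumptions made for illustration; in npregiv the conditional expectations are estimated by kernel regression rather than supplied as a matrix.

```r
set.seed(42)
n <- 100
s <- seq(0, 1, length.out = n)

## Hypothetical smoothing operator and noisy right-hand side (kept away from 0).
K <- outer(s, s, function(a, b) dnorm(a - b, sd = 0.1))
K <- K / rowSums(K)
phi.true <- 1 + s^2
r <- as.vector(K %*% phi.true) + rnorm(n, sd = 0.01)

## Local variables named after the arguments documented above (illustrative).
constant           <- 0.5
iterate.max        <- 1000
iterate.diff.tol   <- 1e-8
penalize.iteration <- TRUE

## Normalised squared residual norm, optionally multiplied by the iteration count.
stop.norm <- function(phi, k) {
  nrm <- sum(((r - K %*% phi) / r)^2)
  if (penalize.iteration) k * nrm else nrm
}

phi <- r                          # assumed starting value
norm.stop <- stop.norm(phi, 1)
convergence <- "ITERATE_MAX"      # hypothetical labels
for (k in 2:iterate.max) {
  ## Landweber update: phi + constant * K'(r - K phi)
  phi.new  <- phi + constant * as.vector(crossprod(K, r - K %*% phi))
  norm.new <- stop.norm(phi.new, k)
  last     <- norm.stop[length(norm.stop)]
  if (norm.new > last) { convergence <- "STOP_ON_INCREASE"; break }
  phi <- phi.new
  norm.stop <- c(norm.stop, norm.new)
  if (abs(last - norm.new) < iterate.diff.tol) {
    convergence <- "ITERATE_DIFF_TOL"; break
  }
}
num.iterations <- length(norm.stop)
```

Without the penalty the residual norm of this toy iteration would typically keep shrinking, so the penalized version is what makes the "starts rising" condition bind and act as an early-stopping regularizer.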
Value
npregiv returns a list with components phi and either
alpha when method="Tikhonov" or num.iterations,
norm.stop and convergence when
method="Landweber-Fridman".
Note
This function should be considered to be in ‘beta test’ status until further notice.
References
Carrasco, M. and J.P. Florens and E. Renault (2007), “Linear
Inverse Problems in Structural Econometrics Estimation Based on
Spectral Decomposition and Regularization,” in J.J. Heckman and
E.E. Leamer (eds.), Handbook of Econometrics, Volume 6, Part 2,
Chapter 77, 5633-5751, Elsevier.
Darolles, S. and Y. Fan and J.P. Florens and E. Renault (2011),
“Nonparametric instrumental regression,” Econometrica, 79,
1541-1565.
Feve, F. and J.P. Florens (2010), “The practice of
non-parametric estimation by solving inverse problems: the example of
transformation models,” Econometrics Journal, 13, S1-S27.
Florens, J.P. and J.S. Racine (2012), “Nonparametric
instrumental derivatives,” Working Paper.
Fridman, V.M. (1956), “A method of successive approximations
for Fredholm integral equations of the first kind,” Uspekhi
Mat. Nauk, 11, 233-234, in Russian.
Horowitz, J.L. (2011), “Applied nonparametric instrumental
variables estimation,” Econometrica, 79, 347-394.
Landweber, L. (1951), “An iterative formula for Fredholm
integral equations of the first kind,” American Journal of
Mathematics, 73, 615-24.
Li, Q. and J.S. Racine (2007), Nonparametric Econometrics:
Theory and Practice, Princeton University Press.
Li, Q. and J.S. Racine (2004), “Cross-validated Local Linear
Nonparametric Regression,” Statistica Sinica, 14, 485-512.
See Also
npregivderiv, npreg
Examples
## Not run:
## This illustration was made possible by Samuele Centorrino
## <samuele.centorrino@univ-tlse1.fr>
set.seed(42)
n <- 1500
## The DGP is as follows:
## 1) y = phi(z) + u
## 2) E(u|z) != 0 (endogeneity present)
## 3) Suppose there exists an instrument w such that z = f(w) + v and
## E(u|w) = 0
## 4) We generate v, w, and generate u such that u and z are
## correlated. To achieve this we express u as a function of v (i.e. u =
## gamma v + eps)
v <- rnorm(n,mean=0,sd=0.27)
eps <- rnorm(n,mean=0,sd=0.05)
u <- -0.5*v + eps
w <- rnorm(n,mean=0,sd=1)
## In Darolles et al. (2011) there exist two DGPs. The first is
## phi(z)=z^2 and the second is phi(z)=exp(-abs(z)) (which is
## continuous but has a kink, i.e. is not differentiable, at zero).
fun1 <- function(z) { z^2 }
fun2 <- function(z) { exp(-abs(z)) }
z <- 0.2*w + v
## Generate two y vectors for each function.
y1 <- fun1(z) + u
y2 <- fun2(z) + u
## You set y to be either y1 or y2 (ditto for phi) depending on which
## DGP you are considering:
y <- y1
phi <- fun1
## Sort on z (for plotting)
ivdata <- data.frame(y,z,w)
ivdata <- ivdata[order(ivdata$z),]
rm(y,z,w)
attach(ivdata)
model.iv <- npregiv(y=y,z=z,w=w)
phi.iv <- model.iv$phi
## Now the non-iv local linear estimator of E(y|z)
ll.mean <- fitted(npreg(y~z,regtype="ll"))
## For the plots, restrict focal attention to the bulk of the data
## (i.e. for the plotting area trim out 1/4 of one percent from each
## tail of y and z)
trim <- 0.0025
curve(phi,min(z),max(z),
      xlim=quantile(z,c(trim,1-trim)),
      ylim=quantile(y,c(trim,1-trim)),
      ylab="Y",
      xlab="Z",
      main="Nonparametric Instrumental Kernel Regression",
      lwd=2,lty=1)
points(z,y,type="p",cex=.25,col="grey")
lines(z,phi.iv,col="blue",lwd=2,lty=2)
lines(z,ll.mean,col="red",lwd=2,lty=4)
legend(quantile(z,trim),quantile(y,1-trim),
       c(expression(paste(varphi(z))),
         expression(paste("Nonparametric ",hat(varphi)(z))),
         "Nonparametric E(y|z)"),
       lty=c(1,2,4),
       col=c("black","blue","red"),
       lwd=c(2,2,2))
## End(Not run)