Fit coefficients paths for MCP- or SCAD-penalized
regression models over a grid of values for the regularization
parameter lambda. Fits linear and logistic regression models, with
option for an additional L2 penalty.
The design matrix, without an intercept. ncvreg
standardizes the data and includes an intercept by default.
y
The response vector.
family
Either "gaussian", "binomial", or "poisson", depending
on the response.
penalty
The penalty to be applied to the model. Either "MCP"
(the default), "SCAD", or "lasso".
gamma
The tuning parameter of the MCP/SCAD penalty (see
details). Default is 3 for MCP and 3.7 for SCAD.
alpha
Tuning parameter for the Mnet estimator which controls
the relative contributions from the MCP/SCAD penalty and the ridge,
or L2 penalty. alpha=1 is equivalent to MCP/SCAD penalty,
while alpha=0 would be equivalent to ridge regression.
However, alpha=0 is not supported; alpha may be
arbitrarily small, but not exactly 0.
lambda.min
The smallest value for lambda, as a fraction of
lambda.max. Default is .001 if the number of observations is larger
than the number of covariates and .05 otherwise.
nlambda
The number of lambda values. Default is 100.
lambda
A user-specified sequence of lambda values. By default,
a sequence of values of length nlambda is computed, equally
spaced on the log scale.
eps
Convergence threshhold. The algorithm iterates until the
relative change in any coefficient is less than eps. Default
is .001.
max.iter
Maximum number of iterations. Default is 1000.
convex
Calculate index for which objective function ceases to
be locally convex? Default is TRUE.
dfmax
Upper bound for the number of nonzero coefficients.
Default is no upper bound. However, for large data sets,
computational burden may be heavy for models with a large number of
nonzero coefficients.
penalty.factor
A multiplicative factor for the penalty applied
to each coefficient. If supplied, penalty.factor must be a
numeric vector of length equal to the number of columns of
X. The purpose of penalty.factor is to apply
differential penalization if some coefficients are thought to be
more likely than others to be in the model. In particular,
penalty.factor can be 0, in which case the coefficient is
always in the model without shrinkage.
warn
Return warning messages for failures to converge and model
saturation? Default is TRUE.
returnX
Return the standardized design matrix? Default is
FALSE.
...
Not used.
Details
The sequence of models indexed by the regularization parameter
lambda is fit using a coordinate descent algorithm. For
logistic regression models, some care is taken to avoid model
saturation; the algorithm may exit early in this setting. The
objective function is defined to be
RSS/(2*n) +
penalty
for "gaussian" and
-(1/n) loglik + penalty
for "binomial" or "poisson", where the likelihood is
from a traditional generalized linear model assuming the canonical
link (logit for "binomial"; log for "poisson").
This algorithm is stable, very efficient, and generally converges
quite rapidly to the solution. For GLMs, adaptive rescaling (see
reference) is used.
The convexity diagnostics rely on a fine covering of
(lambda.min,lambda.max); choosing a low value of nlambda may
produce unreliable results.
Value
An object with S3 class "ncvreg" containing:
beta
The fitted matrix of coefficients. The number of rows is
equal to the number of coefficients, and the number of columns is
equal to nlambda.
iter
A vector of length nlambda containing the number
of iterations until convergence at each value of lambda.
lambda
The sequence of regularization parameter values in the
path.
penalty
Same as above.
family
Same as above.
gamma
Same as above.
alpha
Same as above.
convex.min
The last index for which the objective function is
locally convex. The smallest value of lambda for which the
objective function is convex is therefore lambda[convex.min],
with corresponding coefficients beta[,convex.min].
loss
A vector containing either the residual sum of squares
("gaussian") or negative log-likelihood ("binomial"
and "poisson") of the fitted model at each value of
lambda.
penalty.factor
Same as above.
Author(s)
Patrick Breheny <patrick-breheny@uiow.edu>
References
Breheny, P. and Huang, J. (2011) Coordinate descent
algorithms for nonconvex penalized regression, with applications to
biological feature selection. Ann. Appl. Statist., 5: 232-253.
See Also
plot.ncvreg, cv.ncvreg
Examples
## Linear regression
data(prostate)
X <- as.matrix(prostate[,1:8])
y <- prostate$lpsa
par(mfrow=c(2,2))
fit <- ncvreg(X,y)
plot(fit,main=expression(paste(gamma,"=",3)))
fit <- ncvreg(X,y,gamma=10)
plot(fit,main=expression(paste(gamma,"=",10)))
fit <- ncvreg(X,y,gamma=1.5)
plot(fit,main=expression(paste(gamma,"=",1.5)))
fit <- ncvreg(X,y,penalty="SCAD")
plot(fit,main=expression(paste("SCAD, ",gamma,"=",3)))
par(mfrow=c(2,2))
fit <- ncvreg(X,y)
plot(fit,main=expression(paste(alpha,"=",1)))
fit <- ncvreg(X,y,alpha=0.9)
plot(fit,main=expression(paste(alpha,"=",0.9)))
fit <- ncvreg(X,y,alpha=0.5)
plot(fit,main=expression(paste(alpha,"=",0.5)))
fit <- ncvreg(X,y,alpha=0.1)
plot(fit,main=expression(paste(alpha,"=",0.1)))
par(mfrow=c(2,2))
fit <- ncvreg(X,y)
plot(fir(fit)) ## Independence approximation
plot(fir(fit), type="EF") ## Independence approximation
perm.fit <- perm.ncvreg(X,y)
plot(perm.fit)
plot(perm.fit, type="EF")
## Logistic regression
data(heart)
X <- as.matrix(heart[,1:9])
y <- heart$chd
par(mfrow=c(2,2))
fit <- ncvreg(X,y,family="binomial")
plot(fit,main=expression(paste(gamma,"=",3)))
fit <- ncvreg(X,y,family="binomial",gamma=10)
plot(fit,main=expression(paste(gamma,"=",10)))
fit <- ncvreg(X,y,family="binomial",gamma=1.5)
plot(fit,main=expression(paste(gamma,"=",1.5)))
fit <- ncvreg(X,y,family="binomial",penalty="SCAD")
plot(fit,main=expression(paste("SCAD, ",gamma,"=",3)))
par(mfrow=c(2,2))
fit <- ncvreg(X,y,family="binomial")
plot(fit,main=expression(paste(alpha,"=",1)))
fit <- ncvreg(X,y,family="binomial",alpha=0.9)
plot(fit,main=expression(paste(alpha,"=",0.9)))
fit <- ncvreg(X,y,family="binomial",alpha=0.5)
plot(fit,main=expression(paste(alpha,"=",0.5)))
fit <- ncvreg(X,y,family="binomial",alpha=0.1)
plot(fit,main=expression(paste(alpha,"=",0.1)))