R Graphical Manual

Browse All

Last data update: 2014.03.03

R: Fit an MCP- or SCAD-penalized regression path

ncvreg

R Documentation

Fit an MCP- or SCAD-penalized regression path

Description

Fit coefficients paths for MCP- or SCAD-penalized regression models over a grid of values for the regularization parameter lambda. Fits linear and logistic regression models, with option for an additional L2 penalty.

Usage

ncvreg(X, y, family=c("gaussian", "binomial", "poisson"),
penalty=c("MCP", "SCAD", "lasso"), gamma=switch(penalty, SCAD=3.7, 3),
alpha=1, lambda.min=ifelse(n>p,.001,.05), nlambda=100, lambda, eps=.001,
max.iter=1000, convex=TRUE, dfmax=p+1, penalty.factor=rep(1, ncol(X)),
warn=TRUE, returnX=FALSE, ...)

Arguments

`X`	The design matrix, without an intercept. `ncvreg` standardizes the data and includes an intercept by default.
`y`	The response vector.
`family`	Either "gaussian", "binomial", or "poisson", depending on the response.
`penalty`	The penalty to be applied to the model. Either "MCP" (the default), "SCAD", or "lasso".
`gamma`	The tuning parameter of the MCP/SCAD penalty (see details). Default is 3 for MCP and 3.7 for SCAD.
`alpha`	Tuning parameter for the Mnet estimator which controls the relative contributions from the MCP/SCAD penalty and the ridge, or L2 penalty. `alpha=1` is equivalent to MCP/SCAD penalty, while `alpha=0` would be equivalent to ridge regression. However, `alpha=0` is not supported; `alpha` may be arbitrarily small, but not exactly 0.
`lambda.min`	The smallest value for lambda, as a fraction of lambda.max. Default is .001 if the number of observations is larger than the number of covariates and .05 otherwise.
`nlambda`	The number of lambda values. Default is 100.
`lambda`	A user-specified sequence of lambda values. By default, a sequence of values of length `nlambda` is computed, equally spaced on the log scale.
`eps`	Convergence threshhold. The algorithm iterates until the relative change in any coefficient is less than `eps`. Default is `.001`.
`max.iter`	Maximum number of iterations. Default is 1000.
`convex`	Calculate index for which objective function ceases to be locally convex? Default is TRUE.
`dfmax`	Upper bound for the number of nonzero coefficients. Default is no upper bound. However, for large data sets, computational burden may be heavy for models with a large number of nonzero coefficients.
`penalty.factor`	A multiplicative factor for the penalty applied to each coefficient. If supplied, `penalty.factor` must be a numeric vector of length equal to the number of columns of `X`. The purpose of `penalty.factor` is to apply differential penalization if some coefficients are thought to be more likely than others to be in the model. In particular, `penalty.factor` can be 0, in which case the coefficient is always in the model without shrinkage.
`warn`	Return warning messages for failures to converge and model saturation? Default is TRUE.
`returnX`	Return the standardized design matrix? Default is FALSE.
`...`	Not used.

Details

The sequence of models indexed by the regularization parameter lambda is fit using a coordinate descent algorithm. For logistic regression models, some care is taken to avoid model saturation; the algorithm may exit early in this setting. The objective function is defined to be

RSS/(2*n) + penalty

for "gaussian" and

-(1/n) loglik + penalty

for "binomial" or "poisson", where the likelihood is from a traditional generalized linear model assuming the canonical link (logit for "binomial"; log for "poisson").

This algorithm is stable, very efficient, and generally converges quite rapidly to the solution. For GLMs, adaptive rescaling (see reference) is used.

The convexity diagnostics rely on a fine covering of (lambda.min,lambda.max); choosing a low value of nlambda may produce unreliable results.

Value

An object with S3 class "ncvreg" containing:

`beta`	The fitted matrix of coefficients. The number of rows is equal to the number of coefficients, and the number of columns is equal to `nlambda`.
`iter`	A vector of length `nlambda` containing the number of iterations until convergence at each value of `lambda`.
`lambda`	The sequence of regularization parameter values in the path.
`penalty`	Same as above.
`family`	Same as above.
`gamma`	Same as above.
`alpha`	Same as above.
`convex.min`	The last index for which the objective function is locally convex. The smallest value of lambda for which the objective function is convex is therefore `lambda[convex.min]`, with corresponding coefficients `beta[,convex.min]`.
`loss`	A vector containing either the residual sum of squares (`"gaussian"`) or negative log-likelihood (`"binomial"` and `"poisson"`) of the fitted model at each value of `lambda`.
`penalty.factor`	Same as above.

Author(s)

Patrick Breheny <patrick-breheny@uiow.edu>

References

Breheny, P. and Huang, J. (2011) Coordinate descent algorithms for nonconvex penalized regression, with applications to biological feature selection. Ann. Appl. Statist., 5: 232-253.

Examples

## Linear regression
data(prostate)
X <- as.matrix(prostate[,1:8])
y <- prostate$lpsa

par(mfrow=c(2,2))
fit <- ncvreg(X,y)
plot(fit,main=expression(paste(gamma,"=",3)))
fit <- ncvreg(X,y,gamma=10)
plot(fit,main=expression(paste(gamma,"=",10)))
fit <- ncvreg(X,y,gamma=1.5)
plot(fit,main=expression(paste(gamma,"=",1.5)))
fit <- ncvreg(X,y,penalty="SCAD")
plot(fit,main=expression(paste("SCAD, ",gamma,"=",3)))

par(mfrow=c(2,2))
fit <- ncvreg(X,y)
plot(fit,main=expression(paste(alpha,"=",1)))
fit <- ncvreg(X,y,alpha=0.9)
plot(fit,main=expression(paste(alpha,"=",0.9)))
fit <- ncvreg(X,y,alpha=0.5)
plot(fit,main=expression(paste(alpha,"=",0.5)))
fit <- ncvreg(X,y,alpha=0.1)
plot(fit,main=expression(paste(alpha,"=",0.1)))

par(mfrow=c(2,2))
fit <- ncvreg(X,y)
plot(fir(fit))            ## Independence approximation
plot(fir(fit), type="EF") ## Independence approximation
perm.fit <- perm.ncvreg(X,y)
plot(perm.fit)
plot(perm.fit, type="EF")

## Logistic regression
data(heart)
X <- as.matrix(heart[,1:9])
y <- heart$chd

par(mfrow=c(2,2))
fit <- ncvreg(X,y,family="binomial")
plot(fit,main=expression(paste(gamma,"=",3)))
fit <- ncvreg(X,y,family="binomial",gamma=10)
plot(fit,main=expression(paste(gamma,"=",10)))
fit <- ncvreg(X,y,family="binomial",gamma=1.5)
plot(fit,main=expression(paste(gamma,"=",1.5)))
fit <- ncvreg(X,y,family="binomial",penalty="SCAD")
plot(fit,main=expression(paste("SCAD, ",gamma,"=",3)))

par(mfrow=c(2,2))
fit <- ncvreg(X,y,family="binomial")
plot(fit,main=expression(paste(alpha,"=",1)))
fit <- ncvreg(X,y,family="binomial",alpha=0.9)
plot(fit,main=expression(paste(alpha,"=",0.9)))
fit <- ncvreg(X,y,family="binomial",alpha=0.5)
plot(fit,main=expression(paste(alpha,"=",0.5)))
fit <- ncvreg(X,y,family="binomial",alpha=0.1)
plot(fit,main=expression(paste(alpha,"=",0.1)))