R Graphical Manual


Last data update: 2014.03.03


cv.ncvreg

R Documentation

Cross-validation for ncvreg

Description

Performs k-fold cross-validation for MCP- or SCAD-penalized regression models over a grid of values for the regularization parameter lambda.
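The procedure described above can be sketched in base R. This is an illustration under stated assumptions, not ncvreg's code: closed-form ridge regression stands in for the MCP/SCAD fit, and `ridge_path` and `cv_grid_sketch` are hypothetical helpers. The matrix `Y` of held-out fitted values corresponds to what returnY=TRUE describes.

```r
# Hypothetical sketch of k-fold CV over a lambda grid (not ncvreg's code).
# ASSUMPTION: closed-form ridge regression stands in for the MCP/SCAD fit.
ridge_path <- function(X, y, lambda) {
  # Center so the intercept is not penalized
  xm <- colMeans(X)
  ym <- mean(y)
  Xc <- sweep(X, 2, xm)
  yc <- y - ym
  p  <- ncol(X)
  # One column of coefficients (intercept first) per lambda value
  sapply(lambda, function(l) {
    b <- solve(crossprod(Xc) + l * diag(p), crossprod(Xc, yc))
    c(ym - sum(xm * b), b)
  })
}

cv_grid_sketch <- function(X, y, lambda, nfolds = 10, seed = 1) {
  set.seed(seed)
  n <- length(y)
  # Random fold labels of near-equal size
  fold <- sample(rep(seq_len(nfolds), length.out = n))
  # Held-out fitted values: row i, column j = fit for obs i at lambda[j]
  Y <- matrix(NA_real_, n, length(lambda))
  for (k in seq_len(nfolds)) {
    test <- fold == k
    B <- ridge_path(X[!test, , drop = FALSE], y[!test], lambda)
    Y[test, ] <- cbind(1, X[test, , drop = FALSE]) %*% B
  }
  E <- (y - Y)^2  # squared-error loss, n x length(lambda)
  list(cve = colMeans(E), cvse = apply(E, 2, sd) / sqrt(n),
       lambda = lambda, fold = fold, Y = Y)
}

set.seed(42)
X <- matrix(rnorm(100 * 5), 100, 5)
y <- drop(X %*% c(2, -1, 0, 0, 0)) + rnorm(100)
cv <- cv_grid_sketch(X, y, lambda = c(100, 10, 1, 0.1), nfolds = 10)
cv$lambda[which.min(cv$cve)]
```

Each observation is predicted exactly once, by the fit that excluded its fold, which is why the error can be computed from a single n-by-L matrix.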

Usage

cv.ncvreg(X, y, ..., cluster, nfolds=10, seed, cv.ind, returnY=FALSE,
trace=FALSE)

Arguments

`X`	The design matrix, without an intercept, as in `ncvreg`.
`y`	The response vector, as in `ncvreg`.
`...`	Additional arguments to `ncvreg`.
`cluster`	`cv.ncvreg` can be run in parallel across a cluster using the `parallel` package. The cluster must be set up in advance using the `makeCluster` function from that package and then passed to `cv.ncvreg` (see Examples).
`nfolds`	The number of cross-validation folds. Default is 10.
`cv.ind`	Which fold each observation belongs to. By default the observations are randomly assigned by `cv.ncvreg`.
`seed`	You may set the seed of the random number generator in order to obtain reproducible results.
`returnY`	Should `cv.ncvreg` return the fitted values from the cross-validation folds? Default is FALSE; if TRUE, this will return a matrix in which the element for row i, column j is the fitted value for observation i from the fold in which observation i was excluded from the fit, at the jth value of lambda.
`trace`	If set to TRUE, `cv.ncvreg` informs the user of its progress by announcing the beginning of each CV fold. Default is FALSE.
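The `seed` and `cv.ind` arguments both concern how observations are split into folds. As a hedged illustration, the hypothetical helper `make_folds` below builds a vector of fold labels that could be reused, for example passed as `cv.ind`, so that several models are compared on identical splits. It illustrates random assignment in general and will not reproduce the folds `cv.ncvreg` itself draws for a given seed.

```r
# Hypothetical helper: random fold labels of near-equal size (a cv.ind-style vector).
# ASSUMPTION: not ncvreg's internal assignment scheme.
make_folds <- function(n, nfolds = 10, seed = NULL) {
  if (!is.null(seed)) set.seed(seed)
  sample(rep(seq_len(nfolds), length.out = n))
}

f1 <- make_folds(97, nfolds = 10, seed = 123)
f2 <- make_folds(97, nfolds = 10, seed = 123)
identical(f1, f2)  # same seed, same assignment
table(f1)          # fold sizes are 9 or 10
```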

Details

The function calls ncvreg nfolds times, each time leaving out 1/nfolds of the data. The cross-validation error is based on the residual sum of squares when family="gaussian", the binomial deviance when family="binomial", and the Poisson deviance when family="poisson".
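The per-observation losses behind these error measures can be written out as follows. This is a sketch: `cv_loss` is a hypothetical helper, and `mu` denotes the held-out fitted mean (a probability for binomial, a rate for Poisson). Averaging one column of such losses over all observations gives the cross-validation error at one value of lambda.

```r
# Hypothetical helper: per-observation loss for each family (not ncvreg's code).
cv_loss <- function(y, mu, family = c("gaussian", "binomial", "poisson")) {
  family <- match.arg(family)
  switch(family,
    # Squared residual
    gaussian = (y - mu)^2,
    # Binomial deviance for y in {0, 1}: -2 * log-likelihood of the observed outcome
    binomial = -2 * log(ifelse(y == 1, mu, 1 - mu)),
    # Poisson deviance; the y * log(y / mu) term is taken as 0 when y == 0
    poisson  = 2 * (ifelse(y > 0, y * log(y / mu), 0) - (y - mu))
  )
}

cv_loss(c(1, 3), c(0, 3), "gaussian")        # 1 0
cv_loss(c(0, 2), c(0.5, 2), "poisson")       # 1 0
cv_loss(c(1, 0), c(0.5, 0.5), "binomial")    # 1.386294 1.386294
```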

For family="binomial" models, the cross-validation fold assignments are balanced across the 0/1 outcomes, so that each fold has the same proportion of 0/1 outcomes (or as close to the same proportion as possible when the cases do not divide evenly).
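One way to obtain such balanced folds is to assign the 1s and the 0s separately. The sketch below uses a hypothetical helper, `stratified_folds`, and is not ncvreg's implementation: it deals labels from a single 1..nfolds cycle, giving the first labels to the 1s and the rest to the 0s, so that within each outcome class (and overall) fold counts differ by at most one.

```r
# Hypothetical helper: stratified fold assignment for a 0/1 outcome.
# ASSUMPTION: y contains only 0s and 1s; not ncvreg's own code.
stratified_folds <- function(y, nfolds = 10, seed = NULL) {
  if (!is.null(seed)) set.seed(seed)
  shuffle <- function(x) x[sample.int(length(x))]  # safe even when length(x) == 1
  ind1 <- which(y == 1)
  ind0 <- which(y == 0)
  # One cycle of fold labels long enough for every observation
  labels <- rep(seq_len(nfolds), length.out = length(y))
  fold <- integer(length(y))
  fold[ind1] <- shuffle(labels[seq_along(ind1)])
  fold[ind0] <- shuffle(labels[seq_along(ind0) + length(ind1)])
  fold
}

y <- rep(c(1, 0), c(33, 67))
f <- stratified_folds(y, nfolds = 10, seed = 1)
table(fold = f, outcome = y)  # 3 or 4 ones and 6 or 7 zeros per fold
```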

Value

An object with S3 class "cv.ncvreg" containing:

`cve`	The error for each value of `lambda`, averaged across the cross-validation folds.
`cvse`	The estimated standard error associated with each value of `cve`.
`lambda`	The sequence of regularization parameter values along which the cross-validation error was calculated.
`fit`	The fitted `ncvreg` object for the whole data.
`min`	The index of `lambda` corresponding to `lambda.min`.
`lambda.min`	The value of `lambda` with the minimum cross-validation error.
`null.dev`	The deviance for the intercept-only model.
`Bias`	The estimated bias of the minimum cross-validation error, as in Tibshirani RJ and Tibshirani R (2009), "A Bias Correction for the Minimum Error Rate in Cross-Validation", Ann. Appl. Stat. 3:822-829.
`pe`	If `family="binomial"`, the cross-validation prediction error for each value of `lambda`.
`Y`	If `returnY=TRUE`, the matrix of cross-validated fitted values (see above).
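To make the relationships among these components concrete, the sketch below derives `cve`, `cvse`, `min`, `lambda.min` and `Bias` from an n-by-L matrix `E` of held-out losses (for example, losses computed from the matrix returned with returnY=TRUE) and a fold vector. `summarize_cv` is hypothetical; the sd/sqrt(n) standard error and the per-fold form of the Tibshirani and Tibshirani (2009) bias estimate are assumptions about details that may differ from ncvreg's code.

```r
# Hypothetical helper: summaries of a CV loss matrix (not ncvreg's code).
# E: n x L held-out losses; fold: fold label per row; lambda: length-L grid.
summarize_cv <- function(E, fold, lambda) {
  n     <- nrow(E)
  cve   <- colMeans(E)
  cvse  <- apply(E, 2, sd) / sqrt(n)   # ASSUMPTION: sd across observations
  i_min <- which.min(cve)
  # Per-fold mean error curves (nfolds x L)
  ek <- rowsum(E, fold) / as.vector(table(fold))
  # Tibshirani & Tibshirani (2009): average gap between each fold's error at the
  # overall minimizer and that fold's own minimum error
  Bias <- mean(ek[, i_min] - apply(ek, 1, min))
  list(cve = cve, cvse = cvse, min = i_min,
       lambda.min = lambda[i_min], Bias = Bias)
}

E <- matrix(c(4, 2, 3,
              2, 2, 3,
              1, 2, 3,
              2, 2, 3), nrow = 4, byrow = TRUE)
s <- summarize_cv(E, fold = c(1, 1, 2, 2), lambda = c(1, 0.5, 0.25))
s$cve         # 2.25 2.00 3.00
s$lambda.min  # 0.5
s$Bias        # 0.25
```

Here fold 2 alone would have preferred the first lambda (mean error 1.5 versus 2), so the minimum of the averaged curve is slightly optimistic, and the bias estimate reflects that gap.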

Author(s)

Patrick Breheny <patrick-breheny@uiowa.edu>
Grant Brown helped with the parallelization support

References

Breheny, P. and Huang, J. (2011) Coordinate descent algorithms for nonconvex penalized regression, with applications to biological feature selection. Ann. Appl. Stat., 5: 232-253.

Examples

library(ncvreg)
data(prostate)
X <- as.matrix(prostate[,1:8])
y <- prostate$lpsa

cvfit <- cv.ncvreg(X, y)
plot(cvfit)
summary(cvfit)

fit <- cvfit$fit
plot(fit)
beta <- fit$beta[,cvfit$min]

## requires loading the parallel package
## Not run: 
library(parallel)
cl <- makeCluster(4)
## nfolds=length(y) gives leave-one-out cross-validation
cvfit <- cv.ncvreg(X, y, cluster=cl, nfolds=length(y))
stopCluster(cl)
## End(Not run)
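The `cluster` argument relies on the `parallel` package. As a hedged illustration of the general pattern (not of how `cv.ncvreg` distributes its work internally), the sketch below sends one fold per task to a two-worker cluster with `parLapply`, uses `lm()` as a stand-in for the penalized fit, and stops the cluster afterwards.

```r
library(parallel)

# Simulated data; lm() stands in for the penalized fit (assumption)
set.seed(7)
n <- 60
dat <- data.frame(x1 = rnorm(n), x2 = rnorm(n))
dat$y <- 1 + 2 * dat$x1 + rnorm(n)
fold <- sample(rep(1:4, length.out = n))

cl <- makeCluster(2)
# Each task fits on the other folds and returns squared errors on fold k.
# dat and fold are passed explicitly because the workers do not share the
# master session's global environment.
sq_err <- parLapply(cl, 1:4, function(k, dat, fold) {
  fit <- lm(y ~ x1 + x2, data = dat[fold != k, ])
  (dat$y[fold == k] - predict(fit, newdata = dat[fold == k, ]))^2
}, dat = dat, fold = fold)
stopCluster(cl)

cve <- mean(unlist(sq_err))  # cross-validated mean squared error
```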