Last data update: 2014.03.03

cv.ncvreg    R Documentation

Cross-validation for ncvreg

Description

Performs k-fold cross-validation for MCP- or SCAD-penalized regression models over a grid of values for the regularization parameter lambda.
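The general procedure, fitting on all folds but one and scoring the held-out fold at every lambda, can be sketched in base R. This is a minimal sketch, not cv.ncvreg's implementation. The names cv_grid and ridge_fit_predict are hypothetical, and closed-form ridge regression stands in for the MCP/SCAD fit so that the example runs without ncvreg.

```r
# Generic k-fold cross-validation over a lambda grid (sketch).
# ASSUMPTION: fit_predict is a hypothetical user-supplied function
# (X_train, y_train, X_test, lambda) -> n_test x length(lambda) predictions.
# Ridge regression stands in for MCP/SCAD; it is NOT what ncvreg fits.
ridge_fit_predict <- function(X_train, y_train, X_test, lambda) {
  mx <- colMeans(X_train)              # center so the intercept is unpenalized
  my <- mean(y_train)
  Xc <- sweep(X_train, 2, mx)
  yc <- y_train - my
  Xt <- sweep(X_test, 2, mx)
  p <- ncol(X_train)
  sapply(lambda, function(l) {
    b <- solve(crossprod(Xc) + l * diag(p), crossprod(Xc, yc))
    my + drop(Xt %*% b)
  })
}

cv_grid <- function(X, y, lambda, fit_predict, nfolds = 10, seed = NULL) {
  if (!is.null(seed)) set.seed(seed)
  n <- length(y)
  fold <- ceiling(sample(1:n) / n * nfolds)   # random fold for each observation
  Y <- matrix(NA_real_, n, length(lambda))    # held-out fitted values
  for (k in 1:nfolds) {
    test <- which(fold == k)
    Y[test, ] <- fit_predict(X[-test, , drop = FALSE], y[-test],
                             X[test, , drop = FALSE], lambda)
  }
  E <- (y - Y)^2                              # squared-error loss per observation
  list(cve = colMeans(E), lambda = lambda, Y = Y, fold = fold)
}

# Usage on simulated data
set.seed(1)
X <- matrix(rnorm(100 * 5), 100, 5)
y <- drop(X %*% c(2, -1, 0, 0, 0)) + rnorm(100)
res <- cv_grid(X, y, lambda = c(0.1, 1, 10, 1000),
               fit_predict = ridge_fit_predict, nfolds = 5, seed = 2)
res$cve
```

Passing the fitter as an argument keeps the fold loop independent of the penalty, which is why any model could be plugged in.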

Usage

cv.ncvreg(X, y, ..., cluster, nfolds=10, seed, cv.ind, returnY=FALSE,
trace=FALSE) 

Arguments

X

The design matrix, without an intercept, as in ncvreg.

y

The response vector, as in ncvreg.

...

Additional arguments to ncvreg.

cluster

cv.ncvreg can be run in parallel across a cluster using the parallel package. The cluster must be set up in advance using the makeCluster function from that package and then passed to cv.ncvreg (see the examples).

nfolds

The number of cross-validation folds. Default is 10.

cv.ind

Which fold each observation belongs to. By default the observations are randomly assigned by cv.ncvreg.

seed

You may set the seed of the random number generator in order to obtain reproducible results.

returnY

Should cv.ncvreg return the fitted values from the cross-validation folds? Default is FALSE; if TRUE, this will return a matrix in which the element for row i, column j is the fitted value for observation i from the fold in which observation i was excluded from the fit, at the jth value of lambda.

trace

If set to TRUE, cv.ncvreg will inform the user of its progress by announcing the beginning of each CV fold. Default is FALSE.
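The cv.ind and seed arguments both control how observations are split into folds. The following is a minimal sketch of building a fold vector yourself. It assumes that folds are labelled 1 to nfolds, and that a random permutation cut into nfolds nearly equal blocks resembles the default assignment; that resemblance is an assumption, not documented behaviour. The name make_folds is hypothetical.

```r
# Build a fold-assignment vector that could be passed as cv.ind (sketch).
# ASSUMPTION: folds are labelled 1..nfolds; this is not guaranteed to match
# the default assignment used inside cv.ncvreg.
make_folds <- function(n, nfolds = 10, seed = NULL) {
  if (!is.null(seed)) set.seed(seed)   # fix the RNG for reproducible folds
  ceiling(sample(1:n) / n * nfolds)    # random permutation cut into nfolds blocks
}

fold <- make_folds(97, nfolds = 10, seed = 1)
table(fold)   # every fold holds 9 or 10 of the 97 observations
```

A vector like this could then be supplied as cv.ncvreg(X, y, cv.ind=fold), so that competing models are compared on identical folds.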

Details

The function calls ncvreg nfolds times, each time leaving out 1/nfolds of the data. The cross-validation error is based on the residual sum of squares when family="gaussian", the binomial deviance when family="binomial", and the Poisson deviance when family="poisson".
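The per-observation losses behind these three error measures can be written out directly. The following sketch uses the standard definitions. Whether cv.ncvreg evaluates exactly these expressions, for example how it guards against predicted probabilities of exactly 0 or 1, is an assumption. The function names are hypothetical.

```r
# Per-observation losses for scoring held-out predictions (sketch).
# mu is the predicted mean: fitted value (gaussian), probability of y = 1
# (binomial), or expected count (poisson).
loss_gaussian <- function(y, mu) (y - mu)^2   # squared error

loss_binomial <- function(y, mu) {
  -2 * (y * log(mu) + (1 - y) * log(1 - mu))  # binomial deviance, y in {0, 1}
}

loss_poisson <- function(y, mu) {
  # Poisson deviance; the y * log(y / mu) term is taken as 0 when y == 0
  2 * (ifelse(y == 0, 0, y * log(y / mu)) - (y - mu))
}

loss_gaussian(3, 1)     # 4
loss_binomial(1, 0.5)   # 2 * log(2)
loss_poisson(0, 2)      # 4
```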

For family="binomial" models, the cross-validation fold assignments are balanced across the 0/1 outcomes, so that each fold has the same proportion of 0/1 outcomes (or as close to the same proportion as it is possible to achieve if cases do not divide evenly).
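Balanced (stratified) assignment of this kind can be illustrated by permuting the observations within each outcome class separately and cutting each permutation into nfolds blocks. This is an illustration of the idea, not necessarily cv.ncvreg's exact code. The name make_balanced_folds is hypothetical.

```r
# Fold assignment balanced across 0/1 outcomes (sketch).
# Each outcome class is permuted and split into nfolds blocks on its own,
# so every fold receives (nearly) the same share of 1s.
make_balanced_folds <- function(y, nfolds = 10) {
  fold <- numeric(length(y))
  for (cls in c(0, 1)) {
    idx <- which(y == cls)
    m <- length(idx)
    fold[idx] <- ceiling(sample(seq_len(m)) / m * nfolds)
  }
  fold
}

set.seed(3)
y <- c(rep(1, 30), rep(0, 70))
f <- make_balanced_folds(y, nfolds = 10)
tapply(y, f, sum)   # each fold contains 3 of the 30 ones
```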

Value

An object with S3 class "cv.ncvreg" containing:

cve

The error for each value of lambda, averaged across the cross-validation folds.

cvse

The estimated standard error associated with each value of cve.

lambda

The sequence of regularization parameter values along which the cross-validation error was calculated.

fit

The ncvreg object fitted to the whole data set.

min

The index of lambda corresponding to lambda.min.

lambda.min

The value of lambda with the minimum cross-validation error.

null.dev

The deviance for the intercept-only model.

Bias

The estimated bias of the minimum cross-validation error, as in Tibshirani RJ and Tibshirani R (2009), "A Bias Correction for the Minimum Error Rate in Cross-Validation", Ann. Appl. Stat. 3:822-829.

pe

If family="binomial", the cross-validation prediction error for each value of lambda.

Y

If returnY=TRUE, the matrix of cross-validated fitted values (see above).
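Several of these components can be derived from an n x nlambda matrix E of held-out losses. For family="gaussian", for example, E could be formed from the returnY matrix as (y - cvfit$Y)^2. The following sketch assumes that cve averages over observations (equivalent to averaging over folds when the folds are of equal size) and that cvse is the standard deviation of the losses divided by sqrt(n). The Bias line follows the Tibshirani and Tibshirani (2009) estimate cited above. The function cv_summary is hypothetical.

```r
# Summaries from a per-observation loss matrix (sketch).
# ASSUMPTIONS: E is n x nlambda (held-out loss of observation i at lambda j),
# fold gives each observation's fold, and errors are averaged over observations.
cv_summary <- function(E, fold, lambda) {
  n <- nrow(E)
  cve  <- colMeans(E)                  # mean CV error per lambda
  cvse <- apply(E, 2, sd) / sqrt(n)    # its standard error
  imin <- which.min(cve)               # index of lambda.min
  # e[j, k]: mean error of fold k at lambda j (nlambda x nfolds)
  e <- sapply(sort(unique(fold)),
              function(k) colMeans(E[fold == k, , drop = FALSE]))
  # Bias: how much worse lambda.min does in each fold than that fold's
  # own best lambda, averaged over folds
  Bias <- mean(e[imin, ] - apply(e, 2, min))
  list(cve = cve, cvse = cvse, min = imin,
       lambda.min = lambda[imin], Bias = Bias)
}

E <- rbind(c(4, 1, 2),
           c(2, 1, 3),
           c(3, 2, 1),
           c(1, 2, 2))
s <- cv_summary(E, fold = c(1, 1, 2, 2), lambda = c(1, 0.5, 0.1))
s$lambda.min   # 0.5
s$Bias         # 0.25
```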

Author(s)

Patrick Breheny <patrick-breheny@uiowa.edu>
Grant Brown helped with the parallelization support

References

Breheny, P. and Huang, J. (2011) Coordinate descent algorithms for nonconvex penalized regression, with applications to biological feature selection. Ann. Appl. Stat., 5: 232-253.

See Also

ncvreg, plot.cv.ncvreg, summary.cv.ncvreg

Examples

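library(ncvreg)
data(prostate)
X <- as.matrix(prostate[,1:8])
y <- prostate$lpsa

## 10-fold cross-validation (MCP penalty by default)
cvfit <- cv.ncvreg(X, y)
plot(cvfit)
summary(cvfit)

## Coefficients of the full-data fit at lambda.min
fit <- cvfit$fit
plot(fit)
beta <- fit$beta[,cvfit$min]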

## requires loading the parallel package
## Not run: 
library(parallel)
cl <- makeCluster(4)
## nfolds=length(y) gives leave-one-out cross-validation
cvfit <- cv.ncvreg(X, y, cluster=cl, nfolds=length(y))
stopCluster(cl)
## End(Not run)
