Performs k-fold cross-validation for MCP- or SCAD-penalized
regression models over a grid of values for the regularization
parameter lambda.
Usage
cv.ncvreg(X, y, ..., cluster, nfolds=10, seed, cv.ind, returnY=FALSE,
trace=FALSE)
Arguments
X
The design matrix, without an intercept, as in
ncvreg.
y
The response vector, as in ncvreg.
...
Additional arguments to ncvreg.
cluster
cv.ncvreg can be run in parallel across a
cluster using the parallel package. The cluster must be set
up in advance using the makeCluster function from that
package and then passed to cv.ncvreg (see
example).
nfolds
The number of cross-validation folds. Default is 10.
cv.ind
Which fold each observation belongs to. By default the
observations are randomly assigned by cv.ncvreg.
seed
You may set the seed of the random number generator in
order to obtain reproducible results.
returnY
Should cv.ncvreg return the fitted values from
the cross-validation folds? Default is FALSE; if TRUE, this will
return a matrix in which the element for row i, column j is the
fitted value for observation i from the fold in which observation i
was excluded from the fit, at the jth value of lambda.
trace
If set to TRUE, cv.ncvreg will inform the user of its
progress by announcing the beginning of each CV fold. Default is
FALSE.
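As a sketch of what a user-supplied cv.ind might look like, the R code below assigns n observations to nfolds folds at random, using a seed so that the assignment is reproducible. The helper make_folds is hypothetical and not part of ncvreg; it only assumes that cv.ind is a vector of fold labels 1, ..., nfolds with one entry per observation. It is not necessarily how cv.ncvreg assigns folds internally.

```r
# Hypothetical helper (not part of ncvreg): assign n observations to
# nfolds folds at random, with fold sizes differing by at most one.
make_folds <- function(n, nfolds = 10, seed = NULL) {
  if (!is.null(seed)) set.seed(seed)  # reproducible assignment
  # rep_len() cycles the labels 1..nfolds out to length n;
  # sample() then shuffles them across observations
  sample(rep_len(seq_len(nfolds), n))
}

cv.ind <- make_folds(n = 97, nfolds = 10, seed = 42)
table(cv.ind)  # seven folds of size 10 and three of size 9

# The same vector could then be supplied explicitly, e.g.
# cv.ncvreg(X, y, cv.ind = cv.ind)
```

Re-running make_folds with the same seed returns the same vector, which is the same effect the seed argument is meant to provide.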
Details
The function calls ncvreg nfolds times, each time
leaving out 1/nfolds of the data. The cross-validation
error is based on the residual sum of squares when
family="gaussian", the binomial deviance when
family="binomial", and the Poisson deviance when family="poisson".
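The base R sketch below illustrates that procedure: fit on all folds but one, predict the held-out fold at every lambda, and score the held-out predictions. Because it must run without ncvreg, it substitutes a closed-form ridge fit (a hypothetical stand-in named ridge_predict, not an MCP or SCAD fit). The per-observation losses in the hypothetical cv_loss are the standard squared-error and deviance definitions; treat them as an assumption about how the loss is computed, not as a copy of ncvreg's code.

```r
# Per-observation loss. yhat is a fitted mean: a probability in (0, 1) for
# "binomial", a positive rate for "poisson". Works elementwise on matrices.
cv_loss <- function(y, yhat, family = c("gaussian", "binomial", "poisson")) {
  family <- match.arg(family)
  switch(family,
    gaussian = (y - yhat)^2,                                     # squared error
    binomial = -2 * (y * log(yhat) + (1 - y) * log(1 - yhat)),   # binomial deviance
    # Poisson deviance; the y * log(y / yhat) term is taken as 0 when y == 0
    poisson  = 2 * (y * log(ifelse(y > 0, y, 1) / yhat) - (y - yhat))
  )
}

# Hypothetical stand-in for a penalized fit: ridge regression with an
# unpenalized intercept, solved in closed form at each lambda.
# Returns an nrow(X_test) x length(lambda) matrix of predictions.
ridge_predict <- function(X_train, y_train, X_test, lambda) {
  mu_x <- colMeans(X_train)
  mu_y <- mean(y_train)
  Xc <- sweep(X_train, 2, mu_x)            # center the training columns
  XtX <- crossprod(Xc)
  Xty <- crossprod(Xc, y_train - mu_y)
  Xt <- sweep(X_test, 2, mu_x)             # center test rows the same way
  n <- nrow(Xc); p <- ncol(Xc)
  preds <- vapply(lambda, function(l) {
    beta <- solve(XtX + n * l * diag(p), Xty)
    drop(mu_y + Xt %*% beta)
  }, numeric(nrow(X_test)))
  matrix(preds, nrow = nrow(X_test))
}

set.seed(1)
n <- 60; p <- 4
X <- matrix(rnorm(n * p), n, p)
y <- drop(X %*% c(1.5, -1, 0, 0)) + rnorm(n)
lambda <- c(1, 0.1, 0.01)
nfolds <- 10
cv.ind <- sample(rep_len(seq_len(nfolds), n))

# Fit nfolds times, each time leaving out the observations in fold k,
# and keep the held-out predictions (one row per observation).
Y <- matrix(NA_real_, n, length(lambda))
for (k in seq_len(nfolds)) {
  held_out <- cv.ind == k
  Y[held_out, ] <- ridge_predict(X[!held_out, , drop = FALSE], y[!held_out],
                                 X[held_out, , drop = FALSE], lambda)
}
E <- cv_loss(y, Y, "gaussian")  # n x length(lambda) matrix of losses
colMeans(E)                     # cross-validation error at each lambda
```

The matrix Y here plays the same role as the fitted values returned with returnY=TRUE: entry (i, j) comes from the fit that excluded observation i, at the jth lambda.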
For family="binomial" models, the cross-validation fold
assignments are balanced across the 0/1 outcomes, so that each fold
has the same proportion of 0/1 outcomes (or as close to the same
proportion as possible when the cases do not divide evenly).
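One way to obtain such balanced folds, sketched below in base R, is to shuffle the 0s and the 1s separately and then deal fold labels out cyclically, first over the 0s and then continuing over the 1s. The function balanced_folds is hypothetical; this illustrates stratified fold assignment in general and is not necessarily the exact scheme cv.ncvreg uses.

```r
# Hypothetical helper: stratified fold assignment for a 0/1 response.
balanced_folds <- function(y, nfolds = 10) {
  # Index-based shuffle; avoids sample(x) treating a single number x as 1:x
  shuffle <- function(v) v[sample.int(length(v))]
  # All 0s in random order, followed by all 1s in random order
  ord <- c(shuffle(which(y == 0)), shuffle(which(y == 1)))
  cv.ind <- integer(length(y))
  # Dealing labels 1..nfolds cyclically spreads each class evenly over folds
  cv.ind[ord] <- rep_len(seq_len(nfolds), length(ord))
  cv.ind
}

set.seed(2)
y <- rep(c(0, 1), times = c(70, 30))
cv.ind <- balanced_folds(y, nfolds = 10)
table(cv.ind, y)  # every fold gets 7 zeros and 3 ones
```

Continuing the cycle from the 0s into the 1s, rather than restarting it at fold 1, also keeps the total fold sizes within one of each other when the class counts are not multiples of nfolds.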
Value
An object with S3 class "cv.ncvreg" containing:
cve
The error for each value of lambda, averaged
across the cross-validation folds.
cvse
The estimated standard error associated with each value of
lambda for cve.
lambda
The sequence of regularization parameter values along
which the cross-validation error was calculated.
fit
The fitted ncvreg object for the whole data.
min
The index of lambda corresponding to
lambda.min.
lambda.min
The value of lambda with the minimum
cross-validation error.
null.dev
The deviance for the intercept-only model.
Bias
The estimated bias of the minimum cross-validation error,
as in Tibshirani RJ and Tibshirani R (2009),
"A Bias Correction for the Minimum Error Rate in Cross-Validation",
Ann. Appl. Stat. 3:822-829.
pe
If family="binomial", the cross-validation prediction
error for each value of lambda.
Y
If returnY=TRUE, the matrix of cross-validated fitted
values (see above).
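To show how these components relate, the base R sketch below computes cve, cvse, min, lambda.min and a bias estimate from an n x length(lambda) matrix E of per-observation losses and a fold vector cv.ind. For a gaussian fit, E could be formed from the returnY output as (y - cvfit$Y)^2. The helper summarize_cv is hypothetical. It assumes that cvse is the standard deviation of the per-observation losses divided by sqrt(n), and it computes Bias as in Tibshirani and Tibshirani (2009): the average over folds of each fold's error at lambda.min minus that fold's smallest error over the grid.

```r
# Hypothetical helper: summarize an n x L loss matrix E (rows = observations,
# columns = lambda values) given fold labels cv.ind.
summarize_cv <- function(E, cv.ind, lambda) {
  n <- nrow(E)
  cve  <- colMeans(E)                 # mean loss at each lambda
  cvse <- apply(E, 2, sd) / sqrt(n)   # assumed: standard error of that mean
  imin <- which.min(cve)              # index of lambda.min
  # Fold-level mean errors: an nfolds x L matrix, one row per fold
  e <- rowsum(E, cv.ind) / as.vector(table(cv.ind))
  # Tibshirani & Tibshirani (2009): average over folds of the fold's error
  # at lambda.min minus that fold's own minimum error
  bias <- mean(e[, imin] - apply(e, 1, min))
  list(cve = cve, cvse = cvse, lambda = lambda, min = imin,
       lambda.min = lambda[imin], Bias = bias)
}

# Toy example: 4 observations, 2 folds, 3 lambda values
E <- cbind(c(4, 2, 1, 1), c(1, 1, 2, 2), c(2, 2, 2, 2))
cv.ind <- c(1, 1, 2, 2)
res <- summarize_cv(E, cv.ind, lambda = c(0.3, 0.2, 0.1))
res$lambda.min  # 0.2
res$Bias        # 0.5
```

In the toy example the second lambda has the lowest average error (1.5), but fold 2 would have preferred the first lambda, so the minimum cross-validation error is optimistic by an estimated 0.5.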
Author(s)
Patrick Breheny <patrick-breheny@uiowa.edu>
Grant Brown helped with the parallelization support
References
Breheny, P. and Huang, J. (2011) Coordinate descent
algorithms for nonconvex penalized regression, with applications to
biological feature selection. Ann. Appl. Statist., 5: 232-253.
See Also
ncvreg, plot.cv.ncvreg, summary.cv.ncvreg
Examples
library(ncvreg)
data(prostate)
X <- as.matrix(prostate[,1:8])
y <- prostate$lpsa
cvfit <- cv.ncvreg(X, y)
plot(cvfit)
summary(cvfit)
fit <- cvfit$fit
plot(fit)
beta <- fit$beta[,cvfit$min]
## requires loading the parallel package
## Not run:
library(parallel)
cl <- makeCluster(4)
cvfit <- cv.ncvreg(X, y, cluster=cl, nfolds=length(y))
stopCluster(cl)
## End(Not run)