Observation weights; defaults to 1 per observation
offset
Offset vector (matrix) as in glmnet
lambda
Optional user-supplied lambda sequence; default is
NULL, and glmnet chooses its own sequence
nfolds
number of folds - default is 10. Although nfolds
can be as large as the sample size (leave-one-out CV), it is not
recommended for large datasets. Smallest value allowable is nfolds=3
foldid
an optional vector of values between 1 and nfolds
identifying what fold each observation is in. If supplied,
nfolds can be missing.
type.measure
loss to use for cross-validation. Currently five
options, not all available for all models. The default is type.measure="deviance", which uses
squared-error for gaussian models (a.k.a. type.measure="mse" there), deviance
for logistic and poisson
regression, and partial-likelihood for the Cox
model. type.measure="class" applies to binomial and multinomial logistic regression only,
and gives misclassification error. type.measure="auc" is for
two-class logistic regression only, and gives area under the ROC
curve. type.measure="mse" or type.measure="mae" (mean absolute error)
can be used by all models except the "cox"; they measure the
deviation from the fitted mean to the response.
grouped
This is an experimental argument, with default
TRUE, and can be ignored by most users. For all models
except the "cox", this refers to computing nfolds
separate statistics, and then using their mean and estimated
standard error to describe the CV curve. If grouped=FALSE,
an error matrix is built up at the observation level from the predictions
from the nfolds fits, and then summarized (does not apply to
type.measure="auc"). For the "cox" family,
grouped=TRUE obtains the CV partial likelihood for the Kth
fold by subtraction: the log partial likelihood evaluated on the
full dataset is subtracted from that evaluated on the (K-1)/K
dataset. This makes more efficient use of risk
sets. With grouped=FALSE the log partial likelihood is
computed only on the Kth fold.
keep
If keep=TRUE, a prevalidated array is
returned containing fitted values for each observation and each
value of lambda. This means these fits are computed with
this observation and the rest of its fold omitted. The
foldid vector is also returned. Default is keep=FALSE.
parallel
If TRUE, use parallel foreach to fit each fold.
A parallel backend must be registered beforehand, such as doMC or others.
See the example below.
...
Other arguments that can be passed to glmnet
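As an illustration of the foldid argument described above, a fold vector can be built in base R before calling cv.glmnet. This is a minimal sketch: the sizes n and nfolds and the seed are assumptions chosen for the example, and the balanced random assignment shown is one common way to build folds, not necessarily what cv.glmnet does internally.

```r
# Hypothetical sizes, chosen for illustration only.
set.seed(1)
n <- 100
nfolds <- 10

# Repeat the labels 1..nfolds up to length n, then shuffle them so that
# each observation gets a random fold and the folds are equally sized.
foldid <- sample(rep(seq_len(nfolds), length.out = n))

# Number of observations assigned to each fold.
table(foldid)
```

A vector built this way can be supplied as cv.glmnet(x, y, foldid = foldid), in which case nfolds can be left out.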
Details
The function runs glmnet nfolds+1 times; the
first to get the lambda sequence, and then the remainder to
compute the fit with each of the folds omitted. The error is
accumulated, and the average error and standard deviation over the
folds are computed.
Note that cv.glmnet does NOT search for
values for alpha. A specific value should be supplied, else
alpha=1 is assumed by default. If users would like to
cross-validate alpha as well, they should call cv.glmnet
with a pre-computed vector foldid, and then use this same fold vector
in separate calls to cv.glmnet with different values of
alpha. Note also that the results of cv.glmnet are
random, since the folds are selected at random. Users can reduce this
randomness by running cv.glmnet many times, and averaging the
error curves.
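The advice above on cross-validating alpha can be sketched as follows. The simulated data, the grid of alpha values and the rule of picking the alpha with the smallest minimum cvm are all assumptions made for illustration; other selection rules are equally reasonable.

```r
library(glmnet)

# Simulated data, for illustration only.
set.seed(1)
n <- 200
p <- 20
x <- matrix(rnorm(n * p), n, p)
y <- x[, 1] - 2 * x[, 2] + rnorm(n)

# One fold vector, reused for every alpha so the error curves are comparable.
foldid <- sample(rep(seq_len(10), length.out = n))

# Hypothetical grid of alpha values.
alphas <- c(0, 0.5, 1)
cv_fits <- lapply(alphas, function(a) {
  cv.glmnet(x, y, foldid = foldid, alpha = a)
})

# Smallest mean cross-validated error reached along each lambda path.
min_cvm <- sapply(cv_fits, function(fit) min(fit$cvm))

best <- which.min(min_cvm)
best_alpha <- alphas[best]
best_lambda <- cv_fits[[best]]$lambda.min
```

Because every call uses the same foldid, differences between the curves reflect the change in alpha rather than a different random split.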
Value
an object of class "cv.glmnet" is returned, which is a
list with the ingredients of the cross-validation fit.
lambda
the values of lambda used in the fits.
cvm
The mean cross-validated error - a vector of length
length(lambda).
cvsd
estimate of standard error of cvm.
cvup
upper curve = cvm+cvsd.
cvlo
lower curve = cvm-cvsd.
nzero
number of non-zero coefficients at each lambda.
name
a text string indicating type of measure (for plotting
purposes).
glmnet.fit
a fitted glmnet object for the full data.
lambda.min
value of lambda that gives minimum
cvm.
lambda.1se
largest value of lambda such that error is
within 1 standard error of the minimum.
fit.preval
if keep=TRUE, this is the array of
prevalidated fits. Some entries can be NA, if that and
subsequent values of lambda are not reached for that fold
foldid
if keep=TRUE, the fold assignments used
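The relationship between cvm, cvsd, cvup, cvlo, lambda.min and lambda.1se described above can be sketched in base R. The numbers are made up, and pick_lambda is a hypothetical helper, not part of glmnet; it only restates the rule given in the text.

```r
# Made-up CV curve, for illustration only.
lambda <- c(1.0, 0.5, 0.25, 0.125, 0.0625)
cvm    <- c(2.0, 1.1, 1.0, 1.05, 1.3)
cvsd   <- c(0.2, 0.2, 0.15, 0.15, 0.2)

# Upper and lower curves as defined above.
cvup <- cvm + cvsd
cvlo <- cvm - cvsd

# Hypothetical helper restating the selection rule from the text.
pick_lambda <- function(lambda, cvm, cvsd) {
  i_min <- which.min(cvm)
  lambda_min <- lambda[i_min]
  # Largest lambda whose error is within one standard error of the minimum.
  threshold <- cvm[i_min] + cvsd[i_min]
  lambda_1se <- max(lambda[cvm <= threshold])
  list(lambda.min = lambda_min, lambda.1se = lambda_1se)
}

res <- pick_lambda(lambda, cvm, cvsd)
res
```

With these numbers the minimum is at lambda = 0.25, and lambda = 0.5 is the largest value whose error stays below the one-standard-error threshold, so lambda.1se gives a more heavily regularized model than lambda.min.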
Author(s)
Jerome Friedman, Trevor Hastie and Rob Tibshirani
Noah Simon helped develop the 'coxnet' function.
Jeffrey Wong and B. Narasimhan helped with the parallel option
Maintainer: Trevor Hastie hastie@stanford.edu
References
Friedman, J., Hastie, T. and Tibshirani, R. (2008)
Regularization Paths for Generalized Linear Models via Coordinate
Descent, http://www.stanford.edu/~hastie/Papers/glmnet.pdf
Journal of Statistical Software, Vol. 33(1), 1-22, Feb 2010
http://www.jstatsoft.org/v33/i01/
Simon, N., Friedman, J., Hastie, T., Tibshirani, R. (2011)
Regularization Paths for Cox's Proportional Hazards Model via
Coordinate Descent, Journal of Statistical Software, Vol. 39(5),
1-13 http://www.jstatsoft.org/v39/i05/
See Also
glmnet and plot, predict, and coef methods for "cv.glmnet" objects.