Last data update: 2014.03.03

R: Cross-validation for biglasso
cv.biglasso    R Documentation

Cross-validation for biglasso

Description

Perform k-fold cross-validation for penalized regression models over a grid of values for the regularization parameter lambda.

Usage

cv.biglasso(X, y, row.idx = 1:nrow(X), ..., ncores = 1, 
            nfolds=10, seed, cv.ind, trace=FALSE)

Arguments

X

The design matrix, without an intercept, as in biglasso.

y

The response vector, as in biglasso.

row.idx

The integer vector of row indices of X that are used for fitting the model, as in biglasso.

...

Additional arguments to biglasso.

ncores

cv.biglasso can be run in parallel across a cluster using the parallel package. If ncores > 1, a cluster is created to run cv.biglasso in parallel; if ncores = 1 (the default), the code runs in series. An error occurs if ncores is larger than the total number of available cores. Because each core uses a roughly equal, large portion of memory, total memory consumption is approximately proportional to ncores. Choose ncores with care so that memory usage does not become prohibitive on large data sets.

nfolds

The number of cross-validation folds. Default is 10.

seed

The seed for the random number generator, set in order to obtain reproducible results.

cv.ind

Which fold each observation belongs to. By default the observations are randomly assigned by cv.biglasso.

trace

If set to TRUE, cv.biglasso will inform the user of its progress by announcing the beginning of each CV fold. Default is FALSE.
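
The cv.ind and ncores arguments described above can be prepared before the call. The following is a minimal sketch in base R: the sizes n and nfolds and the cap of 4 workers are hypothetical, and the balanced shuffling scheme is one reasonable way to assign folds, not necessarily the scheme cv.biglasso uses internally.

```r
# Hypothetical sizes, for illustration only.
n <- 97
nfolds <- 10

set.seed(1234)
# Balanced random assignment: recycle the labels 1..nfolds to length n,
# then shuffle, so fold sizes differ by at most one observation.
cv.ind <- sample(rep_len(seq_len(nfolds), n))

# Cap the requested number of workers at the cores the machine reports.
# detectCores() can return NA on some platforms, so fall back to 1.
available <- parallel::detectCores()
if (is.na(available)) available <- 1L
ncores <- min(4L, available)
```

These values could then be supplied as cv.biglasso(X, y, cv.ind = cv.ind, ncores = ncores).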

Details

The function calls biglasso nfolds times, each time leaving out 1/nfolds of the data. The cross-validation error is based on the residual sum of squares when family="gaussian" and the binomial deviance when family="binomial".

The resulting object of S3 class cv.biglasso inherits from class cv.ncvreg, so S3 methods such as summary and plot can be applied to it directly.
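
The leave-one-fold-out procedure described in this section can be sketched for the gaussian case as follows. This is an illustration under stated assumptions, not the package's implementation: lm() stands in for biglasso, a single model replaces the lambda grid, the data are simulated, and the way the error and its standard error are averaged here is an assumed convention.

```r
set.seed(1)
# Simulated data (hypothetical), one predictor plus noise.
n <- 60
x <- rnorm(n)
y <- 2 * x + rnorm(n)
dat <- data.frame(x = x, y = y)

nfolds <- 5
cv.ind <- sample(rep_len(seq_len(nfolds), n))

# Squared prediction error for each observation, filled in fold by fold.
sq.err <- numeric(n)
for (k in seq_len(nfolds)) {
  held.out <- cv.ind == k
  # Fit on the other nfolds - 1 folds; lm() is a stand-in for biglasso here.
  fit <- lm(y ~ x, data = dat[!held.out, ])
  pred <- predict(fit, newdata = dat[held.out, ])
  sq.err[held.out] <- (dat$y[held.out] - pred)^2
}

# Assumed convention: average the per-observation loss, and estimate its
# standard error from the spread of those losses.
cve <- mean(sq.err)
cvse <- sd(sq.err) / sqrt(n)
```

In cv.biglasso the same loop is repeated for every value of lambda, yielding one error and one standard error per value.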

Value

An object with S3 class "cv.biglasso", which inherits from class "cv.ncvreg". The object contains the following components (adopted from cv.ncvreg).

cve

The error for each value of lambda, averaged across the cross-validation folds.

cvse

The estimated standard error associated with each value of cve.

lambda

The sequence of regularization parameter values along which the cross-validation error was calculated.

fit

The fitted biglasso object for the whole data.

min

The index of lambda corresponding to lambda.min.

lambda.min

The value of lambda with the minimum cross-validation error.

null.dev

The deviance for the intercept-only model.

pe

If family="binomial", the cross-validation prediction error for each value of lambda.
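
The relationship among cve, lambda, min and lambda.min listed above can be illustrated with a small base R sketch. The numbers are hypothetical and are not output from a real fit; the names min.idx and lambda.min are local variables chosen for this illustration.

```r
# Hypothetical values, not output from a real fit.
lambda <- c(0.50, 0.25, 0.10, 0.05, 0.01)
cve    <- c(1.30, 0.90, 0.62, 0.58, 0.61)

# Index of the smallest cross-validation error: plays the role of `min`.
min.idx <- which.min(cve)
# Value of lambda at that index: plays the role of `lambda.min`.
lambda.min <- lambda[min.idx]
```

For a fitted object cvfit, the corresponding quantities are available as cvfit$min, cvfit$lambda.min and cvfit$cve[cvfit$min].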

Author(s)

Yaohui Zeng and Patrick Breheny

Maintainer: Yaohui Zeng <yaohui-zeng@uiowa.edu>

See Also

biglasso, plot.cv.biglasso, summary.cv.biglasso, setupX

Examples

## cv.biglasso
seed <- 1234
data(prostate)
X <- as.matrix(prostate[,1:8])
y <- prostate$lpsa
X <- as.big.matrix(X)
# run in series
cvfit <- cv.biglasso(X, y, family = 'gaussian', seed = seed)
par(mfrow = c(2, 2))
plot(cvfit, type = 'all')
summary(cvfit)

# run in parallel
## Not run: 
cvfit2 <- cv.biglasso(X, y, family = 'gaussian', seed = seed, ncores = 5)
plot(cvfit2)
summary(cvfit2)
stopifnot(identical(cvfit, cvfit2))

## End(Not run)
