R: Extends the relaxnet Package with Polynomial Basis Expansions
widenet
R Documentation
Extends the relaxnet Package with Polynomial Basis Expansions
Description
Expands the basis according to the order argument, then runs relaxnet in order to select a subset of the basis functions. Multiple values of order and alpha (the elastic net tuning parameter) may be specified, leading to selection of a specific value by cross-validation.
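To make the idea of a polynomial basis expansion concrete, the following base R sketch builds an order-2 basis from a small matrix: the original columns, their squares, and all pairwise products. The helper name expand_order2 and the column naming scheme are hypothetical, and this is not the package's internal code; in particular it ignores any special handling of binary columns.

```r
# Hypothetical helper (not part of widenet): expand a numeric matrix to an
# order-2 polynomial basis of main terms, squares, and pairwise products.
expand_order2 <- function(x) {
  stopifnot(is.matrix(x), ncol(x) >= 2, !is.null(colnames(x)))
  p <- ncol(x)
  nms <- colnames(x)

  # Squared terms, one per original column.
  squares <- x^2
  colnames(squares) <- paste0(nms, "^2")

  # Pairwise products for every unordered pair of distinct columns.
  pairs <- combn(p, 2)
  products <- x[, pairs[1, ], drop = FALSE] * x[, pairs[2, ], drop = FALSE]
  colnames(products) <- paste(nms[pairs[1, ]], nms[pairs[2, ]], sep = ":")

  cbind(x, squares, products)
}

set.seed(1)
x <- matrix(rnorm(20), nrow = 5, ncol = 4)
colnames(x) <- paste0("x", 1:4)
expanded <- expand_order2(x)
dim(expanded)  # 5 rows; 4 main terms + 4 squares + 6 products = 14 columns
```

The number of columns grows quickly with the number of inputs and with the order, which is why a selection step such as relaxnet is applied to the expanded basis.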
Arguments
x
Input matrix; each row is an observation vector. Sparse matrices are not yet supported by the widenet function. Columns must have unique colnames.
y
Response variable. Quantitative for family="gaussian". For family="binomial", y should be either a factor with two levels, or a two-column matrix of counts or proportions.
family
Response type (see above).
order
The order of basis expansion. Elements must be in the set c(1, 2, 3). If there is more than one element, cross-validation is used to choose the order with the best cross-validated performance.
alpha
The elastic net mixing parameter, see glmnet. If there is more than one element, cross-validation is used to choose the value with the best cross-validated performance.
nfolds
Number of folds - default is 10. Although nfolds can be as large as the sample size (leave-one-out CV), it is not recommended for large datasets. Smallest value allowable is nfolds=3.
foldid
An optional vector of values between 1 and nfolds identifying which fold each observation is in. If supplied, nfolds can be missing.
screen.method
The method used to screen variables before basis expansion is applied. The default is no screening. "cor" screens by bivariate correlation with the outcome. "ttest" screens by t-test and is meant for binary outcomes (family = "binomial"). The screening methods are adapted from the SuperLearner package, the author of which is Eric Polley.
screen.num.vars
The number of variables (columns of x) to screen in when using screening.
multicore
Should execution be parallelized over cv folds (for cv.relaxnet) or over alpha values (for cv.alpha.relaxnet) using multicore functionality from R's parallel package?
mc.cores
Number of cores/cpus to be used for multicore processing. Parallelization is over cross-validation folds.
mc.seed
Integer value with which to seed the RNG when using parallel processing (internally, RNGkind will be called to set the RNG to "L'Ecuyer-CMRG"). Will be ignored if multicore is FALSE. If multicore is FALSE, one should be able to get reproducible results by setting the seed normally (with set.seed) prior to running.
...
Further arguments passed to relaxnet or cv.relaxnet, which should also be passed on to glmnet. Use with caution as this has not been tested.
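As an illustration of what correlation screening with screen.method = "cor" and screen.num.vars amounts to, here is a minimal base R sketch. The helper screen_by_cor is hypothetical and is not the code used by the package or by SuperLearner; it simply keeps the columns with the largest absolute correlation with the outcome.

```r
# Hypothetical helper (not the package's internal code): keep the num_vars
# columns of x with the largest absolute correlation with y.
screen_by_cor <- function(x, y, num_vars) {
  stopifnot(is.matrix(x), length(y) == nrow(x), num_vars >= 1)
  abs_cor <- abs(drop(cor(x, y)))
  # Rank columns from strongest to weakest marginal association.
  keep <- order(abs_cor, decreasing = TRUE)[seq_len(min(num_vars, ncol(x)))]
  sort(keep)  # column indices, similar in spirit to screened.in.index
}

set.seed(23)
n <- 200
x <- matrix(rnorm(n * 10), n, 10)
colnames(x) <- paste0("x", 1:10)
y <- 3 * x[, 2] - 3 * x[, 7] + rnorm(n)

screened <- screen_by_cor(x, y, num_vars = 2)
x_screened <- x[, screened, drop = FALSE]  # matrix passed on to expansion
```

Screening on marginal correlation can discard variables that matter only through interactions or squared terms, so screen.num.vars is best chosen generously.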
Details
The type.measure argument has not yet been implemented. For family = "gaussian" models, mean squared error is used, and for family = "binomial" models, binomial deviance is used.
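The two risk measures can be written out in a few lines of base R. This is a sketch only: the function names are hypothetical, and the scaling of the deviance (minus twice the mean log-likelihood, with probabilities clipped away from 0 and 1) is an assumption modeled on common practice rather than taken from the package source.

```r
# Mean squared error for a quantitative response.
mse <- function(y, pred) mean((y - pred)^2)

# Binomial deviance for a 0/1 response and predicted probabilities,
# averaged over observations. Probabilities are clipped away from 0 and 1
# to keep the logarithms finite.
binomial_deviance <- function(y, prob, eps = 1e-5) {
  prob <- pmin(pmax(prob, eps), 1 - eps)
  -2 * mean(y * log(prob) + (1 - y) * log(1 - prob))
}

mse_value <- mse(c(1, 2, 3), c(1, 2, 5))                  # 4/3
dev_value <- binomial_deviance(c(0, 1), c(0.5, 0.5))      # 2 * log(2)
```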
Value
Returns an object of class "widenet" with the following elements:
call
A copy of the call which generated this object
order
The value of the order argument
alpha
The value of the alpha argument
screen.method
The value of the screen.method argument
screened.in.index
A vector which indexes the columns of x, indicating those variables which were screened in for the run on the full data
colsBinary
A vector of length ncol(x) representing which of the columns of x contained binary data. These columns will be represented by a 2. The other columns will have a 3.
cv.relaxnet.results
A list of lists containing "cv.relaxnet" objects, one for each combination of values of alpha and order.
min.cvm.mat
A matrix containing the minimum cross-validated risk for each combination of values of alpha and order
which.order.min
The order which "won" the cross-validation, i.e. resulted in minimum cross-validated risk.
which.alpha.min
The alpha value which "won" the cross-validation.
total.time
Total time in seconds to produce this result.
Note
This is a preliminary release and several additional features are planned for later versions.
Author(s)
Stephan Ritter, with design contributions from Alan Hubbard.
Much of the code (and some help file content) is adapted from the glmnet package, whose authors are Jerome Friedman, Trevor Hastie and Rob Tibshirani.
References
Stephan Ritter and Alan Hubbard, Tech report (forthcoming).
See Also
predict.widenet, relaxnet, cv.relaxnet
Examples
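# Simulate data with main effects, an interaction, and a squared term
n <- 300
p <- 5
set.seed(23)
x <- matrix(rnorm(n * p), n, p)
colnames(x) <- paste("x", 1:ncol(x), sep = "")
y <- x[, 1] + x[, 2] + x[, 3] * x[, 4] + x[, 5]^2 + rnorm(n)
# Fit with a second-order basis expansion and a single alpha value
widenet.result <- widenet(x, y, family = "gaussian",
                          order = 2, alpha = 0.5)
summary(widenet.result)
# Extract the coefficients and show the nonzero ones
coefs <- drop(predict(widenet.result, type = "coef"))
coefs[coefs != 0]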
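A further sketch shows how several values of order and alpha might be compared by cross-validation with a user-supplied foldid. It uses only arguments and result elements described on this page, but it has not been run against the package here, and the widenet call is skipped when the package is not installed. The choice of 5 folds and the object name cv.result are arbitrary.

```r
# Simulated data, as in the example above.
n <- 300
p <- 5
set.seed(23)
x <- matrix(rnorm(n * p), n, p)
colnames(x) <- paste0("x", 1:p)
y <- x[, 1] + x[, 2] + x[, 3] * x[, 4] + x[, 5]^2 + rnorm(n)

# Balanced random fold assignment with values between 1 and nfolds.
nfolds <- 5
foldid <- sample(rep(seq_len(nfolds), length.out = n))

# Assumes the widenet package is installed; skipped otherwise.
if (requireNamespace("widenet", quietly = TRUE)) {
  cv.result <- widenet::widenet(x, y, family = "gaussian",
                                order = 1:2, alpha = c(0.5, 1),
                                foldid = foldid)

  print(cv.result$min.cvm.mat)      # minimum CV risk per combination
  print(cv.result$which.order.min)  # order with minimum CV risk
  print(cv.result$which.alpha.min)  # alpha with minimum CV risk
}
```

Supplying foldid makes the fold assignment explicit, so the same split can be reused when comparing against other models.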