Last data update: 2014.03.03

sparsenet {sparsenet}                R Documentation

Fit a linear model regularized by the nonconvex MC+ sparsity penalty

Description

Sparsenet uses coordinate descent on the MC+ nonconvex penalty family, and fits a surface of solutions over the two-dimensional parameter space. This penalty family is indexed by an overall strength parameter lambda (as in the lasso) and a convexity parameter gamma. Gamma = infinity corresponds to the lasso, and gamma = 1 to best-subset regression.
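To make the family concrete, the MC+ penalty for a single coefficient t grows like lambda*|t| near zero and flattens out to a constant beyond gamma*lambda. The following is a minimal sketch of that penalty function; the helper name is ours, not part of the package:

```r
# Hypothetical helper (not part of sparsenet): the MC+ penalty for a single
# coefficient t, at strength lambda and convexity gamma (> 1).
mcplus_penalty <- function(t, lambda, gamma) {
  a <- abs(t)
  ifelse(a <= gamma * lambda,
         lambda * a - a^2 / (2 * gamma),  # quadratically relaxed near zero
         gamma * lambda^2 / 2)            # constant beyond gamma * lambda
}
```

For very large gamma the penalty approaches lambda*|t| (the lasso); as gamma decreases toward 1 it flattens out sooner, approaching the behavior of best-subset selection.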

Usage

sparsenet(x, y, weights, exclude, dfmax = nvars + 1, pmax = min(dfmax * 2, nvars),
ngamma = 9, nlambda = 50, max.gamma = 150, min.gamma = 1.000001,
lambda.min.ratio = ifelse(nobs < nvars, 0.01, 1e-04), lambda = NULL,
gamma = NULL, parms = NULL, warm = c("lambda", "gamma", "both"),
thresh = 1e-05, maxit = 1e+06)

Arguments

x

Input matrix of nobs x nvars predictors

y

Response vector

weights

Observation weights; default 1 for each observation

exclude

Indices of variables to be excluded from the model. Default is none.

dfmax

Limit the maximum number of variables in the model. Useful for very large nvars, if a partial path is desired.

pmax

Limit the maximum number of variables ever to be nonzero

ngamma

Number of gamma values, if gamma not supplied; default is 9.

nlambda

Number of lambda values, if lambda not supplied; default is 50

max.gamma

Largest gamma value to be used, apart from infinity (lasso), if gamma not supplied; default is 150

min.gamma

Smallest value of gamma to use; should be > 1. Default is 1.000001

lambda.min.ratio

Smallest value for lambda, as a fraction of lambda.max, the (data-derived) entry value (i.e. the smallest value for which all coefficients are zero). The default depends on the sample size nobs relative to the number of variables nvars. If nobs > nvars, the default is 0.0001, close to zero. If nobs < nvars, the default is 0.01. A very small value of lambda.min.ratio will lead to a saturated fit in the nobs < nvars case.
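One plausible way to picture a default sequence of this kind is nlambda values decreasing log-linearly from lambda.max down to lambda.max * lambda.min.ratio. This is a sketch under that assumption, not the package's internal code:

```r
# Sketch (not the package internals): a log-linear lambda path from
# lambda.max down to lambda.max * lambda.min.ratio, in decreasing order.
make_lambda_seq <- function(lambda.max, nlambda = 50, lambda.min.ratio = 1e-4) {
  exp(seq(log(lambda.max), log(lambda.max * lambda.min.ratio),
          length.out = nlambda))
}
```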

lambda

A user-supplied lambda sequence, in decreasing order. Typical usage is to have the program compute its own lambda sequence based on nlambda and lambda.min.ratio; supplying a value of lambda overrides this. WARNING: use with care. Do not supply a single value for lambda (for predictions after CV, use predict() instead); supply instead a decreasing sequence of lambda values. sparsenet relies on its warm starts for speed, and it is often faster to fit a whole path than to compute a single fit.

gamma

Sparsity parameter vector, with 1 < gamma < infinity. Gamma = 1 corresponds to best-subset regression, gamma = infinity to the lasso. Should be given in decreasing order.

parms

An optional three-dimensional array of dimension 2 x ngamma x nlambda. Here the user can supply exactly the (gamma, lambda) pairs to be traversed by the coordinate-descent algorithm.
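A sketch of assembling an array of this shape follows. Which slot of the first dimension holds gamma and which holds lambda is our assumption here; check it against the parms component returned by a fitted object before relying on it:

```r
# Sketch: build a 2 x ngamma x nlambda array of (gamma, lambda) pairs.
# ASSUMPTION: slot 1 holds gamma, slot 2 holds lambda -- verify against the
# parms component of a fitted sparsenet object.
gammas  <- c(150, 10, 1.5)                              # decreasing
lambdas <- exp(seq(log(1), log(0.01), length.out = 4))  # decreasing
parms <- array(NA, dim = c(2, length(gammas), length(lambdas)))
for (i in seq_along(gammas)) {
  parms[1, i, ] <- gammas[i]  # one gamma per slab
  parms[2, i, ] <- lambdas    # same lambda path for each gamma
}
```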

warm

How to traverse the grid. Default is "lambda", meaning warm starts from the previous lambda with the same gamma. "gamma" means the opposite, previous gamma for the same lambda. "both" tries both warm starts, and uses the one that improves the criterion the most.

thresh

Convergence threshold for coordinate descent. Each coordinate-descent loop continues until the maximum change in the objective after any coefficient update is less than thresh times the null RSS. Default value is 1e-5.

maxit

Maximum number of passes over the data for all lambda/gamma values; default is 10^6.

Details

This algorithm operates like glmnet, whose alpha parameter moves the penalty between lasso and ridge; here gamma moves it between lasso and best subset. The algorithm traverses the two-dimensional gamma/lambda grid in a nested loop, with decreasing gamma in the outer loop and decreasing lambda in the inner loop. Because of the nature of the MC+ penalty, each coordinate update is a convex problem, with a simple two-threshold shrinking scheme: coefficients below lambda in absolute value are set to zero; those above lambda*gamma are left alone; those in between are shrunk proportionally. Note that this algorithm ALWAYS standardizes the columns of x and y to have mean zero and variance 1 (using 1/N averaging) before it computes its fit. The coefficients reflect the original scale.
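The two-threshold update described above can be sketched for a single coordinate as follows, where beta is the univariate least-squares coefficient on standardized data; this is a hypothetical illustration, not the package's compiled core:

```r
# Sketch of the two-threshold MC+ update for one coordinate.
mcplus_update <- function(beta, lambda, gamma) {
  b <- abs(beta)
  if (b <= lambda)         return(0)     # small coefficient: set to zero
  if (b > lambda * gamma)  return(beta)  # large coefficient: leave alone
  sign(beta) * gamma * (b - lambda) / (gamma - 1)  # in between: shrink
}
```

As gamma grows large this approaches soft thresholding (the lasso update); as gamma decreases toward 1 it approaches hard thresholding (best subset).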

Value

An object of class "sparsenet", with a number of components. Mostly one will access the components via generic functions like coef(), plot(), predict() etc.

call

the call that produced this object

rsq

The percentage variance explained on the training data; an ngamma x nlambda matrix.

jerr

error flag, for warnings and errors (largely for internal debugging).

coefficients

A list with ngamma elements, one per gamma value; each is itself a list with various components: the matrix beta of coefficients, its dimension dim, the vector of intercepts, the lambda sequence, the gamma value, and the sequence of df (number of nonzero coefficients) for each solution.

parms

Irrespective of how the parameters were supplied, the three-way array of gamma/lambda values actually used.

gamma

The gamma values used

lambda

The lambda values used

max.lambda

The entry value for lambda
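Since the MC+ penalty behaves like the lasso near zero, the entry value can be computed as for the lasso: with x and y standardized using 1/N averaging (as described in Details), it is max_j |<x_j, y>| / nobs. A sketch under that assumption (not the package code):

```r
# Sketch: lasso-style entry value for lambda, on columns standardized
# with the 1/N averaging convention (scale() uses 1/(N-1), so rescale).
entry_lambda <- function(x, y) {
  n  <- nrow(x)
  xs <- scale(x) * sqrt(n / (n - 1))  # mean 0, variance 1 under 1/N
  ys <- scale(y) * sqrt(n / (n - 1))
  max(abs(crossprod(xs, ys))) / n     # largest absolute inner product / n
}
```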

Author(s)

Rahul Mazumder, Jerome Friedman and Trevor Hastie

Maintainer: Trevor Hastie <hastie@stanford.edu>

References

Mazumder, R., Friedman, J. and Hastie, T. (2011) SparseNet: Coordinate Descent with Nonconvex Penalties. Journal of the American Statistical Association, 106(495).
http://www.stanford.edu/~hastie/Papers/Sparsenet/jasa_MFH_final.pdf

See Also

The glmnet package; the predict, coef, print and plot methods; and the cv.sparsenet function.

Examples

## Simulate Gaussian training data with gendata(), then fit the gamma/lambda
## surface and cross-validate.
train.data <- gendata(100, 1000, nonzero = 30, rho = 0.3, snr = 3)
fit <- sparsenet(train.data$x, train.data$y)
par(mfrow = c(3, 3))
plot(fit)                # one coefficient-path panel per gamma value
par(mfrow = c(1, 1))
fitcv <- cv.sparsenet(train.data$x, train.data$y, trace.it = TRUE)
plot(fitcv)
