gamsel
R Documentation
Fit Regularization Path for Gaussian or Binomial Generalized Additive Model
Description
Using overlap grouped lasso penalties, gamsel selects whether each term in a GAM is zero, linear, or a non-linear spline (up to a specified maximum df per variable). It fits the entire regularization path on a grid of values for the overall penalty lambda, for both the gaussian and binomial families.
Arguments
x
Input (predictor) matrix of dimension nobs x nvars. Each observation is a row.
y
Response variable. Quantitative for family="gaussian", and with values in {0,1} for family="binomial".
num_lambda
Number of lambda values to use. (Length of lambda sequence.)
lambda
User-supplied lambda sequence. For best performance, leave as NULL and allow the routine to automatically select lambda. Otherwise, supply a (preferably gradually) decreasing sequence.
family
Response type. "gaussian" for linear model (default). "binomial" for logistic model.
degrees
An integer vector of length nvars specifying the maximum number of spline basis functions to use for each variable.
gamma
Penalty mixing parameter 0 ≤ γ ≤ 1. Values γ < 0.5 penalize linear fit less than non-linear fit. The default is γ = 0.4, which encourages a linear term over a non-linear term.
dfs
Numeric vector of length nvars specifying the maximum (end-of-path) degrees of freedom for each variable.
bases
A list of orthonormal bases for the non-linear terms for each variable. The function pseudo.bases generates these, using the parameters dfs and degrees. See the documentation for pseudo.bases.
tol
Convergence threshold for coordinate descent. The coordinate descent loop continues until the total change in objective after a pass over all variables is less than tol. Default is 1e-4.
max_iter
Maximum number of coordinate descent iterations over all the variables for each lambda value. Default is 2000.
traceit
If TRUE, various information is printed during the fitting process.
parallel
Passed on to the pseudo.bases() function. Uses multiple processes if available.
...
Additional arguments passed on to pseudo.bases().
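As a rough illustration of how the arguments above fit together, the following sketch simulates a small gaussian data set and sets up per-variable degrees and dfs. The data, the seed, and the endpoints of lambda_seq are assumptions made up for this example; the call itself uses only argument names documented above, and it is guarded so the sketch still runs when the gamsel package is not installed.

```r
# Simulated data: 200 observations, 5 predictors (illustrative only).
set.seed(1)
n <- 200
p <- 5
x <- matrix(rnorm(n * p), nrow = n, ncol = p)
y <- x[, 1] + sin(2 * x[, 2]) + rnorm(n, sd = 0.5)

# One entry per variable, as the argument descriptions require.
degrees <- rep(8L, p)   # maximum number of spline basis functions
dfs     <- rep(4, p)    # maximum (end-of-path) degrees of freedom
gamma   <- 0.4          # documented default mixing parameter

# A gradually decreasing, log-spaced sequence of the kind one might supply
# via `lambda`. ASSUMPTION: the endpoints 10 and 0.01 are arbitrary here.
lambda_seq <- exp(seq(log(10), log(0.01), length.out = 30))

if (requireNamespace("gamsel", quietly = TRUE)) {
  # Recommended usage leaves lambda = NULL so the routine picks the sequence;
  # passing lambda = lambda_seq instead would use the sequence built above.
  fit <- gamsel::gamsel(x, y, family = "gaussian",
                        degrees = degrees, dfs = dfs,
                        gamma = gamma, num_lambda = 30)
  print(dim(fit$alphas))
}
```

Leaving lambda as NULL follows the advice in the lambda entry; the hand-built sequence is shown only to make "gradually decreasing" concrete.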
Details
The sequence of models along the lambda path is fit by (block) coordinate descent. In the case of logistic regression, the fitting routine may terminate before all num_lambda values of lambda have been used. This occurs when the fraction of null deviance explained by the model gets too close to 1, at which point the fit becomes numerically unstable. Each of the smooth terms is computed using an approximation to the Demmler-Reinsch smoothing spline basis for that variable, and the accompanying diagonal penalty matrix.
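To make the early-termination remark concrete, here is a minimal base R sketch of the "fraction of null deviance explained" for a binomial response. It does not call gamsel and does not reproduce its stopping rule (the cutoff is not stated here); the helper binomial_deviance, the toy data, and the slope values are all assumptions for illustration. On perfectly separable data, steeper logistic fits push the fraction toward 1, which is the regime described above as numerically unstable.

```r
# Binomial deviance for 0/1 responses y and fitted probabilities p.
# (Hypothetical helper, not part of gamsel.)
binomial_deviance <- function(y, p) {
  eps <- 1e-15                          # guard against log(0)
  p <- pmin(pmax(p, eps), 1 - eps)
  -2 * sum(y * log(p) + (1 - y) * log(1 - p))
}

# Perfectly separable toy data: y is 1 exactly when x > 0.
x <- seq(-2, 2, length.out = 100)
y <- as.numeric(x > 0)

# Null deviance: deviance of the intercept-only model.
nulldev <- binomial_deviance(y, rep(mean(y), length(y)))

# Steeper slopes stand in for less and less penalized logistic fits.
slopes <- c(0.5, 2, 10, 50, 250)
dev_ratio <- sapply(slopes, function(b) {
  1 - binomial_deviance(y, plogis(b * x)) / nulldev
})

print(round(dev_ratio, 4))
```

The printed values increase with the slope and approach 1, while the fitted probabilities saturate at 0 and 1.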
Value
An object with S3 class gamsel.
intercept
Intercept sequence of length num_lambda
alphas
nvars x num_lambda matrix of linear coefficient estimates
betas
sum(degrees) x num_lambda matrix of non-linear coefficient estimates
lambdas
The sequence of lambda values used
degrees
Number of basis functions used for each variable
parms
A set of parameters that capture the bases used. This allows for efficient generation of the basis elements for predict.gamsel, the predict method for this class.
family
"gaussian" or "binomial"
nulldev
Null deviance (deviance of the intercept model)
dev.ratio
Vector of length num_lambda giving fraction of (null) deviance explained by each model along the lambda sequence
call
The call that produced this object
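The dimensions listed above suggest one way to read off, for each lambda, which variables carry a non-linear term. The sketch below uses a placeholder matrix in place of a real betas component, and it ASSUMES that the rows of betas are stacked variable by variable, in order, with degrees[j] rows for variable j; that layout is not stated here and should be checked against the package before relying on it. The names var_id and nonlinear are hypothetical.

```r
# Hypothetical sizes: 3 variables, 6 lambda values.
degrees    <- c(3, 5, 4)
num_lambda <- 6

# Placeholder standing in for fit$betas: sum(degrees) x num_lambda.
betas <- matrix(0, nrow = sum(degrees), ncol = num_lambda)

# ASSUMPTION: rows are grouped by variable, in order.
var_id <- rep(seq_along(degrees), times = degrees)

# Pretend variable 2 picks up a non-linear coefficient at the 4th lambda.
betas[which(var_id == 2)[1], 4] <- 0.1

# A variable counts as non-linear at a given lambda when any coefficient
# in its block of rows is nonzero.
nonlinear <- rowsum((betas != 0) * 1, group = var_id) > 0

print(nonlinear)
```

The result is an nvars x num_lambda logical matrix, which lines up column by column with alphas.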
Author(s)
Alexandra Chouldechova and Trevor Hastie
Maintainer: Trevor Hastie <hastie@stanford.edu>