Last data update: 2014.03.03

lassoscore

R Documentation

Lasso penalized score test

Description

Tests for association between y and each column of X, adjusting for the other columns with a lasso regression, as described in Voorman et al. (2014).

Usage

lassoscore(y,X, lambda=0, family=c("gaussian","binomial","poisson"), 
    tol = .Machine$double.eps, maxit=1000, 
    resvar = NULL, verbose=FALSE, subset = NULL)

Arguments

`y`	outcome variable
`X`	matrix of predictors
`lambda`	tuning parameter value (see details)
`family`	the family for the likelihood: one of "gaussian", "binomial", or "poisson"
`tol,maxit`	convergence tolerance and maximum number of iterations in `glmnet`
`resvar`	value for the residual variance, for "gaussian" family. If not specified, the residual variance from lasso regression on all features is used (see details).
`verbose`	whether or not to print progress bars (defaults to FALSE)
`subset`	a subset of columns to test

Details

For each column of X, denoted by x*, this function computes the score statistic

T_λ = x*^T (y - yhat) / √n,

where yhat are the fitted values from lasso regression of y on X[,-x*] (see Note 2).
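The definition above can be sketched in base R for the "gaussian" family. This is an illustration only and not the package's implementation: `soft_threshold`, `lasso_cd`, and `score_stat` are hypothetical helpers, and the lasso fit uses a plain coordinate-descent loop (rather than `glmnet`) that assumes a centered y and features rescaled so that x^T x / n = 1.

```r
# Soft-thresholding operator used in each coordinate update.
soft_threshold <- function(z, lambda) sign(z) * pmax(abs(z) - lambda, 0)

# Hypothetical helper: minimize RSS/(2n) + lambda * ||b||_1 by cyclic
# coordinate descent. Assumes y is centered and x^T x / n = 1 per column.
lasso_cd <- function(X, y, lambda, maxit = 1000, tol = 1e-10) {
  n <- nrow(X)
  b <- numeric(ncol(X))
  r <- y                                   # residual y - X %*% b
  for (it in seq_len(maxit)) {
    b_old <- b
    for (j in seq_len(ncol(X))) {
      zj <- sum(X[, j] * r) / n + b[j]     # partial-residual correlation
      bj <- soft_threshold(zj, lambda)
      r <- r - X[, j] * (bj - b[j])        # keep the residual in sync
      b[j] <- bj
    }
    if (max(abs(b - b_old)) < tol) break
  }
  b
}

# Hypothetical helper: score statistic for column j of a rescaled X.
score_stat <- function(X, y, j, lambda) {
  X_rest <- X[, -j, drop = FALSE]
  yhat <- drop(X_rest %*% lasso_cd(X_rest, y, lambda))
  sum(X[, j] * (y - yhat)) / sqrt(nrow(X))
}

set.seed(1)
n <- 50
X <- matrix(rnorm(n * 5), nrow = n)
X <- sweep(X, 2, colMeans(X))               # center columns
X <- sweep(X, 2, sqrt(colMeans(X^2)), "/")  # enforce x^T x / n = 1
y <- 0.5 * X[, 1] + rnorm(n)
y <- y - mean(y)                            # center the outcome

T1 <- score_stat(X, y, j = 1, lambda = 0.1)
T1
```

With a very large `lambda` every coefficient in the reduced fit is zero, so the statistic reduces to x*^T y / √n, which is a convenient sanity check for a sketch like this.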

The variance of the score statistic is estimated in 4 ways:

(i) a model-based estimate

(ii) a sandwich variance

(iii/iv) conservative versions of (i) and (ii), which do not depend on the selected model

Note 1: in lasso regression of y on X, the coefficient of x* is non-zero if and only if

|T_λ| > λ√n
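One way to see why Note 1 holds, sketched here for the "gaussian" family and assuming the feature scaling x*^T x*/n = 1, is through the subgradient (KKT) condition for the coefficient of x*. This derivation is illustrative and not taken from the package source; \hat\beta_{-*} denotes the lasso coefficients from regressing y on the remaining columns X_{-*}.

```latex
\begin{align*}
&\text{Full objective: }
  \frac{1}{2n}\,\lVert y - X_{-*} b_{-*} - x_* b_* \rVert_2^2
  + \lambda \lVert b_{-*} \rVert_1 + \lambda \lvert b_* \rvert . \\[4pt]
&\text{With } \hat y = X_{-*}\hat\beta_{-*},\ \text{the point }
  (\hat\beta_{-*},\, b_* = 0) \text{ satisfies the KKT conditions iff }
  \left\lvert \tfrac{1}{n}\, x_*^{\top}(y - \hat y) \right\rvert \le \lambda . \\[4pt]
&\text{Since } T_\lambda = x_*^{\top}(y - \hat y)/\sqrt{n},\ \text{this reads }
  \lvert T_\lambda \rvert \le \lambda\sqrt{n}, \\
&\text{so the coefficient of } x_* \text{ is non-zero exactly when }
  \lvert T_\lambda \rvert > \lambda\sqrt{n}.
\end{align*}
```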

Note 2: For lasso regression of y on X, we minimize -l(b) + lambda*||b||_1 over vectors b, where l(b) is either -RSS/(2n) (for the "gaussian" family), or the log-likelihood for a generalized linear model. See the details of glmnet for more information.

Note 3: Each feature x is centered to have mean zero and rescaled so that x^T x/n = 1; y is centered, but not rescaled.
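The rescaling in Note 3 can be reproduced in base R as follows. This is a minimal sketch; `rescale_features` is a hypothetical helper name and not part of the package. Note that base R's `scale()` divides by the sample standard deviation (denominator n - 1), so it does not give x^T x / n = 1 exactly.

```r
# Hypothetical helper: center each column, then divide by the root mean
# square (denominator n) so that x^T x / n = 1 for every column.
rescale_features <- function(X) {
  X <- as.matrix(X)
  Xc <- sweep(X, 2, colMeans(X), "-")
  sweep(Xc, 2, sqrt(colMeans(Xc^2)), "/")
}

set.seed(1)
X <- matrix(rnorm(200, mean = 3, sd = 2), nrow = 40)
Xs <- rescale_features(X)

y <- rnorm(40, mean = 10)
yc <- y - mean(y)          # y is centered but not rescaled

colMeans(Xs)               # numerically zero
colSums(Xs^2) / nrow(Xs)   # numerically one
```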

Value

Object of class ‘lassoscore’, which is an R ‘list’, with elements:

`fit`	elements of the fitted lasso regression of y on X (see `glmnet` for details)
`scores`	the score statistics
`resvar`	the value used for the residual variance
`scorevar.model`	the variance of the score statistics, estimated using a model-based approximation
`scorevar.sand`	the variance of the score statistics, using a model-agnostic, or sandwich, formula
`scorevar.model.cons,scorevar.sand.cons`	conservative versions of the variances
`p.model`	p-value, using a model-based variance
`p.sand`	p-value, using sandwich variance
`p.model.cons,p.sand.cons`	p-values, using the conservative variance formulas
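To illustrate how a score statistic and a variance estimate combine into a p-value, the sketch below assumes a two-sided test that treats the statistic as approximately normal with mean zero under the null. That assumption, and the helper name `score_pvalue`, are illustrative; this page does not state how the package computes its p-values internally.

```r
# Hypothetical helper: two-sided p-value for a score statistic, assuming
# score / sqrt(scorevar) is approximately standard normal under the null.
score_pvalue <- function(score, scorevar) {
  pchisq(score^2 / scorevar, df = 1, lower.tail = FALSE)
}

# Made-up scores and variances, for illustration only.
scores   <- c(0, qnorm(0.975), 3)
scorevar <- c(1, 1, 1)
score_pvalue(scores, scorevar)
```

Under this assumption, a larger (more conservative) variance estimate always gives a larger p-value for the same score.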

Author(s)

Arend Voorman voorma@uw.edu

References

Voorman, A., Shojaie, A., and Witten, D. (2014). Inference in high dimensions with the penalized score test. http://arxiv.org/abs/1401.2678.

Examples

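# Simulation from Voorman et al. (2014)
library(lassoscore)
library(Matrix)  # provides forceSymmetric()

set.seed(20)
n <- 300  # number of observations
p <- 100  # number of features
q <- 10   # number of features with non-zero coefficients

beta <- numeric(p)
beta[sample(p,q)] <- 0.4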

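# Correlation matrix with entries 0.5^|i-j|, and its Cholesky factor
Sigma <- forceSymmetric(t(0.5^outer(1:p,1:p,"-")))
cSigma <- chol(Sigma)

# Correlated, standardized features and a Gaussian outcome
x <- scale(replicate(p,rnorm(n))%*%cSigma)
y <- rnorm(n,x%*%beta,1)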

mod <- lassoscore(y,x,0.02)
summary(mod)
plot(mod,type="all")

#test only features 10:20:
mod0 <- lassoscore(y,x,0.02, subset = 10:20)

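######## Diabetes data set:
# Test features in the diabetes data set, using 2 different values of `lambda`,
# and compare results.
# Assumes a `diabetes` object with elements `x` and `y` is available
# (for example, the data set of that name in the lars package).
# Residual variance from an ordinary least-squares fit:
resvar <- with(lm(y~x,data=diabetes), sum(residuals^2)/df.residual)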

mod2 <- with(diabetes,lassoscore(y,x,lambda=4,resvar=resvar))
mod3 <- with(diabetes,lassoscore(y,x,lambda=0.5,resvar=resvar))
data.frame(
  "variable"=colnames(diabetes$x),
  "lambda_4"=format(mod2$p.model,digits=2),
  "lambda_0.5"=format(mod3$p.model,digits=2))