Last data update: 2014.03.03

lassoscore    R Documentation

Lasso penalized score test

Description

Tests for association between y and each column of X, adjusting for the other columns with a lasso regression, as described in Voorman et al. (2014).

Usage

lassoscore(y,X, lambda=0, family=c("gaussian","binomial","poisson"), 
    tol = .Machine$double.eps, maxit=1000, 
    resvar = NULL, verbose=FALSE, subset = NULL)

Arguments

y

outcome variable

X

matrix of predictors

lambda

tuning parameter value (see details)

family

the likelihood family: one of "gaussian", "binomial" or "poisson"

tol,maxit

convergence tolerance and maximum number of iterations in glmnet

resvar

value for the residual variance, used for the "gaussian" family. If not specified, the residual variance from the lasso regression on all features is used (see Details).

verbose

whether or not to print progress bars (defaults to FALSE)

subset

indices of the columns of X to test; if NULL (the default), every column is tested

Details

For each column of X, denoted by x*, this function computes the score statistic

$$T_\lambda = \frac{x_*^\top (y - \hat{y})}{\sqrt{n}},$$

where \hat{y} denotes the fitted values from the lasso regression of y on X[,-x*], that is, on all columns of X except x* (see Note 2).
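For the default lambda = 0, and assuming n > p with X of full column rank, the lasso fit of y on X[,-x*] reduces to ordinary least squares, so the statistic can be sketched in base R with lm.fit. This is a minimal illustration under those assumptions, not the package's implementation; score_stat_ols() is a hypothetical helper that applies the Note 3 scaling itself.

```r
# Hypothetical helper (not part of lassoscore): score statistic T for lambda = 0,
# assuming n > p and X of full column rank, so the "lasso" fit is least squares.
score_stat_ols <- function(y, X, j) {
  n <- nrow(X)
  # Note 3 scaling: center each column and scale so that x^T x / n = 1; center y
  Xc <- sweep(X, 2, colMeans(X))
  Xs <- sweep(Xc, 2, sqrt(colMeans(Xc^2)), "/")
  yc <- y - mean(y)
  # fitted values from the unpenalized regression of y on all columns except j
  yhat <- lm.fit(Xs[, -j, drop = FALSE], yc)$fitted.values
  # T = x*^T (y - yhat) / sqrt(n)
  sum(Xs[, j] * (yc - yhat)) / sqrt(n)
}

set.seed(1)
n <- 200; p <- 5
X <- matrix(rnorm(n * p), n, p)
y <- drop(X %*% c(1, 0, 0, -0.5, 0)) + rnorm(n)
T_stats <- sapply(seq_len(p), function(j) score_stat_ols(y, X, j))
round(T_stats, 2)
```

With lambda = 0, each T has the same sign as the corresponding least-squares coefficient in the full model (by the Frisch-Waugh-Lovell theorem), which gives a quick sanity check.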

The variance of the score statistic is estimated in four ways:

(i) a model-based estimate;

(ii) a sandwich (model-agnostic) estimate;

(iii/iv) conservative versions of (i) and (ii), which do not depend on the selected model.

Note 1: In the lasso regression of y on X, the coefficient of x* is non-zero if and only if

$$|T_\lambda| > \lambda \sqrt{n}.$$
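Note 1 can be checked numerically for the "gaussian" family. The sketch below uses a hypothetical, minimal coordinate-descent solver, lasso_cd(), for RSS/(2n) + λ||b||_1 on columns scaled so that x^T x/n = 1 and a centered y; it stands in for glmnet and is not the package's code. It computes T_λ for each column from the fit that leaves that column out and compares the result with the support of the full fit.

```r
# Soft-thresholding operator
soft <- function(z, t) sign(z) * pmax(abs(z) - t, 0)

# Hypothetical minimal solver (not glmnet): minimizes RSS/(2n) + lambda * ||b||_1,
# assuming centered y and columns with x^T x / n = 1 (no intercept needed).
lasso_cd <- function(X, y, lambda, tol = 1e-12, maxit = 10000) {
  n <- nrow(X); b <- numeric(ncol(X)); r <- y
  for (it in seq_len(maxit)) {
    b_old <- b
    for (k in seq_along(b)) {
      # with x_k^T x_k / n = 1 the coordinate update is a soft-threshold
      zk <- sum(X[, k] * r) / n + b[k]
      bk <- soft(zk, lambda)
      r <- r - X[, k] * (bk - b[k])   # keep the residual in sync
      b[k] <- bk
    }
    if (max(abs(b - b_old)) < tol) break
  }
  b
}

set.seed(2)
n <- 200; p <- 8; lambda <- 0.1
X <- matrix(rnorm(n * p), n, p)
X <- sweep(X, 2, colMeans(X))
X <- sweep(X, 2, sqrt(colMeans(X^2)), "/")   # Note 3 scaling
y <- drop(X[, 1:3] %*% c(0.5, -0.3, 0.1)) + rnorm(n)
y <- y - mean(y)

b_full <- lasso_cd(X, y, lambda)
T_lambda <- sapply(seq_len(p), function(j) {
  b_j <- lasso_cd(X[, -j, drop = FALSE], y, lambda)   # fit without column j
  yhat <- drop(X[, -j, drop = FALSE] %*% b_j)
  sum(X[, j] * (y - yhat)) / sqrt(n)
})
# Note 1: b_full[j] != 0  <=>  |T_lambda[j]| > lambda * sqrt(n)
cbind(nonzero = b_full != 0, exceeds = abs(T_lambda) > lambda * sqrt(n))
```

The equivalence follows from the lasso optimality conditions: a coefficient stays at zero exactly when the scaled inner product of its column with the residuals of the fit without it is at most λ in absolute value.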

Note 2: For the lasso regression of y on X, we minimize -l(b) + λ||b||_1 over vectors b, where l(b) is -RSS/(2n) for the "gaussian" family, or the log-likelihood of the generalized linear model divided by n for the "binomial" and "poisson" families. See the Details section of the glmnet documentation for more information.
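The objective in Note 2 can be written out directly. The sketch below evaluates -l(b) + λ||b||_1 with the 1/n scaling used by glmnet. penalized_objective() is a hypothetical helper, and the intercept is omitted for brevity.

```r
# Hypothetical helper: -l(b) + lambda * ||b||_1 with l(b) scaled by 1/n.
# The intercept is omitted for brevity.
penalized_objective <- function(b, X, y, lambda,
                                family = c("gaussian", "binomial", "poisson")) {
  family <- match.arg(family)
  n <- nrow(X)
  eta <- drop(X %*% b)   # linear predictor
  loglik <- switch(family,
    gaussian = -sum((y - eta)^2) / 2,                   # gives -RSS/(2n) after / n
    binomial = sum(y * eta - log1p(exp(eta))),          # Bernoulli log-likelihood
    poisson  = sum(y * eta - exp(eta) - lgamma(y + 1))  # Poisson log-likelihood
  )
  -loglik / n + lambda * sum(abs(b))
}

X <- matrix(c(1, -1, 2, 0, 1, -2), nrow = 3)
penalized_objective(c(0.5, -0.25), X, c(1, 0, 2), lambda = 0.1)
```

For the "gaussian" case this is RSS/(2n) + λ||b||_1. The binomial term uses log1p(exp(eta)), which is adequate for a sketch but can overflow for very large linear predictors.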

Note 3: Each feature x is centered and rescaled to have mean zero and x^T x/n = 1; y is centered but not rescaled.
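The scaling in Note 3 divides each centered column by sqrt(mean(x^2)). This differs slightly from base R's scale(), which divides by the standard deviation computed with n - 1. A minimal sketch follows; standardize_note3() is a hypothetical helper, not a lassoscore function.

```r
# Hypothetical helper: Note 3 scaling (x^T x / n = 1 per column; y only centered)
standardize_note3 <- function(X, y) {
  Xc <- sweep(X, 2, colMeans(X))                  # center columns
  Xs <- sweep(Xc, 2, sqrt(colMeans(Xc^2)), "/")   # divide by sqrt(x^T x / n)
  list(X = Xs, y = y - mean(y))
}

set.seed(3)
X <- matrix(rexp(50 * 4), 50, 4)
y <- rnorm(50, mean = 10)
s <- standardize_note3(X, y)
colMeans(s$X)                 # numerically zero
diag(crossprod(s$X)) / 50     # numerically one
```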

Value

Object of class ‘lassoscore’, which is an R ‘list’, with elements:

fit

Elements of the fitted lasso regression of y on X (see glmnet for details).

scores

the score statistics

resvar

the value used for the residual variance

scorevar.model

the variance of the score statistics, estimated using a model-based approximation

scorevar.sand

the variance of the score statistics, estimated using a model-agnostic (sandwich) formula

scorevar.model.cons,scorevar.sand.cons

conservative versions of scorevar.model and scorevar.sand, which do not depend on the selected model

p.model

p-values, using the model-based variance

p.sand

p-values, using the sandwich variance

p.model.cons,p.sand.cons

p-values, using the conservative variance formulas
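This page does not spell out how the p-values are formed from the scores and variances. A common choice for a score test, assumed here, is the two-sided normal approximation p = 2Φ(-|T|/√V). The helper score_pvalue() and the input values are illustrative only, and the package's internal computation may differ.

```r
# Assumption: two-sided normal approximation for a score test.
# score_pvalue() is hypothetical, not a lassoscore function.
score_pvalue <- function(scores, scorevar) {
  2 * pnorm(-abs(scores) / sqrt(scorevar))
}

scores   <- c(0.3, -2.5, 4.1)   # illustrative score statistics
scorevar <- c(1.0, 1.2, 0.9)    # illustrative variance estimates
score_pvalue(scores, scorevar)
```

Under this approximation, a larger (for example, conservative) variance estimate can only increase the p-value for a given score.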

Author(s)

Arend Voorman voorma@uw.edu

References

Voorman, A., Shojaie, A., and Witten, D. (2014). Inference in high dimensions with the penalized score test. arXiv:1401.2678. http://arxiv.org/abs/1401.2678

See Also

glassoscore, qqpval

Examples

library(lassoscore)

# Simulation from Voorman et al (2014)
set.seed(20)
n <- 300   # number of observations
p <- 100   # number of features
q <- 10    # number of non-null features

beta <- numeric(p)
beta[sample(p, q)] <- 0.4

# AR(1)-type covariance: Sigma[i, j] = 0.5^|i - j|
Sigma <- 0.5^abs(outer(1:p, 1:p, "-"))
cSigma <- chol(Sigma)

x <- scale(replicate(p, rnorm(n)) %*% cSigma)
y <- rnorm(n, x %*% beta, 1)

mod <- lassoscore(y, x, lambda = 0.02)
summary(mod)
plot(mod, type = "all")

# Test only features 10 to 20:
mod0 <- lassoscore(y,x,0.02, subset = 10:20)

######## Diabetes data set:
# Test the features in the diabetes data set at two values of lambda
# and compare the resulting p-values.
# resvar: residual variance from the full least-squares fit of y on x
resvar <- with(lm(y ~ x, data = diabetes), sum(residuals^2) / df.residual)

mod2 <- with(diabetes,lassoscore(y,x,lambda=4,resvar=resvar))
mod3 <- with(diabetes,lassoscore(y,x,lambda=0.5,resvar=resvar))
data.frame(
  "variable"=colnames(diabetes$x),
  "lambda_4"=format(mod2$p.model,digits=2),
  "lambda_0.5"=format(mod3$p.model,digits=2))

Results