Cross-validation for selecting the number of binary rules in the interaction AIM with continuous outcomes
Usage
cv.lm.interaction(x, trt, y, K.cv=5, num.replicate=1, nsteps, mincut=0.1, backfit=F, maxnumcut=1, dirp=0)
Arguments
x
n by p matrix. The covariate matrix
trt
n vector. The treatment indicator
y
n vector. The continuous response variable
K.cv
the number of folds used in the K.cv-fold cross-validation
num.replicate
number of independent replications of K-fold cross validations
nsteps
the maximum number of binary rules to be included in the index
mincut
the minimum cutting proportion for the binary rule at either end. It is typically between 0 and 0.2.
backfit
T/F. Whether the existing split points are adjusted after including a new binary rule
maxnumcut
the maximum number of binary splits per predictor
dirp
p vector. The given direction of the binary split for each of the p predictors. 0 represents "no pre-given direction"; 1 represents "(x>cut)"; -1 represents "(x<cut)". Alternatively, "dirp=0" indicates that there is no pre-given direction for any of the predictors.
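As a minimal sketch of the dirp coding described above, the following builds a direction vector for p=10 predictors and checks one candidate rule against mincut. The specific predictors, the cut point, and the helper names (x1, cut, prop, rule_ok) are hypothetical; the mincut check reflects one reading of "minimum cutting proportion at either end" and is not taken from the package internals.

```r
p <- 10                      # number of predictors
dirp <- rep(0, p)            # 0: no pre-given direction for any predictor
dirp[1] <- -1                # predictor 1: only rules of the form (x < cut)
dirp[5] <- 1                 # predictor 5: only rules of the form (x > cut)

# Assumed reading of mincut: a rule must leave at least a fraction
# mincut of the observations on each side of the cut.
set.seed(1)
x1 <- rnorm(400)             # hypothetical predictor values
mincut <- 0.1
cut <- 0.2                   # hypothetical candidate cut point
prop <- mean(x1 < cut)       # proportion of observations with (x < cut)
rule_ok <- prop >= mincut && prop <= 1 - mincut
```

Such a vector could then be supplied through the dirp argument, for example cv.lm.interaction(x, trt, y, nsteps=10, dirp=dirp).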
Details
cv.lm.interaction implements the K-fold cross-validation for interaction linear AIM. It estimates the score test statistics in the test set for testing the treatment*index interaction. It also provides the pre-validated fits for each observation and pre-validated score test statistics. The output can be used to select the optimal number of binary rules.
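The idea of pre-validated fits can be sketched in base R. The example below is only an illustration of generic K-fold pre-validation, using an ordinary linear model as a stand-in for the AIM fit; the data, the fold assignment, and all variable names are assumptions and do not reproduce the internals of cv.lm.interaction.

```r
set.seed(1)
n <- 100
K <- 5
dat <- data.frame(x = rnorm(n))
dat$y <- dat$x + rnorm(n)

# Randomly assign each observation to one of K folds of equal size
fold <- sample(rep(seq_len(K), length.out = n))

# Pre-validated fit: each observation is predicted by a model
# that was fitted without the fold containing that observation
preval <- numeric(n)
for (k in seq_len(K)) {
  test <- fold == k
  fit <- lm(y ~ x, data = dat[!test, ])
  preval[test] <- predict(fit, newdata = dat[test, ])
}
```

Because no observation contributes to its own prediction, statistics computed from preval are not inflated by fitting and evaluating on the same data.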
Value
cv.lm.interaction returns
kmax
the optimal number of binary rules based on the cross-validation
meanscore
nsteps-vector. The cross-validated score test statistics (significant at 0.05, if greater than 1.96) for the treatment*index interaction
pvfit.score
nsteps-vector. The pre-validated score test statistics (significant at 0.05, if greater than 1.96) for the treatment*index interaction.
preval
nsteps by n matrix. Pre-validated fits for individual observations
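The way these outputs might be read can be sketched as follows. The score values are hypothetical, and the sketch assumes that kmax corresponds to the step with the largest cross-validated score, which this page does not state explicitly. The 1.96 threshold is the two-sided 0.05 critical value of the standard normal distribution.

```r
# Hypothetical cross-validated score statistics for nsteps = 4
meanscore <- c(1.2, 2.4, 2.1, 1.8)

# Assumption: the optimal number of rules maximizes the score
kmax <- which.max(meanscore)

# Steps whose score exceeds the two-sided 0.05 normal critical value
crit <- qnorm(0.975)
significant <- which(meanscore > crit)
```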
References
L Tian and R Tibshirani, Adaptive index models for marker-based risk stratification, Tech Report, available at http://www-stat.stanford.edu/~tibs/AIM.
R Tibshirani and B Efron, Pre-validation and inference in microarrays, Statist. Appl. Genet. Mol. Biol., 1:1-18, 2002.
Author(s)
Lu Tian and Robert Tibshirani
Examples
## generate data
set.seed(1)
n=400
p=10
x=matrix(rnorm(n*p), n, p)
z=(x[,1]<0.2)+(x[,5]>0.2)
trt=rbinom(n, 1, 0.5)
beta=1
y=trt+beta*trt*z+rnorm(n)
## cross-validate the interaction linear AIM
a=cv.lm.interaction(x, trt, y, nsteps=10, K.cv=5, num.replicate=3)
## examine the score test statistics in the test set
par(mfrow=c(1,2))
plot(a$meanscore, type="l")
plot(a$pvfit.score, type="l")
## construct the index with the optimal number of binary rules
k.opt=a$kmax
a=lm.interaction(x, trt, y, nsteps=k.opt)
print(a)
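To see what the simulated data above are meant to contain, the following sketch regenerates them and tabulates the true index z, which counts how many of the two generating rules (x[,1]<0.2 and x[,5]>0.2) an observation satisfies. The summary names (z_counts, effect) are illustrative additions; under the generating model the treatment effect is 1 + z, and the empirical differences in means only approximate it.

```r
set.seed(1)
n <- 400
p <- 10
x <- matrix(rnorm(n * p), n, p)
z <- (x[, 1] < 0.2) + (x[, 5] > 0.2)   # true index: 0, 1 or 2 rules satisfied
trt <- rbinom(n, 1, 0.5)
beta <- 1
y <- trt + beta * trt * z + rnorm(n)

# Number of observations at each level of the true index
z_counts <- table(z)

# Empirical treatment effect (difference in mean response) per index level
effect <- sapply(0:2, function(k) {
  mean(y[trt == 1 & z == k]) - mean(y[trt == 0 & z == k])
})
```

An index recovered by the fitted model would ideally involve predictors 1 and 5, since only those two enter z.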
Results
R version 3.3.1 (2016-06-21) -- "Bug in Your Hair"
Platform: x86_64-pc-linux-gnu (64-bit)
> library(AIM)
Loading required package: survival
> png(filename="/home/ddbj/snapshot/RGM3/R_CC/result/AIM/cv.lm.interaction.Rd_%03d_medium.png", width=480, height=480)
> ### Name: cv.lm.interaction
> ### Title: Cross-validation in interaction linear AIM
> ### Aliases: cv.lm.interaction
>
> ### ** Examples
>
> ## generate data
> set.seed(1)
>
> n=400
> p=10
> x=matrix(rnorm(n*p), n, p)
> z=(x[,1]<0.2)+(x[,5]>0.2)
> trt=rbinom(n, 1, 0.5)
> beta=1
> y=trt+beta*trt*z+rnorm(n)
>
> ## cross-validate the interaction linear AIM
> a=cv.lm.interaction(x, trt, y, nsteps=10, K.cv=5, num.replicate=3)
>
> ## examine the score test statistics in the test set
> par(mfrow=c(1,2))
> plot(a$meanscore, type="l")
> plot(a$pvfit.score, type="l")
>
>
> ## construct the index with the optimal number of binary rules
> k.opt=a$kmax
> a=lm.interaction(x, y, trt, nsteps=k.opt)
Warning message:
In sqrt(v.stat + 1e-08) : NaNs produced
> print(a)
$res
$res[[1]]
     jmax      cutp maxdir    maxsc
[1,]    8 0.4907739      1 3.847912

$res[[2]]
     jmax       cutp maxdir    maxsc
[1,]    8  0.4907739      1 3.847912
[2,]    2 -1.0624084      1 5.002688

$maxsc
[1] 3.847912 5.002688
>
> dev.off()
null device
1
>