Data from a simulation study reported by Shao (1993, Table 1).
Shao (1993, Table 2) reported 4 simulation experiments using
4 different sets of values for the regression coefficients
in the linear regression model
y = 2 + b[2] x2 + b[3] x3 + b[4] x4 + b[5] x5 + e,
where e is an independent normal error with unit variance.
The four regression coefficients for the four experiments
are shown in the table below.
Experiment   b[2]   b[3]   b[4]   b[5]
1            0      0      4      0
2            0      0      4      8
3            9      0      4      8
4            9      6      4      8
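The model and coefficient table above can be turned into a small simulation sketch. The helper name simulate_shao and the stand-in design matrix X_demo are hypothetical and not part of the package; X_demo is random filler so that the code runs on its own, and it would be replaced by as.matrix(Shao) once the data are loaded.

```r
# Hypothetical helper: simulate one response vector from
# y = 2 + b[2] x2 + b[3] x3 + b[4] x4 + b[5] x5 + e, with e ~ N(0, 1).
# X is an n x 4 matrix with columns x2..x5; b holds b[2]..b[5].
simulate_shao <- function(X, b, intercept = 2) {
  X <- as.matrix(X)
  stopifnot(ncol(X) == length(b))
  drop(intercept + X %*% b + rnorm(nrow(X)))
}

# Coefficients b[2]..b[5] for the four experiments, one row per experiment.
B <- rbind(c(0, 0, 4, 0),
           c(0, 0, 4, 8),
           c(9, 0, 4, 8),
           c(9, 6, 4, 8))

# Stand-in design matrix (NOT Shao's values), used only so the sketch runs
# without the package; replace with as.matrix(Shao) after data(Shao).
set.seed(1)
X_demo <- matrix(runif(40 * 4), nrow = 40,
                 dimnames = list(NULL, c("x2", "x3", "x4", "x5")))

# One simulated response per experiment, stored as the columns of Y.
Y <- apply(B, 1, function(b) simulate_shao(X_demo, b))
dim(Y)
```

Each column of Y is a single replicate for one experiment; estimating a probability of correct selection would require repeating the simulation and the model search many times.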
The table below summarizes the probability of correct model selection
in the experiments reported by Shao (1993, Table 2).
Three model selection methods are compared: LOOCV (leave-one-out CV);
CV(d=25), the delete-d method with d=25; and APCV, a computationally
very efficient CV method that is specialized to the
case of linear regression.
Experiment   LOOCV   CV(d=25)   APCV
1            0.484   0.934      0.501
2            0.641   0.947      0.651
3            0.801   0.965      0.818
4            0.985   0.948      0.999
CV(d=25) outperforms both LOOCV and APCV by a large margin in
Experiments 1, 2 and 3, but in Experiment 4 both LOOCV (0.985) and
APCV (0.999) are slightly better than CV(d=25) (0.948).
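To make the delete-d idea concrete, here is a minimal base R sketch that estimates prediction error for each candidate subset by repeatedly holding out d = 25 of the 40 observations at random. It is an illustration under stated assumptions, not the implementation used by Shao or by the package: the function name cv_delete_d, the Monte Carlo splitting scheme, the default of 200 splits, and the simulated stand-in data are all assumptions made for this example.

```r
# Monte Carlo delete-d cross-validation for one candidate linear model.
# X: n x p predictor matrix, y: response, cols: columns in the candidate
# model, d: validation-set size, reps: number of random splits (assumed).
cv_delete_d <- function(X, y, cols, d, reps = 200) {
  n <- nrow(X)
  Xc <- cbind(1, X[, cols, drop = FALSE])      # add an intercept column
  errs <- numeric(reps)
  for (r in seq_len(reps)) {
    val <- sample.int(n, d)                    # indices held out for validation
    fit <- lm.fit(Xc[-val, , drop = FALSE], y[-val])
    pred <- Xc[val, , drop = FALSE] %*% fit$coefficients
    errs[r] <- mean((y[val] - pred)^2)         # validation mean squared error
  }
  mean(errs)
}

# Stand-in data (NOT Shao's design matrix): only x4 has a nonzero
# coefficient, as in Experiment 1.
set.seed(1)
n <- 40
X <- matrix(rnorm(n * 4), n, 4,
            dimnames = list(NULL, c("x2", "x3", "x4", "x5")))
y <- drop(2 + X %*% c(0, 0, 4, 0) + rnorm(n))

# All 15 non-empty subsets of the four predictors.
subsets <- unlist(lapply(1:4, function(k) combn(4, k, simplify = FALSE)),
                  recursive = FALSE)

# CV error for each subset; the selected model minimizes it.
cv <- vapply(subsets, function(s) cv_delete_d(X, y, s, d = 25), numeric(1))
colnames(X)[subsets[[which.min(cv)]]]
```

Setting d = 1 and looping over every observation instead of sampling would give ordinary LOOCV; the large validation set is what distinguishes the delete-d method compared in the table.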
Usage
data(Shao)
Format
A data frame with 40 observations on the following 4 inputs.
x2
a numeric vector
x3
a numeric vector
x4
a numeric vector
x5
a numeric vector
Source
Shao, Jun (1993). Linear Model Selection by Cross-Validation.
Journal of the American Statistical Association 88, 486-494.
Examples
# In this example BICq (q = 0.25) selects the correct model but BIC does not
library(bestglm)
data(Shao)
X <- as.matrix(Shao)
b <- c(0, 0, 4, 0)  # coefficients b[2]..b[5] of Experiment 1
set.seed(123321123)
# Note: matrix multiplication must be escaped in Rd file
y <- X %*% b + rnorm(40)
Xy <- data.frame(Shao, y = y)
bestglm(Xy)               # default criterion, IC = "BIC"
bestglm(Xy, IC = "BICq")