R Graphical Manual

Last data update: 2014.03.03

R: Simulated Regression Data

Shao	R Documentation

Simulated Regression Data

Description

Data for a simulation study reported by Shao (1993, Table 1). Shao (1993, Table 2) reported 4 simulation experiments, each using a different set of regression coefficients in the linear regression model:

y = 2 + b[2] x2 + b[3] x3 + b[4] x4 + b[5] x5 + e,

where e is an independent normal error with unit variance.

The regression coefficients b[2], ..., b[5] for the four experiments are shown in the table below.

Experiment	b[2]	b[3]	b[4]	b[5]
1	0	0	4	0
2	0	0	4	8
3	9	0	4	8
4	9	6	4	8
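
As an illustration, the model and the coefficient table above can be turned into simulated responses along the following lines. This is a minimal sketch, not code from Shao (1993): the helper simulate_y, the seed, and the randomly generated stand-in inputs X are assumptions made for the example. With the data set loaded, the Shao data frame would take the place of X.

```r
# Rows: experiments 1-4; columns: b[2], ..., b[5] (copied from the table above)
B <- rbind(c(0, 0, 4, 0),
           c(0, 0, 4, 8),
           c(9, 0, 4, 8),
           c(9, 6, 4, 8))
dimnames(B) <- list(paste0("Exp", 1:4), c("x2", "x3", "x4", "x5"))

# y = 2 + b[2] x2 + b[3] x3 + b[4] x4 + b[5] x5 + e, with e ~ N(0, 1)
# (hypothetical helper, named here for illustration only)
simulate_y <- function(X, b, intercept = 2) {
  drop(intercept + as.matrix(X) %*% b + rnorm(nrow(X)))
}

# Stand-in inputs (assumption): replace with the Shao data frame when loaded
set.seed(1)
X <- matrix(rnorm(40 * 4), nrow = 40, dimnames = list(NULL, colnames(B)))

# One simulated response vector per experiment, as columns of a 40 x 4 matrix
Y <- apply(B, 1, function(b) simulate_y(X, b))
dim(Y)
```

Each column of Y is one realization of the response for the corresponding experiment. A selection study would repeat this step many times and record how often each method picks the true set of nonzero coefficients.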

The table below summarizes the probability of correct model selection in the experiments reported by Shao (1993, Table 2). Three model selection methods are compared: LOOCV (leave-one-out CV); CV(d=25), the delete-d method with d=25; and APCV, a computationally very efficient CV method that is specialized to linear regression.

Experiment	LOOCV	CV(d=25)	APCV
1	0.484	0.934	0.501
2	0.641	0.947	0.651
3	0.801	0.965	0.818
4	0.985	0.948	0.999

CV(d=25) outperforms both LOOCV and APCV by a large margin in Experiments 1, 2 and 3, but in Experiment 4, where all four coefficients are nonzero, LOOCV and APCV are both slightly better.
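
To make two of the compared criteria concrete, here is a rough sketch of LOOCV and delete-d CV for a linear model. It is not the procedure used by Shao (1993): the function names loocv and delete_d_cv are hypothetical, the delete-d estimate is approximated by averaging over random validation sets, n_splits = 200 is an arbitrary choice, and the data are randomly generated stand-ins with the same shape as the data set. APCV is not shown.

```r
# Leave-one-out CV via the hat-matrix shortcut: the i-th deleted residual
# equals e_i / (1 - h_ii), so the model does not need to be refitted.
loocv <- function(fit) {
  mean((residuals(fit) / (1 - hatvalues(fit)))^2)
}

# Delete-d CV, approximated by averaging over random validation sets of size d.
# n_splits is an illustrative choice, not a value taken from Shao (1993).
delete_d_cv <- function(formula, data, d, n_splits = 200) {
  n <- nrow(data)
  response <- all.vars(formula)[1]
  errs <- replicate(n_splits, {
    hold <- sample.int(n, d)                        # validation rows
    fit  <- lm(formula, data = data[-hold, , drop = FALSE])
    pred <- predict(fit, newdata = data[hold, , drop = FALSE])
    mean((data[[response]][hold] - pred)^2)
  })
  mean(errs)
}

# Stand-in data (assumption): 40 rows of x2, ..., x5, with the response
# generated using the Experiment 1 coefficients.
set.seed(42)
dat <- as.data.frame(matrix(rnorm(40 * 4), 40, 4,
                            dimnames = list(NULL, c("x2", "x3", "x4", "x5"))))
dat$y <- 2 + 4 * dat$x4 + rnorm(40)

# Compare the true model (x4 only) with the full model under each criterion
candidates <- list(true = y ~ x4, full = y ~ x2 + x3 + x4 + x5)
loo <- sapply(candidates, function(f) loocv(lm(f, data = dat)))
cvd <- sapply(candidates, function(f) delete_d_cv(f, dat, d = 25))
print(rbind(LOOCV = loo, "CV(d=25)" = cvd))
```

Under either criterion the candidate with the smaller estimated prediction error is selected. With d = 25 each fit uses only 15 observations, so larger candidate models pay a heavier estimation penalty than they do under LOOCV.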

Usage

data(Shao)

Format

A data frame with 40 observations on the following 4 inputs.

x2: a numeric vector
x3: a numeric vector
x4: a numeric vector
x5: a numeric vector

Source

Shao, Jun (1993). Linear Model Selection by Cross-Validation. Journal of the American Statistical Association 88, 486-494.

Examples
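
A minimal usage sketch, assuming the data set is provided by the bestglm package (this page does not name the package, so treat that as an assumption). It loads the inputs, simulates a response with the Experiment 1 coefficients, and fits the full model. The seed is arbitrary.

```r
# Coefficients b[2], ..., b[5] for Experiment 1 (first row of the table above)
b <- c(x2 = 0, x3 = 0, x4 = 4, x5 = 0)

# Assumption: Shao ships with the bestglm package; skip quietly if it is absent
if (requireNamespace("bestglm", quietly = TRUE)) {
  data(Shao, package = "bestglm")
  str(Shao)                                  # inspect the 40 x 4 input table

  set.seed(123)                              # arbitrary, for reproducibility
  X <- as.matrix(Shao[, names(b)])
  y <- drop(2 + X %*% b + rnorm(nrow(X)))    # unit-variance normal error

  # Full model; the estimates for x2, x3 and x5 should be close to zero
  fit <- lm(y ~ ., data = data.frame(Shao, y = y))
  print(round(coef(fit), 2))
}
```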