Last data update: 2014.03.03

simulation_SIDES

R Documentation

Simulations of SIDES method

Description

simulation_SIDES runs repeated simulations of the SIDES algorithm on a data set with a binary or continuous outcome.

Usage

simulation_SIDES(all_set, type_var, type_outcome, level_control, D=0, L=3, S, 
M=5, num_crit=1, gamma=NA, alpha, nsim=500, ord.bin=10, nrep=100, seed=42,
H=2, pct_rand=0.5, prop_gpe, alloc_high_prob=TRUE, step=0.5, nb_sub_cross=5, 
nsim_cv=500, M_per_covar=FALSE, upper_best=TRUE, nb_cores=NA, ideal=NA)

Arguments

`all_set`	Matrix or data frame representing the global data set. The first column must be the outcome, the second column must be the treatment variable, and other columns are for covariates.
`type_var`	A vector with one element per covariate, giving the type of each covariate. Each element must be either "continuous", "ordinal" or "nominal".
`type_outcome`	Type of outcome. Currently "continuous" and "binary" are implemented.
`level_control`	Value representing the control in the data set.
`D`	Minimum desired difference to be demonstrated between the treatment and the control. The default value is set at 0.
`L`	Maximum number of covariates used to define a subgroup (= depth of the tree). The default value is set at 3.
`S`	Minimum desired subgroup size. Subgroups that do not meet this requirement are excluded.
`M`	Maximum number of best promising subgroups selected at each step of the algorithm. The default value is set at 5.
`num_crit`	Integer representing the splitting criterion used. Value equal to 1 stands for criterion maximizing the differential effect between the two child subgroups, while value equal to 2 stands for criterion maximizing the treatment effect in at least one of the two child subgroups. The default value is set at 1.
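`gamma`	Vector of length `L` representing the relative improvement parameter. Each element must be between 0 and 1. Smaller values indicate a more selective procedure. If any improvement, however small, should be accepted, it is recommended to set all elements to 1. When `gamma` is not specified (NA, the default), it is determined by cross-validation (see `step`).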
`alpha`	Overall type I error rate.
`nsim`	Number of permutations for the resampling-based method used to protect the overall Type I error rate in a weak sense.
`ord.bin`	Number of classes continuous covariates will be discretized into.
`nrep`	Number of simulation replicates.
`seed`	Seed. The default value is set at 42.
`H`	Number of data sets the global data set is split into. There will be 1 training data set and H-1 validation sets. The default value is set at 2.
`pct_rand`	Proportion of the global data set that is randomly allocated between training and validation sets. The default value is set at 0.5.
`prop_gpe`	Vector of size `H` containing the proportion of patients in each data set (training and validation).
`alloc_high_prob`	Boolean. If TRUE, each patient is allocated to the set that minimizes the imbalance score; if FALSE, patients are randomized into the sets with probabilities inversely proportional to their imbalance scores. The default value is set at TRUE.
`step`	When `gamma` is not specified, step size used to cut the interval [0,1] into candidate values when determining `gamma` by cross-validation. Warning: this process is highly time-consuming and often produces ties, so it is recommended to provide `gamma` directly after considering what is desired. The default value is set at 0.5.
`nb_sub_cross`	Number of folds for cross-validation to determine `gamma`. The default value is set at 5.
`nsim_cv`	Number of permutations for the resampling-based method used to protect the overall Type I error rate in the cross-validation part to determine `gamma`. The default value is set at 500.
`M_per_covar`	Boolean indicating whether the `M` most promising child subgroups are selected per covariate (TRUE) or across all remaining covariates (FALSE). The default value is set at FALSE.
`upper_best`	Boolean indicating if greater values of the outcome mean better responses.
`nb_cores`	Number of cores to use, as the algorithm is parallelized. By default (NA), all available cores are used.
`ideal`	When a simulation study is set up and data are generated, the ideal subgroup can be provided to obtain additional results. See examples for more details.
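
The constraints listed above can be checked before launching a long simulation. The following is a minimal sketch in base R. `check_sides_inputs` is a hypothetical helper that is not part of the SIDES package, and the requirement that `prop_gpe` sums to 1 is an assumption suggested by the example values `c(0.7,0.3)`, not something the documentation states.

```r
# Hypothetical helper (not part of the SIDES package): checks the documented
# constraints on a few arguments and returns a character vector of problems.
check_sides_inputs <- function(all_set, type_var, L, gamma, H, prop_gpe) {
  problems <- character(0)
  n_covar <- ncol(all_set) - 2  # column 1 = outcome, column 2 = treatment
  if (n_covar < 1) {
    problems <- c(problems, "all_set needs at least one covariate column")
  }
  if (length(type_var) != n_covar) {
    problems <- c(problems, "type_var must have one entry per covariate")
  }
  if (!all(type_var %in% c("continuous", "ordinal", "nominal"))) {
    problems <- c(problems, "type_var entries must be continuous, ordinal or nominal")
  }
  # gamma may be left as NA, in which case it is chosen by cross-validation
  if (!all(is.na(gamma))) {
    if (length(gamma) != L) {
      problems <- c(problems, "gamma must have length L")
    }
    if (any(gamma < 0 | gamma > 1, na.rm = TRUE)) {
      problems <- c(problems, "gamma must lie in [0, 1]")
    }
  }
  if (length(prop_gpe) != H) {
    problems <- c(problems, "prop_gpe must have length H")
  }
  # Assumption: the proportions are expected to sum to 1
  if (abs(sum(prop_gpe) - 1) > 1e-8) {
    problems <- c(problems, "prop_gpe should sum to 1")
  }
  problems
}

# Toy data set: outcome, treatment, then two covariates
toy <- data.frame(y = c(1.2, 0.4, 2.1, 0.3), trt = c(1, 0, 1, 0),
                  x1 = c(0, 1, 1, 0), x2 = c(3, 1, 2, 2))

ok <- check_sides_inputs(toy, type_var = c("nominal", "ordinal"),
                         L = 1, gamma = 1, H = 2, prop_gpe = c(0.7, 0.3))
bad <- check_sides_inputs(toy, type_var = "ordinal",
                          L = 3, gamma = c(1, 1), H = 2, prop_gpe = c(0.7, 0.3))
```

Here `ok` is empty, while `bad` lists the mismatched `type_var` length and the `gamma` vector whose length differs from `L`.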

Value

An object of class "simulation_SIDES" is returned, consisting of:

`pct_no_subgroup`	Percentage of simulations where no subgroup is identified and validated.
`mean_size`	Mean subgroup size across all simulations that return at least one subgroup.
`subgroups`	List of subgroups that are validated as responders.
`pct_selection`	Vector containing the percentage of selection and validation of each subgroup in `subgroups`.
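
To illustrate how these summaries relate to one another, the sketch below computes analogous quantities from made-up replicate results. The input structure (a list with one character vector of confirmed subgroup labels per replicate) and the labels themselves are assumptions for illustration only; the package's internal representation may differ.

```r
# Made-up results for 4 replicates: each element holds the labels of the
# subgroups confirmed in that replicate (character(0) = none confirmed).
confirmed <- list(
  c("x1 > 10"),
  character(0),
  c("x1 > 10", "x6 = 0"),
  character(0)
)

# Percentage of replicates in which no subgroup was confirmed
pct_no_subgroup <- 100 * mean(lengths(confirmed) == 0)

# Distinct subgroups, and the percentage of replicates confirming each one
subgroups <- unique(unlist(confirmed))
pct_selection <- sapply(subgroups, function(s) {
  100 * mean(sapply(confirmed, function(r) s %in% r))
})
```

With these inputs, half of the replicates confirm nothing, "x1 > 10" is confirmed in 50% of replicates and "x6 = 0" in 25%.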

Author(s)

Marie-Karelle Riviere-Jourdan <eldamjh@gmail.com>

References

Ilya Lipkovich, Alex Dmitrienko, Jonathan Denne and Gregory Enas. Subgroup identification based on differential effect search - A recursive partitioning method for establishing response to treatment in patient subpopulations. Statistics in Medicine, 2011. <doi:10.1002/sim.4289>

Examples

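library(SIDES)

# Simulate 500 patients: 5 continuous covariates (x1-x5) and 5 binary ones (x6-x10)
n=500
x=data.frame(matrix(rnorm(n*5,10,5),n,5),matrix(rbinom(n*5,1,0.5),n,5))
colnames(x)=paste("x",c(1:10),sep='')
rownames(x)=1:n
# Randomized treatment indicator (0 = control, 1 = treatment)
trt=rbinom(n,1,0.5)
# Indicators that drive the treatment effect, with their counts
I1=(x$x1>10);n1=sum(I1)
I6=(x$x6==0);n6=sum(I6)
I7=(x$x7==0);n7=sum(I7)
# Outcome: treatment effect depends on I1, I6 and I7, plus standard normal noise
y=trt*(I1*(n-n1)-(1-I1)*n1+I6*(n-n6)-(1-I6)*n6+I7*(n-n7)-(1-I7)*n7)/n+rnorm(n)
# First column = outcome, second = treatment, remaining columns = covariates
data=cbind(y,trt,x)
head(data)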

# DUMMY EXAMPLE TO RUN
s1 = simulation_SIDES(all_set=data[,c(1,2,8,9,10)], type_var=rep("ordinal",3), 
type_outcome="continuous", level_control=0, D=0, L=1, S=50, M=1, num_crit=1, 
gamma=c(1), alpha=0.05, nsim=1, ord.bin=10, nrep=1, seed=42,
H=2, pct_rand=1.0, prop_gpe=c(0.7,0.3), upper_best=TRUE, nb_cores=1)

# REAL EXAMPLE TO UNCOMMENT
#s1 = simulation_SIDES(all_set=data, type_var=rep("ordinal",10), 
#type_outcome="continuous", level_control=0, D=0, L=3, S=30, M=5, num_crit=1, 
#gamma=c(1,1,1), alpha=0.10, nsim=1000, ord.bin=10, nrep=1000, seed=42,
#H=2, pct_rand=0.5, prop_gpe=c(0.7,0.3), upper_best=TRUE)
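
In the data-generating model used in these examples, each indicator's term simplifies algebraically: (I*(n-nk) - (1-I)*nk)/n = I - nk/n. The treatment effect is therefore a sum of three centered indicators, so it averages to zero over the whole sample and is largest for patients with x1 > 10, x6 == 0 and x7 == 0. The sketch below checks this numerically in base R; the seed and the names `effect_original`, `effect_centered` and `best` are introduced here for illustration only.

```r
set.seed(1)  # arbitrary seed, used only to make this sketch reproducible
n <- 500
x1 <- rnorm(n, 10, 5)
x6 <- rbinom(n, 1, 0.5)
x7 <- rbinom(n, 1, 0.5)

I1 <- x1 > 10;  n1 <- sum(I1)
I6 <- x6 == 0;  n6 <- sum(I6)
I7 <- x7 == 0;  n7 <- sum(I7)

# Treatment effect per patient, written as in the example
effect_original <- (I1 * (n - n1) - (1 - I1) * n1 +
                    I6 * (n - n6) - (1 - I6) * n6 +
                    I7 * (n - n7) - (1 - I7) * n7) / n

# Same effect, written as a sum of centered indicators
effect_centered <- (I1 - n1 / n) + (I6 - n6 / n) + (I7 - n7 / n)

# The two forms agree up to floating-point error
stopifnot(isTRUE(all.equal(effect_original, effect_centered)))

# Patients satisfying all three conditions share the largest effect
best <- I1 & I6 & I7
max_effect <- 3 - (n1 + n6 + n7) / n
```

Because the effect is centered, a test on the overall population has little to detect, which is what makes this model a natural test case for subgroup identification.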

Results


R version 3.3.1 (2016-06-21) -- "Bug in Your Hair"
Platform: x86_64-pc-linux-gnu (64-bit)

> library(SIDES)
Loading required package: memoise
Loading required package: doParallel
Loading required package: foreach
Loading required package: iterators
Loading required package: parallel
Loading required package: nnet
Loading required package: multicool
Loading required package: Rcpp

Attaching package: 'multicool'

The following object is masked from 'package:nnet':

    multinom

> ### Name: simulation_SIDES
> ### Title: Simulations of SIDES method
> ### Aliases: simulation_SIDES print.simulation_SIDES
> 
> ### ** Examples
> 
> n=500
> x=data.frame(matrix(rnorm(n*5,10,5),n,5),matrix(rbinom(n*5,1,0.5),n,5))
> colnames(x)=paste("x",c(1:10),sep='')
> rownames(x)=1:n
> trt=rbinom(n,1,0.5)
> I1=(x$x1>10);n1=sum(I1)
> I6=(x$x6==0);n6=sum(I6)
> I7=(x$x7==0);n7=sum(I7)
> y=trt*(I1*(n-n1)-(1-I1)*n1+I6*(n-n6)-(1-I6)*n6+I7*(n-n7)-(1-I7)*n7)/n+rnorm(n)
> data=cbind(y,trt,x)
> head(data)
            y trt        x1        x2        x3        x4        x5 x6 x7 x8 x9
1 -0.55717739   1  7.411972  5.482782 12.035016 14.956174 10.759390  1  1  1  0
2  0.50111434   0 14.897535 13.919443  7.639155 11.149338 12.333725  0  1  0  0
3  0.33914221   0 13.510165  1.752890 10.165947  8.752516 10.234238  1  1  1  1
4 -2.67689033   1  7.521697  1.579217 17.854382  2.502422 11.769846  1  0  1  0
5 -0.72256417   1  0.972176  6.737257 11.641268  2.100787 10.267603  0  0  1  1
6  0.05852421   1 18.268504  5.460209  5.995356  3.676533  9.667452  0  1  1  0
  x10
1   0
2   0
3   0
4   0
5   1
6   0
> 
> # DUMMY EXAMPLE TO RUN
> s1 = simulation_SIDES(all_set=data[,c(1,2,8,9,10)], type_var=rep("ordinal",3), 
+ type_outcome="continuous", level_control=0, D=0, L=1, S=50, M=1, num_crit=1, 
+ gamma=c(1), alpha=0.05, nsim=1, ord.bin=10, nrep=1, seed=42,
+ H=2, pct_rand=1.0, prop_gpe=c(0.7,0.3), upper_best=TRUE, nb_cores=1)
starting worker pid=6226 on localhost:11160 at 01:31:10.671
Loading required package: SIDES
Loading required package: memoise
Loading required package: doParallel
Loading required package: foreach
Loading required package: iterators
Loading required package: parallel
Loading required package: nnet
Loading required package: multicool
Loading required package: Rcpp

Attaching package: 'multicool'

The following object is masked from 'package:nnet':

    multinom

loaded SIDES and set parent environment
[1] 1
[1] "No subgroup confirmed"
No candidate subgroups identified before confirmation phase:
No candidate subgroups confirmed:
No subgroup selected in  100 % 
Average size of the confirmed subgroups in the training data set in  NaN 
> 
> # REAL EXAMPLE TO UNCOMMENT
> #s1 = simulation_SIDES(all_set=data, type_var=rep("ordinal",10), 
> #type_outcome="continuous", level_control=0, D=0, L=3, S=30, M=5, num_crit=1, 
> #gamma=c(1,1,1), alpha=0.10, nsim=1000, ord.bin=10, nrep=1000, seed=42,
> #H=2, pct_rand=0.5, prop_gpe=c(0.7,0.3), upper_best=TRUE)
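
The `NaN` reported for the average subgroup size in the transcript above is the value R returns for the mean of an empty vector. A plausible reading, stated here as an assumption rather than taken from the package source, is that no replicate confirmed a subgroup, so there were no sizes to average. The base R sketch below reproduces that behavior and shows one way a caller might guard against it.

```r
# Sizes of confirmed subgroups; empty because nothing was confirmed
sizes <- numeric(0)

# The mean of a zero-length numeric vector is NaN
avg_size <- mean(sizes)
is.nan(avg_size)

# Caller-side guard: report NA instead when there is nothing to average
avg_size_or_na <- if (length(sizes) > 0) mean(sizes) else NA_real_
```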