Last data update: 2014.03.03

SIDES    R Documentation

SIDES algorithm

Description

SIDES_method applies the Subgroup Identification based on Differential Effect Search (SIDES) algorithm to a data set with a binary or continuous outcome.

Usage

SIDES_method(all_set, type_var, type_outcome, level_control, D=0, L=3, S, M=5, 
gamma=NA, H=3, pct_rand=0.5, prop_gpe, alloc_high_prob=TRUE, num_crit, step=0.5, 
nb_sub_cross=5, alpha, nsim=500, nsim_cv=500, ord.bin=10, M_per_covar=FALSE, 
upper_best=TRUE, selec=FALSE, seed=42)

Arguments

all_set

Matrix or data frame representing the global data set. The first column must be the outcome, the second column must be the treatment variable, and other columns are for covariates.

type_var

A vector with one entry per covariate giving its type. Each entry must be "continuous", "ordinal" or "nominal".
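The layout rules for all_set and type_var can be checked before launching a long run. This is a minimal sketch that assumes only the rules stated above (outcome in column 1, treatment in column 2, covariates after, one type_var entry per covariate). The helper check_sides_input is hypothetical and is not part of the SIDES package; the two-arm check is an assumption based on the treatment/control wording of this page.

```r
# Hypothetical helper (not part of the SIDES package): checks that a data set
# follows the documented all_set layout and that type_var matches it.
check_sides_input <- function(all_set, type_var, level_control) {
  allowed <- c("continuous", "ordinal", "nominal")
  n_covar <- ncol(all_set) - 2            # column 1 = outcome, column 2 = treatment
  if (n_covar < 1) stop("all_set needs at least one covariate column")
  if (length(type_var) != n_covar)
    stop("type_var must have one entry per covariate (", n_covar, " expected)")
  if (!all(type_var %in% allowed))
    stop("type_var entries must be 'continuous', 'ordinal' or 'nominal'")
  trt <- all_set[, 2]
  if (!(level_control %in% trt))
    stop("level_control does not occur in the treatment column")
  # Assumption: exactly two arms (treatment and control)
  if (length(unique(trt)) != 2)
    stop("the treatment column should contain exactly two arms")
  invisible(TRUE)
}

# Example: outcome y, treatment trt (0 = control), two covariates
set.seed(1)
d <- data.frame(y = rnorm(20), trt = rep(0:1, 10),
                age = rnorm(20, 50, 10), sex = rbinom(20, 1, 0.5))
check_sides_input(d, type_var = c("continuous", "nominal"), level_control = 0)
```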

type_outcome

Type of outcome. Currently "continuous" and "binary" are implemented.

level_control

Value of the treatment variable (second column of all_set) that identifies the control group.

D

Minimum desired difference to be demonstrated between the treatment and the control.
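To illustrate how D, level_control and upper_best interact, here is a sketch of a one-sided test of "treatment effect greater than D" within a group of patients with a continuous outcome. The large-sample z-test and the function name effect_z are assumptions for illustration; the package may compute its test statistics differently.

```r
# Sketch (assumed test, not necessarily the package's): one-sided large-sample
# z-test of H0: treatment effect <= D for a continuous outcome.
effect_z <- function(y, trt, level_control, D = 0, upper_best = TRUE) {
  ctrl <- trt == level_control
  diff <- mean(y[!ctrl]) - mean(y[ctrl])       # treatment minus control
  if (!upper_best) diff <- -diff               # smaller outcome = better response
  se <- sqrt(var(y[!ctrl]) / sum(!ctrl) + var(y[ctrl]) / sum(ctrl))
  z <- (diff - D) / se
  c(z = z, p = pnorm(z, lower.tail = FALSE))   # one-sided p-value
}

set.seed(2)
trt <- rep(0:1, each = 100)
y <- rnorm(200, mean = 0.5 * trt)              # true effect 0.5
effect_z(y, trt, level_control = 0, D = 0)
```

With D = 0, switching upper_best simply flips the sign of z, so the one-sided p-values of the two directions sum to 1.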

L

Maximum number of covariates used to define a subgroup (i.e., the depth of the tree). The default value is set at 3.

S

Minimum desired subgroup size. Subgroups that do not meet this requirement are excluded.

M

Maximum number of best promising subgroups selected at each step of the algorithm. The default value is set at 5.

gamma

Vector of length L giving the relative improvement parameter. Each element must be between 0 and 1; smaller values make the procedure more selective. If any improvement over the parent group is acceptable, it is recommended to set all elements to 1.
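The relative improvement rule can be sketched as follows, based on our reading of Lipkovich et al. (2011): a child subgroup is retained only if its treatment-effect p-value is at most gamma times the p-value of its parent group. Using gamma[l] at depth l is an assumption about how the length-L vector is applied, and is_promising is a hypothetical name.

```r
# Sketch of the relative improvement rule: keep a child subgroup only if
# p_child <= gamma * p_parent. Smaller gamma = stricter selection.
is_promising <- function(p_child, p_parent, gamma) {
  p_child <= gamma * p_parent
}

p_parent <- 0.04
is_promising(c(0.01, 0.03, 0.05), p_parent, gamma = 1)    # TRUE  TRUE FALSE
is_promising(c(0.01, 0.03, 0.05), p_parent, gamma = 0.5)  # TRUE FALSE FALSE

# Assumption: one gamma per depth level, here L = 3
gamma <- c(1, 0.8, 0.5)
is_promising(0.03, p_parent, gamma[2])                    # 0.03 <= 0.032: TRUE
```

With gamma = 1 any child that improves on its parent qualifies, which matches the recommendation above.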

H

Number of data sets into which the global data set is split: 1 training data set and H-1 validation sets. The default value is set at 3 (as given in Usage).

pct_rand

Proportion of the global data set that is randomly allocated between training and validation sets. The default value is set at 0.5.

prop_gpe

Vector of length H giving the proportion of patients allocated to each data set (training and validation).

alloc_high_prob

Boolean. If TRUE, patients are allocated to the set that minimizes the imbalance score; if FALSE, patients are randomized to the sets with probabilities inversely proportional to their imbalance scores.
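The interplay of pct_rand, prop_gpe and alloc_high_prob can be pictured with the following hypothetical sketch: a random share pct_rand of patients is randomized according to prop_gpe, and the rest are allocated one at a time using an imbalance score. The score used here (deviation of the set size from its target proportion plus the absolute deviation of standardized covariate means) is an assumption, not the package's definition, and allocate_sets is a hypothetical name.

```r
# Hypothetical sketch of the allocation into H sets. The imbalance score
# below is an illustrative assumption, not the package's definition.
allocate_sets <- function(X, prop_gpe, pct_rand = 0.5, alloc_high_prob = TRUE) {
  X <- scale(as.matrix(X))                     # standardized covariates (means 0)
  n <- nrow(X)
  H <- length(prop_gpe)
  grp <- rep(NA_integer_, n)
  ord <- sample(n)                             # random processing order
  n_rand <- round(pct_rand * n)
  # Step 1: randomize the first pct_rand share of patients according to prop_gpe
  grp[ord[seq_len(n_rand)]] <- sample(H, n_rand, replace = TRUE, prob = prop_gpe)
  # Step 2: allocate the remaining patients one at a time
  for (i in ord[n_rand + seq_len(n - n_rand)]) {
    n_alloc <- sum(!is.na(grp)) + 1            # allocated patients, including i
    score <- sapply(seq_len(H), function(k) {
      members <- c(which(grp == k), i)         # set k if patient i joined it
      size_dev <- abs(length(members) / n_alloc - prop_gpe[k])
      cov_dev <- sum(abs(colMeans(X[members, , drop = FALSE])))
      size_dev + cov_dev
    })
    grp[i] <- if (alloc_high_prob) {
      which.min(score)                         # deterministic: least imbalance
    } else {
      sample(H, 1, prob = 1 / (score + 1e-8))  # weights inversely proportional to score
    }
  }
  grp
}

set.seed(3)
X <- data.frame(age = rnorm(200, 50, 10), sex = rbinom(200, 1, 0.5))
g <- allocate_sets(X, prop_gpe = c(0.7, 0.3), pct_rand = 0.5)
table(g)
```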

num_crit

Integer representing the splitting criterion used. Value equal to 1 stands for criterion maximizing the differential effect between the two child subgroups, while value equal to 2 stands for criterion maximizing the treatment effect in at least one of the two child subgroups. The default value is set at 1.
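The two splitting criteria can be sketched from the treatment-effect z statistics Z1 and Z2 of the two child subgroups, following our reading of Lipkovich et al. (2011). The exact scaling used in the package may differ; since both criteria are monotone in their arguments, what matters is the ranking of candidate splits (smaller is better). split_criterion is a hypothetical name.

```r
# Sketch of the two splitting criteria (smaller value = better split).
# z1, z2: treatment-effect z statistics in the two child subgroups,
# positive when the treatment is better.
split_criterion <- function(z1, z2, num_crit = 1) {
  if (num_crit == 1) {
    # differential effect: two-sided test of Z1 - Z2 (independent children)
    2 * pnorm(abs(z1 - z2) / sqrt(2), lower.tail = FALSE)
  } else if (num_crit == 2) {
    # maximum effect in at least one of the two children
    pnorm(pmax(z1, z2), lower.tail = FALSE)
  } else {
    stop("num_crit must be 1 or 2")
  }
}

split_criterion(2.5, -0.5, num_crit = 1)  # strong differential effect
split_criterion(2.5,  2.0, num_crit = 1)  # both children benefit similarly
split_criterion(2.5,  2.0, num_crit = 2)  # strong effect in at least one child
```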

step

When gamma is not specified, the step size used to grid the interval [0,1] into candidate gamma values, which are then compared by cross-validation. Warning: this process is very time-consuming and often produces ties, so it is recommended to specify gamma directly according to the desired selectivity. The default value is set at 0.5.

nb_sub_cross

Number of folds for cross-validation to determine gamma. The default value is set at 5.
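A hypothetical sketch of the search space behind step and nb_sub_cross: candidate gamma values on a grid of width step over (0, 1], and a random assignment of patients to nb_sub_cross folds. Excluding 0 from the grid and crossing the grid over the L depth levels are assumptions; the crossing shows why the cost grows quickly as step shrinks.

```r
# Hypothetical sketch: gamma grid over (0, 1] and cross-validation folds.
gamma_grid <- function(step) seq(step, 1, by = step)
make_folds <- function(n, nb_sub_cross = 5) sample(rep_len(seq_len(nb_sub_cross), n))

gamma_grid(0.5)    # 0.5 1.0
gamma_grid(0.25)   # 0.25 0.50 0.75 1.00

# Assumption: one gamma per depth level, so candidates are all combinations
cand <- expand.grid(rep(list(gamma_grid(0.5)), 3))   # L = 3: 2^3 = 8 candidates
nrow(cand)

set.seed(4)
folds <- make_folds(23, nb_sub_cross = 5)
table(folds)       # fold sizes differ by at most one
```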

alpha

Overall type I error rate.

nsim

Number of permutations used by the resampling-based method that protects the overall Type I error rate in the weak sense. The default value is set at 500.
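The idea of resampling-based weak control of the Type I error rate can be sketched as follows: the whole subgroup search is rerun on data with permuted treatment labels (mimicking the global null hypothesis of no effect anywhere), and the observed best p-value is compared with the permutation distribution of the best p-value. The function search_min_p is a deliberately simple, hypothetical stand-in for the SIDES search (one-sided Welch t-tests within each level of each binary covariate); the real search is the recursive partitioning described above.

```r
# Hypothetical stand-in for the subgroup search: smallest one-sided p-value
# over the subgroups defined by each level of each covariate.
search_min_p <- function(y, trt, X) {
  p <- numeric(0)
  for (j in seq_along(X)) for (lev in unique(X[[j]])) {
    s <- X[[j]] == lev
    tt <- t.test(y[s & trt == 1], y[s & trt == 0], alternative = "greater")
    p <- c(p, tt$p.value)
  }
  min(p)
}

# Permutation-adjusted p-value of the best subgroup (weak FWER control)
adjusted_p <- function(y, trt, X, nsim = 500) {
  p_obs <- search_min_p(y, trt, X)
  p_null <- replicate(nsim, search_min_p(y, sample(trt), X))
  (1 + sum(p_null <= p_obs)) / (nsim + 1)   # +1 keeps the estimate above 0
}

set.seed(5)
n <- 200
X <- data.frame(x1 = rbinom(n, 1, 0.5), x2 = rbinom(n, 1, 0.5))
trt <- rbinom(n, 1, 0.5)
y <- rnorm(n) + 0.8 * trt * X$x1            # effect only when x1 == 1
adjusted_p(y, trt, X, nsim = 200)
```

A subgroup would then be declared significant when its adjusted p-value does not exceed alpha.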

nsim_cv

Number of permutations for the resampling-based method used to protect the overall Type I error rate in the cross-validation step that determines gamma. The default value is set at 500.

ord.bin

Number of classes into which continuous covariates are discretized. The default value is set at 10.
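A hypothetical sketch of discretizing a continuous covariate into ord.bin ordered classes. Equal-frequency bins based on empirical quantiles are an assumption; the package may use a different rule. Duplicate break points (for example with many tied values) are dropped, which yields fewer classes.

```r
# Hypothetical sketch: equal-frequency discretization into ord.bin classes.
discretize <- function(x, ord.bin = 10) {
  br <- unique(quantile(x, probs = seq(0, 1, length.out = ord.bin + 1), na.rm = TRUE))
  cut(x, breaks = br, include.lowest = TRUE, labels = FALSE)  # integer class codes
}

set.seed(6)
x <- rnorm(1000)
table(discretize(x, ord.bin = 4))   # four classes of about 250 patients each
```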

M_per_covar

Boolean indicating whether the M best promising child subgroups are selected within each covariate (TRUE) or across all remaining covariates (FALSE). The default value is set at FALSE.
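The difference between the two settings of M_per_covar can be sketched on a small table of candidate child subgroups ranked by their splitting-criterion value (smaller is better). The data frame layout and the function select_best are hypothetical.

```r
# Hypothetical sketch: keep the M best candidates overall or per covariate.
select_best <- function(cands, M = 5, M_per_covar = FALSE) {
  cands <- cands[order(cands$crit), ]
  if (M_per_covar) {
    # rank within each covariate (rows already sorted by crit)
    keep <- ave(cands$crit, cands$covariate, FUN = seq_along) <= M
    cands[keep, ]
  } else {
    head(cands, M)
  }
}

cands <- data.frame(covariate = c("x1", "x1", "x1", "x2", "x2", "x3"),
                    split = c("<=5", "<=10", "<=15", "=0", "=1", "=0"),
                    crit = c(0.01, 0.02, 0.03, 0.04, 0.05, 0.06),
                    stringsAsFactors = FALSE)
select_best(cands, M = 2)                      # two best overall, both on x1
select_best(cands, M = 2, M_per_covar = TRUE)  # up to two per covariate
```

Selecting per covariate keeps the search from being dominated by one strongly predictive covariate, at the cost of exploring more branches.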

upper_best

Boolean indicating whether larger values of the outcome correspond to better responses. The default value is set at TRUE.

selec

Boolean indicating whether, in addition to the validated subgroups, the output should also contain the subgroups selected before validation. The default value is set at FALSE.

seed

Seed for the random number generator. The default value is set at 42.

Value

An object of class "SIDES" is returned, consisting of:

candidates

A list containing the candidate subgroups selected before the validation step and their associated p-values.

confirmed

A list containing confirmed/validated subgroups and their associated p-values.

Author(s)

Marie-Karelle Riviere-Jourdan eldamjh@gmail.com

References

Ilya Lipkovich, Alex Dmitrienko, Jonathan Denne and Gregory Enas. Subgroup identification based on differential effect search - A recursive partitioning method for establishing response to treatment in patient subpopulations. Statistics in Medicine, 2011. <doi:10.1002/sim.4289>

Examples

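# Simulated data: 500 patients, 5 continuous covariates (x1-x5, N(10, 5^2))
# and 5 binary covariates (x6-x10)
n=500
x=data.frame(matrix(rnorm(n*5,10,5),n,5),matrix(rbinom(n*5,1,0.5),n,5))
colnames(x)=paste("x",c(1:10),sep='')
rownames(x)=1:n
# Randomized treatment: 1 = treatment, 0 = control
trt=rbinom(n,1,0.5)
# The treatment effect depends on x1 > 10, x6 == 0 and x7 == 0
I1=(x$x1>10);n1=sum(I1)
I6=(x$x6==0);n6=sum(I6)
I7=(x$x7==0);n7=sum(I7)
y=trt*(I1*(n-n1)-(1-I1)*n1+I6*(n-n6)-(1-I6)*n6+I7*(n-n7)-(1-I7)*n7)/n+rnorm(n)
# Outcome first, treatment second, covariates after (layout required by all_set)
data=cbind(y,trt,x)
head(data)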

# Full SIDES analysis (time-consuming); uncomment to run
#s1 = SIDES_method(all_set=data, type_var=rep("ordinal",10), 
#type_outcome="continuous", level_control=0, D=0, L=3, S=30, M=5, 
#gamma=c(1,1,1), H=2, pct_rand=0.5, prop_gpe=c(0.7,0.3), num_crit=1, 
#alpha=0.10, nsim=1000, ord.bin=10, upper_best=TRUE, seed=42)
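The simulated treatment effect in this example can be checked without running SIDES. Since I1*(n-n1)-(1-I1)*n1 simplifies to n*I1 - n1, the effect term equals (x1 > 10) + (x6 == 0) + (x7 == 0), each centred by its observed proportion. This sketch verifies that identity and fits an ordinary linear model with treatment interactions (lm is our choice for the check, not part of SIDES); the interaction estimates should be close to 1.

```r
# Sketch: verify the example's data-generating model (no SIDES call needed).
set.seed(42)
n <- 500
x <- data.frame(matrix(rnorm(n*5, 10, 5), n, 5), matrix(rbinom(n*5, 1, 0.5), n, 5))
colnames(x) <- paste0("x", 1:10)
trt <- rbinom(n, 1, 0.5)
I1 <- x$x1 > 10; n1 <- sum(I1)
I6 <- x$x6 == 0; n6 <- sum(I6)
I7 <- x$x7 == 0; n7 <- sum(I7)
# Effect term as written in the example ...
effect_orig <- (I1*(n-n1) - (1-I1)*n1 + I6*(n-n6) - (1-I6)*n6 + I7*(n-n7) - (1-I7)*n7) / n
# ... equals the sum of centred indicators
effect_centred <- (I1 - mean(I1)) + (I6 - mean(I6)) + (I7 - mean(I7))
all.equal(effect_orig, effect_centred)
y <- trt * effect_orig + rnorm(n)
fit <- lm(y ~ trt * (I1 + I6 + I7))
round(coef(fit)[c("trt:I1TRUE", "trt:I6TRUE", "trt:I7TRUE")], 2)
```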

Results



> library(SIDES)
Loading required package: memoise
Loading required package: doParallel
Loading required package: foreach
Loading required package: iterators
Loading required package: parallel
Loading required package: nnet
Loading required package: multicool
Loading required package: Rcpp

Attaching package: 'multicool'

The following object is masked from 'package:nnet':

    multinom

> ### Name: SIDES
> ### Title: SIDES algorithm
> ### Aliases: SIDES_method print.SIDES_method
> 
> ### ** Examples
> 
> n=500
> x=data.frame(matrix(rnorm(n*10,10,5),n,5),matrix(rbinom(n*10,1,0.5),n,5))
> colnames(x)=paste("x",c(1:10),sep='')
> rownames(x)=1:n
> trt=rbinom(n,1,0.5)
> I1=(x$x1>10);n1=sum(I1)
> I6=(x$x6==0);n6=sum(I6)
> I7=(x$x7==0);n7=sum(I7)
> y=trt*(I1*(n-n1)-(1-I1)*n1+I6*(n-n6)-(1-I6)*n6+I7*(n-n7)-(1-I7)*n7)/n+rnorm(n)
> data=cbind(y,trt,x)
> head(data)
          y trt        x1        x2        x3        x4        x5 x6 x7 x8 x9 x10
1 1.1750646   0  2.008154  8.161327  2.794425 12.428073 10.106438  0  0  1  0   0
2 1.0360807   0 12.005221 14.765688  9.851336  3.977213  9.353721  1  0  1  1   0
3 0.1465496   1 16.350346 15.127182 16.687378  6.941301  5.320571  1  1  0  1   1
4 0.8721832   0 15.069613 10.562111 12.848693  2.506596 17.436265  0  1  0  1   0
5 0.1956123   1  3.204280 15.111875  6.285995  6.198599 17.246105  0  0  1  1   0
6 0.9017949   0  9.896241  4.163157  8.906769  6.785683 11.241832  0  0  0  0   1
> 
> # REAL EXAMPLE TO UNCOMMENT
> #s1 = SIDES_method(all_set=data, type_var=rep("ordinal",10), 
> #type_outcome="continuous", level_control=0, D=0, L=3, S=30, M=5, 
> #gamma=c(1,1,1), H=2, pct_rand=0.5, prop_gpe=c(0.7,0.3), num_crit=1, 
> #alpha=0.10, nsim=1000, ord.bin=10, upper_best=TRUE, seed=42)