Matrix or data frame representing the global data set. The first column must be the outcome, the second column must be the treatment variable, and other columns are for covariates.
type_var
A vector whose length equals the number of covariates, giving the type of each one. Each element must be "continuous", "ordinal" or "nominal".
type_outcome
Type of outcome. Currently, "continuous" and "binary" are implemented.
level_control
Value representing the control in the data set.
D
Minimum desired difference to be demonstrated between the treatment and the control.
L
Maximum number of covariates used to define a subgroup (that is, the depth of the tree). The default value is set at 3.
S
Minimum desired subgroup size. Subgroups that do not meet this requirement are excluded.
M
Maximum number of most promising subgroups selected at each step of the algorithm. The default value is set at 5.
num_crit
Integer representing the splitting criterion used. A value of 1 selects the criterion maximizing the differential effect between the two child subgroups; a value of 2 selects the criterion maximizing the treatment effect in at least one of the two child subgroups. The default value is set at 1.
gamma
Vector of length L representing the relative improvement parameter. Each element must be between 0 and 1. Smaller values indicate a more selective procedure. If any improvement is desired, it is recommended to set all elements to 1.
alpha
Overall Type I error rate.
nsim
Number of permutations for the resampling-based method used to protect the overall Type I error rate in a weak sense.
ord.bin
Number of classes into which continuous covariates are discretized.
nrep
Number of simulation replicates.
seed
Seed. The default value is set at 42.
H
Number of data sets into which the global data set is split. There will be 1 training data set and H-1 validation sets. The default value is set at 2.
pct_rand
Proportion of the global data set that is randomly allocated between training and validation sets. The default value is set at 0.5.
prop_gpe
Vector of length H containing the proportion of patients in each data set (training and validation).
alloc_high_prob
Boolean. If TRUE, patients are allocated to the set that minimizes the imbalance score; if FALSE, patients are randomized into the sets with probabilities inversely proportional to their imbalance score.
step
When gamma is not specified, step size used to divide the interval [0,1] when determining gamma by cross-validation. Warning: this process is highly time-consuming and several ties are obtained, so it is preferable to provide gamma directly after considering what is desired. The default value is set at 0.5.
nb_sub_cross
Number of folds for cross-validation to determine gamma. The default value is set at 5.
nsim_cv
Number of permutations for the resampling-based method used to protect the overall Type I error rate in the cross-validation part to determine gamma. The default value is set at 500.
M_per_covar
Boolean indicating whether the M most promising child subgroups are selected per covariate (TRUE) or across all remaining covariates (FALSE). The default value is set at FALSE.
upper_best
Boolean indicating whether greater values of the outcome indicate better responses.
nb_cores
Number of cores to use, since the algorithm is parallelized. By default, all available cores are used.
ideal
When a simulation study is set up and data are generated, the ideal subgroup can be provided to obtain additional results. See examples for more details.
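The constraints stated in the argument descriptions above can be checked before launching a long simulation. The following is a minimal sketch in base R, not part of the package: the object name toy_data, its column names and all numeric values are hypothetical, and the requirement that prop_gpe sums to 1 is an assumption based on its description as a vector of proportions.

```r
# Hypothetical toy data set: names and sizes are illustrative only.
set.seed(42)
n <- 200
toy_data <- data.frame(
  outcome   = rnorm(n),                                    # column 1: outcome
  treatment = rbinom(n, size = 1, prob = 0.5),             # column 2: treatment
  age       = rnorm(n, mean = 60, sd = 10),                # continuous covariate
  stage     = sample(1:3, n, replace = TRUE),              # ordinal covariate
  site      = sample(c("A", "B", "C"), n, replace = TRUE)  # nominal covariate
)

# One type per covariate, in column order (columns 3 onwards).
type_var <- c("continuous", "ordinal", "nominal")

level_control <- 0         # value of the treatment column taken as control here
L <- 3                     # maximum number of covariates per subgroup
gamma <- rep(1, L)         # one relative improvement parameter per level
H <- 2                     # 1 training set and H - 1 validation sets
prop_gpe <- c(0.5, 0.5)    # proportion of patients in each of the H sets

# Consistency checks derived from the argument descriptions.
stopifnot(
  length(type_var) == ncol(toy_data) - 2,
  all(type_var %in% c("continuous", "ordinal", "nominal")),
  level_control %in% toy_data[[2]],
  length(gamma) == L,
  all(gamma >= 0 & gamma <= 1),
  length(prop_gpe) == H,
  isTRUE(all.equal(sum(prop_gpe), 1))  # assumption: proportions sum to 1
)
```

If any check fails, stopifnot() reports the first violated condition, which is cheaper than discovering the mismatch partway through nrep replicates.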
Value
An object of class "simulation_SIDES" is returned, consisting of:
pct_no_subgroup
Percentage of simulations where no subgroup is identified and validated.
mean_size
Mean subgroup size across all simulations that return at least one subgroup.
subgroups
List of subgroups that are validated as responders.
pct_selection
Vector containing the percentage of selection and validation of each subgroup in subgroups.
Ilya Lipkovich, Alex Dmitrienko, Jonathan Denne and Gregory Enas. Subgroup identification based on differential effect search - A recursive partitioning method for establishing response to treatment in patient subpopulations. Statistics in Medicine, 2011. <doi:10.1002/sim.4289>