Last data update: 2014.03.03

p.val.tree

R Documentation

Compute the p-value

Description

Tests whether the tree selected by the BIC, AIC or CV procedure is significantly associated with the dependent variable, while adjusting for the effect of confounding variables.
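The `B` argument and the Fan et al. (2001) reference indicate that the test is calibrated by resampling a deviance difference. The sketch below only illustrates that statistic. It compares a confounder-only logistic `glm` with one that also includes a grouping factor `g`, which stands in for the leaves of an already selected tree, and calibrates the difference with a parametric bootstrap under the null model. The helpers `deviance_diff` and `boot_pvalue` are hypothetical, and this is not the resampling scheme used inside `p.val.tree`. In particular, holding `g` fixed ignores the fact that the real tree is selected from the same data.

```r
# Hypothetical sketch (not the package's implementation): parametric-bootstrap
# p-value for the deviance difference between a confounder-only logistic model
# and a model that also includes a fixed grouping factor `g` (standing in for
# the leaves of an already selected tree).

deviance_diff <- function(y, x, g) {
  m0 <- glm(y ~ x, family = binomial)      # confounder only
  m1 <- glm(y ~ x + g, family = binomial)  # confounder + "tree" groups
  deviance(m0) - deviance(m1)
}

boot_pvalue <- function(y, x, g, B = 100) {
  observed <- deviance_diff(y, x, g)
  # Fitted probabilities under the null model (no effect of g)
  p0 <- fitted(glm(y ~ x, family = binomial))
  # Simulate B responses under the null and recompute the statistic each time
  null_stats <- vapply(seq_len(B), function(b) {
    y_star <- rbinom(length(p0), size = 1, prob = p0)
    deviance_diff(y_star, x, g)
  }, numeric(1))
  # Add-one correction keeps the estimated p-value strictly positive
  (1 + sum(null_stats >= observed)) / (B + 1)
}

# Simulated data where the groups have a strong effect on the outcome
set.seed(1)
n <- 200
x <- rnorm(n)
g <- factor(sample(1:3, n, replace = TRUE))
y <- rbinom(n, 1, plogis(0.3 * x + c(-2, 0, 2)[g]))
boot_pvalue(y, x, g, B = 50)
```

With a strong group effect, the observed deviance difference should exceed almost all bootstrap replicates, giving a small p-value.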

Usage

p.val.tree(xtree, xdata, Y.name, X.names, G.names, B = 10, args.rpart = 
list(minbucket = 40, maxdepth = 10, cp = 0), epsi = 0.001, iterMax = 5,
iterMin = 3, family = "binomial", LB = FALSE, 
args.parallel = list(numWorkers = 1), index = 4, verbose = TRUE)

Arguments

`xtree`	the maximal tree obtained with the function `pltr.glm`
`xdata`	the data frame used to build xtree
`Y.name`	the name of the dependent variable
`X.names`	the names of independent confounding variables to consider in the linear part of the `glm`
`G.names`	the names of independent variables to consider in the tree part of the hybrid `glm`.
`B`	the number of resampling replicates used to approximate the null distribution of the deviance difference
`args.rpart`	a list of options that control details of the `rpart` algorithm: `minbucket`, the minimum number of observations in any terminal (leaf) node; `cp`, the complexity parameter (any split that does not decrease the overall lack of fit by a factor of `cp` is not attempted); `maxdepth`, the maximum depth of any node of the final tree, with the root node counted as depth 0; ... See `rpart.control` for further details.
`epsi`	a threshold value used to check the convergence of the algorithm
`iterMax`	the maximum number of iterations
`iterMin`	the minimum number of iterations
`family`	the `glm` family, chosen according to the type of the dependent variable (e.g. `"binomial"` for a binary outcome)
`LB`	a logical value (TRUE or FALSE) indicating whether the load is balanced across workers in the parallel computation
`args.parallel`	a list of parallelization parameters (e.g. `numWorkers`). See `mclapply` for more details.
`index`	the size of the tree selected by `best.tree.BIC.AIC` or `best.tree.CV` using one of the proposed criteria
`verbose`	Logical; TRUE for printing progress during the computation (helpful for debugging)
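The list passed as `args.rpart` holds `rpart` control options, so passing it to `rpart::rpart.control` shows the full control list it expands to. The tree "size" used by `index` is assumed here to follow `rpart`'s own convention of counting leaves (size = nsplit + 1 in the `cptable`). The helper `prune_to_size` is hypothetical and is not part of the package. It extracts the subtree of a given size from an `rpart` fit, illustrated on the `kyphosis` data shipped with `rpart`.

```r
library(rpart)

# Expand the documented default args.rpart into a full rpart control list;
# options not listed keep rpart's defaults
args.rpart <- list(minbucket = 40, maxdepth = 10, cp = 0)
ctrl <- do.call(rpart.control, args.rpart)

# Hypothetical helper: prune an rpart fit to the subtree with `size` leaves,
# assuming size = nsplit + 1 as in rpart's cptable
prune_to_size <- function(fit, size) {
  cpt <- fit$cptable
  row <- which(cpt[, "nsplit"] + 1 == size)
  if (length(row) == 0) {
    stop("no subtree with ", size, " leaves in the cp table")
  }
  # Pruning at a row's CP value keeps exactly that row's splits
  prune(fit, cp = cpt[row[1], "CP"])
}

# Illustration with rpart's default control, since the 81-row kyphosis
# data set is too small for minbucket = 40
fit <- rpart(Kyphosis ~ Age + Number + Start, data = kyphosis)
size <- fit$cptable[2, "nsplit"] + 1
small <- prune_to_size(fit, size)
```

Only sizes listed in the `cptable` can be obtained this way, which is why the helper stops with an error for any other size.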
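The exact convergence criterion behind `epsi`, `iterMin` and `iterMax` is not stated on this page. The sketch below shows one common reading, assuming the largest absolute change between successive iterates is compared with `epsi`. The function `iterate_until_converged` is hypothetical. It never stops before `iterMin` iterations, stops early once the change falls below `epsi`, and always stops after `iterMax` iterations.

```r
# Hypothetical sketch: iterate an update until it stabilises, following the
# documented roles of epsi, iterMin and iterMax (the package's actual
# convergence measure is an assumption here).
iterate_until_converged <- function(step, init, epsi = 1e-3,
                                    iterMin = 3, iterMax = 5) {
  current <- init
  for (iter in seq_len(iterMax)) {
    updated <- step(current)
    change <- max(abs(updated - current))  # largest absolute change
    current <- updated
    # Stop early only after iterMin iterations and once change < epsi
    if (iter >= iterMin && change < epsi) break
  }
  list(value = current, iterations = iter)
}

# A contraction towards 2: v -> v / 2 + 1
res <- iterate_until_converged(function(v) v / 2 + 1, init = 0,
                               iterMax = 50)
```

With the documented defaults (`iterMax = 5`), the same contraction would stop at the iteration cap before the change drops below `epsi = 0.001`.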
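Because `args.parallel` refers to `mclapply`, one plausible reading of `LB` corresponds to `mclapply`'s `mc.preschedule` switch. With prescheduling, the replicates are split into one chunk per core up front. Without it, one job is forked per replicate, so workers that finish early pick up more work. The helper `run_replicates` and the mapping `LB = TRUE` to `mc.preschedule = FALSE` are assumptions, not the package's code. The `type = "PSOCK"` entry in the example below suggests socket clusters may also be involved, and `parallel::clusterApplyLB` is the load-balanced function for those.

```r
library(parallel)

# Hypothetical helper (not the package's code): evaluate replicate_fun on
# replicates 1..B. LB = TRUE disables prescheduling, so one job is forked
# per replicate and idle workers take the next one (load balancing).
run_replicates <- function(B, replicate_fun, num_workers = 1, LB = FALSE) {
  mclapply(seq_len(B), replicate_fun,
           mc.cores = num_workers,
           mc.preschedule = !LB)
}

# mc.cores > 1 relies on forking and is not supported on Windows
squares <- run_replicates(4, function(b) b^2, num_workers = 1, LB = TRUE)
```

Load balancing mainly helps when replicates vary widely in run time, as can happen when each replicate refits a tree.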

Value

A list of three elements:

`p.value`	The p-value of the selected tree
`Timediff`	The execution time of the test procedure
`Badj`	The number of samples used inside the procedure

Author(s)

Cyprien Mbogning

References

Mbogning, C., Perdry, H., Toussile, W., Broet, P.: A novel tree-based procedure for deciphering the genomic spectrum of clinical disease entities. Journal of Clinical Bioinformatics 4, 6 (2014)

Fan, J., Zhang, C., Zhang, J.: Generalized likelihood ratio statistics and Wilks phenomenon. Annals of Statistics 29(1), 153-193 (2001)

Examples

## Not run: 
## load the data set

data(data_pltr)

## set the parameters 

args.rpart <- list(minbucket = 40, maxdepth = 10, cp = 0)
family <- "binomial"
Y.name <- "Y"
X.names <- "G1"
G.names <- paste("G", 2:15, sep="")

## build a maximal tree

fit_pltr <- pltr.glm(data_pltr, Y.name, X.names, G.names, args.rpart = args.rpart,
                     family = family, iterMax = 5, iterMin = 3)
                     
## prune back the maximal tree by the BIC or AIC criterion

tree_select <- best.tree.BIC.AIC(xtree = fit_pltr$tree, data_pltr, Y.name,
                                 X.names, family = family)
                     
## Compute the p-value of the selected tree by BIC

args.parallel <- list(numWorkers = 10, type = "PSOCK")
index <- tree_select$best_index[[1]]
p_value <- p.val.tree(xtree = fit_pltr$tree, data_pltr, Y.name, X.names, G.names,
                      B = 100, args.rpart = args.rpart, epsi = 1e-3,
                      iterMax = 5, iterMin = 3, family = family, LB = FALSE,
                      args.parallel = args.parallel, index = index)

## End(Not run)