Last data update: 2014.03.03

R: Pruning the maximal tree
best.tree.CV    R Documentation

Pruning the maximal tree

Description

Prunes back the maximal tree using a K-fold cross-validation procedure.
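
To make the K-fold idea concrete, here is a minimal base-R sketch of how rows can be split into folds. It is only an illustration: the sample size `n` and the seed are hypothetical, and the way best.tree.CV actually partitions `xdata` internally is not documented here and may differ.

```r
set.seed(1)   # only to make this sketch reproducible
n   <- 103    # hypothetical number of rows in the data set
ncv <- 10     # number of folds, as in the ncv argument

# Assign each row to one of ncv folds of near-equal size, in random order
fold <- sample(rep(seq_len(ncv), length.out = n))

# Rows held out in the first fold, and rows left for fitting
test_idx  <- which(fold == 1)
train_idx <- which(fold != 1)

# Fold sizes
print(table(fold))
```

Each fold is held out once while the candidate trees are evaluated on it, and the errors are then combined across folds.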

Usage

best.tree.CV(xtree, xdata, Y.name, X.names, G.names, family = "binomial", 
args.rpart = list(cp = 0, minbucket = 20, maxdepth = 10), epsi = 0.001, 
iterMax = 5, iterMin = 3, ncv = 10, verbose = TRUE)

Arguments

xtree

a tree to prune

xdata

the dataset used to build the tree

Y.name

the name of the dependent variable

X.names

the names of independent variables to consider in the linear part of the glm

G.names

the names of independent variables to consider in the tree part of the hybrid glm.

family

the glm family considered depending on the type of the dependent variable.

args.rpart

a list of options that control details of the rpart algorithm. minbucket: the minimum number of observations in any terminal <leaf> node; cp: the complexity parameter (any split that does not decrease the overall lack of fit by a factor of cp is not attempted); maxdepth: the maximum depth of any node of the final tree, with the root node counted as depth 0. ... See rpart.control for further details.

epsi

a threshold value used to check the convergence of the algorithm

iterMax

the maximum number of iterations to consider

iterMin

the minimum number of iterations to consider

ncv

The number of folds to consider for the cross-validation

verbose

Logical; TRUE for printing progress during the computation (helpful for debugging)
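
As a rough illustration of what the args.rpart options mean, the sketch below turns the same kind of list into an rpart.control object and grows a tree with rpart directly. It uses the kyphosis data shipped with rpart rather than this package's data, and it does not call best.tree.CV; how the list is forwarded internally is an assumption.

```r
library(rpart)

# Same structure as the default args.rpart shown in Usage
args.rpart <- list(cp = 0, minbucket = 20, maxdepth = 10)

# Build an rpart.control object from the list
ctrl <- do.call(rpart.control, args.rpart)

# Grow a classification tree on rpart's kyphosis data (illustrative only)
fit <- rpart(Kyphosis ~ Age + Number + Start, data = kyphosis,
             method = "class", control = ctrl)

# Number of observations in each terminal node
leaf_sizes <- fit$frame$n[fit$frame$var == "<leaf>"]
print(leaf_sizes)
```

With cp = 0 no split is rejected on complexity grounds, so the tree size is limited only by minbucket and maxdepth (and rpart's other defaults); this is what makes the fitted tree "maximal" before pruning.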

Value

a list of five elements:

best_index

The size of the tree selected by the cross-validation procedure

tree

The tree selected by CV

fit_glm

The fitted gpltr models selected with CV

CV_ERRORS

A list of two elements: the cross-validation error of the tree selected by the CV procedure, and a vector of the cross-validation errors of all the competing models

Timediff

The execution time of the Cross-Validation procedure
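
The relation between a vector of cross-validation errors and a selected index can be sketched as follows. The error values are made up for illustration, and the assumption that the selected tree is the one with the smallest cross-validation error is ours; the exact selection rule used by best.tree.CV is not stated on this page.

```r
# Hypothetical cross-validation errors, one per competing model
cv_errors <- c(0.31, 0.27, 0.24, 0.26, 0.29)

# Position of the smallest error (assumed selection rule)
best <- which.min(cv_errors)

# Error of the selected model
best_error <- cv_errors[best]

print(best)
print(best_error)
```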

Author(s)

Cyprien Mbogning

References

Mbogning, C., Perdry, H., Toussile, W., Broet, P.: A novel tree-based procedure for deciphering the genomic spectrum of clinical disease entities. Journal of Clinical Bioinformatics 4:6, (2014)

See Also

best.tree.BIC.AIC, pltr.glm

Examples

## Not run: 
## load the data set

data(data_pltr)

## set the parameters

args.rpart <- list(minbucket = 40, maxdepth = 10, cp = 0)
family <- "binomial"
Y.name <- "Y"
X.names <- "G1"
G.names <- paste("G", 2:15, sep="")

## build a maximal tree

fit_pltr <- pltr.glm(data_pltr, Y.name, X.names, G.names, args.rpart = args.rpart, 
                     family = family, iterMax = 5, iterMin = 3)
                     
## prune back the maximal tree by a cross-validation procedure

tree_selected <- best.tree.CV(fit_pltr$tree, data_pltr, Y.name, X.names, G.names, 
     family = family, args.rpart = args.rpart, epsi = 0.001, iterMax = 5, 
     iterMin = 3, ncv = 10)
     
plot(tree_selected$tree, main = 'CV TREE')
text(tree_selected$tree, minlength = 0L, xpd = TRUE, cex = .6)

## End(Not run)
