the names of independent variables to consider in the linear part of the glm
G.names
the names of independent variables to consider in the tree part of the hybrid glm.
family
the glm family considered depending on the type of the dependent variable.
args.rpart
a list of options that control details of the rpart algorithm. minbucket: the minimum number of observations in any terminal <leaf> node; cp: complexity parameter (Any split that does not decrease the overall lack of fit by a factor of cp is not attempted); maxdepth: the maximum depth of any node of the final tree, with the root node counted as depth 0. ...
See rpart.control for further details
epsi
a threshold value used to check the convergence of the algorithm
iterMax
the maximum number of iterations to consider
iterMin
the minimum number of iterations to consider
ncv
The number of folds to consider for the cross-validation
verbose
Logical; TRUE for printing progress during the computation (helpful for debugging)
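For readers unfamiliar with how the ncv argument partitions the data, the following is a minimal base-R sketch of a typical k-fold assignment. It is an illustration only, not the internal code of best.tree.CV; the sample size n and the names fold_id, test_idx and fold_sizes are hypothetical.

```r
set.seed(1)   # fixed seed so the random split is reproducible
n   <- 23     # hypothetical number of observations
ncv <- 10     # number of folds, playing the role of the ncv argument

# Repeat the labels 1..ncv until n labels exist (fold sizes differ by at most 1),
# then shuffle them so that fold membership is random.
fold_id <- sample(rep(seq_len(ncv), length.out = n))

# Row indices held out in each fold; every row appears in exactly one fold.
test_idx   <- split(seq_len(n), fold_id)
fold_sizes <- lengths(test_idx)
```

In each round, one element of test_idx would serve as the validation set and the remaining rows as the training set; a larger ncv means more model fits and therefore a longer run time.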
Value
a list of five elements:
best_index
The size of the tree selected by the cross-validation procedure
tree
The tree selected by CV
fit_glm
The fitted gpltr model selected by CV
CV_ERRORS
A list of two elements containing the cross-validation error of the tree selected by the CV procedure and a vector of cross-validation errors of all the competing models
Timediff
The execution time of the Cross-Validation procedure
Author(s)
Cyprien Mbogning
References
Mbogning, C., Perdry, H., Toussile, W., Broet, P.: A novel tree-based procedure for deciphering the genomic spectrum of clinical disease entities. Journal of Clinical Bioinformatics 4:6, (2014)
See Also
best.tree.BIC.AIC, pltr.glm
Examples
## Not run:
## load the data set
data(data_pltr)
## set the parameters
args.rpart <- list(minbucket = 40, maxdepth = 10, cp = 0)
family <- "binomial"
Y.name <- "Y"
X.names <- "G1"
G.names <- paste("G", 2:15, sep="")
## build a maximal tree
fit_pltr <- pltr.glm(data_pltr, Y.name, X.names, G.names, args.rpart = args.rpart,
                     family = family, iterMax = 5, iterMin = 3)
## prune back the maximal tree by a cross-validation procedure
tree_selected <- best.tree.CV(fit_pltr$tree, data_pltr, Y.name, X.names, G.names,
family = family, args.rpart = args.rpart, epsi = 0.001, iterMax = 5,
iterMin = 3, ncv = 10)
plot(tree_selected$tree, main = 'CV TREE')
text(tree_selected$tree, minlength = 0L, xpd = TRUE, cex = .6)
## End(Not run)
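The args.rpart list used in the example above names options documented in rpart.control. As a hedged sketch, the list can be checked ahead of time by expanding it with rpart.control from the rpart package. Whether best.tree.CV forwards the list in exactly this way is an assumption, and the name ctrl is hypothetical.

```r
library(rpart)

# Same control options as in the example above
args.rpart <- list(minbucket = 40, maxdepth = 10, cp = 0)

# Expand the list into a full set of rpart control parameters;
# a misspelled or unknown option name would raise an error here.
ctrl <- do.call(rpart.control, args.rpart)

# Inspect the options that were set explicitly
ctrl[c("minbucket", "maxdepth", "cp")]
```

Setting cp = 0 lets the initial tree grow as far as minbucket and maxdepth allow, which fits the intent of building a maximal tree before pruning it back.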