R Graphical Manual

Last data update: 2014.03.03

randomVarImpsRF

R Documentation

Variable importances from random forest on permuted class labels

Description

Returns variable importances from random forests fitted to data sets that are identical to the original except that the class labels have been randomly permuted.
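The procedure in the Description can be sketched directly with `randomForest`. This is a minimal illustration, not the package's code. The helper name `permutedImportances` is hypothetical. Reusing the original forest's `ntree` and `mtry` for every refit is an assumption of this sketch. The sketch collects only the unscaled mean decrease in accuracy and never uses a cluster, so the actual implementation may differ in detail.

```r
library(randomForest)

## Hypothetical helper (not part of varSelRF): refit a forest to
## randomly permuted class labels numrandom times and collect the
## unscaled mean decrease in accuracy of every variable.
## Reusing forest's ntree and mtry is an assumption of this sketch.
permutedImportances <- function(xdata, Class, forest, numrandom = 100) {
  imps <- matrix(NA_real_, nrow = ncol(xdata), ncol = numrandom,
                 dimnames = list(colnames(xdata), NULL))
  for (j in seq_len(numrandom)) {
    permClass <- sample(Class)  # one random permutation of the labels
    rfPerm <- randomForest(xdata, permClass,
                           ntree = forest$ntree, mtry = forest$mtry,
                           importance = TRUE)
    ## type = 1: mean decrease in accuracy; scale = FALSE: unscaled
    imps[, j] <- importance(rfPerm, type = 1, scale = FALSE)[, 1]
  }
  imps  # variables in rows, permutations in columns
}
```

With the objects from the Examples, `permutedImportances(x, cl, rf, numrandom = 20)` returns a 30 × 20 matrix. It has the same shape as the `impsUnscaled` component described under Value.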

Usage

randomVarImpsRF(xdata, Class, forest, numrandom = 100,
                whichImp = "impsUnscaled", usingCluster = TRUE,
                TheCluster = NULL, ...)

Arguments

`xdata`	A data frame or matrix, with subjects/cases in rows and variables in columns. NAs are not allowed.
`Class`	The dependent variable; must be a factor.
`forest`	A previously fitted random forest (see `randomForest`).
`numrandom`	The number of random permutations of the class labels.
`whichImp`	A vector of one or more of `impsUnscaled`, `impsScaled` and `impsGini`, which correspond, respectively, to the unscaled mean decrease in accuracy, the scaled mean decrease in accuracy and the mean decrease in the Gini index. See below, `randomForest`, `importance` and the references for further explanations of these measures of variable importance.
`usingCluster`	If `TRUE`, use a cluster to parallelize the calculations.
`TheCluster`	The name of the cluster, if one is used.
`...`	Not used.

Details

The most commonly used measure of variable importance is based on the decrease in classification accuracy when the values of a variable are randomly permuted in the out-of-bag data of each tree (see references). By default we use the unscaled version of this measure (see our paper and its supplementary material). Note that `importance` returns the scaled version by default.
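The difference between the three measures can be seen with `importance` from randomForest. In this sketch the toy data mirror the Examples section. The variable names `impsUnscaled`, `impsScaled` and `impsGini` only echo the `whichImp` options. Treating them as exactly what `randomVarImpsRF` stores is an assumption.

```r
library(randomForest)

set.seed(2)
## Toy data as in the Examples: variables 1 and 2 separate the classes
x <- matrix(rnorm(45 * 30), ncol = 30)
x[1:20, 1:2] <- x[1:20, 1:2] + 2
cl <- factor(c(rep("A", 20), rep("B", 25)))
rf <- randomForest(x, cl, ntree = 200, importance = TRUE)

## Mean decrease in accuracy divided by its standard error
## (what importance() returns by default, since scale = TRUE)
impsScaled <- importance(rf, type = 1)[, 1]
## The same measure without scaling
impsUnscaled <- importance(rf, type = 1, scale = FALSE)[, 1]
## Mean decrease in the Gini index
impsGini <- importance(rf, type = 2)[, 1]

round(cbind(impsUnscaled, impsScaled, impsGini)[1:5, ], 3)
```

Scaling divides by a non-negative standard error. It therefore changes the magnitudes, and possibly the ranking, of the accuracy-based importances, but never their signs.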

Value

An object of class `randomVarImpsRF`, which is a list with one to three named components, one for each variable importance measure selected in `whichImp` (i.e., `impsUnscaled`, `impsScaled`, `impsGini`).

Each component is a matrix with one row per variable and `numrandom` columns. Element (i, j) is the importance of variable i in the forest fitted with the j-th random permutation of the class labels.
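One hypothetical way to use a component of the returned object is to compare each variable's observed importance with its row of permuted-label importances. The helper `permutationPValues` below is not part of varSelRF. It computes a simple empirical permutation p-value and applies the common (count + 1) / (numrandom + 1) correction so that no p-value is exactly zero.

```r
## Hypothetical helper (not part of varSelRF).
## observed: one importance per variable (from the original forest)
## nullImps: variables in rows, permutations in columns, shaped like
##           a component of a randomVarImpsRF object
permutationPValues <- function(observed, nullImps) {
  stopifnot(length(observed) == nrow(nullImps))
  ## nullImps >= observed recycles observed down the columns, so
  ## element (i, j) compares permutation j with variable i
  exceed <- rowSums(nullImps >= observed)
  (exceed + 1) / (ncol(nullImps) + 1)
}

## Toy check: 2 variables, 4 permutations
nullImps <- matrix(c(0, 1, 2, 3,
                     5, 6, 7, 8), nrow = 2, byrow = TRUE)
permutationPValues(c(2.5, 0), nullImps)  # 0.4 1.0
```

With the objects from the Examples, a call could look like `permutationPValues(importance(rf, type = 1, scale = FALSE)[, 1], rf.rvi$impsUnscaled)`.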

Author(s)

Ramon Diaz-Uriarte rdiaz02@gmail.com

References

Breiman, L. (2001) Random forests. Machine Learning, 45, 5–32.

Diaz-Uriarte, R. and Alvarez de Andres, S. (2005) Variable selection from random forests: application to gene expression data. Tech. report. http://ligarto.org/rdiaz/Papers/rfVS/randomForestVarSel.html

Svetnik, V., Liaw, A., Tong, C. and Wang, T. (2004) Application of Breiman's random forest to modeling structure-activity relationships of pharmaceutical molecules. Pp. 334–343 in F. Roli, J. Kittler and T. Windeatt (eds.), Multiple Classifier Systems, Fifth International Workshop, MCS 2004, Proceedings, 9–11 June 2004, Cagliari, Italy. Lecture Notes in Computer Science, vol. 3077. Berlin: Springer.

Examples


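library(randomForest)  # randomForest()
library(varSelRF)      # randomVarImpsRF(), randomVarImpsRFplot()

## Simulated data: 45 cases, 30 variables; the first two variables
## are shifted upward for the 20 cases of class "A"
x <- matrix(rnorm(45 * 30), ncol = 30)
x[1:20, 1:2] <- x[1:20, 1:2] + 2
cl <- factor(c(rep("A", 20), rep("B", 25)))

## Forest fitted to the original class labels
rf <- randomForest(x, cl, ntree = 200, importance = TRUE)

## Importances from 20 forests fitted to permuted class labels,
## computed sequentially (no cluster)
rf.rvi <- randomVarImpsRF(x, cl, rf,
                          numrandom = 20,
                          usingCluster = FALSE)

## Plot the permuted-label importances together with those of rf
randomVarImpsRFplot(rf.rvi, rf)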