A data frame or matrix, with subjects/cases in rows and
variables in columns. Missing values (NAs) are not allowed.
Class
The dependent variable; must be a factor.
forest
A previously fitted random forest (see randomForest).
numrandom
The number of random permutations of the class labels.
whichImp
A vector of one or more of impsUnscaled,
impsScaled, and impsGini, which correspond, respectively, to
the unscaled mean decrease in accuracy, the scaled mean decrease
in accuracy, and the decrease in the Gini index. See below,
randomForest, importance, and the references for further
explanation of these measures of variable importance.
usingCluster
If TRUE, use a cluster to parallelize the calculations.
TheCluster
The name of the cluster, if one is used.
...
Not used.
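A minimal sketch of how these arguments might fit together, assuming the varSelRF and randomForest packages are installed. The simulated data, the object names (x, cl, rf, rvi), and the values chosen for ntree and numrandom are illustrative only; the first three arguments are passed by position.

```r
# Illustrative data: 45 cases, 30 variables; the first two variables
# are shifted for class "A" so that they carry some signal.
set.seed(1)
x <- matrix(rnorm(45 * 30), ncol = 30)
x[1:20, 1:2] <- x[1:20, 1:2] + 2
cl <- factor(c(rep("A", 20), rep("B", 25)))

# Run only if varSelRF (which depends on randomForest) is available.
if (requireNamespace("varSelRF", quietly = TRUE)) {
  library(varSelRF)

  # The previously fitted forest passed as the 'forest' argument.
  rf <- randomForest(x, cl, ntree = 200, importance = TRUE)

  # 20 random permutations of the class labels, no cluster.
  rvi <- randomVarImpsRF(x, cl, rf,
                         numrandom = 20,
                         whichImp = c("impsUnscaled", "impsGini"),
                         usingCluster = FALSE)

  # One component per requested importance measure.
  str(rvi, max.level = 1)
}
```

A small numrandom keeps the run short; a larger value gives a smoother picture of the importances obtained under permuted labels, at proportionally higher cost.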
Details
The most commonly used measure of variable importance is based on the
decrease in classification accuracy when the values of a variable in a
node of a tree are permuted randomly (see references);
we use the unscaled version (see our paper and supplementary
material). Note that, by default, importance returns the scaled
version.
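The difference between the scaled and unscaled versions can be seen directly with the importance function of the randomForest package. This is a sketch on simulated data, assuming randomForest is installed; the object names are illustrative. In randomForest, type = 1 selects the mean decrease in accuracy, type = 2 the mean decrease in node impurity (Gini), and scale = TRUE divides the permutation-based measure by its standard error.

```r
if (requireNamespace("randomForest", quietly = TRUE)) {
  library(randomForest)

  # Illustrative data, not taken from the package.
  set.seed(2)
  x <- matrix(rnorm(45 * 30), ncol = 30)
  x[1:20, 1:2] <- x[1:20, 1:2] + 2
  cl <- factor(c(rep("A", 20), rep("B", 25)))

  # importance = TRUE is required for the permutation-based measures.
  rf <- randomForest(x, cl, ntree = 200, importance = TRUE)

  imp_unscaled <- importance(rf, type = 1, scale = FALSE)  # unscaled
  imp_scaled   <- importance(rf, type = 1, scale = TRUE)   # the default
  imp_gini     <- importance(rf, type = 2)                 # Gini-based

  # One row per variable, one column per measure.
  imps <- cbind(unscaled = imp_unscaled[, 1],
                scaled   = imp_scaled[, 1],
                gini     = imp_gini[, 1])
  print(head(imps))
}
```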
Value
An object of class randomVarImpsRF, which is a list
with one to three named components. The names of the
components correspond to the variable importance measures
selected (i.e., impsUnscaled, impsScaled, impsGini).
Each component is a matrix with one row per variable and
numrandom columns; element (i, j) of this matrix is the variable
importance for variable i under random permutation j.
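To make the layout of the returned object concrete, the sketch below builds a hypothetical stand-in with the documented shape (a list holding a variables-by-numrandom matrix) using base R only; the numbers are random noise, not real importances, and the names mock, imps, and null_upper are invented for illustration.

```r
# Hypothetical stand-in for one component of the returned list:
# 5 variables (rows) by 20 random permutations (columns).
set.seed(3)
n_vars <- 5
numrandom <- 20
mock <- list(
  impsUnscaled = matrix(rnorm(n_vars * numrandom, sd = 0.01),
                        nrow = n_vars, ncol = numrandom,
                        dimnames = list(paste0("v", seq_len(n_vars)), NULL))
)

imps <- mock$impsUnscaled

# Importance of variable 2 under random permutation 7.
imps[2, 7]

# Per-variable summaries across the permutations (one value per row).
null_mean  <- rowMeans(imps)
null_upper <- apply(imps, 1, quantile, probs = 0.95)
print(cbind(mean = null_mean, q95 = null_upper))
```

Summaries such as these are one possible way to compare the importances observed in the original forest with those obtained when the class labels are permuted.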
References
Svetnik, V., Liaw, A., Tong, C. & Wang, T. (2004). Application of
Breiman's random forest to modeling structure-activity relationships of
pharmaceutical molecules. Pp. 334-343 in F. Roli, J. Kittler, and T. Windeatt
(eds.), Multiple Classifier Systems, Fifth International Workshop, MCS
2004, Proceedings, 9-11 June 2004, Cagliari, Italy. Lecture Notes in
Computer Science, vol. 3077. Berlin: Springer.