An object of class (rfsrc, grow) or
(rfsrc, forest).
xvar.names
Character vector of names of target x-variables.
Default is to use all variables.
cause
For competing risk families, integer value between 1
and J indicating the event of interest, where J is
the number of event types. The default is to use the first event
type.
outcome.target
Character value for multivariate families
specifying the target outcome to be used. The default is to use
the first coordinate.
importance
Type of variable importance (VIMP). See
rfsrc for details.
method
Method of analysis: "maxsubtree" (maximal subtree) or "vimp"
(VIMP). See details below.
sorted
Should variables be sorted by VIMP? Does not apply to
competing risks.
nvar
Number of variables to be used.
nrep
Number of Monte Carlo replicates when method="vimp".
subset
Vector indicating which rows of the x-variable matrix
from the object are to be used. Uses all rows if not specified.
na.action
Action to be taken if the data contains NA
values. Applies only when method="vimp".
seed
Seed for random number generator. Must be a negative
integer.
do.trace
Number of seconds between updates to the user on
approximate time to completion.
verbose
Set to TRUE for verbose output.
...
Further arguments passed to or from other methods.
Details
Using a previously grown forest, identify pairwise interactions for
all pairs of variables from a specified list. There are two
distinct approaches specified by the option method.
method="maxsubtree"
This invokes a maximal subtree analysis. A matrix is returned in
which entry [i][i] is the normalized minimal depth of variable [i]
relative to the root node (normalized with respect to the size of
the tree) and entry [i][j] is the normalized minimal depth of
variable [j] within the maximal subtree for variable [i]
(normalized with respect to the size of [i]'s maximal subtree).
Smaller [i][i] entries indicate predictive variables. A small
[i][j] entry in a row whose [i][i] entry is also small is a sign
of an interaction between variables i and j (note: the user should
scan rows, not columns, for small entries). See Ishwaran et
al. (2010, 2011) for more details.
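The row-scanning rule described above can be sketched in base R. This is only an illustration: the matrix values, the variable names x1 to x3, and the cutoff of 0.25 are all hypothetical and are not output from a real forest.

```r
# Hypothetical normalized minimal depth matrix for three variables
# (values are made up for illustration, not produced by a forest).
vars <- c("x1", "x2", "x3")
depth <- matrix(c(0.10, 0.15, 0.80,
                  0.20, 0.12, 0.75,
                  0.60, 0.70, 0.55),
                nrow = 3, byrow = TRUE,
                dimnames = list(vars, vars))

# Hypothetical cutoff for what counts as "small".
cutoff <- 0.25

# Small diagonal entries flag predictive variables.
predictive <- vars[diag(depth) < cutoff]

# Scan each predictive variable's row (not its column) for small
# off-diagonal entries; each hit is a candidate interaction (i, j).
candidates <- list()
for (i in predictive) {
  partners <- setdiff(vars[depth[i, ] < cutoff], i)
  for (j in partners) {
    candidates[[length(candidates) + 1]] <- c(i = i, j = j)
  }
}
candidates <- do.call(rbind, candidates)
print(candidates)
```

With these made-up numbers, x1 and x2 flag each other while x3 is excluded because its diagonal entry is large.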
method="vimp"
This invokes a joint-VIMP approach. Two variables are paired and
their paired VIMP is calculated (referred to as 'Paired' importance).
The VIMP for each separate variable is also calculated. The sum
of these two values is referred to as 'Additive' importance. A
large positive or negative difference between 'Paired' and
'Additive' indicates an association worth pursuing if the
univariate VIMP for each of the paired variables is reasonably
large. See Ishwaran (2007) for more details.
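The arithmetic behind 'Paired' and 'Additive' can be illustrated in base R. The VIMP values and variable names below are hypothetical, and taking the difference as Paired minus Additive is an assumption about the sign convention.

```r
# Hypothetical VIMP values for two variable pairs (made up for
# illustration; not output from a real forest).
pairs <- data.frame(
  var1   = c("x1", "x1"),
  var2   = c("x2", "x3"),
  vimp1  = c(0.040, 0.040),   # univariate VIMP of var1
  vimp2  = c(0.025, 0.002),   # univariate VIMP of var2
  paired = c(0.095, 0.043)    # joint VIMP of the pair
)

# 'Additive' importance is the sum of the two univariate VIMP values.
pairs$additive <- pairs$vimp1 + pairs$vimp2

# Assumed sign convention: difference = Paired - Additive.
pairs$difference <- pairs$paired - pairs$additive

print(pairs)
```

In this sketch the (x1, x2) pair shows a sizeable difference and both univariate VIMP values are reasonably large, so it would be the pair worth pursuing. The (x1, x3) pair has a near-zero difference and x3 has negligible VIMP.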
Computations might be slow depending upon the size of the data
and the forest. In such cases, consider setting nvar to a
smaller number. If method="maxsubtree", consider using a
smaller number of trees in the original grow call.
If nrep is greater than 1, the analysis is repeated
nrep times and the results are averaged over the replications
(applies only when method="vimp").
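A minimal usage sketch of both methods follows, assuming the randomForestSRC package is installed and using the built-in airquality data. The choices nvar = 3 and nrep = 3 are arbitrary illustrations, and the object names are hypothetical.

```r
# Run only if randomForestSRC is available (assumption: installed).
if (requireNamespace("randomForestSRC", quietly = TRUE)) {
  library(randomForestSRC)

  # Grow a regression forest; importance = TRUE requests VIMP.
  airq.obj <- rfsrc(Ozone ~ ., data = airquality, importance = TRUE)

  # Joint-VIMP analysis, restricted to 3 variables and averaged
  # over 3 Monte Carlo replicates to keep the run short.
  vimp.tbl <- find.interaction(airq.obj, method = "vimp",
                               nvar = 3, nrep = 3)

  # Maximal subtree analysis on the same forest.
  depth.mat <- find.interaction(airq.obj, method = "maxsubtree")
}
```

Because the value is returned invisibly, assigning it as above is the way to keep the interaction table or the maximal subtree matrix for later inspection.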
Value
Invisibly, the interaction table (a list for competing risk data)
or the maximal subtree matrix.
Author(s)
Hemant Ishwaran and Udaya B. Kogalur
References
Ishwaran H. (2007). Variable importance in binary regression
trees and forests. Electronic J. Statist., 1:519-537.
Ishwaran H., Kogalur U.B., Gorodeski E.Z., Minn A.J. and
Lauer M.S. (2010). High-dimensional variable selection for survival
data. J. Amer. Statist. Assoc., 105:205-217.
Ishwaran H., Kogalur U.B., Chen X. and Minn A.J. (2011). Random
survival forests for high-dimensional data. Statist. Anal. Data
Mining, 4:115-132.