Extract maximal subtree information from a RF-SRC object. Used for
variable selection and identifying interactions between variables.
Usage
## S3 method for class 'rfsrc'
max.subtree(object,
max.order = 2, sub.order = FALSE, conservative = FALSE, ...)
Arguments
object
An object of class (rfsrc, grow) or (rfsrc,
forest)
.
max.order
Non-negative integer specifying the target number
of order depths. Default is to return the first and second order
depths. Used to identify predictive variables. Setting
max.order=0 returns the first order depth for each
variable by tree. A side effect is that conservative is
automatically set to FALSE.
sub.order
Set this value to TRUE to return the
minimal depth of each variable relative to another variable.
Used to identify interrelationship between variables. See
details below.
conservative
If TRUE, the threshold value for selecting
variables is calculated using a conservative marginal
approximation to the minimal depth distribution (the method used
in Ishwaran et al. 2010). Otherwise, the minimal depth
distribution is the tree-averaged distribution. The latter method
tends to give larger threshold values and discovers more
variables, especially in high-dimensions.
...
Further arguments passed to or from other methods.
Details
The maximal subtree for a variable x is the largest subtree
whose root node splits on x. Thus, all parent nodes of
x's maximal subtree have nodes that split on variables other
than x. The largest maximal subtree possible is the root
node. In general, however, there can be more than one maximal
subtree for a variable. A maximal subtree may also not exist if
there are no splits on the variable. See Ishwaran et al. (2010,
2011) for details.
The minimal depth of a maximal subtree (the first order depth)
measures predictiveness of a variable x. It equals the
shortest distance (the depth) from the root node to the parent node
of the maximal subtree (zero is the smallest value possible). The
smaller the minimal depth, the more impact x has on
prediction. The mean of the minimal depth distribution is used as
the threshold value for deciding whether a variable's minimal depth
value is small enough for the variable to be classified as strong.
The second order depth is the distance from the root node to the
second closest maximal subtree of x. To specify the target
order depth, use the max.order option (e.g., setting
max.order=2 returns the first and second order depths).
Setting max.order=0 returns the first order depth for each
variable for each tree.
Set sub.order=TRUE to obtain the minimal depth of a
variable relative to another variable. This returns a
pxp matrix, where p is the number of variables,
and entries [i][j] are the normalized relative minimal depth of a
variable [j] within the maximal subtree for variable [i], where
normalization adjusts for the size of [i]'s maximal subtree. Entry
[i][i] is the normalized minimal depth of i relative to the root
node. The matrix should be read by looking across rows (not down
columns) and identifies interrelationship between variables. Small
[i][j] entries indicate interactions. See
find.interaction for related details.
For competing risk data, maximal subtree analyses are unconditional
(i.e., they are non-event specific).
Value
Invisibly, a list with the following components:
order
Order depths for a given variable up to max.order
averaged over a tree and the forest. Matrix of dimension
pxmax.order. If max.order=0, a matrix of
pxntree is returned containing the first order depth
for each variable by tree.
count
Averaged number of maximal subtrees, normalized by
the size of a tree, for each variable.
nodes.at.depth
Number of non-terminal nodes by depth for each tree.
sub.order
Average minimal depth of a variable relative to another
variable. Can be NULL.
threshold
Threshold value (the mean minimal depth) used to
select variables.
threshold.1se
Mean minimal depth plus one standard error.
topvars
Character vector of names of the final selected
variables.
topvars.1se
Character vector of names of the final selected
variables using the 1se threshold rule.
percentile
Minimal depth percentile for each variable.
density
Estimated minimal depth density.
second.order.threshold
Threshold for second order depth.
Author(s)
Hemant Ishwaran and Udaya B. Kogalur
References
Ishwaran H., Kogalur U.B., Gorodeski E.Z, Minn A.J. and
Lauer M.S. (2010). High-dimensional variable selection for survival
data. J. Amer. Statist. Assoc., 105:205-217.
Ishwaran H., Kogalur U.B., Chen X. and Minn A.J. (2011). Random
survival forests for high-dimensional data. Statist. Anal. Data
Mining, 4:115-132.
See Also
find.interaction,
var.select,
vimp
Examples
## Not run:
## ------------------------------------------------------------
## survival analysis
## first and second order depths for all variables
## ------------------------------------------------------------
data(veteran, package = "randomForestSRC")
v.obj <- rfsrc(Surv(time, status) ~ . , data = veteran)
v.max <- max.subtree(v.obj)
# first and second order depths
print(round(v.max$order, 3))
# the minimal depth is the first order depth
print(round(v.max$order[, 1], 3))
# strong variables have minimal depth less than or equal
# to the following threshold
print(v.max$threshold)
# this corresponds to the set of variables
print(v.max$topvars)
## ------------------------------------------------------------
## regression analysis
## try different levels of conservativeness
## ------------------------------------------------------------
mtcars.obj <- rfsrc(mpg ~ ., data = mtcars)
max.subtree(mtcars.obj)$topvars
max.subtree(mtcars.obj, conservative = TRUE)$topvars
## End(Not run)