R Graphical Manual

Browse All

Last data update: 2014.03.03

R: Find Interactions Between Pairs of Variables

find.interaction

R Documentation

Find Interactions Between Pairs of Variables

Description

Find pairwise interactions between variables.

Usage

## S3 method for class 'rfsrc'
find.interaction(object, xvar.names, cause, outcome.target,
  importance = c("permute", "random", "anti",
                 "permute.ensemble", "random.ensemble", "anti.ensemble"),
  method = c("maxsubtree", "vimp"), sorted = TRUE, nvar, nrep = 1, subset, 
  na.action = c("na.omit", "na.impute"),
  seed = NULL, do.trace = FALSE, verbose = TRUE, ...)

Arguments

`object`	An object of class `(rfsrc, grow)` or `(rfsrc, forest)`.
`xvar.names`	Character vector of names of target x-variables. Default is to use all variables.
`cause`	For competing risk families, integer value between 1 and `J` indicating the event of interest, where `J` is the number of event types. The default is to use the first event type.
`outcome.target`	Character value multivariate families specifying the target outcome to be used. The default is to use the first coordinate.
`importance`	Type of variable importance (VIMP). See `rfsrc` for details.
`method`	Method of analysis: maximal subtree or VIMP. See details below.
`sorted`	Should variables be sorted by VIMP? Does not apply for competing risks.
`nvar`	Number of variables to be used.
`nrep`	Number of Monte Carlo replicates when method="vimp".
`subset`	Vector indicating which rows of the x-variable matrix from the `object` are to be used. Uses all rows if not specified.
`na.action`	Action to be taken if the data contains `NA` values. Applies only when method="vimp".
`seed`	Seed for random number generator. Must be a negative integer.
`do.trace`	Number of seconds between updates to the user on approximate time to completion.
`verbose`	Set to `TRUE` for verbose output.
`...`	Further arguments passed to or from other methods.

Details

Using a previously grown forest, identify pairwise interactions for all pairs of variables from a specified list. There are two distinct approaches specified by the option method.

method="maxsubtree"

This invokes a maximal subtree analysis. In this case, a matrix is returned where entries [i][i] are the normalized minimal depth of variable [i] relative to the root node (normalized wrt the size of the tree) and entries [i][j] indicate the normalized minimal depth of a variable [j] wrt the maximal subtree for variable [i] (normalized wrt the size of [i]'s maximal subtree). Smaller [i][i] entries indicate predictive variables. Small [i][j] entries having small [i][i] entries are a sign of an interaction between variable i and j (note: the user should scan rows, not columns, for small entries). See Ishwaran et al. (2010, 2011) for more details.
method="vimp"

This invokes a joint-VIMP approach. Two variables are paired and their paired VIMP calculated (refered to as 'Paired' importance). The VIMP for each separate variable is also calculated. The sum of these two values is refered to as 'Additive' importance. A large positive or negative difference between 'Paired' and 'Additive' indicates an association worth pursuing if the univariate VIMP for each of the paired-variables is reasonably large. See Ishwaran (2007) for more details.

Computations might be slow depending upon the size of the data and the forest. In such cases, consider setting nvar to a smaller number. If method="maxsubtree", consider using a smaller number of trees in the original grow call.

If nrep is greater than 1, the analysis is repeated nrep times and results averaged over the replications (applies only when method="vimp").

Value

Invisibly, the interaction table (a list for competing risk data) or the maximal subtree matrix.

Author(s)

Hemant Ishwaran and Udaya B. Kogalur

References

Ishwaran H. (2007). Variable importance in binary regression trees and forests, Electronic J. Statist., 1:519-537.

Ishwaran H., Kogalur U.B., Gorodeski E.Z, Minn A.J. and Lauer M.S. (2010). High-dimensional variable selection for survival data. J. Amer. Statist. Assoc., 105:205-217.

Ishwaran H., Kogalur U.B., Chen X. and Minn A.J. (2011). Random survival forests for high-dimensional data. Statist. Anal. Data Mining, 4:115-132.

Examples

## Not run: 
## ------------------------------------------------------------
## find interactions, survival setting
## ------------------------------------------------------------

data(pbc, package = "randomForestSRC") 
pbc.obj <- rfsrc(Surv(days,status) ~ ., pbc, nsplit = 10, importance = TRUE)
find.interaction(pbc.obj, method = "vimp", nvar = 8)

## ------------------------------------------------------------
## find interactions, competing risks
## ------------------------------------------------------------

data(wihs, package = "randomForestSRC")
wihs.obj <- rfsrc(Surv(time, status) ~ ., wihs, nsplit = 3, ntree = 100,
                       importance = TRUE)
find.interaction(wihs.obj)
find.interaction(wihs.obj, method = "vimp")

## ------------------------------------------------------------
## find interactions, regression setting
## ------------------------------------------------------------

airq.obj <- rfsrc(Ozone ~ ., data = airquality, importance = TRUE)
find.interaction(airq.obj, method = "vimp", nrep = 3)
find.interaction(airq.obj)

## ------------------------------------------------------------
## find interactions, classification setting
## ------------------------------------------------------------

iris.obj <- rfsrc(Species ~., data = iris, importance = TRUE)
find.interaction(iris.obj, method = "vimp", nrep = 3)
find.interaction(iris.obj)

## ------------------------------------------------------------
## interactions for multivariate mixed forests
## ------------------------------------------------------------

mtcars2 <- mtcars
mtcars2$cyl <- factor(mtcars2$cyl)
mtcars2$carb <- factor(mtcars2$carb, ordered = TRUE)
mv.obj <- rfsrc(cbind(carb, mpg, cyl) ~., data = mtcars2, importance = TRUE)
find.interaction(mv.obj, method = "vimp", outcome.target = "carb")
find.interaction(mv.obj, method = "vimp", outcome.target = "mpg")
find.interaction(mv.obj, method = "vimp", outcome.target = "cyl")

## End(Not run)