Last data update: 2014.03.03

R: Creates an object of class 'ice'.

ice	R Documentation

Creates an object of class `ice`.

Description

Creates an `ice` object containing individual conditional expectation (ICE) curves for the passed model object, X matrix, predictor, and response. See Goldstein et al. (2013) for further details.

Usage

ice(object, X, y, predictor, predictfcn, verbose = TRUE, frac_to_build = 1, 
             indices_to_build = NULL, num_grid_pts, logodds = FALSE, probit = FALSE, ...)

Arguments

`object`	The fitted model to estimate ICE curves for.
`X`	The design matrix we wish to estimate ICE curves for. Rows are observations, columns are predictors. Typically this is taken to be `object`'s training data, but this is not strictly necessary.
`y`	Optional vector of the response values `object` was trained on. It is used to compute y-axis ranges that are useful for plotting. If not passed, the range of predicted values is used and a warning is printed.
`predictor`	The column number or variable name in `X` of the predictor of interest (x_S = X[, j]).
`predictfcn`	Optional function that accepts two arguments, `object` and `newdata`, and returns a vector of length `N` containing `object`'s predicted response for the data in `newdata`. If this argument is not passed, the procedure attempts to find a generic `predict` function corresponding to `class(object)`.
`verbose`	If `TRUE`, prints messages about the procedure's progress.
`frac_to_build`	Number between 0 and 1, with 1 as default. For large `X` matrices or fitted models that are slow to make predictions, specifying `frac_to_build` less than 1 will choose a subset of the observations to build curves for. The subset is chosen such that the selected observations' values of `predictor` are evenly spaced throughout the quantiles of the full `X[, predictor]` vector.
`indices_to_build`	Vector of indices, each element between `1` and `nrow(X)`, specifying which observations to build ICE curves for. As this is an alternative to setting `frac_to_build`, both cannot be specified.
`num_grid_pts`	Optional number of values in the range of `predictor` at which to estimate each curve. If missing, the curves are estimated at each unique value of `predictor` in the `X` observations we estimate ICE curves for.
`logodds`	If `TRUE`, for classification creates PDPs by plotting the centered log-odds implied by the fitted probabilities. We assume that the generic or passed predict function returns probabilities, so the flag tells the procedure to transform them to centered logits after the predictions are generated. Note: `logodds` and `probit` cannot both be `TRUE`.
`probit`	If `TRUE`, for classification creates PDPs by plotting the probit implied by the fitted probabilities. We assume that the generic or passed predict function returns probabilities, so the flag tells the procedure to transform them to probits after the predictions are generated. Note: `probit` and `logodds` cannot both be `TRUE`.
`...`	Other arguments to be passed to `object`'s generic predict function.
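The `frac_to_build` description above can be illustrated with a short base R sketch. It shows one way to pick a subset of rows whose `predictor` values are spread evenly over the sorted predictor. The helper name `pick_spread_rows` and the "every k-th sorted row" rule are assumptions made for illustration; they are not taken from the package source and may differ from what `ice` does internally.

```r
# Hypothetical helper (not part of ICEbox): choose roughly frac * N rows
# whose predictor values are spread evenly across the sorted predictor.
pick_spread_rows <- function(xj, frac) {
  stopifnot(frac > 0, frac <= 1)
  n <- length(xj)
  sorted_idx <- order(xj)              # row indices, increasing in xj
  step <- max(1, round(1 / frac))      # keep every step-th sorted row
  sorted_idx[seq(1, n, by = step)]
}

set.seed(1)
xj <- runif(200)                       # simulated predictor column
rows <- pick_spread_rows(xj, frac = 0.1)
length(rows)                           # number of rows kept
```

The returned indices could then be passed as `indices_to_build` instead of setting `frac_to_build`, since the two arguments are alternatives.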

Value

A list of class `ice` with the following elements:

`gridpts`	Sorted values of `predictor` at which each curve is estimated. Duplicates are removed, so the elements of `gridpts` are unique.
`ice_curves`	Matrix of dimension `nrow(Xice)` by `length(gridpts)`. Each row corresponds to an observation's ICE curve, estimated at the values of `predictor` in `gridpts`.
`xj`	The actual values of `predictor` observed in the data in the order of `Xice`.
`actual_predictions`	Vector of length `nrow(Xice)` containing the model's predictions at the actual values of the predictors, in the order of `Xice`.
`xlab`	String with the predictor name corresponding to `predictor`. If `predictor` is a column number, `xlab` is set to `colnames(X)[predictor]`.
`nominal_axis`	If `TRUE`, `length(gridpts)` is 5 or fewer; otherwise `FALSE`. When `TRUE` the `plot` function treats the x-axis as if x is nominal.
`range_y`	If `y` was passed, the range of the response. Otherwise it defaults to `max(ice_curves) - min(ice_curves)` and a message is printed to the console.
`sd_y`	If `y` was passed, the standard deviation of the response. Otherwise it defaults to `sd(actual_predictions)` and a message is printed to the console.
`Xice`	A matrix containing the subset of `X` for which ICE curves are estimated. Observations are ordered to be increasing in `predictor`. This ordering is the same one as in `ice_curves`, `xj` and `actual_predictions`, meaning for all these objects the `i`-th element refers to the same observation in `X`.
`pdp`	A vector of length `length(gridpts)` that is a numerical approximation to the partial dependence function (PDP) corresponding to the estimated ICE curves. See Goldstein et al. (2013) for a discussion of how the PDP is a form of post-processing. See Friedman (2001) for a description of PDPs.
`predictor`	Same as the argument, see argument description.
`logodds`	Same as the argument, see argument description.
`indices_to_build`	Same as the argument, see argument description.
`frac_to_build`	Same as the argument, see argument description.
`predictfcn`	Same as the argument, see argument description.
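To make the relationship between `Xice`, `gridpts`, `ice_curves`, `actual_predictions` and `pdp` concrete, here is a minimal base R sketch that builds these pieces by hand for a small simulated data set. The data, the `lm` fit and the loop are assumptions chosen for illustration; this is not the ICEbox implementation, and the PDP is taken here as the plain column average of the curves.

```r
# Hypothetical sketch (not the ICEbox source): build ICE curves by hand.
set.seed(42)
dat <- data.frame(x1 = runif(50), x2 = runif(50))
dat$y <- 2 * dat$x1 + dat$x1 * dat$x2 + rnorm(50, sd = 0.1)
fit <- lm(y ~ x1 * x2, data = dat)

predictor <- "x1"
X <- dat[, c("x1", "x2")]
X_sorted <- X[order(X[[predictor]]), ]           # plays the role of Xice
gridpts <- sort(unique(X_sorted[[predictor]]))   # plays the role of gridpts

# One row per observation, one column per grid point.
ice_curves <- matrix(NA_real_, nrow = nrow(X_sorted), ncol = length(gridpts))
for (k in seq_along(gridpts)) {
  X_tmp <- X_sorted
  X_tmp[[predictor]] <- gridpts[k]   # set the predictor to the grid value in every row
  ice_curves[, k] <- predict(fit, newdata = X_tmp)
}

pdp <- colMeans(ice_curves)                               # average of the curves
actual_predictions <- predict(fit, newdata = X_sorted)    # predictions at observed values
```

Because the rows are sorted by the predictor and its values are unique in this simulated data, the `i`-th curve evaluated at the `i`-th grid point reproduces the `i`-th entry of `actual_predictions`.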

References

Jerome Friedman. Greedy Function Approximation: A Gradient Boosting Machine. The Annals of Statistics, 29(5): 1189-1232, 2001.

Alex Goldstein, Adam Kapelner, Justin Bleich, and Emil Pitkin. Peeking Inside the Black Box: Visualizing Statistical Learning With Plots of Individual Conditional Expectation. Journal of Computational and Graphical Statistics, in press, 2014.

Examples

## Not run: 
require(ICEbox)
require(randomForest)
require(MASS) #has Boston Housing data, Pima

########  regression example
data(Boston) #Boston Housing data
X = Boston
y = X$medv
X$medv = NULL

## build a RF:
bhd_rf_mod = randomForest(X, y)

## Create an 'ice' object for the predictor "age":
bhd.ice = ice(object = bhd_rf_mod, X = X, y = y, predictor = "age", frac_to_build = .1) 

#### classification example
data(Pima.te)  #Pima Indians diabetes classification
y = Pima.te$type
X = Pima.te
X$type = NULL

## build a RF:
pima_rf_mod = randomForest(x = X, y = y)

## Create an 'ice' object for the predictor "skin":
# For classification we plot the centered log-odds. If we pass a predict
# function that returns fitted probabilities, setting logodds = TRUE instructs
# the function to set each ice curve to the centered log-odds of the fitted 
# probability.
pima.ice = ice(object = pima_rf_mod, X = X, predictor = "skin", logodds = TRUE,
                    predictfcn = function(object, newdata){ 
                         predict(object, newdata, type = "prob")[, 2]
                    }
              )


## End(Not run)
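The transformations behind `logodds = TRUE` and `probit = TRUE` can be sketched in base R as follows. The helper names are hypothetical, and the two-class centering used here (subtracting the mean of `log(p)` and `log(1 - p)`) is an assumption about what "centered" means; the package may handle details such as probabilities of exactly 0 or 1 differently.

```r
# Hypothetical helpers (not ICEbox functions); p holds fitted probabilities.
centered_logodds <- function(p) {
  # Two-class case: log(p) minus the mean of log(p) and log(1 - p).
  log(p) - 0.5 * (log(p) + log(1 - p))
}
probit_transform <- function(p) {
  qnorm(p)   # standard normal quantile of the fitted probability
}

p <- c(0.1, 0.5, 0.9)
centered_logodds(p)
probit_transform(p)
```

Under this definition the centered log-odds is half the usual logit, so a fitted probability of 0.5 maps to 0 on both scales. Fitted probabilities of exactly 0 or 1 give infinite values under either transformation.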
