This function produces predicted classes or confidence values from a C5.0 model.
Usage
## S3 method for class 'C5.0'
predict(object, newdata = NULL,
trials = object$trials["Actual"],
type = "class", na.action = na.pass, ...)
Arguments
object
an object of class C5.0
newdata
a matrix or data frame of predictors
trials
an integer for how many boosting iterations are used
for prediction. See the note below.
type
either "class" for the predicted class or "prob" for model confidence values.
na.action
when using a formula for the original model fit, how should missing values be handled?
...
other options (not currently used)
Details
Note that the number of trials in the object may be less than what was
specified originally (unless earlyStopping = FALSE was used in
C5.0Control). If the number requested is larger than the
actual number available, the maximum actual is used and a warning is
issued.
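As a minimal sketch of the note above, the following fits a boosted model and caps the requested number of trials at the number actually retained. The iris data set and the value 3 are stand-ins chosen for illustration; they are not part of the original example.

```r
library(C50)

# iris ships with base R; it is used here only as a stand-in data set
boosted <- C5.0(x = iris[, -5], y = iris$Species, trials = 10)

# Named vector with elements "Requested" and "Actual"
boosted$trials

# Cap the request at the number of trials actually available,
# which avoids the warning described above
n_use <- min(3, boosted$trials["Actual"])
pred <- predict(boosted, newdata = iris[1:5, -5], trials = n_use)
pred
```

Reading the "Actual" element first is one way to keep prediction code quiet when early stopping has shortened the boosting sequence.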
Model confidence values reflect the distribution of the classes in
terminal nodes or within rules.
For rule-based models (i.e. not boosted), the predicted confidence
value is the confidence value from the most specific, active
rule. Note that C4.5 sorts the rules, and uses the first active rule
for prediction. However, the default in the original sources did not
normalize the confidence values. For example, for two classes it was
possible to get confidence values of (0.3815, 0.8850) or (0.0000,
0.922), which do not add to one. For rules, this code divides the
values by their sum. The previous values would be converted to
(0.3012, 0.6988) and (0, 1). There are also cases where no rule is
activated. Here, equal values are assigned to each class.
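The normalization described above can be reproduced with a few lines of base R. This is only a sketch: normalize_conf is a hypothetical helper, not a function in the package, and representing the "no rule activated" case as a vector of zeros is an assumption made for illustration.

```r
# Hypothetical helper (not part of C50): divide the per-class confidence
# values by their sum so that they add to one.
normalize_conf <- function(conf) {
  total <- sum(conf)
  if (total == 0) {
    # Assumption: "no rule activated" is represented here as all zeros,
    # in which case every class receives the same value
    return(rep(1 / length(conf), length(conf)))
  }
  conf / total
}

round(normalize_conf(c(0.3815, 0.8850)), 4)  # 0.3012 0.6988
normalize_conf(c(0.0000, 0.922))             # 0 1
normalize_conf(c(0, 0))                      # 0.5 0.5
```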
For boosting, the per-class confidence values are aggregated over all
of the trees created during the boosting process and these aggregate
values are normalized so that the overall per-class confidence values
sum to one.
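One way to check this behaviour is to request confidence values from a boosted model and inspect the row sums. The sketch below assumes the iris data set and five trials purely for illustration; it does not show how the aggregation is carried out internally.

```r
library(C50)

# Boosted model on a stand-in data set (iris ships with base R)
boosted <- C5.0(x = iris[, -5], y = iris$Species, trials = 5)

# One row per sample, one column per class
probs <- predict(boosted, newdata = iris[c(1, 51, 101), -5], type = "prob")
probs

# Each row should sum to one, up to floating-point rounding
rowSums(probs)
```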
When the costs argument of C5.0 is used to fit the model, class
probabilities derived from the class distribution in the terminal
nodes may not be consistent with the final predicted class. For this
reason, requesting class probabilities from a model using unequal
costs will throw an error.
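The following sketch shows how the error might be handled in calling code. The cost matrix values and the use of iris are assumptions made for illustration; only the behaviour described above (an error when type = "prob" is combined with unequal costs) is taken from this page.

```r
library(C50)

classes <- levels(iris$Species)

# Illustrative cost matrix: zero on the diagonal, unequal costs elsewhere
cost_mat <- matrix(1, nrow = 3, ncol = 3, dimnames = list(classes, classes))
diag(cost_mat) <- 0
cost_mat[1, 2] <- 5

cost_model <- C5.0(x = iris[, -5], y = iris$Species, costs = cost_mat)

# Class predictions are still available
predict(cost_model, newdata = iris[1:3, -5])

# Requesting confidence values is expected to raise an error, caught here
res <- tryCatch(
  predict(cost_model, newdata = iris[1:3, -5], type = "prob"),
  error = function(e) e
)
inherits(res, "error")
```

Wrapping the call in tryCatch lets a scoring pipeline fall back to class predictions instead of stopping.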
Value
When type = "class", a factor vector is returned. When type = "prob", a matrix of confidence values is returned (one column per class).
Author(s)
Original GPL C code by Ross Quinlan, R code and modifications to C by
Max Kuhn, Steve Weston and Nathan Coulter
Examples
data(churn)
treeModel <- C5.0(x = churnTrain[, -20], y = churnTrain$churn)
predict(treeModel, head(churnTest[, -20]))
predict(treeModel, head(churnTest[, -20]), type = "prob")
Results
> library(C50)
> ### Name: predict.C5.0
> ### Title: Predict new samples using a C5.0 model
> ### Aliases: predict.C5.0
> ### Keywords: models
>
> ### ** Examples
>
> data(churn)
>
> treeModel <- C5.0(x = churnTrain[, -20], y = churnTrain$churn)
> predict(treeModel, head(churnTest[, -20]))
[1] no no no no no no
Levels: yes no
> predict(treeModel, head(churnTest[, -20]), type = "prob")
         yes        no
1 0.02706792 0.9729321
2 0.01114727 0.9888527
3 0.02488945 0.9751106
4 0.02706792 0.9729321
5 0.02706792 0.9729321
6 0.07481390 0.9251861