Last data update: 2014.03.03

C5.0.default

R Documentation

C5.0 Decision Trees and Rule-Based Models

Description

Fit classification tree models or rule-based models using Quinlan's C5.0 algorithm.

Usage

C5.0(x, ...)

## Default S3 method:
C5.0(x, y, trials = 1, rules = FALSE, 
     weights = NULL, 
     control = C5.0Control(), 
     costs = NULL, ...)


## S3 method for class 'formula'
C5.0(formula, data, weights, subset,
     na.action = na.pass, ...)

Arguments

`x`	a data frame or matrix of predictors.
`y`	a factor vector with 2 or more levels
`trials`	an integer specifying the number of boosting iterations. A value of one indicates that a single model is used.
`rules`	A logical: should the tree be decomposed into a rule-based model?
`weights`	an optional numeric vector of case weights. Note that the data used for the case weights will not be used as a splitting variable in the model (see http://www.rulequest.com/see5-win.html#CASEWEIGHT for Quinlan's notes on case weights).
`control`	a list of control parameters; see `C5.0Control`
`costs`	a matrix of costs associated with the possible errors. The matrix should have C rows and C columns, where C is the number of class levels.
`formula`	a formula, with a response and at least one predictor.
`data`	an optional data frame in which to interpret the variables named in the formula.
`subset`	optional expression saying that only a subset of the rows of the data should be used in the fit.
`na.action`	a function which indicates what should happen when the data contain `NA`s. The default is to include missing values since the model can accommodate them.
`...`	other options to pass into the function (not currently used with default method)

Details

This model extends the C4.5 classification algorithms described in Quinlan (1993). The details of the extensions are largely undocumented. The model can take the form of a full decision tree or a collection of rules (or boosted versions of either).

When using the formula method, factors and other classes are preserved (i.e. dummy variables are not automatically created). This particular model handles non-numeric data of some types (such as character, factor and ordered data).

The cost matrix should be C x C, where C is the number of classes. Diagonal elements are ignored. Columns should correspond to the true classes and rows to the predicted classes. For example, if C = 3 with classes Red, Blue and Green (in that order), a value of 5 in the (2,3) element of the matrix would indicate that the cost of predicting a Green sample as Blue is five times the usual value (of one). Note that when costs are used, class probabilities cannot be generated using predict.C5.0.
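
As a minimal sketch of the layout described above, the following base R code builds the 3 x 3 cost matrix for the Red/Blue/Green example. The object names `classes` and `costMat` are illustrative only, and naming both dimensions is an assumption made here for readability rather than something this page requires.

```r
# Class levels, in the order used in the text
classes <- c("Red", "Blue", "Green")

# Start from the usual cost of one for every error
costMat <- matrix(1, nrow = 3, ncol = 3,
                  dimnames = list(predicted = classes, true = classes))

# Diagonal elements are ignored by the model; zero them for clarity
diag(costMat) <- 0

# Row 2 (predicted Blue), column 3 (true Green): five times the usual cost
costMat[2, 3] <- 5

print(costMat)
```

A matrix like this would then be supplied through the `costs` argument, for example `C5.0(x, y, costs = costMat)`.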

Internally, the code will attempt to halt boosting if it appears to be ineffective. For this reason, the number of boosting iterations the model actually used may differ from the requested value of trials. There is an option to turn this off in C5.0Control.
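
The requested and actual number of boosting iterations can be compared through the `trials` element of the fitted object. The sketch below is illustrative only: it uses the built-in `iris` data so that it does not depend on the churn data, and it assumes that the `earlyStopping` argument of `C5.0Control` is the option referred to above.

```r
library(C50)

# iris is used only so that the sketch is self-contained
boosted <- C5.0(x = iris[, 1:4], y = iris$Species, trials = 10)

# Named vector with elements "Requested" and "Actual"
boosted$trials

# Same request with early stopping disabled (assumed option name)
noStop <- C5.0(x = iris[, 1:4], y = iris$Species, trials = 10,
               control = C5.0Control(earlyStopping = FALSE))
noStop$trials
```

If `Actual` is smaller than `Requested`, boosting was halted before all requested iterations were run.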

Value

An object of class C5.0 with elements:

`boostResults`	a parsed version of the boosting table(s) shown in the output
`call`	the function call
`caseWeights`	not currently supported.
`control`	an echo of the specifications from `C5.0Control`
`cost`	the text version of the cost matrix (or "")
`costMatrix`	an echo of the `costs` argument
`dims`	original dimensions of the predictor matrix or data frame
`levels`	a character vector of factor levels for the outcome
`names`	a string version of the names file
`output`	a string version of the command line output
`predictors`	a character vector of predictor names
`rbm`	a logical for rules
`rules`	a character version of the rules file
`size`	an integer vector of the tree/rule size (or sizes in the case of boosting)
`tree`	a string version of the tree file
`trials`	a named vector with elements `Requested` (an echo of the function call) and `Actual` (how many the model used)
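
The elements listed above can be read directly from a fitted object with `$`. The following is a small sketch, again using the built-in `iris` data as a stand-in; the object name `fit` is arbitrary.

```r
library(C50)

fit <- C5.0(x = iris[, 1:4], y = iris$Species)

fit$levels        # factor levels of the outcome
fit$predictors    # predictor names
fit$dims          # dimensions of the predictor data
fit$size          # tree size
fit$rbm           # logical flag for a rule-based model
fit$trials        # requested and actual boosting iterations

# The stored command line output is a single string and can be printed
cat(fit$output)
```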

Note

The command line version currently supports more data types than the R port. Currently, numeric, factor and ordered factor predictors are allowed.
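
Given the restriction on predictor types, one conservative option is to convert character columns to factors before fitting. This is offered as an assumption about good practice, not as a documented requirement; the data frame `dat` and its columns are hypothetical.

```r
# Hypothetical data frame with a character predictor
dat <- data.frame(plan = c("basic", "premium", "basic", "premium"),
                  minutes = c(120.5, 264.4, 98.0, 301.2),
                  stringsAsFactors = FALSE)

# Convert character columns to factors; leave other columns unchanged
isChar <- vapply(dat, is.character, logical(1))
dat[isChar] <- lapply(dat[isChar], factor)

str(dat)
```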

Author(s)

Original GPL C code by Ross Quinlan, R code and modifications to C by Max Kuhn, Steve Weston and Nathan Coulter

References

Quinlan R (1993). C4.5: Programs for Machine Learning. Morgan Kaufmann Publishers. http://www.rulequest.com/see5-unix.html

Examples

data(churn)

treeModel <- C5.0(x = churnTrain[, -20], y = churnTrain$churn)
treeModel
summary(treeModel)

ruleModel <- C5.0(churn ~ ., data = churnTrain, rules = TRUE)
ruleModel
summary(ruleModel)
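
Predictions from a fitted model are obtained with the `predict` method (`predict.C5.0`, mentioned in Details). The following sketch is not part of the original examples: it uses the built-in `iris` data with an arbitrary hold-out split, and the object names are illustrative.

```r
library(C50)

# Hold out every fifth row of iris for illustration
testRows <- seq(5, nrow(iris), by = 5)
train <- iris[-testRows, ]
test  <- iris[testRows, ]

fit <- C5.0(Species ~ ., data = train)

# Predicted classes and class probabilities for the held-out rows
predClass <- predict(fit, newdata = test, type = "class")
predProb  <- predict(fit, newdata = test, type = "prob")

table(predicted = predClass, observed = test$Species)
head(predProb)
```

As noted in Details, `type = "prob"` is not available when the model was fit with a cost matrix.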

Results


R version 3.3.1 (2016-06-21) -- "Bug in Your Hair"
Copyright (C) 2016 The R Foundation for Statistical Computing
Platform: x86_64-pc-linux-gnu (64-bit)

> library(C50)
> png(filename="/home/ddbj/snapshot/RGM3/R_CC/result/C50/C5.0.Rd_%03d_medium.png", width=480, height=480)
> ### Name: C5.0.default
> ### Title: C5.0 Decision Trees and Rule-Based Models
> ### Aliases: C5.0.default C5.0.formula C5.0
> ### Keywords: models
> 
> ### ** Examples
> 
> data(churn)
> 
> treeModel <- C5.0(x = churnTrain[, -20], y = churnTrain$churn)
> treeModel

Call:
C5.0.default(x = churnTrain[, -20], y = churnTrain$churn)

Classification Tree
Number of samples: 3333 
Number of predictors: 19 

Tree size: 27 

Non-standard options: attempt to group attributes

> summary(treeModel)

Call:
C5.0.default(x = churnTrain[, -20], y = churnTrain$churn)


C5.0 [Release 2.07 GPL Edition]  	Mon Jul  4 15:30:26 2016
-------------------------------

Class specified by attribute `outcome'

Read 3333 cases (20 attributes) from undefined.data

Decision tree:

total_day_minutes > 264.4:
:...voice_mail_plan = yes:
:   :...international_plan = no: no (45/1)
:   :   international_plan = yes: yes (8/3)
:   voice_mail_plan = no:
:   :...total_eve_minutes > 187.7:
:       :...total_night_minutes > 126.9: yes (94/1)
:       :   total_night_minutes <= 126.9:
:       :   :...total_day_minutes <= 277: no (4)
:       :       total_day_minutes > 277: yes (3)
:       total_eve_minutes <= 187.7:
:       :...total_eve_charge <= 12.26: no (15/1)
:           total_eve_charge > 12.26:
:           :...total_day_minutes <= 277:
:               :...total_night_minutes <= 224.8: no (13)
:               :   total_night_minutes > 224.8: yes (5/1)
:               total_day_minutes > 277:
:               :...total_night_minutes > 151.9: yes (18)
:                   total_night_minutes <= 151.9:
:                   :...account_length <= 123: no (4)
:                       account_length > 123: yes (2)
total_day_minutes <= 264.4:
:...number_customer_service_calls > 3:
    :...total_day_minutes <= 160.2:
    :   :...total_eve_charge <= 19.83: yes (79/3)
    :   :   total_eve_charge > 19.83:
    :   :   :...total_day_minutes <= 120.5: yes (10)
    :   :       total_day_minutes > 120.5: no (13/3)
    :   total_day_minutes > 160.2:
    :   :...total_eve_charge > 12.05: no (130/24)
    :       total_eve_charge <= 12.05:
    :       :...total_eve_calls <= 125: yes (16/2)
    :           total_eve_calls > 125: no (3)
    number_customer_service_calls <= 3:
    :...international_plan = yes:
        :...total_intl_calls <= 2: yes (51)
        :   total_intl_calls > 2:
        :   :...total_intl_minutes <= 13.1: no (173/7)
        :       total_intl_minutes > 13.1: yes (43)
        international_plan = no:
        :...total_day_minutes <= 223.2: no (2221/60)
            total_day_minutes > 223.2:
            :...total_eve_charge <= 20.5: no (295/22)
                total_eve_charge > 20.5:
                :...voice_mail_plan = yes: no (20)
                    voice_mail_plan = no:
                    :...total_night_minutes > 174.2: yes (50/8)
                        total_night_minutes <= 174.2:
                        :...total_day_minutes <= 246.6: no (12)
                            total_day_minutes > 246.6:
                            :...total_day_charge <= 43.33: yes (4)
                                total_day_charge > 43.33: no (2)


Evaluation on training data (3333 cases):

	    Decision Tree   
	  ----------------  
	  Size      Errors  

	    27  136( 4.1%)   <<


	   (a)   (b)    <-classified as
	  ----  ----
	   365   118    (a): class yes
	    18  2832    (b): class no


	Attribute usage:

	100.00%	total_day_minutes
	 93.67%	number_customer_service_calls
	 87.73%	international_plan
	 20.73%	total_eve_charge
	  8.97%	voice_mail_plan
	  8.01%	total_intl_calls
	  6.48%	total_intl_minutes
	  6.33%	total_night_minutes
	  4.74%	total_eve_minutes
	  0.57%	total_eve_calls
	  0.18%	account_length
	  0.18%	total_day_charge


Time: 0.1 secs

> 
> ruleModel <- C5.0(churn ~ ., data = churnTrain, rules = TRUE)
> ruleModel

Call:
C5.0.formula(formula = churn ~ ., data = churnTrain, rules = TRUE)

Rule-Based Model
Number of samples: 3333 
Number of predictors: 19 

Number of Rules: 19 

Non-standard options: attempt to group attributes

> summary(ruleModel)

Call:
C5.0.formula(formula = churn ~ ., data = churnTrain, rules = TRUE)


C5.0 [Release 2.07 GPL Edition]  	Mon Jul  4 15:30:26 2016
-------------------------------

Class specified by attribute `outcome'

Read 3333 cases (20 attributes) from undefined.data

Rules:

Rule 1: (60, lift 6.8)
	international_plan = yes
	total_intl_calls <= 2
	->  class yes  [0.984]

Rule 2: (57, lift 6.8)
	international_plan = yes
	total_intl_minutes > 13.1
	->  class yes  [0.983]

Rule 3: (32, lift 6.7)
	total_day_minutes <= 120.5
	number_customer_service_calls > 3
	->  class yes  [0.971]

Rule 4: (79/3, lift 6.6)
	total_day_minutes <= 160.2
	total_eve_charge <= 19.83
	number_customer_service_calls > 3
	->  class yes  [0.951]

Rule 5: (43/2, lift 6.4)
	international_plan = no
	voice_mail_plan = no
	total_day_minutes > 246.6
	total_eve_charge > 20.5
	->  class yes  [0.933]

Rule 6: (28/2, lift 6.2)
	total_day_minutes <= 264.4
	total_eve_calls <= 125
	total_eve_charge <= 12.05
	number_customer_service_calls > 3
	->  class yes  [0.900]

Rule 7: (78/8, lift 6.1)
	voice_mail_plan = no
	total_day_minutes > 223.2
	total_eve_charge > 20.5
	total_night_minutes > 174.2
	->  class yes  [0.888]

Rule 8: (114/24, lift 5.4)
	voice_mail_plan = no
	total_day_minutes > 223.2
	total_eve_charge > 20.5
	->  class yes  [0.784]

Rule 9: (152/58, lift 4.3)
	total_day_minutes > 223.2
	total_eve_charge > 20.5
	->  class yes  [0.617]

Rule 10: (211/84, lift 4.1)
	total_day_minutes > 264.4
	->  class yes  [0.601]

Rule 11: (2221/60, lift 1.1)
	international_plan = no
	total_day_minutes <= 223.2
	number_customer_service_calls <= 3
	->  class no  [0.973]

Rule 12: (768/20, lift 1.1)
	international_plan = no
	voice_mail_plan = yes
	number_customer_service_calls <= 3
	->  class no  [0.973]

Rule 13: (140/5, lift 1.1)
	account_length <= 123
	total_eve_minutes <= 187.7
	total_night_minutes <= 151.9
	->  class no  [0.958]

Rule 14: (45/1, lift 1.1)
	international_plan = no
	voice_mail_plan = yes
	total_day_minutes > 264.4
	->  class no  [0.957]

Rule 15: (1972/87, lift 1.1)
	total_day_minutes <= 264.4
	total_intl_minutes <= 13.1
	total_intl_calls > 2
	number_customer_service_calls <= 3
	->  class no  [0.955]

Rule 16: (197/9, lift 1.1)
	total_day_minutes > 120.5
	total_day_minutes <= 160.2
	total_eve_charge > 19.83
	->  class no  [0.950]

Rule 17: (155/10, lift 1.1)
	voice_mail_plan = no
	total_day_minutes <= 277
	total_night_minutes <= 126.9
	->  class no  [0.930]

Rule 18: (1675/185, lift 1.0)
	total_day_minutes > 160.2
	total_day_minutes <= 264.4
	total_eve_charge > 12.05
	->  class no  [0.889]

Rule 19: (434/49, lift 1.0)
	total_eve_charge <= 12.26
	->  class no  [0.885]

Default class: no


Evaluation on training data (3333 cases):

	        Rules     
	  ----------------
	    No      Errors

	    19  146( 4.4%)   <<


	   (a)   (b)    <-classified as
	  ----  ----
	   371   112    (a): class yes
	    34  2816    (b): class no


	Attribute usage:

	 98.23%	total_day_minutes
	 84.61%	number_customer_service_calls
	 75.73%	international_plan
	 71.83%	total_eve_charge
	 60.97%	total_intl_calls
	 60.88%	total_intl_minutes
	 31.02%	voice_mail_plan
	 10.11%	total_night_minutes
	  4.20%	account_length
	  4.20%	total_eve_minutes
	  0.84%	total_eve_calls


Time: 0.1 secs

> 
> dev.off()
null device 
          1 
>
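
The figures in the summaries above can be checked by hand. The sketch below recomputes the training error of the tree from its confusion matrix and the statistics printed for Rules 1 and 4. It assumes, following the description in the See5/C5.0 tutorial, that the bracketed value on each rule is the Laplace ratio (n - m + 1)/(n + 2) for a rule covering n cases with m errors, and that lift is that value divided by the relative frequency of the predicted class in the training data. The helper `laplace` is a name introduced here.

```r
# Confusion matrix from the tree summary: rows are true classes,
# columns are the classes assigned by the model
conf <- matrix(c(365,  118,
                  18, 2832),
               nrow = 2, byrow = TRUE,
               dimnames = list(true = c("yes", "no"),
                               classified = c("yes", "no")))

nCases    <- sum(conf)                   # 3333 training cases
errors    <- nCases - sum(diag(conf))    # 136 misclassified cases
errorRate <- errors / nCases

# Laplace ratio for a rule covering n cases with m errors (assumed formula)
laplace <- function(n, m) (n - m + 1) / (n + 2)

# Relative frequency of class "yes" in the training data
priorYes <- sum(conf["yes", ]) / nCases

conf1 <- laplace(60, 0)    # Rule 1: (60, lift 6.8)
conf4 <- laplace(79, 3)    # Rule 4: (79/3, lift 6.6)

round(c(errorRate = errorRate, rule1 = conf1, rule4 = conf4), 3)
round(c(lift1 = conf1 / priorYes, lift4 = conf4 / priorYes), 1)
```

Under these assumptions the results agree with the printed values: an error rate of 4.1%, confidences of 0.984 and 0.951, and lifts of 6.8 and 6.6.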