Last data update: 2014.03.03

C5.0Control    R Documentation

Control for C5.0 Models

Description

Various parameters that control aspects of the C5.0 fit.

Usage

C5.0Control(subset = TRUE, 
            bands = 0, 
            winnow = FALSE, 
            noGlobalPruning = FALSE, 
            CF = 0.25, 
            minCases = 2, 
            fuzzyThreshold = FALSE, 
            sample = 0, 
            seed = sample.int(4096, size = 1) - 1L,  
            earlyStopping = TRUE,
            label = "outcome")

Arguments

subset

A logical: should the model evaluate groups of discrete predictors for splits? Note: the C5.0 command-line version defaults this parameter to FALSE, meaning no groupings are evaluated during the tree-growing stage.

bands

An integer between 2 and 1000. If supplied, the model orders the rules by their effect on the error rate and groups them into the specified number of bands. This modifies the output so that the effect on the error rate can be seen for the groups of rules within a band. If this option is selected and rules = FALSE, a warning is issued and rules is changed to TRUE. The default of 0 leaves banding off.

winnow

A logical: should predictor winnowing (i.e., feature selection) be used?

noGlobalPruning

A logical: should the final, global pruning step that simplifies the tree be skipped?

CF

A number in (0, 1) for the confidence factor.

minCases

An integer for the smallest number of samples that must be put in at least two of the splits.

fuzzyThreshold

A logical toggle to evaluate possible advanced splits of the data. See Quinlan (1993) for details and examples.

sample

A value in [0, 0.999] that specifies the random proportion of the data that should be used to train the model. By default (sample = 0), all the samples are used for model training. Samples not used for training are used to evaluate the accuracy of the model in the printed output.

seed

An integer for the random number seed within the C code.

earlyStopping

A logical to toggle whether the internal method for stopping boosting should be used.

label

A character label for the outcome used in the output.
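
As an illustration of how the pruning-related arguments are passed to a fit, the sketch below compares the default control with a more aggressive setting (a lower CF and a higher minCases). It uses the built-in iris data rather than the churn data from this page, an assumption made only to keep the snippet self-contained. The size element read at the end is assumed to be the tree-size component described on the C5.0 help page.

```r
library(C50)

# iris ships with base R; it stands in for real data here (an assumption).
x <- iris[, -5]
y <- iris$Species

# Fit with the defaults (CF = 0.25, minCases = 2).
fit_default <- C5.0(x = x, y = y)

# A smaller CF prunes more heavily; a larger minCases demands more samples
# per split.
fit_pruned <- C5.0(x = x, y = y,
                   control = C5.0Control(CF = 0.05, minCases = 10))

# Compare the sizes of the two trees.
fit_default$size
fit_pruned$size
```

Whether the stricter setting actually yields a smaller tree depends on the data, so it is worth checking the sizes rather than assuming a change.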

Value

A list of options.
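
Because the return value is a plain list, it can be built once, inspected, and reused across several fits. A minimal sketch, assuming the list elements carry the same names as the arguments in the Usage section:

```r
library(C50)

# Override two defaults; every other option keeps the value shown in Usage.
ctrl <- C5.0Control(CF = 0.10, minCases = 5)

# Show the full set of options stored in the list.
str(ctrl)

# Individual options can be read with `$`.
ctrl$CF
ctrl$minCases
```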

Author(s)

Original GPL C code by Ross Quinlan; R code and modifications to C by Max Kuhn, Steve Weston and Nathan Coulter.

References

Quinlan R (1993). C4.5: Programs for Machine Learning. Morgan Kaufmann Publishers, http://www.rulequest.com/see5-unix.html

See Also

C5.0, predict.C5.0, summary.C5.0, C5imp

Examples

data(churn)

treeModel <- C5.0(x = churnTrain[, -20], y = churnTrain$churn,
                  control = C5.0Control(winnow = TRUE))
summary(treeModel)
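
A further sketch shows the sample and seed arguments used together to hold out part of the data for evaluation. The built-in iris data and the name holdoutModel are assumptions chosen for illustration; they are not part of the original example.

```r
library(C50)

# Train on roughly 70% of the rows. The seed is passed to the random number
# generator in the C code, with the aim of making the draw repeatable.
ctrl <- C5.0Control(sample = 0.7, seed = 42)

# iris stands in for real data here (an assumption, not part of this page).
holdoutModel <- C5.0(x = iris[, -5], y = iris$Species, control = ctrl)

# The printed output includes an evaluation on the rows not used for training.
summary(holdoutModel)
```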


Results


R version 3.3.1 (2016-06-21) -- "Bug in Your Hair"
Copyright (C) 2016 The R Foundation for Statistical Computing
Platform: x86_64-pc-linux-gnu (64-bit)

> library(C50)
> data(churn)
> 
> treeModel <- C5.0(x = churnTrain[, -20], y = churnTrain$churn,
+                   control = C5.0Control(winnow = TRUE))
> summary(treeModel)

Call:
C5.0.default(x = churnTrain[, -20], y = churnTrain$churn, control
 = C5.0Control(winnow = TRUE))


C5.0 [Release 2.07 GPL Edition]  	Mon Jul  4 15:30:21 2016
-------------------------------

Class specified by attribute `outcome'

Read 3333 cases (20 attributes) from undefined.data

4 attributes winnowed
Estimated importance of remaining attributes:

     51%  total_day_minutes
     40%  international_plan
     32%  total_eve_charge
     25%  voice_mail_plan
     22%  number_customer_service_calls
     20%  total_intl_calls
     18%  total_intl_minutes
     16%  total_day_charge
      9%  total_eve_minutes
     <1%  state
     <1%  account_length
     <1%  area_code
     <1%  total_eve_calls
     <1%  total_night_minutes
     <1%  total_night_calls

Decision tree:

total_day_minutes > 264.4:
:...voice_mail_plan = yes:
:   :...international_plan = no: no (45/1)
:   :   international_plan = yes: yes (8/3)
:   voice_mail_plan = no:
:   :...total_eve_minutes > 187.7:
:       :...total_night_minutes > 126.9: yes (94/1)
:       :   total_night_minutes <= 126.9:
:       :   :...total_day_minutes <= 277: no (4)
:       :       total_day_minutes > 277: yes (3)
:       total_eve_minutes <= 187.7:
:       :...total_eve_charge <= 12.26: no (15/1)
:           total_eve_charge > 12.26:
:           :...total_day_minutes <= 277:
:               :...total_night_minutes <= 224.8: no (13)
:               :   total_night_minutes > 224.8: yes (5/1)
:               total_day_minutes > 277:
:               :...total_night_minutes > 151.9: yes (18)
:                   total_night_minutes <= 151.9:
:                   :...account_length <= 123: no (4)
:                       account_length > 123: yes (2)
total_day_minutes <= 264.4:
:...number_customer_service_calls > 3:
    :...total_day_minutes <= 160.2:
    :   :...total_eve_charge <= 19.83: yes (79/3)
    :   :   total_eve_charge > 19.83:
    :   :   :...total_day_minutes <= 120.5: yes (10)
    :   :       total_day_minutes > 120.5: no (13/3)
    :   total_day_minutes > 160.2:
    :   :...total_eve_charge > 12.05: no (130/24)
    :       total_eve_charge <= 12.05:
    :       :...total_eve_calls <= 125: yes (16/2)
    :           total_eve_calls > 125: no (3)
    number_customer_service_calls <= 3:
    :...international_plan = yes:
        :...total_intl_calls <= 2: yes (51)
        :   total_intl_calls > 2:
        :   :...total_intl_minutes <= 13.1: no (173/7)
        :       total_intl_minutes > 13.1: yes (43)
        international_plan = no:
        :...total_day_minutes <= 223.2: no (2221/60)
            total_day_minutes > 223.2:
            :...total_eve_charge <= 20.5: no (295/22)
                total_eve_charge > 20.5:
                :...voice_mail_plan = yes: no (20)
                    voice_mail_plan = no:
                    :...total_night_minutes > 174.2: yes (50/8)
                        total_night_minutes <= 174.2:
                        :...total_day_minutes <= 246.6: no (12)
                            total_day_minutes > 246.6:
                            :...total_day_charge <= 43.33: yes (4)
                                total_day_charge > 43.33: no (2)


Evaluation on training data (3333 cases):

	    Decision Tree   
	  ----------------  
	  Size      Errors  

	    27  136( 4.1%)   <<


	   (a)   (b)    <-classified as
	  ----  ----
	   365   118    (a): class yes
	    18  2832    (b): class no


	Attribute usage:

	100.00%	total_day_minutes
	 93.67%	number_customer_service_calls
	 87.73%	international_plan
	 20.73%	total_eve_charge
	  8.97%	voice_mail_plan
	  8.01%	total_intl_calls
	  6.48%	total_intl_minutes
	  6.33%	total_night_minutes
	  4.74%	total_eve_minutes
	  0.57%	total_eve_calls
	  0.18%	account_length
	  0.18%	total_day_charge


Time: 0.1 secs
