Last data update: 2014.03.03

NHEMO_Cutoff    R Documentation

Non-hierarchical evolutionary multi-objective optimization with local cutoff optimization

Description

NHEMO_Cutoff performs cost-sensitive classification by solving a non-hierarchical two-objective optimization problem: minimizing the misclassification rate and minimizing the total costs for classification. NHEMO_Cutoff is based on an evolutionary multi-objective optimization algorithm (EMOA) with tree representation and local cutoff optimization. Cutoffs of the tree learner are optimized analogously to a classification tree with recursive partitioning, based on either the Gini index or the misclassification rate.

Usage

NHEMO_Cutoff(data, CostMatrix, 
             gens = 50, popsize = 50, max_nodes = 10, 
             ngens = 14, bound = 10^-10, 
             init_prob = 0.8, 
             ps = c("tournament", "roulette", "winkler"), tournament_size = 4, 
             crossover = c("standard", "brood", "poli"), brood_size = 4, 
             crossover_prob = 0.5, mutation_prob = 0.5, 
             CV = 5, vim = 0, 
             ncutoffs = 10, opt = c("gini", "mcr"))

Arguments

data

A data frame containing the class of each observation in the first column and the explanatory variables in the remaining columns.

CostMatrix

A data frame containing the names (first column) and the costs (second column) of each explanatory variable in data. NHEMO_Cutoff does not work with missing values in CostMatrix.

gens

Maximal number of generations of the EMOA (default: gens=50).

popsize

Population size in each generation of the EMOA (default: popsize=50).

max_nodes

Maximal number of nodes within each tree (default: max_nodes=10).

ngens

Number of preceding generations considered by the Online Convergence Detection (OCD; default: ngens=14); see Details.

bound

Variance limit for the Online Convergence Detection (default: bound=10^-10).

init_prob

Degree of initialization in [0,1], i.e. the probability of a node having a subnode (default: init_prob=0.80).

ps

Type of parent selection, "tournament" for tournament selection (default), "roulette" for roulette-wheel selection, and "winkler" for random selection of the first parent and the second parent by roulette-wheel selection.

tournament_size

Size of tournament for ps="tournament" (default: tournament_size=4).

crossover

Crossover operator: "standard" for one-point crossover swapping two randomly chosen subtrees of the parents, "brood" for brood crossover (for details see Tackett (1994)), and "poli" for a size-dependent crossover with the crossover point taken from the so-called common region of both parents (for details see Poli and Langdon (1998)).

brood_size

Number of offspring created by brood crossover (default: brood_size=4).

crossover_prob

Probability to perform crossover in [0,1] (default: crossover_prob=0.50).

mutation_prob

Probability to perform mutation in [0,1] (default: mutation_prob=0.50).

CV

Number of cross-validation steps, a natural number greater than 1 (default: CV=5).

vim

Variable importance measure to be used to improve standard crossover. vim=0 for no variable importance measure (default), vim=1 for 'simple absolute frequency', vim=2 for 'simple relative frequency', vim=3 for 'relative frequency', vim=4 for 'linear weighted relative frequency', vim=5 for 'exponential weighted relative frequency', and vim=6 for 'permutation accuracy importance'.

ncutoffs

Number of cutoffs per explanatory variable to be tested for optimality (default: ncutoffs=10).

opt

Type of local cutoff optimization, "gini" for local cutoff optimization by Gini index (default), "mcr" for local cutoff optimization by misclassification rate.

Details

The non-hierarchical evolutionary multi-objective tree learner (NHEMOtree) with local cutoff optimization solves a two-objective optimization problem: minimizing the misclassification rate and minimizing the total costs for classification (the summed costs of all variables used in the classifier). It is based on an EMOA with tree representation. It optimizes both objectives simultaneously without any hierarchy and generates Pareto-optimal classifiers in the form of binary trees. Cutoffs of the tree learner are optimized analogously to a classification tree with recursive partitioning, based on either the Gini index or the misclassification rate.
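The local cutoff search described above can be illustrated with a minimal sketch in base R. It is not the package's implementation: the helper names (gini_impurity, split_gini, best_cutoff) are hypothetical, and the choice of inner quantiles as candidate cutoffs is an assumption, since this page does not say how the ncutoffs candidates are selected.

```r
# Gini impurity of a vector of class labels: 1 - sum(p_k^2)
gini_impurity <- function(y) {
  if (length(y) == 0) return(0)
  p <- table(y) / length(y)
  1 - sum(p^2)
}

# Weighted Gini impurity after splitting at `cutoff` (x <= cutoff vs. x > cutoff)
split_gini <- function(x, y, cutoff) {
  left <- x <= cutoff
  n <- length(y)
  (sum(left) / n) * gini_impurity(y[left]) +
    (sum(!left) / n) * gini_impurity(y[!left])
}

# Evaluate `ncutoffs` candidate cutoffs and return the one with the lowest impurity.
# Assumption: candidates are inner quantiles of x; the package may choose differently.
best_cutoff <- function(x, y, ncutoffs = 10) {
  probs <- seq_len(ncutoffs) / (ncutoffs + 1)
  candidates <- unique(as.numeric(quantile(x, probs = probs)))
  scores <- vapply(candidates, function(cu) split_gini(x, y, cu), numeric(1))
  list(cutoff = candidates[which.min(scores)], gini = min(scores))
}

# Toy data: the two classes separate perfectly between x = 5 and x = 6
x <- 1:10
y <- factor(rep(c("a", "b"), each = 5))
res_cut <- best_cutoff(x, y, ncutoffs = 9)
res_cut
```

Replacing gini_impurity with the share of observations outside the majority class would give the corresponding sketch for opt="mcr".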

The termination criteria of NHEMO_Cutoff are the maximal number of generations and the Online Convergence Detection (OCD) proposed by Wagner and Trautmann (2010). Here, OCD uses the dominated hypervolume as quality criterion. If its variance over the last g generations (argument ngens) is significantly below a given threshold L (argument bound) according to the one-sided χ^2 variance test, OCD stops the run. The default values follow the parameter settings suggested by Wagner and Trautmann (2010).
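The variance criterion can be sketched as a one-sided χ^2 variance test in base R. This is a simplified illustration only, not the package's code, and the full procedure in Wagner and Trautmann (2010) involves more than this single test. The function names, the significance level alpha = 0.05, and the hypervolume traces are assumptions made for the example.

```r
# One-sided chi-squared variance test.
# H0: variance >= bound; H1: variance < bound.
# Returns the p-value; small values indicate a variance significantly below `bound`.
chisq_var_test <- function(values, bound) {
  n <- length(values)
  statistic <- (n - 1) * var(values) / bound
  pchisq(statistic, df = n - 1)  # lower-tail probability
}

# Hypothetical helper: test the last `ngens` hypervolume values.
# alpha is an assumed significance level, not documented on this page.
ocd_converged <- function(hv_history, ngens = 14, bound = 1e-10, alpha = 0.05) {
  if (length(hv_history) < ngens) return(FALSE)
  recent <- tail(hv_history, ngens)
  chisq_var_test(recent, bound) <= alpha
}

# Made-up hypervolume traces
hv_flat   <- rep(100, 20)                     # no change over the last generations
hv_rising <- seq(50, 100, length.out = 20)    # still improving

ocd_converged(hv_flat)    # TRUE: variance is 0, the run would stop
ocd_converged(hv_rising)  # FALSE: variance far above the bound
```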

Observations with missing values in the grouping variable or the explanatory variables are excluded from the analysis automatically. NHEMO_Cutoff does not work with missing values in "CostMatrix". Setting the costs of all explanatory variables to 1 makes the second objective equivalent to minimizing the number of explanatory variables in the tree learner.
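As a rough illustration of the two objectives, the following base R sketch computes the total cost of a classifier from a cost table and filters a set of candidate solutions down to the non-dominated ones. The cost values, the objective values, and the helper names (total_cost, pareto_front) are invented for the example and do not come from the package.

```r
# Hypothetical cost table in the layout described for CostMatrix:
# variable names in the first column, costs in the second.
costs <- data.frame(Variable = c("X1", "X2", "X3"),
                    Cost     = c(10, 25, 5),
                    stringsAsFactors = FALSE)

# Total cost of a classifier: sum of the costs of the distinct variables it uses
total_cost <- function(used_vars, cost_table) {
  sum(cost_table[[2]][cost_table[[1]] %in% unique(used_vars)])
}

total_cost(c("X1", "X3", "X1"), costs)        # 15

# With all costs set to 1, the total cost is the number of distinct variables used
unit_costs <- transform(costs, Cost = 1)
total_cost(c("X1", "X3", "X1"), unit_costs)   # 2

# Non-dominated filter for two objectives that are both minimized.
# Row i is dominated if another row is no worse in both objectives
# and strictly better in at least one.
pareto_front <- function(obj) {
  dominated <- vapply(seq_len(nrow(obj)), function(i) {
    any(obj[, 1] <= obj[i, 1] & obj[, 2] <= obj[i, 2] &
        (obj[, 1] < obj[i, 1] | obj[, 2] < obj[i, 2]))
  }, logical(1))
  obj[!dominated, , drop = FALSE]
}

# Made-up (misclassification rate, total cost) pairs for four trees
objs <- cbind(mcr = c(0.25, 0.30, 0.40, 0.35), cost = c(40, 15, 5, 30))
front <- pareto_front(objs)
front
```

In this toy set, the fourth tree is dropped because the second tree has both a lower misclassification rate and a lower cost; the remaining three are mutually non-dominated trade-offs.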

Author(s)

Swaantje Casjens

References

R. Poli and W.B. Langdon. Schema theory for genetic programming with one-point crossover and point mutation. Evolutionary Computation, 6(3):231-252, 1998.

W.A. Tackett. Recombination, selection, and the genetic construction of computer programs. PhD thesis, University of Southern California, 1994.

T. Wagner and H. Trautmann. Online convergence detection for evolutionary multiobjective algorithms revisited. In: IEEE Congress on Evolutionary Computation, 1-8, 2010.

See Also

NHEMOtree

Examples

# Simulation of data and costs
  d         <- Sim_Data(Obs=200)
  CostMatrix<- Sim_Costs()

# NHEMO_Cutoff calculations with function NHEMOtree and method="NHEMO_Cutoff"
  res<- NHEMOtree(method="NHEMO_Cutoff", formula=Y2~., data=d, CostMatrix=CostMatrix, 
                  gens=5, popsize=5,
                  max_nodes=5, ngens=5, bound=10^-10, init_prob=0.8, 
                  ps="tournament", tournament_size=4, crossover="standard", 
                  crossover_prob=0.1, mutation_prob=0.1, 
                  CV=5, vim=1,
                  ncutoffs=5, opt="mcr")
  res

Results

> library(NHEMOtree)
Loading required package: partykit
Loading required package: grid
Loading required package: emoa
Loading required package: sets
Loading required package: rpart
> # Simulation of data and costs
>   d         <- Sim_Data(Obs=200)
>   CostMatrix<- Sim_Costs()
> 
> # NHEMO_Cutoff calculations with function NHEMOtree and method="NHEMO_Cutoff"
>   res<- NHEMOtree(method="NHEMO_Cutoff", formula=Y2~., data=d, CostMatrix=CostMatrix, 
+                   gens=5, popsize=5,
+                   max_nodes=5, ngens=5, bound=10^-10, init_prob=0.8, 
+                   ps="tournament", tournament_size=4, crossover="standard", 
+                   crossover_prob=0.1, mutation_prob=0.1, 
+                   CV=5, vim=1,
+                   ncutoffs=5, opt="mcr")
>   res
S metric:                     5857.598 
Misclassification (min/max):  23.25246 48.90833 
Costs (min/max):              17.85714 46.42857 