Last data update: 2014.03.03

R: Function for Exploratory Projection Pursuit.
EPPlab    R Documentation

Function for Exploratory Projection Pursuit.

Description

REPPlab optimizes a projection pursuit (PP) index over several runs, using either a Genetic Algorithm (GA) or one of two Particle Swarm Optimisation (PSO) algorithms. The optimization is implemented in the Java program EPP-lab. One of the PSO algorithms is the classic version, while the other is a parameter-free extension called Tribes. The parameters of the algorithms (maxiter and individuals for GA, maxiter and particles for PSO) can be modified by the user. The PP indices are the well-known Friedman and Friedman-Tukey indices, the kurtosis, and a so-called discriminant index devoted to the detection of groups. At each run, the function finds a local optimum of the PP index and returns the associated projection direction and criterion value.

Usage

EPPlab(x, PPindex = "KurtosisMax", PPalg = "GA", n.simu = 20,
  sphere = FALSE, maxiter = NULL, individuals = NULL, particles = NULL,
  step_iter = 10, eps = 10^(-6))

Arguments

x

Matrix where each row is an observation and each column a dimension.

PPindex

The projection pursuit index to use, see details.

PPalg

The optimization algorithm to use, see details.

n.simu

Number of simulation runs.

sphere

Logical, sphere the data. Default is FALSE, in which case the data is only standardized.

maxiter

Maximum number of iterations.

individuals

Size of the generated population in GA.

particles

Number of generated particles in the standard PSO algorithm.

step_iter

Convergence criterion parameter, see details. (Default: 10)

eps

Convergence criterion parameter, see details. (Default: 10^(-6))

Details

The function always centers the data using colMeans and divides by the standard deviation. Sphering the data is optional. If sphering is requested the function WhitenSVD is used, which automatically tries to determine the rank of the data.
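
The two preprocessing options can be illustrated in base R. This is a minimal sketch, not the code of WhitenSVD: the simulated data, the rank tolerance and all object names are assumptions made for the illustration.

```r
# Sketch only: standardizing versus sphering (whitening) with base R.
set.seed(1)
mix <- matrix(c(2, 1, 0, 0, 1, 0.5, 0, 0, 3), nrow = 3)
X <- matrix(rnorm(200 * 3), ncol = 3) %*% mix   # correlated toy data

# Standardize: center with colMeans, divide by the column standard deviations
Xc <- sweep(X, 2, colMeans(X))
Xs <- sweep(Xc, 2, apply(Xc, 2, sd), "/")

# Sphere: rotate and rescale via the SVD of the centered data,
# keeping only directions with a non-negligible singular value
sv <- svd(Xc)
tol <- max(dim(Xc)) * .Machine$double.eps * sv$d[1]   # assumed rank tolerance
rk <- sum(sv$d > tol)
keep <- seq_len(rk)
W <- sv$v[, keep, drop = FALSE] %*% diag(sqrt(nrow(Xc) - 1) / sv$d[keep], nrow = rk)
Xw <- Xc %*% W

round(cov(Xs), 2)   # unit variances, correlations remain
round(cov(Xw), 2)   # identity matrix
```

After standardizing, the variables still correlate; after sphering, the covariance matrix is the identity, so no direction is favoured by scale or correlation alone.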

Currently the function provides the following projection pursuit indices: KurtosisMax, Discriminant, Friedman, FriedmanTukey, KurtosisMin.
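
To make the idea of a projection index concrete, the following sketch computes the plain sample kurtosis of the data projected onto a direction. The helper kurtosis_index is hypothetical and not part of REPPlab; the scaling used by EPP-lab for KurtosisMax and KurtosisMin may well differ.

```r
# Hypothetical helper (not part of REPPlab): sample kurtosis of the
# data projected onto the direction a.
kurtosis_index <- function(x, a) {
  a <- a / sqrt(sum(a^2))               # normalize the direction
  z <- as.vector(as.matrix(x) %*% a)    # one-dimensional projection
  z <- z - mean(z)
  mean(z^4) / mean(z^2)^2
}

set.seed(1)
# First column: Gaussian noise; second column: two well-separated groups
x <- scale(cbind(rnorm(300), c(rnorm(150, -3), rnorm(150, 3))))

k_noise  <- kurtosis_index(x, c(1, 0))
k_groups <- kurtosis_index(x, c(0, 1))
c(noise = k_noise, groups = k_groups)
```

The direction that separates the two groups has the smaller kurtosis, which is why minimizing the kurtosis is commonly used to look for clusters, while maximizing it tends to point towards outliers.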

Three algorithms can be used to find the projection directions. These are a Genetic Algorithm GA and two Particle Swarm Optimisation algorithms PSO and Tribe.

Since the algorithms might find local optima they are run several times. The function sorts the found directions according to the optimization criterion.
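
The sorting step can be sketched as follows. The index values are those printed for the five KurtosisMax runs in the Results section below; the direction matrix is a random placeholder, and sorting in decreasing order is an assumption that fits a maximized index.

```r
# Index values in the order the runs finished
index_val <- c(15189.57, 15114.62, 13756.86, 16410.35, 14239.98)

# Placeholder directions (random numbers), one column per run
set.seed(1)
dirs <- matrix(rnorm(8 * 5), nrow = 8,
               dimnames = list(NULL, paste0("Run", 1:5)))

# Best run first: reorder the values and the matching columns together
ord <- order(index_val, decreasing = TRUE)
sorted_val  <- index_val[ord]
sorted_dirs <- dirs[, ord, drop = FALSE]

sorted_val
colnames(sorted_dirs)
```

The reordered values agree with the "Index values" line of the summary output for that example.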

The algorithms have different default settings. For GA: maxiter=50 and individuals=20. For PSO: maxiter=20 and particles=50. For Tribe: maxiter=20.
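
One way to picture how NULL arguments fall back to these defaults is the helper below. resolve_defaults is a hypothetical function written for this illustration; it only mirrors the values stated above and is not taken from the REPPlab sources.

```r
# Hypothetical helper: fill in the algorithm-specific defaults for
# arguments left at NULL; arguments the algorithm does not use are dropped.
resolve_defaults <- function(PPalg = c("GA", "PSO", "Tribe"),
                             maxiter = NULL, individuals = NULL,
                             particles = NULL) {
  PPalg <- match.arg(PPalg)
  defaults <- switch(PPalg,
    GA    = list(maxiter = 50, individuals = 20),
    PSO   = list(maxiter = 20, particles = 50),
    Tribe = list(maxiter = 20))
  supplied <- list(maxiter = maxiter, individuals = individuals,
                   particles = particles)
  supplied <- supplied[!vapply(supplied, is.null, logical(1))]
  supplied <- supplied[names(supplied) %in% names(defaults)]
  utils::modifyList(defaults, supplied)
}

resolve_defaults("GA")
resolve_defaults("PSO", maxiter = 100)
resolve_defaults("Tribe", particles = 10)   # particles is not used by Tribe
```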

For GA, the size of the generated population is fixed by the user (individuals). The algorithm is based on a tournament selection with three participants. It uses a 2-point crossover with a probability of 0.65, and the mutation operator is applied to all individuals with a probability of 0.05. The termination criterion corresponds to the number of generations and is also fixed by the user (maxiter).
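
The GA operators described above can be sketched for a single generation as follows. Everything here is an assumption for illustration: the kurtosis fitness, the Gaussian form of the mutation, the renormalization of directions and all function names. It is not the EPP-lab implementation.

```r
set.seed(42)

# Assumed toy fitness: kurtosis of the data projected on direction a
fitness <- function(a, X) {
  z <- as.vector(X %*% (a / sqrt(sum(a^2))))
  z <- z - mean(z)
  mean(z^4) / mean(z^2)^2
}

# Tournament selection: draw k candidates, return the index of the fittest
tournament <- function(fit, k = 3) {
  cand <- sample(length(fit), k)
  cand[which.max(fit[cand])]
}

# 2-point crossover: the child copies p1 and takes the segment
# between two random cut points from p2
crossover2 <- function(p1, p2) {
  cuts <- sort(sample(length(p1), 2))
  idx <- cuts[1]:cuts[2]
  child <- p1
  child[idx] <- p2[idx]
  child
}

# One generation: selection, crossover (prob. 0.65), mutation (prob. 0.05)
next_generation <- function(pop, X, p_cross = 0.65, p_mut = 0.05) {
  fit <- apply(pop, 1, fitness, X = X)
  new_pop <- pop
  for (i in seq_len(nrow(pop))) {
    p1 <- pop[tournament(fit), ]
    p2 <- pop[tournament(fit), ]
    child <- if (runif(1) < p_cross) crossover2(p1, p2) else p1
    if (runif(1) < p_mut) {
      child <- child + rnorm(length(child), sd = 0.1)  # assumed mutation
    }
    new_pop[i, ] <- child / sqrt(sum(child^2))         # keep unit length
  }
  new_pop
}

X <- scale(matrix(rnorm(100 * 4), ncol = 4))
pop <- matrix(rnorm(20 * 4), nrow = 20)   # individuals = 20 directions
pop <- pop / sqrt(rowSums(pop^2))
pop2 <- next_generation(pop, X)
dim(pop2)
```

Repeating next_generation maxiter times and keeping the fittest individual gives the overall shape of one GA run.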

For PSO, the user can give the number of initially generated particles and the maximum number of iterations. The other parameters are fixed following Clerc (2006), using a "cosine" neighborhood adapted to PP. For Tribes, only the maximum number of iterations needs to be fixed. The algorithm used is the one proposed by Cooren, Clerc and Siarry (2009), adapted to PP with the same "cosine" neighborhood.
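
For readers unfamiliar with PSO, the sketch below shows a generic global-best particle swarm maximizing a toy function. It does not implement the "cosine" neighborhood or Tribes, and the function pso_maximize, the inertia weight w and the acceleration constant cc are assumptions (values often quoted for standard PSO), not settings confirmed for EPP-lab.

```r
set.seed(7)

# Generic global-best PSO (illustration only)
pso_maximize <- function(f, dim, particles = 50, maxiter = 20,
                         w = 1 / (2 * log(2)), cc = 0.5 + log(2)) {
  pos <- matrix(runif(particles * dim, -1, 1), nrow = particles)
  vel <- matrix(0, nrow = particles, ncol = dim)
  pbest <- pos                        # best position seen by each particle
  pbest_val <- apply(pos, 1, f)
  for (it in seq_len(maxiter)) {
    g <- pbest[which.max(pbest_val), ]          # best position of the swarm
    G <- matrix(g, particles, dim, byrow = TRUE)
    r1 <- matrix(runif(particles * dim), nrow = particles)
    r2 <- matrix(runif(particles * dim), nrow = particles)
    # Velocity: inertia + pull to own best + pull to swarm best
    vel <- w * vel + cc * r1 * (pbest - pos) + cc * r2 * (G - pos)
    pos <- pos + vel
    val <- apply(pos, 1, f)
    better <- val > pbest_val
    pbest[better, ] <- pos[better, ]
    pbest_val[better] <- val[better]
  }
  list(par = pbest[which.max(pbest_val), ], value = max(pbest_val))
}

# Toy objective with its maximum (value 0) at (0.5, 0.5, 0.5)
toy <- function(x) -sum((x - 0.5)^2)
best <- pso_maximize(toy, dim = 3)
best$par
best$value
```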

The algorithms stop as soon as one of the following two conditions holds: the maximum number of iterations is reached, or the relative difference between the index value of the present iteration i and the value of iteration i-step_iter is less than eps. In the latter case, the algorithm is said to have converged and EPPlab returns the number of iterations needed to attain convergence. If the maximum number of iterations is reached without convergence, the function issues a warning. The default values are 10 for step_iter and 1E-06 for eps. Note that if a few runs have not converged, this might not be a problem, and even non-converged projections might reveal some structure.
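
The convergence rule can be written down as a small check. has_converged is a hypothetical helper, and dividing by the absolute value of the earlier index value is an assumption about what "relative difference" means here.

```r
# Hypothetical helper: history holds the index value after each iteration.
has_converged <- function(history, step_iter = 10, eps = 1e-6) {
  i <- length(history)
  if (i <= step_iter) return(FALSE)        # not enough iterations yet
  prev <- history[i - step_iter]
  if (prev == 0) return(history[i] == 0)   # avoid dividing by zero
  abs(history[i] - prev) / abs(prev) < eps
}

flat   <- c(1, 2, 3, rep(3.5, 11))   # unchanged over the last 10 iterations
rising <- cumsum(rep(0.1, 14))       # still improving

has_converged(flat)
has_converged(rising)
has_converged(rep(1, 10))            # fewer than step_iter + 1 iterations
```

With maxiter=20 and step_iter=10, a run can only be declared converged if the index value stops changing within the first ten iterations or so, which is consistent with the non-convergence warnings in the Results section below.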

Value

A list with class 'epplab' containing the following components:

PPdir

Matrix containing the PP directions as columns, see details.

PPindexVal

Vector containing the objective criterion value of each run.

PPindex

Name of the used projection index.

PPiter

Vector containing the number of iterations of each run.

PPconv

Logical vector. TRUE if the run converged and FALSE otherwise.

PPalg

Name of the used algorithm.

maxiter

Maximum number of iterations, as given in function call.

x

Matrix containing the data (centered!).

sphere

Logical, indicating whether the data were sphered.

transform

The transformation matrix from the whitening or standardization step.

backtransform

The back-transformation matrix from the whitening or standardization step.

center

The mean vector of the data.

Author(s)

Daniel Fischer, Klaus Nordhausen

References

Larabi Marie-Sainte, S. (2011). Biologically inspired algorithms for exploratory projection pursuit. PhD thesis, University of Toulouse.

Ruiz-Gazen, A., Larabi Marie-Sainte, S. and Berro, A. (2010). Detecting multivariate outliers using projection pursuit with particle swarm optimization. COMPSTAT2010, pp. 89-98.

Berro, A., Larabi Marie-Sainte, S. and Ruiz-Gazen, A. (2010). Genetic algorithms and particle swarm optimization for exploratory projection pursuit. Annals of Mathematics and Artificial Intelligence, 60, 153-178.

Larabi Marie-Sainte, S., Berro, A. and Ruiz-Gazen, A. (2010). An efficient optimization method for revealing local optima of projection pursuit indices. Swarm Intelligence, pp. 60-71.

Clerc, M. (2006). Particle Swarm Optimization. ISTE, Wiley.

Cooren, Y., Clerc, M. and Siarry, P. (2009). Performance evaluation of TRIBES, an adaptive particle swarm optimization algorithm. Swarm Intelligence, 3(2), 149-178.

Examples


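  library(REPPlab)

  # olive data from the tourr package: PSO with the KurtosisMax index
  library(tourr)
  data(olive)
  olivePP <- EPPlab(olive[,3:10], PPalg="PSO", PPindex="KurtosisMax", n.simu=5, maxiter=20)
  summary(olivePP)

  # lubisch data from the amap package: sphered data, FriedmanTukey index
  library(amap)
  data(lubisch)
  X <- lubisch[1:70,2:7]
  rownames(X) <- lubisch[1:70,1]
  res <- EPPlab(X, PPalg="PSO", PPindex="FriedmanTukey", n.simu=15, maxiter=20, sphere=TRUE)
  print(res)
  summary(res)
  fitted(res)
  plot(res)
  pairs(res)
  # apply the fitted projection to observations not used in the fit
  predict(res, data=lubisch[71:74,2:7])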

Results


R version 3.3.1 (2016-06-21) -- "Bug in Your Hair"
Copyright (C) 2016 The R Foundation for Statistical Computing
Platform: x86_64-pc-linux-gnu (64-bit)

> library(REPPlab)
Loading required package: rJava
Loading required package: lattice
Loading required package: LDRTools
> png(filename="/home/ddbj/snapshot/RGM3/R_CC/result/REPPlab/EPPlab.Rd_%03d_medium.png", width=480, height=480)
> ### Name: EPPlab
> ### Title: Function for Exploratory Projection Pursuit.
> ### Aliases: EPPlab
> 
> ### ** Examples
> 
> 
>   library(tourr)
>   data(olive)
>   olivePP <- EPPlab(olive[,3:10],PPalg="PSO",PPindex="KurtosisMax",n.simu=5, maxiter=20)

Simulation 0... finished (I 15189.569153928 in 0.634s)
Simulation 1... finished (I 15114.617490739 in 0.172s)
Simulation 2... finished (I 13756.862018574 in 0.055s)
Simulation 3... finished (I 16410.346776275 in 0.024s)
Simulation 4... finished (I 14239.975449086 in 0.030s)
Warning message:
In EpplabOutputConv(jepplab, maxiter) :
  There were 5 non-converged simulation runs!
>   summary(olivePP)
REPPlab Summary
---------------
Index name       : KurtosisMax 
Index values     : 16410.35 15189.57 15114.62 14239.98 13756.86 
Algorithm used   : PSO 
Sphered          : FALSE 
Iterations       : 20 20 20 20 20 
> 
>   library(amap)
>   data(lubisch)
>   X <- lubisch[1:70,2:7]
>   rownames(X) <- lubisch[1:70,1]
>   res <- EPPlab(X,PPalg="PSO",PPindex="FriedmanTukey",n.simu=15, maxiter=20,sphere=TRUE)
Simulation 0... finished (I 0.165513798 in 0.865s)
Simulation 1... finished (I 0.163505029 in 0.322s)
Simulation 2... finished (I 0.162539879 in 0.260s)
Simulation 3... finished (I 0.165447055 in 0.116s)
Simulation 4... finished (I 0.165639215 in 0.116s)
Simulation 5... finished (I 0.165278667 in 0.112s)
Simulation 6... finished (I 0.163131165 in 0.112s)
Simulation 7... finished (I 0.162238723 in 0.116s)
Simulation 8... finished (I 0.167045426 in 0.111s)
Simulation 9... finished (I 0.164780114 in 0.115s)
Simulation 10... finished (I 0.167535400 in 0.113s)
Simulation 11... finished (I 0.164259526 in 0.112s)
Simulation 12... finished (I 0.163869846 in 0.112s)
Simulation 13... finished (I 0.166419372 in 0.116s)
Simulation 14... finished (I 0.160963824 in 0.111s)
Warning message:
In EpplabOutputConv(jepplab, maxiter) :
  There were 14 non-converged simulation runs!
>   print(res)
$PPindex
[1] "FriedmanTukey"

$PPindexVal
[1] 0.1656392

$PPalg
[1] "PSO"

$PPdir
[1] -0.039150997 -0.046203341  0.319818005  0.002467755  0.333588206
[6] -0.087238589

$PPiter
[1] 20

>   summary(res)
REPPlab Summary
---------------
Index name       : FriedmanTukey 
Index values     : 0.1656392 0.1675354 0.1670454 0.1664194 0.1655138 0.1654471 0.1652787 0.1647801 0.1642595 0.1638698 
Algorithm used   : PSO 
Sphered          : TRUE 
Iterations       : 20 20 20 20 20 20 20 20 20 20 
>   fitted(res)
            Run1
1a  -0.018891739
2a  -1.643868063
3a  -1.142206322
4a   0.840637347
5a  -0.748750714
6a   0.288039954
7a   0.472080315
8a  -1.244660109
9a  -1.068400600
10a  0.663086851
11a -1.107192105
12a -0.400017396
13a -2.199377832
14a -0.236753580
15a -0.688195497
16a  0.035007501
17a -0.691959115
18a -0.642089058
19a -1.790132676
20a -1.129385168
21a -0.043893281
1b  -0.199024211
2b  -0.311993299
3b  -1.190462210
4b  -0.584708066
5b   0.982169192
6b   0.579343702
7b   0.578316466
8b  -0.064154657
9b   0.263828018
10b -0.929621440
11b  0.024166082
12b  0.151679918
13b  0.113077533
14b  1.980853273
15b -1.135944951
16b  0.454289869
17b -0.568073668
18b -0.772097012
19b  0.916804878
20b -1.082957169
21b  0.291479800
22b -0.003013495
1c   1.343622310
2c  -0.613247521
3c   0.690361402
4c  -0.187733480
5c   0.899265622
6c   0.311338834
7c   0.747378264
8c   0.307394096
9c  -0.366157328
10c -0.722875888
11c -0.893040216
12c -0.540723448
13c  0.942161971
14c  0.510446202
15c  0.568970979
16c  0.323942523
17c  0.732938947
18c -0.775486552
19c  0.251949422
20c  0.832223911
22c  0.069587806
23c -0.254553575
24c  0.652597511
25c  3.354661203
26c  3.600905925
27c  0.991263157
28c  0.225770657
>   plot(res)
>   pairs(res)
>   predict(res,data=lubisch[71:74,2:7])
         [,1]
71 1.58493576
72 1.11605759
73 0.09274319
74 0.19226996
> 
> dev.off()
null device 
          1 
>