REPPlab optimizes a projection pursuit (PP) index over several runs using a
Genetic Algorithm (GA) or one of two Particle Swarm Optimisation (PSO)
algorithms, as implemented in the Java program EPP-lab. One of the PSO algorithms is
a classic one while the other one is a parameter-free extension called
Tribes. The parameters of the algorithms (maxiter and individuals for GA and
maxiter and particles for PSO) can be modified by the user. The PP indices
are the well-known Friedman and Friedman-Tukey indices together with the
kurtosis and a so-called discriminant index that is devoted to the detection
of groups. At each run, the function finds a local optimum of the PP index
and gives the associated projection direction and criterion value.
Arguments
x
Matrix where each row is an observation and each column a
dimension.
PPindex
The index to use, see details.
PPalg
The algorithm to use, see details.
n.simu
Number of simulation runs.
sphere
Logical, sphere the data. Default is FALSE, in which
case the data is only standardized.
maxiter
Maximum number of iterations.
individuals
Size of the generated population in GA.
particles
Number of generated particles in the standard PSO
algorithm.
step_iter
Convergence criterion parameter, see details. (Default: 10)
eps
Convergence criterion parameter, see details. (Default: 10^(-6))
Details
The function always centers the data using colMeans and
divides by the standard deviation. Sphering the data is optional. If
sphering is requested the function WhitenSVD is used, which
automatically tries to determine the rank of the data.
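The default standardization described above can be sketched as follows. This is an illustration only, not the package's code: the function name standardize is hypothetical, and the use of the sample standard deviation (n - 1 denominator) is an assumption.

```python
import math


def standardize(rows):
    """Center each column at its mean and divide by its standard deviation.

    `rows` is a list of observations (one list of floats per row).
    Assumption: the sample standard deviation (n - 1 denominator) is used.
    """
    n = len(rows)
    p = len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(p)]
    sds = [
        math.sqrt(sum((r[j] - means[j]) ** 2 for r in rows) / (n - 1))
        for j in range(p)
    ]
    z = [[(r[j] - means[j]) / sds[j] for j in range(p)] for r in rows]
    return z, means, sds
```

Returning the column means and standard deviations alongside the scaled data mirrors the idea of keeping the center and the transformation available for back-transformation.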
Currently the function provides the following projection pursuit indices:
KurtosisMax, Discriminant, Friedman,
FriedmanTukey, KurtosisMin.
Three algorithms can be used to find the projection directions. These are a
Genetic Algorithm GA and two Particle Swarm Optimisation algorithms
PSO and Tribe.
Since the algorithms might find local optima they are run several times. The
function sorts the found directions according to the optimization criterion.
The algorithms have different default settings. For GA:
maxiter=50 and individuals=20. For PSO: maxiter=20 and
particles=50. For Tribe: maxiter=20.
For GA, the size of the generated population is fixed by the user
(individuals). The algorithm is based on a tournament selection of three
participants. It uses a 2-point crossover with a probability of 0.65 and
the mutation operator is applied to all the individuals with a probability
of 0.05. The termination criterion corresponds to the number of generations
and is also fixed by the user (maxiter).
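The GA operators named above (tournament of three, 2-point crossover with probability 0.65, mutation with probability 0.05) can be sketched as below. This is a minimal sketch under stated assumptions, not the EPP-lab implementation: all function names are hypothetical, the Gaussian form and scale of the mutation are assumed, and so is the renormalization of each child to unit length.

```python
import math
import random


def normalize(v):
    """Rescale a direction to unit length (assumed, not stated in the text)."""
    norm = math.sqrt(sum(c * c for c in v))
    return [c / norm for c in v]


def tournament(pop, fitness, rng, k=3):
    """Draw k distinct individuals and return the one with the best index value."""
    contenders = rng.sample(range(len(pop)), k)
    best = max(contenders, key=lambda i: fitness[i])
    return pop[best]


def two_point_crossover(a, b, rng, p_cross=0.65):
    """With probability p_cross, swap the segment between two cut points."""
    if len(a) < 2 or rng.random() >= p_cross:
        return a[:], b[:]
    i, j = sorted(rng.sample(range(len(a) + 1), 2))
    return a[:i] + b[i:j] + a[j:], b[:i] + a[i:j] + b[j:]


def mutate(v, rng, p_mut=0.05, scale=0.1):
    """With probability p_mut, perturb the individual (Gaussian noise is assumed)."""
    if rng.random() < p_mut:
        v = [c + rng.gauss(0.0, scale) for c in v]
    return v


def next_generation(pop, index, rng):
    """Build one new population of the same size; `index` is maximized."""
    fitness = [index(v) for v in pop]
    children = []
    while len(children) < len(pop):
        a = tournament(pop, fitness, rng)
        b = tournament(pop, fitness, rng)
        for child in two_point_crossover(a, b, rng):
            children.append(normalize(mutate(child, rng)))
    return children[:len(pop)]
```

Calling next_generation repeatedly, maxiter times, would correspond to the generation-count termination criterion described above.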
For PSO, the user can give the number of initial generated particles and
also the maximum number of iterations. The other parameters are fixed
following Clerc (2006) and using a "cosine" neighborhood adapted to PP for
the PSO algorithm. For Tribes, only the maximum number of iterations needs
to be fixed. The algorithm proposed by Cooren et al. (2009) and adapted
to PP using a "cosine neighborhood" is used.
The algorithms stop as soon as one of the two following conditions holds:
the maximum number of iterations is reached or the relative difference
between the index value of the present iteration i and the value of
iteration i-step_iter is less than eps. In the latter case,
the algorithm is said to have converged and EPPlab returns the number
of iterations needed to reach convergence. If the maximum number of
iterations is attained without convergence, the function issues
warnings. The default values are 10 for step_iter and
1E-06 for eps. Note that if a few runs have not converged, this
might not be a problem, and even non-converged projections might reveal some
structure.
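The stopping rule described above can be illustrated with a short sketch. It is not the package's code: the function name run_until_converged and the callback step are hypothetical, and taking the difference relative to the absolute value at iteration i-step_iter is an assumption, since the text does not say which value the difference is scaled by.

```python
def run_until_converged(step, maxiter, step_iter=10, eps=1e-6):
    """Iterate until convergence or until maxiter iterations are done.

    `step(i)` is a hypothetical callback returning the index value after
    iteration i. Returns (last index value, iterations used, converged flag).
    """
    history = []
    for i in range(1, maxiter + 1):
        history.append(step(i))
        if i > step_iter:
            current = history[-1]
            previous = history[-1 - step_iter]
            # Assumption: the difference is taken relative to the older value.
            denom = abs(previous) if previous != 0 else 1.0
            if abs(current - previous) / denom < eps:
                return current, i, True
    return history[-1], maxiter, False
```

A run that returns a False flag corresponds to the non-converged case, in which the maximum number of iterations was reached first.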
Value
A list with class 'epplab' containing the following components:
PPdir
Matrix containing the PP directions as columns, see details.
PPindexVal
Vector containing the objective criterion value of each
run.
PPindex
Name of the used projection index.
PPiter
Vector containing the number of iterations of each run.
PPconv
Boolean vector. TRUE if the run converged and FALSE otherwise.
PPalg
Name of the used algorithm.
maxiter
Maximum number of iterations, as given in function call.
x
Matrix containing the data (centered!).
sphere
Logical
transform
The transformation matrix from the whitening or standardization step.
backtransform
The back-transformation matrix from the whitening or standardization step.
center
The mean vector of the data.
Author(s)
Daniel Fischer, Klaus Nordhausen
References
Larabi Marie-Sainte, S. (2011). Biologically inspired
algorithms for exploratory projection pursuit. PhD thesis, University of
Toulouse.
Ruiz-Gazen, A., Larabi Marie-Sainte, S. and Berro, A. (2010).
Detecting multivariate outliers using projection pursuit with particle swarm
optimization. COMPSTAT2010, pp. 89-98.
Berro, A., Larabi Marie-Sainte, S. and Ruiz-Gazen, A. (2010). Genetic
algorithms and particle swarm optimization for exploratory projection
pursuit. Annals of Mathematics and Artificial Intelligence, 60, 153-178.
Larabi Marie-Sainte, S., Berro, A. and Ruiz-Gazen, A. (2010). An
efficient optimization method for revealing local optima of projection
pursuit indices. Swarm Intelligence, pp. 60-71.
Clerc, M. (2006). Particle Swarm Optimization. ISTE, Wiley.
Cooren, Y., Clerc, M. and Siarry, P. (2009). Performance evaluation of
TRIBES, an adaptive particle swarm optimization algorithm. Swarm
Intelligence, 3(2), 149-178.