R Graphical Manual


Last data update: 2014.03.03


fem	R Documentation

The Fisher-EM algorithm

Description

The Fisher-EM algorithm is a subspace clustering method for high-dimensional data. It is based on a Gaussian mixture model and on the assumption that the data live in a common, low-dimensional discriminative subspace. An EM-like algorithm estimates both this discriminative subspace and the parameters of the mixture model.
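To make the "EM-like" structure concrete, here is a minimal base-R sketch of the plain EM algorithm for a univariate Gaussian mixture. It alternates an E-step (posterior probabilities) and an M-step (proportions, means, variances) and stops on the same kind of rule that fem exposes through `maxit` and `eps`. It is a deliberate simplification, not the Fisher-EM algorithm: it works in one dimension and omits the step that estimates the discriminative subspace. The function name `em_gmm1d` is hypothetical.

```r
# Hypothetical helper: plain EM for a univariate Gaussian mixture.
# NOT Fisher-EM: there is no subspace (projection matrix) step here.
em_gmm1d <- function(y, K = 2, maxit = 50, eps = 1e-6) {
  n <- length(y)
  # Initialization: spread the means over sample quantiles
  mu <- as.numeric(quantile(y, probs = seq(0.1, 0.9, length.out = K)))
  sigma <- rep(sd(y), K)
  prop <- rep(1 / K, K)
  loglik <- numeric(0)
  for (it in seq_len(maxit)) {
    # E-step: weighted component densities (n x K), then posteriors
    dens <- sapply(seq_len(K), function(k) prop[k] * dnorm(y, mu[k], sigma[k]))
    tot <- rowSums(dens)
    P <- dens / tot
    loglik <- c(loglik, sum(log(tot)))
    # Stop when the log-likelihood change falls below eps
    if (it > 1 && abs(loglik[it] - loglik[it - 1]) < eps) break
    # M-step: update proportions, means and standard deviations
    nk <- colSums(P)
    prop <- nk / n
    mu <- colSums(P * y) / nk
    sigma <- sqrt(colSums(P * outer(y, mu, "-")^2) / nk)
  }
  # MAP rule: assign each individual to its most probable group
  list(cls = max.col(P, ties.method = "first"), P = P,
       mean = mu, prop = prop, loglik = loglik)
}

set.seed(1)
y <- c(rnorm(100, mean = 0), rnorm(100, mean = 5))
fit <- em_gmm1d(y, K = 2)
sort(fit$mean)   # close to 0 and 5
```

The E-step/M-step split and the log-likelihood stopping test carry over to Fisher-EM, which, according to the reference below, additionally re-estimates the projection onto the discriminative subspace at each iteration.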

Usage

fem(Y,K=2:6,model='AkjB',method='reg',crit='icl',maxit=50,eps=1e-6,init='kmeans',
	nstart=25,Tinit=c(),kernel='',disp=F)

Arguments

`Y`	The data matrix. Categorical variables and missing values are not allowed.
`K`	An integer vector of candidate numbers of mixture components (clusters); the model selection criterion chooses the most appropriate number of groups among them. Default is 2:6.
`model`	A vector of discriminative latent mixture (DLM) models to fit. There are 12 different models: "DkBk", "DkB", "DBk", "DB", "AkjBk", "AkjB", "AkBk", "AkB", "AjBk", "AjB", "ABk", "AB". The option "all" runs the Fisher-EM algorithm on all 12 DLM models and selects the model with the highest value of the model selection criterion.
`method`	The method used to fit the projection matrix associated with the discriminative subspace. Three methods are available: 'svd', 'reg' and 'gs'. Default is 'reg'.
`crit`	The model selection criterion to use for selecting the most appropriate model for the data. There are 3 possibilities: "bic", "aic" or "icl". Default is "icl".
`maxit`	The maximum number of iterations before the Fisher-EM algorithm stops. Default is 50.
`eps`	The convergence threshold: the Fisher-EM algorithm stops when the difference between the likelihood values of two successive iterations falls below this value. Default is 1e-6.
`init`	The initialization method for the Fisher-EM algorithm. There are 4 options: "random" for a random initialization, "kmeans" for an initialization by the k-means algorithm, "hclust" for an initialization by hierarchical clustering, or "user" for a user-supplied initialization through the parameter "Tinit". Default is "kmeans". Note that for "kmeans" and "random", several initializations are performed and the one associated with the highest likelihood is kept (see "nstart").
`nstart`	The number of restarts when the initialization is "kmeans" or "random"; the initialization associated with the highest likelihood is kept. Default is 25.
`Tinit`	An n x K matrix of posterior probabilities used to initialize the algorithm when init = "user" (each row corresponds to an individual).
`kernel`	Makes it possible to handle the n < p case (fewer observations than variables). By default no kernel ("") is used; otherwise the user can choose among 3 kernels: "linear", "sigmoid" or "rbf".
`disp`	If TRUE, some messages are printed during the clustering. Default is FALSE.
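With init = "user", `Tinit` must be an n x K matrix of posterior probabilities. One common way to build it, sketched below, is to turn a hard partition (here from `stats::kmeans`) into a 0/1 indicator matrix whose rows sum to 1. The helper name `partition_to_tinit` is hypothetical, and the commented-out fem() call assumes that FisherEM is installed and that `K` must equal `ncol(Tinit)` in this mode.

```r
# Hypothetical helper: convert a hard partition (labels 1..K) into an
# n x K matrix of posterior probabilities (one row per individual).
partition_to_tinit <- function(cls, K = max(cls)) {
  Tinit <- matrix(0, nrow = length(cls), ncol = K)
  Tinit[cbind(seq_along(cls), cls)] <- 1   # probability 1 for the assigned group
  Tinit
}

Y <- as.matrix(iris[, -5])
set.seed(42)
km <- kmeans(Y, centers = 3, nstart = 10)
Tinit <- partition_to_tinit(km$cluster, K = 3)

# Assumption: FisherEM is installed and K must match ncol(Tinit).
# res <- fem(Y, K = 3, init = 'user', Tinit = Tinit)
```

Soft posteriors from another mixture model would work the same way, as long as each row is non-negative and sums to 1.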

Value

A list is returned:

`K`	The number of groups.
`cls`	The group membership of each individual estimated by the Fisher-EM algorithm.
`P`	The posterior probabilities of each individual for each group.
`U`	The loading matrix which determines the orientation of the discriminative subspace.
`mean`	The estimated group means in the subspace.
`my`	The estimated group means in the observation space.
`prop`	The estimated mixture proportions.
`D`	The covariance matrices in the subspace.
`aic`	The value of the Akaike information criterion.
`bic`	The value of the Bayesian information criterion.
`icl`	The value of the integrated completed likelihood criterion.
`loglik`	The log-likelihood values computed at each iteration of the FEM algorithm.
`ll`	The log-likelihood value obtained at the last iteration of the FEM algorithm.
`method`	The method used.
`call`	The call of the function.
`plot`	Some information to pass to the plot.fem function.
`crit`	The model selection criterion used.
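The returned `aic`, `bic` and `icl` values are penalized log-likelihoods, and the best model is the one with the highest value. The sketch below uses the usual "larger is better" forms: AIC = loglik minus nu, BIC = loglik minus (nu / 2) log n, and ICL = BIC minus the entropy of the posterior matrix `P`. These exact formulas, and the number of free parameters `nu` (which depends on the DLM model), are assumptions rather than a transcription of the package code. The helper name `selection_criteria` is hypothetical.

```r
# Hypothetical helper: "larger is better" model selection criteria.
# loglik: maximized log-likelihood; nu: number of free parameters;
# P: n x K matrix of posterior probabilities.
selection_criteria <- function(loglik, nu, P) {
  n <- nrow(P)
  # Entropy of the soft partition, treating 0 * log(0) as 0
  ent <- -sum(ifelse(P > 0, P * log(P), 0))
  bic <- loglik - nu / 2 * log(n)
  c(aic = loglik - nu, bic = bic, icl = bic - ent)
}

P <- rbind(c(1, 0), c(0.5, 0.5))
selection_criteria(loglik = -100, nu = 5, P = P)
```

With hard (0/1) posteriors the entropy term vanishes and ICL equals BIC; the more uncertain the classification, the more ICL penalizes the model, which is why it tends to favor well-separated clusters.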

Author(s)

Charles Bouveyron and Camille Brunet

References

Charles Bouveyron and Camille Brunet (2012), "Simultaneous model-based clustering and visualization in the Fisher discriminative subspace", Statistics and Computing, 22(1), 301-324.

Charles Bouveyron and Camille Brunet (2013), "Discriminative variable selection for clustering with the sparse Fisher-EM algorithm", Computational Statistics, to appear.

Examples

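library(FisherEM)
data(iris)
# Cluster the four numeric iris measurements with the AkB model, K = 2 to 5
res = fem(iris[,-5],K=2:5,model='AkB')
summary(res)
plot(res)
# Compare the estimated partition with the true species labels
fem.ari(res,as.numeric(iris[,5]))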
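`fem.ari` compares the estimated partition with known labels through the adjusted Rand index (ARI). Assuming it returns the standard Hubert and Arabie ARI, the index can be computed in base R from the contingency table of the two partitions, as sketched below; `ari_sketch` is a hypothetical name, not part of FisherEM.

```r
# Hypothetical helper: adjusted Rand index (Hubert and Arabie, 1985)
# between two partitions given as label vectors of equal length.
ari_sketch <- function(x, y) {
  tab <- table(x, y)
  comb2 <- function(m) m * (m - 1) / 2    # number of pairs, choose(m, 2)
  sum_ij <- sum(comb2(tab))               # pairs grouped together in both
  sum_a <- sum(comb2(rowSums(tab)))       # pairs grouped together in x
  sum_b <- sum(comb2(colSums(tab)))       # pairs grouped together in y
  expected <- sum_a * sum_b / comb2(sum(tab))
  max_index <- (sum_a + sum_b) / 2
  (sum_ij - expected) / (max_index - expected)
}

ari_sketch(c(1, 1, 2, 2), c(2, 2, 1, 1))   # 1: identical up to relabeling
```

The ARI is 1 for identical partitions regardless of label names and close to 0 for a random assignment, so it can compare a K = 5 solution with the 3 iris species even though the numbers of groups differ.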

Results


R version 3.3.1 (2016-06-21) -- "Bug in Your Hair"
Copyright (C) 2016 The R Foundation for Statistical Computing
Platform: x86_64-pc-linux-gnu (64-bit)


> library(FisherEM)
Loading required package: MASS
Loading required package: elasticnet
Loading required package: lars
Loaded lars 1.2

> ### Name: fem
> ### Title: The Fisher-EM algorithm
> ### Aliases: fem
> 
> ### ** Examples
> 
> data(iris)
> res = fem(iris[,-5],K=2:5,model='AkB')
> summary(res)
* Model: the chosen model is AkB with K = 5 ( icl = 2302.721 )
* Loading matrix:
                     U1         U2         U3
Sepal.Length -0.3078638 -0.2776052 -0.4979237
Sepal.Width  -0.4601867  0.5161905  0.6036463
Petal.Length  0.7217695 -0.2354164  0.4310403
Petal.Width   0.4153274  0.7752818 -0.4493188
> plot(res)
> fem.ari(res,as.numeric(iris[,5]))
[1] 0.735431
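The loading matrix printed by summary(res) in the session above defines the axes of the discriminative subspace, and its columns are orthonormal. The sketch below re-enters those printed loadings and projects the iris measurements onto them; it assumes that subspace coordinates are obtained as Y %*% U without centering, which this page does not state, so treat the projection as illustrative.

```r
# Loading matrix U as printed by summary(res) in the session above
U <- matrix(c(-0.3078638, -0.4601867,  0.7217695,  0.4153274,
              -0.2776052,  0.5161905, -0.2354164,  0.7752818,
              -0.4979237,  0.6036463,  0.4310403, -0.4493188),
            nrow = 4,
            dimnames = list(colnames(iris)[1:4], c("U1", "U2", "U3")))

# Assumption: subspace coordinates are Y %*% U (no centering)
Y <- as.matrix(iris[, -5])
X <- Y %*% U

# Columns of U are (up to rounding) orthonormal
round(crossprod(U), 4)

# Mean coordinates in the subspace, by true species
aggregate(as.data.frame(X), by = list(Species = iris$Species), FUN = mean)
```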