funFEM
R Documentation
The funFEM algorithm for the clustering of functional data.
Description
The funFEM algorithm clusters time series or, more generally, functional data. It is based on a discriminative functional mixture model which clusters the data in a unique, discriminative functional subspace. This model has the advantage of being parsimonious and can therefore handle long time series.
Arguments
fd
a functional data object produced by the fda package.
K
an integer vector specifying the numbers of mixture components (clusters) among which the model selection criterion will choose the most appropriate number of groups. Default is 2:6.
model
a vector of discriminative latent mixture (DLM) models to fit. There are 12 different models: "DkBk", "DkB", "DBk", "DB", "AkjBk", "AkjB", "AkBk", "AkB", "AjBk", "AjB", "ABk", "AB". The option "all" runs the funFEM algorithm on all 12 models and selects the one that maximizes the model selection criterion.
crit
the criterion to be used for model selection ('bic', 'aic' or 'icl'). 'bic' is the default.
init
the initialization type ('random', 'kmeans' or 'hclust'). 'hclust' is the default.
Tinit
an n x K matrix containing the posterior probabilities used to initialize the algorithm (each row corresponds to an individual).
maxit
the maximum number of iterations before the Fisher-EM algorithm stops.
eps
the threshold on the likelihood difference used to stop the Fisher-EM algorithm.
disp
if TRUE, some messages are printed during the clustering. Default is FALSE.
lambda
the l0 penalty (between 0 and 1) for the sparse version. See Bouveyron et al. (2014) for details. Default is 0.
graph
if TRUE, the evolution of the log-likelihood is plotted. Default is FALSE.
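As an illustration of the arguments described above, a call spelling them out explicitly might look as follows. This is a sketch, not taken from the package examples; it assumes a functional data object fdobj built with the fda package (as in the Examples section), and the maxit and eps values are illustrative, not the package defaults.

```r
# Illustrative call; 'fdobj' is assumed to be an fda functional data object,
# and the maxit/eps values below are example settings, not documented defaults.
res <- funFEM(fdobj,
              K = 2:6,          # candidate numbers of clusters
              model = "all",    # fit all 12 DLM models
              crit = "bic",     # select the model by BIC
              init = "hclust",  # hierarchical clustering initialization
              maxit = 50,       # cap on Fisher-EM iterations
              eps = 1e-6,       # likelihood-difference stopping threshold
              disp = TRUE)      # print progress messages
```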
Value
A list is returned:
model
the model name.
K
the number of groups.
cls
the group membership of each individual estimated by the Fisher-EM algorithm.
P
the posterior probabilities of each individual for each group.
prms
the model parameters.
U
the orientation of the functional subspace according to the basis functions.
aic
the value of the Akaike information criterion.
bic
the value of the Bayesian information criterion.
icl
the value of the integrated completed likelihood criterion.
loglik
the log-likelihood values computed at each iteration of the FEM algorithm.
ll
the log-likelihood value obtained at the last iteration of the FEM algorithm.
nbprm
the number of free parameters in the model.
call
the call of the function.
plot
some information to pass to the plot.fem function.
crit
the model selection criterion used.
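The components listed above are accessed from the returned list in the usual way. A brief, hypothetical inspection session (assuming a fitted object res as in the Examples):

```r
# Hypothetical inspection of the fitted object returned by funFEM:
res <- funFEM(fdobj, K = 2:6)
res$K                 # number of groups retained
res$model             # name of the selected DLM model
table(res$cls)        # cluster sizes
head(res$P)           # posterior probabilities (n x K)
c(res$bic, res$aic)   # values of the model selection criteria
```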
Author(s)
Charles Bouveyron
References
C. Bouveyron, E. Côme and J. Jacques, The discriminative functional mixture model for the analysis of bike sharing systems, Preprint HAL n.01024186, University Paris Descartes, 2014.
Examples
# Clustering the well-known "Canadian temperature" data (Ramsay & Silverman)
basis <- create.bspline.basis(c(0, 365), nbasis=21, norder=4)
fdobj <- smooth.basis(day.5, CanadianWeather$dailyAv[,,"Temperature.C"],basis,
fdnames=list("Day", "Station", "Deg C"))$fd
res = funFEM(fdobj,K=4)
# Visualization of the partition and the group means
par(mfrow=c(1,2))
plot(fdobj,col=res$cls,lwd=2,lty=1)
fdmeans = fdobj; fdmeans$coefs = t(res$prms$my)
plot(fdmeans,col=1:max(res$cls),lwd=2)
## DO NOT RUN
# # Load the velib data and smoothing
# data(velib)
# basis<- create.fourier.basis(c(0, 181), nbasis=25)
# fdobj <- smooth.basis(1:181,t(velib$data),basis)$fd
#
# # Clustering with funFEM
# res = funFEM(fdobj,K=6,model='AkjBk',init='kmeans',lambda=0,disp=TRUE)
#
# # Visualization of group means
# fdmeans = fdobj; fdmeans$coefs = t(res$prms$my)
# plot(fdmeans,col=1:res$K,xaxt='n',lwd=2)
# axis(1,at=seq(5,181,6),labels=velib$dates[seq(5,181,6)],las=2)
#
# # Choice of K
# res = funFEM(fdobj,K=2:20,model='AkjBk',init='kmeans',lambda=0,disp=TRUE)
# plot(2:20,res$plot$bic,type='b',xlab='K',main='BIC')
#
# # Computation of the closest stations from the group means
# par(mfrow=c(3,2))
# for (i in 1:res$K) {
# matplot(t(velib$data[which.max(res$P[,i]),]),type='l',lty=i,col=i,xaxt='n',
# lwd=2,ylim=c(0,1))
# axis(1,at=seq(5,181,6),labels=velib$dates[seq(5,181,6)],las=2)
# title(main=paste('Cluster',i,' - ',velib$names[which.max(res$P[,i])]))
# }
#
# # Visualization in the discriminative subspace (projected scores)
# par(mfrow=c(1,1))
# plot(t(fdobj$coefs) %*% res$U,col=res$cls,pch=19,main="Discriminative space")
# text(t(fdobj$coefs) %*% res$U,labels=velib$names,col=res$cls)
#
# # Spatial visualization of the clustering (with library ggmap)
# library(ggmap)
# Mymap = get_map(location = 'Paris', zoom = 12, maptype = 'terrain')
# ggmap(Mymap) + geom_point(data=velib$position,aes(longitude,latitude),
# colour = I(res$cls), size = I(3))
#
# # FunFEM clustering with sparsity
# res2 = funFEM(fdobj,K=res$K,model='AkjBk',init='user',Tinit=res$P,
# lambda=0.01,disp=TRUE)
#
# # Visualization of group means and the selected functional bases
# split.screen(c(2,1))
# fdmeans = fdobj; fdmeans$coefs = t(res2$prms$my)
# screen(1); plot(fdmeans,col=1:res2$K,xaxt='n',lwd=2); axis(1,at=seq(5,181,6),
# labels=velib$dates[seq(5,181,6)],las=2)
# basis$dropind = which(rowSums(abs(res2$U))==0)
# screen(2); plot(basis,col=1,lty=1,xaxt='n',xlab='Disc. basis functions')
# axis(1,at=seq(5,181,6),labels=velib$dates[seq(5,181,6)],las=2)
# close.screen(all=TRUE)