funFEM
R Documentation
The funFEM algorithm for the clustering of functional data.
Description
The funFEM algorithm clusters time series or, more generally, functional data. It is based on a discriminative functional mixture model which clusters the data in a unique, discriminative functional subspace. This model has the advantage of being parsimonious and can therefore handle long time series.
Arguments
fd
a functional data object produced by the fda package.
K
an integer vector specifying the numbers of mixture components (clusters) among which the model selection criterion will choose the most appropriate number of groups. Default is 2:6.
model
a vector of discriminative latent mixture (DLM) models to fit. There are 12 different models: "DkBk", "DkB", "DBk", "DB", "AkjBk", "AkjB", "AkBk", "AkB", "AjBk", "AjB", "ABk", "AB". The option "all" runs the funFEM algorithm on all 12 models and selects the one that maximizes the model selection criterion.
crit
the criterion to be used for model selection ('bic', 'aic' or 'icl'). 'bic' is the default.
init
the initialization type ('random', 'kmeans' or 'hclust'). 'hclust' is the default.
Tinit
an n x K matrix containing the posterior probabilities used to initialize the algorithm (each row corresponds to an individual).
maxit
the maximum number of iterations before the Fisher-EM algorithm stops.
eps
the threshold on the likelihood difference used to stop the Fisher-EM algorithm.
disp
if TRUE, some messages are printed during the clustering. Default is FALSE.
lambda
the l0 penalty (between 0 and 1) for the sparse version. See Bouveyron et al. (2014) for details. Default is 0.
graph
if TRUE, the evolution of the log-likelihood is plotted. Default is FALSE.
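As an illustration of the arguments described above, a call spelling them out explicitly might look as follows. This is a sketch, not taken from the package examples; it assumes a functional data object fdobj built with the fda package (as in the Examples section), and the maxit and eps values are illustrative, not the package defaults.

```r
# Illustrative call; 'fdobj' is assumed to be an fda functional data object,
# and the maxit/eps values below are example settings, not documented defaults.
res <- funFEM(fdobj,
              K = 2:6,          # candidate numbers of clusters
              model = "all",    # fit all 12 DLM models
              crit = "bic",     # select the model by BIC
              init = "hclust",  # hierarchical clustering initialization
              maxit = 50,       # cap on Fisher-EM iterations
              eps = 1e-6,       # likelihood-difference stopping threshold
              disp = TRUE)      # print progress messages
```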
Value
A list is returned:
model
the model name.
K
the number of groups.
cls
the group membership of each individual estimated by the Fisher-EM algorithm.
P
the posterior probabilities of each individual for each group.
prms
the model parameters.
U
the orientation of the functional subspace according to the basis functions.
aic
the value of the Akaike information criterion.
bic
the value of the Bayesian information criterion.
icl
the value of the integrated completed likelihood criterion.
loglik
the log-likelihood values computed at each iteration of the FEM algorithm.
ll
the log-likelihood value obtained at the last iteration of the FEM algorithm.
nbprm
the number of free parameters in the model.
call
the call of the function.
plot
some information to pass to the plot.fem function.
crit
the model selection criterion used.
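The components listed above are accessed from the returned list in the usual way. A brief, hypothetical inspection session (assuming a fitted object res as in the Examples):

```r
# Hypothetical inspection of the fitted object returned by funFEM:
res <- funFEM(fdobj, K = 2:6)
res$K                 # number of groups retained
res$model             # name of the selected DLM model
table(res$cls)        # cluster sizes
head(res$P)           # posterior probabilities (n x K)
c(res$bic, res$aic)   # values of the model selection criteria
```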
Author(s)
Charles Bouveyron
References
C. Bouveyron, E. Côme and J. Jacques, The discriminative functional mixture model for the analysis of bike sharing systems, Preprint HAL n.01024186, University Paris Descartes, 2014.
Examples
# Clustering the well-known "Canadian temperature" data (Ramsay & Silverman)
basis <- create.bspline.basis(c(0, 365), nbasis=21, norder=4)
fdobj <- smooth.basis(day.5, CanadianWeather$dailyAv[,,"Temperature.C"],basis,
fdnames=list("Day", "Station", "Deg C"))$fd
res = funFEM(fdobj,K=4)
# Visualization of the partition and the group means
par(mfrow=c(1,2))
plot(fdobj,col=res$cls,lwd=2,lty=1)
fdmeans = fdobj; fdmeans$coefs = t(res$prms$my)
plot(fdmeans,col=1:max(res$cls),lwd=2)
## DO NOT RUN
# # Load the velib data and smoothing
# data(velib)
# basis<- create.fourier.basis(c(0, 181), nbasis=25)
# fdobj <- smooth.basis(1:181,t(velib$data),basis)$fd
#
# # Clustering with funFEM
# res = funFEM(fdobj,K=6,model='AkjBk',init='kmeans',lambda=0,disp=TRUE)
#
# # Visualization of group means
# fdmeans = fdobj; fdmeans$coefs = t(res$prms$my)
# plot(fdmeans,col=1:res$K,xaxt='n',lwd=2)
# axis(1,at=seq(5,181,6),labels=velib$dates[seq(5,181,6)],las=2)
#
# # Choice of K
# res = funFEM(fdobj,K=2:20,model='AkjBk',init='kmeans',lambda=0,disp=TRUE)
# plot(2:20,res$plot$bic,type='b',xlab='K',main='BIC')
#
# # Computation of the closest stations from the group means
# par(mfrow=c(3,2))
# for (i in 1:res$K) {
# matplot(t(velib$data[which.max(res$P[,i]),]),type='l',lty=i,col=i,xaxt='n',
# lwd=2,ylim=c(0,1))
# axis(1,at=seq(5,181,6),labels=velib$dates[seq(5,181,6)],las=2)
# title(main=paste('Cluster',i,' - ',velib$names[which.max(res$P[,i])]))
# }
#
# # Visualization in the discriminative subspace (projected scores)
# par(mfrow=c(1,1))
# plot(t(fdobj$coefs) %*% res$U,col=res$cls,pch=19,main="Discriminative space")
# text(t(fdobj$coefs) %*% res$U,labels=velib$names,col=res$cls)
#
# # Spatial visualization of the clustering (with library ggmap)
# library(ggmap)
# Mymap = get_map(location = 'Paris', zoom = 12, maptype = 'terrain')
# ggmap(Mymap) + geom_point(data=velib$position,aes(longitude,latitude),
# colour = I(res$cls), size = I(3))
#
# # FunFEM clustering with sparsity
# res2 = funFEM(fdobj,K=res$K,model='AkjBk',init='user',Tinit=res$P,
# lambda=0.01,disp=TRUE)
#
# # Visualization of group means and the selected functional bases
# split.screen(c(2,1))
# fdmeans = fdobj; fdmeans$coefs = t(res2$prms$my)
# screen(1); plot(fdmeans,col=1:res2$K,xaxt='n',lwd=2); axis(1,at=seq(5,181,6),
# labels=velib$dates[seq(5,181,6)],las=2)
# basis$dropind = which(rowSums(abs(res2$U))==0)
# screen(2); plot(basis,col=1,lty=1,xaxt='n',xlab='Disc. basis functions')
# axis(1,at=seq(5,181,6),labels=velib$dates[seq(5,181,6)],las=2)
# close.screen(all=TRUE)