Implements methods for processing a sample of (hard) clusterings, e.g. the MCMC output of a Bayesian clustering model.
These include methods that find a single clustering to represent the sample, based either on the posterior
similarity matrix or on a relabelling algorithm.
Details
Package:   mcclust
Type:      Package
Version:   1.0
Date:      2009-03-12
License:   GPL (>= 2)
LazyLoad:  yes
Most important functions:
comp.psm computes the posterior similarity matrix (PSM). Based on the PSM, maxpear and minbinder provide
several optimization methods to find a clustering that either maximizes the posterior expected adjusted Rand index
with the true clustering (maxpear) or minimizes the posterior expectation of the loss function of Binder (1978)
(minbinder). minbinder also provides the optimization algorithm of Lau and Green (2007).
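To make these quantities concrete, here is a minimal base-R sketch; it is not the package's implementation. It assumes a small made-up sample cls of four draws, and the helpers psm_sketch and binder_sketch are hypothetical names introduced only for illustration. The Binder loss is taken with equal misclassification costs, and the search is restricted to the sampled clusterings.

```r
# Toy sample (made up): 4 MCMC draws (rows) of labels for 5 observations (columns).
cls <- rbind(c(1, 1, 2, 2, 3),
             c(1, 1, 2, 2, 2),
             c(2, 2, 1, 1, 1),   # same partition as draw 2, labels switched
             c(1, 1, 1, 2, 2))

# Hypothetical helper: entry (i, j) is the fraction of draws in which
# observations i and j are assigned to the same cluster.
psm_sketch <- function(cls) {
  n <- ncol(cls)
  psm <- matrix(0, n, n)
  for (m in seq_len(nrow(cls))) {
    psm <- psm + outer(cls[m, ], cls[m, ], "==")
  }
  psm / nrow(cls)
}

# Hypothetical helper: posterior expected Binder loss of clustering cl
# (equal costs), summed over all pairs i < j.
binder_sketch <- function(cl, psm) {
  same <- outer(cl, cl, "==")
  sum(abs(same - psm)[upper.tri(psm)])
}

psm <- psm_sketch(cls)

# Evaluate the loss for every sampled clustering and keep the minimizer.
losses <- apply(cls, 1, binder_sketch, psm = psm)
best <- cls[which.min(losses), ]
```

Because the PSM depends only on which observations are grouped together, draws 2 and 3 contribute identically even though their labels are switched. With the package installed, comp.psm and minbinder would be the functions to use instead of these helpers.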
relabel implements the relabelling algorithm of Stephens (2000).
arandi and vi.dist compare two clusterings: arandi computes the (adjusted) Rand index, a measure of agreement,
and vi.dist computes the entropy-based variation of information distance.
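The two comparison measures can be sketched in base R as follows. This is an illustration under stated assumptions, not the package's code: ari_sketch and vi_sketch are hypothetical helpers, the adjusted Rand index uses the usual chance-corrected pair-counting form, and the variation of information is written as 2 H(X,Y) - H(X) - H(Y) with logarithms to base 2 (an assumed default). The example label vectors are made up.

```r
# Hypothetical helper: adjusted Rand index from the contingency table
# of two label vectors of equal length.
ari_sketch <- function(cl1, cl2) {
  tab <- table(cl1, cl2)
  n_pairs <- choose(length(cl1), 2)
  s_ij <- sum(choose(tab, 2))            # pairs together in both clusterings
  s_a <- sum(choose(rowSums(tab), 2))    # pairs together in cl1
  s_b <- sum(choose(colSums(tab), 2))    # pairs together in cl2
  expected <- s_a * s_b / n_pairs        # value expected by chance
  (s_ij - expected) / ((s_a + s_b) / 2 - expected)
}

# Hypothetical helper: variation of information distance,
# VI = 2 * H(X, Y) - H(X) - H(Y).
vi_sketch <- function(cl1, cl2, base = 2) {
  p <- table(cl1, cl2) / length(cl1)
  entropy <- function(q) {
    q <- q[q > 0]                        # treat 0 * log(0) as 0
    -sum(q * log(q, base))
  }
  2 * entropy(p) - entropy(rowSums(p)) - entropy(colSums(p))
}

a <- c(1, 1, 2, 2, 3, 3)
b <- c(2, 2, 3, 3, 1, 1)   # same partition as a, different labels
d <- c(1, 1, 1, 2, 2, 2)   # a genuinely different partition

ari_same <- ari_sketch(a, b)
vi_same <- vi_sketch(a, b)
ari_diff <- ari_sketch(a, d)
vi_diff <- vi_sketch(a, d)
```

Both measures ignore the label names, so a and b count as identical: the index is at its maximum and the distance is zero. Note that ari_sketch returns NaN when both clusterings put all observations in a single cluster, a case this sketch does not handle.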
Author(s)
Arno Fritsch
Maintainer: Arno Fritsch <arno.fritsch@tu-dortmund.de>
References
Fritsch, A. and Ickstadt, K. (2009) An improved criterion for clustering based on the
posterior similarity matrix, Bayesian Analysis, accepted.
Lau, J.W. and Green, P.J. (2007) Bayesian model based clustering
procedures, Journal of Computational and Graphical Statistics, 16, 526–558.
Stephens, M. (2000) Dealing with label switching in mixture models,
Journal of the Royal Statistical Society Series B, 62, 795–809.
Examples
library(mcclust)

# sample of 500 clusterings from a Bayesian cluster model
data(cls.draw2)
# the true grouping of the observations
tru.class <- rep(1:8, each = 50)
# posterior similarity matrix
psm2 <- comp.psm(cls.draw2)

# optimize criteria based on PSM
mbind2 <- minbinder(psm2)
mpear2 <- maxpear(psm2)

# Relabelling: keep only the draws whose number of clusters equals the
# most frequent number of clusters in the sample
k <- apply(cls.draw2, 1, function(cl) length(table(cl)))
max.k <- as.numeric(names(table(k))[which.max(table(k))])
relab2 <- relabel(cls.draw2[k == max.k, ])

# compare clusterings found by different methods with true grouping
arandi(mpear2$cl, tru.class)
arandi(mbind2$cl, tru.class)
arandi(relab2$cl, tru.class)