R: Cluster Solution Diagnostics Using Bootstrap Replicates
bootCVD
R Documentation
Cluster Solution Diagnostics Using Bootstrap Replicates
Description
Provides a plot of both the Rand index and the Calinski-Harabasz index for
different numbers of clusters of a common underlying dataset, using the
K-Means, K-Medians, or Neural Gas clustering algorithm on a set of
bootstrap replicates of the data.
An integer vector giving the set of clustering solutions to be
examined.
nboot
The number of bootstrap replicates to use for the assessment.
nrep
The number of sets of initial cluster seeds to try for each
solution.
method
The clustering method, one of "kmn" (K-Means), "kmd"
(K-Medians), or "neuralgas" (Neural Gas).
col1
The color to use for the plot of the Rand index values.
col2
The color to use for the plot of the Calinski-Harabasz index values.
dsname
The name of the dataset being used (used only for output purposes).
xdat
A numeric matrix of the data to be clustered.
k_vals
An integer vector giving the set of clustering solutions to be examined.
clstr1
The cluster assignments from a bootFlexclust object for one side
of the Rand index paired comparisons.
clstr2
The cluster assignments from a bootFlexclust object for the
other side of the Rand index paired comparisons.
cntrs1
The cluster centers from a bootFlexclust object for one side
of the bootFlexclust Rand index paired comparisons.
cntrs2
The cluster centers from a bootFlexclust object for the other
side of the bootFlexclust Rand index paired comparisons.
fc
A bootFlexclust object.
ch
A matrix of Calinski-Harabasz index values from bootCH.
Details
The Rand index provides a measure of cluster stability, with relatively
higher values indicating relatively more stable clusters. The
Calinski-Harabasz index gives the ratio of cluster separation to cluster
homogeneity, with higher values of the index being preferred. The use of
bootstrap replicates addresses potential randomness in both the sample
data and the clustering algorithms.
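To make the stability idea concrete, the following is a minimal base R sketch, not the code used by bootCVD. It assumes the plain (unadjusted) Rand index and uses stats::kmeans in place of the package's clustering methods. The helper names rand_index and nearest_center and the simulated data are hypothetical. Two bootstrap resamples are clustered separately, the original observations are assigned to the nearest center of each solution, and the two partitions are then compared.

```r
# Plain (unadjusted) Rand index: the share of observation pairs on which
# two partitions agree (both place the pair together, or both apart).
# Hypothetical helper, not part of any package.
rand_index <- function(a, b) {
  stopifnot(length(a) == length(b))
  same_a <- outer(a, a, "==")
  same_b <- outer(b, b, "==")
  upper <- upper.tri(same_a)
  mean(same_a[upper] == same_b[upper])
}

# Assign each row of x to its nearest center (squared Euclidean distance).
# Hypothetical helper, not part of any package.
nearest_center <- function(x, centers) {
  d <- sapply(seq_len(nrow(centers)), function(j) {
    rowSums(sweep(x, 2, centers[j, ])^2)
  })
  max.col(-d, ties.method = "first")
}

# Simulated data with two well-separated groups (an assumption made
# purely for illustration).
set.seed(1)
x <- rbind(matrix(rnorm(100, mean = 0), ncol = 2),
           matrix(rnorm(100, mean = 6), ncol = 2))
n <- nrow(x)

# Cluster two independent bootstrap resamples, then compare how the two
# solutions partition the original data.
fit1 <- kmeans(x[sample(n, replace = TRUE), ], centers = 2, nstart = 5)
fit2 <- kmeans(x[sample(n, replace = TRUE), ], centers = 2, nstart = 5)
ri <- rand_index(nearest_center(x, fit1$centers),
                 nearest_center(x, fit2$centers))
ri
```

Repeating the comparison over many pairs of resamples gives a distribution of Rand index values for each number of clusters. Values near 1 suggest that the solution is reproduced consistently across resamples.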
Value
The functions bootCVD and bootPlot return invisibly. Their benefit is the
plot produced as a side effect and the printed summary of the index
values. The function bootCH returns a matrix of Calinski-Harabasz index
values in which the rows are the replicates and each column corresponds
to a particular number of clusters for a solution.
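The layout of such a matrix can be sketched in base R as follows. This is an illustration under stated assumptions rather than the implementation of bootCH: it uses stats::kmeans, and the helper ch_index and the simulated data are hypothetical. The names k_vals and nboot are borrowed from the argument list only for readability. The index is computed as the between-cluster sum of squares divided by (k - 1), over the within-cluster sum of squares divided by (n - k).

```r
# Calinski-Harabasz index for a fitted kmeans object:
# (between-cluster SS / (k - 1)) / (within-cluster SS / (n - k)).
# Hypothetical helper, not part of any package.
ch_index <- function(fit) {
  n <- length(fit$cluster)
  k <- nrow(fit$centers)
  (fit$betweenss / (k - 1)) / (fit$tot.withinss / (n - k))
}

# Simulated data with two well-separated groups (an assumption made
# purely for illustration).
set.seed(2)
x <- rbind(matrix(rnorm(100, mean = 0), ncol = 2),
           matrix(rnorm(100, mean = 6), ncol = 2))
k_vals <- 2:4
nboot <- 10

# One row per bootstrap replicate, one column per number of clusters.
ch <- t(sapply(seq_len(nboot), function(b) {
  xb <- x[sample(nrow(x), replace = TRUE), ]
  sapply(k_vals, function(k) ch_index(kmeans(xb, centers = k, nstart = 5)))
}))
colnames(ch) <- k_vals

dim(ch)       # nboot rows, length(k_vals) columns
colMeans(ch)  # average index value for each number of clusters
```

Summarizing each column, for example with colMeans or a boxplot, shows which number of clusters tends to give the highest index across replicates.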
Author(s)
Dan Putler
References
S. Dolnicar, F. Leisch (2010), Evaluation of Structure and Reproducibility
of Cluster Solutions Using the Bootstrap. Marketing Letters, 21:1.
F. Leisch (2006), A Toolbox for K-Centroids Cluster Analysis.
Computational Statistics and Data Analysis, 51:2.