The data matrix where columns correspond to variables and
rows to observations.
centers
Number of clusters or initial values for cluster
centers.
iter.max
Maximum number of iterations.
verbose
If TRUE, print progress information during learning.
dist
Either "euclidean" or "manhattan". With "euclidean", the
mean squared error is computed; with "manhattan", the mean absolute
error. Abbreviations are also accepted.
method
If "cmeans", the c-means fuzzy clustering method is used;
if "ufcl", the on-line update is used.
Abbreviations are also accepted.
m
A number greater than 1 giving the degree of fuzzification.
rate.par
A number between 0 and 1 giving the parameter of the
learning rate for the on-line variant. The default corresponds to
0.3.
weights
a numeric vector with non-negative case weights.
Recycled to the number of observations in x if necessary.
control
a list of control parameters. See Details.
Details
The data given by x is clustered by generalized versions of the
fuzzy c-means algorithm, which use either a fixed-point or an
on-line heuristic for minimizing the objective function
∑_i ∑_j w_i u_{ij}^m d_{ij},
where w_i is the weight of observation i, u_{ij} is
the membership of observation i in cluster j, and
d_{ij} is the distance (dissimilarity) between observation
i and center j. The dissimilarities used are the sums of
squares ("euclidean") or absolute values ("manhattan")
of the element-wise differences.
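To make the objective function concrete, the following is a minimal base R sketch that evaluates it for given centers and memberships. The helper name fuzzy_objective and the toy data are illustrative assumptions and are not part of the package.

```r
# Hypothetical helper (not part of the package): evaluates
# sum_i sum_j w_i * u_ij^m * d_ij for given centers and memberships.
fuzzy_objective <- function(x, centers, u, m = 2, w = 1,
                            dist = c("euclidean", "manhattan")) {
  dist <- match.arg(dist)
  w <- rep_len(w, nrow(x))                # recycle case weights
  # d[i, j]: dissimilarity between observation i and center j
  d <- matrix(vapply(seq_len(nrow(centers)), function(j) {
    delta <- sweep(x, 2, centers[j, ])    # element-wise differences
    if (dist == "euclidean") rowSums(delta^2) else rowSums(abs(delta))
  }, numeric(nrow(x))), nrow = nrow(x))
  sum(w * u^m * d)
}

x <- rbind(c(0, 0), c(1, 1), c(4, 4))
centers <- rbind(c(0, 0), c(4, 4))
u <- rbind(c(1, 0), c(0.5, 0.5), c(0, 1))   # each row sums to 1

fuzzy_objective(x, centers, u, m = 2)                       # 5
fuzzy_objective(x, centers, u, m = 2, dist = "manhattan")   # 2
```

Only the middle observation contributes here, because the other two coincide with a center and have full membership in it.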
If centers is a matrix, its rows are taken as the initial cluster
centers. If centers is an integer, that many rows of
x are randomly chosen as the initial centers.
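The two ways of specifying centers can be sketched in base R as follows. The helper init_centers is hypothetical, and sampling without replacement is an assumption made for this illustration.

```r
# Hypothetical helper illustrating the two ways of specifying `centers`.
init_centers <- function(x, centers) {
  if (is.matrix(centers)) {
    centers                                      # rows are used as given
  } else {
    # draw `centers` distinct rows of x at random (assumed behaviour)
    x[sample(nrow(x), centers), , drop = FALSE]
  }
}

set.seed(1)
x <- matrix(rnorm(40), ncol = 2)
init_centers(x, 3)            # three randomly chosen rows of x
init_centers(x, x[1:2, ])     # user-supplied centers, returned unchanged
```

Because the integer form draws rows at random, calling set.seed() beforehand is one way to make a run reproducible.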
The algorithm stops when the maximum number of iterations (given by
iter.max) is reached, or when the algorithm is unable to reduce
the current value val of the objective function by
reltol * (abs(val) + reltol) at a step. The relative
convergence tolerance reltol can be specified as the
reltol component of the list of control parameters, and
defaults to sqrt(.Machine$double.eps).
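As a rough sketch of this stopping rule, assuming the threshold takes the additive form reltol * (abs(val) + reltol), as in the analogous reltol setting of optim(), the check could be written as follows. The helper has_converged is hypothetical.

```r
reltol <- sqrt(.Machine$double.eps)   # default stated above, about 1.5e-8

# Hypothetical helper: TRUE when the step from old_val to new_val
# changes the objective by less than the (assumed) tolerance.
has_converged <- function(old_val, new_val, reltol) {
  abs(old_val - new_val) < reltol * (abs(old_val) + reltol)
}

has_converged(10, 10 - 1e-9, reltol)   # TRUE: improvement below tolerance
has_converged(10, 9.9, reltol)         # FALSE: still making progress
```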
If verbose is TRUE, each iteration displays its number
and the value of the objective function.
If method is "cmeans", the c-means fuzzy clustering method is
used; see, for example, Bezdek (1981). If method is
"ufcl", the on-line update (Unsupervised Fuzzy
Competitive Learning) method due to Chung and Lee (1992) is used; see
also Pal et al. (1996). This method performs an update directly
after each input signal (i.e., for each single observation).
The parameter m defines the degree of fuzzification. It is
defined for real values greater than 1; the larger it is, the
fuzzier the membership values of the clustered data points.
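The effect of m can be illustrated with the standard fuzzy c-means membership formula, in which the membership of an observation in cluster j is proportional to d_j^(-1/(m-1)). The text above does not spell this formula out, so it is an assumption of this sketch, as is the helper name memberships.

```r
# Hypothetical helper: memberships of one observation given its
# dissimilarities d (all > 0) to each center, for a fuzzifier m > 1.
memberships <- function(d, m) {
  stopifnot(m > 1, all(d > 0))
  p <- d^(-1 / (m - 1))
  p / sum(p)                 # normalise so the memberships sum to 1
}

d <- c(1, 4)                 # the first center is four times closer
memberships(d, m = 1.5)      # close to a hard assignment
memberships(d, m = 2)        # 0.8 0.2
memberships(d, m = 5)        # memberships move towards 0.5 each
```

As m approaches 1 the largest membership tends to 1, and as m grows all memberships tend to be equal across clusters.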
Value
An object of class "fclust" which is a list with components:
centers
the final cluster centers.
size
the number of data points in each cluster of the closest
hard clustering.
cluster
a vector of integers containing the indices of the
clusters to which the data points are assigned in the closest hard
clustering, obtained by assigning each point to the (first) class with
maximal membership.
iter
the number of iterations performed.
membership
a matrix with the membership values of the data points
to the clusters.
withinerror
the value of the objective function.
call
the call used to create the object.
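The relationship between the membership, cluster, and size components can be sketched in base R as follows. The membership matrix is made up for illustration, and the code only mirrors the description above rather than the package's internals.

```r
# Made-up membership matrix: 3 observations (rows), 3 clusters (columns).
u <- rbind(c(0.7, 0.2, 0.1),
           c(0.4, 0.4, 0.2),   # tie: the first maximal class wins
           c(0.1, 0.3, 0.6))

# Closest hard clustering: index of the (first) maximal membership per row.
cluster <- apply(u, 1, which.max)

# Number of data points assigned to each cluster.
size <- tabulate(cluster, nbins = ncol(u))

cluster   # 1 1 3
size      # 2 0 1
```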
Author(s)
Evgenia Dimitriadou and Kurt Hornik
References
J. C. Bezdek (1981).
Pattern recognition with fuzzy objective function algorithms.
New York: Plenum.
Fu Lai Chung and Tong Lee (1992).
Fuzzy competitive learning.
Neural Networks, 7(3), 539–551.
Nikhil R. Pal, James C. Bezdek, and Richard J. Hathaway (1996).
Sequential competitive learning and the fuzzy c-means clustering
algorithms.
Neural Networks, 9(5), 787–796.
Examples
# a 2-dimensional example
x <- rbind(matrix(rnorm(100, sd = 0.3), ncol = 2),
           matrix(rnorm(100, mean = 1, sd = 0.3), ncol = 2))
cl <- cmeans(x, 2, 20, verbose = TRUE, method = "cmeans", m = 2)
print(cl)
# a 3-dimensional example
x <- rbind(matrix(rnorm(150, sd = 0.3), ncol = 3),
           matrix(rnorm(150, mean = 1, sd = 0.3), ncol = 3),
           matrix(rnorm(150, mean = 2, sd = 0.3), ncol = 3))
cl <- cmeans(x, 6, 20, verbose = TRUE, method = "cmeans")
print(cl)
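The examples above do not exercise the dist, weights, or control arguments. The following sketch does so, assuming that cmeans is provided by the e1071 package and that the package is installed. The weights and the reltol value are arbitrary illustrative choices.

```r
# Assumes the e1071 package (taken here to be the provider of cmeans)
# is installed; the block is skipped otherwise.
if (requireNamespace("e1071", quietly = TRUE)) {
  set.seed(42)
  x <- rbind(matrix(rnorm(100, sd = 0.3), ncol = 2),
             matrix(rnorm(100, mean = 1, sd = 0.3), ncol = 2))
  w <- rep(c(1, 2), each = 50)   # illustrative case weights
  cl_w <- e1071::cmeans(x, centers = 2, iter.max = 50,
                        dist = "manhattan", m = 2, weights = w,
                        control = list(reltol = 1e-8))
  print(cl_w$centers)
  # each row of the membership matrix should sum to 1
  print(range(rowSums(cl_w$membership)))
}
```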