numeric matrix or data.frame where columns correspond to variables and rows to
observations
diss.mx
square, symmetric numeric matrix or data.frame representing a
dissimilarity matrix, in which the distances between objects are stored.
clust
integer vector giving the id of the cluster each object is assigned to.
If the vector is not of integer type, it will be coerced with a warning.
dist
chosen metric: "euclidean" (default value), "manhattan", "correlation"
(this argument is available only in the cls.scatt.data function).
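As a rough illustration of the argument shapes described above, the following base R sketch prepares a dissimilarity matrix and an integer cluster vector. The toy data, the choice of hclust/cutree to obtain a partition, and the variable x are assumptions made for this example only.

```r
# Toy data: 4 observations (rows), 2 variables (columns); values are illustrative.
x <- matrix(c(1.0, 2.0,
              1.5, 1.8,
              8.0, 8.0,
              9.0, 9.5), ncol = 2, byrow = TRUE)

# Square, symmetric dissimilarity matrix built from the same data
diss.mx <- unname(as.matrix(dist(x, method = "manhattan")))

# Cluster ids from any partitioning method; stored explicitly as integers
clust <- as.integer(cutree(hclust(dist(x)), k = 2))

print(isSymmetric(diss.mx))  # the matrix should be symmetric
print(clust)                 # one cluster id per observation
```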
Details
Six intercluster distances and three intracluster diameters can be used to
calculate validity indices such as Dunn-like and Davies-Bouldin-like indices.
Let d(x,y) be a distance function between two objects coming from the data set.
Intracluster diameters
The complete diameter represents the distance between the two most remote objects belonging
to the same cluster.
diam1(C) = max{ d(x,y): x,y belongs to cluster C }
The average diameter distance defines the average distance between all of the
samples belonging to the same cluster.
diam2(C) = 1/(|C|*(|C|-1)) * sum{ forall x,y belongs to cluster C and x != y } d(x,y)
The centroid diameter distance reflects the double average distance between all of the
samples and the cluster's center (v(C) - cluster center).
diam3(C) = 1/|C| * sum{ forall x belonging to cluster C} d(x,v(C))
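The three diameter formulas can be sketched in base R as below. This is only an illustration, not the package's implementation: the toy data are made up, the cluster center v(C) is assumed to be the arithmetic mean of the cluster's objects, Euclidean distance is assumed, and diam3 follows the formula exactly as written above.

```r
# Toy data: 5 observations, 2 variables, two clusters (illustrative values).
x <- matrix(c( 0,  0,
               3,  4,
               0,  4,
              10, 10,
              13, 14), ncol = 2, byrow = TRUE)
clust <- c(1L, 1L, 1L, 2L, 2L)
ids <- sort(unique(clust))

d <- as.matrix(dist(x, method = "euclidean"))  # pairwise distances d(x,y)

# diam1: largest pairwise distance inside each cluster
diam1 <- sapply(ids, function(k) {
  idx <- which(clust == k)
  max(d[idx, idx])
})

# diam2: mean over all ordered pairs with x != y
# (the diagonal is zero, so summing the whole block is enough;
#  a singleton cluster would give NaN here)
diam2 <- sapply(ids, function(k) {
  idx <- which(clust == k)
  n <- length(idx)
  sum(d[idx, idx]) / (n * (n - 1))
})

# diam3 as written above: mean distance from each object to the cluster center
diam3 <- sapply(ids, function(k) {
  pts <- x[clust == k, , drop = FALSE]
  center <- colMeans(pts)               # assumed definition of v(C)
  mean(sqrt(rowSums(sweep(pts, 2, center)^2)))
})

print(rbind(diam1, diam2, diam3))
```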
Intercluster distances
The single linkage distance defines the closest distance between two samples
belonging to two different clusters.
dist1(Ci,Cj) = min{ d(x,y): x belongs to Ci and y to Cj cluster }
The complete linkage distance represents the distance between the most remote samples
belonging to two different clusters.
dist2(Ci,Cj) = max{ d(x,y): x belongs to Ci and y to Cj cluster }
The average linkage distance defines the average distance between all of the samples
belonging to two different clusters.
dist3(Ci,Cj) = 1/(|Ci|*|Cj|) * sum{ forall x belongs to Ci and y to Cj } d(x,y)
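The single, complete and average linkage distances all reduce to a summary of the same block of pairwise distances. A minimal base R sketch, assuming Euclidean distance and made-up data (the helper between is hypothetical and not part of the package):

```r
# Toy data: two clusters of two points each (illustrative values).
x <- matrix(c(0, 0,
              0, 3,
              4, 0,
              4, 3), ncol = 2, byrow = TRUE)
clust <- c(1L, 1L, 2L, 2L)
d <- as.matrix(dist(x))

# Hypothetical helper: distances with rows from cluster i, columns from cluster j
between <- function(d, clust, i, j) d[clust == i, clust == j, drop = FALSE]

b <- between(d, clust, 1, 2)
dist1 <- min(b)   # single linkage: closest pair
dist2 <- max(b)   # complete linkage: most remote pair
dist3 <- mean(b)  # average linkage: sum over all pairs / (|Ci| * |Cj|)

print(c(single = dist1, complete = dist2, average = dist3))
```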
The centroid linkage distance reflects the distance between the centres of two clusters
(v(i), v(j) - clusters' centers).
dist4(Ci,Cj) = d(v(i), v(j))
The average of centroids linkage represents the average distance between the centre of
each cluster and all of the samples belonging to the other cluster.
dist5(Ci,Cj) = 1/(|Ci|+|Cj|) *
( sum{ forall x belongs to Ci } d(x,v(j)) + sum{ forall y belongs to Cj } d(y,v(i)) )
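The two centroid-based distances can be illustrated with the base R sketch below. It assumes Euclidean distance and takes the cluster center to be the column-wise mean of the cluster's objects; the data and the helper to_center are hypothetical.

```r
# Toy data: two clusters of two points each (illustrative values).
x <- matrix(c(0, 0,
              0, 3,
              4, 0,
              4, 3), ncol = 2, byrow = TRUE)
clust <- c(1L, 1L, 2L, 2L)

ci <- x[clust == 1, , drop = FALSE]
cj <- x[clust == 2, , drop = FALSE]
vi <- colMeans(ci)  # assumed center of Ci
vj <- colMeans(cj)  # assumed center of Cj

# Hypothetical helper: Euclidean distance from each row of pts to one center
to_center <- function(pts, center) sqrt(rowSums(sweep(pts, 2, center)^2))

# dist4: distance between the two centers
dist4 <- sqrt(sum((vi - vj)^2))

# dist5: distances from Ci to v(j) plus distances from Cj to v(i),
# divided by the total number of objects in both clusters
dist5 <- (sum(to_center(ci, vj)) + sum(to_center(cj, vi))) / (nrow(ci) + nrow(cj))

print(c(centroid = dist4, ave_to_cent = dist5))
```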
Hausdorff metrics are based on the discovery of a maximal distance from samples of one
cluster to the nearest sample of another cluster.
dist6(Ci,Cj) = max{ distH(Ci,Cj), distH(Cj,Ci) }
where: distH(A,B) = max{ min{ d(x,y): y belongs to B}: x belongs to A }
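Because distH is not symmetric, both directions have to be evaluated before taking the maximum. A small base R sketch on a made-up block of distances (the matrix d_ab and the helper dist_h are hypothetical):

```r
# Toy distances: rows are objects of cluster A, columns are objects of cluster B.
d_ab <- matrix(c(1, 4,
                 2, 3,
                 6, 5), nrow = 3, byrow = TRUE)

# distH: for each row object take its nearest column object,
# then keep the largest of those nearest-neighbour distances
dist_h <- function(m) max(apply(m, 1, min))

h_ab <- dist_h(d_ab)      # row minima are 1, 2, 5
h_ba <- dist_h(t(d_ab))   # column minima are 1, 3
dist6 <- max(h_ab, h_ba)

print(c(h_ab = h_ab, h_ba = h_ba, hausdorff = dist6))
```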
Value
cls.scatt.data returns an object of class "list".
Intracluster diameters:
intracls.complete,
intracls.average,
intracls.centroid,
are stored in vectors and intercluster distances:
intercls.single,
intercls.complete,
intercls.average,
intercls.centroid,
intercls.ave_to_cent,
intercls.hausdorff
in symmetric matrices.
The length of each vector and both dimensions of each matrix are equal to the number of clusters.
Additionally, the result list contains the cluster.center matrix (rows correspond to cluster centers)
and the cluster.size vector (the size of each cluster).
cls.scatt.diss.mx returns an object of class "list".
Intracluster diameters:
intracls.complete,
intracls.average,
are stored in vectors and intercluster distances:
intercls.single,
intercls.complete,
intercls.average,
intercls.hausdorff
in symmetric matrices.
The length of each vector and both dimensions of each matrix are equal to the number of clusters.
Additionally, the result list contains the cluster.size vector (the size of each cluster).
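To show how a result shaped like the lists described above might feed a Dunn-like index, the sketch below builds a mock list by hand and divides the smallest intercluster distance by the largest intracluster diameter. The numbers are invented, and the list res only mimics two of the documented element names; it is not output produced by either function.

```r
# Mock result for 3 clusters (hand-made, illustrative values only)
res <- list(
  intracls.complete = c(2, 4, 3),
  intercls.single   = matrix(c(0,  6,  8,
                               6,  0, 10,
                               8, 10,  0), nrow = 3, byrow = TRUE)
)

inter <- res$intercls.single

# Dunn-like index: smallest distance between two different clusters
# divided by the largest cluster diameter (the diagonal is ignored)
dunn <- min(inter[upper.tri(inter)]) / max(res$intracls.complete)

print(dunn)
```

Any of the intercluster matrices can be combined with any of the intracluster vectors in the same way, which gives one variant of the index per pairing.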