an (N×D) matrix of 'double' values:
N observations in D variables.
method
the agglomeration method to be used. This must be (an
unambiguous abbreviation of) one of "single",
"ward", "centroid" or "median".
members
NULL or a vector whose length is the number of observations.
metric
the distance measure to be used. This must be one of
"euclidean", "maximum", "manhattan",
"canberra", "binary" or "minkowski". Any
unambiguous substring can be given.
p
parameter for the Minkowski metric.
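As an illustration of how the method, metric and p arguments combine, the following is a minimal sketch rather than an official example. It assumes that the fastcluster package is installed and uses the built-in USArrests data; the exponent p = 3 is an arbitrary choice.

```r
library(fastcluster)  # assumption: the package is installed

X <- as.matrix(USArrests)  # 50 observations in 4 variables

# Single linkage under the Minkowski metric; p = 3 is an arbitrary exponent.
hc <- hclust.vector(X, method = "single", metric = "minkowski", p = 3)

# Abbreviated argument values: "si" and "mink" are unambiguous here.
hc_abbrev <- hclust.vector(X, method = "si", metric = "mink", p = 3)

# Both calls should describe the same clustering.
identical(hc$height, hc_abbrev$height)
```

An abbreviation such as "ma" would not be usable for metric, since it matches both "maximum" and "manhattan".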
Details
The function hclust.vector provides clustering when the
input is vector data. It uses memory-saving algorithms which allow
processing of larger data sets than hclust does.
The "ward", "centroid" and "median" methods
require metric="euclidean" and cluster the data set with
respect to Euclidean distances.
For "single" linkage clustering, any dissimilarity
measure may be chosen. Currently, the metrics implemented are the same
as those that the dist function provides.
The call
hclust.vector(X, method='single', metric=[...])
gives the same result as
hclust(dist(X, method=[...]), method='single')
but uses less memory and is equally fast.
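The equivalence of the two calls can be checked on a small data set. This is a sketch under assumptions: fastcluster is installed, the USArrests data and the "manhattan" metric are arbitrary choices, and only the merge heights are compared, since the merge order may differ when distances are tied.

```r
library(fastcluster)  # assumption: the package is installed

X <- as.matrix(USArrests)

# Memory-saving route: cluster the vector data directly.
hc_vec <- hclust.vector(X, method = "single", metric = "manhattan")

# Conventional route: build the full distance matrix first.
# stats::hclust is named explicitly because fastcluster masks hclust.
hc_mat <- stats::hclust(dist(X, method = "manhattan"), method = "single")

# Compare the merge heights up to floating-point tolerance.
all.equal(hc_vec$height, hc_mat$height)
```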
For the Euclidean methods, care must be taken since
hclust expects squared Euclidean
distances. Hence, the call
hclust.vector(X, method='centroid')
is, aside from the lesser memory requirements, equivalent to
d = dist(X)
hc = hclust(d^2, method='centroid')
hc$height = sqrt(hc$height)
The same applies to the "median" method. The "ward" method in
hclust.vector is equivalent to hclust with method "ward.D2",
but to method "ward.D" only after squaring as above.
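The relationship between the three Ward variants can be checked numerically. This is a sketch under assumptions: fastcluster is installed, USArrests is an arbitrary test data set, and the comparison covers merge heights only.

```r
library(fastcluster)  # assumption: the package is installed

X <- as.matrix(USArrests)
d <- dist(X)  # plain (unsquared) Euclidean distances

hc_vec <- hclust.vector(X, method = "ward")

# "ward.D2" takes unsquared distances and needs no post-processing.
hc_d2 <- stats::hclust(d, method = "ward.D2")
all.equal(hc_vec$height, hc_d2$height)

# "ward.D" needs squared distances on input and a square root on output.
hc_d <- stats::hclust(d^2, method = "ward.D")
all.equal(hc_vec$height, sqrt(hc_d$height))
```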
More details are in the User's manual
fastcluster.pdf, which is available as
a vignette. Get this from the R command line with
vignette('fastcluster').
# Taken and modified from stats::hclust
## Perform centroid clustering with squared Euclidean distances,
## cut the tree into ten clusters and reconstruct the upper part of the
## tree from the cluster centers.
hc <- hclust.vector(USArrests, "cen")
# squared Euclidean distances
hc$height <- hc$height^2
memb <- cutree(hc, k = 10)
cent <- NULL
for(k in 1:10){
cent <- rbind(cent, colMeans(USArrests[memb == k, , drop = FALSE]))
}
hc1 <- hclust.vector(cent, method = "cen", members = table(memb))
# squared Euclidean distances
hc1$height <- hc1$height^2
opar <- par(mfrow = c(1, 2))
plot(hc, labels = FALSE, hang = -1, main = "Original Tree")
plot(hc1, labels = FALSE, hang = -1, main = "Re-start from 10 clusters")
par(opar)