Perform stochastic QT clustering on a data matrix.
Usage
qtclust(x, radius, family = kccaFamily("kmeans"), control = NULL,
save.data=FALSE, kcca=FALSE)
Arguments
x
A numeric matrix of data, or an object that can be coerced to
such a matrix (such as a numeric vector or a data frame with all
numeric columns).
radius
Maximum radius of clusters.
family
Object of class kccaFamily specifying the
distance measure to be used.
control
An object of class flexclustControl
specifying the minimum number of observations per cluster
(min.size), and trials per iteration (ntry, see details
below).
.
save.data
Save a copy of x in the return object?
kcca
Run kcca after the QT cluster algorithm has
converged?
Details
This function implements a variation of the QT clustering algorithm by
Heyer et al. (1999), see Scharl and Leisch (2006). The main difference
is that in each iteration not
all possible cluster start points are considered, but only a random
sample of size control@ntry. We also consider only points as initial
centers where at least one other point is within a circle with radius
radius. In most cases the resulting
solutions are almost
the same at a considerable speed increase, in some cases even better
solutions are obtained than with the original algorithm. If
control@ntry is set to the size of the data set, an algorithm
similar to the original algorithm as proposed by Heyer et al. (1999)
is obtained.
Value
Function qtclust by default returns objects of class
"kccasimple". If argument kcca is TRUE, function
kcca() is run afterwards (initialized on the QT cluster
solution). Data points
not clustered by the QT cluster algorithm are omitted from the
kcca() iterations, but filled back into the return
object. All plot methods defined for objects of class "kcca"
can be used.
Author(s)
Friedrich Leisch
References
Heyer, L. J., Kruglyak, S., Yooseph, S. (1999). Exploring expression data:
Identification and analysis of coexpressed genes. Genome Research 9,
1106–1115.
Theresa Scharl and Friedrich Leisch. The stochastic QT-clust
algorithm: evaluation of stability and variance on time-course
microarray data. In Alfredo Rizzi and Maurizio Vichi, editors,
Compstat 2006 – Proceedings in Computational Statistics, pages
1015-1022. Physica Verlag, Heidelberg, Germany, 2006.
Examples
x <- matrix(10*runif(1000), ncol=2)
## maximum distrance of point to cluster center is 3
cl1 <- qtclust(x, radius=3)
## maximum distrance of point to cluster center is 1
## -> more clusters, longer runtime
cl2 <- qtclust(x, radius=1)
opar <- par(c("mfrow","mar"))
par(mfrow=c(2,1), mar=c(2.1,2.1,1,1))
plot(x, col=predict(cl1), xlab="", ylab="")
plot(x, col=predict(cl2), xlab="", ylab="")
par(opar)