Last data update: 2014.03.03

R: kGmedian

R Documentation

kGmedian

Description

Fast k-medians clustering based on recursive averaged stochastic gradient algorithms. The procedure is similar to k-means clustering performed recursively with the MacQueen algorithm. The advantage of the kGmedian algorithm over the MacQueen strategy is that it minimizes a sum of norms instead of a sum of squared norms, which makes it more robust to outlying values.
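The robustness claim can be illustrated in one dimension, where the sum of norms reduces to a sum of absolute deviations and is minimized by the median. The following is a minimal base R sketch, not part of the package; the data values and the helper names `sq_loss` and `abs_loss` are chosen purely for illustration.

```r
# One-dimensional toy data with a single outlier (values chosen for illustration)
x <- c(0, 1, 2, 3, 100)

# Sum of squared norms is minimised by the mean, sum of norms by the median
sq_loss  <- function(m) sum((x - m)^2)
abs_loss <- function(m) sum(abs(x - m))

centre_mean   <- mean(x)    # 21.2, pulled towards the outlier
centre_median <- median(x)  # 2, stays with the bulk of the data
```

The outlier drags the mean far from the bulk of the observations, while the median is unaffected.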

Usage

kGmedian(X, ncenters = 2, gamma = 1, alpha = 0.75, nstart = 10, nstartkmeans = 10)

Arguments

`X`	Matrix with n observations (rows) in dimension d (columns).
`ncenters`	Either the number of clusters, say k, or a set of initial (distinct) cluster centres. If a number, the initial centres are chosen as the output of the `kmeans` function computed with the `MacQueen` algorithm.
`gamma`	Value of the constant controlling the descent steps (see details).
`alpha`	Rate of decrease of the descent steps.
`nstart`	Number of times the algorithm is run, each time with a random set of initial centres chosen among the observations.
`nstartkmeans`	Number of initialization points in the `kmeans` function for choosing the starting point of `kGmedian`.

Details

See Cardot, Cenac and Monnez (2012).
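To give a rough idea of how `gamma` and `alpha` could enter a recursive k-medians pass, here is a simplified base R sketch. It is not the Gmedian implementation: the function name `sketch_kmedians`, the step size `gamma / n^alpha` (with n the number of points already assigned to the cluster), and the averaging rule are assumptions made for illustration only; the exact algorithm is given in the cited paper.

```r
# Simplified sketch of an averaged stochastic gradient k-medians pass.
# Assumption: step size gamma / n^alpha, n = points assigned to the cluster so far.
sketch_kmedians <- function(X, centers, gamma = 1, alpha = 0.75) {
  theta  <- centers                 # stochastic gradient iterates
  avg    <- centers                 # running averages of the iterates
  counts <- rep(0, nrow(centers))   # points assigned to each cluster so far
  for (i in seq_len(nrow(X))) {
    x <- X[i, ]
    # Index of the nearest current centre in Euclidean norm
    r <- which.min(sqrt(rowSums(sweep(theta, 2, x)^2)))
    counts[r] <- counts[r] + 1
    diff <- x - theta[r, ]
    nrm  <- sqrt(sum(diff^2))
    if (nrm > 0) {
      # Move along the unit direction towards x: gradient step for a sum of norms
      theta[r, ] <- theta[r, ] + (gamma / counts[r]^alpha) * diff / nrm
    }
    # Update the running average of the iterates for cluster r
    avg[r, ] <- avg[r, ] + (theta[r, ] - avg[r, ]) / (counts[r] + 1)
  }
  avg
}

set.seed(1)
X <- rbind(matrix(rnorm(100, sd = 0.3), ncol = 2),
           matrix(rnorm(100, mean = 1, sd = 0.3), ncol = 2))
X <- X[sample(nrow(X)), ]            # shuffle so both groups appear throughout
est <- sketch_kmedians(X, X[1:2, ])  # first two observations as initial centres
```

Because each update moves a centre by a step of fixed length along a unit vector, a single distant observation cannot shift the centre by more than that step, whichever direction it lies in.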

Value

`cluster`	A vector of integers (from 1:k) indicating the cluster to which each point is allocated.
`centers`	A matrix of cluster centres.
`withinsrs`	Vector of within-cluster sum of norms, one component per cluster.
`size`	The number of points in each cluster.
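The difference between a within-cluster sum of norms and the within-cluster sum of squares returned by `kmeans` can be made concrete in base R. The sketch below assumes that "sum of norms" means the sum of Euclidean distances from each point to its assigned centre; it uses `kmeans` output only and does not call `kGmedian`.

```r
set.seed(42)
x <- rbind(matrix(rnorm(100, sd = 0.3), ncol = 2),
           matrix(rnorm(100, mean = 1, sd = 0.3), ncol = 2))
cl <- kmeans(x, 2)

# Euclidean distance from each point to its assigned centre
dist_to_centre <- sqrt(rowSums((x - cl$centers[cl$cluster, ])^2))

# Per-cluster sum of norms (assumed analogue of withinsrs) and sum of squares
sum_norms   <- tapply(dist_to_centre, cl$cluster, sum)
sum_squares <- tapply(dist_to_centre^2, cl$cluster, sum)
```

Here `sum_squares` reproduces `cl$withinss`, while `sum_norms` is the corresponding quantity without squaring.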

References

Cardot, H., Cenac, P. and Monnez, J-M. (2012). A fast and recursive algorithm for clustering large datasets with k-medians. Computational Statistics and Data Analysis, 56, 1434-1449.

Cardot, H., Cenac, P. and Zitt, P-A. (2013). Efficient and fast estimation of the geometric median in Hilbert spaces with an averaged stochastic gradient algorithm. Bernoulli, 19, 18-43.

MacQueen, J. (1967). Some methods for classification and analysis of multivariate observations. In Proceedings of the Fifth Berkeley Symposium on Mathematical Statistics and Probability, eds L. M. Le Cam and J. Neyman, 1, pp. 281-297. Berkeley, CA: University of California Press.

Examples

library(Gmedian)

# A 2-dimensional example: two Gaussian groups centred at (0, 0) and (1, 1)
x <- rbind(matrix(rnorm(100, sd = 0.3), ncol = 2),
           matrix(rnorm(100, mean = 1, sd = 0.3), ncol = 2))
colnames(x) <- c("x", "y")

# Cluster with k-means and with k-medians (ncenters defaults to 2)
cl.kmeans <- kmeans(x, 2)
cl.kmedian <- kGmedian(x)

# Plot both partitions side by side, marking the estimated centres
par(mfrow = c(1, 2))
plot(x, col = cl.kmeans$cluster, main = "kmeans")
points(cl.kmeans$centers, col = 1:2, pch = 8, cex = 2)

plot(x, col = cl.kmedian$cluster, main = "kmedian")
points(cl.kmedian$centers, col = 1:2, pch = 8, cex = 2)

