Last data update: 2014.03.03

KMeans {RcmdrMisc}    R Documentation

K-Means Clustering Using Multiple Random Seeds

Description

Finds a number of k-means clustering solutions using R's kmeans function, each from a different random seed, and selects as the final solution the one with the minimum total within-cluster sum of squared distances.
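The selection rule described above can be sketched in a few lines of base R. This is only an illustration, not the RcmdrMisc source: the helper name best_of_seeds and the seeding scheme (set.seed(1), set.seed(2), ...) are assumptions made for the example.

```r
# Hypothetical helper, NOT the RcmdrMisc implementation: run stats::kmeans once
# per seed and keep the fit with the smallest total within-cluster sum of squares.
best_of_seeds <- function(x, centers, iter.max = 10, num.seeds = 10) {
  x <- as.matrix(x)
  best <- NULL
  for (seed in seq_len(num.seeds)) {
    set.seed(seed)  # assumed seeding scheme, for illustration only
    # A small iter.max can trigger "did not converge" warnings; silence them here.
    fit <- suppressWarnings(kmeans(x, centers = centers, iter.max = iter.max))
    if (is.null(best) || fit$tot.withinss < best$tot.withinss) {
      best <- fit
    }
  }
  best
}

fit <- best_of_seeds(USArrests, centers = 3)
fit$tot.withinss
```

Note that the sketch resets the global random number generator on every pass, which a production function would probably want to avoid or document.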

Usage

KMeans(x, centers, iter.max=10, num.seeds=10)

Arguments

x

A numeric matrix of data, or an object that can be coerced to such a matrix (such as a numeric vector or a data frame with all numeric columns).

centers

The number of clusters in the solution.

iter.max

The maximum number of iterations allowed.

num.seeds

The number of different starting random seeds to use. Each random seed produces its own k-means solution, although different seeds can converge to the same one.
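To illustrate what "can be coerced to such a matrix" means for x, the sketch below uses as.matrix on the kinds of input mentioned above. Whether KMeans itself calls as.matrix internally is an assumption; the data frame named bad is a made-up counterexample.

```r
# A numeric vector becomes a one-column numeric matrix.
v <- c(1.2, 3.4, 5.6)
m_vec <- as.matrix(v)
dim(m_vec)

# A data frame whose columns are all numeric becomes a numeric matrix.
m_df <- as.matrix(USArrests)
is.numeric(m_df)

# Hypothetical counterexample: one character column turns the whole
# matrix into character, so it is no longer suitable input.
bad <- data.frame(a = 1:3, b = c("x", "y", "z"))
is.numeric(as.matrix(bad))
```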

Value

A list with components:

cluster

A vector of integers indicating the cluster to which each point is allocated.

centers

A matrix of cluster centres (centroids).

withinss

The within-cluster sum of squares for each cluster.

tot.withinss

The within-cluster sum of squares summed across clusters.

betweenss

The between-cluster sum of squared distances.

size

The number of points in each cluster.
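The components listed above are related by a few simple identities, which the following sketch checks. It calls stats::kmeans directly, on the assumption that KMeans returns an object of the same shape; the printed output in the Results section lists the same component names, including totss, which is used here.

```r
set.seed(42)  # arbitrary seed, only to make the run repeatable
fit <- kmeans(as.matrix(USArrests), centers = 3)

# tot.withinss is the sum of the per-cluster withinss values.
stopifnot(isTRUE(all.equal(fit$tot.withinss, sum(fit$withinss))))

# The total sum of squares splits into within and between parts.
stopifnot(isTRUE(all.equal(fit$totss, fit$tot.withinss + fit$betweenss)))

# size counts the points assigned to each cluster label.
stopifnot(all(fit$size == as.vector(table(fit$cluster))))

# Percentage of the total sum of squares explained by the clustering.
round(100 * fit$betweenss / fit$totss, 1)
```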

Author(s)

Dan Putler

See Also

kmeans

Examples

  data(USArrests)
  KMeans(USArrests, centers=3, iter.max=5, num.seeds=5)

Results


R version 3.3.1 (2016-06-21) -- "Bug in Your Hair"
Copyright (C) 2016 The R Foundation for Statistical Computing
Platform: x86_64-pc-linux-gnu (64-bit)

> library(RcmdrMisc)
Loading required package: car
Loading required package: sandwich
> png(filename="/home/ddbj/snapshot/RGM3/R_CC/result/RcmdrMisc/KMeans.Rd_%03d_medium.png", width=480, height=480)
> ### Name: KMeans
> ### Title: K-Means Clustering Using Multiple Random Seeds
> ### Aliases: KMeans
> ### Keywords: misc
> 
> ### ** Examples
> 
>   data(USArrests)
>   KMeans(USArrests, centers=3, iter.max=5, num.seeds=5)
K-means clustering with 3 clusters of sizes 20, 14, 16

Cluster means:
     Murder  Assault UrbanPop     Rape
1  4.270000  87.5500 59.75000 14.39000
2  8.214286 173.2857 70.64286 22.84286
3 11.812500 272.5625 68.31250 28.37500

Clustering vector:
       Alabama         Alaska        Arizona       Arkansas     California 
             3              3              3              2              3 
      Colorado    Connecticut       Delaware        Florida        Georgia 
             2              1              3              3              2 
        Hawaii          Idaho       Illinois        Indiana           Iowa 
             1              1              3              1              1 
        Kansas       Kentucky      Louisiana          Maine       Maryland 
             1              1              3              1              3 
 Massachusetts       Michigan      Minnesota    Mississippi       Missouri 
             2              3              1              3              2 
       Montana       Nebraska         Nevada  New Hampshire     New Jersey 
             1              1              3              1              2 
    New Mexico       New York North Carolina   North Dakota           Ohio 
             3              3              3              1              1 
      Oklahoma         Oregon   Pennsylvania   Rhode Island South Carolina 
             2              2              1              2              3 
  South Dakota      Tennessee          Texas           Utah        Vermont 
             1              2              2              1              1 
      Virginia     Washington  West Virginia      Wisconsin        Wyoming 
             2              2              1              1              2 

Within cluster sum of squares by cluster:
[1] 19263.760  9136.643 19563.863
 (between_SS / total_SS =  86.5 %)

Available components:

[1] "cluster"      "centers"      "totss"        "withinss"     "tot.withinss"
[6] "betweenss"    "size"         "iter"         "ifault"      
> 
> dev.off()
null device 
          1 
>
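
The between_SS / total_SS figure in the output above can be checked by hand. The sketch below copies the three within-cluster sums of squares from the printed result, computes the total sum of squares directly from USArrests, and should reproduce the 86.5 % shown. The variable names are illustrative only.

```r
# Within-cluster sums of squares, copied from the printed output above.
within_ss <- c(19263.760, 9136.643, 19563.863)

# Total sum of squares: squared deviations from the column means.
x <- as.matrix(USArrests)
total_ss <- sum(scale(x, center = TRUE, scale = FALSE)^2)

# between_SS / total_SS = 1 - within_SS / total_SS
pct <- 100 * (1 - sum(within_ss) / total_ss)
round(pct, 1)
```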