cluster.Sim
R Documentation
Determination of the optimal clustering procedure for a data set
Description
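Determines the optimal clustering procedure for a data set by evaluating all combinations of normalization formulas, distance measures, and clustering methods. Each procedure is assessed with the chosen internal cluster quality index for every number of clusters from minClusterNo to maxClusterNo.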
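The exhaustive search described above can be illustrated with a small base-R sketch. It is not the clusterSim implementation. The two normalization formulas, the two distance measures, the three hclust linkage methods, and the use of the Calinski & Harabasz index (G1) as the only quality criterion are all assumptions chosen for illustration. search_procedures() and ch_index() are hypothetical helper names.

```r
# Minimal sketch of an exhaustive search over clustering procedures.
# ASSUMPTIONS: the normalizations, distances, linkage methods, and the
# single quality index below are illustrative, not clusterSim's own.

normalizations <- list(
  standardization  = function(x) scale(x),  # (x - mean) / sd
  unitization_min0 = function(x) apply(x, 2, function(v) (v - min(v)) / diff(range(v)))
)
distances <- c("euclidean", "manhattan")       # passed to stats::dist()
methods   <- c("single", "complete", "average") # passed to stats::hclust()

# Calinski & Harabasz index: (B / (k - 1)) / (W / (n - k)),
# where B is the between-cluster and W the within-cluster sum of squares.
ch_index <- function(x, cl) {
  n <- nrow(x)
  k <- length(unique(cl))
  centre <- colMeans(x)
  W <- 0
  B <- 0
  for (g in unique(cl)) {
    xg <- x[cl == g, , drop = FALSE]
    cg <- colMeans(xg)
    W <- W + sum(sweep(xg, 2, cg)^2)
    B <- B + nrow(xg) * sum((cg - centre)^2)
  }
  (B / (k - 1)) / (W / (n - k))
}

# Evaluate every normalization x distance x method x k combination.
search_procedures <- function(x, min_k, max_k) {
  grid <- expand.grid(norm = names(normalizations), dist = distances,
                      method = methods, k = min_k:max_k,
                      stringsAsFactors = FALSE)
  grid$score <- NA_real_
  for (i in seq_len(nrow(grid))) {
    z <- normalizations[[grid$norm[i]]](x)
    tree <- hclust(dist(z, method = grid$dist[i]), method = grid$method[i])
    cl <- cutree(tree, k = grid$k[i])
    grid$score[i] <- ch_index(z, cl)
  }
  grid[order(grid$score, decreasing = TRUE), ]  # best procedure first
}

# Example: two well-separated groups of 10 points in two dimensions
set.seed(1)
x <- rbind(matrix(rnorm(20, mean = 0), ncol = 2),
           matrix(rnorm(20, mean = 10), ncol = 2))
res <- search_procedures(x, min_k = 2, max_k = 4)
head(res, 3)
```

Each normalization, distance, method, and number of clusters contributes one row to the grid, so the cost of the search grows multiplicatively with the length of every vector supplied.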
path of simulation: 1 - ratio data; 2 - interval or mixed (ratio and interval) data; 3 - ordinal data; 4 - nominal data; 5 - binary data; 6 - ratio data without normalization; 7 - interval or mixed (ratio and interval) data without normalization; 8 - ratio data with k-means; 9 - interval or mixed (ratio and interval) data with k-means
minClusterNo
minimum number of clusters; between 2 and (number of objects - 1), or (number of objects - 2) for G3
maxClusterNo
maximum number of clusters; between 2 and (number of objects - 1), or (number of objects - 2) for G3 and (number of objects - 3) for KL; must be greater than or equal to minClusterNo
icq
internal cluster quality index: "S" - Silhouette index, "G1" - Calinski & Harabasz index, "G2" - Baker & Hubert index, "G3" - Hubert & Levine index, "KL" - Krzanowski & Lai index
outputHtml
optional; name of the HTML file to which results are written
outputCsv
optional; name of the CSV file to which results are written
outputCsv2
optional; name of the CSV file (with a comma as the decimal separator) to which results are written
normalizations
optional; vector of normalization formulas to be used in the procedure
distances
optional; vector of distance measures to be used in the procedure
methods
optional; vector of classification methods to be used in the procedure
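Because a full run can take a long time, it may help to check the arguments against the constraints listed above before starting. The sketch below is a hypothetical base-R helper, not part of clusterSim: choose_path() maps a measurement scale and the normalization or k-means choice to the path numbers 1 to 9, and check_cluster_args() encodes the allowed icq codes and the bounds on minClusterNo and maxClusterNo. Both function names and their arguments are illustrative assumptions.

```r
# Hypothetical helpers (not part of clusterSim) that encode the argument
# constraints described above.

# Map a measurement scale to the simulation path (1-9).
# "interval" also covers mixed ratio and interval data.
choose_path <- function(scale = c("ratio", "interval", "ordinal", "nominal", "binary"),
                        normalize = TRUE, kmeans = FALSE) {
  scale <- match.arg(scale)
  if (scale == "ordinal") return(3L)
  if (scale == "nominal") return(4L)
  if (scale == "binary") return(5L)
  is_ratio <- scale == "ratio"
  if (kmeans) return(if (is_ratio) 8L else 9L)     # k-means paths
  if (normalize) return(if (is_ratio) 1L else 2L)  # with normalization
  if (is_ratio) 6L else 7L                         # without normalization
}

# Check icq and the bounds on minClusterNo / maxClusterNo for n objects.
check_cluster_args <- function(n_objects, min_k, max_k, icq = "S") {
  if (!icq %in% c("S", "G1", "G2", "G3", "KL")) return(FALSE)
  min_upper <- if (icq == "G3") n_objects - 2 else n_objects - 1
  max_upper <- switch(icq, G3 = n_objects - 2, KL = n_objects - 3, n_objects - 1)
  min_k >= 2 && min_k <= min_upper &&
    max_k >= min_k && max_k <= max_upper
}

choose_path("ratio")                        # 1
choose_path("interval", normalize = FALSE)  # 7
choose_path("ratio", kmeans = TRUE)         # 8
check_cluster_args(10, 2, 9, icq = "S")     # TRUE
check_cluster_args(10, 2, 9, icq = "KL")    # FALSE: KL allows at most 7 clusters here
```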
Details
For each path, the normalizations parameter may be any subset of the following values:
path 1: "n6" to "n11" (if the variables are measured on a ratio scale and remain on a ratio scale after transformation) or "n1" to "n5" (if the variables are measured on a ratio scale and are transformed to an interval scale)
path 2: "n1" to "n5"
paths 3 to 7: "n0" (no normalization)
path 8: "n1" to "n11"
path 9: "n1" to "n5"
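The list above can be encoded as a lookup table, for example to validate a normalizations vector before a run. This is a hypothetical base-R sketch transcribed from the list. allowed_normalizations(), check_normalizations(), and the ratio_to_ratio flag (TRUE when ratio-scale variables stay on a ratio scale after transformation) are illustrative names, not clusterSim functions.

```r
# Hypothetical lookup of the normalization codes allowed for each path,
# transcribed from the list above.
allowed_normalizations <- function(path, ratio_to_ratio = TRUE) {
  codes <- function(from, to) paste0("n", from:to)
  switch(as.character(path),
    "1" = if (ratio_to_ratio) codes(6, 11) else codes(1, 5),
    "2" = codes(1, 5),
    "3" = , "4" = , "5" = , "6" = , "7" = "n0",  # empty cases fall through
    "8" = codes(1, 11),
    "9" = codes(1, 5),
    stop("path must be an integer from 1 to 9"))
}

# TRUE if every requested code is allowed for the given path.
check_normalizations <- function(path, normalizations, ratio_to_ratio = TRUE) {
  all(normalizations %in% allowed_normalizations(path, ratio_to_ratio))
}

allowed_normalizations(1)               # "n6" "n7" "n8" "n9" "n10" "n11"
check_normalizations(2, c("n1", "n5"))  # TRUE
check_normalizations(2, "n6")           # FALSE
```

The empty alternatives in switch() fall through to the next non-empty one, which keeps paths 3 to 7 on a single line, mirroring the list.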
For each path, the distances parameter may be any subset of the following values:
path 1: "d1" to "d7" (if the variables are measured on a ratio scale and remain on a ratio scale after transformation) or "d1" to "d5" (if the variables are measured on a ratio scale and are transformed to an interval scale)
path 2: "d1" to "d5"
path 3: "d8"
path 4: "d9"
path 5: "b1" to "b10"
path 6: "d1" to "d7"
path 7: "d1" to "d5"
paths 8 and 9: not applicable
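The distance list can be encoded the same way. This hypothetical base-R sketch transcribes it into a lookup function; allowed_distances() and the ratio_to_ratio flag are illustrative names, not clusterSim functions, and an empty vector stands for "not applicable" on the k-means paths.

```r
# Hypothetical lookup of the distance codes allowed for each path,
# transcribed from the list above.
allowed_distances <- function(path, ratio_to_ratio = TRUE) {
  codes <- function(prefix, from, to) paste0(prefix, from:to)
  switch(as.character(path),
    "1" = if (ratio_to_ratio) codes("d", 1, 7) else codes("d", 1, 5),
    "2" = , "7" = codes("d", 1, 5),
    "3" = "d8",
    "4" = "d9",
    "5" = codes("b", 1, 10),
    "6" = codes("d", 1, 7),
    "8" = , "9" = character(0),  # not applicable to the k-means paths
    stop("path must be an integer from 1 to 9"))
}

# Number of distance measures that can be varied on each path
sapply(1:9, function(p) length(allowed_distances(p)))
# 7 5 1 1 10 7 5 0 0
```

Assuming every combination is evaluated, multiplying these counts by the number of allowed normalizations and methods for a path (and by the number of cluster counts tried) gives a rough idea of the size of the simulation.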
For each path, the methods parameter may be any subset of the following values:
References
Everitt, B.S., Landau, S., Leese, M. (2001), Cluster analysis, Arnold, London.
Gatnar, E., Walesiak, M. (Eds.) (2004), Metody statystycznej analizy wielowymiarowej w badaniach marketingowych [Multivariate statistical analysis methods in marketing research], Wydawnictwo AE, Wroclaw.
Gordon, A.D. (1999), Classification, Chapman & Hall/CRC, London.
Milligan, G.W., Cooper, M.C. (1985), An examination of procedures for determining the number of clusters in a data set, "Psychometrika", vol. 50, no. 2, 159-179.
Milligan, G.W., Cooper, M.C. (1988), A study of standardization of variables in cluster analysis, "Journal of Classification", vol. 5, 181-204.
Walesiak, M., Dudek, A. (2006), Symulacyjna optymalizacja wyboru procedury klasyfikacyjnej dla danego typu danych: oprogramowanie komputerowe i wyniki badan [Simulation-based optimization of the choice of a classification procedure for a given type of data: computer software and research results], "Prace Naukowe AE we Wroclawiu", no. 1126, 120-129.
Walesiak, M., Dudek, A. (2007), Symulacyjna optymalizacja wyboru procedury klasyfikacyjnej dla danego typu danych: charakterystyka problemu [Simulation-based optimization of the choice of a classification procedure for a given type of data: characterization of the problem], "Zeszyty Naukowe Uniwersytetu Szczecinskiego", no. 450, 635-646.