The HIerarchical Partitioning Around Medoids clustering method (HIPAM) was originally created for gene clustering (Wit et al., 2004). The HIPAM algorithm is a divisive hierarchical clustering method based on the PAM algorithm.
This function is a HIPAM algorithm adapted to deal with anthropometric data. To that end, it incorporates a different dissimilarity function, the one explained in McCulloch et al. (1998) and implemented in getDistMatrix. We call it $d_{MO}$. It also incorporates a different method for obtaining the classification tree.
Two HIPAM algorithms are proposed. The first one, called $HIPAM_{MO}$, is a HIPAM that uses $d_{MO}$. The second one, $HIPAM_{IMO}$, is a HIPAM algorithm that uses $d_{MO}$ and the INCA (Index Number Clusters Atypical) statistic criterion (Irigoien et al., 2008), both to decide the number of child clusters and as a stopping rule.
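As a rough illustration of what "divisive hierarchical clustering based on PAM" means, the sketch below splits a toy data set with pam from the cluster package and then splits each resulting cluster again using only its own rows. It is not the code of this function: the Euclidean distance stands in for the McCulloch dissimilarity, the number of child clusters is fixed at 2 instead of being chosen by the asw or INCA criteria, and the names x, d, top and children are made up for the example.

```r
library(cluster)  # provides pam()

set.seed(1)
# Toy data: three well-separated groups of 20 points in two dimensions.
x <- rbind(matrix(rnorm(40, mean = 0, sd = 0.3), ncol = 2),
           matrix(rnorm(40, mean = 3, sd = 0.3), ncol = 2),
           matrix(rnorm(40, mean = 6, sd = 0.3), ncol = 2))

# Assumption: Euclidean distance as a stand-in for the package dissimilarity.
d <- as.matrix(dist(x))

# Level 1: partition all observations around k = 2 medoids.
top <- pam(as.dist(d), k = 2)

# Level 2: partition each level-1 cluster again, using only its own rows.
children <- lapply(split(seq_len(nrow(x)), top$clustering), function(idx) {
  sub <- pam(as.dist(d[idx, idx]), k = 2)
  list(members     = idx,               # row numbers in the parent cluster
       medoid_rows = idx[sub$id.med],   # medoids, as row numbers of x
       labels      = sub$clustering)    # child cluster of each member
})

str(children, max.level = 2)
```

Each element of children holds one branch of a two-level tree; a full HIPAM run would keep splitting until its stopping rule rejects every further partition.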
Data frame. In our approach, this is each of the subframes obtained by segmenting the whole Spanish anthropometric survey into twelve bust segments, according to the European standard on sizing systems (Size designation of clothes. Part 3: Measurements and intervals). Each row corresponds to an observation, and each column corresponds to a variable. All variables are numeric.
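The segmentation into twelve bust segments can be pictured with base R alone. The sketch below is only an illustration under stated assumptions: the break points are copied from the bustSizesStandard call in the examples at the end of this page, the data frame toy and its columns are synthetic, and the package's own helper functions are not used.

```r
set.seed(3)
# Synthetic stand-in for the survey: bust girth plus one other measurement.
toy <- data.frame(bust  = runif(200, min = 74, max = 131),
                  waist = runif(200, min = 58, max = 115))

# Break points taken from the example call
# bustSizesStandard(seq(74, 102, 4), seq(107, 131, 6)).
breaks <- c(seq(74, 102, 4), seq(107, 131, 6))

# 13 break points delimit 12 consecutive bust intervals.
segment <- cut(toy$bust, breaks = breaks, include.lowest = TRUE)

# One subframe per bust segment; each could be clustered separately.
subframes <- split(toy, segment)
sapply(subframes, nrow)
```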
asw.tol
If this value is given, a tolerance (asw.tol > 0) or a penalty (asw.tol < 0) is introduced in the branch splitting procedure. The default value is 0. See page 154 of Wit et al. (2004) for more details.
maxsplit
The maximum number of clusters that any cluster can be divided into when searching for the best clustering.
local.const
If this value is given (meaningful values lie between -1 and 1), a proposed partition is accepted only if its asw is greater than this constant. By default, this value is ignored. See page 154 of Wit et al. (2004) for more details.
orness
Quantity that measures how closely the aggregation resembles a min or a max operation. See weightsMixtureUB and getDistMatrix.
type
Type of HIPAM algorithm to be used. The possible options are 'MO' (for $HIPAM_{MO}$) and 'IMO' (for $HIPAM_{IMO}$).
ah
Constants that define the ah slopes of the distance function in getDistMatrix. For the five variables considered, this vector is c(23, 28, 20, 25, 25). A different set of variables would require a different vector.
verbose
Boolean variable (TRUE or FALSE) to indicate whether to report information on progress.
...
Other arguments that may be supplied to the internal functions of the HIPAM algorithms.
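To make the orness argument more concrete, the sketch below computes the orness of a weight vector, assuming the quantity meant is the usual orness measure of an ordered weighted average (OWA), in which the weights are applied to the values sorted in decreasing order. The function names orness_owa and owa are invented for this illustration and are not part of the package.

```r
# Orness of an OWA weight vector w of length n:
#   orness(w) = sum_{i = 1..n} (n - i) * w[i] / (n - 1)
orness_owa <- function(w) {
  n <- length(w)
  stopifnot(n >= 2, all(w >= 0), isTRUE(all.equal(sum(w), 1)))
  sum((n - seq_len(n)) * w) / (n - 1)
}

# OWA aggregation: weights act on the values sorted from largest to smallest.
owa <- function(values, w) sum(sort(values, decreasing = TRUE) * w)

v <- c(2, 9, 5)
orness_owa(c(1, 0, 0));  owa(v, c(1, 0, 0))      # all weight on the largest value
orness_owa(c(0, 0, 1));  owa(v, c(0, 0, 1))      # all weight on the smallest value
orness_owa(rep(1/3, 3)); owa(v, rep(1/3, 3))     # equal weights: arithmetic mean
```

With these definitions, an orness of 1 reproduces the maximum, an orness of 0 the minimum, and equal weights give an orness of 0.5, so a value such as 0.7 leans towards the max side.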
Details
The $HIPAM_{MO}$ algorithm uses the getBestPamsamMO and checkBranchLocalMO functions, while the $HIPAM_{IMO}$ algorithm uses the getBestPamsamIMO and checkBranchLocalIMO functions.
For more details of HIPAM, see van der Laan et al. (2003), Wit et al. (2004) and the manual of the smida R package.
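The roles of the maxsplit and local.const arguments can be sketched with pam from the cluster package. This is a simplified illustration, not the code of getBestPamsamMO or checkBranchLocalMO: it uses Euclidean distances on toy data, it ignores asw.tol and the INCA criterion, and treating an unset local.const as NULL is an assumption made for the example.

```r
library(cluster)  # provides pam()

set.seed(2)
# Toy data: three well-separated groups of 30 points in two dimensions.
x <- rbind(matrix(rnorm(60, mean = 0, sd = 0.3), ncol = 2),
           matrix(rnorm(60, mean = 4, sd = 0.3), ncol = 2),
           matrix(rnorm(60, mean = 8, sd = 0.3), ncol = 2))

maxsplit    <- 5      # try partitions into 2, ..., maxsplit clusters
local.const <- 0.25   # hypothetical threshold; NULL would disable the check

# Average silhouette width (asw) of the PAM partition for each candidate k.
ks   <- 2:maxsplit
fits <- lapply(ks, function(k) pam(x, k))
asw  <- sapply(fits, function(f) f$silinfo$avg.width)

# Keep the candidate with the largest asw.
best   <- which.max(asw)
best_k <- ks[best]

# Accept the proposed partition only if its asw exceeds local.const.
accept <- is.null(local.const) || asw[best] > local.const

data.frame(k = ks, asw = round(asw, 3))
best_k
accept
```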
Value
A list with the following elements:
clustering: Final clustering that corresponds to the last level of the tree.
asw: The asw of the final clustering.
n.levels: Number of levels in the tree.
cases: Anthropometric cases (medoids of all of the clusters in the tree).
active: Activity status of each cluster (FALSE for every cluster of the final partition).
development: Matrix that indicates the ancestors of the final clusters.
num.of.clusters: Number of clusters in the final clustering.
metric: Dissimilarity used (called 'McCulloch' because the dissimilarity function used is that explained in McCulloch et al. (1998)).
Note
All the functions related to the HIPAM algorithm were originally created by E. Wit et al., and they are freely available at http://www.math.rug.nl/~ernst/book/smida.html. We have used and adapted them to develop the $HIPAM_{MO}$ and $HIPAM_{IMO}$ algorithms.
Author(s)
Guillermo Vinue
References
Vinue, G., Leon, T., Alemany, S., and Ayala, G. (2013). Looking for representative fit models for apparel sizing, Decision Support Systems 57, 22–33.
Wit, E., and McClure, J. (2004). Statistics for Microarrays: Design, Analysis and Inference. John Wiley & Sons, Ltd.
van der Laan, M. J., and Pollard, K. S. (2003). A new algorithm for hybrid hierarchical clustering with visualization and the bootstrap, Journal of Statistical Planning and Inference 117, 275–303.
Pollard, K. S., and van der Laan, M. J. (2002). A method to identify significant clusters in gene expression data. Vol. II of SCI2002 Proceedings, 318–325.
Irigoien, I., and Arenas, C. (2008). INCA: New statistic for estimating the number of clusters and identifying atypical units, Statistics in Medicine 27, 2948–2973.
Irigoien, I., Sierra, B., and Arenas, C. (2012). ICGE: an R package for detecting relevant clusters and atypical units in gene expression, BMC Bioinformatics 13, 1–29.
McCulloch, C., Paal, B., and Ashdown, S. (1998). An optimization approach to apparel sizing, Journal of the Operational Research Society 49, 492–499.
European Committee for Standardization (2005). Size designation of clothes. Part 3: Measurements and intervals.
Alemany, S., Gonzalez, J. C., Nacher, B., Soriano, C., Arnaiz, C., and Heras, H. (2010). Anthropometric survey of the Spanish female population aimed at the apparel industry. Proceedings of the 2010 Intl. Conference on 3D Body scanning Technologies, 307–315.
> library(Anthropometry)
> png(filename="/home/ddbj/snapshot/RGM3/R_CC/result/Anthropometry/hipamAnthropom.Rd_%03d_medium.png", width=480, height=480)
> ### Name: hipamAnthropom
> ### Title: HIPAM algorithm for anthropometric data
> ### Aliases: hipamAnthropom
> ### Keywords: array
>
> ### ** Examples
>
> #FOR THE SIZES DEFINED BY THE EUROPEAN STANDARD:
> dataHipam <- sampleSpanishSurvey
> bust <- dataHipam$bust
> bustSizes <- bustSizesStandard(seq(74, 102, 4), seq(107, 131, 6))
>
> type <- "IMO"
> maxsplit <- 5 ; orness <- 0.7
> ah <- c(23, 28, 20, 25, 25)
>
> set.seed(2013)
> numSizes <- 1
> res_hipam <- computSizesHipamAnthropom(dataHipam, bust, bustSizes$bustCirc, numSizes,
+ maxsplit, orness, type, ah, FALSE)
>
> fitmodels <- anthrCases(res_hipam, numSizes)
> outliers <- trimmOutl(res_hipam, numSizes)
>
> #FOR ANY OTHER DEFINED SIZE:
> set.seed(1900)
> rand <- sample(1:600,20)
> dataComp <- sampleSpanishSurvey[rand, c(2, 3, 5)]
> numVar <- dim(dataComp)[2]
>
> type <- "IMO"
> maxsplit <- 5 ; orness <- 0.7
> ah <- c(28, 25, 25)
>
> dataMat <- as.matrix(dataComp)
> set.seed(2013)
> res_hipam_One <- list() ; class(res_hipam_One) <- "hipamAnthropom"
> res_hipam_One[[1]] <- hipamAnthropom(dataMat, maxsplit = maxsplit, orness = orness,
+ type = type, ah = ah, verbose = FALSE)
>
>
> #plotTreeHipamAnthropom(res_hipam_One, main="Proposed Hierarchical PAM Clustering \n")
>
> fitmodels_One <- anthrCases(res_hipam_One,1)
> outliers_One <- trimmOutl(res_hipam_One,1)
>
> dev.off()
null device
1
>