The S2 measure was proposed by Morlini and Zani (2012). It is based on a transformed dataset that contains only binary variables (dummy coding). Hierarchical clustering methods require a proximity (dissimilarity) matrix rather than a similarity matrix as input; therefore, the dissimilarity D is computed from the similarity S according to the equation D = 1/S - 1.
The use and evaluation of clustering with this measure can be found e.g. in (Sulc and Rezankova, 2014) or (Sulc, 2015).
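The two steps described above, dummy coding and the similarity-to-dissimilarity conversion, can be sketched in Python as follows. This is an illustration only, not the nomclust implementation: the function names dummy_code and to_dissimilarity are hypothetical, the S2 weighting itself is not shown, and mapping S = 0 to infinity is an assumption made here to avoid division by zero.

```python
def dummy_code(rows):
    """One-hot encode a list of equal-length tuples of categorical values."""
    n_vars = len(rows[0])
    # Sorted category list per variable, so the column order is deterministic.
    categories = [sorted({row[k] for row in rows}) for k in range(n_vars)]
    coded = []
    for row in rows:
        bits = []
        for k, value in enumerate(row):
            bits.extend(1 if value == c else 0 for c in categories[k])
        coded.append(bits)
    return coded


def to_dissimilarity(s):
    """D = 1/S - 1; a similarity of zero maps to infinity in this sketch."""
    return float("inf") if s == 0 else 1.0 / s - 1.0


rows = [("red", "small"), ("blue", "small"), ("red", "large")]
coded = dummy_code(rows)
```

Each original variable expands into one binary column per category, so a two-category variable becomes two columns.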
The Eskin similarity measure was proposed by Eskin et al. (2002). It is constructed to assign higher weights to mismatches on variables with more categories, see (Boriah et al., 2008). Hierarchical clustering methods require a proximity (dissimilarity) matrix rather than a similarity matrix as input; therefore, the dissimilarity D is computed from the similarity S according to the equation D = 1/S - 1.
The use and evaluation of clustering with this measure can be found e.g. in (Sulc and Rezankova, 2014).
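A minimal Python sketch of the Eskin measure follows. It assumes the per-variable form given in Boriah et al. (2008), where a match scores 1 and a mismatch scores n^2 / (n^2 + 2), with n the number of categories of that variable, and it assumes the overall similarity is the mean over variables. The function name eskin_similarity is hypothetical and is not part of nomclust.

```python
def eskin_similarity(x, y, n_categories):
    """Mean per-variable Eskin similarity between two objects.

    x, y          -- tuples of categorical values, one entry per variable
    n_categories  -- number of categories of each variable in the dataset
    """
    total = 0.0
    for xk, yk, nk in zip(x, y, n_categories):
        if xk == yk:
            total += 1.0                      # a match always scores 1
        else:
            total += nk ** 2 / (nk ** 2 + 2)  # mismatch score grows with nk
    return total / len(x)


# One mismatch on a 2-category variable, one match on a 3-category variable.
s = eskin_similarity(("a", "p"), ("b", "p"), (2, 3))
d = 1.0 / s - 1.0  # dissimilarity as described in the text
```

With these inputs the mismatch contributes 4/6, so a mismatch on a variable with more categories would contribute a value closer to 1.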
The function evaluates clustering results regardless of the clustering method used to obtain them. The clusters are evaluated in terms of within-cluster variability by the following indices: Within-cluster mutability coefficient (WCM), Within-cluster entropy coefficient (WCE), Pseudo tau coefficient (PSTau), Pseudo uncertainty coefficient (PSU), and Pseudo F indices based on the mutability (PSFM) and the entropy (PSFE).
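As an illustration of a within-cluster variability index, the following Python sketch computes a mutability-based coefficient. The exact formula is an assumption here, not taken from the text: each variable's variability inside a cluster is measured by the normalized Gini coefficient K/(K-1) * (1 - sum of squared relative frequencies), averaged over variables and weighted by cluster size. The function name within_cluster_mutability is hypothetical.

```python
from collections import Counter


def within_cluster_mutability(rows, labels):
    """Size-weighted mean of normalized Gini coefficients (assumed WCM form)."""
    n = len(rows)
    m = len(rows[0])
    # Number of categories of each variable in the whole dataset.
    n_cats = [len({row[c] for row in rows}) for c in range(m)]
    total = 0.0
    for g in set(labels):
        members = [row for row, lab in zip(rows, labels) if lab == g]
        n_g = len(members)
        cluster_sum = 0.0
        for c in range(m):
            if n_cats[c] < 2:
                continue  # a constant variable has no variability to measure
            counts = Counter(row[c] for row in members)
            gini = 1.0 - sum((cnt / n_g) ** 2 for cnt in counts.values())
            cluster_sum += n_cats[c] / (n_cats[c] - 1) * gini
        total += (n_g / n) * cluster_sum / m
    return total


rows = [("a", "x"), ("a", "x"), ("b", "y"), ("b", "y")]
pure = within_cluster_mutability(rows, [0, 0, 1, 1])   # homogeneous clusters
mixed = within_cluster_mutability(rows, [0, 1, 0, 1])  # maximally mixed clusters
```

Under this assumed definition, perfectly homogeneous clusters give 0 and maximally mixed clusters give 1, so lower values indicate tighter clusters.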
The Nominal Clustering (nomclust) function performs hierarchical cluster analysis (HCA) with objects characterized by nominal (categorical) variables. It produces a series of cluster solutions, usually from the two-cluster solution to the six-cluster solution. It allows the user to choose one of 11 similarity measures and one of 3 linkage methods. The function also contains an evaluation part. The created clusters are evaluated in terms of within-cluster variability by the following indices: Within-cluster mutability coefficient (WCM), Within-cluster entropy coefficient (WCE), Pseudo tau coefficient (PSTau), Pseudo uncertainty coefficient (PSU), and Pseudo F indices based on the mutability (PSFM) and the entropy (PSFE).
The Goodall 4 similarity measure was first introduced in (Boriah et al., 2008). The measure assigns higher similarity if the frequent categories match. When measuring similarity on a given variable, its results are the complements to one of the Goodall 3 results. Hierarchical clustering methods require a proximity (dissimilarity) matrix rather than a similarity matrix as input; therefore, the dissimilarity D is computed from the similarity S according to the equation D = 1/S - 1.
The use and evaluation of clustering with this measure can be found e.g. in (Sulc, 2015).
The Goodall 3 similarity measure was first introduced in (Boriah et al., 2008). The measure assigns higher similarity if the infrequent categories match, regardless of the frequencies of other categories. Hierarchical clustering methods require a proximity (dissimilarity) matrix rather than a similarity matrix as input; therefore, the dissimilarity D is computed from the similarity S according to the equation D = 1/S - 1.
The use and evaluation of clustering with this measure can be found e.g. in (Sulc, 2015).
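The relationship between Goodall 3 and Goodall 4 can be sketched in Python. The sketch assumes the per-variable forms from Boriah et al. (2008): on a match, Goodall 3 scores 1 - p2(x) and Goodall 4 scores p2(x), where p2(x) = f(x)(f(x) - 1) / (N(N - 1)) and f(x) is the frequency of category x among N objects; a mismatch scores 0 in both. The function name goodall_similarity and its variant argument are hypothetical, not the nomclust interface.

```python
from collections import Counter


def goodall_similarity(x, y, columns, variant):
    """Mean per-variable Goodall 3 (variant=3) or Goodall 4 (variant=4) similarity.

    columns -- one list of observed values per variable (the whole dataset)
    """
    n = len(columns[0])
    total = 0.0
    for xk, yk, col in zip(x, y, columns):
        if xk != yk:
            continue  # a mismatch contributes 0 under both variants
        f = Counter(col)[xk]
        p2 = f * (f - 1) / (n * (n - 1))
        total += (1.0 - p2) if variant == 3 else p2
    return total / len(x)


column = ["a", "a", "a", "a", "b", "b"]  # "a" is frequent, "b" is infrequent
g3_a = goodall_similarity(("a",), ("a",), [column], 3)
g3_b = goodall_similarity(("b",), ("b",), [column], 3)
g4_a = goodall_similarity(("a",), ("a",), [column], 4)
g4_b = goodall_similarity(("b",), ("b",), [column], 4)
```

In this example a match on the infrequent category "b" scores higher under Goodall 3 and lower under Goodall 4, and the two scores for the same match sum to one.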
The Lin 1 similarity measure was first introduced in (Boriah et al., 2008). It has a complex system of weights. In the case of a mismatch, lower similarity is assigned if either the mismatching values are very frequent or their relative frequency is in between the relative frequencies of mismatching values. Higher similarity is assigned if the mismatched categories are infrequent and there are a few other infrequent categories. In the case of a match, lower similarity is given for matches on frequent categories or matches on categories that have many other values of the same frequency. Higher similarity is given to matches on infrequent categories.
The Goodall 1 similarity measure was mentioned, e.g., in (Boriah et al., 2008). It is a simple modification of the original Goodall measure (Goodall, 1966). The measure assigns higher similarity to infrequent matches. Hierarchical clustering methods require a proximity (dissimilarity) matrix rather than a similarity matrix as input; therefore, the dissimilarity D is computed from the similarity S according to the equation D = 1/S - 1.
The use and evaluation of clustering with this measure can be found e.g. in (Sulc, 2015).
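A short Python sketch of Goodall 1 follows. It assumes the per-variable form from Boriah et al. (2008): on a match with category x, the score is 1 minus the sum of p2(q) over all categories q that are no more frequent than x, with p2(q) = f(q)(f(q) - 1) / (N(N - 1)); a mismatch scores 0. The function name goodall1_similarity is hypothetical.

```python
from collections import Counter


def goodall1_similarity(x, y, columns):
    """Mean per-variable Goodall 1 similarity (assumed form, see lead-in)."""
    n = len(columns[0])
    total = 0.0
    for xk, yk, col in zip(x, y, columns):
        if xk != yk:
            continue  # a mismatch contributes 0
        freq = Counter(col)
        fx = freq[xk]
        # Sum p2 over categories that are no more frequent than the matched one.
        penalty = sum(f * (f - 1) / (n * (n - 1))
                      for f in freq.values() if f <= fx)
        total += 1.0 - penalty
    return total / len(x)


column = ["a", "a", "a", "a", "b", "b"]
s_frequent = goodall1_similarity(("a",), ("a",), [column])
s_infrequent = goodall1_similarity(("b",), ("b",), [column])
```

The match on the infrequent category "b" receives the higher score, which is the behavior described in the text.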
of
(Package: nomclust) :
Occurrence Frequency (OF) Measure
The OF (Occurrence Frequency) measure was originally constructed for text mining, see (Sparck-Jones, 1972); it was later adjusted for categorical variables. It assigns lower similarity to mismatches on less frequent values and higher similarity to mismatches on more frequent values. Hierarchical clustering methods require a proximity (dissimilarity) matrix rather than a similarity matrix as input; therefore, the dissimilarity D is computed from the similarity S according to the equation D = 1/S - 1.
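How the OF measure weights mismatches by frequency can be sketched in Python. The sketch assumes the per-variable form given in Boriah et al. (2008): a match scores 1 and a mismatch scores 1 / (1 + ln(N/f(x)) * ln(N/f(y))), where f is the frequency of a category among N objects. The function name of_similarity is hypothetical, and both compared values are assumed to occur in the data.

```python
import math
from collections import Counter


def of_similarity(x, y, columns):
    """Mean per-variable OF similarity (assumed form, see lead-in)."""
    n = len(columns[0])
    total = 0.0
    for xk, yk, col in zip(x, y, columns):
        if xk == yk:
            total += 1.0
        else:
            freq = Counter(col)
            # Rare values give large logarithms and hence a small similarity.
            total += 1.0 / (1.0 + math.log(n / freq[xk]) * math.log(n / freq[yk]))
    return total / len(x)


column = ["a"] * 6 + ["b"] * 3 + ["c"]
s_frequent = of_similarity(("a",), ("b",), [column])  # mismatch on frequent values
s_rare = of_similarity(("b",), ("c",), [column])      # mismatch involving a rare value
```

Under this formula the mismatch that involves the rare category "c" receives the lower similarity.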
sm
(Package: nomclust) :
Simple Matching Coefficient
The simple matching coefficient (Sokal, 1958) is the simplest way of measuring similarity. It does not impose any weights. For a given variable, it assigns the value 1 in case of a match and the value 0 otherwise. Hierarchical clustering methods require a proximity (dissimilarity) matrix rather than a similarity matrix as input; therefore, the dissimilarity D is computed from the similarity S according to the equation D = 1/S - 1.
The use and evaluation of clustering with this measure can be found e.g. in (Sulc and Rezankova, 2014) or (Sulc, 2015).
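The simple matching coefficient and the conversion to a dissimilarity can be sketched in a few lines of Python. This is an illustration rather than the nomclust code; the function name simple_matching is hypothetical, and the overall similarity is assumed to be the share of matching variables.

```python
def simple_matching(x, y):
    """Share of variables on which two objects have the same category."""
    matches = sum(1 for a, b in zip(x, y) if a == b)
    return matches / len(x)


x = ("a", "p", "u", "k")
y = ("a", "q", "u", "k")
s = simple_matching(x, y)  # three of the four variables match
d = 1.0 / s - 1.0          # dissimilarity as described in the text
```

Note that D = 1/S - 1 is undefined when two objects share no category at all (S = 0), so an implementation needs a convention for that case.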