Last data update: 2014.03.03

calcDissimMat {DisimForMixed}    R Documentation

Calculate Dissimilarity Matrix for Mixed Attributes.

Description

Takes two data frames describing the same objects row by row: the first contains only qualitative (categorical) attributes and the second only quantitative (numeric) attributes. The function calculates the dissimilarity matrix between the objects using the method proposed by Ahmad & Dey (2007).

Usage

calcDissimMat(myDataQuali, myDataQuant)

Arguments

myDataQuali

A data frame containing only qualitative (categorical) variables, one per column.

myDataQuant

A data frame containing only quantitative (numeric) variables, one per column.
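
If the data start out as a single mixed data frame, one way to obtain the two arguments is to split its columns by type. This is a minimal sketch, assuming that numeric columns are the quantitative attributes and every other column (character, factor, logical) is qualitative; the helper name `splitMixed` is hypothetical and not part of DisimForMixed. Row order is preserved, so row i of both parts still refers to the same object.

```r
# Hypothetical helper (not part of DisimForMixed): split a mixed data frame
# into the qualitative and quantitative parts expected by calcDissimMat().
# Assumption: numeric columns are quantitative, all others are qualitative.
splitMixed <- function(df) {
  isNum <- vapply(df, is.numeric, logical(1))
  list(quali = df[, !isNum, drop = FALSE],
       quant = df[, isNum, drop = FALSE])
}

mixed <- data.frame(Qlvar1 = c("A", "B", "A", "C", "C", "A"),
                    Qnvar1 = c(1.5, 3.2, 4.9, 5, 2.8, 3.1),
                    Qlvar2 = c("Q", "Q", "R", "Q", "R", "Q"),
                    Qnvar2 = c(4.8, 2, 1.1, 5.8, 3.1, 2.2))
parts <- splitMixed(mixed)
names(parts$quali)  # [1] "Qlvar1" "Qlvar2"
names(parts$quant)  # [1] "Qnvar1" "Qnvar2"
# The two parts can then be passed on as
# calcDissimMat(parts$quali, parts$quant)
```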

Details

calcDissimMat implements the method proposed by Ahmad & Dey (2007) for calculating a dissimilarity matrix in the presence of both qualitative and quantitative attributes. The approach computes the dissimilarities for the qualitative and the quantitative attributes separately and then combines them into the final dissimilarity matrix. See Ahmad & Dey (2007) for more details.
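
To give a feel for the two ingredients, here is a rough sketch written from our reading of Ahmad & Dey (2007). It is not the code of calcDissimMat, and its output will generally differ from it. The simplifications are ours: the distance between two levels x and y of a qualitative attribute is computed only from their co-occurrence with the other qualitative attributes, as the average over those attributes of sum_u max(P(u | x), P(u | y)) - 1; quantitative attributes are min-max scaled to [0, 1] and weighted by a user-supplied vector `w` (all ones by default) instead of the significance weights the paper derives; and the two parts are combined as a sum of squared contributions, mirroring the form of the paper's cost function. The names `levelDist` and `sketchDissim` are hypothetical.

```r
# Hypothetical sketch, NOT the DisimForMixed implementation.
# Distance between levels x and y of qualitative column i, based on how the
# two levels co-occur with the levels u of every other qualitative column j:
#   delta_ij(x, y) = sum_u max(P(u | x), P(u | y)) - 1, averaged over j != i.
levelDist <- function(quali, i, x, y) {
  if (x == y) return(0)
  others <- setdiff(seq_along(quali), i)
  if (length(others) == 0) return(1)  # assumption: no co-occurrence info
  colI <- as.character(quali[[i]])
  d <- vapply(others, function(j) {
    colJ <- as.character(quali[[j]])
    levs <- unique(colJ)
    px <- as.numeric(table(factor(colJ[colI == x], levels = levs))) / sum(colI == x)
    py <- as.numeric(table(factor(colJ[colI == y], levels = levs))) / sum(colI == y)
    sum(pmax(px, py)) - 1
  }, numeric(1))
  mean(d)
}

# Object-by-object dissimilarity: weighted squared differences of the scaled
# quantitative attributes plus squared level distances of the qualitative ones.
sketchDissim <- function(quali, quant, w = rep(1, ncol(quant))) {
  rng <- function(v) if (diff(range(v)) == 0) 0 * v else (v - min(v)) / diff(range(v))
  quant <- as.data.frame(lapply(quant, rng))  # assumption: min-max scaling
  n <- nrow(quant)
  D <- matrix(0, n, n)
  for (a in seq_len(n - 1)) {
    for (b in (a + 1):n) {
      numPart <- sum((w * (unlist(quant[a, ]) - unlist(quant[b, ])))^2)
      catPart <- sum(vapply(seq_along(quali), function(i) {
        levelDist(quali, i, as.character(quali[[i]][a]),
                  as.character(quali[[i]][b]))^2
      }, numeric(1)))
      D[a, b] <- D[b, a] <- numPart + catPart
    }
  }
  D
}

QualiVars <- data.frame(Qlvar1 = c("A","B","A","C","C","A"), Qlvar2 = c("Q","Q","R","Q","R","Q"))
QuantVars <- data.frame(Qnvar1 = c(1.5,3.2,4.9,5,2.8,3.1), Qnvar2 = c(4.8,2,1.1,5.8,3.1,2.2))
D <- sketchDissim(QualiVars, QuantVars)
round(D[1, 2], 4)  # [1] 0.7019
```

The double loop recomputes level distances for every pair, which is fine for small examples but scales poorly; caching one distance table per qualitative attribute would be the obvious refinement.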

Value

A dissimilarity matrix. It can be passed to the pam, fanny, agnes and diana functions of the cluster package, with diss = TRUE.
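
Because the result is a precomputed dissimilarity, the cluster functions must be told not to compute distances themselves, which is what diss = TRUE does. The sketch below uses a small hand-made dist object as a stand-in for the output of calcDissimMat so that it runs without DisimForMixed; the values are made up purely for illustration.

```r
library(cluster)

# Stand-in for calcDissimMat() output: six objects forming two obvious groups.
# (Illustrative values only.)
D <- dist(c(1, 1.2, 1.1, 5, 5.3, 5.1))

pamFit   <- pam(D, k = 2, diss = TRUE)    # partitioning around medoids
fannyFit <- fanny(D, k = 2, diss = TRUE)  # fuzzy clustering
agnesFit <- agnes(D, diss = TRUE)         # agglomerative hierarchy
dianaFit <- diana(D, diss = TRUE)         # divisive hierarchy

# Hierarchies are cut into k groups via stats::cutree() on their hclust form
agnesGroups <- cutree(as.hclust(agnesFit), k = 2)
dianaGroups <- cutree(as.hclust(dianaFit), k = 2)

pamFit$clustering    # crisp partition from pam
fannyFit$clustering  # nearest crisp partition from fanny
```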

References

Ahmad, A., & Dey, L. (2007). A k-mean clustering algorithm for mixed numeric and categorical data. Data & Knowledge Engineering, 63(2), 503-527.

Examples

## Qualitative and quantitative attributes of the same six objects
QualiVars <- data.frame(Qlvar1 = c("A","B","A","C","C","A"), Qlvar2 = c("Q","Q","R","Q","R","Q"))
QuantVars <- data.frame(Qnvar1 = c(1.5,3.2,4.9,5,2.8,3.1), Qnvar2 = c(4.8,2,1.1,5.8,3.1,2.2))
DisSimMatCalcd <- calcDissimMat(QualiVars, QuantVars)

## Agglomerative clustering (Ward's method) on the precomputed dissimilarities,
## cut into two groups and scored by the average silhouette width
agnesClustering <- cluster::agnes(DisSimMatCalcd, diss = TRUE, method = "ward")
silWidths <- cluster::silhouette(cutree(agnesClustering, k = 2), DisSimMatCalcd)
mean(silWidths[, "sil_width"])
plot(agnesClustering)

## Partitioning around medoids with k = 2 and its silhouette plot
PAMClustering <- cluster::pam(DisSimMatCalcd, k = 2, diss = TRUE)
silWidths <- cluster::silhouette(PAMClustering, DisSimMatCalcd)
plot(silWidths)

Results


R version 3.3.1 (2016-06-21) -- "Bug in Your Hair"
Copyright (C) 2016 The R Foundation for Statistical Computing
Platform: x86_64-pc-linux-gnu (64-bit)

> library(DisimForMixed)
> png(filename="/home/ddbj/snapshot/RGM3/R_CC/result/DisimForMixed/calcDissimMat.Rd_%03d_medium.png", width=480, height=480)
> ### Name: calcDissimMat
> ### Title: Calculate Dissimilarity Matrix for Mixed Attributes.
> ### Aliases: calcDissimMat
> 
> ### ** Examples
> 
> QualiVars <- data.frame(Qlvar1 = c("A","B","A","C","C","A"), Qlvar2 = c("Q","Q","R","Q","R","Q"))
> QuantVars <- data.frame(Qnvar1 = c(1.5,3.2,4.9,5,2.8,3.1), Qnvar2 = c(4.8,2,1.1,5.8,3.1,2.2))
> DisSimMatCalcd <- calcDissimMat(QualiVars, QuantVars)
> 
> agnesClustering <- cluster::agnes(DisSimMatCalcd, diss = TRUE, method = "ward")
> silWidths <- cluster::silhouette(cutree(agnesClustering, k = 2), DisSimMatCalcd)
> mean(silWidths[,3])
[1] 0.641198
> plot(agnesClustering)
> 
> PAMClustering <- cluster::pam(DisSimMatCalcd, k=2, diss = TRUE)
> silWidths <- cluster::silhouette(PAMClustering, DisSimMatCalcd)
> plot(silWidths)
> 
> dev.off()
null device 
          1 
>