Last data update: 2014.03.03

R: Combination of Factorial Methods and Cluster Analysis
FactoClass    R Documentation

Combination of Factorial Methods and Cluster Analysis

Description

Performs a factorial analysis of the data, followed by a cluster analysis on the first nfcl factorial coordinates.

Usage

FactoClass(dfact, metodo, dfilu = NULL, nf = 2, nfcl = 10, k.clust = 3,
           scanFC = TRUE, n.max = 5000, n.clus = 1000, sign = 2.0,
           conso = TRUE, n.indi = 25, row.w = rep(1, nrow(dfact)))
## S3 method for class 'FactoClass'
print(x, ...)
analisis.clus(X, W)

Arguments

dfact

object of class data.frame, with the data of active variables.

metodo

function of ade4 that performs the factorial analysis: dudi.pca, Principal Component Analysis; dudi.coa, Correspondence Analysis; witwit.coa, Internal Correspondence Analysis; dudi.acm, Multiple Correspondence Analysis; ...

dfilu

illustrative variables (default NULL)

nf

number of axes to retain in the factorial analysis (default 2)

nfcl

number of axes to use in the classification (default 10)

k.clust

number of classes (clusters) to form (default 3)

scanFC

if TRUE, the values of nf, nfcl and k.clust are requested at the console (default TRUE)

n.max

when nrow(dfact) >= n.max, k-means is performed before the hierarchical clustering (default 5000)

n.clus

when nrow(dfact) >= n.max, the preliminary k-means is performed with n.clus groups (default 1000)

sign

threshold on the test value above which characteristic variables and modalities are shown (default 2.0)

conso

when conso is TRUE, the process of consolidating the classification is performed (default TRUE)

n.indi

number of indices to draw in the histogram (default 25)

row.w

vector containing the row weights, used when metodo is not dudi.coa (default rep(1, nrow(dfact)))

x

object of class FactoClass

...

further arguments passed to or from other methods

X

coordinates of the elements of a class

W

weights of the elements of a class
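The n.max and n.clus arguments describe a k-means step that precedes the hierarchical clustering on large tables. The following is a minimal sketch of that idea in base R (package stats), not the code used inside FactoClass. The object names (coords, pts, members, final) and the small demo values of n.max and n.clus are assumptions made for illustration.

```r
# Sketch only: base R (stats), not the internals of FactoClass.
set.seed(1)                                 # k-means starting points are random
coords <- matrix(rnorm(600 * 3), ncol = 3)  # stand-in for factorial coordinates
n.max  <- 500                               # small demo values, not the defaults
n.clus <- 20

if (nrow(coords) >= n.max) {
  # Preliminary partition: summarise the rows by n.clus k-means groups
  pre     <- kmeans(coords, centers = n.clus, iter.max = 50)
  pts     <- pre$centers                    # group centroids replace the rows
  members <- pre$size                       # group sizes act as weights
} else {
  pts     <- coords
  members <- NULL
}

# Ward hierarchical clustering on the (possibly reduced) set of points
tree <- hclust(dist(pts), method = "ward.D2", members = members)

# Cut the tree, then map every original row to its final class
k   <- 3
grp <- cutree(tree, k = k)
final <- if (is.null(members)) unname(grp) else unname(grp[pre$cluster])
table(final)
```

The point of the reduction is cost: the distance matrix is built on n.clus centroids instead of on all rows.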

Details

Lebart et al. (1995) present a strategy for analyzing a data table with multivariate methods: an initial factorial analysis, chosen according to the nature of the data, followed by mixed clustering. The mixed clustering combines hierarchical clustering by Ward's method with k-means clustering. The result is a partition of the data set and a characterization of each class in terms of the active and illustrative variables, which may be quantitative, qualitative or frequency variables.

FactoClass is a function that connects procedures from the package ade4, for the factorial analysis of the data, with procedures from stats, for the cluster analysis.
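The strategy described above can be sketched with base R alone. This is an illustration under stated assumptions, not the implementation of FactoClass: prcomp stands in for the ade4 factorial analysis, the toy data are simulated, and the names X, coords, init and cons are hypothetical.

```r
# Sketch only: factorial analysis + Ward + k-means consolidation in base R.
set.seed(42)
# Toy data: three shifted groups of 30 rows, 4 quantitative variables
X <- rbind(matrix(rnorm(120, mean = 0), ncol = 4),
           matrix(rnorm(120, mean = 4), ncol = 4),
           matrix(rnorm(120, mean = 8), ncol = 4))

# 1. Factorial analysis (PCA via prcomp here; FactoClass relies on ade4)
nfcl   <- 2
pca    <- prcomp(X, center = TRUE, scale. = TRUE)
coords <- pca$x[, 1:nfcl, drop = FALSE]   # first nfcl factorial coordinates

# 2. Hierarchical clustering with Ward's method on those coordinates
tree <- hclust(dist(coords), method = "ward.D2")

# 3. Cut the tree into k classes
k    <- 3
init <- cutree(tree, k = k)

# 4. Consolidation: k-means started from the centroids of the Ward classes
centers <- apply(coords, 2, function(col) tapply(col, init, mean))
cons    <- kmeans(coords, centers = centers)

# Cross-tabulate the partition before and after consolidation
table(before = init, after = cons$cluster)
```

Starting k-means from the Ward centroids lets rows near a class boundary move to the nearest centroid while the overall structure of the tree cut is preserved.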

The function analisis.clus calculates the geometric characteristics of each class: size, inertia, weight and squared distance to the origin.
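As an illustration of these four quantities, the sketch below computes them for one class from its coordinates and weights. The helper class_geometry is hypothetical and is not the package's analisis.clus. It assumes that inertia means the weighted sum of squared distances to the class centroid and that the distance to the origin is measured from that centroid; the package may normalise differently.

```r
# Hypothetical helper for illustration; not the package's analisis.clus.
class_geometry <- function(X, W = rep(1, nrow(X))) {
  X        <- as.matrix(X)
  weight   <- sum(W)                           # total weight of the class
  centroid <- colSums(X * W) / weight          # weighted centre of gravity
  dev2     <- rowSums(sweep(X, 2, centroid)^2) # squared distances to centroid
  inertia  <- sum(W * dev2)                    # weighted within-class inertia
  c(size = nrow(X), inertia = inertia, weight = weight,
    dist_2 = sum(centroid^2))                  # squared distance to the origin
}

# Four points at the corners of a square, equal weights summing to 1
X   <- rbind(c(0, 0), c(2, 0), c(0, 2), c(2, 2))
res <- class_geometry(X, W = rep(0.25, 4))
res
# size = 4, inertia = 2, weight = 1, dist_2 = 2
```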

To print the results in LaTeX format, see FactoClass.tex.

To draw factorial planes with the clusters, see plotFactoClass.

Value

object of class FactoClass with the following components:

dudi

object of class dudi from ade4 with the specifications of the factorial analysis

nfcl

number of axes selected for the classification

k

number of classes

indices

table of level indices obtained through Ward's method

cor.clus

coordinates of the cluster centroids

clus.summ

summary of the clusters (size, inertia, weight and squared distance to the origin) before and after consolidation

cluster

vector indicating the cluster of each element

carac.cate

cluster characterization by qualitative variables

carac.cont

cluster characterization by quantitative variables

carac.frec

cluster characterization by frequency active variables
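The sign argument is a threshold on test values, which are used to characterize the clusters. As a hedged illustration, the sketch below computes a test value for one quantitative variable in each class, using a form commonly attributed to Lebart et al.: the difference between the class mean and the overall mean, divided by the standard deviation of the class mean under random sampling without replacement. Whether FactoClass uses exactly this formula, and the divisor n in the variance, are assumptions. The helper test_value is hypothetical.

```r
# Hypothetical helper for illustration; not a function of the package.
test_value <- function(x, cl) {
  n   <- length(x)
  s2  <- mean((x - mean(x))^2)    # overall variance, divisor n (assumption)
  n_k <- tapply(x, cl, length)    # class sizes
  m_k <- tapply(x, cl, mean)      # class means
  # standardised gap between each class mean and the overall mean
  (m_k - mean(x)) / sqrt((n - n_k) / (n - 1) * s2 / n_k)
}

x  <- c(1, 2, 3, 10, 11, 12)      # one quantitative variable
cl <- c(1, 1, 1, 2, 2, 2)         # class of each element
tv <- test_value(x, cl)

# Keep only the classes that the variable characterizes at threshold 2
sign_threshold <- 2
tv[abs(tv) >= sign_threshold]
```

A large positive value means the class mean is well above the overall mean, and a large negative value means it is well below.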

Author(s)

Pedro Cesar del Campo pcdelcampon@unal.edu.co, Campo Elias Pardo cepardot@unal.edu.co http://www.docentes.unal.edu.co/cepardot, Ivan Diaz ildiazm@unal.edu.co, Mauricio Sadinle msadinleg@unal.edu.co

References

Lebart, L., Morineau, A. and Piron, M. (1995) Statistique exploratoire multidimensionnelle. Paris.

Examples


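library(FactoClass)

# Cluster analysis with Correspondence Analysis
# scanFC is TRUE by default, so the three lines that follow the call
# answer the console prompts for nf, nfcl and k.clust
data(ColorAdjective)
FC.col <- FactoClass(ColorAdjective, dudi.coa)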
6
10
5

FC.col

FC.col$dudi


# Cluster analysis with Multiple Correspondence Analysis
data(BreedsDogs)

BD.act <- BreedsDogs[-7]  # active variables
BD.ilu <- BreedsDogs[7]   # illustrative variables

FC.bd <- FactoClass(BD.act, dudi.acm, k.clust = 4,
                    scanFC = FALSE, dfilu = BD.ilu, nfcl = 10)

FC.bd

FC.bd$clus.summ
FC.bd$indices

Results


R version 3.3.1 (2016-06-21) -- "Bug in Your Hair"
Copyright (C) 2016 The R Foundation for Statistical Computing
Platform: x86_64-pc-linux-gnu (64-bit)

> library(FactoClass)
Loading required package: ade4
Loading required package: xtable
> png(filename="/home/ddbj/snapshot/RGM3/R_CC/result/FactoClass/FactoClass.Rd_%03d_medium.png", width=480, height=480)
> ### Name: FactoClass
> ### Title: Combination of Factorial Methods and Cluster Analysis
> ### Aliases: FactoClass print.FactoClass analisis.clus
> ### Keywords: multivariate cluster
> 
> ### ** Examples
> 
> 
> # Cluster analysis with Correspondence Analysis
> data(ColorAdjective)
> FC.col <-FactoClass(ColorAdjective, dudi.coa)
Select the number of axes: 6
The number of retained axes for factorial analysis is  6 

Select the number of axes for clustering: 10
The number of axes for clustering is  10 

dev.new(): using pdf(file="Rplots837.pdf")
dev.new(): using pdf(file="Rplots838.pdf")
Look the histogram of 25 indexes 
Select the number of clusters: 5
Partition in  5  clusters
dev.new(): using pdf(file="Rplots839.pdf")
> 
> FC.col

 FactoClass: combination of factorial methods and cluster analysis

--------------------------------------------------------------------------
Object $dudi (Factorial analyis) 

Duality diagramm
class: coa dudi
$dudi$call: dudi.coa(df = ColorAdjective, scannf = TRUE, nf = 2)

$dudi$nf: 6 axis-components saved
$dudi$rank: 10
eigen values: 0.7507 0.6403 0.5323 0.5157 0.4598 ...
  vector    length mode    content       
1 $dudi$cw  11     numeric column weights
2 $dudi$lw  89     numeric row weights   
3 $dudi$eig 10     numeric eigen values  

  data.frame nrow ncol content             
1 $dudi$tab  89   11   modified array      
2 $dudi$li   89   6    row coordinates     
3 $dudi$l1   89   6    row normed scores   
4 $dudi$co   11   6    column coordinates  
5 $dudi$c1   11   6    column normed scores
other elements: $dudi$N 
--------------------------------------------------------------------------
 Number of axes for cluster:  10
 Number of clusters:  5 

  Object      Description                                                     
1 $indices    level indices for Hierarchical Clustering (WARD)                
2 $cor.clus   centroid cluster coordinates                                    
3 $clus.summ  partition changes due to consolidation process                  
4 $cluster    a vector indicating the cluster in which each point is allocated
5 $carac.cate cluster characterization by qualitative variables               
6 $carac.cont cluster characterization by  quantitative variables             
7 $carac.frec cluster characterization by  frequency active variables         
> 
> FC.col$dudi
Duality diagramm
class: coa dudi
$call: dudi.coa(df = ColorAdjective, scannf = TRUE, nf = 2)

$nf: 6 axis-components saved
$rank: 10
eigen values: 0.7507 0.6403 0.5323 0.5157 0.4598 ...
  vector length mode    content       
1 $cw    11     numeric column weights
2 $lw    89     numeric row weights   
3 $eig   10     numeric eigen values  

  data.frame nrow ncol content             
1 $tab       89   11   modified array      
2 $li        89   6    row coordinates     
3 $l1        89   6    row normed scores   
4 $co        11   6    column coordinates  
5 $c1        11   6    column normed scores
other elements: N 
> 
> 
> # Cluster analysis with Multiple Correspondence Analysis
> data(BreedsDogs)
> 
> BD.act <- BreedsDogs[-7]  # active variables
> BD.ilu <- BreedsDogs[7]   # ilustrative variables
> 
> FC.bd <-FactoClass( BD.act, dudi.acm, k.clust = 4,
+                        scanFC = FALSE, dfilu = BD.ilu, nfcl = 10)
The number of retained axes for factorial analysis is  2 

The number of axes for clustering is  10 

dev.new(): using pdf(file="Rplots840.pdf")
dev.new(): using pdf(file="Rplots841.pdf")
Look the histogram of 25 indexes 
Partition in  4  clusters
dev.new(): using pdf(file="Rplots842.pdf")
> 
> FC.bd

 FactoClass: combination of factorial methods and cluster analysis

--------------------------------------------------------------------------
Object $dudi (Factorial analyis) 

Duality diagramm
class: acm dudi
$dudi$call: dudi.acm(df = BD.act, row.w = c(0.037037037037037, 0.037037037037037, 
0.037037037037037, 0.037037037037037, 0.037037037037037, 0.037037037037037, 
0.037037037037037, 0.037037037037037, 0.037037037037037, 0.037037037037037, 
0.037037037037037, 0.037037037037037, 0.037037037037037, 0.037037037037037, 
0.037037037037037, 0.037037037037037, 0.037037037037037, 0.037037037037037, 
0.037037037037037, 0.037037037037037, 0.037037037037037, 0.037037037037037, 
0.037037037037037, 0.037037037037037, 0.037037037037037, 0.037037037037037, 
0.037037037037037), scannf = FALSE, nf = 2)

$dudi$nf: 2 axis-components saved
$dudi$rank: 10
eigen values: 0.4816 0.3847 0.211 0.1576 0.1501 ...
  vector    length mode    content       
1 $dudi$cw  16     numeric column weights
2 $dudi$lw  27     numeric row weights   
3 $dudi$eig 10     numeric eigen values  

  data.frame nrow ncol content             
1 $dudi$tab  27   16   modified array      
2 $dudi$li   27   2    row coordinates     
3 $dudi$l1   27   2    row normed scores   
4 $dudi$co   16   2    column coordinates  
5 $dudi$c1   16   2    column normed scores
other elements: $dudi$cr 
--------------------------------------------------------------------------
 Number of axes for cluster:  10
 Number of clusters:  4 

  Object      Description                                                     
1 $indices    level indices for Hierarchical Clustering (WARD)                
2 $cor.clus   centroid cluster coordinates                                    
3 $clus.summ  partition changes due to consolidation process                  
4 $cluster    a vector indicating the cluster in which each point is allocated
5 $carac.cate cluster characterization by qualitative variables               
6 $carac.cont cluster characterization by  quantitative variables             
7 $carac.frec cluster characterization by  frequency active variables         
> 
> FC.bd$clus.summ
      Bef.Size Aft.Size Bef.Inertia Aft.Inertia Bef.Weight Aft.Weight
1            7        7      0.1916      0.1916     0.2593     0.2593
2           10       10      0.3096      0.3096     0.3704     0.3704
3            5        5      0.1183      0.1183     0.1852     0.1852
4            5        5      0.1105      0.1105     0.1852     0.1852
TOTAL       27       27      0.7300      0.7300     1.0001     1.0001
      Bef.Dist_2 Aft.Dist_2
1         1.0992     1.0992
2         0.5029     0.5029
3         1.2503     1.2503
4         1.2639     1.2639
TOTAL         NA         NA
> FC.bd$indices
   Nodo Prim Benj     Indice
1    28    4   26 0.00000000
2    29    7   20 0.00000000
3    30   10   17 0.00000000
4    31    9   24 0.01236264
5    32   11   18 0.01236264
6    33   25   27 0.01236264
7    34   13   15 0.01236264
8    35    3   30 0.01648352
9    36   12   19 0.01759259
10   37    2   31 0.02060440
11   38   23   32 0.02060440
12   39   16   22 0.02918956
13   40    5   33 0.03118641
14   41   21   35 0.03250916
15   42    6   14 0.03266178
16   43    1   29 0.03296703
17   44   28   43 0.04697802
18   45   34   39 0.04776531
19   46   36   40 0.04939357
20   47   37   38 0.06913919
21   48    8   41 0.06934676
22   49   42   44 0.07898134
23   50   45   47 0.08516102
24   51   46   50 0.22780474
25   52   48   49 0.27570400
26   53   51   52 0.43314332
> 
> dev.off()
png 
  2 
>