Last data update: 2014.03.03

R: INCA Test
INCAtest    R Documentation

INCA Test

Description

Assume that n units are divided into k groups C1,...,Ck. The function INCAtest performs the INCA typicality test: the null hypothesis that a new unit x0 is typical with respect to a previously fixed partition is tested against the alternative hypothesis that the unit is atypical.

Usage

INCAtest(d, pert, d_test, np = 1000, alpha = 0.05, P = 1)

Arguments

d

a distance matrix or a dist object with distance information between units.

pert

an n-vector indicating the group to which each unit belongs. The values of pert are expected to be numbers greater than or equal to 1 (for instance 1, 2, 3, ..., k). The default value indicates that there is only one group in the data.

d_test

an n-vector containing the distances from x0 to each of the n units.

np

sample size for the bootstrap procedure.

alpha

fixed level for the test.

P

the bootstrap procedure is repeated 10*P times.
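
The arguments above must agree in length: d describes n units, and both pert and d_test are n-vectors. A minimal sketch of a consistency check in base R follows. The helper check_inca_inputs is hypothetical and not part of the package, and the whole-number check on pert is an assumption based on the "1, 2, 3, ..., k" example.

```r
# Hypothetical helper (not part of the package): basic consistency checks
# on d, pert and d_test before they are passed to INCAtest().
check_inca_inputs <- function(d, pert, d_test) {
  # Number of units n, taken from a dist object or a square matrix.
  n <- if (inherits(d, "dist")) attr(d, "Size") else nrow(as.matrix(d))
  stopifnot(
    length(pert) == n,         # one group label per unit
    length(d_test) == n,       # one distance from x0 per unit
    all(pert >= 1),            # labels are numbers >= 1
    all(pert == round(pert)),  # assumed here: labels are whole numbers
    all(d_test >= 0)           # distances cannot be negative
  )
  invisible(TRUE)
}

# Small illustrative data set: 6 units in dimension 2, two groups.
set.seed(1)
x <- matrix(rnorm(6 * 2), ncol = 2)
d <- dist(x)
pert <- c(1, 1, 1, 2, 2, 2)
# Distances from a new unit at the origin to each of the 6 units.
d_test <- sqrt(rowSums(sweep(x, 2, c(0, 0))^2))

check_inca_inputs(d, pert, d_test)
```

If any check fails, stopifnot() stops with a message naming the failed condition, which is usually easier to read than an error raised later inside the test.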

Value

A list with class "incat" containing the following components:

StatisticW0

value of the INCA statistic.

ProjectionsU

values of the statistics measuring the projection from the new unit x0 to each group considered.

Percentage_under_alpha

percentage of times the null hypothesis of the INCA test has been rejected at the specified value of alpha.

alpha

specified value of the level of the test.

Note

Obtaining the distribution of the INCA statistic under the null hypothesis can take a long time. For a correct geometrical interpretation, it is advisable to verify that the distance matrix d is Euclidean.
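
One common way to check whether a distance matrix is Euclidean is the classical scaling criterion: the doubly centred matrix B = -1/2 H D2 H, where D2 holds the squared distances and H = I - 11'/n, must have no negative eigenvalues. The sketch below uses base R only. The function name is_euclidean and the tolerance tol are assumptions made for illustration, not part of the package.

```r
# Hypothetical helper (not part of the package): TRUE if the distances in d
# can be reproduced by points in a Euclidean space, up to a tolerance.
is_euclidean <- function(d, tol = 1e-8) {
  D2 <- as.matrix(d)^2                  # squared distances
  n  <- nrow(D2)
  H  <- diag(n) - matrix(1 / n, n, n)   # centring matrix
  B  <- -0.5 * H %*% D2 %*% H           # doubly centred matrix
  # Symmetrise to remove rounding asymmetry before the eigendecomposition.
  ev <- eigen((B + t(B)) / 2, symmetric = TRUE, only.values = TRUE)$values
  # Accept tiny negative eigenvalues caused by floating-point error.
  min(ev) >= -tol * max(abs(ev))
}

# Distances computed with dist() on real coordinates are Euclidean.
set.seed(1)
x <- matrix(rnorm(20 * 5), ncol = 5)
is_euclidean(dist(x))

# Three "points" that violate the triangle inequality (1 + 1 < 5) are not.
bad <- matrix(c(0, 1, 5,
                1, 0, 1,
                5, 1, 0), nrow = 3, byrow = TRUE)
is_euclidean(bad)
```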

Author(s)

Itziar Irigoien itziar.irigoien@ehu.es; Konputazio Zientziak eta Adimen Artifiziala, Euskal Herriko Unibertsitatea (UPV-EHU), Donostia, Spain.

Conchita Arenas carenas@ub.edu; Departament d'Estadistica, Universitat de Barcelona, Barcelona, Spain.

References

Irigoien, I. and Arenas, C. (2008). INCA: New statistic for estimating the number of clusters and identifying atypical units. Statistics in Medicine, 27(15), 2948–2973.

Arenas, C. and Cuadras, C.M. (2002). Some recent statistical methods based on distances. Contributions to Science, 2, 183–191.

See Also

estW, INCAindex

Examples

#generate 3 clusters, each of them with 20 objects in dimension 5.
mu1 <- sample(1:10, 5, replace=TRUE)
x1 <- matrix(rnorm(20*5, mean = mu1, sd = 1),ncol=5, byrow=TRUE)
mu2 <- sample(1:10, 5, replace=TRUE)
x2 <- matrix(rnorm(20*5, mean = mu2, sd = 1),ncol=5, byrow=TRUE)
mu3 <- sample(1:10, 5, replace=TRUE)
x3 <- matrix(rnorm(20*5, mean = mu3, sd = 1),ncol=5, byrow=TRUE)
x <- rbind(x1,x2,x3)

# Euclidean distance between units in matrix x.
d <- dist(x)
# given the right partition
partition <- c(rep(1,20), rep(2,20), rep(3,20))

# x0 contains a unit from one of the groups, for example group 1.
x0 <-  matrix(rnorm(1*5, mean = mu1, sd = 1),ncol=5, byrow=TRUE)

# distances between x0 and the other units.
dx0 <- rep(0,60)
for (i in 1:60){
	dif <-x0-x[i,]
	dx0[i] <- sqrt(sum(dif*dif))
}

INCAtest(d, partition, dx0, np=10)


# x0 contains a unit from a new group.
x0 <-  matrix(rnorm(1*5, mean = sample(1:10, 5, replace=TRUE),
        sd = 1), ncol=5, byrow=TRUE)

# distances between x0 and the other units in matrix x.
dx0 <- rep(0,60)
for (i in 1:60){
	dif <-x0-x[i,]
	dx0[i] <- sqrt(sum(dif*dif))
}

INCAtest(d, partition, dx0, np=10)
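
The loops that fill dx0 in the example above can also be written without an explicit loop. The following is a sketch in base R on stand-in data of the same shape (60 units in dimension 5). The names dx0_loop and dx0_vec are introduced here only to compare the two versions.

```r
set.seed(42)
x  <- matrix(rnorm(60 * 5), ncol = 5)  # stand-in for the 60 x 5 data matrix
x0 <- matrix(rnorm(5), ncol = 5)       # stand-in for the new unit (1 x 5)

# Loop version, as in the example above.
dx0_loop <- rep(0, 60)
for (i in 1:60) {
  dif <- x0 - x[i, ]
  dx0_loop[i] <- sqrt(sum(dif * dif))
}

# Vectorised version: subtract x0 from every row of x, then take row norms.
dx0_vec <- sqrt(rowSums(sweep(x, 2, as.vector(x0))^2))

# Compare the two results up to floating-point tolerance.
all.equal(dx0_loop, dx0_vec)
```

The vectorised form does not hard-code the number of units, so it keeps working if the size of x changes.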

Results


R version 3.3.1 (2016-06-21) -- "Bug in Your Hair"
Platform: x86_64-pc-linux-gnu (64-bit)

> library(ICGE)
Loading required package: MASS
Loading required package: cluster
> ### Name: INCAtest
> ### Title: INCA Test
> ### Aliases: INCAtest print.incat summary.incat
> ### Keywords: multivariate cluster
> 
> ### ** Examples
> #generate 3 clusters, each of them with 20 objects in dimension 5.
> mu1 <- sample(1:10, 5, replace=TRUE)
> x1 <- matrix(rnorm(20*5, mean = mu1, sd = 1),ncol=5, byrow=TRUE)
> mu2 <- sample(1:10, 5, replace=TRUE)
> x2 <- matrix(rnorm(20*5, mean = mu2, sd = 1),ncol=5, byrow=TRUE)
> mu3 <- sample(1:10, 5, replace=TRUE)
> x3 <- matrix(rnorm(20*5, mean = mu3, sd = 1),ncol=5, byrow=TRUE)
> x <- rbind(x1,x2,x3)
> 
> # Euclidean distance between units in matrix x.
> d <- dist(x)
> # given the right partition
> partition <- c(rep(1,20), rep(2,20), rep(3,20))
> 
> # x0 contains a unit from one group, as for example group 1.
> x0 <-  matrix(rnorm(1*5, mean = mu1, sd = 1),ncol=5, byrow=TRUE)
> 
> # distances between x0 and the other units.
> dx0 <- rep(0,60)
> for (i in 1:60){
+ 	dif <-x0-x[i,]
+ 	dx0[i] <- sqrt(sum(dif*dif))
+ }
> 
> INCAtest(d, partition, dx0, np=10)
        INCA test    
 INCA statistic value = 1.496774 

 U projections values: 
   U_1 = 1.315937 
   U_2 = 107.946 
   U_3 = 90.67355 

 % of significative tests for alpha=  0.05  :  0 
> 
> 
> # x0 contains a unit from a new group.
> x0 <-  matrix(rnorm(1*5, mean = sample(1:10, 5, replace=TRUE),
+         sd = 1), ncol=5, byrow=TRUE)
> 
> # distances between x0 and the other units in matrix x.
> dx0 <- rep(0,60)
> for (i in 1:60){
+ 	dif <-x0-x[i,]
+ 	dx0[i] <- sqrt(sum(dif*dif))
+ }
> 
> INCAtest(d, partition, dx0, np=10)
        INCA test    
 INCA statistic value = 62.8023 

 U projections values: 
   U_1 = 78.77813 
   U_2 = 108.6131 
   U_3 = 45.62288 

 % of significative tests for alpha=  0.05  :  100 