Last data update: 2014.03.03

HHG-package    R Documentation

Heller-Heller-Gorfine (HHG) Tests of Independence and Equality of Distributions

Description

This R package implements the permutation test of independence between two random vectors of arbitrary dimensions and the permutation test of equality of two or more multivariate distributions, both introduced in Heller et al. (2013), as well as the distribution-free tests of independence and equality of distribution between two univariate random variables introduced in Heller et al. (2014).

Details

Package: HHG
Type: Package
Version: 1.5.1
Date: 2015-07-13
License: GPL-2

The package contains five major functions:

hhg.test - the permutation test for independence of two multivariate (or univariate) vectors.

hhg.test.k.sample - the permutation test for equality of a multivariate (or univariate) distribution across K groups.

hhg.test.2.sample - the permutation test for equality of a multivariate (or univariate) distribution across 2 groups.

hhg.univariate.ind.combined.test - the distribution-free test for independence of two univariate random variables.

hhg.univariate.ks.combined.test - the distribution-free test for equality of a univariate distribution across K groups.

See vignette('HHG') for additional information.
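
To make the idea of a permutation test on distance matrices concrete, here is a minimal base-R sketch. It does not compute the HHG statistic: it substitutes a simple Mantel-type statistic (the correlation between the two sets of pairwise distances), and the helper names `mantel_stat` and `perm_test_dist` are hypothetical, not part of the package. Only the permutation logic, which relabels the observations of one distance matrix and recomputes the statistic, reflects the general approach of a permutation test.

```r
# Illustrative only: a generic permutation test on two distance matrices.
# The statistic is a Mantel-type correlation, NOT the HHG statistic.
mantel_stat <- function(Dx, Dy) {
  idx <- upper.tri(Dx)            # use each pair of observations once
  cor(Dx[idx], Dy[idx])
}

perm_test_dist <- function(Dx, Dy, nr.perm = 200) {
  n <- nrow(Dx)
  observed <- mantel_stat(Dx, Dy)
  perm_stats <- numeric(nr.perm)
  for (b in seq_len(nr.perm)) {
    p <- sample(n)                # random relabelling of the observations
    # permute rows and columns of Dy together, breaking any link to Dx
    perm_stats[b] <- mantel_stat(Dx, Dy[p, p])
  }
  # add 1 to numerator and denominator so the p-value is never exactly 0
  p_value <- (1 + sum(perm_stats >= observed)) / (1 + nr.perm)
  list(statistic = observed, p.value = p_value)
}

set.seed(1)
x <- rnorm(40)
y <- x + rnorm(40, sd = 0.2)      # strong linear dependence
Dx <- as.matrix(dist(x))
Dy <- as.matrix(dist(y))
res <- perm_test_dist(Dx, Dy, nr.perm = 200)
res
```

The package functions take distance matrices in the same way, but they use their own test statistics and report several p-values; see the individual help pages for the returned fields.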

Author(s)

Barak Brill & Shachar Kaufman, based in part on an earlier implementation of the original HHG test by Ruth Heller <ruheller@post.tau.ac.il> and Yair Heller <heller.yair@gmail.com>. Maintainer: Barak Brill <barakbri@mail.tau.ac.il>

References

Heller, R., Heller, Y., and Gorfine, M. (2013). A consistent multivariate test of association based on ranks of distances. Biometrika, 100(2), 503-510.

Heller, R., Heller, Y., Kaufman, S., Brill, B., and Gorfine, M. (2014). Consistent distribution-free K-sample and independence tests for univariate random variables. arXiv:1410.6758.

Examples


## Not run: 

# Some examples; for more, see vignette('HHG') and the specific help pages

#######################################
#1. Univariate Independence Example
#######################################

N = 30
data = hhg.example.datagen(N, 'Parabola')
X = data[1,]
Y = data[2,]
plot(X,Y)

#Option 1: Perform the ADP combined test
#using partition sizes up to 4. See the documentation for other parameters of the combined test
#(it is recommended to use mmax >= 4, or the default parameter for large data sets)
combined = hhg.univariate.ind.combined.test(X,Y,nr.perm = 200,mmax=4)
combined


#Option 2: Perform the hhg test:

## Compute distance matrices, on which the HHG test will be based
Dx = as.matrix(dist((X), diag = TRUE, upper = TRUE))
Dy = as.matrix(dist((Y), diag = TRUE, upper = TRUE))

hhg = hhg.test(Dx, Dy, nr.perm = 1000)

hhg

#######################################
#2. Univariate K-Sample Example
#######################################

N0=50
N1=50
X = c(c(rnorm(N0/2,-2,0.7),rnorm(N0/2,2,0.7)),c(rnorm(N1/2,-1.5,0.5),rnorm(N1/2,1.5,0.5)))
Y = (c(rep(0,N0),rep(1,N1)))
#plot the two distributions by group index (0 or 1)
plot(Y,X)


#Option 1: Perform the Sm combined test


combined.test = hhg.univariate.ks.combined.test(X,Y)
combined.test


#Option 2: Perform the hhg K-sample test:


Dx = as.matrix(dist(X, diag = TRUE, upper = TRUE))

hhg = hhg.test.k.sample(Dx, Y, nr.perm = 1000)

hhg


#######################################
#3. Multivariate Independence Example:
#######################################

n=30 #number of samples
dimensions_x=5 #dimension of X matrix
dimensions_y=5 #dimension of Y matrix
X=matrix(rnorm(n*dimensions_x,mean = 0, sd = 1),nrow = n,ncol = dimensions_x) #generate noise
Y=matrix(rnorm(n*dimensions_y,mean =0, sd = 3),nrow = n,ncol = dimensions_y)

Y[,1] = Y[,1] + X[,1] + 4*(X[,1])^2 #add in the relations
Y[,2] = Y[,2] + X[,2] + 4*(X[,2])^2

#compute the distance matrix between observations.
#User may use other distance metrics.
Dx = as.matrix(dist(X, diag = TRUE, upper = TRUE))
Dy = as.matrix(dist(Y, diag = TRUE, upper = TRUE))

#run test
hhg = hhg.test(Dx, Dy, nr.perm = 1000)

hhg


#######################################
#4. Multivariate K-Sample Example
#######################################

#multivariate k-sample, with k=3 groups
n=100 #number of samples in each group
x1 = matrix(rnorm(2*n),ncol = 2) #group 1
x2 = matrix(rnorm(2*n),ncol = 2) #group 2
x2[,2] = 1*x2[,1] + x2[,2]
x3 = matrix(rnorm(2*n),ncol = 2) #group 3
x3[,2] = -1*x3[,1] + x3[,2]
x= rbind(x1,x2,x3)
y=c(rep(0,n),rep(1,n),rep(2,n)) #group numbers, starting from 0 to k-1

plot(x[,1],x[,2],col = y+1,xlab = 'first component of X',ylab = 'second component of X',
     main = 'Multivariate K-Sample Example with K=3 \n Groups Marked by Different Colors')

Dx = as.matrix(dist(x, diag = TRUE, upper = TRUE)) #distance matrix

hhg = hhg.test.k.sample(Dx, y, nr.perm = 1000) 

hhg


## End(Not run)
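
The examples above pass `diag = TRUE, upper = TRUE` to `dist()`. In base R these two arguments only control how a `dist` object is printed; `as.matrix()` returns the full symmetric matrix either way. The sketch below checks this and shows one alternative metric (Manhattan), as a small illustration of the remark that other distance metrics may be used. The variable names and the `to_zero_based` helper are hypothetical. The helper assumes, following the comment in the multivariate K-sample example, that group labels should run from 0 to k-1.

```r
set.seed(42)
X <- matrix(rnorm(20), nrow = 5)   # 5 observations, 4 dimensions

# diag/upper affect printing of the dist object only,
# so both conversions give the same full symmetric matrix
D_plain <- as.matrix(dist(X))
D_flags <- as.matrix(dist(X, diag = TRUE, upper = TRUE))
identical(D_plain, D_flags)

# An alternative metric: Manhattan (L1) distances between rows of X
D_manhattan <- as.matrix(dist(X, method = "manhattan"))

# Hypothetical helper: recode arbitrary group labels as integers 0..k-1
to_zero_based <- function(groups) as.integer(factor(groups)) - 1L
y <- to_zero_based(c("a", "a", "b", "c", "c", "b"))
y
```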
