Last data update: 2014.03.03

R: The K-sample test statistics for all partition sizes
hhg.univariate.ks.stat    R Documentation

The K-sample test statistics for all partition sizes

Description

These statistics are used in the omnibus distribution-free test of equality of distributions among K groups, as described in Heller et al. (2014).

Usage

hhg.univariate.ks.stat(x, y, aggregation.type = 'sum', score.type = 'LikelihoodRatio',
  mmax = max(4, round(min(table(y))/3)), mmin = 2)

Arguments

x

a numeric vector of data values. Tied observations are broken at random.

y

for k groups, a vector of integers with values 0:(k-1) which specify the group each observation belongs to.

aggregation.type

a character string specifying the aggregation type, must be one of "sum" (default), "max", or "both".

score.type

a character string specifying the score type, must be one of "LikelihoodRatio" (default), "Pearson", or "both".

mmax

The maximum partition size of the ranked observations. The default value is one third of the number of observations in the smallest group (rounded), with a minimum of 4.

mmin

The minimum partition size of the ranked observations, default value is 2.
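
As a small worked example of the mmax default, the following base-R sketch evaluates the expression from the Usage section for two groups of 50 observations. The variable names (n.min, mmax.default, n.sizes) are illustrative and not part of the package.

```r
# Group labels for two groups of 50 observations each, as in the Examples.
y <- c(rep(0, 50), rep(1, 50))

# Size of the smallest group.
n.min <- min(table(y))

# Default mmax from the Usage section: a third of the smallest group,
# rounded, but never below 4.
mmax.default <- max(4, round(n.min / 3))
mmin.default <- 2

# Number of partition sizes for which a statistic is computed.
n.sizes <- mmax.default - mmin.default + 1

print(c(mmax = mmax.default, n.sizes = n.sizes))
```

With 50 observations per group this gives mmax = 17, so statistics are computed for the 16 partition sizes m = 2, ..., 17.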

Details

For each partition size m = mmin, …, mmax, the function computes the scores in each of the partitions (according to the score type) and aggregates all scores according to the aggregation type (see details in Heller et al., 2014). If the score type is one of "LikelihoodRatio" or "Pearson", and the aggregation type is one of "sum" or "max", then the computed statistic will be in statistic; otherwise the computed statistics will be in the appropriate subset of sum.chisq, sum.lr, max.chisq, and max.lr.
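
As a rough illustration of the two score types, the following base-R sketch computes the Pearson chi-squared and likelihood ratio statistics for a single, hand-picked partition of the ranks into m = 3 cells. The cut points, variable names, and simulated data are assumptions made for illustration only. The package considers all partitions of a given size and aggregates their scores, which this sketch does not attempt.

```r
set.seed(1)  # arbitrary seed, only for reproducibility of the sketch

# Two groups of 50 observations with shifted means (illustrative data).
x <- c(rnorm(50, 0, 1), rnorm(50, 1, 1))
y <- c(rep(0, 50), rep(1, 50))

# Rank the observations, breaking ties at random.
r <- rank(x, ties.method = "random")

# One hypothetical partition of the ranks 1..100 into m = 3 cells.
breaks <- c(0, 30, 70, 100)
cell <- cut(r, breaks = breaks, labels = FALSE)

# K x m contingency table of group by cell (here 2 x 3).
obs <- table(y, cell)
obs <- matrix(as.numeric(obs), nrow = nrow(obs))

# Expected counts under equality of distributions.
expected <- outer(rowSums(obs), colSums(obs)) / sum(obs)

# Pearson chi-squared score for this partition.
pearson <- sum((obs - expected)^2 / expected)

# Likelihood ratio score; empty cells contribute zero.
lr <- 2 * sum(ifelse(obs > 0, obs * log(obs / expected), 0))

print(c(Pearson = pearson, LikelihoodRatio = lr))
```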

Value

Returns a UnivariateStatistic class object, with the following entries:

statistic

The value of the computed statistic if the score type is one of "LikelihoodRatio" or "Pearson", and the aggregation type is one of "sum" or "max". In that case it is the corresponding one of sum.chisq, sum.lr, max.chisq, and max.lr.

sum.chisq

A vector of size mmax-mmin+1, where the m-mmin+1 entry is the average over all Pearson chi-squared statistics from all the K X m contingency tables considered, divided by the total number of observations.

sum.lr

A vector of size mmax-mmin+1, where the m-mmin+1 entry is the average over all likelihood ratio statistics from all the K X m contingency tables considered, divided by the total number of observations.

max.chisq

A vector of size mmax-mmin+1, where the m-mmin+1 entry is the maximum over all Pearson chi-squared statistics from all the K X m contingency tables considered.

max.lr

A vector of size mmax-mmin+1, where the m-mmin+1 entry is the maximum over all likelihood ratio statistics from all the K X m contingency tables considered.

type

"KSample".

stat.type

"KSample".

size

A vector of size K of the ordered group sample sizes.

score.type

The input score.type.

aggregation.type

The input aggregation.type.

mmin

The input mmin.

mmax

The input mmax.
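
The entries listed above can be inspected with a short sketch such as the following. It assumes that the HHG package is installed and that the returned object exposes its entries by name through `$`, which the entry list suggests but does not state explicitly. The simulated data and the helper variable n.sizes are illustrative.

```r
set.seed(1)  # arbitrary seed, only for reproducibility of the sketch

# Two groups of 50 observations, labelled 0 and 1.
X <- c(rnorm(50, -1, 1), rnorm(50, 1, 1))
Y <- c(rep(0, 50), rep(1, 50))

# Number of partition sizes expected under the default mmin and mmax.
n.sizes <- max(4, round(min(table(Y)) / 3)) - 2 + 1

# Run only when HHG is available, so the sketch does not fail without it.
if (requireNamespace("HHG", quietly = TRUE)) {
  res <- HHG::hhg.univariate.ks.stat(X, Y,
                                     aggregation.type = "both",
                                     score.type = "both")

  # Inputs echoed back in the result (assumed to be accessible via `$`).
  print(res$mmin)
  print(res$mmax)

  # Each vector should have one entry per partition size.
  print(length(res$sum.lr) == n.sizes)
  print(res$max.chisq)
}
```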

Author(s)

Barak Brill and Shachar Kaufman.

References

Heller, R., Heller, Y., Kaufman, S., Brill, B., & Gorfine, M. (2014). Consistent distribution-free K-sample and independence tests for univariate random variables. arXiv:1410.6758.

Examples

#Example of computing the test statistics for data from a two-sample problem:

#Two groups, each from a different normal mixture:
X = c(c(rnorm(25,-2,0.7),rnorm(25,2,0.7)),c(rnorm(25,-1.5,0.5),rnorm(25,1.5,0.5)))
Y = (c(rep(0,50),rep(1,50)))
plot(Y,X)


#I) Computing test statistics, with default parameters:
hhg.univariate.Sm.Likelihood.result = hhg.univariate.ks.stat(X,Y)

hhg.univariate.Sm.Likelihood.result

#II) Computing test statistics, with max aggregation type:
hhg.univariate.Mm.likelihood.result = hhg.univariate.ks.stat(X,Y,aggregation.type = 'max')

hhg.univariate.Mm.likelihood.result


Results


> library(HHG)
HHG package for non parametric tests of independence and equality of distributions.
type vignette('HHG') or ?HHG for documentation, examples and a quickstart guide.
use suppressPackageStartupMessages(library(HHG)) to suppress this message.
> ### Name: hhg.univariate.ks.stat
> ### Title: The K-sample test statistics for all partition sizes
> ### Aliases: hhg.univariate.ks.stat
> 
> ### ** Examples
> 
> #Example of computing the test statistics for data from a two-sample problem:
> 
> #Two groups, each from a different normal mixture:
> X = c(c(rnorm(25,-2,0.7),rnorm(25,2,0.7)),c(rnorm(25,-1.5,0.5),rnorm(25,1.5,0.5)))
> Y = (c(rep(0,50),rep(1,50)))
> plot(Y,X)
> 
> 
> #I) Computing test statistics , with default parameters:
> hhg.univariate.Sm.Likelihood.result = hhg.univariate.ks.stat(X,Y)
> 
> hhg.univariate.Sm.Likelihood.result
HHG univariate ksample statistic of type: 
sum of Likelihood Ratio scores over possible partitions.

Minimum partition size: 2  Maximum partition size: 17 

Sample size, by groups:  
50 50

Statistics, by partition size: 
     S.m_2      S.m_3      S.m_4      S.m_5      S.m_6      S.m_7      S.m_8 
0.01786745 0.03948159 0.05714968 0.07142714 0.08338437 0.09381028 0.10322496 
     S.m_9     S.m_10     S.m_11     S.m_12     S.m_13     S.m_14     S.m_15 
0.11196524 0.12025032 0.12822457 0.13598417 0.14359392 0.15109777 0.15852568 
    S.m_16     S.m_17 
0.16589800 0.17322843 
> 
> #II) Computing test statistics , with max aggregation type:
> hhg.univariate.Mm.likelihood.result = hhg.univariate.ks.stat(X,Y,aggregation.type = 'max')
> 
> hhg.univariate.Mm.likelihood.result
HHG univariate ksample statistic of type: 
max of Likelihood Ratio scores over possible partitions.

Minimum partition size: 2  Maximum partition size: 17 

Sample size, by groups:  
50 50

Statistics, by partition size: 
    M.m_2     M.m_3     M.m_4     M.m_5     M.m_6     M.m_7     M.m_8     M.m_9 
 5.922115 12.399306 14.884659 20.463248 22.176300 24.539869 26.296383 28.080189 
   M.m_10    M.m_11    M.m_12    M.m_13    M.m_14    M.m_15    M.m_16    M.m_17 
30.373004 31.814459 33.764252 35.320614 37.115249 38.794282 40.235736 42.145279 
> 