The K-sample test statistics for all partition sizes
hhg.univariate.ks.stat    R Documentation


These statistics are used in the omnibus distribution-free test of equality of distributions among K groups, as described in Heller et al. (2014).


hhg.univariate.ks.stat(x, y, aggregation.type = 'sum', score.type = 'LikelihoodRatio',
                       mmax = max(4, round(min(table(y)) / 3)), mmin = 2)



x: a numeric vector of data values. Tied observations are broken at random.


y: for k groups, a vector of integers with values 0:(k-1) that specify the group each observation belongs to.


aggregation.type: a character string specifying the aggregation type; must be one of "sum" (default), "max", or "both".


score.type: a character string specifying the score type; must be one of "LikelihoodRatio" (default), "Pearson", or "both".


mmax: the maximum partition size of the ranked observations. The default is one third of the number of observations in the smallest group (rounded), but never less than 4.


mmin: the minimum partition size of the ranked observations. The default is 2.
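
The group encoding and the default mmax described above can be checked with a few lines of base R. This is a minimal sketch: the labels in `groups` and the helper `default_mmax` are hypothetical names introduced here for illustration and are not part of the package; the helper only restates the default expression from the usage line.

```r
# Hypothetical raw group labels (illustration only; not part of the package).
groups <- c("ctrl", "ctrl", "drug", "drug", "drug", "placebo", "placebo")

# Recode to integers 0:(k-1), the encoding expected for the y argument.
y <- as.integer(factor(groups)) - 1L
y

# The default mmax, copied from the usage line.
default_mmax <- function(y) max(4, round(min(table(y)) / 3))

default_mmax(y)                    # smallest group has 2 observations, so the floor of 4 applies
default_mmax(rep(0:1, each = 50))  # two groups of 50: round(50 / 3) = 17
```

The second call reproduces the group sizes used in the examples further down, where the printed maximum partition size is 17.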


For each partition size m = mmin, …, mmax, the function computes the scores in each of the partitions (according to the score type) and aggregates all scores according to the aggregation type (see Heller et al., 2014, for details). If the score type is one of "LikelihoodRatio" or "Pearson", and the aggregation type is one of "sum" or "max", the computed statistic is returned in statistic. Otherwise (when either is "both"), the computed statistics are returned in the appropriate subset of the per-score, per-aggregation entries, such as sum.chisq and max.chisq.
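
To make the score-then-aggregate idea concrete, here is a brute-force sketch in base R for partition size m = 2 only. It is not the package's implementation, and its normalization may differ from the values the package reports. It assumes that a partition into two cells is a single cut between adjacent ranks; the function names `pearson_score`, `lr_score`, and `scores_m2` are hypothetical.

```r
# Pearson chi-squared score of a K x m contingency table.
pearson_score <- function(tab) {
  expected <- outer(rowSums(tab), colSums(tab)) / sum(tab)
  sum((tab - expected)^2 / expected)
}

# Likelihood-ratio score of a K x m contingency table.
lr_score <- function(tab) {
  expected <- outer(rowSums(tab), colSums(tab)) / sum(tab)
  nz <- tab > 0                      # empty cells contribute 0
  2 * sum(tab[nz] * log(tab[nz] / expected[nz]))
}

# Scores of all m = 2 partitions: one cut after each of the first n - 1 ranks.
scores_m2 <- function(x, y, score = pearson_score) {
  r <- rank(x, ties.method = "random")   # ties broken at random
  g <- factor(y)
  vapply(seq_len(length(x) - 1), function(cut_at) {
    cell <- factor(r <= cut_at, levels = c(TRUE, FALSE))
    score(table(g, cell))
  }, numeric(1))
}

set.seed(1)
x <- c(rnorm(20, 0, 1), rnorm(20, 1, 1))
y <- rep(0:1, each = 20)

s <- scores_m2(x, y)                     # Pearson scores
s_lr <- scores_m2(x, y, lr_score)        # likelihood-ratio scores

c(sum = sum(s), max = max(s))            # the two aggregation types
c(sum = sum(s_lr), max = max(s_lr))
```

For larger m the number of partitions grows combinatorially, so enumerating them like this is only practical as an illustration.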


Returns a UnivariateStatistic class object, with the following entries:


The value of the computed statistic if the score type is one of "LikelihoodRatio" or "Pearson", and the aggregation type is one of "sum" or "max". It equals the corresponding per-score, per-aggregation vector described below (for example, sum.chisq or max.chisq).


A vector of size mmax-mmin+1, where the m-mmin+1 entry is the average over all Pearson chi-squared statistics from all the K X m contingency tables considered, divided by the total number of observations.

A vector of size mmax-mmin+1, where the m-mmin+1 entry is the average over all LikelihoodRatio statistics from all the K X m contingency tables considered, divided by the total number of observations.


A vector of size mmax-mmin+1, where the m-mmin+1 entry is the maximum over all Pearson chi-squared statistics from all the K X m contingency tables considered.

A vector of size K of the ordered group sample sizes.


The input score.type.


The input aggregation.type.


The input mmin.


The input mmax.
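
The indexing convention used by the statistic vectors above (entry m-mmin+1 holds the value for partition size m) can be sketched in base R. The vector below is built by hand from the first four values printed in the example output (S.m_2 to S.m_5), truncated for brevity; it is not obtained by calling the package.

```r
mmin <- 2
mmax <- 5   # truncated for illustration; the example output runs up to 17

# First four sum-aggregated statistics from the printed example output.
stat <- c(0.01786745, 0.03948159, 0.05714968, 0.07142714)
names(stat) <- paste0("S.m_", mmin:mmax)

length(stat) == mmax - mmin + 1   # vector size is mmax - mmin + 1

m <- 4
stat[m - mmin + 1]                # entry for partition size m = 4, named S.m_4
```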


Barak Brill and Shachar Kaufman.


Heller, R., Heller, Y., Kaufman, S., Brill, B., & Gorfine, M. (2014). Consistent distribution-free K-sample and independence tests for univariate random variables. arXiv:1410.6758.


#Example of computing the test statistics for data from a two-sample problem:

#Two groups, each from a different normal mixture:
X = c(c(rnorm(25,-2,0.7),rnorm(25,2,0.7)),c(rnorm(25,-1.5,0.5),rnorm(25,1.5,0.5)))
Y = (c(rep(0,50),rep(1,50)))

#I) Computing test statistics, with default parameters:
hhg.univariate.Sm.Likelihood.result = hhg.univariate.ks.stat(X,Y)


#II) Computing test statistics, with max aggregation type:
hhg.univariate.Mm.likelihood.result = hhg.univariate.ks.stat(X,Y,aggregation.type = 'max')



> #Example of computing the test statistics for data from a two-sample problem:
> #Two groups, each from a different normal mixture:
> X = c(c(rnorm(25,-2,0.7),rnorm(25,2,0.7)),c(rnorm(25,-1.5,0.5),rnorm(25,1.5,0.5)))
> Y = (c(rep(0,50),rep(1,50)))
> plot(Y,X)
> #I) Computing test statistics, with default parameters:
> hhg.univariate.Sm.Likelihood.result = hhg.univariate.ks.stat(X,Y)
> hhg.univariate.Sm.Likelihood.result
HHG univariate ksample statistic of type: 
sum of Likelihood Ratio scores over possible partitions.

Minimum partition size: 2  Maximum partition size: 17 

Sample size, by groups:  
50 50

Statistics, by partition size: 
     S.m_2      S.m_3      S.m_4      S.m_5      S.m_6      S.m_7      S.m_8 
0.01786745 0.03948159 0.05714968 0.07142714 0.08338437 0.09381028 0.10322496 
     S.m_9     S.m_10     S.m_11     S.m_12     S.m_13     S.m_14     S.m_15 
0.11196524 0.12025032 0.12822457 0.13598417 0.14359392 0.15109777 0.15852568 
    S.m_16     S.m_17 
0.16589800 0.17322843 
> #II) Computing test statistics, with max aggregation type:
> hhg.univariate.Mm.likelihood.result = hhg.univariate.ks.stat(X,Y,aggregation.type = 'max')
> hhg.univariate.Mm.likelihood.result
HHG univariate ksample statistic of type: 
max of Likelihood Ratio scores over possible partitions.

Minimum partition size: 2  Maximum partition size: 17 

Sample size, by groups:  
50 50

Statistics, by partition size: 
    M.m_2     M.m_3     M.m_4     M.m_5     M.m_6     M.m_7     M.m_8     M.m_9 
 5.922115 12.399306 14.884659 20.463248 22.176300 24.539869 26.296383 28.080189 
   M.m_10    M.m_11    M.m_12    M.m_13    M.m_14    M.m_15    M.m_16    M.m_17 
30.373004 31.814459 33.764252 35.320614 37.115249 38.794282 40.235736 42.145279 
