Last data update: 2014.03.03

ds_eqp_k

R Documentation

Dependency detection between a level-k (k > 1) categorical variable and a continuous variable

Description

Detects dependency between a level-k (k > 1) categorical variable and a continuous variable via dynamic slicing with O(n^{1/2}) resolution. The basic idea is almost the same as in ds_k. The only difference is that ds_eqp_k groups the samples into approximately O(n^{1/2}) groups, each containing approximately O(n^{1/2}) samples, and performs dynamic slicing only on the group boundaries. This much faster version can reduce computation time substantially without much loss of power. Given this strategy, we recommend ds_eqp_k for problems with large sample sizes and ds_k for ordinary problems. For more details, please refer to Jiang, Ye & Liu (2015). The result contains the value of the dynamic slicing statistic and, optionally, the slicing strategy. The function can be applied to non-parametric K-sample hypothesis testing.
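
The grouping step described above can be pictured with a small base R sketch. It is only an illustration of the idea of cutting n ranked samples into roughly n^{1/2} contiguous groups. The helper name `eqp_boundaries` and the rounding rule are assumptions made for this example and are not taken from the package, whose internal grouping may differ.

```r
# Hypothetical helper (not part of the package): split n ranked samples into
# roughly sqrt(n) contiguous groups of roughly sqrt(n) samples each and
# return the index of the last sample in every group.
eqp_boundaries <- function(n) {
  n_groups <- max(1L, as.integer(round(sqrt(n))))
  # Evenly spaced cut points; the last boundary is always n.
  unique(as.integer(round(seq_len(n_groups) * n / n_groups)))
}

b <- eqp_boundaries(200)
length(b)        # number of groups
diff(c(0L, b))   # group sizes
```

Under this assumed scheme, slice boundaries would be searched only among the values in `b` rather than among all n positions, which is where the saving in computation time comes from.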

Usage

  ds_eqp_k(x, xdim, lambda, slice = FALSE)

Arguments

`x`	Vector of observations of the categorical variable, coded 0, 1, …, k-1 for a level-k categorical variable. It should be ranked in advance according to the values of the continuous variable, in either ascending or descending order.
`xdim`	Number of levels of `x`; equals k.
`lambda`	Penalty for introducing an additional slice, used to avoid making too many slices. It corresponds to the type I error rate under the scenario that the two variables are independent. `lambda` should be greater than 0.
`slice`	Logical indicator of whether to report the slicing strategy. Defaults to `FALSE`.
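
As a rough illustration of the input format that `x` and `xdim` expect, the following base R sketch encodes arbitrary labels as 0, 1, …, k-1 and ranks them by the continuous variable. The toy data and the use of `factor()` for the encoding are assumptions made for this example.

```r
# Toy data: labels and measurements are made up for illustration.
group <- c("a", "b", "a", "c", "b", "c")
y     <- c(2.3, 0.1, 1.7, 3.9, 0.8, 3.1)

# Map the labels to integer codes 0, 1, ..., k-1.
codes <- as.integer(factor(group)) - 1L

# Rank the codes by the continuous variable (ascending here).
x    <- codes[order(y)]
xdim <- length(unique(codes))

x     # 1 1 0 0 2 2
xdim  # 3
```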

Value

`dsval`	Value of the dynamic slicing statistic. It is nonnegative. If it equals zero, the categorical variable and the continuous variable are treated as independent of each other; otherwise they are treated as dependent.
`slices`	Slicing strategy that maximizes the dynamic slicing statistic for the currently ranked vector `x`. It is reported only if `slice` is true. Each row stands for a slice. Each column except the last gives the number of observations taking the corresponding value of `x` in that slice. The last column gives the total number of observations in the slice, i.e., the sum of the first k columns.
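
To make the layout of `slices` concrete, here is a minimal base R sketch that builds a table of the same shape from a ranked vector and a given set of slice end points. The helper `slice_table`, its `ends` argument, and the column names are hypothetical and chosen for this example; the sketch does not search for the optimal slicing, and the column names returned by the package may differ.

```r
# Hypothetical helper (not part of the package): tabulate category counts per
# slice, given a ranked vector x with codes 0..k-1 and the index of the last
# observation in each slice.
slice_table <- function(x, xdim, ends) {
  starts <- c(1L, head(ends, -1L) + 1L)
  # One column of counts per slice, then transpose so that rows are slices.
  counts <- t(mapply(function(s, e) {
    tabulate(x[s:e] + 1L, nbins = xdim)
  }, starts, ends))
  out <- cbind(counts, rowSums(counts))
  colnames(out) <- c(paste0("n", seq_len(xdim) - 1L), "total")
  out
}

x   <- c(1L, 1L, 0L, 0L, 2L, 2L)
tab <- slice_table(x, xdim = 3, ends = c(2L, 4L, 6L))
tab
```

Each row of `tab` is one slice, the first three columns count the observations with codes 0, 1, and 2, and the last column is the row sum.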

References

Jiang, B., Ye, C. and Liu, J.S. Non-parametric K-sample tests via dynamic slicing. Journal of the American Statistical Association, 110(510): 642-653, 2015.

Examples

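library(dslice)  # provides ds_eqp_k() and relabel()

# Two samples of size n whose means differ by 2 * mu
n <- 100
mu <- 0.5
y <- c(rnorm(n, -mu, 1), rnorm(n, mu, 1))
x <- c(rep("1", n), rep("2", n))
x <- relabel(x)      # recode labels as 0, 1, ..., k-1
x <- x[order(y)]     # rank x by the continuous variable
xdim <- max(x) + 1   # number of levels k
lambda <- 1.0        # penalty per additional slice
dsres <- ds_eqp_k(x, xdim, lambda, slice = TRUE)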