ds_k
R Documentation
Dependency detection between level k (k > 1) categorical variable and continuous variable
Description
Dependency detection between a level k (k > 1) categorical variable and a continuous variable. The basic idea is that, if the two variables are dependent, different values of the categorical variable correspond to different distributions of the continuous variable; otherwise, the distribution of the continuous variable does not differ across the values of the categorical variable. The statistic for this dynamic slicing method is a regularized likelihood ratio calculated via a dynamic programming procedure. For more details, please refer to Jiang, Ye & Liu (2015). The result contains the value of the dynamic slicing statistic and, optionally, the slicing strategy. The method can be applied to non-parametric K-sample hypothesis testing.
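To make the dynamic programming step concrete, here is a minimal sketch in R. It assumes the statistic is the maximum, over all ways of cutting the ordered x into contiguous slices, of the summed per-slice log-likelihood ratios (slice category counts against the overall category proportions), minus lambda for each slice beyond the first. The name ds_sketch is hypothetical and not part of the package, the exact penalty scaling used by ds_k may differ (see Jiang, Ye & Liu, 2015), and this naive version runs in O(n^2 k) time.

```r
# Hypothetical sketch, not the dslice implementation: naive O(n^2 k)
# dynamic programming for a penalized slicing likelihood ratio.
ds_sketch <- function(x, xdim, lambda) {
  n <- length(x)
  # cum[i + 1, j + 1] = number of observations among x[1..i] equal to j
  cum <- rbind(0, apply(outer(x, 0:(xdim - 1), "=="), 2, cumsum))
  p <- cum[n + 1, ] / n            # overall proportion of each category
  # best[i + 1] = best penalized score for slicing x[1..i]
  best <- c(0, rep(-Inf, n))
  for (i in 1:n) {
    for (t in 0:(i - 1)) {
      cnt <- cum[i + 1, ] - cum[t + 1, ]  # category counts in x[(t + 1)..i]
      m <- i - t                          # slice size
      nz <- cnt > 0                       # 0 * log(0) is taken as 0
      llr <- sum(cnt[nz] * log(cnt[nz] / (m * p[nz])))
      # Assumed penalty: lambda for every slice, refunded once below
      best[i + 1] <- max(best[i + 1], best[t + 1] + llr - lambda)
    }
  }
  # Only slices beyond the first are penalized; a single slice scores 0,
  # so the result is nonnegative
  best[n + 1] + lambda
}
```

For example, ds_sketch(c(0, 0, 0, 1, 1, 1), 2, 1) cuts the perfectly separated labels into two pure slices and returns 6 * log(2) - 1, about 3.16, while a very large lambda forces a single slice and a value of 0.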
Usage
ds_k(x, xdim, lambda, slice = FALSE)
Arguments
x
Vector of observations of the categorical variable, coded 0, 1, …, k-1 for a level k categorical variable. It should be ordered in advance according to the values of the continuous variable, either ascending or descending.
xdim
Number of levels of x, i.e., k.
lambda
Penalty for introducing an additional slice, used to avoid making too many slices. It corresponds to the type I error under the scenario that the two variables are independent: a larger lambda yields fewer slices and makes it less likely that dependence is falsely detected. lambda should be greater than 0.
slice
Logical; whether to report the slicing strategy. Defaults to FALSE.
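ds_k expects x to be already coded as 0, 1, …, k-1 and already sorted by the continuous variable. The Examples section does this with relabel() and order(); as a base-R alternative, the hypothetical helper prepare_x below (not part of the package) codes the labels via factor(), so the 0, …, k-1 assignment follows factor()'s sorted level order, and then orders them by the continuous variable y.

```r
# Hypothetical helper (not part of dslice): code labels as 0..k-1 and
# order them by the continuous variable, as ds_k expects.
prepare_x <- function(labels, y, decreasing = FALSE) {
  codes <- as.integer(factor(labels)) - 1L   # 0, 1, ..., k-1 by level order
  codes[order(y, decreasing = decreasing)]   # rank by the continuous variable
}
```

For example, prepare_x(c("b", "a", "b", "c"), c(3.2, 1.0, 0.5, 2.1)) returns c(1L, 0L, 2L, 1L); xdim is then the number of distinct labels, here 3.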
Value
dsval
Value of the dynamic slicing statistic. It is nonnegative. If it equals zero, the categorical and continuous variables are treated as independent of each other; otherwise they are treated as dependent.
slices
Slicing strategy that maximizes the dynamic slicing statistic for the currently ordered vector x. It is reported only if slice = TRUE. Each row corresponds to a slice. Each column except the last gives the number of observations in that slice taking the corresponding value of x (0, 1, …, k-1). The last column is the total number of observations in the slice, i.e., the sum of the first k columns.
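To illustrate this layout, the hypothetical helper slice_table below (not part of the package) builds the same kind of table from an ordered x and the index of the last observation in each slice: one row per slice, k count columns for the values 0, …, k-1, and a final column with the slice size. Column names and storage mode of the actual ds_k output may differ.

```r
# Hypothetical helper (not part of dslice): tabulate category counts per
# slice, given the index of the last observation in each slice.
slice_table <- function(x, xdim, ends) {
  starts <- c(1L, head(ends, -1L) + 1L)   # first index of each slice
  # tabulate() counts values 1..xdim, so shift the 0-based codes by one
  counts <- t(mapply(function(s, e) tabulate(x[s:e] + 1L, nbins = xdim),
                     starts, ends))
  cbind(counts, rowSums(counts))          # last column: slice size
}
```

For x = c(0, 0, 1, 1, 1, 0) cut after positions 2, 5 and 6, slice_table(x, 2, c(2, 5, 6)) has rows (2, 0, 2), (0, 3, 3) and (1, 0, 1), and the last column sums to length(x).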
References
Jiang, B., Ye, C. and Liu, J.S. Non-parametric K-sample tests via dynamic slicing. Journal of the American Statistical Association, 110(510): 642-653, 2015.
See Also
ds_eqp_k.
Examples
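library(dslice)
set.seed(1)
n <- 100
mu <- 0.5
# Two samples of size n from N(-mu, 1) and N(mu, 1)
y <- c(rnorm(n, -mu, 1), rnorm(n, mu, 1))
# Sample labels, recoded to 0, 1 by relabel()
x <- c(rep("1", n), rep("2", n))
x <- relabel(x)
# Order the labels by the continuous variable
x <- x[order(y)]
xdim <- max(x) + 1
lambda <- 1.0
dsres <- ds_k(x, xdim, lambda, slice = TRUE)
dsres$dsval
dsres$slices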