R: Oja Ranks - Affine Equivariant Multivariate Ranks
ojaRank
R Documentation
Oja Ranks – Affine Equivariant Multivariate Ranks
Description
The function computes the Oja rank of a point x w.r.t. a data set X or, if no
point x is given, the Oja ranks of all points in X.
Usage
ojaRank(X, x = NULL, p = NULL, silent = FALSE, na.action = na.fail)
Arguments
X
numeric data.frame or matrix containing the data points as rows.
x
NULL or a numeric vector, the point for which the Oja rank should be computed.
p
NULL or a number between 0 and 1 which specifies the fraction of hyperplanes to be used for subsampling. If p = 1, no subsampling is done.
If p = NULL, the value of p is determined based on the size of the data set. See details.
silent
logical, if subsampling is done or the expected computation time is too long, a warning message will be printed unless silent is TRUE. The default is FALSE.
na.action
a function which indicates what should happen when the data
contain 'NA's. Default is to fail.
Details
The function computes the Oja rank of the point x w.r.t. the data set X or, if no x is specified, the Oja ranks of all data points in
X w.r.t. X. For a definition of Oja rank see reference below.
The matrix X needs to have at least as many rows as columns in order to give sensible results.
The vector x has to be of length ncol(X).
If x is specified, a vector of length ncol(X) is returned. Otherwise the return value is a matrix of the same dimensions as X where the i-th row contains the
Oja rank of the i-th row of X.
The function will also work for matrices X with only one column and also vectors. Then centered and normalized (univariate) ranks are returned.
For n = nrow(X) data points in R^k, where k = ncol(X), the computation of the Oja rank necessitates the evaluation of N = choose(n,k) hyperplanes in R^k.
Thus for large data sets the function offers a subsampling option in order to deliver (approximate) results within reasonable time. The subsampling fraction is controlled by the parameter
p:
If p < 1 is passed to the function, the computation is based on a random sample of only pN of all possible N hyperplanes. If p is not specified
(which defaults to p = NULL), it is automatically determined based on n and k to yield a sensible trade-off between accuracy and computing time. If
choose(n,k)*k^3 < 6e+06, the sample fraction p is set to 1 (no subsampling). Otherwise p is choosen such that the computation
(of one rank) usually takes around 20 seconds (on a 1.66 GHz CPU and 1 GB RAM). If all Oja ranks of X are requested, a hyperplane sample is drawn once,
all Oja ranks are then computed based on this sample.
Finally, subsampling is feasible. Even for very small p useable results can be expected, see e.g. the examples for the function ojaRCM.
Claudia K<c3><83><c2><b6>llmann is acknowledged for bug-fixing this function.
Value
either a numeric vector, the Oja rank of x, or
a matrix of the same dimensions as X containing the Oja ranks of X as rows.
Author(s)
Daniel Vogel, Jyrki M<c3><83><c2><b6>tt<c3><83><c2><b6>nen
References
Oja, H. (1999), Affine invariant multivariate sign and rank tests and corresponding estimates: A review,
Scand. J. Statist., 26, 319–343.
See Also
ojaSign, ojaRCM, hyperplane, ojaSignedRank
Examples
### ----<< Example 1 >>---- : 30 points in R^2
set.seed(123)
X <- rmvnorm(n = 30,mean = c(0,0)) # from package 'mvtnorm'
ojaRank(X)
ojaRank(X, x = c(100,100))
ojaRank(X, x = ojaMedian(X, alg="exact")) # close to zero
# The following two return the same (only in different time)
ojaRank(X)
t(apply(X, 1, function(y){ojaRank(X,y)}))
# but the following two do not (due to different subsampling).
# 1)
set.seed(123); ojaRank(X, p = 0.9, silent = TRUE)
# 2)
set.seed(123)
t(apply(X, 1, function(y){ojaRank(X, y, p = 0.9, silent = TRUE)}))
# In 1) one subsample for all ranks is drawn, whereas in 2)
# a different sample for each rank is drawn.
### ----<< Example 2 >>---- : three points in R^3: only one hyperplane
# The following commands return the same result.
ojaRank(X = diag(rep(1, 3)), x = c(0,0,0))
ojaRank(X = diag(rep(1, 3)), x = c(-100,-110,-5550))
hyperplane(X = diag(rep(1,3)))[1:3]
### ----<< Example 3 >>---- : 300 points in R^7
# Subsampling is done.
# The following example might take a bit longer:
## Not run:
set.seed(123)
X <- rmvnorm(n = 300, mean = rep(0, 7))
system.time(or <- ojaRank(x = 1:7, X = X))
# PLEASE NOTE: The computation of the Oja rank is based on a
# random sub-sample of less than 1% of all possible hyperplanes.
#
# user system elapsed
# 18.47 0.00 18.47
print(or,d=4)
# [1] 7.733 6.613 6.839 7.383 18.237 21.851 23.700
## End(Not run)
### ----<< Example 4 >>---- : univariate ranks
ojaRank(1:10)
ojaRank(X = 1:10, x = 5.5)
Results
R version 3.3.1 (2016-06-21) -- "Bug in Your Hair"
Copyright (C) 2016 The R Foundation for Statistical Computing
Platform: x86_64-pc-linux-gnu (64-bit)
R is free software and comes with ABSOLUTELY NO WARRANTY.
You are welcome to redistribute it under certain conditions.
Type 'license()' or 'licence()' for distribution details.
R is a collaborative project with many contributors.
Type 'contributors()' for more information and
'citation()' on how to cite R or R packages in publications.
Type 'demo()' for some demos, 'help()' for on-line help, or
'help.start()' for an HTML browser interface to help.
Type 'q()' to quit R.
> library(OjaNP)
Loading required package: ICS
Loading required package: mvtnorm
Loading required package: ICSNP
> png(filename="/home/ddbj/snapshot/RGM3/R_CC/result/OjaNP/ojaRank.Rd_%03d_medium.png", width=480, height=480)
> ### Name: ojaRank
> ### Title: Oja Ranks - Affine Equivariant Multivariate Ranks
> ### Aliases: ojaRank
> ### Keywords: multivariate nonparametric
>
> ### ** Examples
>
> ### ----<< Example 1 >>---- : 30 points in R^2
> set.seed(123)
> X <- rmvnorm(n = 30,mean = c(0,0)) # from package 'mvtnorm'
> ojaRank(X)
[,1] [,2]
[1,] -0.56228243 -0.17795924
[2,] 0.98048593 0.03176296
[3,] 0.37565777 0.84091824
[4,] 0.14494860 -0.72689765
[5,] -0.69952345 -0.41167858
[6,] 0.89831793 0.35024385
[7,] 0.27450234 0.16397828
[8,] -0.24716321 0.75351274
[9,] 0.07040030 -0.85071109
[10,] 0.52136138 -0.43239617
[11,] -0.99425381 -0.13395380
[12,] -0.87011671 -0.58489231
[13,] -0.47220638 -0.83785899
[14,] 0.75404098 0.16333734
[15,] -0.78834753 0.38087950
[16,] 0.26894568 -0.39500475
[17,] 0.69727577 0.72548046
[18,] 0.66539344 0.57634894
[19,] 0.47238119 -0.07565708
[20,] -0.25578192 -0.39135486
[21,] -0.74332050 -0.14158425
[22,] -0.58314496 0.58715787
[23,] 0.70740499 -0.52904203
[24,] -0.38067106 -0.51466130
[25,] 0.71958632 -0.09646419
[26,] 0.06103402 -0.04257788
[27,] 0.12542546 0.70658736
[28,] -0.07891634 0.72498679
[29,] -0.99274170 0.14495552
[30,] -0.06869210 0.19254429
> ojaRank(X, x = c(100,100))
[1] 0.8792920 0.7312051
> ojaRank(X, x = ojaMedian(X, alg="exact")) # close to zero
[1] 0.003498390 0.002613336
>
> # The following two return the same (only in different time)
> ojaRank(X)
[,1] [,2]
[1,] -0.56228243 -0.17795924
[2,] 0.98048593 0.03176296
[3,] 0.37565777 0.84091824
[4,] 0.14494860 -0.72689765
[5,] -0.69952345 -0.41167858
[6,] 0.89831793 0.35024385
[7,] 0.27450234 0.16397828
[8,] -0.24716321 0.75351274
[9,] 0.07040030 -0.85071109
[10,] 0.52136138 -0.43239617
[11,] -0.99425381 -0.13395380
[12,] -0.87011671 -0.58489231
[13,] -0.47220638 -0.83785899
[14,] 0.75404098 0.16333734
[15,] -0.78834753 0.38087950
[16,] 0.26894568 -0.39500475
[17,] 0.69727577 0.72548046
[18,] 0.66539344 0.57634894
[19,] 0.47238119 -0.07565708
[20,] -0.25578192 -0.39135486
[21,] -0.74332050 -0.14158425
[22,] -0.58314496 0.58715787
[23,] 0.70740499 -0.52904203
[24,] -0.38067106 -0.51466130
[25,] 0.71958632 -0.09646419
[26,] 0.06103402 -0.04257788
[27,] 0.12542546 0.70658736
[28,] -0.07891634 0.72498679
[29,] -0.99274170 0.14495552
[30,] -0.06869210 0.19254429
> t(apply(X, 1, function(y){ojaRank(X,y)}))
[,1] [,2]
[1,] -0.56228243 -0.17795924
[2,] 0.98048593 0.03176296
[3,] 0.37565777 0.84091824
[4,] 0.14494860 -0.72689765
[5,] -0.69952345 -0.41167858
[6,] 0.89831793 0.35024385
[7,] 0.27450234 0.16397828
[8,] -0.24716321 0.75351274
[9,] 0.07040030 -0.85071109
[10,] 0.52136138 -0.43239617
[11,] -0.99425381 -0.13395380
[12,] -0.87011671 -0.58489231
[13,] -0.47220638 -0.83785899
[14,] 0.75404098 0.16333734
[15,] -0.78834753 0.38087950
[16,] 0.26894568 -0.39500475
[17,] 0.69727577 0.72548046
[18,] 0.66539344 0.57634894
[19,] 0.47238119 -0.07565708
[20,] -0.25578192 -0.39135486
[21,] -0.74332050 -0.14158425
[22,] -0.58314496 0.58715787
[23,] 0.70740499 -0.52904203
[24,] -0.38067106 -0.51466130
[25,] 0.71958632 -0.09646419
[26,] 0.06103402 -0.04257788
[27,] 0.12542546 0.70658736
[28,] -0.07891634 0.72498679
[29,] -0.99274170 0.14495552
[30,] -0.06869210 0.19254429
> # but the following two do not (due to different subsampling).
> # 1)
> set.seed(123); ojaRank(X, p = 0.9, silent = TRUE)
[,1] [,2]
[1,] -0.55802960 -0.15266959
[2,] 0.95149804 0.01557666
[3,] 0.36492913 0.85347230
[4,] 0.15544607 -0.73120690
[5,] -0.70292529 -0.38090489
[6,] 0.86221221 0.34550849
[7,] 0.25435541 0.17666418
[8,] -0.22770509 0.76311443
[9,] 0.09043469 -0.85178895
[10,] 0.52125782 -0.42575411
[11,] -0.97219538 -0.12312685
[12,] -0.86403094 -0.58119067
[13,] -0.47721156 -0.83665374
[14,] 0.71443743 0.15646975
[15,] -0.75917951 0.38658504
[16,] 0.27521202 -0.37978103
[17,] 0.64835631 0.74712446
[18,] 0.61921478 0.59199352
[19,] 0.45785554 -0.07644911
[20,] -0.25531108 -0.37217913
[21,] -0.72017945 -0.12182411
[22,] -0.56036043 0.59334621
[23,] 0.69097806 -0.54700704
[24,] -0.37151514 -0.50083834
[25,] 0.68576666 -0.10239716
[26,] 0.05612722 -0.02715184
[27,] 0.11098299 0.71145994
[28,] -0.08027701 0.72946955
[29,] -0.96704771 0.15048485
[30,] -0.08554257 0.22117992
> # 2)
> set.seed(123)
> t(apply(X, 1, function(y){ojaRank(X, y, p = 0.9, silent = TRUE)}))
[,1] [,2]
[1,] -0.55802960 -0.15266959
[2,] 0.99218753 0.05502594
[3,] 0.39624887 0.84608702
[4,] 0.15678627 -0.72498381
[5,] -0.67453509 -0.41070044
[6,] 0.86651591 0.35037156
[7,] 0.27575847 0.16208564
[8,] -0.25591875 0.75496747
[9,] 0.07808861 -0.86158873
[10,] 0.55759710 -0.42828275
[11,] -0.97554779 -0.10002450
[12,] -0.86590098 -0.57683292
[13,] -0.46155176 -0.84726202
[14,] 0.72618622 0.15258497
[15,] -0.77951527 0.38306997
[16,] 0.25237210 -0.40842626
[17,] 0.70421832 0.73208227
[18,] 0.67453583 0.57203996
[19,] 0.45476981 -0.06638932
[20,] -0.25739940 -0.39966981
[21,] -0.75137051 -0.12424319
[22,] -0.57129219 0.59650076
[23,] 0.68232088 -0.56446510
[24,] -0.35339958 -0.50187433
[25,] 0.73201306 -0.08733235
[26,] 0.07384314 -0.03424140
[27,] 0.13295764 0.68870267
[28,] -0.06923768 0.70016491
[29,] -1.02213127 0.13234891
[30,] -0.08592449 0.18621720
> # In 1) one subsample for all ranks is drawn, whereas in 2)
> # a different sample for each rank is drawn.
>
>
> ### ----<< Example 2 >>---- : three points in R^3: only one hyperplane
> # The following commands return the same result.
> ojaRank(X = diag(rep(1, 3)), x = c(0,0,0))
[1] -1 -1 -1
> ojaRank(X = diag(rep(1, 3)), x = c(-100,-110,-5550))
[1] -1 -1 -1
> hyperplane(X = diag(rep(1,3)))[1:3]
[1] -1 -1 -1
>
>
>
> ### ----<< Example 3 >>---- : 300 points in R^7
> # Subsampling is done.
> # The following example might take a bit longer:
> ## Not run:
> ##D set.seed(123)
> ##D X <- rmvnorm(n = 300, mean = rep(0, 7))
> ##D system.time(or <- ojaRank(x = 1:7, X = X))
> ##D # PLEASE NOTE: The computation of the Oja rank is based on a
> ##D # random sub-sample of less than 1% of all possible hyperplanes.
> ##D #
> ##D # user system elapsed
> ##D # 18.47 0.00 18.47
> ##D print(or,d=4)
> ##D # [1] 7.733 6.613 6.839 7.383 18.237 21.851 23.700
> ## End(Not run)
>
>
> ### ----<< Example 4 >>---- : univariate ranks
> ojaRank(1:10)
[,1]
[1,] -0.9
[2,] -0.7
[3,] -0.5
[4,] -0.3
[5,] -0.1
[6,] 0.1
[7,] 0.3
[8,] 0.5
[9,] 0.7
[10,] 0.9
> ojaRank(X = 1:10, x = 5.5)
[1] -1.110223e-17
>
>
>
>
>
> dev.off()
null device
1
>