Last data update: 2014.03.03

R: Oja Median
ojaMedian    R Documentation

Oja Median

Description

Function to compute the Oja median. Several algorithms are possible.

Usage

ojaMedian(X, alg = "evolutionary", sp = 1, na.action = na.fail, 
          control = ojaMedianControl(...), ...)
          
ojaMedianEvo(X, control = ojaMedianControl(...), ...)
ojaMedianGrid(X, control = ojaMedianControl(...), ...)
ojaMedianEx(X, control = ojaMedianControl(...), ...)
ojaMedianExB(X, control = ojaMedianControl(...), ...)

Arguments

X

numeric data.frame or matrix.

alg

character string denoting the algorithm to be used for computing the Oja median. Options are "exact", "bounded_exact", "evolutionary" and "grid". Default is "evolutionary". See Details.

sp

number of runs to average over.

na.action

a function which indicates what should happen when the data contain NAs. The default, na.fail, is to fail.

control

a list specifying the control parameters of the different algorithms; use the function ojaMedianControl and see its help page.

...

can be used to specify control parameters directly instead of via control.

Details

There are four possible algorithms to calculate the Oja median. The exact algorithm uses a gradient method: it follows intersection lines of hyperplanes until it reaches the minimum of the objective function. The algorithm is computationally very intensive. In the bivariate case it calculates the Oja median in acceptable time for at least 1200 data points; for a 7-dimensional dataset it is feasible for 24 data points.
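As a rough illustration of the objective function mentioned above: in the bivariate case the Oja objective at a point x can be written as the average area of the triangles spanned by x and every pair of observations (Oja, 1983). The sketch below uses base R only. The helper name oja_objective_2d is hypothetical and not part of the package, and the scaling (mean rather than sum) is an assumption that does not change the minimiser.

```r
# Hypothetical helper, not part of OjaNP: bivariate Oja objective at point x.
oja_objective_2d <- function(X, x) {
  X <- as.matrix(X)
  pairs <- combn(nrow(X), 2)                        # all index pairs (i, j), i < j
  a <- sweep(X[pairs[1, ], , drop = FALSE], 2, x)   # X_i - x for each pair
  b <- sweep(X[pairs[2, ], , drop = FALSE], 2, x)   # X_j - x for each pair
  # Half the absolute 2x2 determinant is the area of triangle (X_i, X_j, x).
  mean(abs(a[, 1] * b[, 2] - a[, 2] * b[, 1]) / 2)
}

X <- rbind(c(0, 0), c(1, 0), c(0, 1))
value_at_origin <- oja_objective_2d(X, c(0, 0))
```

The Oja median is a point x that minimises this quantity; the exact algorithm searches for that minimum along intersection lines of hyperplanes rather than by evaluating arbitrary points.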

The bounded exact algorithm modifies the exact algorithm by employing bounded regions which contain the median. The regions are built using the centered rank function. The modified algorithm is faster and has lower complexity. Parameter volume is the desired size of the bounded region, given as a fraction of the original volume. Here the original volume is the volume of the minimal circumscribed multivariate rectangle with edges parallel to the coordinate axes. Setting parameter boundedExact to FALSE stops the algorithm after the bounded region is found, and its center is reported as an approximation of the median.
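The reference volume described above can be sketched in base R as follows. The helper name bounding_box_volume and the example fraction are assumptions made for illustration; they are not taken from the package, and the sketch reads the volume parameter as a fraction of the bounding-box volume.

```r
# Hypothetical helper: volume of the smallest axis-parallel box containing all rows of X.
bounding_box_volume <- function(X) {
  X <- as.matrix(X)
  side_lengths <- apply(X, 2, function(column) diff(range(column)))
  prod(side_lengths)
}

X <- rbind(c(0, 0), c(2, 1), c(1, 3))
full_volume <- bounding_box_volume(X)     # side lengths are 2 and 3
fraction <- 0.01                          # assumed example value, not a package default
target_volume <- fraction * full_volume   # desired size of the bounded region
```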

The evolutionary algorithm calculates an approximate solution. It starts with a random point and mutates this temporary best solution in order to find a better one. There are several options to control the mutation process. If you are interested in a fast calculation of the Oja median and can tolerate a higher error rate, set sigmaAdaptation to 1. As a second possibility you can limit the number of subsets used to a small number. If all subsets are used, there are n choose k of them in total, where n is the number of data points and k the dimension. If you are interested in a precise solution, the following options have turned out to be useful: initialSigma: 0.5, sigmaAdaptation: 20, adaptationFactor: 0.5, sigmaLog20Decrease: 10. Tests have been made in the bivariate case, but these values should work for every dimension. In the bivariate case it is possible to calculate the Oja median for more than 22*10^6 data points. In the 10-dimensional case the algorithm is still able to calculate an approximate solution for 10^6 data points. Before the algorithm itself starts, the data are transformed with ICS in order to get a more stable version (with respect to the location of the data) and to improve the quality of the approximation. The transformation also makes the approximation affine invariant.
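The mutate-and-keep-the-best idea can be sketched as a simple (1+1) evolutionary search in base R. This is a simplified stand-in, not the package's implementation: it skips the ICS transformation, uses all pairs instead of a sample of subsets, and the names evolve_oja_2d, patience and shrink are hypothetical. Any resemblance of the default values to the control options named above is only a loose analogy.

```r
# Bivariate Oja objective (hypothetical helper): mean triangle area over all pairs.
oja_objective_2d <- function(X, x) {
  pairs <- combn(nrow(X), 2)
  a <- sweep(X[pairs[1, ], , drop = FALSE], 2, x)
  b <- sweep(X[pairs[2, ], , drop = FALSE], 2, x)
  mean(abs(a[, 1] * b[, 2] - a[, 2] * b[, 1]) / 2)
}

# Simplified (1+1) evolutionary search; not the OjaNP algorithm.
evolve_oja_2d <- function(X, iterations = 2000, sigma = 0.5,
                          patience = 20, shrink = 0.5) {
  X <- as.matrix(X)
  best <- X[sample(nrow(X), 1), ]              # start at a random observation
  best_value <- oja_objective_2d(X, best)
  failures <- 0
  for (i in seq_len(iterations)) {
    candidate <- best + rnorm(2, sd = sigma)   # mutate the current best point
    value <- oja_objective_2d(X, candidate)
    if (value < best_value) {                  # keep the mutation only if it improves
      best <- candidate
      best_value <- value
      failures <- 0
    } else {
      failures <- failures + 1
      if (failures >= patience) {              # after repeated failures,
        sigma <- sigma * shrink                # narrow the mutation radius
        failures <- 0
      }
    }
  }
  list(point = best, value = best_value)
}

set.seed(1)
X <- cbind(rnorm(30), rnorm(30))
result <- evolve_oja_2d(X, iterations = 500)
```

Because a candidate is accepted only when it lowers the objective, the returned value is never worse than the value at the starting observation.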

The fourth algorithm calculates the Oja median by means of a grid. The grid points are candidate approximations of the Oja median, and every grid point is tested for being the Oja median. If the test results are not unique, the algorithm takes a bigger sample of subsets into account and tests again. Compared with the evolutionary algorithm it is slower and less precise, and it might be useful only in special data situations. The algorithm constitutes an earlier heuristic solution to the Oja median problem and is included mainly for historical reasons.
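To give a feel for a grid-based approximation, the sketch below evaluates the bivariate Oja objective at every point of a regular grid over the data range and returns the best grid point. This is a deliberately simplified stand-in: the package's grid algorithm tests grid points using samples of subsets, whereas this sketch simply compares objective values over all pairs. The function names and the points_per_axis argument are hypothetical.

```r
# Bivariate Oja objective (hypothetical helper): mean triangle area over all pairs.
oja_objective_2d <- function(X, x) {
  pairs <- combn(nrow(X), 2)
  a <- sweep(X[pairs[1, ], , drop = FALSE], 2, x)
  b <- sweep(X[pairs[2, ], , drop = FALSE], 2, x)
  mean(abs(a[, 1] * b[, 2] - a[, 2] * b[, 1]) / 2)
}

# Simplified grid search over the bounding box of the data; not the OjaNP algorithm.
grid_oja_2d <- function(X, points_per_axis = 25) {
  X <- as.matrix(X)
  xs <- seq(min(X[, 1]), max(X[, 1]), length.out = points_per_axis)
  ys <- seq(min(X[, 2]), max(X[, 2]), length.out = points_per_axis)
  grid <- as.matrix(expand.grid(x = xs, y = ys))   # one candidate point per row
  values <- apply(grid, 1, function(p) oja_objective_2d(X, unname(p)))
  grid[which.min(values), ]                        # grid point with the smallest objective
}

# Four corners of a square plus its centre.
X <- rbind(c(0, 0), c(2, 0), c(0, 2), c(2, 2), c(1, 1))
approx_median <- grid_oja_2d(X, points_per_axis = 5)
```

The accuracy of such an approach is limited by the grid spacing, which is one way to see why a grid method tends to be less precise than a search that can move freely.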

The exact algorithm and the grid algorithm are also described in Ronkainen et al. (2002). The bounded search algorithm is described in Mosler and Pokotylo (2015).

A lot of calculation time in the ojaMedian function might be spent on checking the input and transforming it. So if you do time-critical calculations, e.g. in loops, you might want to use the variants ojaMedianEx, ojaMedianExB, ojaMedianEvo or ojaMedianGrid. Please use these only if you know what you are doing, because there are no checks, just the .Call to the algorithm itself.

If the dimension of your data is too big or if there are too many observations, it is possible that the exact algorithm will crash R. On a common PC with a 32-bit operating system the following combinations of dimension:number of observations will work fine: 2:1200, 3:300, 4:100, 5:63, 6:38, 7:24. Bigger datasets might be possible, depending on your system.
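The description of the evolutionary algorithm notes that n observations in k dimensions give n choose k subsets. As a rough indicator of problem size, the snippet below computes that count for the combinations listed above. Treating this count as a proxy for the workload of the exact algorithm is an assumption; the source does not state its cost.

```r
# Dimension (k) and number of observations (n) for the combinations listed above.
limits <- data.frame(k = 2:7, n = c(1200, 300, 100, 63, 38, 24))

# Number of k-subsets of n observations: n choose k.
limits$subsets <- choose(limits$n, limits$k)
print(limits)
```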

Another general restriction with this function is that there should be more data points than dimensions.
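When calling the unchecked variants directly, a small check of one's own can guard against the most obvious problems. The sketch below is a hypothetical base R helper, not part of the package, and it does not reproduce whatever checks ojaMedian performs internally. It only verifies what this page states: numeric input, no NAs, and more data points than dimensions.

```r
# Hypothetical pre-flight check, not part of OjaNP.
check_oja_input <- function(X) {
  X <- as.matrix(X)
  stopifnot(
    is.numeric(X),        # numeric data.frame or matrix
    !anyNA(X),            # mirrors the default na.action of failing on NAs
    nrow(X) > ncol(X)     # more data points than dimensions
  )
  invisible(X)
}

ok <- check_oja_input(data.frame(a = c(1, 2, 3), b = c(4, 5, 7)))
too_few <- try(check_oja_input(matrix(1:4, ncol = 2)), silent = TRUE)
```

Here ok is the validated numeric matrix, while too_few holds a try-error because a 2 x 2 input does not have more rows than columns.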

There is a demo available which demonstrates graphically the Oja median in simple data situations in the bivariate case. To view the demo run demo(ojaMedianDemo).

Value

a numeric vector containing the Oja median.

Author(s)

Ported to R by Daniel Fischer. Original C++-code by Thorsten Bernholt, Robin Nunkesser and Tommi Ronkainen. Bounded search algorithm by Oleksii Pokotylo.

References

Oja, H. (1983), Descriptive statistics for multivariate distributions, Statistics and Probability Letters, 1, 327–332.

Ronkainen, T., Oja, H. and Orponen, P. (2002), Computation of the multivariate Oja median, in Dutter, R., Filzmoser, P., Gather, U. and Rousseeuw, P. J.: Developments in Robust Statistics, Heidelberg: Springer, 344–359.

Fischer, D. (2008), Diplomarbeit, Statistische Eigenschaften des Oja-Medians mit einer algorithmischen Betrachtung, Dortmund: Technische Universität Dortmund. In German.

Mosler, K. and Pokotylo, O. (2015), "Computation of the Oja Median by Bounded Search." Modern Nonparametric, Robust and Multivariate Methods. Springer International Publishing, 185–203.

Examples


data(biochem)
X <- as.matrix(biochem[,1:2])
ojaMedian(X)
ojaMedian(X, alg = "evo")
ex <- ojaMedian(X, alg = "exact")
exb <- ojaMedian(X, alg = "bounded_exact")

ojaMedianFn(X, ex)
ojaMedianFn(X, exb)

Results


> library(OjaNP)
Loading required package: ICS
Loading required package: mvtnorm
Loading required package: ICSNP
> ### Name: ojaMedian
> ### Title: Oja Median
> ### Aliases: ojaMedian ojaMedianEvo ojaMedianGrid ojaMedianEx ojaMedianExB
> ### Keywords: multivariate nonparametric
> 
> ### ** Examples
> 
> 
> data(biochem)
> X <- as.matrix(biochem[,1:2])
> ojaMedian(X)
   comp.1    comp.2 
1.1488645 0.4337275 
> ojaMedian(X, alg = "evo")
   comp.1    comp.2 
1.1528800 0.4242399 
> ex <-ojaMedian(X, alg = "exact")
> exb<-ojaMedian(X, alg = "bounded_exact")
> 
> ojaMedianFn(X, ex)
[1] 1.511852
> ojaMedianFn(X, exb)
[1] 1.510338