A vector, matrix, or data frame consisting of numeric and/or categorical
variables.
maxrows
If the number of observations is greater than maxrows,
HDoutliers reduces the number used in nearest-neighbor
computations to a set of exemplars. The default value is 10000.
radius
Threshold for determining membership in the exemplars' lists
(used only when the number of observations is greater than maxrows).
An observation is added to an exemplar's list if its distance
to that exemplar is less than radius.
The default value is 0.1/(log n)^(1/p), where n is the
number of observations and p is the dimension of the data.
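The default radius can be computed directly from the formula above. The following is a minimal R sketch. It assumes that "log" means the natural logarithm (R's log()) and that the columns are already on comparable scales, because the radius is an absolute distance. default_radius() is a hypothetical helper, not part of the package.

```r
# Hypothetical helper: default radius 0.1 / (log n)^(1/p),
# assuming the natural logarithm (R's log()).
default_radius <- function(n, p) {
  0.1 / log(n)^(1 / p)
}

# The 100000 x 2 simulation used in the Examples
r <- default_radius(n = 100000, p = 2)
print(r)  # about 0.0295
```

Because the exponent is 1/p, the radius shrinks more slowly with n as the dimension grows.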
alpha
Threshold for determining the cutoff for outliers.
Observations are considered outliers if they fall in the upper
alpha tail (beyond the (1 - alpha) quantile) of the distribution
of the nearest-neighbor distances between exemplars.
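To make the cutoff concrete, suppose (for illustration only) that the fitted model is an exponential distribution with rate lambda. The exact parameterization of the package's tail fit is not documented here. The cutoff d* is the point where the CDF reaches 1 - alpha:

```latex
F(d) = 1 - e^{-\lambda d}, \qquad
F(d^{*}) = 1 - \alpha
\;\Longrightarrow\; e^{-\lambda d^{*}} = \alpha
\;\Longrightarrow\; d^{*} = \frac{\ln(1/\alpha)}{\lambda}.
```

For example, with alpha = 0.05, ln(20) is about 3.00. The cutoff then sits at roughly three times the fitted mean distance 1/lambda.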
Details
If the number of observations exceeds maxrows,
the data are first partitioned into lists, each consisting of an exemplar
and the observations within radius of it,
to reduce the number of nearest-neighbor computations required for
outlier detection.
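The partitioning step can be sketched as a single pass in the style of Hartigan's leader algorithm. This is an illustrative R sketch under stated assumptions, not the package's getHDmembers code:

- leader_partition() is a hypothetical name.
- Distances are Euclidean.
- Each observation joins its nearest existing exemplar when that distance is below radius. The documentation above does not say how multiple candidate exemplars are resolved.

```r
# Leader-style partition into exemplar lists (illustrative sketch only;
# not the package's getHDmembers implementation).
# Rows are scanned once, in order. A row joins the list of its nearest
# existing exemplar if that distance is less than `radius`; otherwise it
# becomes a new exemplar. Element k of the result holds the row indices
# of exemplar k's list, with the exemplar itself first.
leader_partition <- function(x, radius) {
  x <- as.matrix(x)
  exemplars <- integer(0)  # row indices of exemplars found so far
  members <- list()        # members[[k]] belongs to exemplars[k]
  for (i in seq_len(nrow(x))) {
    joined <- FALSE
    if (length(exemplars) > 0) {
      # Euclidean distance from row i to each current exemplar
      diffs <- sweep(x[exemplars, , drop = FALSE], 2, x[i, ])
      d <- sqrt(rowSums(diffs^2))
      k <- which.min(d)
      if (d[k] < radius) {
        members[[k]] <- c(members[[k]], i)
        joined <- TRUE
      }
    }
    if (!joined) {
      # no exemplar within radius: row i starts a new list
      exemplars <- c(exemplars, i)
      members[[length(members) + 1]] <- i
    }
  }
  members
}

# Usage on toy data: 100 points in the unit square
set.seed(1)
x <- matrix(runif(200), ncol = 2)
lists <- leader_partition(x, radius = 0.1)
length(lists)  # number of exemplars
```

Every new exemplar is at least radius away from all earlier ones. The number of lists, and hence the nearest-neighbor work that follows, is therefore bounded roughly by how many radius-separated points fit in the range of the data rather than by n.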
An exponential distribution is then fitted to the upper tail of the
nearest-neighbor distances between exemplars.
Observations are considered outliers if their distances exceed
the value at which the fitted CDF reaches 1 - alpha.
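The tail-fitting step is sketched below in R. It is a simplified stand-in, not the package's method:

- tail_outliers() is a hypothetical helper.
- The "upper tail" is taken to be the nearest-neighbor distances above their median.
- The exponential rate is the maximum-likelihood estimate from the excesses over that median.

The sketch is meant to be applied to the exemplar rows. Mapping flagged exemplars back to their member observations is omitted. The package loads FNN (see the session log), presumably for fast nearest-neighbor search. The sketch uses base R's dist() instead, which is only practical for modest numbers of exemplars.

```r
# Simplified exponential tail cutoff (illustrative sketch; NOT the
# package's getHDoutliers code). Steps:
#   1. nearest-neighbor (NN) distance of every row to the other rows
#   2. fit an exponential to the excesses of NN distances over their median
#      (maximum-likelihood rate = 1 / mean excess)
#   3. flag rows whose NN distance exceeds median + fitted (1 - alpha) quantile
tail_outliers <- function(x, alpha = 0.05) {
  x <- as.matrix(x)
  D <- as.matrix(dist(x))   # full Euclidean distance matrix, O(n^2) memory
  diag(D) <- Inf            # exclude each row's distance to itself
  nn <- apply(D, 1, min)    # NN distance per row
  u <- median(nn)           # tail threshold: an assumption of this sketch
  excess <- nn[nn > u] - u  # excesses over the threshold
  rate <- 1 / mean(excess)  # exponential MLE for the excesses
  cutoff <- u + qexp(1 - alpha, rate = rate)
  unname(which(nn > cutoff))
}

# Usage: 200 Gaussian points plus one planted far-away point (row 201)
set.seed(42)
pts <- rbind(matrix(rnorm(400), ncol = 2), c(10, 10))
flagged <- tail_outliers(pts, alpha = 0.05)
print(flagged)
```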
Value
The indexes of the observations determined to be outliers.
References
Wilkinson, L. (2016). Visualizing Outliers.
See Also
getHDmembers,
getHDoutliers
Examples
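library(HDoutliers)
data(dots)
out.W <- HDoutliers(dots$W)  # indices of observations flagged as outliers
## Not run:
plotHDoutliers(dots$W, out.W)
## End(Not run)
data(ex2D)
out.ex2D <- HDoutliers(ex2D)
## Not run:
plotHDoutliers(ex2D, out.ex2D)
## End(Not run)
## Not run:
n <- 100000 # number of observations; exceeds the default maxrows (10000)
set.seed(3)
x <- matrix(rnorm(2*n), n, 2)  # bivariate standard normal data
nout <- 10 # number of outliers
# overwrite nout randomly chosen rows with points uniform on [-10, 10]^2
x[sample(1:n, size=nout),] <- 10*runif(2*nout, min=-1, max=1)
out.x <- HDoutliers(x)  # uses exemplars, since n > maxrows
## End(Not run)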
Results
R version 3.3.1 (2016-06-21) -- "Bug in Your Hair"
Platform: x86_64-pc-linux-gnu (64-bit)
> library(HDoutliers)
Loading required package: FNN
Loading required package: FactoMineR
> ### Name: HDoutliers
> ### Title: Leland Wilkinson's HDoutliers Algorithm for Outlier Detection
> ### Aliases: HDoutliers
> ### Keywords: cluster