Last data update: 2014.03.03

R: Outlier Detection Stage of the HD Outliers Algorithm
getHDoutliers    R Documentation

Description

Detects outliers using a probability model fitted to the nearest-neighbor distances between exemplars.

Usage

getHDoutliers(data, memberLists, alpha = 0.05) 

Arguments

data

A vector, matrix, or data frame consisting of numeric and/or categorical variables.

memberLists

A list with the same structure as the output of getHDmembers, in which each component is a vector of observation indexes. The first index in each component is the index of the exemplar representing that component; any remaining indexes are its members, the observations considered ‘close to’ the exemplar.

alpha

Significance level used to determine the outlier cutoff. Observations are considered outliers if they lie beyond the (1 - alpha) point of the distribution of the nearest-neighbor distances between exemplars, that is, in its upper alpha tail.
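As a minimal sketch of the memberLists structure described above, the following base R code builds a small list by hand and separates each component's exemplar from its members. The indexes are made up for illustration; they are not output from getHDmembers.

```r
# Hypothetical memberLists-shaped list: first index = exemplar, rest = members.
memberLists <- list(c(1L, 4L, 7L),  # exemplar 1 with members 4 and 7
                    c(2L),          # exemplar 2 with no members
                    c(3L, 5L, 6L))  # exemplar 3 with members 5 and 6

# The exemplar of each component is its first index.
exemplars <- vapply(memberLists, function(m) m[1], integer(1))

# The members are the remaining indexes (possibly none).
members <- lapply(memberLists, function(m) m[-1])

print(exemplars)
str(members)
```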

Details

An exponential distribution is fitted to the upper tail of the nearest-neighbor distances between exemplars (the observations chosen to represent each component of memberLists). Observations are considered outliers if they lie beyond the upper (1 - alpha) point of the fitted CDF.
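The detection step described above can be sketched in base R as follows. This is a simplified illustration under stated assumptions, not the package's own estimator: it uses Euclidean distances on numeric data only, the function name sketch_hd_outliers and its tail_frac parameter (how much of the upper tail is fitted) are hypothetical, and the exponential is fitted by maximum likelihood to the excesses over a tail threshold.

```r
# Simplified sketch of the outlier detection step (NOT the package's exact code).
# Assumptions: numeric data, Euclidean distance, and a hypothetical
# `tail_frac` parameter choosing how much of the upper tail to fit.
sketch_hd_outliers <- function(data, exemplars, alpha = 0.05, tail_frac = 0.5) {
  x <- as.matrix(data)[exemplars, , drop = FALSE]
  d <- as.matrix(dist(x))   # pairwise Euclidean distances between exemplars
  diag(d) <- Inf            # an exemplar is not its own nearest neighbor
  nn <- apply(d, 1, min)    # nearest-neighbor distance of each exemplar

  # Tail threshold u: the m-th largest nearest-neighbor distance.
  m <- max(2, ceiling(tail_frac * length(nn)))
  u <- sort(nn, decreasing = TRUE)[m]

  # Maximum-likelihood fit of an exponential to the excesses over u:
  # the MLE of the exponential mean is the sample mean of the excesses.
  excess <- nn[nn > u] - u
  theta <- if (length(excess) > 0) mean(excess) else 0

  # Upper (1 - alpha) point of the fitted tail; for theta > 0 this equals
  # u + qexp(1 - alpha, rate = 1 / theta).
  cutoff <- u + theta * log(1 / alpha)

  list(nn = unname(nn), cutoff = cutoff,
       outliers = exemplars[nn > cutoff])  # exemplars beyond the cutoff
}

# Usage: 200 standard-normal points plus one planted outlier at (30, 30),
# treating every observation as its own exemplar.
set.seed(1)
x <- rbind(matrix(rnorm(400), ncol = 2), c(30, 30))
res <- sketch_hd_outliers(x, exemplars = seq_len(nrow(x)), alpha = 0.05)
print(res$cutoff)
print(res$outliers)
```

Fitting only the upper tail keeps the many small distances inside dense regions from dominating the estimate; a smaller alpha moves the cutoff further out and flags fewer exemplars.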

Value

The indexes of the observations determined to be outliers.
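Wilkinson's description of the HD Outliers algorithm reports, for each outlying exemplar, every observation in that exemplar's member list. Assuming that behavior (this page does not state it explicitly), the following minimal base R sketch turns per-component flags into observation indexes and then selects the corresponding rows. The list, the flags, and the matrix dat are all made up for illustration.

```r
# Hypothetical inputs: a memberLists-shaped list and one flag per component
# (e.g. from comparing each exemplar's nearest-neighbor distance to a cutoff).
memberLists <- list(c(1L, 4L, 7L), c(2L), c(3L, 5L, 6L))
is_outlying <- c(FALSE, TRUE, TRUE)

# Assumption: every member of an outlying exemplar's list is reported.
outlier_idx <- sort(unlist(memberLists[is_outlying]))

# The indexes select the outlying rows of a (made-up) data matrix.
dat <- matrix(seq_len(14), nrow = 7, ncol = 2)
outlier_rows <- dat[outlier_idx, , drop = FALSE]
print(outlier_idx)
print(outlier_rows)
```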

References

Wilkinson, L. (2016). Visualizing Outliers.

Note

A call to getHDoutliers in which memberLists is the output of getHDmembers is equivalent to calling HDoutliers.

See Also

HDoutliers, getHDmembers

Examples


library(HDoutliers)

data(dots)
mem.W <- getHDmembers(dots$W)          # exemplars and their members
out.W <- getHDoutliers(dots$W, mem.W)  # indexes of outlying observations
## Not run: 
plotHDoutliers(dots$W, out.W)
## End(Not run)

data(ex2D)
mem.ex2D <- getHDmembers(ex2D)
out.ex2D <- getHDoutliers(ex2D, mem.ex2D)
## Not run: 
plotHDoutliers(ex2D, out.ex2D)
## End(Not run)

## Not run: 
n <- 100000 # number of observations
set.seed(3)
x <- matrix(rnorm(2*n), n, 2)
nout <- 10 # number of outliers
# replace nout random rows with points drawn uniformly from [-10, 10] x [-10, 10]
x[sample(1:n, size = nout), ] <- 10*runif(2*nout, min = -1, max = 1)

mem.x <- getHDmembers(x)
out.x <- getHDoutliers(x, mem.x)
## End(Not run)

Results


R version 3.3.1 (2016-06-21) -- "Bug in Your Hair"
Platform: x86_64-pc-linux-gnu (64-bit)

> library(HDoutliers)
Loading required package: FNN
Loading required package: FactoMineR