A vector, matrix, or data frame consisting of numeric and/or categorical
variables.
maxrows
If the number of observations is greater than maxrows,
HDoutliers reduces the number used in nearest-neighbor
computations to a set of exemplars. The default value is 10000.
radius
Threshold for determining membership in an exemplar's list
(used only when the number of observations is greater than maxrows).
An observation is added to an exemplar's list if its distance
to that exemplar is less than radius.
The default value is 0.1/(log n)^(1/p), where n is the
number of observations and p is the dimension of the data.
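As a quick check of the default radius formula above, the following base R sketch evaluates it for a given n and p. The helper name default_radius is hypothetical and not part of HDoutliers, and the sketch assumes that "log" denotes the natural logarithm.

```r
# Hypothetical helper (not part of HDoutliers): evaluates the documented
# default radius 0.1 / (log n)^(1/p), assuming a natural logarithm.
default_radius <- function(n, p) 0.1 / log(n)^(1 / p)

n <- 100000  # number of observations
p <- 2       # dimension of the data
r <- default_radius(n, p)
r
```

Under this reading, the default radius shrinks slowly as n grows and, for n larger than e, moves toward 0.1 as p increases.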
Details
If the number of observations exceeds maxrows, the data is
partitioned into lists corresponding to exemplars
and their members within radius of each exemplar,
to reduce the number of nearest-neighbor computations required for
outlier detection.
When the number of observations does not exceed maxrows, the result is
a list whose elements are the individual observations (each observation
is an exemplar, with no other members).
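The partitioning idea described above can be illustrated with a small single-pass sketch in base R. This is not the package's implementation: the function leader_partition is hypothetical, it uses plain Euclidean distance on the data as given, and it assumes that an observation joins its nearest existing exemplar when that distance is less than radius and otherwise becomes a new exemplar.

```r
# Hypothetical illustration (not the HDoutliers code): single-pass
# leader-style partition of the rows of x into exemplar lists.
leader_partition <- function(x, radius) {
  x <- as.matrix(x)
  exemplars <- integer(0)  # row indexes of the exemplars found so far
  members <- list()        # members[[k]]: exemplar k first, then its members
  for (i in seq_len(nrow(x))) {
    if (length(exemplars) > 0) {
      # Euclidean distance from row i to every current exemplar
      diffs <- sweep(x[exemplars, , drop = FALSE], 2, x[i, ])
      d <- sqrt(rowSums(diffs^2))
      k <- which.min(d)
      if (d[k] < radius) {
        # Close enough: row i joins the list of its nearest exemplar
        members[[k]] <- c(members[[k]], i)
        next
      }
    }
    # No exemplar within radius: row i starts a new list
    exemplars <- c(exemplars, i)
    members[[length(exemplars)]] <- i
  }
  members
}

set.seed(1)
x <- matrix(runif(200), ncol = 2)
parts <- leader_partition(x, radius = 0.1)
length(parts)  # number of exemplars found
```

With radius = 0 no observation can join an existing list, so every observation becomes its own exemplar, which mirrors the behavior described for data sets that do not exceed maxrows.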
Value
A list in which each component is a vector of observation indexes.
The first index in each vector is the index of the exemplar
defining that component, and any remaining indexes are the
associated members, which lie within radius of the exemplar.
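To make the structure of the returned value concrete, the sketch below works on a small hand-built list with the same layout (exemplar index first, members after). The object mem here is made up for illustration and is not output from getHDmembers.

```r
# Hand-built stand-in for a membership list: each component starts with
# the exemplar's index, followed by the indexes of its members.
mem <- list(c(1L, 4L, 7L), 2L, c(3L, 5L), 6L)

# Index of the exemplar that defines each component
exemplars <- vapply(mem, function(m) m[1], numeric(1))

# Number of observations in each component (exemplar included)
sizes <- lengths(mem)

# Members of each component, excluding the exemplar itself
members_only <- lapply(mem, function(m) m[-1])

exemplars  # 1 2 3 6
sizes      # 3 1 2 1
```

The same three expressions should apply to any list with this layout, such as mem.W or mem.ex2D from the examples.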
References
Wilkinson, L. (2016). Visualizing Outliers.
See Also
HDoutliers,
getHDoutliers
Examples
data(dots)
mem.W <- getHDmembers(dots$W)
out.W <- getHDoutliers(dots$W,mem.W)
data(ex2D)
mem.ex2D <- getHDmembers(ex2D)
out.ex2D <- getHDoutliers(ex2D,mem.ex2D)
## Not run:
n <- 100000 # number of observations
set.seed(3)
x <- matrix(rnorm(2*n),n,2)
nout <- 10 # number of outliers
x[sample(1:n,size=nout),] <- 10*runif(2*nout,min=-1,max=1)
mem.x <- getHDmembers(x)
out.x <- getHDoutliers(x,mem.x)
## End(Not run)
Results
R version 3.3.1 (2016-06-21) -- "Bug in Your Hair"
Copyright (C) 2016 The R Foundation for Statistical Computing
Platform: x86_64-pc-linux-gnu (64-bit)
> library(HDoutliers)
Loading required package: FNN
Loading required package: FactoMineR
> ### Name: getHDmembers
> ### Title: Partitioning Stage of the HDoutliers Algorithm
> ### Aliases: getHDmembers
> ### Keywords: cluster
>
> ### ** Examples
>
>
> data(dots)
> mem.W <- getHDmembers(dots$W)
> out.W <- getHDoutliers(dots$W,mem.W)
>
> data(ex2D)
> mem.ex2D <- getHDmembers(ex2D)
> out.ex2D <- getHDoutliers(ex2D,mem.ex2D)
>
> ## Not run:
> ##D n <- 100000 # number of observations
> ##D set.seed(3)
> ##D x <- matrix(rnorm(2*n),n,2)
> ##D nout <- 10 # number of outliers
> ##D x[sample(1:n,size=nout),] <- 10*runif(2*nout,min=-1,max=1)
> ##D
> ##D mem.x <- getHDmembers(x)
> ##D out.x <- getHDoutliers(x,mem.x)
> ## End(Not run)