knnreg.tune {Directional}    R Documentation

Tuning of the k-NN regression with Euclidean or (hyper-)spherical response and/or predictor variables

Description

Tuning of the k-NN regression with Euclidean or (hyper-)spherical response and/or predictor variables. The predictive performance is estimated via an M-fold cross-validation. The bias is estimated as well, using the algorithm suggested by Tibshirani and Tibshirani (2009), and is subtracted.

Usage

knnreg.tune(y, x, M = 10, A = 10, ncores = 1, res = "eucl", type = "euclidean",
estim = "arithmetic", mat = NULL, graph = FALSE)

Arguments

y

The currently available data, the response variable values. A matrix with either Euclidean (univariate or multivariate) or (hyper-)spherical data. If you have a circular response, say u, transform it to a unit vector via (cos(u), sin(u)); a short sketch appears after this argument list.

x

The currently available data, the predictor variable values. A matrix with either Euclidean (univariate or multivariate) or (hyper-)spherical data. If you have a circular predictor, say u, transform it to a unit vector via (cos(u), sin(u)).

M

The number of folds for the M-fold cross-validation, set to 10 by default.

A

The maximum number of nearest neighbours to consider, set to 10 by default (see the Usage above). Values from 2 up to A are tried, since a single nearest neighbour is not used.

ncores

How many cores to use. This is taken into account only when the predictor variables are spherical.

res

The type of the response variable. If it is Euclidean, set this argument equal to "eucl". If it is a unit vector, set it to res = "spher".

type

The type of distance to be used; this reflects the nature of the predictor variables. It is passed as the "method" argument of R's dist function, and the default value is "euclidean". Type ?dist in R to see the available methods; any of them can be given here. If the predictors are unit vectors, set type = "angular" so that the cosine distance is calculated.

estim

Once the k observations with the smallest distances have been found, how should the prediction be formed? Use either the arithmetic average of the corresponding y values (estim = "arithmetic") or their harmonic average (estim = "harmonic").

mat

You can specify your own folds by supplying mat, a matrix in which each column is a fold and contains the indices of the observations in that fold. If you leave it NULL, the folds are created internally; the sketch after this argument list shows a hand-made mat.

graph

If this is TRUE, a graph with the results will appear.
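
As a minimal illustration of the remarks on circular data and on the mat argument, here is a small sketch; it is not part of the package and all object names are hypothetical.

u <- runif(150, 0, 2 * pi)                ## a circular variable in radians
y <- cbind( cos(u), sin(u) )              ## its unit-vector representation
x <- matrix( rnorm(150 * 3), ncol = 3 )   ## Euclidean predictors
## a hand-made mat with 10 folds: each column holds the indices of one fold
mat <- matrix( sample(1:150), ncol = 10 )
knnreg.tune(y, x, A = 10, res = "spher", type = "euclidean", mat = mat)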

Details

Tuning of the k-NN regression with Euclidean or (hyper-)spherical response and/or predictor variables. The predictive performance is estimated via an M-fold cross-validation. The bias is estimated as well, using the algorithm suggested by Tibshirani and Tibshirani (2009), and is subtracted. The sum of squares of prediction is used as the criterion in the case of Euclidean responses; in the case of spherical responses the sum of inner products ∑_i ŷ_i^T y_i is calculated.
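
As a rough sketch of these two criteria (assuming, purely for illustration, that y holds the observed responses and yhat the cross-validated predictions; neither name comes from the package):

y <- matrix( rnorm(100), ncol = 2 )
yhat <- y + matrix( rnorm(100, sd = 0.1), ncol = 2 )
sum( (y - yhat)^2 )    ## Euclidean response: sum of squares of prediction, to be minimised
y <- y / sqrt( rowSums(y^2) )             ## turn both into unit vectors
yhat <- yhat / sqrt( rowSums(yhat^2) )
sum( yhat * y )        ## spherical response: sum of the inner products, to be maximised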

Value

A list including:

crit

The value of the criterion to minimise (Euclidean response) or maximise (spherical response) for every number of nearest neighbours considered.

best_k

The best number of nearest neighbours.

performance

The bias-corrected optimal value of the criterion, along with the estimated bias. In the case of a Euclidean response this will be higher than crit, and in the case of a spherical response it will be lower than crit.

runtime

The run time of the algorithm. A numeric vector. The first element is the user time, the second element is the system time and the third element is the elapsed time.

Author(s)

Michail Tsagris

R implementation and documentation: Michail Tsagris <mtsagris@yahoo.gr> and Giorgos Athineou <athineou@csd.uoc.gr>

References

Tibshirani, Ryan J., and Robert Tibshirani. A bias correction for the minimum error rate in cross-validation. The Annals of Applied Statistics (2009), 3(2): 822-829.

See Also

knn.reg, spher.reg, dirknn.tune

Examples

y <- iris[, 1]
x <- iris[, 2:4]
x <- x/ sqrt( rowSums(x^2) )  ## Euclidean response and spherical predictors
knnreg.tune(y, x, A = 5, res = "eucl", type = "spher", estim = "arithmetic",
mat = NULL, graph = TRUE)

y <- iris[, 1:3]
y <- y/ sqrt( rowSums(y^2) )  ## Spherical response and Euclidean predictor
x <- iris[, 2]
knnreg.tune(y, x, A = 5, res = "eucl", type = "euclidean", estim = "arithmetic",
mat = NULL, graph = TRUE)

Results


> library(Directional)
> y <- iris[, 1]
> x <- iris[, 2:4]
> x <- x/ sqrt( rowSums(x^2) )  ## Euclidean response and spherical predictors
> knnreg.tune(y, x, A = 5, res = "eucl", type = "spher", estim = "arithmetic",
+ mat = NULL, graph = TRUE)
$crit
[1] 0.4541000 0.5301111 0.5655458 0.5927680

$best_k
[1] 2

$performance
[1] 0.6089202 0.0161522

$runtime
   user  system elapsed 
  0.048   0.000   0.049 

> 
> y <- iris[, 1:3]
> y <- y/ sqrt( rowSums(y^2) )  ## Spherical response and Euclidean predictor
> x <- iris[, 2]
> knnreg.tune(y, x, A = 5, res = "eucl", type = "euclidean", estim = "arithmetic",
+ mat = NULL, graph = TRUE)
$crit
       k=2        k=3        k=4        k=5 
0.11030545 0.10540084 0.10205249 0.09523095 

$best_k
k=5 
  5 

$performance
          mspe estimated bias 
    0.09523095     0.00000000 

$runtime
   user  system elapsed 
  0.060   0.004   0.065 
