R Graphical Manual

Last data update: 2014.03.03

knn.probability

R Documentation

KNN Prediction Probability Routine using Pre-Calculated Distances

Description

K-Nearest Neighbor prediction probability method which uses the distances calculated by knn.dist. For predictions (not probabilities) see knn.predict.

Usage

knn.probability(train, test, y, dist.matrix, k=1, ties.meth="min")

Arguments

`train`	indexes which specify the rows of the `dist.matrix` to use as training set.
`test`	indexes which specify the rows of the `dist.matrix` to use as test set.
`y`	a vector of labels.
`dist.matrix`	the output from a call to `knn.dist`.
`k`	the number of nearest neighbors to consider.
`ties.meth`	method for handling ties at the k-th nearest neighbor. The default, `"min"`, uses all tied cases; `"max"` uses none of them; `"random"` selects among them at random; and `"first"` takes them in the order in which they appear in the data.
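
To make the four `ties.meth` options concrete, the sketch below applies base R's `rank` to a made-up distance vector and keeps the cases ranked at or below `k`. The selection rule `rank <= k` and the helper name `neighbors` are assumptions inferred from the descriptions above, not code taken from the package.

```r
# Hypothetical distances from one test case to five training cases;
# training cases 2, 3 and 4 are tied at distance 2.
d <- c(1, 2, 2, 2, 5)
k <- 2

# Assumed selection rule: keep training cases whose rank is at most k.
# `neighbors` is an illustrative helper, not a package function.
neighbors <- function(d, k, ties.meth) {
  which(rank(d, ties.method = ties.meth) <= k)
}

nb_min   <- neighbors(d, k, "min")    # tied ranks are all 2: indices 1 2 3 4
nb_max   <- neighbors(d, k, "max")    # tied ranks are all 4: index 1 only
nb_first <- neighbors(d, k, "first")  # ties ranked in data order: indices 1 2

set.seed(1)
nb_random <- neighbors(d, k, "random")  # index 1 plus one of 2, 3, 4

print(nb_min)
print(nb_max)
print(nb_first)
print(nb_random)
```

Under this rule, `"min"` can return more than `k` neighbors and `"max"` fewer than `k`, which is why a probability such as 0.75 can appear even when `k = 3`.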

Details

Prediction probabilities are calculated for each test case by aggregating the responses of the k nearest neighbors among the training cases using classprob. k may be any positive integer less than the number of training cases, but is generally between 1 and 10. The indexes for the training and test cases refer to the order of the entire data set as it was passed to knn.dist. Ties are handled using the rank function; see its ties.method argument for further information.
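
The procedure described above can be approximated in base R. This is a minimal sketch under stated assumptions, not the package's implementation: `knn_prob_sketch` is a hypothetical name, `y` is assumed to hold the labels of the training cases in the order given by `train`, neighbors are assumed to be the cases with `rank <= k`, and `as.matrix(dist(x))` (Euclidean distance) stands in for the output of knn.dist.

```r
# Sketch of k-NN class probabilities from a precomputed distance matrix.
knn_prob_sketch <- function(train, test, y, dist.matrix, k = 1, ties.meth = "min") {
  y <- factor(y)
  probs <- vapply(test, function(i) {
    d <- dist.matrix[i, train]                     # distances to the training cases
    keep <- rank(d, ties.method = ties.meth) <= k  # assumed neighbor rule
    # proportion of each class among the selected neighbors
    as.vector(table(y[keep])) / sum(keep)
  }, numeric(nlevels(y)))
  # one row per class level, one column per test case
  matrix(probs, nrow = nlevels(y),
         dimnames = list(levels(y), as.character(test)))
}

# Usage on the iris3 data shipped with R
train <- rbind(iris3[1:25, , 1], iris3[1:25, , 2], iris3[1:25, , 3])
test  <- rbind(iris3[26:50, , 1], iris3[26:50, , 2], iris3[26:50, , 3])
cl    <- factor(c(rep("s", 25), rep("c", 25), rep("v", 25)))

dm <- as.matrix(dist(rbind(train, test)))  # stand-in for knn.dist output
p  <- knn_prob_sketch(1:75, 76:150, cl, dm, k = 3)

print(dim(p))           # rows = class levels, columns = test cases
print(range(colSums(p)))  # each column sums to 1
```

Indexing `dist.matrix[i, train]` is what lets a single distance matrix be reused for different train/test splits without recomputing distances.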

Value

A matrix of prediction probabilities with one column per test case and one row per level of the response.
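
Because each column holds the class probabilities for one test case, a predicted label and its probability can be read off column by column. The sketch below uses a small hand-built matrix in the layout described above; the values are illustrative only. Note that `which.max` returns the first maximum, so a tie in the majority class resolves to the first level here rather than at random.

```r
# Illustrative probability matrix: rows are class levels, columns are test cases.
p <- matrix(c(0,   1, 0,
              1/3, 0, 2/3,
              3/4, 0, 1/4),
            nrow = 3,
            dimnames = list(c("c", "s", "v"), c("76", "103", "128")))

# Most probable class per test case (first maximum wins on ties)
pred <- rownames(p)[apply(p, 2, which.max)]

# Probability of that class, comparable to attr(knn(...), "prob")
conf <- apply(p, 2, max)

print(pred)
print(conf)
```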

Author(s)

Atina Dunlap Brooks

Examples

# the iris example used by knn(class)
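library(class)
# knn.dist, knn.predict and knn.probability come from the package
# loaded in the session under Results
library(KODAMA)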
data(iris3)
train <- rbind(iris3[1:25,,1], iris3[1:25,,2], iris3[1:25,,3])
test <- rbind(iris3[26:50,,1], iris3[26:50,,2], iris3[26:50,,3])
cl <- factor(c(rep("s",25), rep("c",25), rep("v",25)))
# how to get predictions from knn(class)
pred <- knn(train, test, cl, k = 3, prob=TRUE)
# display the confusion matrix
table(pred,cl)
# view probabilities (only the highest probability is returned)
attr(pred,"prob")
# how to get predictions with knn.dist and knn.predict
x <- rbind(train,test)
kdist <- knn.dist(x)
pred <- knn.predict(1:75, 76:150, cl, kdist, k=3)
# display the confusion matrix
table(pred,cl)
# view probabilities (all class probabilities are returned)
knn.probability(1:75, 76:150, cl, kdist, k=3)
# to compare probabilities, rounding done for display purposes
p1 <- knn(train, test, cl, k = 3, prob=TRUE)
p2 <- round(knn.probability(1:75, 76:150, cl, kdist, k=3), digits=2)
table( round(attr(p1,"prob"), digits=2), apply(p2,2,max) )
# note any small differences in predictions are a result of
# both methods breaking ties in majority class randomly

Results


R version 3.3.1 (2016-06-21) -- "Bug in Your Hair"
Copyright (C) 2016 The R Foundation for Statistical Computing
Platform: x86_64-pc-linux-gnu (64-bit)

> library(KODAMA)
Loading required package: e1071
Loading required package: plsgenomics
Loading required package: MASS
Loading required package: boot
Loading required package: parallel
Loading required package: class

Attaching package: 'KODAMA'

The following object is masked from 'package:plsgenomics':

    transformy

> # the iris example used by knn(class)
> library(class)
> data(iris3)
> train <- rbind(iris3[1:25,,1], iris3[1:25,,2], iris3[1:25,,3])
> test <- rbind(iris3[26:50,,1], iris3[26:50,,2], iris3[26:50,,3])
> cl <- factor(c(rep("s",25), rep("c",25), rep("v",25)))
> # how to get predictions from knn(class)
> pred <- knn(train, test, cl, k = 3, prob=TRUE)
> # display the confusion matrix
> table(pred,cl)
    cl
pred  c  s  v
   c 23  0  3
   s  0 25  0
   v  2  0 22
> # view probabilities (only the highest probability is returned)
> attr(pred,"prob")
 [1] 1.0000000 1.0000000 1.0000000 1.0000000 1.0000000 1.0000000 1.0000000
 [8] 1.0000000 1.0000000 1.0000000 1.0000000 1.0000000 1.0000000 1.0000000
[15] 1.0000000 1.0000000 1.0000000 1.0000000 1.0000000 1.0000000 1.0000000
[22] 1.0000000 1.0000000 1.0000000 1.0000000 1.0000000 1.0000000 0.6666667
[29] 1.0000000 1.0000000 1.0000000 1.0000000 1.0000000 0.6666667 1.0000000
[36] 1.0000000 1.0000000 1.0000000 1.0000000 1.0000000 1.0000000 1.0000000
[43] 1.0000000 1.0000000 1.0000000 1.0000000 1.0000000 1.0000000 1.0000000
[50] 1.0000000 1.0000000 0.6666667 0.7500000 1.0000000 1.0000000 1.0000000
[57] 1.0000000 1.0000000 0.5000000 1.0000000 1.0000000 1.0000000 1.0000000
[64] 0.6666667 1.0000000 1.0000000 1.0000000 1.0000000 1.0000000 1.0000000
[71] 1.0000000 0.6666667 1.0000000 1.0000000 0.6666667
> # how to get predictions with knn.dist and knn.predict
> x <- rbind(train,test)
> kdist <- knn.dist(x)
> pred <- knn.predict(1:75, 76:150, cl, kdist, k=3)
> # display the confusion matrix
> table(pred,cl)
    cl
pred  c  s  v
   c 23  0  4
   s  0 25  0
   v  2  0 21
> # view probabilities (all class probabilities are returned)
> knn.probability(1:75, 76:150, cl, kdist, k=3)
  76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100
c  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0   0
s  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1   1
v  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0   0
  101 102       103 104 105 106 107 108       109 110 111 112 113 114 115 116
c   1   1 0.3333333   1   1   1   1   1 0.3333333   1   1   1   1   1   1   1
s   0   0 0.0000000   0   0   0   0   0 0.0000000   0   0   0   0   0   0   0
v   0   0 0.6666667   0   0   0   0   0 0.6666667   0   0   0   0   0   0   0
  117 118 119 120 121 122 123 124 125 126       127  128 129 130 131 132 133
c   1   1   1   1   1   1   1   1   1   0 0.6666667 0.75   0   0   0   0   0
s   0   0   0   0   0   0   0   0   0   0 0.0000000 0.00   0   0   0   0   0
v   0   0   0   0   0   0   0   0   0   1 0.3333333 0.25   1   1   1   1   1
  134 135 136 137 138       139 140 141 142 143 144 145 146       147 148 149
c 0.5   0   0   0   0 0.6666667   0   0   0   0   0   0   0 0.3333333   0   0
s 0.0   0   0   0   0 0.0000000   0   0   0   0   0   0   0 0.0000000   0   0
v 0.5   1   1   1   1 0.3333333   1   1   1   1   1   1   1 0.6666667   1   1
        150
c 0.3333333
s 0.0000000
v 0.6666667
> # to compare probabilities, rounding done for display purposes
> p1 <- knn(train, test, cl, k = 3, prob=TRUE)
> p2 <- round(knn.probability(1:75, 76:150, cl, kdist, k=3), digits=2)
> table( round(attr(p1,"prob"), digits=2), apply(p2,2,max) )
      
       0.5 0.67 0.75  1
  0.5    1    0    0  0
  0.67   0    6    0  0
  0.75   0    0    1  0
  1      0    0    0 67
> # note any small differences in predictions are a result of
> # both methods breaking ties in majority class randomly