numeric vector of non-negative integers, representing the observed frequency of each species.
conf
confidence factor, as a quantile of the standard normal distribution, used to decide for what values the log-linear relationship between frequencies and frequencies of frequencies is acceptable.
counts
matrix of counts
Details
Observed counts are assumed to be Poisson distributed.
Using a non-parametric empirical Bayes strategy, the algorithm evaluates the posterior expectation of each species mean given its observed count.
The posterior means are then converted to proportions.
In the empirical Bayes step, the counts are smoothed by assuming a log-linear relationship between frequencies and frequencies of frequencies.
The fundamentals of the algorithm are from Good (1953).
Gale and Sampson (1995) proposed a simplified algorithm with a rule for switching between the observed and smoothed frequencies, and it is Gale and Sampson's simplified algorithm that is implemented here.
The number of zero values in x is not used in the algorithm, but is returned by this function.
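The smoothing step described above can be sketched in base R. This is an illustration only, not the edgeR implementation: the simulated data, the variable names and the use of lm are assumptions made for the example, and the rule for switching between observed and smoothed frequencies (governed by conf) is omitted.

```r
# Illustrative sketch only; not the code used by goodTuring().
set.seed(1)
# Hypothetical data: Poisson counts with gamma-distributed means
x <- rpois(2000, lambda = rgamma(2000, shape = 0.5, rate = 0.25))

# Frequencies of frequencies: n[j] species were observed exactly r[j] times
tab <- table(x[x > 0])
r <- as.integer(names(tab))
n <- as.vector(tab)

# Number of zero counts; reported but not used in the smoothing
n0 <- sum(x == 0)

# Turing's estimate of the total proportion for unseen species: n_1 / N
N <- sum(r * n)
P0 <- if (r[1] == 1) n[1] / N else 0

# Gale and Sampson average each n over the gap to neighbouring observed counts
k <- length(r)
lower <- c(0, r[-k])
upper <- c(r[-1], 2 * r[k] - r[k - 1])
Z <- 2 * n / (upper - lower)

# Log-linear relationship between frequencies and frequencies of frequencies
fit <- lm(log(Z) ~ log(r))
slope <- unname(coef(fit)[2])

# Smoothed adjusted counts: r* = (r + 1) * S(r + 1) / S(r) with S(r) = A * r^slope
r.smooth <- r * (1 + 1 / r)^(1 + slope)
```

The adjusted counts in r.smooth would still need to be combined with the unsmoothed Turing estimates and renormalised to sum to 1 - P0 before they can be read as proportions.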
Gale, WA, and Sampson, G (1995).
Good-Turing frequency estimation without tears.
Journal of Quantitative Linguistics 2, 217-237.
Examples
# True means of observed species
lambda <- rnbinom(10000,mu=2,size=1/10)
lambda <- lambda[lambda>1]
# Observed frequencies
Ntrue <- length(lambda)
x <- rpois(Ntrue, lambda=lambda)
freq <- goodTuring(x)
goodTuringPlot(x)
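The returned object can be inspected as in the sketch below. The component names used here (P0, proportion, count, n, n0) are not listed in this section, so treat them as assumptions and check them against the Value section of the help page. The seed is arbitrary and only makes the run repeatable.

```r
library(edgeR)

# Same simulation as in the example above, with an arbitrary seed
set.seed(2016)
lambda <- rnbinom(10000, mu = 2, size = 1/10)
lambda <- lambda[lambda > 1]
x <- rpois(length(lambda), lambda = lambda)

freq <- goodTuring(x)

# Components of the result (names assumed, see the note above)
names(freq)

# Number of zeros in x, returned but not used by the algorithm
freq$n0
sum(x == 0)

# Estimated total proportion for unobserved species
freq$P0

# Estimated proportion for a species at each observed count
head(cbind(count = freq$count, n = freq$n, proportion = freq$proportion))
```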
Results
> library(edgeR)
Loading required package: limma
> png(filename="/home/ddbj/snapshot/RGM3/R_BC/result/edgeR/goodTuring.Rd_%03d_medium.png", width=480, height=480)
> ### Name: goodTuring
> ### Title: Good-Turing Frequency Estimation
> ### Aliases: goodTuring goodTuringPlot goodTuringProportions
> ### Keywords: models
>
> ### ** Examples
>
> # True means of observed species
> lambda <- rnbinom(10000,mu=2,size=1/10)
> lambda <- lambda[lambda>1]
>
> # Observed frequencies
> Ntrue <- length(lambda)
> x <- rpois(Ntrue, lambda=lambda)
> freq <- goodTuring(x)
> goodTuringPlot(x)
> dev.off()
null device
1
>