R: Expected Frequency Spectrum by Binomial Interpolation (zipfR)
spc.interp
R Documentation
Description
spc.interp computes the expected frequency spectrum for a
random sample of specified size N, taken from a data set
described by the frequency spectrum object obj.
Usage
spc.interp(obj, N, m.max=max(obj$m), allow.extrapolation=FALSE)
Arguments
obj
an object of class spc, representing the frequency
spectrum of the data set from which samples are taken
N
a single non-negative integer specifying the sample size for
which the expected frequency spectrum is calculated
m.max
number of spectrum elements listed in the expected
frequency spectrum. By default, as many spectrum elements are
included as the spectrum obj contains, since the expectations
of higher spectrum elements will always be 0 in the binomial
interpolation. See note in section "Details" below.
allow.extrapolation
if TRUE, the requested sample size
N may be larger than the sample size of the frequency spectrum
obj, for binomial extrapolation. This option should
be used with great caution (see EVm.spc for details).
Details
See the EVm.spc manpage for more information, especially
concerning binomial extrapolation.
For large frequency spectra, the default value of m.max may
lead to very long computation times. It is therefore recommended to
specify m.max explicitly and calculate only as many spectrum
elements as are actually required.
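As an illustration of what binomial interpolation computes and why the cost grows with m.max, the following base R sketch applies the textbook binomial formula: a type occurring k times in the full sample occurs m times in a subsample with probability dbinom(m, k, N/N0). This is an assumption-laden sketch, not the zipfR implementation (see EVm.spc for that); the function name binom_interp and the toy spectrum are hypothetical.

```r
# Sketch of binomial interpolation in base R (hypothetical helper, not zipfR code).
# m  : frequency classes present in the full spectrum
# Vm : number of types in each frequency class
# N  : requested subsample size (must not exceed the full sample size here)
binom_interp <- function(m, Vm, N, m.max = max(m)) {
  N0 <- sum(m * Vm)            # sample size of the full spectrum
  p  <- N / N0                 # sampling proportion
  stopifnot(p >= 0, p <= 1)    # interpolation only, no extrapolation
  # one pass over all frequency classes for every requested element,
  # so the work is proportional to m.max * length(m)
  sapply(seq_len(m.max), function(r) {
    sum(Vm * dbinom(r, size = m, prob = p))
  })
}

# toy spectrum: 5 types with frequency 1, 3 with frequency 2, 1 with frequency 4
m  <- c(1, 2, 4)
Vm <- c(5, 3, 1)

EVm       <- binom_interp(m, Vm, N = 7)             # all elements up to max(m)
EVm_short <- binom_interp(m, Vm, N = 7, m.max = 2)  # only the first two
```

Because dbinom(r, k, p) is 0 whenever r exceeds k, elements above max(m) would all be 0, which is why the default m.max loses nothing. Restricting m.max returns the same leading elements with less work.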
Value
An object of class spc, representing the expected frequency
spectrum for a random sample of size N taken from the data set
that is described by obj.
See Also
spc for more information about frequency spectra and
links to relevant functions
The implementation of spc.interp is based on the functions
EV.spc and EVm.spc. See the respective
manpages for technical details.
vgc.interp computes expected vocabulary growth curves by
binomial interpolation from a frequency spectrum
sample.spc takes a single concrete random
subsample from a spectrum and returns the spectrum of the subsample,
whereas spc.interp computes the expected
frequency spectrum for random subsamples of size N by
binomial interpolation.
Examples
## load the Tiger NP expansion spectrum
## (sample size: about 109k tokens)
data(TigerNP.spc)
## interpolated expected frequency subspectrum of 50k tokens
TigerNP.sub.spc <- spc.interp(TigerNP.spc,5e+4)
summary(TigerNP.sub.spc)
## the previous call is slow because it calculates all expected
## spectrum elements; if only the first 10 expected spectrum
## elements are needed, restrict m.max:
TigerNP.sub.spc <- spc.interp(TigerNP.spc, 5e+4, m.max=10) # much faster!
summary(TigerNP.sub.spc)
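
The difference between one concrete random subsample and the expected spectrum can also be illustrated without zipfR. The following base R sketch uses a hypothetical toy spectrum and an arbitrary seed; it draws a single subsample without replacement and sets its observed spectrum beside the binomial expectation. It is meant only to illustrate the distinction, not to reproduce sample.spc or spc.interp.

```r
set.seed(42)                    # arbitrary seed, for reproducibility only

# toy spectrum: 5 types with frequency 1, 3 with frequency 2, 1 with frequency 4
m  <- c(1, 2, 4)
Vm <- c(5, 3, 1)

# expand the spectrum into a token vector (one integer id per type)
type_freqs <- rep(m, times = Vm)
tokens     <- rep(seq_along(type_freqs), times = type_freqs)

N <- 7                          # subsample size
N0 <- length(tokens)            # full sample size

# one concrete random subsample, drawn without replacement
sub       <- sample(tokens, N)
sub_freqs <- table(sub)                      # frequency of each sampled type
sub_spc   <- table(as.integer(sub_freqs))    # observed types per frequency class

# expected spectrum for subsamples of size N by the binomial formula
expected <- sapply(seq_len(max(m)), function(r) {
  sum(Vm * dbinom(r, size = m, prob = N / N0))
})

print(sub_spc)    # integer counts, varies from draw to draw
print(expected)   # fractional expectations, fixed for a given N
```

The observed spectrum contains whole-number type counts that change with every draw, while the expected spectrum is deterministic and generally fractional.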