growth.fnc
R Documentation
Calculate vocabulary growth curve and vocabulary richness measures
Description
This function calculates, for an increasing sequence of text sizes,
the observed numbers of types and of hapax, dis and tris legomena
(types occurring once, twice and three times, respectively),
together with selected measures of lexical richness.
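The counts listed above can be illustrated with a minimal base-R sketch. It is not the languageR implementation: the function name growth_counts, its arguments and the toy token vector are hypothetical, and the text is assumed to be a character vector of word tokens. For each requested text size n, the sketch tabulates the first n tokens and counts how many types occur exactly once, twice and three times.

```r
# Hypothetical sketch (not part of languageR): observed counts for an
# increasing sequence of text sizes, computed from the first n tokens.
growth_counts <- function(tokens, sizes) {
  rows <- lapply(sizes, function(n) {
    freqs <- table(tokens[seq_len(n)])  # frequency of each type in the prefix
    data.frame(Tokens = n,
               Types = length(freqs),              # number of distinct types
               HapaxLegomena = sum(freqs == 1),    # types occurring once
               DisLegomena = sum(freqs == 2),      # types occurring twice
               TrisLegomena = sum(freqs == 3))     # types occurring three times
  })
  do.call(rbind, rows)  # one row per text size
}

tokens <- c("the", "cat", "sat", "on", "the", "mat", "and", "the", "cat", "ran")
growth_counts(tokens, c(5, 10))
# Types 4 and 7; HapaxLegomena 3 and 5; DisLegomena 1 and 1; TrisLegomena 0 and 1
```

Re-tabulating each prefix keeps the sketch short; for long texts an incremental update of the frequency table per chunk would be cheaper.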
Arguments
size
An integer giving the size of a text chunk
when the text is to be split into a series of equally-sized text chunks.
nchunks
An integer giving the desired number of equally-sized
text chunks.
chunks
An integer vector giving the text sizes, in tokens, at which
the growth measures are to be calculated. When chunks is specified,
size and nchunks are ignored.
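A sketch of how these three arguments might translate into the cumulative text sizes at which the measures are evaluated. The helper chunk_ends is hypothetical, NULL is used here to mean "not specified", and the assumption that chunk k ends at token k * size is an illustration rather than a description of growth.fnc's internals. Only the precedence of chunks over size and nchunks comes from the argument descriptions above.

```r
# Hypothetical helper (not part of languageR): cumulative token counts at
# which growth measures are computed. Assumes chunk k ends at token k * size.
chunk_ends <- function(size = NULL, nchunks = NULL, chunks = NULL) {
  if (!is.null(chunks)) {
    # explicit sizes take precedence; size and nchunks are ignored
    return(as.integer(chunks))
  }
  stopifnot(!is.null(size), !is.null(nchunks), size >= 1, nchunks >= 1)
  as.integer(size) * seq_len(nchunks)  # size, 2 * size, ..., nchunks * size
}

chunk_ends(size = 100, nchunks = 4)                       # 100 200 300 400
chunk_ends(size = 100, nchunks = 4, chunks = c(50, 500))  # 50 500
```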
Value
A growth object with methods for plotting and printing.
As running this function on large texts may take some time,
a period is printed on the output device for each completed chunk
to indicate progress.
The data frame with the actual measures, which can be extracted with
object.name@data$data, has the following columns.
Chunk
a numeric vector with chunk numbers.
Tokens
a numeric vector with the number of tokens up to
and including the current chunk.
Types
a numeric vector with the number of types up to and
including the current chunk.
HapaxLegomena
a numeric vector with the corresponding count
of hapax legomena.
DisLegomena
a numeric vector with the corresponding count
of dis legomena.
TrisLegomena
a numeric vector with the corresponding count
of tris legomena.
Yule
a numeric vector with Yule's K.
Zipf
a numeric vector with the slope of Zipf's rank-frequency
curve in the double-logarithmic plane.
TypeTokenRatio
a numeric vector with the ratio of types to
tokens.
Herdan
a numeric vector with Herdan's C.
Guiraud
a numeric vector with Guiraud's R.
Sichel
a numeric vector with Sichel's S.
Lognormal
a numeric vector with the mean log frequency.
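For a single text sample, the richness columns can be sketched in base R from the standard definitions, with N tokens, V types and V(m) types occurring m times: Yule's K = 10^4 (sum over m of m^2 V(m), minus N) / N^2, Herdan's C = log V / log N, Guiraud's R = V / sqrt(N), and Sichel's S = V(2) / V. The function name richness is hypothetical. The use of natural logarithms and of an ordinary least-squares fit of log frequency on log rank for the Zipf slope are assumptions that may differ from what growth.fnc actually does.

```r
# Hypothetical sketch (not the languageR code): lexical richness measures
# for one sample of word tokens. Assumes natural logs and a least-squares
# fit for the Zipf slope.
richness <- function(tokens) {
  freqs <- as.vector(table(tokens))  # frequency of each type
  N <- length(tokens)                # number of tokens
  V <- length(freqs)                 # number of types
  ranked <- sort(freqs, decreasing = TRUE)
  zipf_fit <- lm(log(ranked) ~ log(seq_along(ranked)))
  c(
    # sum(freqs^2) equals the sum over m of m^2 * V(m)
    Yule = 1e4 * (sum(freqs^2) - N) / N^2,
    Zipf = unname(coef(zipf_fit)[2]),  # slope in the double-log plane
    TypeTokenRatio = V / N,
    Herdan = log(V) / log(N),          # Herdan's C
    Guiraud = V / sqrt(N),             # Guiraud's R
    Sichel = sum(freqs == 2) / V,      # Sichel's S: dis legomena per type
    Lognormal = mean(log(freqs))       # mean log frequency
  )
}

tokens <- c("the", "cat", "sat", "on", "the", "mat", "and", "the", "cat", "ran")
round(richness(tokens), 3)
# Yule = 800, TypeTokenRatio = 0.7
```

Evaluated on successively longer prefixes of a text, these values trace curves analogous to the columns listed above.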
Author(s)
R. H. Baayen
References
Baayen, R. H. (2001) Word Frequency Distributions.
Dordrecht: Kluwer Academic Publishers.
Tweedie, F. J. & Baayen, R. H. (1998) How variable may a constant be?
Measures of lexical richness in perspective. Computers and the
Humanities, 32, 323-352.