Objects of classes spc and vgc that
contain frequency data for various subsets of words from the Brown
corpus (see Kucera and Francis 1967).
Details
BrownAdj.spc, BrownNoun.spc and BrownVer.spc
are frequency spectra of all the Brown corpus words tagged as
adjectives, nouns and verbs, respectively. BrownAdj.emp.vgc,
BrownNoun.emp.vgc and BrownVer.emp.vgc are the
corresponding observed vocabulary growth curves (tracking the
development of the vocabulary size V and the number of hapax
legomena V(1), like all the objects with suffix .emp.vgc below).
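To make the two kinds of objects concrete, here is a minimal Python sketch of how a frequency spectrum and an observed vocabulary growth curve can be computed from a token sequence. It illustrates the concepts only and is not how the package builds these data sets. The function names and the toy sentence are hypothetical. V is the number of distinct types and V(1) the number of types seen exactly once.

```python
from collections import Counter


def frequency_spectrum(tokens):
    """Map each frequency m to V(m), the number of types occurring exactly m times."""
    type_freqs = Counter(tokens)  # type -> token frequency
    return dict(sorted(Counter(type_freqs.values()).items()))


def growth_curve(tokens, step=1):
    """Return (N, V, V1) rows sampled every `step` tokens, in text order."""
    freqs = Counter()
    v1 = 0  # running count of types seen exactly once
    rows = []
    for n, tok in enumerate(tokens, start=1):
        freqs[tok] += 1
        if freqs[tok] == 1:
            v1 += 1  # a new type starts out as a hapax
        elif freqs[tok] == 2:
            v1 -= 1  # second occurrence: no longer a hapax
        if n % step == 0:
            rows.append((n, len(freqs), v1))
    return rows


# Toy data, assumed for illustration only (not taken from the Brown corpus).
toy_tokens = "the cat saw the dog and the dog saw a bird".split()

spectrum = frequency_spectrum(toy_tokens)
curve = growth_curve(toy_tokens)

print(spectrum)   # {1: 4, 2: 2, 3: 1}
print(curve[-1])  # (11, 7, 4)
```

The last row of the curve agrees with the spectrum: V is the sum of all V(m), and N is the sum of m times V(m).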
BrownImag.spc and BrownInform.spc are frequency
spectra of the Brown corpus words subdivided into the two main
stylistic partitions of the corpus, i.e., imaginative and
informative prose, respectively. BrownImag.emp.vgc and
BrownInform.emp.vgc are the corresponding observed vocabulary
growth curves.
Brown100k.spc is the spectrum of the first 100,000 tokens in
the Brown corpus (useful, e.g., for extrapolation experiments in
which we want to estimate a lnre model on a subset of the available
data). The corresponding observed growth curve can be easily
obtained from the one for the whole Brown corpus (Brown.emp.vgc).
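An observed growth curve records V and V(1) at increasing sample sizes N in text order, so the curve for an initial stretch of the corpus is the initial portion of the full curve. A minimal Python sketch of that truncation follows. It assumes the curve is a list of (N, V, V1) rows sorted by N. The function name is hypothetical, and the numbers are made up rather than taken from the Brown corpus.

```python
def truncate_curve(rows, max_tokens):
    """Keep the (N, V, V1) rows whose sample size N does not exceed max_tokens."""
    return [row for row in rows if row[0] <= max_tokens]


# Made-up rows for illustration only; these are not Brown corpus values.
full_curve = [
    (50_000, 9_000, 5_000),
    (100_000, 14_000, 7_000),
    (150_000, 18_000, 8_500),
    (200_000, 21_000, 9_500),
]

first_100k = truncate_curve(full_curve, 100_000)
print(first_100k)  # [(50000, 9000, 5000), (100000, 14000, 7000)]
```

The same shortcut does not work for a frequency spectrum, which cannot be cut back to a smaller sample after the fact.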
Note that numbers and other forms of non-linguistic material were
removed before any data were collected from the Brown corpus.
References
Kucera, H. and Francis, W.N. (1967). Computational analysis of
present-day American English. Brown University Press, Providence.
See Also
The data described in Brown pertain to the Brown corpus as a
whole.