Objects of classes tfl, spc and
vgc that contain frequency data for the syntactic
expansions of Noun Phrases (NP) and Prepositional Phrases (PP) in
the Tiger German treebank.
Details
In this dataset, types are not words, but syntactic expansions,
i.e., sequences of syntactic categories that form NPs (in
TigerNP) or PPs (in TigerPP), according to the Tiger
annotation scheme for German. For example, the expansion
types in the TigerNP dataset include ART_NN and
ART_ADJA_NN, whereas the PP expansions in
TigerPP include APPR_ART_NN and APPR_NN
(in the Tiger tagset, ART is the tag for articles, ADJA for attributive
adjectives, NN for common nouns and APPR for prepositions).
The Tiger treebank contains about 900,000 tokens (50,000 sentences)
of German newspaper text from the Frankfurter Rundschau. The token
frequencies of the expansion types are taken from this corpus.
TigerNP.tfl and TigerPP.tfl are the type frequency
lists. TigerNP.spc and TigerPP.spc are the corresponding
frequency spectra. TigerNP.emp.vgc and TigerPP.emp.vgc are the
observed vocabulary growth curves, which track the development of
V (the number of distinct expansion types) and V(1) (the number of
types that occur exactly once) in the original order of
occurrence of the expansion tokens in the source corpus.
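
The relationship between the three kinds of objects can be illustrated with a minimal Python sketch. It is not the code used to build these datasets: the token sequence below is invented for illustration (it is not taken from the Tiger treebank), and the function names are hypothetical. It only shows how a type frequency list, a frequency spectrum and an observed vocabulary growth curve could be derived from a sequence of expansion tokens.

```python
from collections import Counter

# Hypothetical toy sequence of NP expansion tokens in corpus order.
# These values are invented for illustration; they are NOT Tiger counts.
tokens = ["ART_NN", "ART_ADJA_NN", "ART_NN", "NN", "ART_NN", "ADJA_NN", "NN"]


def type_frequency_list(tokens):
    """Map each expansion type to its token frequency."""
    return Counter(tokens)


def frequency_spectrum(tfl):
    """Map each frequency class m to V(m), the number of types seen m times."""
    return Counter(tfl.values())


def vocabulary_growth(tokens):
    """Return a list of (N, V, V1) triples, one after each token in order."""
    counts = Counter()
    v1 = 0  # number of types seen exactly once so far
    curve = []
    for n, tok in enumerate(tokens, start=1):
        counts[tok] += 1
        if counts[tok] == 1:
            v1 += 1  # a new type starts out as a hapax
        elif counts[tok] == 2:
            v1 -= 1  # a second occurrence removes it from V(1)
        curve.append((n, len(counts), v1))
    return curve


tfl = type_frequency_list(tokens)
spc = frequency_spectrum(tfl)
vgc = vocabulary_growth(tokens)

print(dict(tfl))
print(dict(spc))
print(vgc[-1])
```

The last point of the growth curve agrees with the spectrum: its V equals the number of types in the frequency list, and its V(1) equals the spectrum entry for frequency class 1. Unlike the list and the spectrum, the growth curve depends on the order of the tokens.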