These functions compute summary statistics of a corpus.
word.counts computes the word counts for a set of documents,
while documents.length computes the length of the documents in
a corpus.
Arguments
docs
A list of matrices specifying the corpus. See
lda.collapsed.gibbs.sampler for details on the
format of this variable.
vocab
An optional character vector specifying the levels (i.e., labels) of
the vocabulary words. If unspecified (or NULL), the levels
will be automatically inferred from the corpus.
Value
word.counts returns an object of class table which
gives the number of times each word appears in the input
corpus. If vocab is specified, then the levels of the table
will be set to vocab. Otherwise, the levels are automatically
inferred from the corpus (typically the integers 0:(V-1), where
V is the number of unique words in the corpus).
documents.length returns an integer vector of length
length(docs), each entry of which is the
length (sum of the counts of all features) of the corresponding
document in the corpus.
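
Examples
The following is a minimal base-R sketch of the two summaries described above, not the package's own implementation. It assumes each document is a two-row integer matrix whose first row holds 0-indexed word identifiers and whose second row holds the matching counts. The helper names sketch_word_counts and sketch_document_lengths, the toy corpus, and the vocabulary are hypothetical and chosen only for illustration.

```r
# Toy corpus in the ASSUMED format: one 2-row integer matrix per document.
# Row 1: 0-indexed word ids; row 2: how often that word occurs in the document.
docs <- list(
  rbind(c(0L, 1L), c(2L, 1L)),  # word 0 twice, word 1 once
  rbind(c(1L, 2L), c(3L, 1L))   # word 1 three times, word 2 once
)
vocab <- c("apple", "banana", "cherry")  # hypothetical vocabulary

# Hypothetical helper: length of each document = sum of its count row.
sketch_document_lengths <- function(docs) {
  vapply(docs, function(d) sum(d[2, ]), integer(1))
}

# Hypothetical helper: corpus-wide count per word, returned as a table.
sketch_word_counts <- function(docs, vocab = NULL) {
  ids    <- unlist(lapply(docs, function(d) d[1, ]))
  counts <- unlist(lapply(docs, function(d) d[2, ]))
  if (!is.null(vocab)) {
    # Map the 0-indexed ids onto the supplied vocabulary labels.
    ids <- factor(ids, levels = seq_along(vocab) - 1L, labels = vocab)
  }
  # Repeat each id by its count, then tabulate.
  table(rep(ids, times = counts))
}

sketch_document_lengths(docs)    # 3 4
sketch_word_counts(docs, vocab)  # apple 2, banana 4, cherry 1
sketch_word_counts(docs)         # levels "0", "1", "2" inferred from the corpus
```

When vocab is supplied, converting the identifiers to a factor means that words in the vocabulary that never occur in the corpus still appear in the table with a count of zero.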