newsgroup.train.documents and newsgroup.test.documents
comprise a corpus of roughly 20,000 newsgroup documents conforming to the LDA format,
partitioned into 11269 training and 7505 test cases (18774 documents in total),
approximately evenly distributed across 20 classes.
newsgroup.train.labels is a numeric vector of length 11269 which gives
a class label from 1 to 20 for each training document in the corpus.
newsgroup.test.labels is a numeric vector of length 7505 which gives
a class label from 1 to 20 for each test document in the corpus.
newsgroup.vocab is the vocabulary of the corpus.
newsgroup.label.map maps the numeric class labels to actual class names.
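As a rough illustration of how the labels and the label map fit together, the sketch below converts numeric labels to class names and counts documents per class. The toy.* objects are hypothetical stand-ins, not the real data, and the guarded block assumes that newsgroup.label.map can be coerced to a character vector indexed by label.

```r
# Toy stand-ins (hypothetical values, not the real data) shaped like
# newsgroup.train.labels and newsgroup.label.map.
toy.labels <- c(1, 3, 3, 2, 1, 3)
toy.label.map <- c("alt.atheism", "comp.graphics", "sci.space")

# Index the map by the numeric label to get one class name per document.
toy.names <- toy.label.map[toy.labels]

# Count documents per class, keeping the order of the label map.
toy.counts <- table(factor(toy.names, levels = toy.label.map))
print(toy.counts)

# With the lda package installed, the same steps should apply to the real data.
# Assumption: newsgroup.label.map coerces to a character vector of class names.
if (requireNamespace("lda", quietly = TRUE)) {
  data(newsgroup.train.labels, package = "lda")
  data(newsgroup.label.map, package = "lda")
  class.names <- as.character(unlist(newsgroup.label.map))
  print(table(factor(class.names[newsgroup.train.labels],
                     levels = class.names)))
}
```

The factor() call fixes the level order, so classes appear in label order rather than alphabetically, and classes with no documents still show a zero count.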
Source
http://qwone.com/~jason/20Newsgroups/
See Also
lda.collapsed.gibbs.sampler for the format of the
corpus.
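
The sketch below shows one way to read a single document in the format described by lda.collapsed.gibbs.sampler: a two-row integer matrix whose first row holds 0-indexed vocabulary indices and whose second row holds the corresponding word counts. The toy.vocab and toy.doc objects are hypothetical examples, not entries from the corpus.

```r
# Hypothetical four-word vocabulary and one document in the LDA format.
toy.vocab <- c("space", "orbit", "graphics", "image")

# Row 1: 0-indexed word identifiers; row 2: number of occurrences.
# This document contains "space" twice and "image" once.
toy.doc <- rbind(as.integer(c(0, 3)),
                 as.integer(c(2, 1)))

# Add 1 to the indices because R vectors are 1-indexed.
words <- toy.vocab[toy.doc[1, ] + 1]
counts <- toy.doc[2, ]

# Named vector of word counts for this document.
decoded <- setNames(counts, words)
print(decoded)
```

The same indexing would apply to an element of newsgroup.train.documents with newsgroup.vocab in place of toy.vocab.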