newsgroups
R Documentation
A shortened collection of newsgroup messages with the first 3 classes.
Description
The 20 Newsgroups data set is a collection of approximately 20,000 newsgroup documents,
partitioned (nearly) evenly across 20 different newsgroups. For demonstration purposes,
this package uses only its first 3 classes.
newsgroup.train.documents and newsgroup.test.documents
together comprise a corpus of 2731 newsgroup documents, partitioned into 1633 training
and 1098 test documents (nearly) evenly distributed across the 3 classes.
newsgroup.train.labels is a numeric vector of length 1633 which gives
a class label from 1 to 3 for each training document in the corpus.
newsgroup.test.labels is a numeric vector of length 1098 which gives
a class label from 1 to 3 for each test document in the corpus.
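Because each label vector holds one class label from 1 to 3 per document, tabulating it shows how the documents are spread over the classes. The sketch below is a hypothetical helper, not part of the package; it assumes the labels are numeric values in 1..k and stops if any label falls outside that range.

```r
# Hypothetical helper (not part of the package): count documents per class,
# assuming labels are numeric class labels in 1..k.
count_classes <- function(labels, k = 3) {
  # Fail early if any label lies outside the expected range 1..k
  stopifnot(is.numeric(labels), all(labels %in% seq_len(k)))
  # tabulate() counts occurrences of each integer 1..nbins, zeros included
  counts <- tabulate(labels, nbins = k)
  names(counts) <- seq_len(k)
  counts
}

# Usage, once the data set has been loaded:
# count_classes(newsgroup.train.labels)  # three counts summing to 1633
# count_classes(newsgroup.test.labels)   # three counts summing to 1098
```

Using tabulate() with a fixed nbins keeps all three classes in the result even if one of them were absent from a subset.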
newsgroup.vocab is the vocabulary of the corpus.
stopwords contains English stopwords extracted from the tm package.
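A common use of stopwords is to mark which vocabulary terms are stopwords. The sketch below is a hypothetical helper, not part of the package; it assumes both newsgroup.vocab and stopwords are character vectors, and it compares terms case-insensitively.

```r
# Hypothetical helper (not part of the package): flag which vocabulary
# terms are stopwords, assuming both inputs are character vectors.
flag_stopwords <- function(vocab, stop_words) {
  # %in% is an exact match, so lower-case both sides first
  is_stop <- tolower(vocab) %in% tolower(stop_words)
  # Name the flags by term so the result is easy to inspect
  names(is_stop) <- vocab
  is_stop
}

# Usage, once the data set has been loaded:
# flags <- flag_stopwords(newsgroup.vocab, stopwords)
# sum(flags)  # number of vocabulary terms that are stopwords
```

The helper returns a logical mask rather than a filtered vocabulary. This keeps every term at its original position, which matters if the documents refer to terms by their index in newsgroup.vocab (an assumption, not stated above).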