R: A subset of the Cora dataset of scientific documents.
Description
A collection of 2410 scientific documents in LDA format with links and titles
from the Cora search engine.
Usage
data(cora.documents)
data(cora.vocab)
data(cora.cites)
data(cora.titles)
Format
cora.documents and cora.vocab
comprise a corpus of 2410 documents conforming to the LDA format.
cora.titles is a character vector of titles for each
document (i.e., each entry of cora.documents).
cora.cites is a list representing the citations between the
documents in the collection (see rtm.collapsed.gibbs.sampler for the
format).
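As a rough illustration of the layout described above, the sketch below builds a tiny corpus by hand in base R. It assumes the conventions documented for lda.collapsed.gibbs.sampler and rtm.collapsed.gibbs.sampler: each document is a two-row integer matrix of 0-indexed vocabulary positions and counts, and each citation entry is a vector of 0-indexed document positions. The data and the names vocab, documents, cites and decode are hypothetical and are not taken from Cora.

```r
# Toy corpus in the assumed layout (hypothetical data, not taken from Cora).
vocab <- c("learning", "network", "model")

# Each document is a 2-row integer matrix, one column per distinct word:
#   row 1 = 0-indexed position of the word in vocab, row 2 = its count.
documents <- list(
  matrix(c(0L, 2L, 1L, 1L), nrow = 2),  # "learning" x2, "network" x1
  matrix(c(2L, 3L), nrow = 2)           # "model" x3
)

# Each element lists the 0-indexed documents that the document cites.
cites <- list(1L, integer(0))

# Decode a document into words and counts (add 1 for R's 1-based indexing).
decode <- function(doc, vocab) {
  data.frame(word = vocab[doc[1, ] + 1L], count = doc[2, ],
             stringsAsFactors = FALSE)
}

first <- decode(documents[[1]], vocab)

# 1-based positions of the documents cited by the first document.
cited_by_first <- cites[[1]] + 1L
```

If the lda package is installed and the data are loaded, the same helper should apply to the real corpus, for example decode(cora.documents[[1]], cora.vocab), with cora.titles[cora.cites[[1]] + 1] giving the titles of the cited documents under the same indexing assumption.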
Source
Automating the construction of internet portals with machine
learning. McCallum et al. Information Retrieval. 2000.
See Also
lda.collapsed.gibbs.sampler for the format of the
corpus.
rtm.collapsed.gibbs.sampler for the format of the
citation links.
Examples
data(cora.documents)
data(cora.vocab)
data(cora.cites)
data(cora.titles)