This package contains functions to read in text corpora, fit LDA-type models to them, and use the fitted models to explore the data and make predictions.
● Data Source: CranContrib
● Keywords: package
● Alias: lda, lda-package
These functions take a model fitted using lda.collapsed.gibbs.sampler: top.topic.words returns a matrix of the top words in each topic, and top.topic.documents returns the top documents for each topic.
● Keywords: utilities
● Alias: top.topic.documents, top.topic.words
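A minimal sketch of how top.topic.words might be used, assuming the documented interface (the toy corpus and parameter values here are illustrative, and the lda package must be installed):

```r
library(lda)

# Build a tiny corpus with lexicalize(), then fit a 2-topic model.
corpus <- lexicalize(c("spotted dogs bark loudly", "cats purr softly"),
                     lower = TRUE)
fit <- lda.collapsed.gibbs.sampler(corpus$documents, K = 2,
                                   vocab = corpus$vocab,
                                   num.iterations = 25,
                                   alpha = 0.1, eta = 0.1)

# Top 3 words per topic; by.score = TRUE reweights toward words that
# are disproportionately frequent within a topic.
top.topic.words(fit$topics, num.words = 3, by.score = TRUE)
```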
concatenate.documents concatenates a set of documents. filter.words removes references to certain words from a collection of documents. shift.word.indices adjusts references to words by a fixed amount.
● Keywords: utilities
● Alias: concatenate.documents, filter.words, shift.word.indices
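A sketch of the document-manipulation utilities, assuming the package's sparse document format (2-row integer matrices of 0-based word indices and counts); the argument name to.remove follows the package documentation as I recall it:

```r
library(lda)

# Two tiny documents: column-wise, doc1 contains word 0 (count 1)
# and word 2 (count 1); doc2 contains word 1 (count 1) and word 3 (count 2).
doc1 <- matrix(as.integer(c(0, 1, 2, 1)), nrow = 2)
doc2 <- matrix(as.integer(c(1, 1, 3, 2)), nrow = 2)

# Remove all references to word 1 from every document.
filtered <- filter.words(list(doc1, doc2), to.remove = 1L)

# Shift every word index up by 10, e.g. when merging two vocabularies.
shifted <- shift.word.indices(list(doc1, doc2), 10L)
```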
This function takes a fitted LDA-type model and computes a predictive distribution for new words in a document. This is useful for making predictions about held-out words.
● Keywords: utilities
● Alias: predictive.distribution
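A sketch of computing a predictive distribution from a fitted model, assuming the documented interface (the corpus and hyperparameter values are illustrative):

```r
library(lda)

corpus <- lexicalize(c("apples and oranges", "oranges and pears"),
                     lower = TRUE)
fit <- lda.collapsed.gibbs.sampler(corpus$documents, K = 2,
                                   vocab = corpus$vocab,
                                   num.iterations = 25,
                                   alpha = 0.1, eta = 0.1)

# One row per vocabulary word, one column per document: the predictive
# probability of seeing each word next in each document. alpha and eta
# should match the values used when fitting.
probs <- predictive.distribution(fit$document_sums, fit$topics,
                                 alpha = 0.1, eta = 0.1)
rownames(probs) <- corpus$vocab
```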
These functions compute summary statistics of a corpus. word.counts computes the word counts for a set of documents, while document.lengths computes the length of each document in the corpus.
● Keywords: utilities
● Alias: document.lengths, word.counts
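A minimal sketch of the corpus summary functions, assuming the documented interface:

```r
library(lda)

corpus <- lexicalize(c("to be or not to be", "that is the question"),
                     lower = TRUE)

# Total count of each word across the corpus, labelled by vocabulary
# entry when a vocab is supplied.
word.counts(corpus$documents, corpus$vocab)

# Number of word tokens in each document.
document.lengths(corpus$documents)
```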
These functions use a collapsed Gibbs sampler to fit three different models: latent Dirichlet allocation (LDA), the mixed-membership stochastic blockmodel (MMSB), and supervised LDA (sLDA). These functions take sparsely represented input documents, perform inference, and return point estimates of the latent parameters using the state at the last iteration of Gibbs sampling. Multinomial logit for sLDA is supported using the multinom function from the nnet package.
● Keywords: models
● Alias: lda.collapsed.gibbs.sampler, lda.cvb0, mmsb.collapsed.gibbs.sampler, slda.em
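A minimal end-to-end sketch of the LDA sampler, assuming the documented interface; the corpus, K, iteration count, and hyperparameters are illustrative choices, not recommendations:

```r
library(lda)

# lexicalize() converts raw text into the package's sparse document
# format (a list of 2-row integer matrices) plus a vocabulary.
corpus <- lexicalize(c("the quick brown fox", "the lazy dog sleeps"),
                     lower = TRUE)

# Fit a 2-topic LDA model with 25 sweeps of collapsed Gibbs sampling.
fit <- lda.collapsed.gibbs.sampler(corpus$documents,
                                   K = 2,
                                   vocab = corpus$vocab,
                                   num.iterations = 25,
                                   alpha = 0.1,
                                   eta = 0.1)

# fit$topics is a K x V matrix of topic-word assignment counts;
# fit$document_sums is a K x D matrix of per-document topic counts.
dim(fit$topics)
```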
Fit a relational topic model (RTM), a generative topic model which accounts for both the words occurring in a collection of documents and the links between those documents.
● Keywords: models
● Alias: rtm.collapsed.gibbs.sampler, rtm.em
Fit a NUBBI model, which takes as input a collection of entities with corresponding textual descriptions, as well as a set of descriptions for pairs of entities. The NUBBI model then produces a latent space description of both the entities and the relationships between them.
● Keywords: models
● Alias: nubbi.collapsed.gibbs.sampler
These functions read in the document and vocabulary files associated with a corpus. The format of the files is the same as that used by LDA-C (see below for details). The return value of these functions can be used by the inference procedures defined in the lda package.
● Keywords: file
● Alias: read.documents, read.vocab
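A sketch of reading an LDA-C corpus, assuming the documented interface. In the LDA-C format each line is one document: the number of unique terms, followed by 0-based index:count pairs; the vocabulary file has one word per line.

```r
library(lda)

# Write a two-document corpus and its vocabulary to temporary files.
docfile <- tempfile()
writeLines(c("2 0:3 1:1", "1 2:2"), docfile)
vocabfile <- tempfile()
writeLines(c("apple", "banana", "cherry"), vocabfile)

docs  <- read.documents(docfile)   # list of 2-row integer matrices
vocab <- read.vocab(vocabfile)     # character vector of words
```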
This function takes as input a collection of links (as used/described by the model fitting functions in this package) and converts them into an edgelist matrix, with one row per link.
● Keywords: utilities
● Alias: links.as.edgelist
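A sketch of converting links to an edgelist, assuming the link representation used by the fitting functions in this package (as I recall it, element i is an integer vector of 0-based indices of documents that document i links to, restricted to lower-numbered documents):

```r
library(lda)

# Three documents: document 2 links to document 1, and document 3
# links to documents 1 and 2 (all indices 0-based).
links <- list(integer(0), c(0L), c(0L, 1L))

# Returns a matrix with one row per link.
links.as.edgelist(links)
```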