R: Collapsed Gibbs Sampling for the Networks Uncovered By...
nubbi.collapsed.gibbs.sampler
R Documentation
Collapsed Gibbs Sampling for the Networks Uncovered By Bayesian
Inference (NUBBI) Model.
Description
Fit a NUBBI model, which takes as input a collection of entities with
corresponding textual descriptions as well as a set of descriptions
for pairs of entities. The NUBBI model then produces a latent space
description of both the entities and the relationships between them.
Usage
nubbi.collapsed.gibbs.sampler(contexts, pair.contexts, pairs, K.individual,
K.pair, vocab, num.iterations, alpha, eta, xi)
Arguments
contexts
The set of textual descriptions (i.e., documents) for individual
entities in LDA format (see
lda.collapsed.gibbs.sampler for details).
pair.contexts
A set of textual descriptions for pairs of entities, also in LDA format.
pairs
Labelings as to which pair each element of pair.contexts
refers to. This parameter should be an integer matrix with two columns
and the same number of rows as pair.contexts. The two
elements in each row of pairs are 0-indexed indices into
contexts indicating which two entities that element of
pair.contexts describes.
K.individual
A scalar integer representing the number of topics for the individual entities.
K.pair
A scalar integer representing the number of topics for entity pairs.
vocab
A character vector specifying the vocabulary words associated with
the word indices used in contexts and pair.contexts.
num.iterations
The number of sweeps of Gibbs sampling over the entire corpus to make.
alpha
The scalar value of the Dirichlet hyperparameter for
topic proportions.
eta
The scalar value of the Dirichlet hyperparameter for topic
multinomials.
xi
The scalar value of the Dirichlet hyperparameter for source
proportions.
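The arguments above can be assembled as in the following minimal sketch. The vocabulary, the entity names, the helper to.lda.format, the wrapper fit.toy.nubbi, and all hyperparameter values are hypothetical and chosen only for illustration. The sketch assumes the LDA format described in lda.collapsed.gibbs.sampler: each document is a two-row integer matrix whose first row holds 0-indexed word indices and whose second row holds counts.

```r
# Hypothetical vocabulary and entity descriptions (not part of the package).
vocab <- c("king", "queen", "crown", "war", "treaty")
entity.words <- list(
  alice = c("queen", "crown", "crown"),
  bob   = c("king", "war"),
  carol = c("treaty", "war", "war")
)
pair.words <- list(
  c("king", "queen", "crown"),  # describes alice and bob
  c("war", "treaty", "war")     # describes bob and carol
)

# Hypothetical helper: convert a character vector of tokens into an
# LDA-format document. Row 1 holds 0-indexed positions into `vocab`,
# row 2 holds the number of occurrences of each word.
to.lda.format <- function(words, vocab) {
  counts <- tabulate(match(words, vocab), nbins = length(vocab))
  idx <- which(counts > 0L)
  rbind(idx - 1L, counts[idx])
}

contexts <- unname(lapply(entity.words, to.lda.format, vocab = vocab))
pair.contexts <- lapply(pair.words, to.lda.format, vocab = vocab)

# 1-based positions of the two entities each pair context describes,
# shifted to the 0-indexed convention required for `pairs`.
pairs <- rbind(c(1L, 2L), c(2L, 3L)) - 1L

# Hypothetical wrapper so that nothing is fitted until it is called.
# The hyperparameter values are arbitrary placeholders, not recommendations.
fit.toy.nubbi <- function(num.iterations = 50) {
  lda::nubbi.collapsed.gibbs.sampler(
    contexts, pair.contexts, pairs,
    K.individual = 2, K.pair = 2, vocab = vocab,
    num.iterations = num.iterations,
    alpha = 0.1, eta = 0.1, xi = 0.1
  )
}
```

With the lda package installed, fit.toy.nubbi() would run the sampler on this toy input. Shifting the indices in one place keeps the rest of the script in R's usual 1-based convention.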
Details
The NUBBI model is a switching model wherein the description of each
entity-pair can be ascribed to either the first entity of the pair,
the second entity of the pair, or their relationship. The NUBBI model
posits a latent space (i.e., topic model) over the individual entities, and a different
latent space over entity relationships.
The collapsed Gibbs sampler used for this model differs from the
variational inference method proposed in the paper and is highly experimental.
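The switching behaviour described above can be illustrated by simulating the words of a single pair description. This is a rough sketch based only on the description in this section, not the exact generative process from the paper or the code used by the sampler. The helper rdirichlet1 and all sizes and hyperparameter values are assumptions made for illustration.

```r
set.seed(1)

# Hypothetical helper: one draw from a symmetric Dirichlet distribution,
# obtained by normalizing independent gamma variates.
rdirichlet1 <- function(k, conc) {
  g <- rgamma(k, shape = conc)
  g / sum(g)
}

# Assumed toy sizes and hyperparameters.
V <- 6; K.individual <- 2; K.pair <- 2
alpha <- 0.5; eta <- 0.5; xi <- 1

# Topic-word distributions for the two latent spaces (each row sums to 1).
beta.individual <- t(replicate(K.individual, rdirichlet1(V, eta)))
beta.pair       <- t(replicate(K.pair, rdirichlet1(V, eta)))

# Topic proportions for the first entity, the second entity,
# and the relationship between them.
theta <- list(rdirichlet1(K.individual, alpha),
              rdirichlet1(K.individual, alpha),
              rdirichlet1(K.pair, alpha))

# Source proportions for this pair description.
pi.source <- rdirichlet1(3, xi)

# For each word: pick a source (0 = first entity, 1 = second entity,
# 2 = relationship), then a topic from that source, then a word.
n.words <- 20
source <- sample(0:2, n.words, replace = TRUE, prob = pi.source)
words <- vapply(source, function(s) {
  beta <- if (s == 2) beta.pair else beta.individual
  z <- sample(nrow(beta), 1, prob = theta[[s + 1]])
  sample(V, 1, prob = beta[z, ]) - 1L  # 0-indexed word identifier
}, integer(1))
```

Here the coding of source matches the 0, 1, 2 convention used for source_assignments in the returned value.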
Value
A fitted model as a list with the same components as returned by
lda.collapsed.gibbs.sampler with the following additional components:
source_assignments
A list of length(pair.contexts) whose
elements source_assignments[[i]] are of the same length as
pair.contexts[[i]] where each entry is either 0 if the
sampler assigned the word to the first entity, 1 if the sampler
assigned the word to the second entity, or 2 if the sampler assigned
the word to the relationship between the two.
document_source_sums
A matrix with three columns and
length(pair.contexts) rows where each row indicates how many
words were assigned to the first entity of the pair, the second
entity of the pair, and the relationship between the two,
respectively.
document_sums
Semantically similar to the entry in
lda.collapsed.gibbs.sampler, except that it is a list whose
first length(contexts) entries correspond to the columns of the entry
in lda.collapsed.gibbs.sampler for the individual contexts,
and the remaining length(pair.contexts) entries correspond to
the columns for the pair contexts.
topics
Like the entry in lda.collapsed.gibbs.sampler,
except that it contains the concatenation of the K.individual
topics and the K.pair topics.
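The components above lend themselves to simple post-processing, sketched below on a mocked result. The object fit is a hypothetical stand-in, not real sampler output. The sketch assumes that topics has one row per topic, as in lda.collapsed.gibbs.sampler, and that the K.individual topics precede the K.pair topics, as the order in the description suggests.

```r
# Hypothetical stand-in for a fitted model; only two components are mocked.
K.individual <- 2; K.pair <- 1
fit <- list(
  document_source_sums = rbind(c(4, 1, 5), c(0, 6, 2)),
  topics = matrix(1:12, nrow = K.individual + K.pair,
                  dimnames = list(NULL, c("king", "queen", "war", "treaty")))
)

# Share of each pair description attributed to the first entity,
# the second entity, and the relationship between them.
source.share <- fit$document_source_sums / rowSums(fit$document_source_sums)
colnames(source.share) <- c("first", "second", "relationship")

# Split the concatenated topics into the two latent spaces
# (assumes the individual topics come first).
individual.topics <- fit$topics[seq_len(K.individual), , drop = FALSE]
pair.topics <- fit$topics[K.individual + seq_len(K.pair), , drop = FALSE]
```

A large value in the relationship column of source.share suggests that the sampler attributed most of that description to the pair itself and little to either entity.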
Note
The underlying sampler is quite general and could potentially be used
for other models such as the author-topic model (McCallum et al.) and the citation
influence model (Dietz et al.). Please examine the source code
and/or contact the author(s) for further details.