filter.words
R Documentation
Functions to manipulate text corpora in LDA format.
Description
concatenate.documents concatenates the documents in corresponding positions of several corpora.
filter.words removes references to certain words
from a collection of documents.
shift.word.indices adjusts references to words by a fixed amount.
Arguments
...
For concatenate.documents, the set of corpora to be merged. All
arguments to ... must be corpora of the same length. The
documents in the same position in each of the arguments are
concatenated, i.e., the new document 1 is the concatenation of
document 1 from argument 1, document 1 from argument 2, etc.
documents
For filter.words and shift.word.indices, the corpus to
be operated on.
to.remove
For filter.words, an integer vector of word indices to filter.
Every occurrence, in each document, of a word whose index appears in
to.remove is removed.
amount
For shift.word.indices, an integer scalar by which to shift
the vocabulary indices in the corpus: amount is added to the
word index of every entry in every document.
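The document-wise concatenation described for the ... argument can be sketched in base R. This is not the lda implementation: concat_docs_sketch is a hypothetical name, and the sketch assumes the corpus format of lda.collapsed.gibbs.sampler, with each document a two-row integer matrix (0-indexed word ids in row 1, counts in row 2).

```r
# Hypothetical sketch of the documented behaviour of concatenate.documents.
# Assumes each corpus is a list of 2-row integer matrices
# (row 1: 0-indexed word ids, row 2: counts).
concat_docs_sketch <- function(...) {
  corpora <- list(...)
  n_docs <- lengths(corpora)
  # Every corpus must contain the same number of documents.
  stopifnot(length(corpora) > 0, all(n_docs == n_docs[1]))
  # Document i of the result binds the columns of document i from each corpus.
  Map(cbind, ...)
}

# Two toy corpora with two documents each (e.g. titles and bodies).
titles <- list(rbind(c(0L, 1L), c(1L, 1L)), rbind(2L, 1L))
bodies <- list(rbind(c(1L, 3L), c(2L, 1L)), rbind(c(0L, 4L), c(1L, 5L)))

merged <- concat_docs_sketch(titles, bodies)
merged[[1]]  # row 1 (word ids): 0 1 1 3; row 2 (counts): 1 1 2 1
```

The sketch keeps a repeated word id (word 1 above) as separate columns rather than summing its counts.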
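A minimal base-R sketch of the filtering rule for to.remove, assuming the same two-row document format and assuming to.remove uses the same 0-based word indices as row 1 of each document. filter_words_sketch is a hypothetical name, not the package's code.

```r
# Hypothetical sketch of the filtering described for filter.words.
# Assumes each document is a 2-row integer matrix (row 1: 0-indexed
# word ids, row 2: counts) and that to.remove holds ids on the same scale.
filter_words_sketch <- function(documents, to.remove) {
  lapply(documents, function(doc) {
    keep <- !(doc[1, ] %in% to.remove)  # columns whose word id survives
    doc[, keep, drop = FALSE]           # keep a matrix even with 0 or 1 columns
  })
}

corpus <- list(rbind(c(0L, 1L, 2L), c(3L, 1L, 2L)),
               rbind(c(1L, 4L), c(1L, 1L)))

filtered <- filter_words_sketch(corpus, to.remove = c(1L, 4L))
filtered[[1]]        # words 0 and 2 remain, with counts 3 and 2
dim(filtered[[2]])   # 2 0: every word in document 2 was removed
```

drop = FALSE is the design choice that matters here: without it, a document reduced to a single column would collapse to a plain vector.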
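A sketch of the shift for amount, with one plausible use: before combining corpora built on different vocabularies, the second corpus's indices can be offset by the size of the first vocabulary so that both refer to the concatenated vocabulary. shift_word_indices_sketch and the toy vocabularies are hypothetical, and each document is assumed to be a two-row integer matrix with 0-indexed word ids in row 1.

```r
# Hypothetical sketch of shift.word.indices: add `amount` to every word id.
# Assumes each document is a 2-row integer matrix (row 1: 0-indexed word ids,
# row 2: counts); the counts are left untouched.
shift_word_indices_sketch <- function(documents, amount) {
  lapply(documents, function(doc) {
    doc[1, ] <- doc[1, ] + as.integer(amount)
    doc
  })
}

vocab_a <- c("apple", "banana", "cherry")
vocab_b <- c("dog", "emu")

# A one-document corpus indexed against vocab_b: "dog" twice, "emu" once.
corpus_b <- list(rbind(c(0L, 1L), c(2L, 1L)))

# Offset by length(vocab_a) so the ids index into c(vocab_a, vocab_b).
shifted_b <- shift_word_indices_sketch(corpus_b, length(vocab_a))
combined_vocab <- c(vocab_a, vocab_b)

# Ids are 0-based, so add 1 before indexing an R vector.
combined_vocab[shifted_b[[1]][1, ] + 1L]  # "dog" "emu"
```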
Value
A corpus with the documents merged (concatenate.documents), the words
filtered (filter.words), or the word indices shifted (shift.word.indices).
The format of the input and output corpora is described in
lda.collapsed.gibbs.sampler.
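A hedged sketch of the corpus layout referred to above, assuming the format in lda.collapsed.gibbs.sampler is a list with one element per document, each a two-row integer matrix whose first row holds 0-indexed positions into a vocabulary vector and whose second row holds the matching counts. to_lda_doc and is_lda_corpus_sketch are hypothetical base-R helpers, not functions from the package.

```r
# Hypothetical helper: encode one tokenized document against `vocab`.
# Assumed format: row 1 = 0-indexed word ids, row 2 = counts.
to_lda_doc <- function(tokens, vocab) {
  ids <- match(tokens, vocab) - 1L  # 0-based ids; tokens not in vocab become NA
  counts <- table(ids)              # table() drops NA ids by default
  rbind(as.integer(names(counts)), as.integer(counts))
}

# Hypothetical check that a list of documents has the assumed shape.
is_lda_corpus_sketch <- function(documents) {
  is.list(documents) && all(vapply(documents, function(d) {
    is.matrix(d) && is.integer(d) && nrow(d) == 2L &&
      !anyNA(d) && all(d >= 0L)
  }, logical(1)))
}

vocab <- c("the", "cat", "sat", "mat")
doc <- to_lda_doc(c("the", "cat", "sat", "the", "mat"), vocab)
doc                              # row 1: 0 1 2 3 (word ids), row 2: 2 1 1 1 (counts)
is_lda_corpus_sketch(list(doc))  # TRUE
```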