Last data update: 2014.03.03

FindTopicsNumber    R Documentation

FindTopicsNumber

Description

Calculates several metrics for estimating the most suitable number of topics for an LDA model.

Usage

FindTopicsNumber(dtm, topics = seq(10, 40, by = 10),
  metrics = "Griffiths2004", method = "Gibbs", control = list(),
  mc.cores = 1L, verbose = FALSE)

Arguments

dtm

An object of class "DocumentTermMatrix" with term-frequency weighting or an object coercible to a "simple_triplet_matrix" with integer entries.

topics

Vector of candidate numbers of topics; one model is fitted for each value so that the models can be compared.

metrics

String or vector of possible metrics: "Griffiths2004", "CaoJuan2009", "Arun2010", "Deveaud2014".

method

The method to be used for fitting; see LDA.

control

A named list of the control parameters for estimation or an object of class "LDAcontrol".

mc.cores

Integer; the number of CPU cores used to process models simultaneously (using mclapply).

verbose

If FALSE (default), suppress all warnings and additional information.
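
As a rough illustration of how the arguments above fit together, the sketch below assembles an argument list in base R. All values are illustrative rather than recommended settings. The control entries seed, burnin and iter are assumed to be accepted by the Gibbs fitting method of LDA, and the core count is forced to 1 on Windows because mclapply-style forking is not available there.

```r
# Sketch only: argument values are illustrative, not recommendations.
library(parallel)  # ships with R; provides detectCores()

# Control parameters for Gibbs sampling, passed through to the LDA fit.
# Assumption: seed, burnin and iter are valid Gibbs control entries.
gibbs_control <- list(seed = 77L, burnin = 100L, iter = 500L)

# Leave one core free; fall back to 1 if the count is unknown or on Windows.
n_detected <- detectCores()
n_cores <- if (is.na(n_detected) || .Platform$OS.type == "windows") {
  1L
} else {
  max(1L, n_detected - 1L)
}

# Everything except dtm, which must come from your own corpus.
args <- list(
  topics   = seq(10, 40, by = 10),
  metrics  = c("Griffiths2004", "CaoJuan2009"),
  method   = "Gibbs",
  control  = gibbs_control,
  mc.cores = n_cores,
  verbose  = TRUE
)
str(args)
```

With a document-term matrix dtm at hand, the list could then be passed on with do.call(FindTopicsNumber, c(list(dtm = dtm), args)).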

Value

Data frame with the numbers of topics and the corresponding values of one or more metrics (higher is better). Can be passed directly to FindTopicsNumber_plot to draw a plot.
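
A minimal sketch of reading such a result follows. It assumes the data frame has a column named topics plus one column per requested metric; the numbers are made up for illustration and are not output from a real run.

```r
# Hypothetical result; column names are assumed and values are invented.
result <- data.frame(
  topics        = c(10, 20, 30, 40),
  Griffiths2004 = c(-5200.5, -5100.2, -5050.7, -5075.9)
)

# Following the "higher is better" reading, pick the row with the largest value.
best <- result$topics[which.max(result$Griffiths2004)]
print(best)
```

With several metrics in the data frame, the same selection can be repeated per column and the candidates compared by eye.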

Examples

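library(ldatuning)    # assumed: the package that provides FindTopicsNumber
library(topicmodels)
data("AssociatedPress", package="topicmodels")
# Use only the first 10 documents so the example runs quickly
dtm <- AssociatedPress[1:10, ]
FindTopicsNumber(dtm, topics = 2:10, metrics = "Arun2010", mc.cores = 1L)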
