R Graphical Manual


Last data update: 2014.03.03

R: Import text documents into Mallet format

mallet.import

R Documentation

Import text documents into Mallet format

Description

This function takes an array of document IDs and an array of document texts (as character strings) and converts them into a Mallet instance list.
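`text.array` holds the documents' contents, not file names. Documents stored as separate files therefore have to be read into strings first. The following is a minimal base-R sketch that assumes one plain-text document per `.txt` file in a single directory. The helper name `read_documents` is hypothetical and is not part of the mallet package.

```r
# Hypothetical helper: read every .txt file in `dir` into one string per document.
read_documents <- function(dir) {
  files <- list.files(dir, pattern = "\\.txt$", full.names = TRUE)
  # Collapse each file's lines into a single character string
  texts <- vapply(files,
                  function(f) paste(readLines(f, warn = FALSE), collapse = "\n"),
                  character(1), USE.NAMES = FALSE)
  # Use the file name as the document ID; keep both columns as character
  data.frame(id = basename(files), text = texts, stringsAsFactors = FALSE)
}
```

The two columns of the result can then be passed as the first two arguments, for example `mallet.import(docs$id, docs$text, "en.txt")`.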

Usage

mallet.import(id.array, text.array, stoplist.file, preserve.case = FALSE, token.regexp = "[\\p{L}]+")

Arguments

`id.array`	An array of document IDs, one for each element of `text.array`.
`text.array`	An array of text strings to use as documents. The type of the array must be `character`.
`stoplist.file`	The name of a file containing stopwords (words to ignore), one per line. If the file is not in the current working directory, you may need to include a full path.
`preserve.case`	Logical. If `FALSE` (the default), the input text is converted to all lowercase; set it to `TRUE` to keep the original case.
`token.regexp`	A quoted string containing a regular expression that defines a token. The default, "[\\p{L}]+", matches one or more Unicode letters. Because the expression is written as an R string, every backslash in it must be doubled.
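The following base-R code gives a rough idea of how `preserve.case`, `token.regexp` and a stoplist file shape the tokens of a document. It is only an illustration under stated assumptions and is not MALLET's own pipeline, which applies the expression with Java's regex engine inside MALLET. `approx_tokens` is a hypothetical helper. The code uses `perl = TRUE` so that R accepts the Unicode classes `\p{L}` and `\p{P}`.

```r
# Hypothetical approximation of mallet.import's preprocessing (illustration only).
approx_tokens <- function(text, stoplist.file, preserve.case = FALSE,
                          token.regexp = "[\\p{L}]+") {
  stopwords <- readLines(stoplist.file, warn = FALSE)  # one stopword per line
  # Extract every substring that matches the token expression
  tokens <- regmatches(text, gregexpr(token.regexp, text, perl = TRUE))[[1]]
  if (!preserve.case) tokens <- tolower(tokens)        # default: lowercase
  tokens[!tokens %in% stopwords]                       # drop stopwords
}

stop.file <- tempfile(fileext = ".txt")
writeLines(c("the", "and"), stop.file)
approx_tokens("The cat's 3 toys, and the dog's!", stop.file)
# returns c("cat", "s", "toys", "dog", "s")
```

With the default expression, the apostrophe splits "cat's" into two tokens and the digit is dropped. If you pass `token.regexp = "\\p{L}[\\p{L}\\p{P}]+\\p{L}"` instead, internal punctuation is kept and the same call returns `c("cat's", "toys", "dog's")`. Note that this expression only matches tokens of three or more characters.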

Examples

## Not run: 
mallet.instances <- mallet.import(documents$id, documents$text, "en.txt",
                                  token.regexp = "\\p{L}[\\p{L}\\p{P}]+\\p{L}")

## End(Not run)
