AlphabeticTokenizer(x, control = NULL)
NGramTokenizer(x, control = NULL)
WordTokenizer(x, control = NULL)
Arguments
x
a character vector with strings to be tokenized.
control
an object of class Weka_control, or a
character vector of control options, or NULL (default).
Available options can be looked up interactively with the Weka Option
Wizard WOW, or in the Weka documentation.
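The three accepted forms of control can be sketched as follows. This assumes the RWeka package is installed with a working Java runtime. The option names min and max and the class path passed to WOW come from Weka's NGramTokenizer, not from this page, so treat them as assumptions and confirm them with WOW.

```r
library(RWeka)

x <- "the quick brown fox"

# 1. A Weka_control object: option names are given without the leading dash.
bigrams_a <- NGramTokenizer(x, Weka_control(min = 2, max = 2))

# 2. A character vector of raw Weka options (assumed equivalent to form 1).
bigrams_b <- NGramTokenizer(x, c("-min", "2", "-max", "2"))

# 3. NULL (the default): the tokenizer runs with Weka's own default options.
default_tokens <- NGramTokenizer(x)

# Print the options Weka reports for this tokenizer class.
WOW("weka/core/tokenizers/NGramTokenizer")
```

The Weka_control form is usually easier to read. The character vector form is convenient when options are already available as command-line style strings.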
Details
AlphabeticTokenizer is an alphabetic string tokenizer that forms
tokens only from contiguous sequences of alphabetic characters.
NGramTokenizer splits strings into n-grams whose sizes range from a
given minimum to a given maximum number of grams.
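A short sketch contrasting the two tokenizers described above, again assuming RWeka and Java are available. The input strings are made up for illustration, and the min and max option names are assumed from Weka's NGramTokenizer.

```r
library(RWeka)

# Digits and punctuation break tokens: only runs of letters are kept.
alpha_tokens <- AlphabeticTokenizer("R2D2 meets C-3PO in 1977")

# Request every n-gram of size 1 and size 2 from a four-word string.
ngram_tokens <- NGramTokenizer("the quick brown fox",
                               Weka_control(min = 1, max = 2))

print(alpha_tokens)
print(ngram_tokens)
```

With min = 1 and max = 2, the result is expected to contain the four single words together with the three adjacent word pairs. The order in which Weka returns them is not specified here, so code that depends on it should sort or match explicitly.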