Last data update: 2014.03.03

specialchars (Package: lsa) : List of special-character HTML entities and their replacement characters

This list contains entities (specialchars$entities) and their replacement characters (specialchars$replacement), as used by textvector to clean up HTML code: for example, it is used to replace the HTML entity &auml; with the characters ae. You can load this data set with data(specialchars).
● Data Source: CranContrib
● Keywords: datasets
● Alias: specialchars
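The replacement step described above can be sketched in base R. This is a minimal illustration, not the code textvector actually runs: the list sc is a hypothetical stand-in that only mirrors the entities/replacement layout named in the description, and replace_entities is an illustrative helper. With lsa installed, data(specialchars) would supply the real list.

```r
# Hypothetical stand-in with the same shape as specialchars
# (the real values come from data(specialchars) in the lsa package).
sc <- list(
  entities    = c("&auml;", "&ouml;", "&uuml;", "&szlig;"),
  replacement = c("ae", "oe", "ue", "ss")
)

# Illustrative helper: swap each entity for its replacement, one pair at a time.
replace_entities <- function(txt, map) {
  for (i in seq_along(map$entities)) {
    # fixed = TRUE treats the entity as a literal string, not a regex
    txt <- gsub(map$entities[i], map$replacement[i], txt, fixed = TRUE)
  }
  txt
}

cleaned <- replace_entities("M&auml;dchen gr&uuml;&szlig;en", sc)
print(cleaned)
```

Passing the real specialchars object as map should work the same way, assuming its two components are parallel character vectors as the description suggests.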

corpora (Package: lsa) : Corpora (Essay Scoring)

These data sets contain example corpora for essay scoring. A training textmatrix contains the files used to construct a latent semantic space suitable for grading the student essays provided in the essay textmatrix. A separate data set records the original scores that a human assessor gave the student essays. The corpora (and human scores) can be loaded by calling data(corpus_training), data(corpus_essays), or data(corpus_scores). The objects must already exist before being handed over to e.g. lsa().
● Data Source: CranContrib
● Keywords: array, datasets
● Alias: corpus_essays, corpus_scores, corpus_training

alnumx (Package: lsa) : Regular expression for removal of non-alphanumeric characters (saving special characters)

This character string contains a regular expression for use in gsub, as deployed in textvector, that identifies all alphanumeric characters (including language-specific special characters not included in [:alnum:], currently only the ones found in German and Polish). You can use this expression by loading it with data(alnumx).
● Data Source: CranContrib
● Keywords: datasets
● Alias: alnumx
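As a rough sketch of how such an expression might be used with gsub: the pattern below is a hypothetical stand-in, not the actual value of alnumx. It assumes a negated character class that keeps [:alnum:] plus a few German letters (written as Unicode escapes) and replaces everything else with a space.

```r
# Hypothetical stand-in pattern: anything that is NOT alphanumeric and NOT
# one of the listed German letters (a-umlaut, o-umlaut, u-umlaut, sharp s).
# With lsa installed, data(alnumx) would provide the real expression.
pattern <- "[^[:alnum:]\u00e4\u00f6\u00fc\u00df]"

raw <- "Gr\u00fc\u00dfe, Welt!"

# Replace every character matched by the pattern with a single space.
stripped <- gsub(pattern, " ", raw)
print(stripped)
```

Punctuation is replaced by spaces rather than deleted, so word boundaries survive for later tokenization.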

stopwords (Package: lsa) : Stop word lists in German, English, Dutch, French, Polish, and Arabic

These data sets contain lists of very common words that should be ignored when building up a document-term matrix. The stop word lists can be loaded by calling data(stopwords_en), data(stopwords_de), data(stopwords_nl), data(stopwords_ar), etc. The objects stopwords_de, stopwords_en, stopwords_nl, stopwords_ar, etc. must already exist before being handed over to textmatrix().
● Data Source: CranContrib
● Keywords: array, datasets
● Alias: stopwords_ar, stopwords_de, stopwords_en, stopwords_fr, stopwords_nl, stopwords_pl
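The effect of a stop word list on term counting can be illustrated in base R. This is a sketch under stated assumptions: stop_en is a tiny hypothetical stand-in for stopwords_en, and the tokenization is deliberately simple. It is not how textmatrix() is implemented.

```r
# Hypothetical stand-in for stopwords_en (the real vector comes from
# data(stopwords_en) in the lsa package and is much longer).
stop_en <- c("the", "a", "of", "and", "is")

doc <- "The structure of a latent semantic space is learned"

# Lowercase, then split on runs of non-alphanumeric characters.
tokens <- strsplit(tolower(doc), "[^[:alnum:]]+")[[1]]

# Drop every token that appears in the stop word list.
kept <- tokens[!(tokens %in% stop_en)]

# Term frequencies for the remaining tokens (one column of a document-term matrix).
term_freq <- table(kept)
print(kept)
```

With lsa installed, the loaded vector would typically be passed through the stopwords argument of textmatrix(), for example textmatrix(mydir, stopwords = stopwords_en), where mydir is a placeholder for a directory of text files.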