R Graphical Manual

Last data update: 2014.03.03

ZIClass

R Documentation

Create and manipulate 'ZIClass' objects

Description

'ZIClass' objects are key items in ZooImage. They contain everything required to automatically classify plankton from .zid files. All users can use them as black boxes, although building them requires users trained in machine learning techniques. Hence, ZooImage is made very simple for biologists who just want to use classifiers without worrying about the complexities of what happens inside the engine!

Usage

ZIClass(formula, data, method = getOption("ZI.mlearning", "mlRforest"),
    calc.vars = getOption("ZI.calcVars", calcVars), drop.vars = NULL,
    drop.vars.def = dropVars(), cv.k = 10, cv.strat = TRUE,
    ..., subset, na.action = na.omit)

## S3 method for class 'ZIClass'
print(x, ...)
## S3 method for class 'ZIClass'
summary(object, sort.by = "Fscore", decreasing = TRUE,
    na.rm = FALSE, ...)
## S3 method for class 'ZIClass'
predict(object, newdata, calc = TRUE, class.only = TRUE,
    type = "class", ...)
## S3 method for class 'ZIClass'
confusion(x, y = response(x), labels = c("Actual", "Predicted"),
    useNA = "ifany", prior, use.cv = TRUE, ...)

Arguments

`formula`	a formula whose left-hand side is the class variable and whose right-hand side lists the predictor variables, separated by '+' signs. Since `data` is expected to be filtered beforehand using `calc.vars`, and the class variable in a 'ZITrain' object is always named `Class`, the formula almost always reduces to `Class ~ .`
`data`	a data frame (usually a 'ZITrain' object) containing both the measurements and the manual classification (a factor variable usually named 'Class').
`method`	the machine learning method to use. It should produce results compatible with `mlearning` objects, as returned by the various `mlXXX()` functions in the `mlearning` package. By default, the random forest algorithm is used (it is among the algorithms that give the best results with plankton).
`calc.vars`	a function to use to calculate variables from the original data frame.
`drop.vars`	a character vector with names of variables to drop for the classification, or `NULL` (by default) to keep them all.
`drop.vars.def`	a second list of variables to drop, given as a character vector. It is meant to hold the names of variables that are obviously non-informative and are dropped by default. It can be obtained automatically using `dropVars()`. See `?calcVars` for more details.
`cv.k`	the number of folds (k) for k-fold cross-validation.
`cv.strat`	should stratified sampling be used for cross-validation (recommended)?
`...`	further arguments to pass to the classification algorithm (see help of that particular function).
`subset`	an expression for subsetting the original data frame.
`na.action`	the function used to filter the initial data frame for missing values. Although the default in R is `na.fail`, which fails if at least one `NA` is found in the data frame, the default here is `na.omit`, which eliminates all rows containing at least one `NA`. If your dataset contains many `NA`s, check how many items remain!
`x`	a 'ZIClass' object.
`object`	a 'ZIClass' object.
`newdata`	a 'ZIDat' object, or a 'data.frame' to use for prediction.
`sort.by`	the statistic used to sort the table (F-score by default).
`decreasing`	should the table be sorted in decreasing order (`TRUE`, the default) or in increasing order (`FALSE`)?
`na.rm`	do we eliminate entries with missing data first (using `na.omit()`)?
`calc`	a boolean indicating if variables have to be recalculated before running the prediction.
`class.only`	if `TRUE`, return just a vector with the classification; otherwise, return the 'ZIDat' object with a 'Predicted' column appended to it.
`type`	the type of result to return, `"class"` by default. No other value is permitted if `class.only` is `FALSE`.
`y`	a factor with reference classes.
`labels`	labels to use for, respectively, the reference class and the predicted class.
`useNA`	do we keep NAs as a separate category? The default `"ifany"` creates this category only if there are missing values. Other possibilities are `"no"`, or `"always"`. The default is suitable for test sets because unclassified items (those in the "_" directory or one of its subdirectories) get `NA` for Class.
`prior`	class frequencies to use for the first classifier, which is tabulated in the rows of the confusion matrix. This is either a single positive number that sets all class frequencies to this value (use 1 for relative frequencies and 100 for relative frequencies in percent), or a vector of positive numbers of the same length as the levels in the object. If the vector is named, the names must match the levels. Alternatively, providing `NULL` or an object of length zero resets the row class frequencies to their initial values.
`use.cv`	whether to use the cross-validated predictions (`TRUE`, the default) or the predictions on the training set itself, both extracted from the 'ZIClass' object. Most of the time, you want the cross-validated predictions, which give an unbiased (or less biased) evaluation of the classifier's predictions. If in doubt, keep the default value.
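How the `.` shorthand in `Class ~ .` expands can be checked with base R alone. This minimal sketch uses a small hypothetical data frame standing in for a 'ZITrain' object (the column names `Area` and `Perim` and all values are made up) and asks `terms()` which predictors the dot stands for.

```r
# Hypothetical stand-in for a 'ZITrain' data frame: two measurement columns
# plus the factor 'Class' holding the manual identifications.
train <- data.frame(
  Area  = c(120, 340, 95, 410),
  Perim = c(45, 80, 38, 92),
  Class = factor(c("Copepoda", "Diatom", "Copepoda", "Diatom"))
)

# With 'data' supplied, terms() expands '.' to every column except the response
predictors <- attr(terms(Class ~ ., data = train), "term.labels")
print(predictors)
```

Any column left in `data` becomes a predictor, which is why unwanted variables are removed through `drop.vars` and `drop.vars.def` rather than by editing the formula.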
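The idea behind `cv.k` and `cv.strat` can be illustrated with a small base R sketch of stratified fold assignment. It is not necessarily how zooimage or `mlearning` implement it; the function `stratified_folds()` is hypothetical. Within each class, items are shuffled and dealt into the k folds in turn, so every fold keeps roughly the class proportions of the whole training set.

```r
# Hypothetical helper: assign each item to one of k folds, class by class
stratified_folds <- function(cls, k = 10) {
  cls <- as.factor(cls)
  folds <- integer(length(cls))
  for (lev in levels(cls)) {
    idx <- which(cls == lev)
    # shuffle the items of this class (safe even when there is a single item)
    idx <- idx[sample.int(length(idx))]
    # deal them into folds 1, 2, ..., k, 1, 2, ...
    folds[idx] <- rep_len(seq_len(k), length(idx))
  }
  folds
}

set.seed(42)
cls <- factor(rep(c("Copepoda", "Diatom", "Detritus"), times = c(30, 20, 10)))
folds <- stratified_folds(cls, k = 10)
# every fold gets 3 Copepoda, 2 Diatom and 1 Detritus
table(folds, cls)
```

Each fold is then held out in turn while the classifier is trained on the other k - 1 folds. Stratification matters most for rare classes, which could otherwise be missing entirely from some folds.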
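The difference between the R default `na.fail` and the `na.omit` default used here can be seen on a toy data frame with hypothetical values. The sketch also shows the check recommended for `na.action`: count the rows that remain after filtering.

```r
df <- data.frame(
  Area  = c(120, NA, 95, 410),
  Perim = c(45, 80, NA, 92),
  Class = factor(c("Copepoda", "Diatom", "Copepoda", "Diatom"))
)

# na.fail() raises an error as soon as one NA is present
res <- tryCatch(na.fail(df), error = function(e) conditionMessage(e))
print(res)

# na.omit() silently drops every row holding at least one NA
clean <- na.omit(df)
cat(nrow(df), "rows before,", nrow(clean), "rows after na.omit()\n")
```

Here two NAs in two different rows already remove half of the items, which is why the number of remaining items deserves a look.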
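The `useNA` and `prior` arguments of confusion() can be mimicked in base R to see what they do. This is a sketch with hypothetical data, not the zooimage implementation; `rescale_rows()` is a made-up helper. `table(useNA = "ifany")` adds an `<NA>` row only because the reference classes contain a missing value, and rescaling each row to a total of 100 mirrors the effect described for `prior = 100`.

```r
actual    <- factor(c("Copepoda", "Copepoda", "Diatom", NA, "Diatom", "Copepoda"))
predicted <- factor(c("Copepoda", "Diatom", "Diatom", "Diatom", "Diatom", "Copepoda"))

# Reference classes in rows, predictions in columns; an <NA> row appears
# because 'actual' has a missing value ('predicted' has none, so no <NA> column)
conf <- table(Actual = actual, Predicted = predicted, useNA = "ifany")
print(conf)

# Hypothetical helper: rescale each row so that it sums to 'prior'
# (a single number, or one value per row); rows must not be empty
rescale_rows <- function(tab, prior = 100) {
  props <- sweep(tab, 1, rowSums(tab), "/")     # each row now sums to 1
  sweep(props, 1, rep_len(prior, nrow(tab)), "*")
}

pct <- rescale_rows(conf, prior = 100)
print(round(pct, 1))
```

After rescaling, each row reads as the percentage of items of that reference class assigned to each predicted class, so rows become comparable whatever the original class sizes.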

Value

ZIClass() is the constructor that builds the 'ZIClass' object. print(), summary() and predict() are the methods to print the object, to calculate statistics on this classifier based on its confusion matrix, and to predict groups for ZooImage samples using one 'ZIClass' object, respectively. confusion() tabulates reference classes against predicted classes.
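As an illustration of the kind of per-class statistics summary() derives from a confusion matrix, this base R sketch computes recall, precision and F-score from a hypothetical 3-class matrix (reference classes in rows) and sorts the result as `sort.by = "Fscore", decreasing = TRUE` would. It uses one common definition of these statistics; the exact set of statistics and formulas used by zooimage may differ.

```r
# Hypothetical confusion matrix: actual classes in rows, predicted in columns
lev <- c("Copepoda", "Diatom", "Detritus")
conf <- matrix(c(40,  5,  5,
                  2, 45,  3,
                  8,  2, 40),
               nrow = 3, byrow = TRUE,
               dimnames = list(Actual = lev, Predicted = lev))

recall    <- diag(conf) / rowSums(conf)  # share of each actual class recovered
precision <- diag(conf) / colSums(conf)  # share of each predicted class that is correct
fscore    <- 2 * precision * recall / (precision + recall)

per_class <- data.frame(Recall = recall, Precision = precision,
                        Fscore = fscore, row.names = lev)
# sort by decreasing F-score
sorted <- per_class[order(per_class$Fscore, decreasing = TRUE), ]
print(round(sorted, 3))
```

The F-score is the harmonic mean of precision and recall, so a class only ranks high when the classifier both finds most of its items and rarely assigns foreign items to it.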

Note

Always analyze the properties, performance and limitations of a 'ZIClass' object carefully before using it to classify objects of one series. For instance, you can use confusion() to compare two classifiers, or an automatic classifier with a manual classification done by a taxonomist. Always respect the limitations in the use of a 'ZIClass' object (for instance, a classifier specific to one given series should not be used to classify items in a different series)! It is good practice to write a report documenting a 'ZIClass' object, together with the comments of the taxonomists who made the reference training set, and with details on the analysis of the classifier's performance.
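The basic idea of comparing an automatic classification with a manual one can be sketched in base R: cross-tabulate both factors over the same set of levels, compute the overall agreement, and list the items on which they disagree. The data are hypothetical, and this only approximates what confusion() reports.

```r
lev <- c("Copepoda", "Diatom", "Detritus")
manual    <- factor(c("Copepoda", "Diatom", "Diatom", "Detritus", "Copepoda", "Diatom"),
                    levels = lev)
automatic <- factor(c("Copepoda", "Diatom", "Detritus", "Detritus", "Copepoda", "Copepoda"),
                    levels = lev)

# Using the same levels for both factors keeps the diagonal meaningful
tab <- table(Manual = manual, Automatic = automatic)
print(tab)

agreement <- sum(diag(tab)) / sum(tab)  # proportion of identical labels
disagree  <- which(manual != automatic) # items worth checking by a taxonomist
cat("Agreement:", round(agreement, 3), "\n")
print(disagree)
```

Listing the disagreeing items, rather than only the agreement rate, gives taxonomists concrete cases to review when documenting a classifier.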

Author(s)

Philippe Grosjean <Philippe.Grosjean@umons.ac.be>

Examples

##TODO...