Last data update: 2014.03.03

generic.cv

R Documentation

Generic k-fold cross-validation

Description

Performs k-fold cross-validation 'nTimes' times for any specified algorithm, reporting the test error together with one other chosen metric (AUC, precision, ...).

Usage

generic.cv(X, Y, 
nTimes = 1, 
k = 10, 
seed = 2014, 
regression = TRUE, 
genericAlgo = NULL, 
specificPredictFunction = NULL, 
metrics = c("none", "AUC", "precision", "F-score", "L1", "geometric mean", 
"geometric mean (precision)"))

Arguments

`X`	a matrix or dataframe of observations
`Y`	a vector of observed responses (a factor for classification).
`nTimes`	number of times that k-fold cross-validation is to be performed.
`k`	number of folds.
`seed`	the seed for reproducibility.
`regression`	if TRUE, performs regression; if FALSE, classification.
`genericAlgo`	wrapper function, taking (X, Y) and returning the fitted model, that embeds the algorithm to assess. Options of the algorithm can optionally be set inside the wrapper. The default, NULL, is only a placeholder: a wrapper function is required to run cross-validation.
`specificPredictFunction`	if the assessed model does not support the R generic method 'predict', a function defining how predictions are generated must be supplied here.
`metrics`	an additional metric to report alongside the standard one, which is the test error (MSE for regression).
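
The wrapper arguments are easier to picture with a concrete pair of functions. The following is a minimal sketch using base R's lm(), assuming the calling convention seen in the examples of this page: the wrapper receives (X, Y) and returns a fitted model, and the prediction function receives (model, newdata). The names genericAlgo.lm and specificPredictFunction.lm are hypothetical, and the check uses a single hold-out split rather than generic.cv itself.

```r
# Hypothetical wrappers for base R's lm(); the names are illustrative only.
genericAlgo.lm <- function(X, Y) {
  # Fit Y on all columns of X; the response is stored as column 'Y'
  lm(Y ~ ., data = data.frame(Y = Y, X))
}

specificPredictFunction.lm <- function(model, newdata) {
  # Return one numeric prediction per row of newdata
  as.numeric(predict(model, newdata = data.frame(newdata)))
}

# Check the wrappers on a single hold-out split of a built-in data set
data(mtcars)
X <- mtcars[, c("wt", "hp")]
Y <- mtcars$mpg
testIdx <- seq(1, nrow(X), by = 4)   # every fourth row is held out

model <- genericAlgo.lm(X[-testIdx, ], Y[-testIdx])
predictions <- specificPredictFunction.lm(model, X[testIdx, ])
holdoutMSE <- mean((predictions - Y[testIdx])^2)
```

If the wrappers behave as expected here, they could presumably be passed on as generic.cv(X, Y, genericAlgo = genericAlgo.lm, specificPredictFunction = specificPredictFunction.lm, regression = TRUE). Since lm models already support 'predict', the second argument is likely unnecessary in this case and is shown only to illustrate its shape.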

Value

a list with the following components:

`testError`	the values of test error.
`avgError`	mean of test error.
`stdDev`	standard deviation of test error.
`metric`	values of the other chosen metric.
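
To show how components of this kind relate to one another, here is a minimal sketch of k-fold cross-validation for regression in base R. It is not the code of generic.cv: the function simpleKFold and its arguments are hypothetical, and it assumes one test error per fold, which this page does not state explicitly.

```r
# Illustrative k-fold cross-validation in base R; NOT the code of generic.cv.
# simpleKFold and all names inside it are hypothetical.
simpleKFold <- function(X, Y, k = 10, seed = 2014, fitFunction, predictFunction) {
  set.seed(seed)
  n <- nrow(X)
  # Assign each observation at random to one of k folds of near-equal size
  folds <- sample(rep(seq_len(k), length.out = n))
  testError <- numeric(k)
  for (i in seq_len(k)) {
    testIdx <- which(folds == i)
    # Train on the other k - 1 folds, predict on the held-out fold
    model <- fitFunction(X[-testIdx, , drop = FALSE], Y[-testIdx])
    predictions <- predictFunction(model, X[testIdx, , drop = FALSE])
    # Mean squared error on the held-out fold (regression case)
    testError[i] <- mean((predictions - Y[testIdx])^2)
  }
  list(testError = testError, avgError = mean(testError), stdDev = sd(testError))
}

data(mtcars)
cvResult <- simpleKFold(
  X = mtcars[, c("wt", "hp")], Y = mtcars$mpg, k = 5,
  fitFunction = function(X, Y) lm(Y ~ ., data = data.frame(Y = Y, X)),
  predictFunction = function(model, newdata) as.numeric(predict(model, newdata))
)
```

In this sketch, avgError and stdDev are simply the mean and standard deviation of the per-fold values in testError. How generic.cv aggregates errors when nTimes is greater than 1 is not described on this page.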

Author(s)

Saip Ciss saip.ciss@wanadoo.fr

Examples

## not run
# data(iris)
# Y <- iris$Species
# X <- iris[,-which(colnames(iris) == "Species")]

## 10-fold cross-validation for the randomUniformForest algorithm:

## create the wrapper function (setting 'threads = 1' since data are small)
# genericAlgo.ruf <- function(X, Y) randomUniformForest(X, Y, 
# OOB = FALSE, importance = FALSE, threads = 1)

## run
# rUF.10cv.iris <- generic.cv(X, as.factor(Y), 
# genericAlgo = genericAlgo.ruf, regression = FALSE)
  
## 10-fold cross-validation for the randomForest algorithm:

## create the wrapper function
# if (!require(randomForest)) { install.packages("randomForest"); library(randomForest) }
# genericAlgo.rf <- function(X, Y) randomForest(X, Y)

## run
# RF.10cv.iris <- generic.cv(X, as.factor(Y), 
# genericAlgo = genericAlgo.rf, regression = FALSE)

## 10-fold cross-validation for Gradient Boosting Machines algorithm (gbm package)

## create the wrapper function
# if (!require(gbm)) { install.packages("gbm"); library(gbm) }
# genericAlgo.gbm <- function(X, Y) gbm.fit(X, Y, distribution = "multinomial",
# n.trees = 500, shrinkage = 0.05, interaction.depth = 24, n.minobsinnode = 1) 

## create a wrapper for the prediction function of gbm
# nClasses = length(unique(Y))
# specificPredictFunction.gbm <- function(model, newdata)
# {
#	modelPrediction = predict(model, newdata, 500) 
#	predictions = matrix(modelPrediction, ncol = nClasses )
#	colnames(predictions) = colnames(modelPrediction)
#	return(as.factor(apply(predictions, 1, function(Z) names(which.max(Z)))))
# }

## run
# gbm.10cv.iris <- generic.cv(X, Y, genericAlgo = genericAlgo.gbm, 
# specificPredictFunction = specificPredictFunction.gbm, regression = FALSE)

## 10-fold cross-validation for CART algorithm (rpart package):

## create the wrapper function ('...' passes optional arguments on to rpart)
# library(rpart)
# genericAlgo.CART <- function(X, Y, ...)
# {
#	ZZ = data.frame(Y, X)
#	if (is.factor(Y)) { modelObject = rpart(Y ~ ., data = ZZ, method = "class", ...) }
#	else { modelObject = rpart(Y ~ ., data = ZZ, ...) }
#	return(modelObject)
# }

# specificPredictFunction.CART <- function(model, newdata)
# predict(model, data.frame(newdata), type= "vector")

# CART.10cv.iris <- generic.cv(X, as.factor(Y), genericAlgo = genericAlgo.CART, 
# specificPredictFunction = specificPredictFunction.CART, regression = FALSE)