A vector, matrix, list, or data frame containing the predictions.
labels
A vector, matrix, list, or data frame containing the true class labels. Must have the same dimensions as predictions.
label.ordering
The default ordering of the classes can be changed by supplying a vector containing the negative and the positive class label (negative label first, positive label second).
folds
If specified, this must be either a vector of fold ids with the same length as predictions and labels, or a list of length V (for V-fold cross-validation) whose elements are vectors of indexes of the observations in each fold. Specify folds only when the predictions and labels arguments are vectors.
confidence
A number between 0 and 1 giving the confidence level of the interval (for example, 0.95 for a 95% confidence interval).
Details
See the documentation for the prediction function in the ROCR package for details on the predictions, labels and label.ordering arguments.
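The two forms accepted by the folds argument carry the same information. The following is a minimal base R sketch of converting between them; the toy fold assignment and the names fold_ids, fold_list and ids_from_list are hypothetical and chosen only for illustration.

```r
# Hypothetical toy setup: 6 observations assigned to V = 3 folds.
fold_ids <- c(1, 2, 3, 1, 2, 3)   # form 1: one fold id per observation

# Form 2: a list of length V; element v holds the indexes of the
# observations that belong to fold v.
fold_list <- split(seq_along(fold_ids), fold_ids)
str(fold_list)

# Converting back from the list form to a vector of fold ids.
ids_from_list <- integer(length(fold_ids))
for (v in seq_along(fold_list)) {
  ids_from_list[fold_list[[v]]] <- v
}
stopifnot(all(ids_from_list == fold_ids))
```

Either fold_ids or fold_list could then be passed as folds, provided predictions and labels are vectors of the same length.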
Value
A list containing the following named elements:
cvAUC
Cross-validated area under the curve estimate.
se
Standard error.
ci
A vector of length two containing the lower and upper bounds of the confidence interval.
confidence
A number between 0 and 1 giving the confidence level used for the interval.
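The returned elements can be related to one another with a short calculation. The sketch below assumes a normal-approximation (Wald-type) interval truncated to [0, 1]. That form is an assumption made for illustration rather than something stated on this page, although it is consistent with the values printed in the Examples section, which are hard-coded here.

```r
# Values copied from the example output on this page.
cvAUC      <- 0.9046473
se         <- 0.01620238
confidence <- 0.95

# Assumed form: estimate +/- z * se, with z the standard normal quantile
# for a two-sided interval at the given confidence level.
z  <- qnorm(1 - (1 - confidence) / 2)
ci <- c(cvAUC - z * se, cvAUC + z * se)

# An AUC lies in [0, 1], so truncate the bounds to that range.
ci <- pmin(pmax(ci, 0), 1)
print(ci)
```

Under this assumption, a larger confidence value gives a larger z and therefore a wider interval around the same cvAUC estimate.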
References
LeDell, Erin; Petersen, Maya L.; and van der Laan, Mark J., “Computationally Efficient Confidence Intervals for Cross-validated Area Under the ROC Curve Estimates.” (December 2012). U.C. Berkeley Division of Biostatistics Working Paper Series. Working Paper 304. http://biostats.bepress.com/ucbbiostat/paper304
M. J. van der Laan and S. Rose. Targeted Learning: Causal Inference for Observational and Experimental Data. Springer Series in Statistics. Springer, first edition, 2011.
Tobias Sing, Oliver Sander, Niko Beerenwinkel, and Thomas Lengauer. ROCR: Visualizing classifier performance in R. Bioinformatics, 21(20):3940-3941, 2005.
See Also
prediction, performance, cvAUC, ci.pooled.cvAUC
Examples
# This i.i.d. data example does the following:
# Load a data set with a binary outcome. For the i.i.d. case we use a simulated data set of
# 500 observations, included with the package, of graduate admissions data.
#
# Divide the indices randomly into 10 folds, stratifying by outcome. Stratification is not
# necessary, but is commonly performed in order to create validation folds with similar
# distributions. Store this information in a list called folds.
#
# Define a function to fit a model on the training data and to generate predicted values
# for the observations in the validation fold, for a single iteration of the cross-validation
# procedure. We use a logistic regression fit.
#
# Apply this function across all folds to generate predicted values for each validation fold.
# The concatenated version of these predicted values is stored in a vector called predictions.
# The outcome vector, Y, is the labels argument.
iid_example <- function(data, V=10){
  .cvFolds <- function(Y, V){  # Create CV folds (stratify by outcome)
    Y0 <- split(sample(which(Y==0)), rep(1:V, length.out=length(which(Y==0))))
    Y1 <- split(sample(which(Y==1)), rep(1:V, length.out=length(which(Y==1))))
    folds <- vector("list", length=V)
    for (v in seq(V)) {folds[[v]] <- c(Y0[[v]], Y1[[v]])}
    return(folds)
  }
  .doFit <- function(v, folds, data){  # Train/test glm for each fold
    fit <- glm(Y~., data=data[-folds[[v]],], family=binomial)
    pred <- predict(fit, newdata=data[folds[[v]],], type="response")
    return(pred)
  }
  folds <- .cvFolds(Y=data$Y, V=V)  # Create folds
  # CV train/predict; lapply always returns a list, even when all folds have equal size
  predictions <- unlist(lapply(seq(V), .doFit, folds=folds, data=data))
  predictions[unlist(folds)] <- predictions  # Re-order pred values to match the rows of data
  # Get CV AUC and confidence interval
  out <- ci.cvAUC(predictions=predictions, labels=data$Y, folds=folds, confidence=0.95)
  return(out)
}
# Load data
library(cvAUC)
data(admissions)
# Get performance
set.seed(1)
out <- iid_example(data=admissions, V=10)
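# The output is given as follows (exact values can differ across R versions,
# because the default behavior of sample() changed in R 3.6.0):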
# > out
# $cvAUC
# [1] 0.9046473
#
# $se
# [1] 0.01620238
#
# $ci
# [1] 0.8728913 0.9364034
#
# $confidence
# [1] 0.95