This function calculates the empirical hit rates for a crowd of forecasters over a testing set. It takes two arguments: the forecasters' probability integral transform (PIT) values, one for each testing set row, and the prediction interval of interest.
Usage
hitRate(matrixPIT, interval = c(0.25, 0.75))
Arguments
matrixPIT
An ntest-by-nForecaster matrix of PIT values, where ntest is the number of rows in the testing set and nForecaster is the number of forecasters. Each column holds one forecaster's PITs for the testing set. A PIT value is the forecaster's CDF evaluated at the realization of the response in the testing set.
interval
Prediction interval of interest, given as a two-element vector of lower and upper cumulative probabilities. The default interval=c(0.25, 0.75) is the central 50% prediction interval.
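As an illustration of the shape of matrixPIT, here is a minimal sketch assuming two hypothetical normal forecasters. The realizations, means, standard deviations, and variable names are invented for this example and are not part of the package. Each PIT is the forecaster's CDF (here pnorm) evaluated at the realized response.

```r
# Hypothetical realizations of the response for a testing set of 5 rows
y <- c(-1.2, 0.3, 0.0, 1.5, -0.4)

# Two hypothetical forecasters, each issuing a normal predictive distribution:
# forecaster 1 uses N(0, 1); forecaster 2 uses N(0, 0.5^2), a narrower forecast
pit1 <- pnorm(y, mean = 0, sd = 1)
pit2 <- pnorm(y, mean = 0, sd = 0.5)

# ntest-by-nForecaster matrix: one row per testing-set row,
# one column per forecaster
matrixPIT <- cbind(pit1, pit2)
dim(matrixPIT)
```

A matrix built this way has the layout described for the matrixPIT argument and, assuming the package is installed, could be passed to hitRate directly.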
Value
HR
A vector of length nForecaster containing the empirical hit rates, one for each forecaster. A forecaster's empirical hit rate is the percentage of PIT values that fall within [interval[1], interval[2]], e.g., [0.25, 0.75] under the default.
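The definition above can be sketched in base R as follows. This is a hypothetical re-implementation for illustration only, not the package's source code. The name hitRateSketch and the toy PIT values are assumptions, and the sketch returns proportions in [0, 1], which may differ from the scale the package reports.

```r
# Hypothetical re-implementation for illustration; not the package's source
hitRateSketch <- function(matrixPIT, interval = c(0.25, 0.75)) {
  # Treat a plain vector as a single forecaster (one column)
  matrixPIT <- as.matrix(matrixPIT)
  # TRUE where a PIT lies inside the closed interval
  inside <- matrixPIT >= interval[1] & matrixPIT <= interval[2]
  # Proportion of testing-set rows inside the interval, per forecaster
  unname(colMeans(inside))
}

# Toy PIT matrix: 4 testing-set rows, 2 forecasters
pits <- cbind(c(0.10, 0.30, 0.50, 0.70),
              c(0.05, 0.20, 0.60, 0.95))
hr <- hitRateSketch(pits)
hr  # 0.75 0.25
```

For a well-calibrated forecaster the PITs are uniform on [0, 1], so the hit rate for the central 50% interval should be close to one half. In this toy example the second forecaster's hit rate of 0.25 falls well below that, which is the pattern expected from an overconfident forecaster.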
Author(s)
Yael Grushka-Cockayne, Victor Richmond R. Jose, Kenneth C. Lichtendahl Jr., and Huanghui Zeng.
References
Grushka-Cockayne Y, Jose VRR, Lichtendahl KC Jr. (2014). Ensembles of overfit and overconfident forecasts, working paper.
See Also
trimTrees, cinbag
Examples
# Load the required packages
library(trimTrees)
library(mlbench) # Provides mlbench.friedman1
# Load the data
set.seed(201) # Can be removed; useful for replication
data <- as.data.frame(mlbench.friedman1(500, sd=1))
summary(data)
# Prepare data for trimming
train <- data[1:400, ]
test <- data[401:500, ]
xtrain <- train[,-11]
ytrain <- train[,11]
xtest <- test[,-11]
ytest <- test[,11]
# Run trimTrees
set.seed(201) # Can be removed; useful for replication
tt <- trimTrees(xtrain, ytrain, xtest, ytest, trim=0.15)
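# Hit rates computed from the trimTrees outputs
mean(hitRate(tt$treePITs)) # Average hit rate across the individual trees
hitRate(tt$trimmedEnsemblePITs) # Hit rate of the trimmed ensemble
hitRate(tt$untrimmedEnsemblePITs) # Hit rate of the untrimmed ensemble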