Last data update: 2014.03.03

R: Empirical Hit Rates for a Crowd of Forecasters

hitRate

R Documentation

Description

Calculates the empirical hit rate of each forecaster in a crowd over a testing set. The function takes the forecasters' probability integral transform (PIT) values, one per testing-set row, and the prediction interval of interest.

Usage

hitRate(matrixPIT, interval = c(0.25, 0.75))

Arguments

`matrixPIT`	An `ntest`-by-`nForecaster` matrix of PIT values, where `ntest` is the number of rows in the testing set and `nForecaster` is the number of forecasters. Each column holds one forecaster's PITs for the testing set. A PIT value is the forecaster's predictive CDF evaluated at the realized response in the testing set.
`interval`	Prediction interval of interest, given as a length-2 vector of lower and upper cumulative probabilities. The default `interval=c(0.25, 0.75)` is the central 50% prediction interval.
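As an illustration of how the two arguments might be assembled, the following base-R sketch builds a small `matrixPIT` by evaluating each forecaster's CDF at the realized responses. The responses, the two forecasters (`A` and `B`), and the name `interval80` are made up for this example and are not part of the package.

```r
# Realized responses for three testing-set rows (made-up numbers)
y <- c(-1.0, 0.0, 1.5)

# Forecaster A (assumed): the same N(0, 1) predictive distribution for every row
pitA <- pnorm(y, mean = 0, sd = 1)

# Forecaster B (assumed): an empirical predictive CDF built from five draws
drawsB <- c(-2, -1, 0, 1, 2)
pitB <- ecdf(drawsB)(y)

# ntest-by-nForecaster matrix: rows are testing-set rows, columns are forecasters
matrixPIT <- cbind(A = pitA, B = pitB)

# A central 80% prediction interval instead of the default central 50%
interval80 <- c(0.10, 0.90)
```

With the package loaded, these objects could then be passed as `hitRate(matrixPIT, interval = interval80)`.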

Value

`HR`	A vector of length `nForecaster` containing the empirical hit rates, one for each forecaster. A forecaster's empirical hit rate is the percentage of its PIT values that fall within [`interval[1]`, `interval[2]`], which is [0.25, 0.75] under the default.
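The definition above can be sketched in a few lines of base R. The helper `hit_rate_sketch` below is hypothetical and is not the package's own `hitRate` implementation; it returns each hit rate as a proportion between 0 and 1, and whether the package scales its result differently is not confirmed here.

```r
# Hypothetical helper illustrating the definition; not the package's hitRate()
hit_rate_sketch <- function(matrixPIT, interval = c(0.25, 0.75)) {
  matrixPIT <- as.matrix(matrixPIT)
  # TRUE where a PIT value lies inside the closed interval
  inside <- matrixPIT >= interval[1] & matrixPIT <= interval[2]
  # Share of TRUE values in each column, i.e. one value per forecaster
  colMeans(inside)
}

# Made-up PIT values for two forecasters over four testing-set rows
pit <- cbind(f1 = c(0.10, 0.30, 0.50, 0.90),
             f2 = c(0.40, 0.45, 0.55, 0.60))
hr <- hit_rate_sketch(pit)
print(hr)
```

Here two of the four PIT values of `f1` and all four of `f2` fall inside [0.25, 0.75].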

Author(s)

Yael Grushka-Cockayne, Victor Richmond R. Jose, Kenneth C. Lichtendahl Jr., and Huanghui Zeng.

References

Grushka-Cockayne Y, Jose VRR, Lichtendahl KC Jr. (2014). Ensembles of overfit and overconfident forecasts, working paper.

Examples

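# Load the packages and simulate the data
library(trimTrees) # provides trimTrees() and hitRate()
library(mlbench)   # provides mlbench.friedman1()
set.seed(201) # Can be removed; useful for replication
data <- as.data.frame(mlbench.friedman1(500, sd=1))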
summary(data)

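# Split into training (rows 1-400) and testing (rows 401-500) sets;
# column 11 is the response y, columns 1-10 are the predictors
train <- data[1:400, ]
test <- data[401:500, ]
xtrain <- train[,-11]
ytrain <- train[,11]
xtest <- test[,-11]
ytest <- test[,11]
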
# Run trimTrees
set.seed(201) # Can be removed; useful for replication
tt <- trimTrees(xtrain, ytrain, xtest, ytest, trim=0.15)

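# Hit rates computed from the trimTrees outputs
mean(hitRate(tt$treePITs))        # average hit rate across the individual trees
hitRate(tt$trimmedEnsemblePITs)   # hit rate of the trimmed ensemble
hitRate(tt$untrimmedEnsemblePITs) # hit rate of the untrimmed ensemble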
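
To help interpret such hit rates, the following simulation compares them with the nominal coverage `interval[2] - interval[1]`. It is a base-R sketch under assumed conditions: the responses are drawn from N(0, 1), and the two forecasters (`calibrated` and `overconfident`) are hypothetical. It does not call the package.

```r
set.seed(42)  # arbitrary seed, for replication only
n <- 10000
y <- rnorm(n)  # assumed truth: responses drawn from N(0, 1)

# PITs for two hypothetical forecasters
pit <- cbind(
  calibrated    = pnorm(y, mean = 0, sd = 1),    # forecast matches the truth
  overconfident = pnorm(y, mean = 0, sd = 0.5)   # forecast is too narrow
)

interval <- c(0.25, 0.75)
nominal <- interval[2] - interval[1]  # 0.5 for the central 50% interval

# Empirical hit rate (as a proportion) for each forecaster
hr <- colMeans(pit >= interval[1] & pit <= interval[2])
print(round(hr, 3))
```

A calibrated forecaster has uniformly distributed PITs, so its hit rate should be close to the nominal 0.5. The overconfident forecaster's intervals are too narrow, so its hit rate falls well below the nominal level.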