R: Smooth Parameter Estimation and Bootstrapping of Generalized...
game
R Documentation
Smooth Parameter Estimation and Bootstrapping of Generalized Pareto Distributions
with Penalized Maximum Likelihood Estimation
Description
gamGPDfit() fits the parameters of a generalized Pareto
distribution (GPD) depending on covariates in a non- or semiparametric
way.
gamGPDboot() fits and bootstraps the parameters of a GPD
distribution depending on covariates in a non- or semiparametric
way. Applies the post-blackend bootstrap of Chavez-Demoulin and
Davison (2005).
data.frame containing the losses (in some component; can be
specified with the argument datvar; the other components
contain the covariates).
B
number of bootstrap replications.
threshold
threshold of the peaks-over-threshold (POT)
method.
nextremes
number of excesses. This can be used to determine
datvar
name of the data column in x which contains the
the data to be modeled.
xiFrhs
right-hand side of the formula for xi in
the gam() call for fitting xi.
nuFrhs
right-hand side of the formula for nu in
the gam() call for fitting nu.
init
bivariate vector containing initial values
for (xi, beta).
niter
maximal number of iterations in the backfitting
algorithm.
include.updates
logical indicating whether
updates for xi and nu are returned as well (note: this might lead to
objects of large size).
eps.xi
epsilon for stop criterion for xi.
eps.nu
epsilon for stop criterion for nu.
boot.progress
logical indicating
whether progress information about gamGPDboot() is displayed.
progress
logical indicating whether progress information about
gamGPDfit() is displayed. For gamGPDboot(),
progress is only passed to gamGPDfit() in the case that
boot.progress==TRUE.
adjust
logical indicating whether non-real values
of the derivatives are adjusted.
verbose
logical indicating whether additional
information (in case of undesired behavior) is printed. For gamGPDboot(),
progress is only passed to gamGPDfit() if
boot.progress==TRUE.
debug
logical indicating whether initial fit
(before the bootstrap is initiated) is saved.
...
additional arguments passed to gam() (which is
called internally; see the source code of gamGPDfitUp()).
Details
gamGPDfit() fits the parameters xi and
beta of the generalized Pareto distribution
GPD(xi,beta) depending on covariates in
a non- or semiparametric way. The distribution function is given by
G[xi,beta](x)=1-(1+xi x/beta)^(-1/xi), x>=0,
for xi>0 (which is what we assume) and
beta>0. Note that β is also denoted by
σ in this package. Estimation of xi
and beta by gamGPDfit() is done via penalized maximum
likelihood estimation, where the estimators are computed with a
backfitting algorithm. In order to guarantee convergence of this
algorithm, a reparameterization of beta in terms of the parameter
nu is done via
beta=exp(nu)/(1+xi).
The parameters xi and nu (and thus
beta) are allowed to depend on covariates (including
time) in a non- or semiparametric way, for example:
xi=xi(x,t)=x^Talpha[xi]+h[xi](t),
nu=nu(x,t)=x^Talpha[nu]+h[nu](t),
where x denotes the vector of covariates,
alpha[xi], alpha[nu]
are parameter vectors and h[xi], h[nu] are
regression splines. For more details, see the references and the source
code.
gamGPDboot() first fits the GPD parameters via
gamGPDfit(). It then conducts the post-blackend bootstrap of
Chavez-Demoulin and Davison (2005). To this end, it computes the
residuals, resamples them (B times), reconstructs the
corresponding excesses, and refits the GPD parameters via
gamGPDfit() again.
Note that if gam() fails in gamGPDfit() or the
fitting or one of the bootstrap replications in gamGPDboot(),
then the output object contains (an) empty (sub)list(s). These
failures typically happen for too small sample sizes.
Value
gamGPDfit() returns either an empty list (list(); in
case at least one of the two gam() calls in the internal
function gamGPDfitUp() fails) or a list with the components
xi:
estimated parameters xi;
beta:
estimated parameters beta;
nu:
estimated parameters nu;
se.xi:
standard error for xi ((possibly
adjusted) second-order derivative of the reparameterized
log-likelihood with respect to xi) multiplied by -1;
se.nu:
standard error for nu ((possibly
adjusted) second-order derivative of the reparameterized
log-likelihood with respect to nu) multiplied by -1;
xi.covar:
(unique) covariates for xi;
nu.covar:
(unique) covariates for nu;
covar:
available covariate combinations used for
fitting beta (xi, nu);
y:
vector of excesses (exceedances minus threshold);
res:
residuals;
MRD:
mean relative distances between for all
iterations, calculated between old parameters (xi,
nu) (from the last iteration) and new parameters (currently
estimated ones);
logL:
log-likelihood at the estimated parameters;
xiObj:
R object of type gamObject for estimated
xi (returned by mgcv::gam());
nuObj:
R object of type gamObject for estimated
nu (returned by mgcv::gam());
xiUpdates:
if include.updates is
TRUE, updates for xi for each
iteration. This is a list of R objects of type gamObject
which contains xiObj as last element;
nuUpdates:
if include.updates is
TRUE, updates for nu for each
iteration. This is a list of R objects of type gamObject
which contains nuObj as last element;
gamGPDboot() returns a list of length B+1 where
the first component contains the results of
the initial fit via gamGPDfit() and the other B
components contain the results for each replication of the
post-blackend bootstrap. Components for which gam()
fails (e.g., due to too few data) are given as empty lists (list()).
Author(s)
Marius Hofert, Valerie Chavez-Demoulin.
References
Chavez-Demoulin, V., and Davison, A. C. (2005).
Generalized additive models for sample extremes.
Applied Statistics54(1), 207–222.
Chavez-Demoulin, V., Embrechts, P., and Hofert, M. (2014).
An extreme value approach for modeling Operational
Risk losses depending on covariates.
Journal of Risk and Insurance; accepted.
Examples
## generate an example data set
years <- 2003:2012 # years
nyears <- length(years)
n <- 250 # sample size for each (different) xi
u <- 200 # threshold
rGPD <- function(n, xi, beta) ((1-runif(n))^(-xi)-1)*beta/xi # sampling GPD
set.seed(17) # setting seed
xi.true.A <- seq(0.4, 0.8, length=nyears) # true xi for group "A"
## generate losses for group "A"
lossA <- unlist(lapply(1:nyears,
function(y) u + rGPD(n, xi=xi.true.A[y], beta=1)))
xi.true.B <- xi.true.A^2 # true xi for group "B"
## generate losses for group "B"
lossB <- unlist(lapply(1:nyears,
function(y) u + rGPD(n, xi=xi.true.B[y], beta=1)))
## build data frame
time <- rep(rep(years, each=n), 2) # "2" stands for the two groups
covar <- rep(c("A","B"), each=n*nyears)
value <- c(lossA, lossB)
x <- data.frame(covar=covar, time=time, value=value)
## fit
eps <- 1e-3 # to decrease the run time for this example
require(mgcv) # due to s()
fit <- gamGPDfit(x, threshold=u, datvar="value", xiFrhs=~covar+s(time)-1,
nuFrhs=~covar+s(time)-1, eps.xi=eps, eps.nu=eps)
## note: choosing s(..., bs="cr") will fit cubic splines
## grab the fitted values per group and year
xi.fit <- fitted(fit$xiObj)
xi.fit. <- xi.fit[1+(0:(2*nyears-1))*n] # pick fit for each group and year
xi.fit.A <- xi.fit.[1:nyears] # fit for "A" and each year
xi.fit.B <- xi.fit.[(nyears+1):(2*nyears)] # fit for "B" and each year
## plot the fitted values of xi and the true ones we simulated from
par(mfrow=c(1,2))
plot(years, xi.true.A, type="l", ylim=range(xi.true.A, xi.fit.A),
main="Group A", xlab="Year", ylab=expression(xi))
points(years, xi.fit.A, type="l", col="red")
legend("topleft", inset=0.04, lty=1, col=c("black", "red"),
legend=c("true", "fitted"), bty="n")
plot(years, xi.true.B, type="l", ylim=range(xi.true.B, xi.fit.B),
main="Group B", xlab="Year", ylab=expression(xi))
points(years, xi.fit.B, type="l", col="blue")
legend("topleft", inset=0.04, lty=1, col=c("black", "blue"),
legend=c("true", "fitted"), bty="n")
Results
R version 3.3.1 (2016-06-21) -- "Bug in Your Hair"
Copyright (C) 2016 The R Foundation for Statistical Computing
Platform: x86_64-pc-linux-gnu (64-bit)
R is free software and comes with ABSOLUTELY NO WARRANTY.
You are welcome to redistribute it under certain conditions.
Type 'license()' or 'licence()' for distribution details.
R is a collaborative project with many contributors.
Type 'contributors()' for more information and
'citation()' on how to cite R or R packages in publications.
Type 'demo()' for some demos, 'help()' for on-line help, or
'help.start()' for an HTML browser interface to help.
Type 'q()' to quit R.
> library(QRM)
Loading required package: gsl
Loading required package: Matrix
Loading required package: mvtnorm
Loading required package: numDeriv
Loading required package: timeSeries
Loading required package: timeDate
Attaching package: 'QRM'
The following object is masked from 'package:base':
lbeta
> png(filename="/home/ddbj/snapshot/RGM3/R_CC/result/QRM/game.Rd_%03d_medium.png", width=480, height=480)
> ### Name: game
> ### Title: Smooth Parameter Estimation and Bootstrapping of Generalized
> ### Pareto Distributions with Penalized Maximum Likelihood Estimation
> ### Aliases: gamGPDfit gamGPDboot
> ### Keywords: generalized Pareto distribution distribution multivariate
>
> ### ** Examples
>
> ## generate an example data set
> years <- 2003:2012 # years
> nyears <- length(years)
> n <- 250 # sample size for each (different) xi
> u <- 200 # threshold
> rGPD <- function(n, xi, beta) ((1-runif(n))^(-xi)-1)*beta/xi # sampling GPD
>
> set.seed(17) # setting seed
> xi.true.A <- seq(0.4, 0.8, length=nyears) # true xi for group "A"
> ## generate losses for group "A"
> lossA <- unlist(lapply(1:nyears,
+ function(y) u + rGPD(n, xi=xi.true.A[y], beta=1)))
> xi.true.B <- xi.true.A^2 # true xi for group "B"
> ## generate losses for group "B"
> lossB <- unlist(lapply(1:nyears,
+ function(y) u + rGPD(n, xi=xi.true.B[y], beta=1)))
> ## build data frame
> time <- rep(rep(years, each=n), 2) # "2" stands for the two groups
> covar <- rep(c("A","B"), each=n*nyears)
> value <- c(lossA, lossB)
> x <- data.frame(covar=covar, time=time, value=value)
>
> ## fit
> eps <- 1e-3 # to decrease the run time for this example
> require(mgcv) # due to s()
Loading required package: mgcv
Loading required package: nlme
This is mgcv 1.8-12. For overview type 'help("mgcv-package")'.
> fit <- gamGPDfit(x, threshold=u, datvar="value", xiFrhs=~covar+s(time)-1,
+ nuFrhs=~covar+s(time)-1, eps.xi=eps, eps.nu=eps)
Mean relative differences in iteration 1 (xi, nu): (0.134418, 0.267671)
Mean relative differences in iteration 2 (xi, nu): (0.0902781, 0.00395381)
Mean relative differences in iteration 3 (xi, nu): (0.047209, 0.00224907)
Mean relative differences in iteration 4 (xi, nu): (0.0237896, 0.000663061)
Mean relative differences in iteration 5 (xi, nu): (0.0117614, 0.000237491)
Mean relative differences in iteration 6 (xi, nu): (0.00578603, 9.70989e-05)
Mean relative differences in iteration 7 (xi, nu): (0.00285222, 4.36456e-05)
Mean relative differences in iteration 8 (xi, nu): (0.0014202, 2.06884e-05)
Mean relative differences in iteration 9 (xi, nu): (0.00070896, 1.01373e-05)
> ## note: choosing s(..., bs="cr") will fit cubic splines
>
> ## grab the fitted values per group and year
> xi.fit <- fitted(fit$xiObj)
> xi.fit. <- xi.fit[1+(0:(2*nyears-1))*n] # pick fit for each group and year
> xi.fit.A <- xi.fit.[1:nyears] # fit for "A" and each year
> xi.fit.B <- xi.fit.[(nyears+1):(2*nyears)] # fit for "B" and each year
>
> ## plot the fitted values of xi and the true ones we simulated from
> par(mfrow=c(1,2))
> plot(years, xi.true.A, type="l", ylim=range(xi.true.A, xi.fit.A),
+ main="Group A", xlab="Year", ylab=expression(xi))
> points(years, xi.fit.A, type="l", col="red")
> legend("topleft", inset=0.04, lty=1, col=c("black", "red"),
+ legend=c("true", "fitted"), bty="n")
> plot(years, xi.true.B, type="l", ylim=range(xi.true.B, xi.fit.B),
+ main="Group B", xlab="Year", ylab=expression(xi))
> points(years, xi.fit.B, type="l", col="blue")
> legend("topleft", inset=0.04, lty=1, col=c("black", "blue"),
+ legend=c("true", "fitted"), bty="n")
>
>
>
>
>
> dev.off()
null device
1
>