R Graphical Manual

Browse All

Last data update: 2014.03.03

R: Smooth Parameter Estimation and Bootstrapping of Generalized...

game	R Documentation

Smooth Parameter Estimation and Bootstrapping of Generalized Pareto Distributions with Penalized Maximum Likelihood Estimation

Description

gamGPDfit() fits the parameters of a generalized Pareto distribution (GPD) depending on covariates in a non- or semiparametric way.

gamGPDboot() fits and bootstraps the parameters of a GPD distribution depending on covariates in a non- or semiparametric way. Applies the post-blackend bootstrap of Chavez-Demoulin and Davison (2005).

Usage

gamGPDfit(x, threshold, nextremes = NULL, datvar, xiFrhs, nuFrhs,
          init = fit.GPD(x[,datvar], threshold = threshold,
                         type = "pwm", verbose = FALSE)$par.ests,
          niter = 32, include.updates = FALSE, eps.xi = 1e-05, eps.nu = 1e-05,
          progress = TRUE, adjust = TRUE, verbose = FALSE, ...)
gamGPDboot(x, B, threshold, nextremes = NULL, datvar, xiFrhs, nuFrhs,
           init = fit.GPD(x[,datvar], threshold = threshold,
                          type = "pwm", verbose = FALSE)$par.ests,
           niter = 32, include.updates = FALSE, eps.xi = 1e-5, eps.nu = 1e-5,
           boot.progress = TRUE, progress = FALSE, adjust = TRUE, verbose = FALSE,
           debug = FALSE, ...)

Arguments

`x`	data.frame containing the losses (in some component; can be specified with the argument `datvar`; the other components contain the covariates).
`B`	number of bootstrap replications.
`threshold`	threshold of the peaks-over-threshold (POT) method.
`nextremes`	number of excesses. This can be used to determine
`datvar`	name of the data column in `x` which contains the the data to be modeled.
`xiFrhs`	right-hand side of the formula for xi in the `gam()` call for fitting xi.
`nuFrhs`	right-hand side of the formula for nu in the `gam()` call for fitting nu.
`init`	bivariate vector containing initial values for (xi, beta).
`niter`	maximal number of iterations in the backfitting algorithm.
`include.updates`	`logical` indicating whether updates for xi and nu are returned as well (note: this might lead to objects of large size).
`eps.xi`	epsilon for stop criterion for xi.
`eps.nu`	epsilon for stop criterion for nu.
`boot.progress`	`logical` indicating whether progress information about `gamGPDboot()` is displayed.
`progress`	`logical` indicating whether progress information about `gamGPDfit()` is displayed. For `gamGPDboot()`, `progress` is only passed to `gamGPDfit()` in the case that `boot.progress==TRUE`.
`adjust`	`logical` indicating whether non-real values of the derivatives are adjusted.
`verbose`	`logical` indicating whether additional information (in case of undesired behavior) is printed. For `gamGPDboot()`, `progress` is only passed to `gamGPDfit()` if `boot.progress==TRUE`.
`debug`	`logical` indicating whether initial fit (before the bootstrap is initiated) is saved.
`...`	additional arguments passed to `gam()` (which is called internally; see the source code of `gamGPDfitUp()`).

Details

gamGPDfit() fits the parameters xi and beta of the generalized Pareto distribution GPD(xi,beta) depending on covariates in a non- or semiparametric way. The distribution function is given by

G[xi,beta](x)=1-(1+xi x/beta)^(-1/xi), x>=0,

for xi>0 (which is what we assume) and beta>0. Note that β is also denoted by σ in this package. Estimation of xi and beta by gamGPDfit() is done via penalized maximum likelihood estimation, where the estimators are computed with a backfitting algorithm. In order to guarantee convergence of this algorithm, a reparameterization of beta in terms of the parameter nu is done via

beta=exp(nu)/(1+xi).

The parameters xi and nu (and thus beta) are allowed to depend on covariates (including time) in a non- or semiparametric way, for example:

xi=xi(x,t)=x^Talpha[xi]+h[xi](t),

nu=nu(x,t)=x^Talpha[nu]+h[nu](t),

where x denotes the vector of covariates, alpha[xi], alpha[nu] are parameter vectors and h[xi], h[nu] are regression splines. For more details, see the references and the source code.

gamGPDboot() first fits the GPD parameters via gamGPDfit(). It then conducts the post-blackend bootstrap of Chavez-Demoulin and Davison (2005). To this end, it computes the residuals, resamples them (B times), reconstructs the corresponding excesses, and refits the GPD parameters via gamGPDfit() again.

Note that if gam() fails in gamGPDfit() or the fitting or one of the bootstrap replications in gamGPDboot(), then the output object contains (an) empty (sub)list(s). These failures typically happen for too small sample sizes.

Value

gamGPDfit() returns either an empty list (list(); in case at least one of the two gam() calls in the internal function gamGPDfitUp() fails) or a list with the components

xi:: estimated parameters xi;
beta:: estimated parameters beta;
nu:: estimated parameters nu;
se.xi:: standard error for xi ((possibly adjusted) second-order derivative of the reparameterized log-likelihood with respect to xi) multiplied by -1;
se.nu:: standard error for nu ((possibly adjusted) second-order derivative of the reparameterized log-likelihood with respect to nu) multiplied by -1;
xi.covar:: (unique) covariates for xi;
nu.covar:: (unique) covariates for nu;
covar:: available covariate combinations used for fitting beta (xi, nu);
y:: vector of excesses (exceedances minus threshold);
res:: residuals;
MRD:: mean relative distances between for all iterations, calculated between old parameters (xi, nu) (from the last iteration) and new parameters (currently estimated ones);
logL:: log-likelihood at the estimated parameters;
xiObj:: R object of type gamObject for estimated xi (returned by mgcv::gam());
nuObj:: R object of type gamObject for estimated nu (returned by mgcv::gam());
xiUpdates:: if include.updates is TRUE, updates for xi for each iteration. This is a list of R objects of type gamObject which contains xiObj as last element;
nuUpdates:: if include.updates is TRUE, updates for nu for each iteration. This is a list of R objects of type gamObject which contains nuObj as last element;

gamGPDboot() returns a list of length B+1 where the first component contains the results of the initial fit via gamGPDfit() and the other B components contain the results for each replication of the post-blackend bootstrap. Components for which gam() fails (e.g., due to too few data) are given as empty lists (list()).

Author(s)

Marius Hofert, Valerie Chavez-Demoulin.

References

Chavez-Demoulin, V., and Davison, A. C. (2005). Generalized additive models for sample extremes. Applied Statistics 54(1), 207–222.

Chavez-Demoulin, V., Embrechts, P., and Hofert, M. (2014). An extreme value approach for modeling Operational Risk losses depending on covariates. Journal of Risk and Insurance; accepted.

Examples

## generate an example data set
years <- 2003:2012 # years
nyears <- length(years)
n <- 250 # sample size for each (different) xi
u <- 200 # threshold
rGPD <- function(n, xi, beta) ((1-runif(n))^(-xi)-1)*beta/xi # sampling GPD

set.seed(17) # setting seed
xi.true.A <- seq(0.4, 0.8, length=nyears) # true xi for group "A"
## generate losses for group "A"
lossA <- unlist(lapply(1:nyears,
                       function(y) u + rGPD(n, xi=xi.true.A[y], beta=1)))
xi.true.B <- xi.true.A^2 # true xi for group "B"
## generate losses for group "B"
lossB <- unlist(lapply(1:nyears,
                       function(y) u + rGPD(n, xi=xi.true.B[y], beta=1)))
## build data frame
time <- rep(rep(years, each=n), 2) # "2" stands for the two groups
covar <- rep(c("A","B"), each=n*nyears)
value <- c(lossA, lossB)
x <- data.frame(covar=covar, time=time, value=value)

## fit
eps <- 1e-3 # to decrease the run time for this example
require(mgcv) # due to s()
fit <- gamGPDfit(x, threshold=u, datvar="value", xiFrhs=~covar+s(time)-1,
                 nuFrhs=~covar+s(time)-1, eps.xi=eps, eps.nu=eps)
## note: choosing s(..., bs="cr") will fit cubic splines

## grab the fitted values per group and year
xi.fit <- fitted(fit$xiObj)
xi.fit. <- xi.fit[1+(0:(2*nyears-1))*n] # pick fit for each group and year
xi.fit.A <- xi.fit.[1:nyears] # fit for "A" and each year
xi.fit.B <- xi.fit.[(nyears+1):(2*nyears)] # fit for "B" and each year

## plot the fitted values of xi and the true ones we simulated from
par(mfrow=c(1,2))
plot(years, xi.true.A, type="l", ylim=range(xi.true.A, xi.fit.A),
     main="Group A", xlab="Year", ylab=expression(xi))
points(years, xi.fit.A, type="l", col="red")
legend("topleft", inset=0.04, lty=1, col=c("black", "red"),
       legend=c("true", "fitted"), bty="n")
plot(years, xi.true.B, type="l", ylim=range(xi.true.B, xi.fit.B),
     main="Group B", xlab="Year", ylab=expression(xi))
points(years, xi.fit.B, type="l", col="blue")
legend("topleft", inset=0.04, lty=1, col=c("black", "blue"),
       legend=c("true", "fitted"), bty="n")

Results


R version 3.3.1 (2016-06-21) -- "Bug in Your Hair"
Copyright (C) 2016 The R Foundation for Statistical Computing
Platform: x86_64-pc-linux-gnu (64-bit)

R is free software and comes with ABSOLUTELY NO WARRANTY.
You are welcome to redistribute it under certain conditions.
Type 'license()' or 'licence()' for distribution details.

R is a collaborative project with many contributors.
Type 'contributors()' for more information and
'citation()' on how to cite R or R packages in publications.

Type 'demo()' for some demos, 'help()' for on-line help, or
'help.start()' for an HTML browser interface to help.
Type 'q()' to quit R.

> library(QRM)
Loading required package: gsl
Loading required package: Matrix
Loading required package: mvtnorm
Loading required package: numDeriv
Loading required package: timeSeries
Loading required package: timeDate

Attaching package: 'QRM'

The following object is masked from 'package:base':

    lbeta

> png(filename="/home/ddbj/snapshot/RGM3/R_CC/result/QRM/game.Rd_%03d_medium.png", width=480, height=480)
> ### Name: game
> ### Title: Smooth Parameter Estimation and Bootstrapping of Generalized
> ###   Pareto Distributions with Penalized Maximum Likelihood Estimation
> ### Aliases: gamGPDfit gamGPDboot
> ### Keywords: generalized Pareto distribution distribution multivariate
> 
> ### ** Examples
> 
> ## generate an example data set
> years <- 2003:2012 # years
> nyears <- length(years)
> n <- 250 # sample size for each (different) xi
> u <- 200 # threshold
> rGPD <- function(n, xi, beta) ((1-runif(n))^(-xi)-1)*beta/xi # sampling GPD
> 
> set.seed(17) # setting seed
> xi.true.A <- seq(0.4, 0.8, length=nyears) # true xi for group "A"
> ## generate losses for group "A"
> lossA <- unlist(lapply(1:nyears,
+                        function(y) u + rGPD(n, xi=xi.true.A[y], beta=1)))
> xi.true.B <- xi.true.A^2 # true xi for group "B"
> ## generate losses for group "B"
> lossB <- unlist(lapply(1:nyears,
+                        function(y) u + rGPD(n, xi=xi.true.B[y], beta=1)))
> ## build data frame
> time <- rep(rep(years, each=n), 2) # "2" stands for the two groups
> covar <- rep(c("A","B"), each=n*nyears)
> value <- c(lossA, lossB)
> x <- data.frame(covar=covar, time=time, value=value)
> 
> ## fit
> eps <- 1e-3 # to decrease the run time for this example
> require(mgcv) # due to s()
Loading required package: mgcv
Loading required package: nlme
This is mgcv 1.8-12. For overview type 'help("mgcv-package")'.
> fit <- gamGPDfit(x, threshold=u, datvar="value", xiFrhs=~covar+s(time)-1,
+                  nuFrhs=~covar+s(time)-1, eps.xi=eps, eps.nu=eps)
Mean relative differences in iteration 1 (xi, nu): (0.134418, 0.267671)
Mean relative differences in iteration 2 (xi, nu): (0.0902781, 0.00395381)
Mean relative differences in iteration 3 (xi, nu): (0.047209, 0.00224907)
Mean relative differences in iteration 4 (xi, nu): (0.0237896, 0.000663061)
Mean relative differences in iteration 5 (xi, nu): (0.0117614, 0.000237491)
Mean relative differences in iteration 6 (xi, nu): (0.00578603, 9.70989e-05)
Mean relative differences in iteration 7 (xi, nu): (0.00285222, 4.36456e-05)
Mean relative differences in iteration 8 (xi, nu): (0.0014202, 2.06884e-05)
Mean relative differences in iteration 9 (xi, nu): (0.00070896, 1.01373e-05)
> ## note: choosing s(..., bs="cr") will fit cubic splines
> 
> ## grab the fitted values per group and year
> xi.fit <- fitted(fit$xiObj)
> xi.fit. <- xi.fit[1+(0:(2*nyears-1))*n] # pick fit for each group and year
> xi.fit.A <- xi.fit.[1:nyears] # fit for "A" and each year
> xi.fit.B <- xi.fit.[(nyears+1):(2*nyears)] # fit for "B" and each year
> 
> ## plot the fitted values of xi and the true ones we simulated from
> par(mfrow=c(1,2))
> plot(years, xi.true.A, type="l", ylim=range(xi.true.A, xi.fit.A),
+      main="Group A", xlab="Year", ylab=expression(xi))
> points(years, xi.fit.A, type="l", col="red")
> legend("topleft", inset=0.04, lty=1, col=c("black", "red"),
+        legend=c("true", "fitted"), bty="n")
> plot(years, xi.true.B, type="l", ylim=range(xi.true.B, xi.fit.B),
+      main="Group B", xlab="Year", ylab=expression(xi))
> points(years, xi.fit.B, type="l", col="blue")
> legend("topleft", inset=0.04, lty=1, col=c("black", "blue"),
+        legend=c("true", "fitted"), bty="n")
> 
> 
> 
> 
> 
> dev.off()
null device 
          1 
>