R: Compute error bounds for a fitted regional frequency...
regsimq
R Documentation
Compute error bounds for a fitted regional frequency distribution
Description
Computes, using Monte Carlo simulation, relative error bounds
for estimated quantiles of a regional frequency distribution
fitted by regfit.
Usage
regsimq(qfunc, para, cor = 0, index = NULL, nrec, nrep = 10000,
fit = "gev", f = c(0.01, 0.1, 0.5, 0.9, 0.99, 0.999),
boundprob = c(0.05, 0.95), save = TRUE)
Arguments
qfunc
List containing the quantile functions for each site.
Can also be a single quantile function, which will be used for each site.
para
Parameters of the quantile functions at each site.
If qfunc is a list, para must be a list of the same length
whose components are numeric vectors, the parameters of
the corresponding component of qfunc.
If qfunc is a single quantile function, para can be
a single vector, containing a single set of parameter values
that will be used for each site;
a matrix or data frame whose rows each contain the
parameter values for one site;
or a list of length length(nrec) whose components are
numeric vectors, each containing the parameter values for one site.
cor
Specifies the correlation matrix of the frequency distribution of
each site's data. Can be a matrix (which will be rescaled to a
correlation matrix if necessary) or a constant (which will be taken as
the correlation between each pair of sites).
index
Specifies the value of the site-specific scale factor
(“index flood”) at each site. Can be:
a vector, containing the values at each site;
a constant, which will be taken to be the index flood value at each site;
or (the default) NULL, in which case the index flood will be
computed by numerical integration of the quantile function.
nrec
Numeric vector containing the record lengths at each site.
nrep
Number of simulated regions.
fit
Character string specifying the distribution to be fitted.
See “Details” below.
f
Vector of probabilities corresponding to the quantiles
whose accuracy is to be estimated.
boundprob
Vector of probabilities for which error bounds will
be computed.
save
Logical. Should the simulated values of the ratio of
the estimated to the true regional growth curve be saved?
These values are needed when sitequantbounds is called with
its argument index present, e.g. to compute error bounds for
quantiles at sites other than those whose data were used to fit the
regional frequency distribution (e.g., ungauged sites).
If this computation is not required, storage can be saved
by setting save to FALSE.
Details
A realization of data from a region is generated as follows.
The frequency distributions at the sites (specified by
arguments qfunc and para) are randomly permuted,
and data are simulated from these frequency distributions
with inter-site correlation specified by argument cor
and record lengths at each site specified by argument nrec.
The regional frequency distribution specified by argument fit
is then fitted to the simulated data, as in function regfit.
The procedure is as described in Hosking and Wallis (1997), Table 6.1,
except that the permutation of frequency distributions is a later
modification, intended to give more realistic sets of simulated data.
From each realization the sample mean values and estimates of the
quantiles of the regional growth curve, for nonexceedance probabilities
specified by argument f, are saved.
From the simulated values, for each quantile specified by argument f
the relative root mean square error (relative RMSE) is computed as in
Hosking and Wallis (1997, eq. (6.15)).
Error bounds are also computed, as in Hosking and Wallis (1997, eq. (6.18))
but with bound probabilities specified by argument boundprob
rather than the fixed values 0.05 and 0.95 considered by Hosking and Wallis.
The error bounds are sample quantiles, across the nrep realizations,
of the ratio of the estimated regional growth curve
to the true at-site growth curve
or of the ratio of the estimated to the true quantiles at individual sites.
For distribution fit there should exist a function to estimate
the parameters of the distribution given a set of L-moments.
The function should have a name that is the character string
"pel" followed by the character string fit.
It should accept a single argument, a vector containing L-moments
l_1, l_2, t_3, t_4, etc.,
and return a vector of distribution parameters.
For distribution fit there should also exist a quantile function,
which should have a name that is the character string
"qua" followed by the character string fit.
It should accept two arguments: a vector of probabilities
and a vector containing the parameters of the distribution.
The search path used to find the "pel" and "qua" functions
is the same as for arguments supplied to regsimq, i.e.
the enclosing frames of the function, followed by the search path
specified by search().
The estimation routines and quantile functions in package lmom
have the form described here. For example, to use a
generalized extreme value distribution set fit to be
the string "gev"; then the fitting function pelgev
and the quantile function quagev will be used
(unless these functions have been masked by another object
on the search path).
Value
An object of class "regsimq".
This is a list with the following components:
f
Vector of probabilities corresponding to the quantiles
whose accuracy is to be estimated. A copy of argument f.
boundprob
Vector of probabilities corresponding to the error bounds.
A copy of argument boundprob.
relbounds.rgc
Data frame containing the relative RMSE and
error bounds for the regional growth curve. It has columns
"f", probabilities corresponding to each quantile,
"rel.RMSE", relative RMSE of quantiles of regional growth curve,
and, for each bound probability in boundprob,
a column giving the error bound (quantile of the empirical distribution
of simulated values of the ratio of the estimated regional growth curve
to the true at-site growth curve) for that bound probability.
relbounds.by.site
List of length(nrec) data frames.
Each data frame contains the relative RMSE and error bounds
for quantiles at one site, and has the same structure
as the relbounds.rgc component of the return value.
sim.rgcratio
If save is TRUE, a matrix of dimension
length(f)bynrep, containing the simulated
values of the ratio of the estimated to the true regional growth curve
for quantiles corresponding to probabilities in f.
If save is FALSE, list element
sim.rgcratio is NULL.
Note
Error bounds for the regional growth curve apply to the
regional growth curve regarded as an estimator of the
growth curve for a randomly chosen site.
The growth curve for site i is defined to be
q_i(.) = Q_i(.) / mu_i
where Q_i(.) is the site's quantile function
and mu_i is the site-specific scale factor
(“index flood”) for site i.
For each of the nrep simulated regions,
and each probability F in f,
the regional growth curve qhat(F) is estimated
and the ratios qhat(F)/q_i(F) are calculated.
The relative error bounds are empirical quantiles,
corresponding to the probabilities in boundprob,
of the nrep * length(nrec) values of these ratios
obtained from the simulations.
This differs from the approach taken in Hosking and Wallis (1997),
Table 6.2 and Fig. 6.2, in which error bounds are computed regarding
the estimated regional growth curve as an estimator of the regional
average growth curve q^R(.), the harmonic mean
of the sites' growth curves (Hosking and Wallis, 1997, p. 102).
When the parent region is homogeneous, with identical frequency distributions
at each site (apart from a site-specific scale factor), the two
approaches give identical results. For heterogeneous regions the
“regard as estimator of randomly chosen site growth curve” approach
yields error bounds that are wider, but are more realistic given that the
ultimate aim of regional frequency analysis is estimation of quantiles
at individual sites.
Hosking, J. R. M., and Wallis, J. R. (1997).
Regional frequency analysis: an approach based on L-moments.
Cambridge University Press.
See Also
regfit, for details of fitting a regional frequency distribution;
regquantbounds and sitequantbounds, for
converting the relative bounds returned by regsimq into absolute bounds
for quantiles of the regional growth curve or of the
frequency distributions at individual sites.
Examples
data(Cascades) # A regional data set
rmom <- regavlmom(Cascades) # Regional average L-moments
# Fit generalized normal distribution to regional data
rfit <- regfit(Cascades, "gno")
# Set up an artificial region to be simulated:
# -- Same number of sites as Cascades
# -- Same record lengths as Cascades
# -- Same site means as Cascades
# -- L-CV varies linearly across sites, with mean value equal
# to the regional average L-CV for the Cascades data.
# 'LCVrange' specifies the range of L-CV across the sites.
# -- L-skewness the same at each site, and equal to the regional
# average L-skewness for the Cascades data
nsites <- nrow(Cascades)
means <- Cascades$mean
LCVrange <- 0.025
LCVs <- seq(rmom[2]-LCVrange/2, rmom[2]+LCVrange/2, len=nsites)
Lskews<-rep(rmom[3], nsites)
# Each site will have a generalized normal distribution:
# get the parameter values for each site
pp <- t(apply(cbind(means, means*LCVs ,Lskews), 1, pelgno))
# Set correlation between each pair of sites to 0.64, the
# average inter-site correlation for the Cascades data
avcor <- 0.64
# Run the simulation. To save time, use only 100 replications.
simq <- regsimq(qfunc=quagno, para=pp, cor=avcor, nrec=Cascades$n,
nrep=100, fit="gno")
# Relative RMSE and error bounds for the regional growth curve
simq$relbounds.rgc
# Relative RMSE and error bounds for quantiles at site 3
simq$relbounds.by.site[[3]]