Last data update: 2014.03.03

R: Compute error bounds for a regional frequency distribution
regquantbounds    R Documentation

Compute error bounds for a regional frequency distribution

Description

For a regional frequency distribution, the functions compute the root mean square error (RMSE) and error bounds for quantiles either of the regional growth curve (regquantbounds) or of distributions at individual sites (sitequantbounds).

Usage

regquantbounds(relbounds, rfd)

sitequantbounds(relbounds, rfd, sitenames, index, seindex, drop = TRUE)

Arguments

relbounds

An object of class "regsimq", the result of calling function regsimq to simulate relative RMSE and error bounds for a regional frequency distribution.

rfd

An object of class "rfd", containing the specification of a regional frequency distribution.

sitenames

Vector of site names.

index

Values of the estimated site-specific scale factor (“index flood”) for the sites.

seindex

Standard errors of the estimates in index.

drop

Logical: if TRUE and there is only one site, the value returned from sitequantbounds will be an object of class "rfdbounds" rather than a list containing one such object.

Details

The relative RMSE values from relbounds are multiplied by the quantile values from rfd to yield absolute RMSE values for quantile estimates, and the quantile values from rfd are divided by the error bounds from relbounds to yield error bounds for quantiles, as in Hosking and Wallis (1997), eq. (6.19). These computations apply to quantiles either of the regional growth curve (for regquantbounds) or of the frequency distributions at individual sites (for sitequantbounds).
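
The arithmetic described above can be sketched in base R. This is a minimal illustration, not the package's implementation: the vectors qhat, rel_rmse and rel_bound are hypothetical stand-ins for the quantile values taken from rfd and the relative RMSE and one relative error bound taken from relbounds, and the numbers are invented.

```r
# Hypothetical values for illustration only; they are not output
# from regsimq or from a fitted "rfd" object.
qhat      <- c(0.8, 1.0, 1.5, 2.2)      # quantile estimates
rel_rmse  <- c(0.05, 0.04, 0.06, 0.10)  # relative RMSE of each quantile
rel_bound <- c(0.92, 0.94, 0.90, 0.85)  # one relative error bound per quantile

rmse  <- rel_rmse * qhat    # absolute RMSE: relative RMSE times quantile
bound <- qhat / rel_bound   # error bound: quantile divided by relative bound

result <- data.frame(qhat, RMSE = rmse, bound)
print(result)
```

The same two operations would be repeated for each bound probability, giving one bound column per probability.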

If argument index of sitequantbounds is missing, then results (RMSE and error bounds of quantiles) are computed for sites in the region specified by rfd and its index component, assuming that the site-specific scale factor (“index flood”) is estimated by the sample mean at each site, computed from the same data set that was used to fit the regional frequency distribution.

If index and sitenames are both missing, then results will be computed for all of the sites in the region specified by rfd.

If index is missing and sitenames is present, then error bounds will be computed for a subset of the sites in the region specified by rfd. sitenames will be used to select sites from the vector rfd$index, either by position or by name.

If argument index of sitequantbounds is present, then results are computed for arbitrary sites (for example, ungauged sites for which the regional growth curve of the regional frequency distribution rfd is believed to apply), assuming that the site-specific scale factor (“index flood”) is estimated from data that are (approximately) statistically independent of the data used to fit the regional frequency distribution. In this case relbounds$sim.rgcratio must not be NULL, i.e. relbounds should have been generated by a call to regsimq with argument save=TRUE.

If index and sitenames are both present, they must have the same length, and will be taken to refer to sites whose names are the elements of sitenames and whose index-flood values are the elements of index.

If index is present and sitenames is missing, results are computed for sites whose index-flood values are the elements of index; if index has names, these names will be used as the site names.

When index and seindex are specified, it is assumed in the simulation procedure that the relative estimation error is lognormally distributed, i.e. that the logarithm of the ratio of the estimated to the true index flood value has a normal distribution with mean 0 and standard deviation seindex/index.
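
The lognormal assumption can be illustrated with a short base R simulation. This is only a sketch under stated assumptions: it treats the supplied index value as the true value for the purpose of illustration, the variable names are hypothetical, and it does not reproduce the simulation procedure used by the package.

```r
set.seed(1)
index   <- 100   # index-flood value, treated here as the true value
seindex <- 25    # its standard error
n       <- 1e5   # number of simulated estimates (arbitrary)

# The log of the ratio estimated/true is normal with mean 0
# and standard deviation seindex/index.
log_ratio <- rnorm(n, mean = 0, sd = seindex / index)
ratio     <- exp(log_ratio)   # lognormally distributed, always positive
index_est <- index * ratio    # simulated index-flood estimates

c(sd_log = sd(log_ratio), median_ratio = median(ratio))
```

With these inputs the standard deviation of the log ratio should be close to 0.25 and the median ratio close to 1.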

As noted by Hosking and Wallis (1997, discussion following (6.19)), error bounds in the lower tail of the distribution may be unhelpful when the fitted distribution can take negative values. In these cases the computed bounds will be NA (if the quantile estimate is negative) or Inf (if the quantile estimate is positive but the corresponding error bound in relbounds is negative).
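
The rule for negative values can be written out as follows. This is a hedged sketch of the behaviour described above, not the package's code: the vectors are hypothetical, and the cases where a quantile estimate or relative bound is exactly zero are not covered by the description and are not handled specially here.

```r
# Hypothetical quantile estimates and relative error bounds
qhat      <- c(-0.2, 0.1, 1.0)
rel_bound <- c( 0.9, -0.5, 0.8)

bound <- ifelse(qhat < 0,
                NA_real_,                # negative quantile estimate
                ifelse(rel_bound < 0,
                       Inf,              # positive estimate, negative relative bound
                       qhat / rel_bound))  # ordinary case
bound
```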

Value

For regquantbounds, an object of class "rfdbounds". This is a data frame with columns f, probabilities for which quantiles are estimated; qhat, estimated quantiles; RMSE, RMSE of the estimated quantiles. Also, for each bound probability in relbounds$boundprob, there is a column containing the error bound corresponding to that probability. The object also has an attribute "boundprob" that contains the bound probabilities.

For sitequantbounds, a list each of whose components is an object of class "rfdbounds" containing results for one site. In this case the second column of the data frame is named Qhat, not qhat. If drop is TRUE and the list has one component, a single "rfdbounds" object is returned.

Note

For a region that is confidently believed to be homogeneous, the region used to generate the results in relbounds may be the same as that specified by rfd. In practice, it is often acknowledged that some degree of heterogeneity is present in the data to which the distribution rfd is fitted. The simulations used in function regsimq to generate relbounds can then be based on a region whose specification includes an appropriate degree of heterogeneity, and the error bounds calculated by regquantbounds and sitequantbounds will honestly reflect the failure of the assumption of homogeneity made by regfit (i.e. that the at-site growth curves are the same for all sites in the region) to hold exactly. The example below illustrates this practice.

Author(s)

J. R. M. Hosking jrmhosking@gmail.com

References

Hosking, J. R. M., and Wallis, J. R. (1997). Regional frequency analysis: an approach based on L-moments. Cambridge University Press.

See Also

regsimq, which runs the simulations that generate the relbounds object used by regquantbounds and sitequantbounds.

Examples

data(Cascades)              # A regional data set

rmom <- regavlmom(Cascades) # Regional average L-moments

# Fit a generalized normal distribution to the regional data
rfit <- regfit(Cascades, "gno")

# Set up an artificial region to be simulated:
# -- Same number of sites as Cascades
# -- Same record lengths as Cascades
# -- Same site means as Cascades
# -- L-CV varies linearly across sites, with mean value equal
#    to the regional average L-CV for the Cascades data.
#    'LCVrange' specifies the range of L-CV across the sites,
#    and is chosen to reflect the amount of heterogeneity that
#    may reasonably be believed to be present in the Cascades
#    data (see the example for 'regsimh').
# -- L-skewness is the same at each site, and is equal to the
#    regional average L-skewness for the Cascades data
nsites <- nrow(Cascades)
means <- Cascades$mean
LCVrange <- 0.025
LCVs <- seq(rmom[2]-LCVrange/2, rmom[2]+LCVrange/2, length.out=nsites)
Lskews <- rep(rmom[3], nsites)

# Each site will have a generalized normal distribution:
# get the parameter values for each site
pp <- t(apply(cbind(means, means*LCVs, Lskews), 1, pelgno))
pp

# Set correlation between each pair of sites to 0.64, the
# average inter-site correlation for the Cascades data
avcor <- 0.64

# Run the simulation.  To save time, use only 100 replications.
simq <- regsimq(qfunc=quagno, para=pp, cor=avcor, nrec=Cascades$n,
  nrep=100, fit="gno")

# Apply the simulated bounds to the estimated regional growth curve
regquantbounds(simq, rfit)

# Apply the simulated bounds to quantiles for site 3
sitequantbounds(simq, rfit, sitenames=3)

# Apply the simulated bounds to quantiles for a site whose mean
# is estimated to be 100 with standard error 25
sitequantbounds(simq, rfit, index=100, seindex=25)

Results