kdengpdcon
R Documentation
Kernel Density Estimate and GPD Tail Extreme Value Mixture Model With
Single Continuity Constraint
Description
Density, cumulative distribution function, quantile function and
random number generation for the extreme value mixture model with a kernel density
estimate for the bulk distribution up to the threshold and a conditional GPD above the threshold,
with continuity at the threshold. The parameters
are the bandwidth lambda, threshold u,
GPD shape xi and tail fraction phiu.
Arguments
kerncentres
kernel centres (typically sample data vector or scalar)
lambda
bandwidth for kernel (as half-width of kernel) or NULL
u
threshold
xi
shape parameter
phiu
probability of being above threshold in [0, 1], or TRUE to use the tail fraction from the KDE bulk model
bw
bandwidth for kernel (as standard deviations of kernel) or NULL
kernel
kernel name (default = "gaussian")
log
logical, if TRUE then log density
q
quantiles
lower.tail
logical, if FALSE then upper tail probabilities
p
cumulative probabilities
n
sample size (positive integer)
Details
Extreme value mixture model combining a kernel density estimate (KDE) for the bulk
below the threshold and a GPD for the upper tail, with continuity at the threshold.
The user can pre-specify phiu,
permitting a parameterised value for the tail fraction φ_u. Alternatively, when
phiu=TRUE the tail fraction is estimated as the tail fraction from the
KDE bulk model.
The alternative bandwidth definitions are discussed in the
kernels help page, with lambda as the default.
The bw specification is the same as that used in the
density function.
The possible kernels are also defined in kernels,
with "gaussian" as the default choice.
The cumulative distribution function with tail fraction φ_u defined by the
upper tail fraction of the kernel density estimate (phiu=TRUE), up to the
threshold x ≤ u, is given by:
F(x) = H(x)
and above the threshold x > u:
F(x) = H(u) + [1 - H(u)] G(x)
where H(x) and G(x) are the KDE and conditional GPD
cumulative distribution functions respectively.
The cumulative distribution function for pre-specified φ_u, up to the
threshold x ≤ u, is given by:
F(x) = (1 - φ_u) H(x)/H(u)
and above the threshold x > u:
F(x) = (1 - φ_u) + φ_u G(x)
Notice that these definitions are equivalent when φ_u = 1 - H(u).
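The two forms of the cumulative distribution function can be sketched in base R as below. This is a minimal illustration, not the package implementation: it assumes a Gaussian kernel with lambda as the kernel standard deviation, takes the GPD scale sigmau as a free input purely for illustration, and the helper names kde_cdf, gpd_cdf and mix_cdf are hypothetical.

```r
# Hypothetical helpers; Gaussian kernel assumed, lambda = kernel standard deviation.
kde_cdf <- function(x, kerncentres, lambda) {
  # H(x): average of the kernel CDFs centred at each kernel centre
  sapply(x, function(x0) mean(pnorm((x0 - kerncentres) / lambda)))
}

gpd_cdf <- function(x, u, sigmau, xi) {
  # G(x): conditional GPD CDF for exceedances of u (zero at or below u)
  z <- pmax(x - u, 0) / sigmau
  if (abs(xi) < 1e-8) 1 - exp(-z) else 1 - pmax(1 + xi * z, 0)^(-1 / xi)
}

mix_cdf <- function(x, kerncentres, lambda, u, sigmau, xi, phiu = TRUE) {
  Hu <- kde_cdf(u, kerncentres, lambda)
  if (isTRUE(phiu)) phiu <- 1 - Hu   # tail fraction taken from the KDE bulk model
  # Bulk: KDE CDF rescaled so that F(u) = 1 - phiu
  bulk <- (1 - phiu) * kde_cdf(x, kerncentres, lambda) / Hu
  # Tail: bulk mass (1 - phiu) plus phiu times the conditional GPD CDF
  tail <- (1 - phiu) + phiu * gpd_cdf(x, u, sigmau, xi)
  ifelse(x <= u, bulk, tail)
}

set.seed(1)
kerncentres <- rnorm(50)
x <- seq(-3, 5, by = 0.5)
lambda <- 0.3; u <- 1; sigmau <- 0.8; xi <- 0.1
Hu <- kde_cdf(u, kerncentres, lambda)

F_bulk_based <- mix_cdf(x, kerncentres, lambda, u, sigmau, xi, phiu = TRUE)
F_specified  <- mix_cdf(x, kerncentres, lambda, u, sigmau, xi, phiu = 1 - Hu)
all.equal(F_bulk_based, F_specified)   # the two definitions coincide
```

Passing phiu = 1 - H(u) explicitly reproduces the phiu=TRUE result, which is the equivalence noted above.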
The continuity constraint means that (1 - φ_u) h(u)/H(u) = φ_u g(u),
where h(x) and g(x) are the KDE and conditional GPD
density functions respectively. Since g(u) = 1/σ_u, the resulting GPD scale parameter is:
σ_u = φ_u H(u) / [(1 - φ_u) h(u)].
In the special case where the tail fraction is defined by the bulk model this reduces to
σ_u = [1 - H(u)] / h(u).
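The scale parameter implied by the continuity constraint can be checked numerically. The following base R sketch assumes a Gaussian kernel with lambda as the kernel standard deviation; the helper names kde_pdf and kde_cdf are hypothetical and are not the package's own functions.

```r
# Hypothetical helpers; Gaussian kernel assumed, lambda = kernel standard deviation.
kde_pdf <- function(x, kerncentres, lambda) {
  # h(x): average of the kernel densities centred at each kernel centre
  sapply(x, function(x0) mean(dnorm(x0, mean = kerncentres, sd = lambda)))
}
kde_cdf <- function(x, kerncentres, lambda) {
  # H(x): average of the kernel CDFs centred at each kernel centre
  sapply(x, function(x0) mean(pnorm(x0, mean = kerncentres, sd = lambda)))
}

set.seed(1)
kerncentres <- rnorm(50)
lambda <- 0.3; u <- 1

hu <- kde_pdf(u, kerncentres, lambda)
Hu <- kde_cdf(u, kerncentres, lambda)

# Pre-specified tail fraction: sigma_u = phiu * H(u) / [(1 - phiu) * h(u)]
phiu <- 0.1
sigmau <- phiu * Hu / ((1 - phiu) * hu)

# Density on either side of the threshold; g(u) = 1 / sigma_u for the GPD
f_bulk_at_u <- (1 - phiu) * hu / Hu
f_tail_at_u <- phiu / sigmau
all.equal(f_bulk_at_u, f_tail_at_u)

# Tail fraction from the bulk model: sigma_u reduces to [1 - H(u)] / h(u)
phiu_bulk <- 1 - Hu
sigmau_bulk <- phiu_bulk * Hu / ((1 - phiu_bulk) * hu)
all.equal(sigmau_bulk, (1 - Hu) / hu)
```

Because the scale is fixed by the bulk density at the threshold, it is not a free parameter of the model.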
If no bandwidth is provided (lambda=NULL and bw=NULL) then the normal
reference rule is used, via the bw.nrd0 function, which is
consistent with the density function. At least two kernel
centres must be provided as the variance needs to be estimated.
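As a brief illustration of the default bandwidth rule, the base R sketch below compares bw.nrd0 with the bandwidth chosen by density, and shows why a single kernel centre is not enough. The data are simulated purely for the example.

```r
set.seed(1)
kerncentres <- rnorm(50)

# Normal reference rule bandwidth (a kernel standard deviation for the Gaussian kernel)
bw_default <- bw.nrd0(kerncentres)

# density() uses the same rule by default, so the bandwidths agree
all.equal(bw_default, density(kerncentres)$bw)

# A single kernel centre gives no variance estimate, so bw.nrd0() stops with an error;
# the error message is captured here rather than halting the script
single <- tryCatch(bw.nrd0(3.2), error = function(e) conditionMessage(e))
single
```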
See gpd for details of the GPD upper tail component and
dkden for details of the KDE bulk component.
Value
dkdengpdcon gives the density,
pkdengpdcon gives the cumulative distribution function,
qkdengpdcon gives the quantile function and
rkdengpdcon gives a random sample.
Acknowledgments
Based on code
by Anna MacDonald produced for MATLAB.
Note
Unlike most of the other extreme value mixture model functions, the
kdengpdcon functions have not been vectorised as
this is not appropriate. The main inputs (x, p or q)
must be either a scalar or a vector, which also defines the output length.
The kernel centres kerncentres can be either a single datapoint or a vector
of data. The kernel centres (kerncentres) and the locations at which to evaluate the density (x)
and cumulative distribution function (q) would usually be different.
Default values are provided for all inputs, except for the fundamentals
kerncentres, x, q and p. The default sample size for
rkdengpdcon is 1.
Missing (NA) and Not-a-Number (NaN) values in x,
p and q are passed through as is and infinite values are set to
NA. None of these are permitted for the parameters or kernel centres.
Due to symmetry, the lower tail can be described by GPD by negating the quantiles.
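The negation idea can be sketched in base R as follows. This is an illustration under stated assumptions rather than the package code: a Gaussian kernel with lambda as the kernel standard deviation, the tail fraction taken from the bulk model, and a hypothetical helper upper_mix_cdf that implements the upper-tail mixture CDF described in Details.

```r
# Hypothetical upper-tail mixture CDF (Gaussian kernel, tail fraction from the bulk model).
upper_mix_cdf <- function(q, kerncentres, lambda, u, xi) {
  H <- function(x) sapply(x, function(x0) mean(pnorm(x0, kerncentres, lambda)))
  h <- function(x) sapply(x, function(x0) mean(dnorm(x0, kerncentres, lambda)))
  Hu <- H(u)
  sigmau <- (1 - Hu) / h(u)            # GPD scale fixed by continuity at u
  z <- pmax(q - u, 0) / sigmau
  G <- if (abs(xi) < 1e-8) 1 - exp(-z) else 1 - pmax(1 + xi * z, 0)^(-1 / xi)
  ifelse(q <= u, H(q), Hu + (1 - Hu) * G)
}

set.seed(1)
y <- rnorm(100)          # data whose LOWER tail is of interest
q <- c(-3, -2, -1, 0)    # points at which P(Y <= q) is wanted

# Negate data and quantiles, model the upper tail of -Y, then flip the probability:
# P(Y <= q) = P(-Y >= -q) = 1 - F_{-Y}(-q) for a continuous distribution.
u_neg <- as.vector(quantile(-y, 0.9))   # threshold on the negated scale (illustrative choice)
p_lower <- 1 - upper_mix_cdf(-q, -y, lambda = 0.3, u = u_neg, xi = 0.1)
p_lower
```

The threshold, bandwidth and shape values here are arbitrary choices for the example, not recommended settings.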
Error checking of the inputs (e.g. invalid probabilities) is carried out and
will either stop or give a warning message as appropriate.
References
Scarrott, C.J. and MacDonald, A. (2012). A review of extreme value
threshold estimation and uncertainty quantification. REVSTAT - Statistical
Journal 10(1), 33-59. Available from http://www.ine.pt/revstat/pdf/rs120102.pdf
Bowman, A.W. (1984). An alternative method of cross-validation for the smoothing of
density estimates. Biometrika 71(2), 353-360.
Duin, R.P.W. (1976). On the choice of smoothing parameters for Parzen estimators of
probability density functions. IEEE Transactions on Computers C-25(11), 1175-1179.
MacDonald, A., Scarrott, C.J., Lee, D., Darlow, B., Reale, M. and Russell, G. (2011).
A flexible extreme value mixture model. Computational Statistics and Data Analysis
55(6), 2137-2157.
Wand, M. and Jones, M.C. (1995). Kernel Smoothing. Chapman & Hall.
See Also
kernels, kfun,
density, bw.nrd0
and dkde in the ks package.