Goodness-of-fit tests for copulas based on the empirical process
comparing the empirical copula with a parametric estimate of the
copula derived under the null hypothesis.
Approximate p-values for the test statistic can be obtained either
using the parametric bootstrap (see Genest and Rémillard, 2008, and
Genest, Rémillard and Beaudoin, 2009) or by means of a fast multiplier
approach (see Kojadinovic, Yan and Holmes, 2011, and Kojadinovic and
Yan, 2011).
The default test statistic, "Sn", is the Cramér-von Mises functional
S_n defined
in Equation (2) of Genest, Rémillard and Beaudoin (2009).
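Because the empirical copula C_n puts mass 1/n on each pseudo-observation, S_n reduces to a sum over the sample, S_n = sum_{i=1}^n {C_n(U_i) - C_theta_n(U_i)}^2, where U_1, ..., U_n are the pseudo-observations and theta_n is the parameter estimate. The following is a minimal base-R sketch of that computation, not the package's implementation. The helpers pobs_sketch(), emp_copula(), clayton_cdf() and Sn_clayton() are hypothetical. The sketch also assumes a Clayton hypothesis with theta estimated by inverting Kendall's tau.

```r
## Sketch (hypothetical helpers, not the copula package's code) of
## S_n = sum_i {C_n(U_i) - C_theta(U_i)}^2 for a bivariate Clayton hypothesis.

## Pseudo-observations: column-wise ranks scaled by n + 1
pobs_sketch <- function(x) apply(x, 2, rank) / (nrow(x) + 1)

## Empirical copula of the pseudo-observations U, evaluated at each row of u:
## the fraction of rows of U that are componentwise <= the evaluation point
emp_copula <- function(u, U) {
  apply(u, 1, function(ui) mean(colSums(t(U) <= ui) == ncol(U)))
}

## d-dimensional Clayton copula CDF, valid for theta > 0
clayton_cdf <- function(u, theta) {
  (rowSums(u^(-theta)) - ncol(u) + 1)^(-1 / theta)
}

Sn_clayton <- function(x) {
  U <- pobs_sketch(x)
  tau <- cor(x[, 1], x[, 2], method = "kendall")  # bivariate case only
  theta <- 2 * tau / (1 - tau)                     # Clayton: tau = theta / (theta + 2)
  sum((emp_copula(U, U) - clayton_cdf(U, theta))^2)
}

set.seed(1)
z <- rnorm(100)
x <- cbind(z + rnorm(100), z + rnorm(100))  # positively dependent toy data
Sn_clayton(x)
```

The sum form avoids any numerical integration. Its cost is quadratic in n because the empirical copula is evaluated at every pseudo-observation.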
The principal function is gofCopula(), which, depending on
simulation, calls either gofPB() or gofMB().
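The parametric bootstrap repeats the whole procedure on samples drawn from the fitted copula. It fits theta, computes S_n, and then, N times, simulates n observations from C_theta_n. For each simulated sample it recomputes the pseudo-observations, re-estimates theta and recomputes S_n. The rough base-R sketch below follows that outline. It is not gofPB(). All helper names are hypothetical, and the sketch makes three further assumptions: a bivariate Clayton hypothesis, estimation by inverting Kendall's tau, and sampling by inverting the Clayton conditional distribution.

```r
## Sketch of a parametric bootstrap goodness-of-fit test for the bivariate
## Clayton family (hypothetical code, not gofPB()).

pobs_sketch <- function(x) apply(x, 2, rank) / (nrow(x) + 1)
emp_copula <- function(u, U)
  apply(u, 1, function(ui) mean(colSums(t(U) <= ui) == ncol(U)))
clayton_cdf <- function(u, theta)
  (rowSums(u^(-theta)) - ncol(u) + 1)^(-1 / theta)
itau_clayton <- function(x) {
  tau <- cor(x[, 1], x[, 2], method = "kendall")
  2 * tau / (1 - tau)
}

## Sample n pairs from a bivariate Clayton copula (theta > 0) by
## inverting the conditional distribution of V given U = u
r_clayton <- function(n, theta) {
  u <- runif(n); w <- runif(n)
  v <- (u^(-theta) * (w^(-theta / (1 + theta)) - 1) + 1)^(-1 / theta)
  cbind(u, v)
}

## S_n for data x, re-estimating theta from x itself
stat_Sn <- function(x) {
  U <- pobs_sketch(x)
  theta <- itau_clayton(x)
  sum((emp_copula(U, U) - clayton_cdf(U, theta))^2)
}

gof_pb_clayton <- function(x, N = 100) {
  n <- nrow(x)
  theta_n <- itau_clayton(x)
  T0 <- stat_Sn(x)
  ## Each replicate: simulate from the fitted copula, then redo the whole
  ## procedure (pseudo-observations, estimation, statistic)
  Tb <- replicate(N, stat_Sn(r_clayton(n, theta_n)))
  list(statistic = T0, parameter = theta_n,
       p.value = (0.5 + sum(Tb >= T0)) / (N + 1))
}

set.seed(42)
res <- gof_pb_clayton(r_clayton(100, theta = 2), N = 50)
res$p.value
```

Re-estimating theta inside every replicate is the essential step. Reusing theta_n there would ignore the estimation error and miscalibrate the test.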
Arguments
copula
object of class "copula" representing the
hypothesized copula family.
x
a data matrix that will be transformed to pseudo-observations.
N
number of bootstrap or multiplier replications to be used to
simulate realizations of the test statistic under the null
hypothesis.
method
a character string specifying the
goodness-of-fit test statistic to be used. For simulation = "pb",
one of "Sn", "SnB", "SnC", "AnChisq", or "AnGamma", see
gofTstat(). For simulation = "mult", one of
"Sn" or "Rn", where the latter is R_n from
Genest et al. (2013).
estim.method
a character string specifying the estimation method to
be used to estimate the dependence parameter(s); see fitCopula().
simulation
a string specifying the simulation method for
generating realizations of the test statistic under the null
hypothesis; can be either "pb" (parametric bootstrap) or
"mult" (multiplier).
print.every
is deprecated in favor of verbose.
verbose
a logical specifying if progress of the bootstrap
should be displayed via txtProgressBar.
...
for gofCopula(): additional arguments passed to
gofPB() or gofMB();
for gofPB() and gofMB(): additional arguments passed
to fitCopula(). These may notably include
optim.method, optim.control, lower,
or upper, depending on the optim.method.
trafo.method
a string specifying the transformation to
U[0,1]^d; either "none", "rtrafo" (see rtrafo()),
or "htrafo" (see htrafo()).
trafoArgs
a list of optional arguments passed
to the transformation method (see trafo.method above).
useR
logical indicating whether an R or the C implementation is used.
m, zeta.m, b
only for method = "Rn" in gofMB(), i.e., the multiplier
bootstrap (simulation = "mult"): m is the power, zeta.m the adjustment
parameter for the denominator of the test
statistic, and b the bandwidth required for the estimation
of the first-order partial derivatives based on the empirical copula.
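The "rtrafo" choice of trafo.method refers to Rosenblatt's transform. For a bivariate copula C, it maps (u1, u2) to (u1, C(u2 | u1)), where C(u2 | u1) is the partial derivative of C(u1, u2) with respect to u1. If the data really come from C, the transformed pairs are uniform on [0,1]^2 with independent components. The minimal sketch below covers the Clayton family only and is not the package's rtrafo(). The helper names are hypothetical. It also applies the true parameter to simulated uniforms, whereas gofCopula() would use pseudo-observations and an estimated parameter.

```r
## Sketch of the Rosenblatt transform for a bivariate Clayton copula
## (theta > 0); hypothetical helpers, not the package's rtrafo().

## Conditional distribution C(v | u) = dC(u, v)/du for the Clayton copula
clayton_cond <- function(v, u, theta) {
  u^(-theta - 1) * (u^(-theta) + v^(-theta) - 1)^(-1 / theta - 1)
}

rosenblatt_clayton <- function(U, theta) {
  cbind(U[, 1], clayton_cond(U[, 2], U[, 1], theta))
}

## Simulate from Clayton by inverting the same conditional distribution
r_clayton <- function(n, theta) {
  u <- runif(n); w <- runif(n)
  v <- (u^(-theta) * (w^(-theta / (1 + theta)) - 1) + 1)^(-1 / theta)
  cbind(u, v, w)  # w kept only to check the transform below
}

set.seed(7)
s <- r_clayton(500, theta = 3)
E <- rosenblatt_clayton(s[, 1:2], theta = 3)
max(abs(E[, 2] - s[, "w"]))              # numerically zero: the transform recovers w
cor(E[, 1], E[, 2], method = "kendall")  # near 0 under the true copula
```

Because the sample was generated by inverting the same conditional distribution, the transform recovers the independent uniforms w up to rounding error.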
Details
If the parametric bootstrap is used, the dependence parameters of
the hypothesized copula family can be estimated either by maximizing
the pseudo-likelihood, by inverting Kendall's tau, or by inverting
Spearman's rho. If the multiplier approach is used, any estimation method
can be used in the bivariate case, but only maximum pseudo-likelihood
estimation can be used in the multivariate (multiparameter) case.
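For families with a closed-form relation between Kendall's tau and the parameter, inverting tau takes one line. The Clayton family satisfies tau = theta / (theta + 2) and the Gumbel family satisfies tau = 1 - 1/theta. The sketch below uses hypothetical helper names. It reproduces values of the size shown in the Examples (about 3.02 for Gumbel and about 4.05 for Clayton). Families such as Frank have no closed-form inverse and need a numerical inversion instead.

```r
## Closed-form inversions of Kendall's tau (hypothetical helper names)
itau_clayton <- function(tau) 2 * tau / (1 - tau)  # from tau = theta / (theta + 2)
itau_gumbel  <- function(tau) 1 / (1 - tau)        # from tau = 1 - 1 / theta

## Forward maps, for checking the inversions
tau_clayton <- function(theta) theta / (theta + 2)
tau_gumbel  <- function(theta) 1 - 1 / theta

tau <- 0.669
itau_gumbel(tau)   # about 3.02
itau_clayton(tau)  # about 4.04
```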
For the normal and t copulas, several dependence structures can be
hypothesized: "ex" for exchangeable, "ar1" for AR(1),
"toep" for Toeplitz, and "un" for unstructured (see
ellipCopula()). For the t copula, "df.fixed" has to
be set to TRUE, which implies that the degrees of freedom are
not considered as a parameter to be estimated.
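Each dispersion structure determines how the parameter vector fills the d x d correlation matrix:

- "ex" uses one common correlation.
- "ar1" uses rho^|i - j|.
- "toep" uses one correlation per lag (d - 1 parameters).
- "un" uses d(d - 1)/2 free correlations.

The sketch below is hypothetical and not ellipCopula(). For "un" it assumes the parameters fill the lower triangle column by column, an ordering that should be checked against the package before relying on it.

```r
## Hypothetical helper: correlation matrix implied by each dispersion structure
disp_matrix <- function(rho, d, dispstr = c("ex", "ar1", "toep", "un")) {
  dispstr <- match.arg(dispstr)
  n_par <- switch(dispstr, ex = 1, ar1 = 1, toep = d - 1, un = d * (d - 1) / 2)
  stopifnot(length(rho) == n_par)
  lag <- abs(outer(seq_len(d), seq_len(d), "-"))  # matrix of |i - j|
  switch(dispstr,
    ## exchangeable: one common correlation
    ex = ifelse(lag == 0, 1, rho),
    ## AR(1): correlation decays as rho^|i - j|
    ar1 = rho^lag,
    ## Toeplitz: rho[k] for |i - j| = k
    toep = matrix(c(1, rho)[lag + 1], d, d),
    ## unstructured: assumed column-wise fill of the lower triangle
    un = {
      P <- diag(d)
      P[lower.tri(P)] <- rho
      P[upper.tri(P)] <- t(P)[upper.tri(P)]
      P
    })
}

disp_matrix(0.5, 3, "ar1")
disp_matrix(c(0.5, 0.6, 0.7), 3, "un")  # cf. the three-dimensional example
```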
Thus far, the multiplier approach is implemented for six copula
families: the Clayton, Gumbel, Frank, Plackett, normal and t.
Although the processes involved in the multiplier and the parametric
bootstrap-based test are asymptotically equivalent under the null,
note that the finite-sample behavior of the two tests might differ
significantly.
Also note that, for both the parametric and the multiplier bootstrap,
the approximate p-value is computed as
(0.5 + sum(T[b] >= T, b = 1, ..., N)) / (N + 1),
where T and T[b] denote the test statistic and
the b-th bootstrapped test statistic, respectively. This ensures that the
approximate p-value lies strictly between 0 and 1, which
some subsequent procedures require. See Pesarin (2001) for
more details.
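The formula can be checked on a few toy values. The sketch below uses a hypothetical helper, approx_pvalue(). The observed statistic is named T0 so that it does not mask R's T, and Tb holds the N bootstrap replicates. Even in the extreme cases, when no replicate or every replicate exceeds T0, the result stays away from 0 and 1.

```r
## Approximate p-value (0.5 + #{b : T[b] >= T}) / (N + 1)
approx_pvalue <- function(T0, Tb) (0.5 + sum(Tb >= T0)) / (length(Tb) + 1)

approx_pvalue(1, c(0, 2, 3))   # (0.5 + 2) / 4 = 0.625
approx_pvalue(10, c(1, 2, 3))  # 0.5 / 4 = 0.125, never exactly 0
approx_pvalue(-1, c(1, 2, 3))  # 3.5 / 4 = 0.875, never exactly 1
```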
Value
An object of class "htest", which is a list
whose components include
statistic
value of the test statistic.
p.value
corresponding approximate p-value.
parameter
estimates of the parameters for the hypothesized
copula family.
Note
These tests were derived under the assumption of continuous margins,
which implies that ties occur with probability zero. The
presence of ties in the data might substantially affect the
approximate p-values. One way of dealing with ties was suggested in
Kojadinovic and Yan (2010).
Since R is widely used by practitioners, a word of warning concerning
goodness-of-fit tests in general is also advisable.
Goodness-of-fit tests are often (ab)used in practice to
"justify" an assumption under which one then continues to work
(carelessly). From a mathematical point of view, this is not correct:
failing to reject the null hypothesis does not establish that it holds.
References
Genest, C., Huang, W., and Dufour, J.-M. (2013).
A regularized goodness-of-fit test for copulas.
Journal de la Société française de statistique 154, 64–77.
Genest, C. and Rémillard, B. (2008). Validity of the parametric
bootstrap for goodness-of-fit testing in semiparametric models.
Annales de l'Institut Henri Poincaré: Probabilités et Statistiques 44, 1096–1127.
Genest, C., Rémillard, B., and Beaudoin, D. (2009).
Goodness-of-fit tests for copulas: A review and a power study.
Insurance: Mathematics and Economics 44, 199–214.
Kojadinovic, I., Yan, J., and Holmes, M. (2011).
Fast large-sample goodness-of-fit tests for copulas.
Statistica Sinica 21, 841–871.
Kojadinovic, I. and Yan, J. (2011). A goodness-of-fit test for
multivariate multiparameter copulas based on multiplier central limit
theorems. Statistics and Computing 21, 17–30.
Kojadinovic, I. and Yan, J. (2010).
Modeling Multivariate Distributions with Continuous Margins Using the
copula R Package.
Journal of Statistical Software 34(9), 1–20.
http://www.jstatsoft.org/v34/i09/.
Pesarin, F. (2001).
Multivariate Permutation Tests: With Applications in Biostatistics.
Wiley.
See Also
fitCopula() for the underlying estimation procedure and
gofTstat() for the available test statistics.
Examples
## The following example is also available in batch through
## demo(gofCopula).
## Not run:
## A two-dimensional data example ----------------------------------
x <- rCopula(200, claytonCopula(3))
(tau. <- cor(x, method="kendall")[1,2]) # around 0.5 -- 0.6
## Does the Gumbel family seem to be a good choice?
(thG <- iTau(gumbelCopula(), tau.)) # 3.02
gofCopula(gumbelCopula(thG), x)
# SnC: really s..l..o..w.. --- SnB is *EVEN* slower
gofCopula(gumbelCopula(thG), x, method = "SnC")
## What about the Clayton family?
(thC <- iTau(claytonCopula(), tau.)) # 4.05
gofCopula(claytonCopula(thC), x)
gofCopula(claytonCopula(thC), x, method = "AnChisq")
## The same with a different estimation method
gofCopula(gumbelCopula (thG), x, estim.method="itau")
gofCopula(claytonCopula(thC), x, estim.method="itau")
## A three-dimensional example ------------------------------------
x <- rCopula(200, tCopula(c(0.5, 0.6, 0.7), dim = 3, dispstr = "un"))
## Does the Gumbel family seem to be a good choice?
## here starting with the "same" as indepCopula(3) :
(gCi3 <- gumbelCopula(1, dim = 3, use.indepC="FALSE"))
gofCopula(gCi3, x)
## What about the t copula?
t.copula <- tCopula(rep(0, 3), dim = 3, dispstr = "un", df.fixed=TRUE)
## this is *VERY* slow currently
gofCopula(t.copula, x)
## The same with a different estimation method
gofCopula(gCi3, x, estim.method="itau")
gofCopula(t.copula, x, estim.method="itau")
## The same using the multiplier approach
gofCopula(gCi3, x, simulation="mult")
gofCopula(t.copula, x, simulation="mult")
## End(Not run)