sensiHSIC
R Documentation
Sensitivity Indices based on Hilbert-Schmidt Independence Criterion (HSIC)
Description
sensiHSIC conducts a sensitivity analysis where the impact of
an input variable is defined in terms of the distance between the input/output
joint probability distribution and the product of their marginals when they are
embedded in a Reproducing Kernel Hilbert Space (RKHS). This distance corresponds
to the Hilbert-Schmidt Independence Criterion (HSIC) proposed by Gretton et al.
(2005) and serves as a dependence measure between random variables; see Da Veiga
(2014) for an illustration in the context of sensitivity analysis.
Usage
sensiHSIC(model = NULL, X, kernelX = "rbf", paramX = NA,
          kernelY = "rbf", paramY = NA, nboot = 0, conf = 0.95, ...)
## S3 method for class 'sensiHSIC'
tell(x, y = NULL, ...)
## S3 method for class 'sensiHSIC'
print(x, ...)
## S3 method for class 'sensiHSIC'
plot(x, ylim = c(0, 1), ...)
Arguments
model
a function, or a model with a predict method,
defining the model to analyze.
X
a matrix or data.frame representing the input random sample.
kernelX
a string or a list of strings specifying the reproducing kernel
to be used for the input variables. If only one kernel is provided, it is used
for all input variables. Available choices are "rbf" (Gaussian), "laplace"
(exponential), "dcov" (distance covariance, see Details), "raquad" (rational
quadratic), "invmultiquad" (inverse multiquadric), "linear" (Euclidean scalar
product), "matern3" (Matern 3/2), "matern5" (Matern 5/2), "ssanova1" (kernel of
Sobolev space of order 1) and "ssanova2" (kernel of Sobolev space of order 2).
paramX
a scalar or a vector of hyperparameters to be used in the input
variable kernels. If only one scalar is provided, it is replicated for all input
variables. By default, paramX is equal to the standard deviation of the
input variable for "rbf", "laplace", "raquad", "invmultiquad", "matern3" and
"matern5", and to 1 for "dcov". Kernels "linear", "ssanova1" and "ssanova2"
do not involve hyperparameters. If kernelX is a combination of kernels
with and without hyperparameters, paramX must still contain a (dummy) value for
each hyperparameter-free kernel; see Examples.
kernelY
a string specifying the reproducing kernel to be used for the
output variable. Available choices are "rbf" (Gaussian), "laplace" (exponential),
"dcov" (distance covariance, see Details), "raquad" (rational quadratic),
"invmultiquad" (inverse multiquadric), "linear" (Euclidean scalar product),
"matern3" (Matern 3/2), "matern5" (Matern 5/2), "ssanova1" (kernel of Sobolev
space of order 1) and "ssanova2" (kernel of Sobolev space of order 2).
paramY
a scalar hyperparameter to be used in the output variable kernel. By default,
paramY is equal to the standard deviation of the output variable for "rbf",
"laplace", "raquad", "invmultiquad", "matern3" and "matern5", and to 1 for "dcov".
Kernels "linear", "ssanova1" and "ssanova2" do not involve hyperparameters.
nboot
the number of bootstrap replicates.
conf
the confidence level for confidence intervals.
x
a list of class "sensiHSIC" storing the state of the
sensitivity study (parameters, data, estimates).
y
a vector of model responses.
ylim
y-coordinate plotting limits.
...
any other arguments for model which are passed
unchanged each time it is called.
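The kernel names and default hyperparameters described for kernelX, paramX, kernelY and paramY can be illustrated with a small base-R sketch. The helper gram_matrix below is hypothetical and is not part of the sensitivity package. The exact parameterizations of "rbf" and "laplace" are assumptions; only the "dcov" formula (see Details), the "linear" scalar product and the default rule (standard deviation of the sample) come from this page.

```r
# Hypothetical helper (not the package's internal code): Gram matrix of a
# one-dimensional sample x for a few of the kernels listed above.
gram_matrix <- function(x, kernel = "rbf", param = NA) {
  # Documented default: standard deviation of the sample for "rbf"/"laplace".
  if (is.na(param)) param <- sd(x)
  d <- abs(outer(x, x, "-"))               # pairwise distances |x_i - x_j|
  switch(kernel,
    rbf     = exp(-d^2 / (2 * param^2)),   # assumed Gaussian parameterization
    laplace = exp(-d / param),             # assumed exponential parameterization
    linear  = outer(x, x),                 # Euclidean scalar product, no hyperparameter
    # Formula from Details; this sketch ignores param for "dcov".
    dcov    = 0.5 * (outer(abs(x), abs(x), "+") - d),
    stop("kernel not covered by this sketch: ", kernel)
  )
}

set.seed(1)
x <- runif(5)
K_default <- gram_matrix(x, "rbf")         # bandwidth defaults to sd(x)
K_fixed   <- gram_matrix(x, "rbf", 0.5)    # user-supplied bandwidth
```

Passing a value for param mirrors supplying paramX or paramY explicitly; leaving it as NA mirrors the documented default.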
Details
The HSIC sensitivity indices are obtained as a normalized version of the Hilbert-Schmidt Independence Criterion:
S_i = HSIC(X_i, Y) / sqrt(HSIC(X_i, X_i) * HSIC(Y, Y)),
where X_i is the i-th input variable and Y is the output; see Da Veiga (2014) for details.
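As a rough illustration of the normalized index, the sketch below computes it in base R for one input and one output using Gaussian kernels. It assumes the biased (V-statistic) HSIC estimator trace(K H L H) / n^2, where H is the centering matrix; whether sensiHSIC uses exactly this estimator is not stated on this page. The helpers rbf_gram, hsic and hsic_index are hypothetical names.

```r
# Hypothetical helpers, not the package's internal code.

# Gaussian Gram matrix; bandwidth defaults to the sample standard deviation.
rbf_gram <- function(x, sigma = sd(x)) {
  exp(-outer(x, x, "-")^2 / (2 * sigma^2))
}

# Assumed estimator: HSIC = trace(K H L H) / n^2 with H = I - 11'/n.
# For symmetric L, trace((H K H) L) equals the sum of elementwise products.
hsic <- function(K, L) {
  n <- nrow(K)
  H <- diag(n) - matrix(1 / n, n, n)
  sum((H %*% K %*% H) * L) / n^2
}

# Normalized index S = HSIC(X, Y) / sqrt(HSIC(X, X) * HSIC(Y, Y)).
hsic_index <- function(x, y) {
  Kx <- rbf_gram(x)
  Ky <- rbf_gram(y)
  hsic(Kx, Ky) / sqrt(hsic(Kx, Kx) * hsic(Ky, Ky))
}

set.seed(42)
n  <- 200
X1 <- runif(n, -pi, pi)
X2 <- runif(n, -pi, pi)
y  <- sin(X1)                  # toy output that depends on X1 only

S1 <- hsic_index(X1, y)        # index of the influential input
S2 <- hsic_index(X2, y)        # index of the inert input
```

In this toy setting S1 should come out clearly larger than S2, since the output does not depend on X2.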
When kernelX = "dcov" and kernelY = "dcov", the kernel is given by k(u,u') = 1/2 (||u|| + ||u'|| - ||u - u'||). In this case the sensitivity index is equal to the distance correlation introduced by Szekely et al. (2007), as proven by Sejdinovic et al. (2013).
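The link between the "dcov" kernel and distance correlation can be checked numerically. The sketch below is a base-R illustration under the same assumed V-statistic estimator trace(K H L H) / n^2; the helper names are hypothetical. With that estimator, the normalized HSIC ratio reproduces the sample distance correlation statistic built from double-centred distance matrices (the quantity written R_n^2 by Szekely et al., 2007).

```r
# Hypothetical helpers, not the package's internal code.

# "dcov" kernel from the text: k(u, u') = 1/2 (|u| + |u'| - |u - u'|).
dcov_gram <- function(u) {
  a <- abs(u)
  0.5 * (outer(a, a, "+") - abs(outer(u, u, "-")))
}

# Double centering: H M H with H = I - 11'/n.
center <- function(M) {
  n <- nrow(M)
  H <- diag(n) - matrix(1 / n, n, n)
  H %*% M %*% H
}

# Assumed V-statistic estimator of HSIC.
hsic <- function(K, L) sum(center(K) * L) / nrow(K)^2

set.seed(7)
n <- 100
x <- rnorm(n)
y <- x^2 + rnorm(n, sd = 0.1)

# Normalized HSIC index with the "dcov" kernel on both sides.
Kx <- dcov_gram(x)
Ky <- dcov_gram(y)
S_hsic <- hsic(Kx, Ky) / sqrt(hsic(Kx, Kx) * hsic(Ky, Ky))

# Distance-based statistic from double-centred distance matrices.
A  <- center(abs(outer(x, x, "-")))
B  <- center(abs(outer(y, y, "-")))
R2 <- mean(A * B) / sqrt(mean(A * A) * mean(B * B))

isTRUE(all.equal(S_hsic, R2))  # the two quantities agree up to rounding
```

The agreement follows because the terms ||u|| and ||u'|| vanish under double centering, so the centred Gram matrix is -1/2 times the centred distance matrix and the factors cancel in the ratio.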
Value
sensiHSIC returns a list of class "sensiHSIC", containing all
the input arguments detailed before, plus the following components:
call
the matched call.
X
a data.frame containing the design of experiments.
y
a vector of model responses.
S
the estimates of the HSIC sensitivity indices.
Author(s)
Sebastien Da Veiga, Snecma
References
Da Veiga S. (2014), Global sensitivity analysis with dependence measures,
Journal of Statistical Computation and Simulation, in press.
http://hal.archives-ouvertes.fr/hal-00903283
Gretton A., Bousquet O., Smola A., Scholkopf B. (2005),
Measuring statistical dependence with Hilbert-Schmidt norms,
in: Jain S., Simon H., Tomita E. (eds.), Algorithmic Learning Theory,
Lecture Notes in Computer Science, Vol. 3734, Berlin: Springer, 63–77.
Sejdinovic D., Sriperumbudur B., Gretton A., Fukumizu K. (2013),
Equivalence of distance-based and RKHS-based statistics in hypothesis
testing, Annals of Statistics 41(5), 2263–2291.
Szekely G.J., Rizzo M.L., Bakirov N.K. (2007),
Measuring and testing dependence by correlation of distances,
Annals of Statistics 35(6), 2769–2794.
See Also
kde, sensiFdiv
Examples
## Not run:
library(sensitivity)  # provides sensiHSIC, sobol.fun and ishigami.fun

# Test case: the non-monotonic Sobol g-function
# Only one kernel is provided, with the default hyperparameter value
n <- 100
X <- data.frame(matrix(runif(8 * n), nrow = n))
x <- sensiHSIC(model = sobol.fun, X, kernelX = "raquad", kernelY = "rbf")
print(x)

# Test case: the Ishigami function
# One kernel per input variable is given, with default hyperparameter values
n <- 100
X <- data.frame(matrix(runif(3 * n, min = -pi, max = pi), nrow = n))
x <- sensiHSIC(model = ishigami.fun, X,
               kernelX = c("rbf", "matern3", "dcov"), kernelY = "rbf")
print(x)

# A combination of kernels is given and a dummy value is passed for
# the first hyperparameter, since "ssanova1" has none
x <- sensiHSIC(model = ishigami.fun, X,
               kernelX = c("ssanova1", "matern3", "dcov"),
               paramX = c(1, 2, 1), kernelY = "ssanova1")
print(x)
## End(Not run)
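The Usage section also lists a tell method, which suggests a decoupled workflow in which the model responses are computed outside sensiHSIC (model = NULL) and supplied afterwards. The sketch below illustrates that pattern under the assumption that tell(x, y) updates x with the estimates, as is conventional for tell methods; external_model is a hypothetical stand-in for an external simulator, and the sensiHSIC call is skipped when the sensitivity package is not installed.

```r
# Hypothetical stand-in for an external simulator: any function of the inputs.
external_model <- function(X) sin(X[, 1]) + 0.5 * X[, 2]^2

set.seed(123)
n <- 100
X <- data.frame(matrix(runif(3 * n, min = -pi, max = pi), nrow = n))
y <- external_model(X)        # responses computed outside sensiHSIC

if (requireNamespace("sensitivity", quietly = TRUE)) {
  library(sensitivity)
  # No model is given, so only the design and kernel settings are stored.
  x <- sensiHSIC(model = NULL, X, kernelX = "rbf", kernelY = "rbf")
  tell(x, y)                  # supply the responses (assumed to update x)
  print(x)
}
```

This pattern is useful when each model run is expensive or is launched from another program, since the design X can be exported, evaluated elsewhere, and the responses read back in as y.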