Function which performs the fitting of an adaptive mixture of
Student-t distributions to approximate a target density through its
kernel function
Usage
AdMit(KERNEL, mu0, Sigma0 = NULL, control = list(), ...)
Arguments
KERNEL
kernel function of the target density on which the adaptive mixture is fitted. This
function should be vectorized for speed purposes (i.e., its first
argument should be a matrix and its output a vector). Moreover, the function must contain
the logical argument log. If log = TRUE, the function
returns (natural) logarithm values of the kernel function. NA and
NaN values are not allowed. (See *Details* for examples
of KERNEL implementation.)
mu0
initial value in the first stage optimization (for the location of
the first Student-t component) in the adaptive mixture, or
location of the first Student-t component if Sigma0 is not NULL.
Sigma0
scale matrix of the first Student-t component (square, symmetric and positive definite). Default:
Sigma0 = NULL, i.e., the scale matrix of the first Student-t
component is estimated by the function AdMit.
control
control parameters (see *Details*).
...
further arguments to be passed to KERNEL.
Details
The argument KERNEL is the kernel function of the target
density, and it should be vectorized for speed purposes.
As a first example, consider the kernel function proposed by Gelman-Meng (1991):
where commonly used values
are A=1, B=0, C1=3 and C2=3.
A vectorized implementation of this function might be:
GelmanMeng <- function(x, A = 1, B = 0, C1 = 3, C2 = 3, log = TRUE)
{
if (is.vector(x))
x <- matrix(x, nrow = 1)
r <- -.5 * (A * x[,1]^2 * x[,2]^2 + x[,1]^2 + x[,2]^2
- 2 * B * x[,1] * x[,2] - 2 * C1 * x[,1] - 2 * C2 * x[,2])
if (!log)
r <- exp(r)
as.vector(r)
}
This way, we may supply a point (x1,x2)
for x and the function will output a single value (i.e., the kernel
estimated at this point). But the function is vectorized, in the sense
that we may supply a Nx2 matrix
of values for x, where rows of x are
points (x1,x2) and the output will be a vector of
length N, containing the kernel values for these points.
Since the AdMit procedure evaluates KERNEL for a
large number of points, a vectorized implementation is important. Note
also the additional argument log = TRUE which is used for
numerical stability.
As a second example, consider the following (simple) econometric model:
y_t ~ i.i.d. N(mu,sigma^2) t=1,...,T
where mu is the mean value and sigma is the
standard deviation. Our purpose is to estimate
theta=(mu,sigma) within a Bayesian
framework, based on a vector y of T observations; the
kernel thus consists of the product of the
prior and the likelihood function. As previously mentioned, the
kernel function should be vectorized, i.e., treat a (Nx2) matrix of points
theta for which the kernel will be evaluated.
Using the common (Jeffreys) prior p(theta)=1/sigma
for sigma>0, a vectorized implementation of the
kernel function might be:
KERNEL <- function(theta, y, log = TRUE)
{
if (is.vector(theta))
theta <- matrix(theta, nrow = 1)
## sub function which returns the log-kernel for a given
## thetai value (i.e., a given row of theta)
KERNEL_sub <- function(thetai)
{
if (thetai[2] > 0) ## check if sigma>0
{ ## if yes, compute the log-kernel at thetai
r <- - log(thetai[2])
+ sum(dnorm(y, thetai[1], thetai[2], TRUE))
}
else
{ ## if no, returns -Infinity
r <- -Inf
}
as.numeric(r)
}
## 'apply' on the rows of theta (faster than a for loop)
r <- apply(theta, 1, KERNEL_sub)
if (!log)
r <- exp(r)
as.numeric(r)
}
Since this kernel function also depends on the vector y, it
must be passed to KERNEL in the AdMit function. This is
achieved via the argument ..., i.e., AdMit(KERNEL, mu = c(0, 1), y = y).
To gain even more speed, implementation of KERNEL might rely on C or Fortran
code using the functions .C and .Fortran. An example is
provided in the file ‘AdMitJSS.R’ in the package's folder.
The argument control is a list that can supply any of
the following components:
Ns
number of draws used in the evaluation of the
importance sampling weights (integer number not smaller than 100). Default: Ns = 1e5.
Np
number of draws used in the optimization of the mixing
probabilities (integer number not smaller than 100 and not larger
than Ns). Default: Np = 1e3.
Hmax
maximum number of Student-t components in the
adaptive mixture (integer number not smaller than one). Default: Hmax = 10.
df
degrees of freedom parameter of the
Student-t components (real number not smaller than one). Default: df = 1.
CVtol
tolerance for the relative change of the coefficient of
variation (real number in [0,1]). The
adaptive algorithm stops if the new
component leads to a relative change in the coefficient of
variation that is smaller or equal than
CVtol. Default: CVtol = 0.1, i.e., 10%.
weightNC
weight assigned to the new
Student-t component of the adaptive mixture as
a starting value in the optimization of the mixing probabilities
(real number in [0,1]). Default: weightNC = 0.1, i.e., 10%.
trace
tracing information on
the adaptive fitting procedure (logical). Default:
trace = FALSE, i.e., no tracing information.
IS
should importance sampling be used to estimate the
mode and the scale matrix of the Student-t components (logical). Default: IS = FALSE,
i.e., use numerical optimization instead.
ISpercent
vector of percentage(s) of largest weights used to
estimate the mode and the scale matrix of the Student-t
components of the adaptive mixture by importance
sampling (real number(s) in [0,1]). Default:
ISpercent = c(0.05, 0.15, 0.3), i.e., 5%, 15% and 30%.
ISscale
vector of scaling factor(s) used to rescale the
scale matrix of the mixture components (real positive numbers).
Default: ISscale = c(1, 0.25, 4).
trace.mu
Tracing information on
the progress in the optimization of the mode of the mixture
components (non-negative integer number). Higher values
may produce more tracing information (see the source code
of the function optim for further details).
Default: trace.mu = 0, i.e., no tracing information.
maxit.mu
maximum number of iterations used
in the optimization of the modes of the mixture components
(positive integer). Default: maxit.mu = 500.
reltol.mu
relative convergence tolerance
used in the optimization of the modes of the mixture components
(real number in [0,1]). Default: reltol.mu = 1e-8.
trace.p, maxit.p, reltol.p
the same as for
the arguments above, but for the optimization of the mixing
probabilities of the mixture components.
Value
A list with the following components:
CV: vector (of length H) of coefficients of variation of
the importance sampling weights.
mit: list (of length 4) containing information on the fitted mixture of
Student-t distributions, with the following components:
p: vector (of length H) of mixing probabilities.
mu: matrix (of size Hxd) containing the
vectors of modes (in row) of the mixture components.
Sigma: matrix (of size Hxd*d) containing the scale
matrices (in row) of the mixture components.
df: degrees of freedom parameter of the Student-t components.
where H (>=1) is the number of components in the adaptive
mixture of Student-t distributions and d (>=1) is
the dimension of the first argument in KERNEL.
summary: data frame containing information on the optimization
procedures. It returns for each component of the adaptive mixture of
Student-t distribution: 1. the method used to estimate the mode
and the scale matrix of the Student-t component (‘USER’ if Sigma0 is
provided by the user; numerical optimization: ‘BFGS’, ‘Nelder-Mead’;
importance sampling: ‘IS’, with percentage(s) of importance weights
used and scaling factor(s)); 2. the time required for this optimization;
3. the method used to estimate the mixing probabilities
(‘NLMINB’, ‘BFGS’, ‘Nelder-Mead’, ‘NONE’); 4. the time required for this
optimization; 5. the coefficient of variation of the importance
sampling weights.
Note
Further details and examples of the R package AdMit
can be found in Ardia, Hoogerheide, van Dijk (2009a,b). See also
the package vignette by typing vignette("AdMit") and the
files ‘AdMitJSS.txt’ and ‘AdMitRnews.txt’ in the ‘/doc’ package's folder.
Further details on the core algorithm are given in
Hoogerheide (2006), Hoogerheide, Kaashoek, van Dijk (2007) and
Hoogerheide, van Dijk (2008).
The adaptive mixture mit returned by the function AdMit is used by the
function AdMitIS to perform importance sampling using
mit as the importance density or by the function AdMitMH to perform
independence chain Metropolis-Hastings sampling using mit as the
candidate density.
Please cite the package in publications. Use citation("AdMit").
Author(s)
David Ardia for the R port,
Lennart F. Hoogerheide and Herman K. van Dijk for the AdMit algorithm.
References
Ardia, D., Hoogerheide, L.F., van Dijk, H.K. (2009a).
AdMit: Adaptive Mixture of Student-t Distributions.
The R Journal1(1), pp.25–30.
http://journal.r-project.org/2009-1/
Ardia, D., Hoogerheide, L.F., van Dijk, H.K. (2009b).
Adaptive Mixture of Student-t Distributions as a Flexible Candidate
Distribution for Efficient Simulation: The R Package AdMit.
Journal of Statistical Software29(3), pp.1–32.
http://www.jstatsoft.org/v29/i03/
Gelman, A., Meng, X.-L. (1991).
A Note on Bivariate Distributions That Are Conditionally Normal.
The American Statistician45(2), pp.125–126.
Hoogerheide, L.F. (2006).
Essays on Neural Network Sampling Methods and Instrumental Variables.
PhD thesis, Tinbergen Institute, Erasmus University Rotterdam (NL).
ISBN: 9051708261.
(Book nr. 379 of the Tinbergen Institute Research Series.)
Hoogerheide, L.F., Kaashoek, J.F., van Dijk, H.K. (2007).
On the Shape of Posterior Densities and Credible Sets in Instrumental Variable Regression Models with Reduced
Rank: An Application of Flexible Sampling Methods using Neural Networks.
Journal of Econometrics139(1), pp.154–180.
doi: 10.1016/j.jeconom.2006.06.009.
Hoogerheide, L.F., van Dijk, H.K. (2008).
Possibly Ill-Behaved Posteriors in Econometric Models: On the Connection between Model
Structures, Non-elliptical Credible Sets and Neural Network
Simulation Techniques.
Tinbergen Institute discussion paper2008-036/4.
http://www.tinbergen.nl/discussionpapers/08036.pdf
See Also
AdMitIS for importance sampling using an
adaptive mixture of Student-t distributions as the importance density,
AdMitMH for the independence chain Metropolis-Hastings
algorithm using an adaptive mixture of Student-t distributions as
the candidate density.
Examples
## NB : Low number of draws for speedup. Consider using more draws!
## Gelman and Meng (1991) kernel function
GelmanMeng <- function(x, A = 1, B = 0, C1 = 3, C2 = 3, log = TRUE)
{
if (is.vector(x))
x <- matrix(x, nrow = 1)
r <- -.5 * (A * x[,1]^2 * x[,2]^2 + x[,1]^2 + x[,2]^2
- 2 * B * x[,1] * x[,2] - 2 * C1 * x[,1] - 2 * C2 * x[,2])
if (!log)
r <- exp(r)
as.vector(r)
}
## Run AdMit (with default values)
set.seed(1234)
outAdMit <- AdMit(GelmanMeng, mu0 = c(0.0, 0.1), control = list(Ns = 1e4))
print(outAdMit)
## Run AdMit (using importance sampling to estimate
## the modes and the scale matrices)
set.seed(1234)
outAdMit <- AdMit(KERNEL = GelmanMeng,
mu0 = c(0.0, 0.1),
control = list(IS = TRUE, Ns = 1e4))
print(outAdMit)
Results
R version 3.3.1 (2016-06-21) -- "Bug in Your Hair"
Copyright (C) 2016 The R Foundation for Statistical Computing
Platform: x86_64-pc-linux-gnu (64-bit)
R is free software and comes with ABSOLUTELY NO WARRANTY.
You are welcome to redistribute it under certain conditions.
Type 'license()' or 'licence()' for distribution details.
R is a collaborative project with many contributors.
Type 'contributors()' for more information and
'citation()' on how to cite R or R packages in publications.
Type 'demo()' for some demos, 'help()' for on-line help, or
'help.start()' for an HTML browser interface to help.
Type 'q()' to quit R.
> library(AdMit)
Loading required package: mvtnorm
> png(filename="/home/ddbj/snapshot/RGM3/R_CC/result/AdMit/AdMit.Rd_%03d_medium.png", width=480, height=480)
> ### Name: AdMit
> ### Title: Adaptive Mixture of Student-t Distributions
> ### Aliases: AdMit
> ### Keywords: htest
>
> ### ** Examples
>
> ## NB : Low number of draws for speedup. Consider using more draws!
> ## Gelman and Meng (1991) kernel function
> GelmanMeng <- function(x, A = 1, B = 0, C1 = 3, C2 = 3, log = TRUE)
+ {
+ if (is.vector(x))
+ x <- matrix(x, nrow = 1)
+ r <- -.5 * (A * x[,1]^2 * x[,2]^2 + x[,1]^2 + x[,2]^2
+ - 2 * B * x[,1] * x[,2] - 2 * C1 * x[,1] - 2 * C2 * x[,2])
+ if (!log)
+ r <- exp(r)
+ as.vector(r)
+ }
>
> ## Run AdMit (with default values)
> set.seed(1234)
> outAdMit <- AdMit(GelmanMeng, mu0 = c(0.0, 0.1), control = list(Ns = 1e4))
> print(outAdMit)
$CV
[1] 4.4425871 1.3448258 0.8856950 0.8380885
$mit
$mit$p
cmp1 cmp2 cmp3 cmp4
0.4525493 0.1338627 0.2672455 0.1463425
$mit$mu
k1 k2
cmp1 0.3819661 2.61803340
cmp2 3.8276502 0.20336840
cmp3 1.8030488 1.05601824
cmp4 2.5878805 0.05964092
$mit$Sigma
k1k1 k1k2 k2k1 k2k2
cmp1 0.2291798 -0.40000024 -0.40000024 1.57082073
cmp2 0.8477147 -0.08618971 -0.08618971 0.07277385
cmp3 0.2885380 -0.10007347 -0.10007347 0.21978570
cmp4 0.7388323 -0.17056222 -0.17056222 0.21741696
$mit$df
[1] 1
$summary
H METHOD.mu TIME.mu METHOD.p TIME.p CV
1 1 BFGS 0.003 NONE 0.000 4.4425871
2 2 BFGS 0.008 NLMINB 0.006 1.3448258
3 3 BFGS 0.024 NLMINB 0.016 0.8856950
4 4 BFGS 0.027 NLMINB 0.033 0.8380885
>
> ## Run AdMit (using importance sampling to estimate
> ## the modes and the scale matrices)
> set.seed(1234)
> outAdMit <- AdMit(KERNEL = GelmanMeng,
+ mu0 = c(0.0, 0.1),
+ control = list(IS = TRUE, Ns = 1e4))
> print(outAdMit)
$CV
[1] 4.4425871 1.0830595 0.9504815 0.9039732
$mit
$mit$p
cmp1 cmp2 cmp3 cmp4
0.4772211 0.2456065 0.1478713 0.1293012
$mit$mu
k1 k2
cmp1 0.3819661 2.6180334
cmp2 3.3555320 0.3555571
cmp3 2.1254006 0.1176695
cmp4 2.7367404 0.1234779
$mit$Sigma
k1k1 k1k2 k2k1 k2k2
cmp1 0.2291798 -0.4000002 -0.4000002 1.5708207
cmp2 1.7567372 -0.5057741 -0.5057741 0.3152036
cmp3 0.3314020 -0.4409544 -0.4409544 0.7345853
cmp4 0.5203117 -0.4649378 -0.4649378 0.4992338
$mit$df
[1] 1
$summary
H METHOD.mu TIME.mu METHOD.p TIME.p CV
1 1 BFGS 0.001 NONE 0.000 4.4425871
2 2 IS 0.05-1 0.003 NLMINB 0.004 1.0830595
3 2 IS 0.05-0.25 0.003 NLMINB 0.004 1.1666006
4 2 IS 0.05-4 0.003 NLMINB 0.005 1.3681444
5 2 IS 0.15-1 0.003 NLMINB 0.005 1.1898718
6 2 IS 0.15-0.25 0.003 NLMINB 0.004 1.1663085
7 2 IS 0.15-4 0.003 NLMINB 0.005 1.6186371
8 2 IS 0.3-1 0.003 NLMINB 0.005 1.3583240
9 2 IS 0.3-0.25 0.003 BFGS 0.023 1.2629412
10 2 IS 0.3-4 0.003 NLMINB 0.006 1.9138975
11 3 IS 0.05-1 0.002 NLMINB 0.011 0.9504815
12 3 IS 0.05-0.25 0.002 NLMINB 0.010 0.9885004
13 3 IS 0.05-4 0.002 NLMINB 0.011 1.0051189
14 3 IS 0.15-1 0.002 NLMINB 0.013 0.9844874
15 3 IS 0.15-0.25 0.002 NLMINB 0.014 0.9526902
16 3 IS 0.15-4 0.002 NLMINB 0.043 1.0851996
17 3 IS 0.3-1 0.002 NLMINB 0.013 1.0187545
18 3 IS 0.3-0.25 0.002 NLMINB 0.012 0.9540494
19 3 IS 0.3-4 0.002 NLMINB 0.040 1.0864449
20 4 IS 0.05-1 0.002 NLMINB 0.022 0.9080267
21 4 IS 0.05-0.25 0.002 NLMINB 0.055 0.9063889
22 4 IS 0.05-4 0.002 NLMINB 0.062 0.9522559
23 4 IS 0.15-1 0.002 NLMINB 0.023 0.9456680
24 4 IS 0.15-0.25 0.002 NLMINB 0.025 0.9039732
25 4 IS 0.15-4 0.002 NLMINB 0.050 0.9538416
26 4 IS 0.3-1 0.002 NLMINB 0.049 0.9547578
27 4 IS 0.3-0.25 0.002 NLMINB 0.038 0.9058338
28 4 IS 0.3-4 0.002 NLMINB 0.049 0.9528613
>
>
>
>
>
> dev.off()
null device
1
>