R: Runs the Markov chain Monte Carlo with the overdispersion...
MCMC_BB
R Documentation
Runs the Markov chain Monte Carlo with the overdispersion (Beta-Binomial) model
Description
This function initiates the Markov chain Monte Carlo (MCMC) for the beta-binomial
BEDASSLE model. The beta-binomial model allows populations to diverge from the
model's
expectations based on their location and their neighbors.
A matrix of allelic count data, for which nrow = the number of populations
and ncol = the number of bi-allelic loci sampled. Each cell gives the number
of times allele ‘1’ is observed in each population. The choice of which allele is
allele ‘1’ is arbitrary, but must be consistent across all populations at a locus.
sample_sizes
A matrix of sample sizes, for which nrow = the number of populations
and ncol = the number of bi-allelic loci sampled (i.e. - the dimensions of
sample.sizes must match those of counts). Each cell gives the number
of chromosomes successfully genotyped at each locus in each population.
D
Pairwise geographic distance (D_{i,j}). This can be two-dimensional Euclidean
distance, or great-circle distance, or, in fact, any positive definite matrix (deriving,
for instance, from a resistance distance). However, note that the algorithm silently
restricts the prior on the alpha parameters, and specifically the alpha_2 parameter, to
the part of parameter space that results in valid covariance matrices; in the case of
two-dimensional Euclidean distances, this will not happen, since any value of alpha_2
between 0 and 2 is valid (see Guillot et al.'s "Valid covariance models for the analysis
of geographical genetic variation" for more detail on this).
E
Pairwise ecological distance(s) (E_{i,j}), which may be continuous (e.g. -
difference in elevation) or binary (same or opposite side of some hypothesized
barrier to gene flow). Users may specify one or more ecological distance matrices.
If more than one is specified, they should be formatted as a list.
k
The number of populations in the analysis. This should be equal to
nrow(counts).
loci
The number of loci in the analysis. This should be equal to
ncol(counts)
delta
The size of the "delta shift" on the off-diagonal elements of the parametric
covariance matrix, used to ensure its positive-definiteness (even, for example,
when there are separate populations sampled at the same geographic/ecological
coordinates). This value must be large enough that the covariance matrix is
positive-definite, but, if possible, should be smaller than the smallest off-
diagonal distance elements, lest it have an undue impact on inference. If the
user is concerned that the delta shift is too large relative to the pairwise
distance elements in D and E, she should run subsequent analyses,
varying the size of delta, to see if it has an impact on model inference.
aD_stp
The scale of the tuning parameter on aD (alphaD). The scale of the tuning
parameter is the standard deviation of the normal distribution from which small
perturbations are made to those parameters updated via a random-walk sampler.
A larger value of the scale of the tuning parameter will lead to, on average,
larger proposed moves and lower acceptance rates (for more on acceptance rates,
see plot_acceptance_rate).
aE_stp
The scale of the tuning parameter on aE (alphaE). If there are multiple
ecological distances included in the analysis, there will be multiple alphaE
parameters (one for each matrix in the list of E). These may be updated all
with the same scale of a tuning parameter, or they can each get their own, in
which case aE_stp should be a vector of length equal to the number of ecological
distance variables.
a2_stp
The scale of the tuning parameter on a2 (alpha_2).
phi_stp
The scale of the tuning parameter on the phi parameters.
thetas_stp
The scale of the tuning parameter on the theta parameters.
mu_stp
The scale of the tuning parameter on mu.
ngen
The number of generations over which to run the MCMC (one parameter is updated
at random per generation, with mu, theta, and phi all counting, for the purposes of
updates, as one parameter).
printfreq
The frequency with which MCMC progress is printed to the screen. If
printfreq =1000, an update with the MCMC generation number and the posterior
probability at that generation will print to the screen every 1000 generations.
savefreq
The frequency with which the MCMC saves its output as an R object (savefreq =
50,000
means that MCMC output is saved every 50,000 generations). If ngen is large,
this saving process may be computationally expensive, and so should not be performed
too frequently. However, users may wish to evalute MCMC performance while the chain
is still running, or may be forced to truncate runs early, and should therefore
specify a savefreq that is less than ngen. We recommend a
savefreq
of between 1/10th and 1/20th of ngen.
samplefreq
The thinning of the MCMC chain (samplefreq = 1000 means that the parameter
values saved in the MCMC output are sampled once every 1000 generations). A higher
samplefreq will decrease parameter autocorrelation time. However, there is
still information in autocorrelated draws from the joint posterior, so the
samplefreq
should be viewed merely as a computational convenience, to decrease the size of the
MCMC output objects.
directory
If specified, this points to a directory into which output will be saved.
prefix
If specified, this prefix will be added to all output file names.
continue
If TRUE, this will initiate the MCMC chain from the last parameter values of a
previous analysis. This option can be used to effectively increase the ngen
of an initial run. If FALSE, the MCMC will be initiated from random parameter
values.
continuing.params
The list of parameter values used to initiate the MCMC if continue = TRUE. If
the user wants to continue an analysis on a dataset, these should be the parameter
values from the last generation of the previous analysis. This list may be generated
using the function make.continuing.params.
Details
This function saves an MCMC output object at intervals specified by savefreq.
This object may be ported into R working memory using the base function
load.
As with any MCMC method, it is very important here to perform MCMC diagnosis and
evaluate chain mixing. I have provided a number of MCMC diagnosis graphing functions
for user convenience in visually assessing MCMC output. These include
plot_all_trace;plot_all_marginals;
plot_all_joint_marginals;
and plot_all_acceptance_rates. To evaluate model adequacy, users should
use posterior.predictive.sample and
plot_posterior_predictive_sample. These MCMC diagnosis/model adequacy
functions all call the standard MCMC output R object that the BEDASSLE MCMC generates
as their principal argument.
If users wish to start another MCMC run from where the current run left off, they
should use make.continuing.params, and initiate the new run with option
continue = TRUE and the continuing.params list from the previous
run specified.
Author(s)
Gideon Bradburd
References
Bradburd, G.S., Ralph, P.L., and Coop, G.M. Disentangling the effects of geographic and
ecological
isolation on genetic differentiation. Evolution 2013.
Examples
#With the HGDP dataset and mcmc operators
data(HGDP.bedassle.data)
data(mcmc.operators)
#The beta-binomial likelihood function may generate "NaNs produced" warnings,
#so temporarily disable warnings.
op <- options("warn")
options(warn = -1)
#Call the Markov chain Monte Carlo for the overdispersion model
MCMC_BB(
counts = HGDP.bedassle.data$allele.counts,
sample_sizes = HGDP.bedassle.data$sample.sizes,
D = HGDP.bedassle.data$GeoDistance,
E = HGDP.bedassle.data$EcoDistance,
k = HGDP.bedassle.data$number.of.populations,
loci = HGDP.bedassle.data$number.of.loci,
delta = mcmc.operators$delta,
aD_stp = mcmc.operators$aD_stp,
aE_stp = mcmc.operators$aE_stp,
a2_stp = mcmc.operators$a2_stp,
phi_stp = mcmc.operators$phi_stp,
thetas_stp = mcmc.operators$thetas_stp,
mu_stp = mcmc.operators$mu_stp,
ngen = mcmc.operators$ngen,
printfreq = mcmc.operators$printfreq,
savefreq = mcmc.operators$savefreq,
samplefreq = mcmc.operators$samplefreq,
directory = NULL,
prefix = mcmc.operators$prefix,
continue = FALSE,
continuing.params = NULL
)
#Re-enable warnings
options(op)
Results
R version 3.3.1 (2016-06-21) -- "Bug in Your Hair"
Copyright (C) 2016 The R Foundation for Statistical Computing
Platform: x86_64-pc-linux-gnu (64-bit)
R is free software and comes with ABSOLUTELY NO WARRANTY.
You are welcome to redistribute it under certain conditions.
Type 'license()' or 'licence()' for distribution details.
R is a collaborative project with many contributors.
Type 'contributors()' for more information and
'citation()' on how to cite R or R packages in publications.
Type 'demo()' for some demos, 'help()' for on-line help, or
'help.start()' for an HTML browser interface to help.
Type 'q()' to quit R.
> library(BEDASSLE)
> png(filename="/home/ddbj/snapshot/RGM3/R_CC/result/BEDASSLE/MCMC_BB.Rd_%03d_medium.png", width=480, height=480)
> ### Name: MCMC_BB
> ### Title: Runs the Markov chain Monte Carlo with the overdispersion
> ### (Beta-Binomial) model
> ### Aliases: MCMC_BB
>
> ### ** Examples
>
> #With the HGDP dataset and mcmc operators
> data(HGDP.bedassle.data)
> data(mcmc.operators)
>
> #The beta-binomial likelihood function may generate "NaNs produced" warnings,
> #so temporarily disable warnings.
> op <- options("warn")
> options(warn = -1)
>
> #Call the Markov chain Monte Carlo for the overdispersion model
> MCMC_BB(
+ counts = HGDP.bedassle.data$allele.counts,
+ sample_sizes = HGDP.bedassle.data$sample.sizes,
+ D = HGDP.bedassle.data$GeoDistance,
+ E = HGDP.bedassle.data$EcoDistance,
+ k = HGDP.bedassle.data$number.of.populations,
+ loci = HGDP.bedassle.data$number.of.loci,
+ delta = mcmc.operators$delta,
+ aD_stp = mcmc.operators$aD_stp,
+ aE_stp = mcmc.operators$aE_stp,
+ a2_stp = mcmc.operators$a2_stp,
+ phi_stp = mcmc.operators$phi_stp,
+ thetas_stp = mcmc.operators$thetas_stp,
+ mu_stp = mcmc.operators$mu_stp,
+ ngen = mcmc.operators$ngen,
+ printfreq = mcmc.operators$printfreq,
+ savefreq = mcmc.operators$savefreq,
+ samplefreq = mcmc.operators$samplefreq,
+ directory = NULL,
+ prefix = mcmc.operators$prefix,
+ continue = FALSE,
+ continuing.params = NULL
+ )
[1] 2
[1] -1599033
[1] 4
[1] -725466.6
[1] 6
[1] -725641.2
[1] 8
[1] -725641.2
[1] 10
[1] -725638.3
[1] 12
[1] -725638.1
[1] 14
[1] -725620
[1] 16
[1] -725621.1
[1] 18
[1] -725767.7
[1] 20
[1] -725767.7
[1] 22
[1] -725914.1
[1] 24
[1] -726063
[1] 26
[1] -726062
[1] 28
[1] -726061.1
[1] 30
[1] -726068.6
[1] 32
[1] -726067.7
[1] 34
[1] -726063
[1] 36
[1] -726063
[1] 38
[1] -726064.7
[1] 40
[1] -726068.2
[1] 42
[1] -726181.4
[1] 44
[1] -726181.4
[1] 46
[1] -726182.1
[1] 48
[1] -726179.9
[1] 50
[1] -726156.8
[1] 52
[1] -726156.8
[1] 54
[1] -726216.4
[1] 56
[1] -726215.7
[1] 58
[1] -726349.6
[1] 60
[1] -726488.7
[1] 62
[1] -726487.9
[1] 64
[1] -726480.9
[1] 66
[1] -726481.7
[1] 68
[1] -726523.5
[1] 70
[1] -726522.8
[1] 72
[1] -726509
[1] 74
[1] -726508.9
[1] 76
[1] -726621.2
[1] 78
[1] -726635.8
[1] 80
[1] -726765.6
[1] 82
[1] -726750.6
[1] 84
[1] -726743.3
[1] 86
[1] -726744.1
[1] 88
[1] -726744.1
[1] 90
[1] -726847.6
[1] 92
[1] -726847.2
[1] 94
[1] -726847.1
[1] 96
[1] -726847.1
[1] 98
[1] -726888.7
[1] 100
[1] -726889.4
[1] "Output 100 runs to example_MCMC_output*.Robj ."
>
> #Re-enable warnings
> options(op)
>
>
>
>
>
> dev.off()
null device
1
>