this function is now superseded by the alternative
link{bce1}.
estimates probability distributions of a sample composition
based on an input ratio matrix, Rat, containing
biomarker ratios in (field) samples, and an input data matrix,
Dat, containing the biomarker ratios for several taxonomic groups
initial ratio matrix. Each row of Rat contains the
biomarker composition of one taxon. As a result of the Bayesian
procedure, this initial ratio matrix will be altered.
Dat
initial data matrix. Each row of Dat contains the
biomarker composition of one (field) sample.
relsdRat
relative standard deviation on ratio matrix. Either one
number or a matrix with the same dimensions as Rat.
abssdRat
absolute standard deviation on ratio matrix. Either one
number or a matrix with the same dimensions as Rat.
minRat
minimum values of ratio matrix. Either one number or a
matrix with the same dimensions as Rat.
maxRat
maximum values of ratio matrix. Either one number or a
matrix with the same dimensions as Rat.
relsdDat
relative standard deviation on data matrix. Either one
number or a matrix with the same dimensions as Dat.
abssdDat
absolute standard deviation on data matrix. Either one
number or a matrix with the same dimensions as Dat.
tol
minimum standard deviation for data matrix Dat. One
value.
tolX
minimum x values. Used for MCMC initiation. One value.
positive
A vector containing numbers of columns that should
contain strictly positive data. Only these columns are rescaled. The
other columns (not in positive) are not rescaled, and can become
negative.
iter
number of iterations for MCMC.
outputlength
number of iterations kept in the output.
burninlength
number of initial iterations to be removed from
output.
jmpRat
jump length of the ratio matrix Rat (in normal
space). Either a number, a vector with length equal to the number of
biomarkers (number of columns in Rat), or a or matrix with the
same dimensions as the ratio matrix rat.
jmpX
jump length of the composition matrix (in a
simplex). Either one number, a vector of length equal to the number of
taxa (number of rows in Rat)
or a matrix with the same dimensions = c(number of taxa, number of
field samples).
unif
logical; if TRUE a uniform distribution for ratio matrix
is used. This is similar as in chemtax.
verbose
logical; if TRUE, extra information is provided during
the run of the function, such as extra warnings, elapsed time and
expected time until the end of the MCMC.
initRat
ratio matrix used to start the markov chain: defaults
to the initial ratio matrix.
initX
composition matrix used to start the markov chain:
default the LSEI solution of Ax=B.
userProb
function taking two arguments: ratio matrix RAT and
composition matrix X, and returning the posterior probability.
Dependence of the probability on the data should be incorporated in
the function. If not specified, the default probability distribution
is the product of a non-informative distribution on the composition
matrix, and gamma distributions for the ratio matrix and the data
given the model output.
confInt
confidence interval in output; because the
distributions may not be symmetrical, standard deviations are not
always a useful measure; instead, upper and lower boundaries of the
given confidence interval are given. Default is 2/3, i.e there is a
probability of 0.66 for a value to be contained within the interval.
export
logical; if TRUE, the function
export.bce is called and a list of variables and plots
are exported to the specified file.
file
Only if export is TRUE. If not NULL, a
character string specifying the file to which objects are saved.
Details
The function BCE searches probability distributions for all
elements of a taxonomical composition matrix X and a ratio
matrix Rat for which:
X%*%Rat ~= Dat
It does this by returning iter samples for X and Rat, organized
in three-dimensional arrays. The input
data matrix Dat and ratio matrix Rat should be
in the following formats, with the relative concentrations per
biomarker organized in columns:
data matrix:
marker1
marker2
marker3
marker4
sample1
0.14
0.005
0.35
0.033
sample2
0.15
0.004
0.36
0.034
sample3
0.13
0.004
0.31
0.030
sample4
0.13
0.005
0.33
0.031
sample5
0.14
0.008
0.33
0.036
sample6
0.11
0.082
0.34
0.044
and ratio matrix:
marker1
marker2
marker3
marker4
species1
0.27
0.13
0.35
0.076
species2
0.084
0
0.5
0.24
species3
0.195
0.3
0
0.1
species4
0.06
0
0
0
species5
0
0
0
0
species6
0
0
0
0
Value
A bce (bayesian compositional estimator) object; a list containing 4 elements
Rat
Array with dimension c(nrow(Rat),ncol(Rat),
iter) containing the random walk values of the ratio
matrix Rat.
X
Array with dimension c(nrow(X),ncol(X),iter)
containing the random walk values of the composition matrix X.
logp
vector with length iter containing the random
walk values of the (log) posterior probability.
naccepted
integer indicating the number of runs that were accepted.
Note
Producing sensible output:
Markov Chain Monte Carlo simulations are not as straightforward as one
might wish; several preliminary runs might be necessary to determine
the desired number of iterations, burn-in length and jump
length. For all estimated values of Rat and X, their trace
(evolution of the values over all iterations) has to display random
behaviour; no obvious trends should appear. A few parameters can be
tuned to obtain such behaviour:
jump length The jump length determines how big the jumps are
for each step in the random walk. A longer jump length will make you
jump around faster in the parameter space, but acceptance of new
points can get very low. Smaller jump lengths increase the
acceptance rate, but the algorithm will move too slowly, and a lot
more runs will be needed to scan the whole parameter space. A good
way to find a good jump length, is look at the number of points
accepted. If the output is saved under the name MCMC, you
can find the number of accepted points under
MCMC$naccepted. It is also given if you run the model with
verbose=TRUE (default). This value should be somewhere
between 5% and 40%. For long runs, 5 % can be acceptable, for
short runs, you will prefer a higher acceptance in order to have
enough different points. 20% accepted is usually a good number. Do
some preliminary runs with iter=1000-10000 and tune the
jump length parameters jmpRat and jmpX. You can set
different jump lengths for each column of
the ratio matrix, or 1 jump length for the whole ratio matrix, and 1
jump length for the composition matrix. Decreasing the jump lengths
will generally increase the acceptance rate and vice versa. Also the
mixing rate (the speed with which accepted points change their
values) will be influenced. You want this mixing rate to be as high
as possible, whilst maintaining enough accepted points.
burninlength The program uses the solution of lsei using the
original ratio matrix as starting values for the MCMC. This might in
some cases be far from the optimal solution, and the MCMC algorithm
will start with moving towards this optimal solution. This is called
a burn-in. When there is a slow mixing rate, this can take a
considerable number of cycles. As it can influence the averages and
standard deviations, you might want to remove it from the mcmc
objects. By defining a burnin length, the first
'burninlength' cycles will not be written to the output. Look
at some plots to determine if you need to specify a burnin length.
iter the number of iterations: start with 10000 runs
or less; check the output and estimate how many runs you will need
to get a random pattern in the output.
Author(s)
Karel Van den Meersche <k.vdmeersche@nioo.knaw.nl>,
Karline Soetaert <k.soetaert@nioo.knaw.nl>.
References
Van den Meersche, K., K. Soetaert and J.J. Middelburg (2008) A
Bayesian compositional estimator for microbial taxonomy
based on biomarkers, Limnology and Oceanography Methods 6, 190-199
See Also
summary.bce, plot.bce,
export.bce, pairs.bce
Examples
##====================================
# example using bceInput data
# first try
X <- BCE(bceInput$Rat,bceInput$Dat,relsdRat=.2,relsdDat=.2,
iter=1000,outputlength=5000,jmpX=.01,jmpRat=.01)
## the number of accepted runs is too low;
## we play around with the jump lengths jmpX and jmpRat
X <- BCE(bceInput$Rat,bceInput$Dat,relsdRat=.2,relsdDat=.2,
iter=1000,outputlength=5000,jmpX=.02,jmpRat=.002)
## we inspect the output:
plot(X)
## For every element of X and Rat, we want to obtain a well-mixed,
## random trace. In this case, mixing is still a little poor.
## to optimize mixing in the ratio matrix, it is a good idea
## to make the jump length linear to the ratio matrix
## standard deviation (sdrat=.2*rat) :
X <- BCE(bceInput$Rat,bceInput$Dat,relsdRat=.2,relsdDat=.2,
iter=1000,outputlength=5000,jmpX=.02,
jmpRat=.2*(.2*bceInput$Rat))
plot(X)
## mixing improved a lot; we repeat the run with more iterations
## to improve the reliability of the results.
## the following run can take a few minutes - so it is toggled off
#X <- BCE(bceInput$Rat,bceInput$Dat,relsdRat=.2,relsdDat=.2,
# iter=100000,outputlength=5000,jmpX=.02,
# jmpRat=.2*(.2*bceInput$Rat))
#plot(X)
## you can see in the plots that traces for all elements of Rat and X
## are well-mixed. This run was saved in "bceOutput"
Sum <-summary(bceOutput)
# show results as mean with ranges
print(Sum$meanX)
# plot estimated means and ranges (lbX=lower, ubX=upper bound)
xlim <- range(c(Sum$lbX,Sum$ubX))
# first the mean
dotchart(x=t(Sum$meanX),xlim=xlim,
main="Taxonomic composition",
sub="using bce",pch=16)
# then ranges
nr <- nrow(Sum$meanX)
nc <- ncol(Sum$meanX)
for (i in 1:nr)
{ip <-(nr-i)*(nc+2)+1
cc <- ip : (ip+nc-1)
segments(t(Sum$lbX[i,]),cc,t(Sum$ubX[i,]),cc)
}
# show results as pairs plot
pairs(bceOutput,sample=3,main="Station 3")
Results
R version 3.3.1 (2016-06-21) -- "Bug in Your Hair"
Copyright (C) 2016 The R Foundation for Statistical Computing
Platform: x86_64-pc-linux-gnu (64-bit)
R is free software and comes with ABSOLUTELY NO WARRANTY.
You are welcome to redistribute it under certain conditions.
Type 'license()' or 'licence()' for distribution details.
R is a collaborative project with many contributors.
Type 'contributors()' for more information and
'citation()' on how to cite R or R packages in publications.
Type 'demo()' for some demos, 'help()' for on-line help, or
'help.start()' for an HTML browser interface to help.
Type 'q()' to quit R.
> library(BCE)
Loading required package: FME
Loading required package: deSolve
Attaching package: 'deSolve'
The following object is masked from 'package:graphics':
matplot
Loading required package: rootSolve
Loading required package: coda
Loading required package: limSolve
Loading required package: Matrix
> png(filename="/home/ddbj/snapshot/RGM3/R_CC/result/BCE/BCE-1_5.Rd_%03d_medium.png", width=480, height=480)
> ### Name: BCE
> ### Title: Bayesian Composition Estimator
> ### Aliases: BCE
> ### Keywords: models
>
> ### ** Examples
>
> ##====================================
>
> # example using bceInput data
> # first try
>
> X <- BCE(bceInput$Rat,bceInput$Dat,relsdRat=.2,relsdDat=.2,
+ iter=1000,outputlength=5000,jmpX=.01,jmpRat=.01)
runs: 100 ; elapsed time: 0.025 s ; estimated time left: 0.225 s ; speed: 4000 runs/s NULL
runs: 1000 ; elapsed time: 0.187 s ; estimated time left: 0 s ; speed: 5347.594 runs/s NULL
number of accepted runs: 24 out of 1000 (2.4%) NULL
>
> ## the number of accepted runs is too low;
> ## we play around with the jump lengths jmpX and jmpRat
>
> X <- BCE(bceInput$Rat,bceInput$Dat,relsdRat=.2,relsdDat=.2,
+ iter=1000,outputlength=5000,jmpX=.02,jmpRat=.002)
runs: 100 ; elapsed time: 0.017 s ; estimated time left: 0.153 s ; speed: 5882.353 runs/s NULL
runs: 1000 ; elapsed time: 0.212 s ; estimated time left: 0 s ; speed: 4716.981 runs/s NULL
number of accepted runs: 235 out of 1000 (23.5%) NULL
>
> ## we inspect the output:
> plot(X)
>
> ## For every element of X and Rat, we want to obtain a well-mixed,
> ## random trace. In this case, mixing is still a little poor.
> ## to optimize mixing in the ratio matrix, it is a good idea
> ## to make the jump length linear to the ratio matrix
> ## standard deviation (sdrat=.2*rat) :
> X <- BCE(bceInput$Rat,bceInput$Dat,relsdRat=.2,relsdDat=.2,
+ iter=1000,outputlength=5000,jmpX=.02,
+ jmpRat=.2*(.2*bceInput$Rat))
runs: 100 ; elapsed time: 0.024 s ; estimated time left: 0.216 s ; speed: 4166.667 runs/s NULL
runs: 1000 ; elapsed time: 0.265 s ; estimated time left: 0 s ; speed: 3773.585 runs/s NULL
number of accepted runs: 363 out of 1000 (36.3%) NULL
> plot(X)
>
> ## mixing improved a lot; we repeat the run with more iterations
> ## to improve the reliability of the results.
> ## the following run can take a few minutes - so it is toggled off
> #X <- BCE(bceInput$Rat,bceInput$Dat,relsdRat=.2,relsdDat=.2,
> # iter=100000,outputlength=5000,jmpX=.02,
> # jmpRat=.2*(.2*bceInput$Rat))
> #plot(X)
> ## you can see in the plots that traces for all elements of Rat and X
> ## are well-mixed. This run was saved in "bceOutput"
>
> Sum <-summary(bceOutput)
>
> # show results as mean with ranges
> print(Sum$meanX)
species1 species2 species3 species4 species5
sample1 0.02503498 0.04816580 0.3369675 0.2706247 0.3192070
sample2 0.30287018 0.05707375 0.1793557 0.2295378 0.2311626
sample3 0.42950103 0.06408520 0.1323277 0.1922717 0.1818144
sample4 0.22868879 0.32560762 0.1022658 0.1591701 0.1842676
sample5 0.16835843 0.23543653 0.1358605 0.2239845 0.2363600
>
> # plot estimated means and ranges (lbX=lower, ubX=upper bound)
> xlim <- range(c(Sum$lbX,Sum$ubX))
>
> # first the mean
> dotchart(x=t(Sum$meanX),xlim=xlim,
+ main="Taxonomic composition",
+ sub="using bce",pch=16)
>
> # then ranges
> nr <- nrow(Sum$meanX)
> nc <- ncol(Sum$meanX)
>
> for (i in 1:nr)
+ {ip <-(nr-i)*(nc+2)+1
+ cc <- ip : (ip+nc-1)
+ segments(t(Sum$lbX[i,]),cc,t(Sum$ubX[i,]),cc)
+ }
>
> # show results as pairs plot
> pairs(bceOutput,sample=3,main="Station 3")
>
>
>
>
>
>
> dev.off()
null device
1
>