List of fragment distributions as generated by the getDistrs function
targetGenomeDB
annotatedGenome object with isoforms we wish to
quantify. By default these are the same as in knownGenomeDB,
but more typically targetGenomeDB is imported from a .gtf file
produced by some isoform prediction software.
knownGenomeDB
annotatedGenome object with known isoforms,
e.g. from UCSC or GENCODE annotations. Used to set the prior
probability that any given isoform is expressed. knownGenome
should be the same genome annotations used to create argument
mprior (when provided)
pc
Named vector of exon path counts as returned by pathCounts
readLength
Read length in bp, e.g. in a paired-end experiment where
75bp are sequenced on each end one would set readLength=75.
islandid
Name of the gene island to be analyzed. If not specified, all
gene islands are analyzed.
priorq
Parameter of the prior distribution on the proportion of
reads coming from each variant. The prior is Dirichlet with prior
sample size for each variant equal to priorq.
We recommend priorq=3 as this defines a non-local
prior that penalizes falsely predicted isoforms that show low expression.
mprior
Prior on the model space returned by
modelPrior, used to favor isoforms
consistent with knownGenomeDB. If left
missing it is estimated from knownGenomeDB.
See details.
minpp
Models (i.e. splicing configurations) with posterior probability less than minpp
are not reported. This argument can help reduce substantially the amount of
required memory to store the results.
selectBest
If set to TRUE only the model with highest
posterior probability is reported. While this can save memory, we do
not recommend this option as it may ignore a substantial amount of
uncertainty.
searchMethod
Method used to perform the model search.
"allmodels" enumerates all possible models (warning:
this is not feasible for genes with >5 exons). "rwmcmc" uses a
random-walk MCMC scheme to focus on models with high posterior
probability.
"submodels" considers that some isoforms in targetGenomeDB
may not be expressed, but does not search for new variants.
"auto" uses "allmodels" for genes with up to 5
exons and "rwmcmc" for longer genes. See details.
niter
Number of MCMC iterations.
exactMarginal
Set to FALSE to estimate posterior model
probabilities as the proportion of MCMC visits. Set to TRUE to
use the integrated likelihoods (default). See details.
integrateMethod
Method to compute integrated likelihoods. The default
('plugin') evaluates likelihood*prior at the posterior mode and
is the faster option. Set 'Laplace' for Laplace approximations
and 'IS' for Importance Sampling. The latter increases
computation cost very substantially.
verbose
Set to TRUE to display progress information.
mc.cores
Number of processors to be used for parallel
computation. Can only be used if the package multicore is
available for your system.
Warning: using multiple processors substantially increases the memory
requirements, so set this value carefully.
Details
calcDenovo explores which subset of the isoforms indicated in
targetGenomeDB are truly expressed.
It also adds new isoforms when some reads follow an exon path that
is not possible under any of the isoforms in targetGenomeDB.
calcDenovo the posterior probability of each model
(i.e. configuration of expressed variants) via Bayes theorem.
P(model|y) "proportional to" m(y|model) P(model)
where m(y|model) is the integrated likelihood and P(model) is the
prior probability of the model.
For example, a gene with 20 predicted isoforms in targetGenome
gives rise 2^20 - 1 possible models (configurations of expressed isoforms).
Importantly, P(model) can be set by analyzing available genome
annotations in knownGenomeDB.
For instance, genes with 20 exons have isoforms that tend
to use most of the 20 exons. They also tend to express more
isoforms than genes with 5 exons. The function modelPrior
analyzes knownGenomeDB to set reasonable values for P(model).
An exhaustive enumeration of all possible models is
not feasible unless the gene is very short (e.g. around 5 exons).
For longer genes we use computational strategies to search a subset of
"interesting" models. This is controlled by the argument searchMethod
(see above).
In order to compute P(model|y) one can either use the computed
m(y|model) P(model) (option exactMarginal==TRUE) or the
proportion of MCMC visits (option exactMarginal==FALSE). Unless
niter is large the former option typically provides more
precise estimates.
Value
A denovoGenomeExpr object.
Author(s)
Camille Stephan-Otto Attolini, Manuel Kroiss, David Rossell
References
Rossell D, Stephan-Otto Attolini C, Kroiss M, Stocker A. Quantifying Alternative Splicing from Paired-End
RNA-sequencing data. Annals of Applied Statistics, 8(1):309-330.
See Also
denovoExpr to obtain expression estimates from the
calcDenovo output.
plotExpr to produce a plot with splicing variants and estimated expression.