R: Main interface for Bayesian Inference of Regulatory Influence...
birteRun
R Documentation
Main interface for Bayesian Inference of Regulatory Influence on Expression (biRte).
Description
The function estimates regulator activities via MCMC sampling. The function assumes experimental data of two conditions to be given. If desired, influences on mRNA log fold changes can be estimated rather than on expression levels itself.
Besides miRNA, mRNA and TF expression data, biRte also allows to integrate other data types (e.g. CNV data - data type 'other').
birteLimma is a convenience function, which allows to directly pass results from a previous limma analysis (see limmaAnalysis) to birteRun. When working with relative expression levels (log fold changes) birteLimma allows to deal with arbitrary complex statistical designs, including e.g. time as a covariate.
mRNA expression data matrix. IMPORTANT: Replicates must be ordered according to nrep.mRNA
mRNA.Sigma
gene expression variances (array data). IMPORTANT: Names have to match the row names in dat.mRNA. If mRNA.Sigma = NULL, birte.run tries to deduce variances from a limma analysis (see limmaAnalysis)
nrep.mRNA
number of replicates per condition.
df.mRNA
residual degrees of freedom of linear model for mRNA data.
data.regulators
list with at most 3 components (miRNA, TF, other). Each component contains one data matrix. IMPORTANT: Samples in data matrices have to be grouped according to conditions. That means first there are all samples from the first condition, then those from the second condition, etc.
sigma.regulators
list with at most 3 components (miRNA, TF, other). Each component contains one named vector of expression variances (array data) or dispersion parameters (RNAseq data). IMPORTANT: Names have to fit to the row names of the data matrix.
nrep.regulators
list with at most 3 components (miRNA, TF, other). Each component contains the number of replicates per condition
diff.regulators
list with at most 3 components (miRNA, TF, other). Each component is a character vector with differentially expressed regulators. Has to be subset of row names of the data matrix
init.regulators
list with at most 3 components (miRNA, TF, other). Each component is matrix of #conditions x length(affinities[[regulator type]]): initial states for regulators. In case this matrix is not provided (i.e. NULL) initial states are assumed to be 0. IMPORTANT: column names have to match names(affinities[[regulator type]])
theta.regulators
list with at most 3 components (miRNA, TF, other). If single numbers are provided, each component contains the expected fraction of active regulators. If vectors are provided, each vector entry corresponds to the individual probability of a specific regulator to be active. Accordingly, vectors should be named in agreement with the regulator-target gene network. If affinities$other corresponds to interaction terms between regulators, theta.regulators can also be provided as a #regulators x #regulators matrix.
reg.interactions
If TRUE, entries of affinities$other are interpreted as interaction terms between regulators.
affinities
Regulator-target gene interactions. This is a list with at most three components (TF, miRNA, other). Each of these lists again contains a weighted adjacency list representation. See humanNetworkSimul for an example. IMPORTANT: gene names used in this network have to match with row names of dat.mRNA. Moreover, regulator names have to fit to row names of the corresponding data matrices.
use.affinities
Should weights given in the bipartite regulator-target gene graph given a specific meaning? If yes, it is assumed that weights correspond to quantitative influences of regulators on their targets.
niter
Number of MCMC iterations (AFTER burnin).
nburnin
Number of MCMC iterations UNTIL burnin is assumed to be finished.
thin
Thinning of Markov chain: only use every thin's sample for posterior computation.
potential_swaps
Pre-computed potential swaps (OPTIONAL, see get_potential_swaps).
only_switches
Should only switches be performed?
only.diff.TFs
Should, in case of TF expression data, only the information for differentially expressed TFs be considered? Note that this makes fewer assumption about the relation of mRNA and protein expression data, but typically leads to less conservative results (i.e. more TFs predicted to be active).
limmamRNA
results of limma analysis for mRNA data according to limmaAnalysis
limma.regulators
list with at most 3 components (miRNA, TF, other). Each component contains the results of a limma analysis for regulator data according to limmaAnalysis
lfc.regulators
list with at most 3 components (miRNA, TF, other). Each component contains the log fold change cutoff for differential expression. It is assumed to be 0, if not provided.
fdr.regulators
list with at most 3 components (miRNA, TF, other). Each component contains the FDR cutoff for differential expression (DEFAULT: 0.05).
explain.LFC
If yes, biRte tries to explain mRNA log fold changes rather than expression levels itself.
model
If "no-plug-in", for marginal log likelihoods are considered for regulator specific expression data. Otherwise, (posterior) variance estimates are used directly.
Value
The function returns a list containing the following entries:
post
#regulators x #conditions matrix containing the marginal probability for each regulator to influence mRNA expression.
map
#regulators x #conditions matrix containing the regulator configuration with highest joint probability.
coef
matrix of #conditions x #replicates: Each entry is itself a matrix (embedded into a list) of #coefficients x #effective samples. The data contains the expected regression coefficients.
log_lik_trace
(Marginal) log-likelihood trace of MCMC sampling.
eff_sample_size
effective sample size after burnin and thinning
contains.interactions
TRUE, if affinities$other corresponds to interaction terms between regulators, FALSE otherwise
explain.LFC
TRUE, if model explains mRNA log fold change, FALSE otherwise
nburnin
number of burnin iterations - as provided as an argument
affinities
original regulator-target gene network
C_cnt
number of conditions
design
design matrix effectively used for model training
param
estimated parameters for mRNA precision (i.e. inverse variance) distribution
Author(s)
Holger Froehlich
Examples
# artificial data
data(humanNetworkSimul)
sim = simulateData(affinities2)
limmamRNA = limmaAnalysis(sim$dat.mRNA, design=NULL, "treated - control")
limmamiRNA = limmaAnalysis(sim$dat.miRNA, design=NULL, "treated - control")
limmaTF = limmaAnalysis(sim$dat.TF, design=NULL, "treated - control")
# burnin and sampling size is much too small in reality
result = birteLimma(dat.mRNA=sim$dat.mRNA,
data.regulators=list(miRNA=sim$dat.miRNA, TF=sim$dat.TF),
limmamRNA=limmamRNA, limma.regulators=list(miRNA=limmamiRNA, TF=limmaTF),
affinities=affinities2, niter=100, nburnin=100, thin=2)
plotConvergence(result)
pred = birtePredict(result, rownames(sim$dat.mRNA))
MSE.Bayes = mean((pred[[1]][[1]]$mean - limmamRNA$pvalue.tab[rownames(sim$dat.mRNA),"logFC"])^2)
MSE.Bayes
# real data
library(Biobase)
data(EColiOxygen)
# prepare network
affinities = list(TF=sapply(names(EColiNetwork$TF), function(tf){
w = rep(1, length(EColiNetwork$TF[[tf]]));
names(w)= EColiNetwork$TF[[tf]]; w}))
# prepare data
colnames(exprs(EColiOxygen)) = make.names(paste(pData(EColiOxygen)$GenotypeVariation,
pData(EColiOxygen)$GrowthProtocol, sep="."))
mydat = cbind(exprs(EColiOxygen)[,colnames(exprs(EColiOxygen)) =="wild.type.aerobic"],
exprs(EColiOxygen)[,colnames(exprs(EColiOxygen)) == "wild.type.anaerobic"])
# more realistic sampling
## Not run:
result = birteRun(dat.mRNA=mydat,
nrep.mRNA=c(3,4), affinities=affinities, niter=10000, nburnin=10000)
plotConvergence(result)
## End(Not run)
Results
R version 3.3.1 (2016-06-21) -- "Bug in Your Hair"
Copyright (C) 2016 The R Foundation for Statistical Computing
Platform: x86_64-pc-linux-gnu (64-bit)
R is free software and comes with ABSOLUTELY NO WARRANTY.
You are welcome to redistribute it under certain conditions.
Type 'license()' or 'licence()' for distribution details.
R is a collaborative project with many contributors.
Type 'contributors()' for more information and
'citation()' on how to cite R or R packages in publications.
Type 'demo()' for some demos, 'help()' for on-line help, or
'help.start()' for an HTML browser interface to help.
Type 'q()' to quit R.
> library(birte)
Loading required package: RcppArmadillo
Loading required package: Rcpp
> png(filename="/home/ddbj/snapshot/RGM3/R_BC/result/birte/birteRun.Rd_%03d_medium.png", width=480, height=480)
> ### Name: birteRun
> ### Title: Main interface for Bayesian Inference of Regulatory Influence on
> ### Expression (biRte).
> ### Aliases: birteRun birteLimma
>
> ### ** Examples
>
> # artificial data
> data(humanNetworkSimul)
> sim = simulateData(affinities2)
==> simulated 7 active TFs, 3 active miRNAs, 0 active interactions
building W-matrix ...done.
Building expression matrix ...
[,1] [,2] [,3] [,4] [,5] [,6]
[1,] -0.08832304 0.44427645 0.231807103 0.4263753 0.06618620 0.77236840
[2,] 0.20196218 0.33708410 0.246422672 0.3753465 0.19381166 0.40831016
[3,] -0.02617412 0.49209744 0.149559887 0.3221747 -0.00274640 0.11093441
[4,] -0.15577233 0.48925971 -0.225634930 0.2160207 0.07954002 0.11264030
[5,] -0.16412914 0.01505488 -0.003880561 0.3268443 0.17735965 0.05813995
[6,] -0.06060351 0.11143828 -0.075091014 0.2648811 0.08172216 0.27833001
[,7] [,8] [,9] [,10]
[1,] 0.1628323 0.01622086 0.323356591 0.64602359
[2,] 0.3792107 -0.10526277 0.172768472 0.33925815
[3,] 0.4725284 -0.08114723 0.110583055 -0.09662859
[4,] 0.5721298 0.35405054 0.069663615 0.18878694
[5,] 0.5514869 -0.18941715 0.171024523 0.06195488
[6,] 0.3975907 0.17351088 -0.004059506 0.46492307
done.
> limmamRNA = limmaAnalysis(sim$dat.mRNA, design=NULL, "treated - control")
> limmamiRNA = limmaAnalysis(sim$dat.miRNA, design=NULL, "treated - control")
> limmaTF = limmaAnalysis(sim$dat.TF, design=NULL, "treated - control")
>
> # burnin and sampling size is much too small in reality
> result = birteLimma(dat.mRNA=sim$dat.mRNA,
+ data.regulators=list(miRNA=sim$dat.miRNA, TF=sim$dat.TF),
+ limmamRNA=limmamRNA, limma.regulators=list(miRNA=limmamiRNA, TF=limmaTF),
+ affinities=affinities2, niter=100, nburnin=100, thin=2)
Automatic assignement of #replicates with design matrix (mRNA data).
miRNA ==> Automatic assignement of #replicates with design matrix.
TF ==> Automatic assignement of #replicates with design matrix.
Formatting regulator-target network -> checking overlap between network and measurements.
--> biRte tries to explain mRNA log fold changes
TF
6 differential regulators of type TF found
Calculating potential swaps for regulator type TF
miRNA
3 differential regulators of type miRNA found
Calculating potential swaps for regulator type miRNA
BIRTE
Data and network: #mRNAs = 999 , #miRNAs = 13 , #TFs = 111 , #other influence factors = 0 , #TFs with expression data = 6 , miRNA expression data = TRUE , use data for other influencing factors = FALSE
Using *relative* expression changes
Prior parameters: theta_TF = 0.8 0.8 0.8 0.8 0.8 0.8 , theta_miRNA = 0.07692308 0.07692308 0.8 0.07692308 0.07692308 0.07692308 , theta_other =
Hyperparameters for variance prior (mRNA): alpha = 3.563368 , beta = 3.563368
Hyperparameters for variance priors (miRNA, TFs): alpha (miRNA) = 2.063479 , beta (miRNA) = 0.09145334 , alpha (TF) = 3.429703 , beta (TF) = 0.2186442
MCMC parameters: burnin = 100 , niter = 100 , thin = 2 , MODEL = 2
Q_cnt = 0
sampling ...
initial X[0]: 999 x 1
initial (marginal) log-likelihood = -1732.94
(marginal) log-likelihood = -1701.38, #TFs[0] = 7, #miRNAs[0] = 1, #others/interactions[0] = 0 #regulators[0] = 8
Effective sample size = 50
finished.
> plotConvergence(result)
> pred = birtePredict(result, rownames(sim$dat.mRNA))
> MSE.Bayes = mean((pred[[1]][[1]]$mean - limmamRNA$pvalue.tab[rownames(sim$dat.mRNA),"logFC"])^2)
> MSE.Bayes
[1] 2.969365
>
>
> # real data
> library(Biobase)
Loading required package: BiocGenerics
Loading required package: parallel
Attaching package: 'BiocGenerics'
The following objects are masked from 'package:parallel':
clusterApply, clusterApplyLB, clusterCall, clusterEvalQ,
clusterExport, clusterMap, parApply, parCapply, parLapply,
parLapplyLB, parRapply, parSapply, parSapplyLB
The following objects are masked from 'package:stats':
IQR, mad, xtabs
The following objects are masked from 'package:base':
Filter, Find, Map, Position, Reduce, anyDuplicated, append,
as.data.frame, cbind, colnames, do.call, duplicated, eval, evalq,
get, grep, grepl, intersect, is.unsorted, lapply, lengths, mapply,
match, mget, order, paste, pmax, pmax.int, pmin, pmin.int, rank,
rbind, rownames, sapply, setdiff, sort, table, tapply, union,
unique, unsplit
Welcome to Bioconductor
Vignettes contain introductory material; view with
'browseVignettes()'. To cite Bioconductor, see
'citation("Biobase")', and for packages 'citation("pkgname")'.
> data(EColiOxygen)
> # prepare network
> affinities = list(TF=sapply(names(EColiNetwork$TF), function(tf){
+ w = rep(1, length(EColiNetwork$TF[[tf]]));
+ names(w)= EColiNetwork$TF[[tf]]; w}))
> # prepare data
> colnames(exprs(EColiOxygen)) = make.names(paste(pData(EColiOxygen)$GenotypeVariation,
+ pData(EColiOxygen)$GrowthProtocol, sep="."))
> mydat = cbind(exprs(EColiOxygen)[,colnames(exprs(EColiOxygen)) =="wild.type.aerobic"],
+ exprs(EColiOxygen)[,colnames(exprs(EColiOxygen)) == "wild.type.anaerobic"])
>
> # more realistic sampling
> ## Not run:
> ##D result = birteRun(dat.mRNA=mydat,
> ##D nrep.mRNA=c(3,4), affinities=affinities, niter=10000, nburnin=10000)
> ##D plotConvergence(result)
> ## End(Not run)
>
>
>
>
>
> dev.off()
null device
1
>