R Graphical Manual

Browse All

Last data update: 2014.03.03

R: Function to recover Auto Regressive TIme-VArying interactions...

ARTIVAsubnet

R Documentation

Function to recover Auto Regressive TIme-VArying interactions between a gene of interest (referred to as "target gene") and a set of genes which may explain the expression of the target gene.

Description

This function generates Reversible Jump MCMC (RJ-MCMC) sampling for approximating the posterior distribution of a time varying regulatory network, under the Auto Regressive TIme VArying (ARTIVA) model (for a detailed description of the algorithm, see Lebre et al. BMC Systems Biology, 2010).

Starting from time-course gene expression measurements for a gene of interest (referred to as "target gene") and a set of genes (referred to as "parent genes") which may explain the expression of the target gene, the ARTIVA procedure identifies temporal segments for which a set of interactions occur between the "parent genes" and the "target gene". The time points that delimit the different temporal segments are referred to as changepoints (CP).

If the measurements time delay is short enough so that the expression of the target gene depends more likely on the expression of the parent genes at some previous time points, then the time delay for the interactions can be chosen with argument dyn. In that case the set of parent genes may contain the target gene (auto-regulation). Otherwise, contemporaneous measurements of the parent genes are used to explain the expression of the target gene and argument dyn is set to 0. In the latter case, the target gene must be kept out of the set of possible parent genes.

The ARTIVA algorithm uses a combination of efficient and robust methods: (1) dynamical Bayesian networks (DBN) to model directed regulatory interactions between the parent genes and the analyzed target gene (2) RJ-MCMC sampling for inferring - simultaneously - (a) the time position (changepoints) at which the regulatory interactions between the parent genes and the target gene changes and (b) the regulatory network topologies (interactions between parent and target genes) associated with the temporal segments delimited by the changepoints.

If available, repeated measurements can be used, the design of experiments must be specified with argument dataDescription.

Usage

ARTIVAsubnet(targetData, parentData, targetName="Target",parentNames=NULL, 
dataDescription=NULL, saveEstimations=TRUE, saveIterations=FALSE, 
savePictures = TRUE, outputPath=NULL, dyn=1,  segMinLength=2, maxCP=NULL, 
maxPred=NULL, nbCPinit=NULL, CPinit=NULL, niter=50000, burn_in=NULL, 
PSRFactor=FALSE, PSRF_thres=1.05, segmentAnalysis=TRUE,  edgesThreshold=0.5,
layout = "fruchterman.reingold", cCP= 0.5, cEdges=0.5, alphaCP=1,
betaCP=0.5, alphaEdges=1, betaEdges=0.5, v0=1, gamma0=0.1, alphad2=2,
betad2=0.2, silent=FALSE)

Arguments

`targetData`	A vector with the temporal gene expression measurements for the target gene (i.e. the gene whose regulation factors are looked for).
`parentData`	A matrix (or a vector if only 1 parent gene) with the temporal gene expression measurements for the proposed parent genes (i.e. potential regulation factors). Parent genes are shown in row and expression values in column. For computational reasons, we advise not to test simultaneously more than 10 parent genes.
`targetName`	Name of the target gene (optional, default: targetName="Target").
`parentNames`	A vector with the names for parent gene(s) (optional, default: `parentNames=NULL`).
`dataDescription`	(Required only when the gene expression measurements contain repeated values for the same time points). A vector indicating the ordering of the time measurements in the data. For example dataDescription=rep(1:n, each=2), if there are two measurements for each time point AND the repetitions for each time point are next to each other. Note that temporal gene expression measurements have to be organized identically in arguments `targetData` and `parentData` (optional, default: `dataDescription=NULL`).
`saveEstimations`	Boolean, if `TRUE` all estimated posterior distributions are saved as text files either in a new sub folder named "ARTIVA_Results" created by default in the current folder or in a folder specified with argument `outputPath` (see below) (optional, default: `saveEstimations=TRUE`).
`saveIterations`	Boolean, if `TRUE` the configuration for all iterations is saved as text files either in a new sub folder named "ARTIVA_Results" created by default in the current folder or in a folder specified with argument `outputPath` (see below) (optional, default: `saveIterations=FALSE`).
`savePictures`	Boolean, if `TRUE` all estimated posterior distributions and networks are plotted in a pdf file either in a new sub folder named "ARTIVA_Results" created by default in the current folder or in a folder specified with argument `outputPath` (see below) (optional, default: `savePictures=TRUE`).
`outputPath`	File path to a folder in which the output results have to be saved, either a complete path or the name of a folder to be created in the current directory (optional, default: `outputPath=NULL`).
`dyn`	Time delay to be considered in the auto-regressive process, between the temporal expression measurements of the analyzed target gene and the ones of the parent genes (optional, default: `dyn=1`).
`segMinLength`	Minimal length (number of time points) to define a temporal segment. Must be - strictly - greater than 1 if there is no repeated measurements for each time point in arguments `targetData` and `parentData` (optional, default: `segMinLength=2`).
`maxCP`	Maximal number of CPs to be considered by the ARTIVA inference procedure. Note that for a temporal course with `n` time points the maximal number of CPs is `(n-1-dyn)` (see before for a description of argument `dyn`). For long temporal courses (more than 20 time points), we advise - for computational reasons - to limit the maximal number of CP to 15 (optional, default: `maxCP=NULL`, if `maxCP=NULL` then the maximal number of CPs is set to `maxCP=min(round((n-1-dyn)/segMinLength)-1, 15)` where `n` is the number of time points).
`maxPred`	Maximal number of simultaneous incoming edges for each segment of the regulatory network estimated for the target gene (default: `maxPred=NULL`, if `maxPred=NULL` then the maximal number of incoming edges is set to `maxPred=min(dim(parentData)[1],15)`)
`nbCPinit`	Number of CPs to be considered at the algorithm initialization (optional, default: `nbCPinit=NULL`, if `nbCPinit=NULL` then the initial number of CPs is randomly set to a value between 0 and `maxCP/2`.).
`CPinit`	A vector with the initial positions for potential CPs. (optional, default: `CPinit = NULL`, if `CPinit = NULL` then the initial vector is chosen randomly according to priors, with `nbCPinit` changepoints (see previous argument `nbCPinit`) ).
`niter`	Number of iterations to be performed in the RJ-MCMC sampling (optional, default: `niter = 50000`).
`burn_in`	Number of initial iterations that are discarded for the estimation of the model distribution (posterior distribution). The `ARTIVAsubnet` function is a RJ-MCMC algorithm which, at each iteration, randomly samples a new configuration of the time-varying regulatory network from probability distributions based on constructing a Markov chain that has the network model distribution as its equilibrium distribution (The equilibrium distribution is obtained when the Markov Chain converges, which requires a large number of iterations). Typically, initial iterations are notconfident because the Markov Chain has not stabilized. The burn-in samples allow to not consider these initial iterations in the final analysis (optional, default: `burn_in=NULL`, if `burn_in=NULL` then the first 25% of the iterations is left for `burn_in`).
`PSRFactor`	Boolean, if `TRUE` the RJ-MCMC procedure is stopped when the Potential Scale Ratio Factor or PSRF (which is a usual convergence criterion) is below a specified threshold (see documentation for argument `PSRF_thres` below) or when the maximal number of iterations is reached (see documentation for argument `niter` previously) (optional, default: `PSRFactor=FALSE`). For more details about the PSRF criterion, see Gelman and Rubin (1992).
`PSRF_thres`	(Only when `PSRFactor=TRUE`) RJ-MCMC stopping threshold: the RJ-MCMC procedure is stopped when the Potential Scale Ratio Factor (PSRF) is below this threshold (see documentation for argument `PSRFactor` previously). Values of the PSRF below 1.1 are usually taken as indication of sufficient convergence (optional, default: `PSRF_threshold=1.05`).
`segmentAnalysis`	Boolean, if `TRUE` the posterior distribution for the edges in each temporal segment delimited by the estimated changepoints (CPs) is computed. (The estimated CPs are the `k` CPs with maximal posterior probability, where `k` is the number of CPs having the maximal porsterior probability (such that each temporal segment is larger than `segMinLength`) (see documentation below for output values `CPpostDist`, `nbSegs`, `CPposition`. If `segmentAnalysis=FALSE` only the posterior distributions of the CPs number and CPs positions are computed (optional, default: `segmentAnalysis=TRUE`).
`edgesThreshold`	Probability threshold for the selection of the edges of the time-varying regulatory network when `segmentAnalysis=TRUE` (optional, default: `edgesThreshold=0.5`).
`layout`	Name of the function determining the placement of the vertices for drawing a graph, possible values among others: `"random",` `"circle",` `"sphere",` `"fruchterman.reingold",` `"kamada.kawai",` `"spring",` `"reingold.tilford",` `"fruchterman.reingold.grid"`, see package `igraph0` for more details (default: `layout="fruchterman.reingold"`).
`cCP`	Maximal probability to propose the birth (resp. death) of a changepoint (CP) during the RJ-MCMC iterations (optional, default: `cCP=0.5`).
`cEdges`	Maximal probability - when a move update of the network topology is chosen - to propose ecah edge move (birth or death of an edge) within a temporal segment (optional, default: `cEdges=0.5`).
`alphaCP`	Hyperparameter for sampling the number `k` of CPs. `k` follows a Gamma distribution: Gamma(alphaCP, betaCP (see below)) (optional, default: `alphaCP=1`). Note that function `choosePriors` can be used to choose `alphaCP` (or `betaCP`) according to the desired dimension penalisation.
`betaCP`	Hyperparameter for sampling the number `k` of CPs. `k` follows a Gamma distribution: Gamma(alphaCP (see before), `betaCP`) (optional, default: `betaCP=0.5`). Note that function `choosePriors` can be used to choose `betaCP` (or `alphaCP`) according to the desired dimension penalisation.
`alphaEdges`	Hyperparameter for sampling the number `l` of regulatory interactions between the target gene and the parent genes. `l` follows a Gamma distribution: `rgamma(1,` `shape=alphaEdges, rate=betaEdges)` (see `betaEdges` below), (default: `alphaEdges=1`). Function `choosePriors` can be used to choose `alphaEdges` (or `betaEdges`) according to the desired dimension penalisation.
`betaEdges`	Hyperparameter for sampling the number `l` of regulatory interactions between the target gene and the parent genes. `l` follows a `Gamma` distribution: `rgamma(1,` `shape=alphaEdges, rate=betaEdges)` (see `alphaEdges` upper) (optional, default: `betaEdges=0.5`). Function `choosePriors` can be used to choose `betaEdges` (or `alphaEdges`) according to the desired dimension penalisation.
`v0`	Hyperparameter for sampling the noise variance (denoted by `Sig2`) in the auto-regressive model defining the regulatory time vrarying network. The prior distribution of the noise variance is an Inverse Gamma distribution with shape parameter `v0/2` and scale parameter `gamma0/2`: `rinvgamma(1,shape=v0/2,` `scale = gamma0)`, (optional, default: `v0=1`).
`gamma0`	Hyperparameter for sampling the noise variance (denoted by `Sig2`) in the ARTIVA package) in the auto-regressive model defining the regulatory time vrarying network. The prior distribution of the noise variance is an Inverse Gamma distribution with shape parameter `v0/2` and scale parameter `gamma0/2`: `rinvgamma(1,` `shape=v0/2, scale = gamma0)`, (optional, default: `gamma0=0.1`).
`alphad2`	Hyperparameter for sampling a parameter that represents the expected signal-to-noise ratio (denoted by `delta2` in the ARTIVA package). It is sampled according to an Inverse Gamma distribution: `rinvgamma(1, shape=alphad2,` `scale=betad2)`, (optional, default: `alphad2=2`).
`betad2`	Hyperparameter for sampling a parameter that represents the expected signal-to-noise ratio (denoted by `delta2` in the ARTIVA package). It is sampled according to an Inverse Gamma distribution: `rinvgamma(1, shape=alphad2,` `scale=betad2)`, (optional, default: `betad2=0.2`).
`silent`	Boolean, if `TRUE` messages are printed along the ARTIVA procedure (optional, default: `silent=FALSE`).

Value

`Samples`	Results obtained at each iteration of the RJ-MCMC procedure. `Samples` is list composed of the following elements: 1) `Samples$CP`: a matrix with in row the different iterations performed with the `ARTIVAsubnet` function and in column the identified positions for CPs (if the parameter dyn=1, first CP=2 and final CP=n+1, with n the number of time points); 2) `Samples$Edges`: a matrix with in row the different ARTIVA iterations and in column the number of regulatory interactions identified in each temporal phases (-1 values are default values, when no temporal phase exist); 3) `Samples$coeff`: a matrix with in row the different ARTIVA iterations and in column the coefficient values corresponding to the identified regulatory interactions. 4) `Samples$variance`: a matrix with in row the different ARTIVA iterations and in column the variance values modelling data noise for each identified temporal phase.
`Counters`	Results obtained at each iteration of the RJ-MCMC procedure. `Counters` is list composed of the following elements: 1) `Counters$CPMovesCount`: Number of modifications PROPOSED during ARTIVA iterations in term of CPs (i.e. birth of a new CP, death of an existing CP, move of an existing CP or upate of regulatory models in each temporal phases). 2) `Counters$CPMovesAcceptationPrct`: Percentage of modifications ACCEPTED during ARTIVA iterations in term of CPs (i.e. birth of a new CP, death of an existing CP, move of an existing CP or upate of regulatory models in each temporal phases). 3) `Counters$EdgesMoveCount`: Number of modifications PROPOSED during ARTIVA iterations in term of regulatory models (i.e. birth of a new edge between parent and target genes, death of an existing edge or update of the regression coefficient for existing edges). 4) `Counters$EdgesMovesAcceptationPrct`: Percentage of modifications ACCEPTED during ARTIVA iterations in term of regulatory models (i.e. birth of a new edge between parent and target genes, death of an existing edge or update of the regression coefficient for existing edges). 5) `Counters$iterations`: Total number of iterations generated by the ARTIVA procedure. 6) `Counters$rcvgce`: (only if `PSRFactor=TRUE`) Number of iterations before the stopping criterion is reached, i.e. before the PSRF factor is below the threshold specified with argument `PSRF_thres`. The returned value is NULL if the stopping criterion is not reached.
`CPpostDist`	A list of 2 tables : 1) `CPpostDist$CPnumberPostDist`: A table containing the approximated distribution for the number of CPs. 2)`CPpostDist$CPpositionPostDist`: A table containing the approximated distribution for the position of the CPs.
`nbSegs`	An integer equal to the number of temporal segments with the largest value observed in the posterior distribution (see previously `CPnumberPostDist`).
`SegmentPostDist`	(only when parameter `segmentAnalysis=TRUE`) A list of tables: 1) `SegmentPostDist$CPpos`: A table containing the most significant CP positions that delimit `nbSegs` temporal segments, according to `CPpostDist$CPnumber`, `CPpostDist$CPposition` and `segMinLength` (if parameter `dyn=1`, first CP is `2` and final CP is `n+1`, where `n` is the number of time points). 2) `SegmentPostDist$edgesPostDist`: A table containing the approximate posterior distribution for the incoming edges (regulatory interaction between parent and target genes) for each temporal segment delimited by the CP given in `SegmentPostDist$CPpos` (see previously). Each raw corresponds to a segment, ordered by time. 3) `SegmentPostDist$edgesCoeff` A table containing the estimated coefficients for the incoming edges (regulatory interaction between parent and target genes) for each temporal segment delimited by the CP given in `SegmentPostDist$CPpos` (see previously). Each raw corresponds to a segment, ordered by time.
`network`	A table containing the information to plot (see function `traceNetworks`) the network estimated with the `ARTIVAsubnet` procedure.
`GLOBvar`	A list of parameters used in the `ARTIVAsubnet` procedure
`HYPERvar`	A list of hyperparameters used in the `ARTIVAsubnet` procedure
`OUTvar`	A list of output parameters used in the `ARTIVAsubnet` procedure.
`targetData`	`targetData` vector given as input of the `ARTIVAsubnet` procedure.
`parentData`	`parentData` matrix given as input of the `ARTIVAsubnet` procedure.

Author(s)

S. Lebre and G. Lelandais

References

S. Lebre, J. Becq, F. Devaux, M. P. H. Stumpf, G. Lelandais (2010) Statistical inference of the time-varying structure of gene-regulation networks BMC Systems Biology, 4:130.

Gelman, A. and D. Rubin (1992) Inference from iterative simulation using multiple sequences. Statistical science 7 (4), 457-472.

Examples

# Load the ARTIVA R package
library(ARTIVA)

# Load the dataset with simulated gene expression profiles
data(simulatedProfiles)

# Name of the target gene to be analyzed with ARTIVA 
targetGene = 1

# Names of the parent genes (typically transcription factors) 
parentGenes = c("TF1", "TF2", "TF3", "TF4", "TF5")

###
# ARTIVA analysis searching for potential interactions between the target 
# genes and a predefined list of parent genes. 
###

# Note that the number of iterations in the RJ-MCMC sampling is reduced 
# to in this example to 'niter=20000' (in order obtain a quick overview
# of the ARTIVAnet fonction, but it should be increased (e.g. up to
# 50000) for a better parameter estimation.

# Without saving the output pictures "savePictures=FALSE"
## Not run: 
ARTIVAtest = ARTIVAsubnet(targetData = simulatedProfiles[targetGene,],
  parentData = simulatedProfiles[parentGenes,],
  targetName = targetGene,
  parentNames = parentGenes,
  segMinLength = 2,
  edgesThreshold = 0.5, 
  niter = 20000,
  savePictures=FALSE)

# By default, the output results (pictures and estimation values) are
# saved in a folder named "ARTIVAsubnet" created in the current directory
ARTIVAtest = ARTIVAsubnet(targetData = simulatedProfiles[targetGene,],
  parentData = simulatedProfiles[parentGenes,],
  targetName = targetGene,
  parentNames = parentGenes,
  segMinLength = 2,
  edgesThreshold = 0.5, 
  niter = 20000)

# In order to save the results in a specific folder, for example a
# folder entitled "ARTIVA-test" in the current directory:

ARTIVAtest2 = ARTIVAsubnet(targetData = simulatedProfiles[targetGene,],
  parentData = simulatedProfiles[parentGenes,],
  targetName = targetGene,
  parentNames = parentGenes,
  segMinLength = 2,
  edgesThreshold = 0.5, 
  niter = 20000,
  outputPath = "ARTIVA-test")

## End(Not run)

Results


R version 3.3.1 (2016-06-21) -- "Bug in Your Hair"
Copyright (C) 2016 The R Foundation for Statistical Computing
Platform: x86_64-pc-linux-gnu (64-bit)

R is free software and comes with ABSOLUTELY NO WARRANTY.
You are welcome to redistribute it under certain conditions.
Type 'license()' or 'licence()' for distribution details.

R is a collaborative project with many contributors.
Type 'contributors()' for more information and
'citation()' on how to cite R or R packages in publications.

Type 'demo()' for some demos, 'help()' for on-line help, or
'help.start()' for an HTML browser interface to help.
Type 'q()' to quit R.

> library(ARTIVA)
Loading required package: MASS
Loading required package: igraph

Attaching package: 'igraph'

The following objects are masked from 'package:stats':

    decompose, spectrum

The following object is masked from 'package:base':

    union

Loading required package: gplots

Attaching package: 'gplots'

The following object is masked from 'package:stats':

    lowess

> png(filename="/home/ddbj/snapshot/RGM3/R_CC/result/ARTIVA/ARTIVAsubnet.Rd_%03d_medium.png", width=480, height=480)
> ### Name: ARTIVAsubnet
> ### Title: Function to recover Auto Regressive TIme-VArying interactions
> ###   between a gene of interest (referred to as "target gene") and a set
> ###   of genes which may explain the expression of the target gene.
> ### Aliases: ARTIVAsubnet
> ### Keywords: DBN inference network inference graphical model time series
> 
> ### ** Examples
> 
> # Load the ARTIVA R package
> library(ARTIVA)
> 
> # Load the dataset with simulated gene expression profiles
> data(simulatedProfiles)
> 
> # Name of the target gene to be analyzed with ARTIVA 
> targetGene = 1
> 
> # Names of the parent genes (typically transcription factors) 
> parentGenes = c("TF1", "TF2", "TF3", "TF4", "TF5")
> 
> ###
> # ARTIVA analysis searching for potential interactions between the target 
> # genes and a predefined list of parent genes. 
> ###
> 
> # Note that the number of iterations in the RJ-MCMC sampling is reduced 
> # to in this example to 'niter=20000' (in order obtain a quick overview
> # of the ARTIVAnet fonction, but it should be increased (e.g. up to
> # 50000) for a better parameter estimation.
> 
> # Without saving the output pictures "savePictures=FALSE"
> ## Not run: 
> ##D ARTIVAtest = ARTIVAsubnet(targetData = simulatedProfiles[targetGene,],
> ##D   parentData = simulatedProfiles[parentGenes,],
> ##D   targetName = targetGene,
> ##D   parentNames = parentGenes,
> ##D   segMinLength = 2,
> ##D   edgesThreshold = 0.5, 
> ##D   niter = 20000,
> ##D   savePictures=FALSE)
> ##D 
> ##D # By default, the output results (pictures and estimation values) are
> ##D # saved in a folder named "ARTIVAsubnet" created in the current directory
> ##D ARTIVAtest = ARTIVAsubnet(targetData = simulatedProfiles[targetGene,],
> ##D   parentData = simulatedProfiles[parentGenes,],
> ##D   targetName = targetGene,
> ##D   parentNames = parentGenes,
> ##D   segMinLength = 2,
> ##D   edgesThreshold = 0.5, 
> ##D   niter = 20000)
> ##D 
> ##D # In order to save the results in a specific folder, for example a
> ##D # folder entitled "ARTIVA-test" in the current directory:
> ##D 
> ##D ARTIVAtest2 = ARTIVAsubnet(targetData = simulatedProfiles[targetGene,],
> ##D   parentData = simulatedProfiles[parentGenes,],
> ##D   targetName = targetGene,
> ##D   parentNames = parentGenes,
> ##D   segMinLength = 2,
> ##D   edgesThreshold = 0.5, 
> ##D   niter = 20000,
> ##D   outputPath = "ARTIVA-test")
> ## End(Not run)
> 
> 
> 
> 
> 
> dev.off()
null device 
          1 
>