Last data update: 2014.03.03

R: Bayesian Network Discovery using a Standard MCMC Algorithm
Networks.STDR Documentation

Bayesian Network Discovery using a Standard MCMC Algorithm

Description

This function implements the standard Markov chain Monte Carlo (MCMC) algorithm to perform feature selection and sub-network discovery using a Bayesian nonparametric model based on the Dirichlet process mixture (DPM) model and the Ising model.

Usage

Networks.STD(pvalue,net,iter=5000, nburns=2000,
piall=c(0.75, 0.8, 0.85, 0.9),rhoall=c(0.5, 1, 5, 10, 15),
status=FALSE,fit,show.steps=1,showlikelihood=FALSE,likelihood.frequency=100)

Arguments

pvalue

a vector of p-values obtained from large scale statistical hypothesis testing

net

an "n" by "n" binary adjacent matrix (0/1) for the network configuration with n=length(pvalue)

iter

the number of iterations; the default is 5000

nburns

the number of burn-in; the default is 2000

piall

a vector of possible choices for "pi0" in an increasing order; the default value is c(0.75, 0.8, 0.85, 0.9)

rhoall

a vector of possible choices of "rho0" and "rho1" in an increasing order; the default value is c(0.5, 1, 5, 10, 15)

status

a logical variable indicating whether the you are initial an MCMC chain or resume a chain

fit

a list provide the current values of parameters which are used for resume the chain

show.steps

integer representing the frequency of the results of iterations presented, the default value is 1.The setting is invalid when trace=FALSE. The setting would not affect the data saved, only for printing

showlikelihood

a logical variable indicating whether to show the log-likelihood value simultaneously.Set TRUE if show the log-likelihood value simultaneously, FALSE, otherwise. FALSE is the default setting

likelihood.frequency

a number representing the frequency showing the log-likelihood value simultaneously. For example, setting likelihood.frequency=100 means showing the log-likelihood value every 100 iterations. The default setting is 100 and it is recommended that do not set a small frequency because it slow down the MCMC chain updating

Details

This function implements a Bayesian nonparametric mixture model for feature selection incorporating network information (Zhao et al., 2014):

  • r_i| g_i, theta ~ N(mu_g_i, sigma_g_i),

  • g_i | z_i=k, q_k ~ Discrete(a_k, q_k),

  • theta ~ G_0k, for g in a_k,

  • q_k ~ Dirichlet(tau_k 1_{L_k}/L_k),

  • theta={theta_g}_{g in a_0 and a_1}

  • theta_g=(mu_g, sigma_g)

where we define

Index

a_0=(-L_0+1,-L_0+2,...,0) , a_1=(1,2,...,L_1) and the correspondent probability q_0=(q_-L_0+1, q_-L_0+2, ...,q_0), q_1=(q_1, q_2, ..., q_L_1), according to the definition of Discrete(a_k, b_k), for example, Pr(g_i=L_0+2)=q_-L_0+2.

Assumption

In this algorithm, we assume that "important" features should have larger statistics comparing to "unimportant" ones without the loss of generality. In this regard, we set the restriction mu_g<mu_g+1 for g=-L_0+1, -L_0+2,...,L_1.

This function implements the NET-DPM-1 Zhao et al.(2014). Please refer to the Appendix B.1 for more details.

Value

An object of class "Networks.STD" containing Bayesian information of the MCMC chain

trace

a matrix of dimension (iter-nburns) by length(pvalue) containing posterior samples of classification label "g_i"

convergence

MCMC Heidelberger and Welch convergence diagnostic

graph

An igraph graph object of full network

model

a list containing parameters of the MCMC chain for resuming the chain

Author(s)

Zhou Lan, Jian Kang, Tianwei Yu and Yize Zhao

Department of Biostatistics and Bioinformatics, Emory University

References

Zhao, Y., Kang, J., Yu, T. A Bayesian nonparametric mixture model for selecting gene and gene-sub network, Annals of Applied Statistics, In press: 2014.

Zhou Lan, Jian Kang, Tianwei Yu, Yize Zhao, BANFF: an R package for network identifications via Bayesian nonparametric mixture models, working paper.

Examples

library(igraph)
###Creating the network of 10X10 image
g <- graph.lattice(length=10,dim=2)
net=as(get.adjacency(g,attr=NULL),"matrix")##this is the input of argument 'net'
##Assign the signal elements with signal intention 
##as normal distribution N(1,0.2). While noise is set as N(0,0.2)
newz=rep(0,100)
for (i in 3:7)
{
 newz[(i*10+3):(i*10+7)]=1
}
testcov<-0
for(i in 1:100){
 if(newz[i]==0){
   testcov[i]<-rnorm(1,mean=0,sd=0.2)

 }else{
  testcov[i]<-rnorm(1,mean=1,sd=0.2)

 }
}
##The profile of the image
image(matrix(testcov,10,10),col=gray(seq(0,1,length=255)))
##Transform the signals into pvalue form and begin identification
pvalue=pnorm(-testcov)
total=Networks.STD(pvalue,net,iter=3,nburns=1,
piall=c(0.8, 0.85, 0.9, 0.95),rhoall=c(0.5,1,5,10,15))

Results


R version 3.3.1 (2016-06-21) -- "Bug in Your Hair"
Copyright (C) 2016 The R Foundation for Statistical Computing
Platform: x86_64-pc-linux-gnu (64-bit)

R is free software and comes with ABSOLUTELY NO WARRANTY.
You are welcome to redistribute it under certain conditions.
Type 'license()' or 'licence()' for distribution details.

R is a collaborative project with many contributors.
Type 'contributors()' for more information and
'citation()' on how to cite R or R packages in publications.

Type 'demo()' for some demos, 'help()' for on-line help, or
'help.start()' for an HTML browser interface to help.
Type 'q()' to quit R.

> library(BANFF)
Loading required package: foreach
Loading required package: doParallel
Loading required package: iterators
Loading required package: parallel
> png(filename="/home/ddbj/snapshot/RGM3/R_CC/result/BANFF/Networks.STD.Rd_%03d_medium.png", width=480, height=480)
> ### Name: Networks.STD
> ### Title: Bayesian Network Discovery using a Standard MCMC Algorithm
> ### Aliases: Networks.STD
> 
> ### ** Examples
> 
> library(igraph)

Attaching package: 'igraph'

The following objects are masked from 'package:stats':

    decompose, spectrum

The following object is masked from 'package:base':

    union

> ###Creating the network of 10X10 image
> g <- graph.lattice(length=10,dim=2)
> net=as(get.adjacency(g,attr=NULL),"matrix")##this is the input of argument 'net'
> ##Assign the signal elements with signal intention 
> ##as normal distribution N(1,0.2). While noise is set as N(0,0.2)
> newz=rep(0,100)
> for (i in 3:7)
+ {
+  newz[(i*10+3):(i*10+7)]=1
+ }
> testcov<-0
> for(i in 1:100){
+  if(newz[i]==0){
+    testcov[i]<-rnorm(1,mean=0,sd=0.2)
+ 
+  }else{
+   testcov[i]<-rnorm(1,mean=1,sd=0.2)
+ 
+  }
+ }
> ##The profile of the image
> image(matrix(testcov,10,10),col=gray(seq(0,1,length=255)))
> ##Transform the signals into pvalue form and begin identification
> pvalue=pnorm(-testcov)
> total=Networks.STD(pvalue,net,iter=3,nburns=1,
+ piall=c(0.8, 0.85, 0.9, 0.95),rhoall=c(0.5,1,5,10,15))
iteration: 1 
iteration: 2 
iteration: 3 
> 
> 
> 
> 
> 
> dev.off()
null device 
          1 
>