Last data update: 2014.03.03

R: Assign each multihit to a unique region based on the...
disambiguateMultihitsR Documentation

Assign each multihit to a unique region based on the posterior for the read-enriched hidden state

Description

Among multiple alignments of the same read (i.e. multihit), select the alignment corresponding to the bin with the maximum posterior for the enriched state.

Usage

disambiguateMultihits(alignGal, nbhGRList, postprobCutoff = 0)

Arguments

alignGal

GAlignments object with an additional column in the values slot that indicates whether the read corresponding to the current alignment is a unique hit (i.e., read mapped uniquely to a single loci) or multihit (i.e., read mapped to multiple loci).

nbhGRList

GRangesList each item containig the HMM training results on a single chromosome. Importantly, the posterior probabilities for the background and enriched states need to be present the metadata slot and used to disambiguate multihits, which is done by mainSeekSingleChrom.

postprobCutoff

Posterior cutoff for returning only the reads with maximum posterior that is greater than the threshold (Default: 0; i.e., no cutoff).

Details

Each multihit (i.e., read aligned to multiple loci) flagged in the getAlignGal function are assigned to a unique locus corresponding to the j^th bin with the highest posterior or responsibility from the RIP state. Intuitively, the RIP state corresponds to the read-enriched loci. Disambiguating multihits in this way will potentially improve the power of detecting more RIP regions but may also introduce certain bias towards the idea of "rich gets richer". After this step, RIPSeeker will rerun the functions from selectBinSize to nbh to improve the HMM model estimation with augmented read count data. Optionally, user can choose not to reiterate the training process to go straight to the next step to detect RIP regions (See seekRIP).

Value

GAlignments with each read mapped uniquely to a single locus.

Author(s)

Yue Li

See Also

getAlignGal, ripSeek, mainSeek, mainSeekSingleChrom

Examples

# Retrieve system files
extdata.dir <- system.file("extdata", package="RIPSeeker") 

bamFiles <- list.files(extdata.dir, ".bam$", recursive=TRUE, full.names=TRUE)

bamFiles <- grep("PRC2", bamFiles, value=TRUE)

# Parameters setting
binSize <- 1e5							  # use a large fixed bin size for demo only
minBinSize <- NULL						# turn off min bin size in automatic bin size selection
maxBinSize <- NULL						# turn off max bin size in automatic bin size selection
multicore <- FALSE						# use multicore
strandType <- "-"						  # set strand type to minus strand

# Retrieve system files
extdata.dir <- system.file("extdata", package="RIPSeeker") 

bamFiles <- list.files(extdata.dir, ".bam$", recursive=TRUE, full.names=TRUE)

bamFiles <- grep("PRC2", bamFiles, value=TRUE)

alignGal <- combineAlignGals(bamFiles=grep(pattern="SRR039214",             
            bamFiles, value=TRUE, invert=TRUE), reverseComplement=TRUE, genomeBuild="mm9")

alignGR <- as(alignGal, "GRanges")

alignGR <- addPseudoAlignment(alignGR)

alignGRList <- GRangesList(as.list(split(alignGR, seqnames(alignGR))))

################ run mainSeekSingleChrom function for HMM inference on a single chromosome ################
nbhGRList <- lapply(alignGRList, mainSeekSingleChrom, K = 2, binSize=binSize, 
			
			minBinSize = minBinSize, maxBinSize = maxBinSize, runViterbi=FALSE)

nbhGRList <- GRangesList(nbhGRList)

alignGalFiltered <- disambiguateMultihits(alignGal, nbhGRList)

Results


R version 3.3.1 (2016-06-21) -- "Bug in Your Hair"
Copyright (C) 2016 The R Foundation for Statistical Computing
Platform: x86_64-pc-linux-gnu (64-bit)

R is free software and comes with ABSOLUTELY NO WARRANTY.
You are welcome to redistribute it under certain conditions.
Type 'license()' or 'licence()' for distribution details.

R is a collaborative project with many contributors.
Type 'contributors()' for more information and
'citation()' on how to cite R or R packages in publications.

Type 'demo()' for some demos, 'help()' for on-line help, or
'help.start()' for an HTML browser interface to help.
Type 'q()' to quit R.

> library(RIPSeeker)
Loading required package: S4Vectors
Loading required package: stats4
Loading required package: BiocGenerics
Loading required package: parallel

Attaching package: 'BiocGenerics'

The following objects are masked from 'package:parallel':

    clusterApply, clusterApplyLB, clusterCall, clusterEvalQ,
    clusterExport, clusterMap, parApply, parCapply, parLapply,
    parLapplyLB, parRapply, parSapply, parSapplyLB

The following objects are masked from 'package:stats':

    IQR, mad, xtabs

The following objects are masked from 'package:base':

    Filter, Find, Map, Position, Reduce, anyDuplicated, append,
    as.data.frame, cbind, colnames, do.call, duplicated, eval, evalq,
    get, grep, grepl, intersect, is.unsorted, lapply, lengths, mapply,
    match, mget, order, paste, pmax, pmax.int, pmin, pmin.int, rank,
    rbind, rownames, sapply, setdiff, sort, table, tapply, union,
    unique, unsplit


Attaching package: 'S4Vectors'

The following objects are masked from 'package:base':

    colMeans, colSums, expand.grid, rowMeans, rowSums

Loading required package: IRanges
Loading required package: GenomicRanges
Loading required package: GenomeInfoDb
Loading required package: SummarizedExperiment
Loading required package: Biobase
Welcome to Bioconductor

    Vignettes contain introductory material; view with
    'browseVignettes()'. To cite Bioconductor, see
    'citation("Biobase")', and for packages 'citation("pkgname")'.

Loading required package: Rsamtools
Loading required package: Biostrings
Loading required package: XVector
Loading required package: GenomicAlignments
Loading required package: rtracklayer
> png(filename="/home/ddbj/snapshot/RGM3/R_BC/result/RIPSeeker/disambiguateMultihits.Rd_%03d_medium.png", width=480, height=480)
> ### Name: disambiguateMultihits
> ### Title: Assign each multihit to a unique region based on the posterior
> ###   for the read-enriched hidden state
> ### Aliases: disambiguateMultihits
> 
> ### ** Examples
> 
> # Retrieve system files
> extdata.dir <- system.file("extdata", package="RIPSeeker") 
> 
> bamFiles <- list.files(extdata.dir, ".bam$", recursive=TRUE, full.names=TRUE)
> 
> bamFiles <- grep("PRC2", bamFiles, value=TRUE)
> 
> # Parameters setting
> binSize <- 1e5							  # use a large fixed bin size for demo only
> minBinSize <- NULL						# turn off min bin size in automatic bin size selection
> maxBinSize <- NULL						# turn off max bin size in automatic bin size selection
> multicore <- FALSE						# use multicore
> strandType <- "-"						  # set strand type to minus strand
> 
> # Retrieve system files
> extdata.dir <- system.file("extdata", package="RIPSeeker") 
> 
> bamFiles <- list.files(extdata.dir, ".bam$", recursive=TRUE, full.names=TRUE)
> 
> bamFiles <- grep("PRC2", bamFiles, value=TRUE)
> 
> alignGal <- combineAlignGals(bamFiles=grep(pattern="SRR039214",             
+             bamFiles, value=TRUE, invert=TRUE), reverseComplement=TRUE, genomeBuild="mm9")
Processing /home/ddbj/local/lib64/R/library/RIPSeeker/extdata/PRC2/SRR039210_processed_tophat/accepted_hits_noDup_sel_chrX.bam ... All hits are returned with flags.
Processing /home/ddbj/local/lib64/R/library/RIPSeeker/extdata/PRC2/SRR039211_processed_tophat/accepted_hits_noDup_sel_chrX.bam ... All hits are returned with flags.
2 BAM files are combined
> 
> alignGR <- as(alignGal, "GRanges")
> 
> alignGR <- addPseudoAlignment(alignGR)
*** chr1 do not have any alignment.
*** chr10 do not have any alignment.
*** chr11 do not have any alignment.
*** chr12 do not have any alignment.
*** chr13 do not have any alignment.
*** chr14 do not have any alignment.
*** chr15 do not have any alignment.
*** chr16 do not have any alignment.
*** chr17 do not have any alignment.
*** chr18 do not have any alignment.
*** chr19 do not have any alignment.
*** chr2 do not have any alignment.
*** chr3 do not have any alignment.
*** chr4 do not have any alignment.
*** chr5 do not have any alignment.
*** chr6 do not have any alignment.
*** chr7 do not have any alignment.
*** chr8 do not have any alignment.
*** chr9 do not have any alignment.
*** chrM do not have any alignment.
*** chrY do not have any alignment.

*** 21 pseudoreads are appended to the end.
> 
> alignGRList <- GRangesList(as.list(split(alignGR, seqnames(alignGR))))
> 
> ################ run mainSeekSingleChrom function for HMM inference on a single chromosome ################
> nbhGRList <- lapply(alignGRList, mainSeekSingleChrom, K = 2, binSize=binSize, 
+ 			
+ 			minBinSize = minBinSize, maxBinSize = maxBinSize, runViterbi=FALSE)

**********
chr1:
**********

***0. Using predefined binSize of 100000 bp.

**********
chr10:
**********

***0. Using predefined binSize of 100000 bp.

**********
chr11:
**********

***0. Using predefined binSize of 100000 bp.

**********
chr12:
**********

***0. Using predefined binSize of 100000 bp.

**********
chr13:
**********

***0. Using predefined binSize of 100000 bp.

**********
chr14:
**********

***0. Using predefined binSize of 100000 bp.

**********
chr15:
**********

***0. Using predefined binSize of 100000 bp.

**********
chr16:
**********

***0. Using predefined binSize of 100000 bp.

**********
chr17:
**********

***0. Using predefined binSize of 100000 bp.

**********
chr18:
**********

***0. Using predefined binSize of 100000 bp.

**********
chr19:
**********

***0. Using predefined binSize of 100000 bp.

**********
chr2:
**********

***0. Using predefined binSize of 100000 bp.

**********
chr3:
**********

***0. Using predefined binSize of 100000 bp.

**********
chr4:
**********

***0. Using predefined binSize of 100000 bp.

**********
chr5:
**********

***0. Using predefined binSize of 100000 bp.

**********
chr6:
**********

***0. Using predefined binSize of 100000 bp.

**********
chr7:
**********

***0. Using predefined binSize of 100000 bp.

**********
chr8:
**********

***0. Using predefined binSize of 100000 bp.

**********
chr9:
**********

***0. Using predefined binSize of 100000 bp.

**********
chrM:
**********

***0. Using predefined binSize of 100000 bp.

**********
chrX:
**********

***0. Using predefined binSize of 100000 bp.
***1. Traning NB HMM to derive posterior (and Viterbi state sequence:)

***1. Initializing negative binomial HMM (nbh) with 2 states:


Starting NB mixture model (nbm_em) for K=2 clusters:
Iteration 0:	-11079.189
Iteration 1:	-8139.118
Iteration 2:	-7711.423
Iteration 3:	-7426.964
Iteration 4:	-7269.345
Iteration 5:	-7144.158
Iteration 6:	-7041.076
Iteration 7:	-6951.351
Iteration 8:	-6868.492
Iteration 9:	-6794.903
Iteration 10:	-6736.631

***2. Traininig nbh with forward-backward algorithm:

Iteration 0:	-6690.156
Iteration 1:	-6629.837
Iteration 2:	-6604.046
Iteration 3:	-6583.913
Iteration 4:	-6569.565
Iteration 5:	-6560.399
Iteration 6:	-6554.513

**********
chrY:
**********

***0. Using predefined binSize of 100000 bp.
There were 22 warnings (use warnings() to see them)
> 
> nbhGRList <- GRangesList(nbhGRList)
> 
> alignGalFiltered <- disambiguateMultihits(alignGal, nbhGRList)

2716/23867 multihit reads corresponding to 9903 ambiguous alignments
have been assigned to 2716 unique regions with maximum posterior for the enriched state

> 
> 
> 
> 
> 
> 
> dev.off()
null device 
          1 
>