Identify clusters containing high-confidence substitutions and resolve
boundaries at high resolution
Description
Identifies clusters using either the mini-rank norm (MRN) algorithm (default
and recommended to achieve highest sensitivity) or via a continuous wavelet
transform (CWT) based approach. The former employs thresholding of
background coverage differences and finds the optimal cluster boundaries by
exhaustively evaluating all putative clusters using a rank-based approach.
This method has higher sensitivity and an approximately 10-fold faster
running time than the CWT-based cluster identification algorithm. The
latter, maintained for compatibility with wavClusteR, computes the
CWT on a 1 kb window of the coverage function centered at a high-confidence
substitution site, and identifies cluster boundaries by extending away from
peak positions.
GRanges object containing high-confidence substitution
sites as returned by the getHighConfSub function
coverage
An Rle object containing the coverage at each genomic
position as returned by a call to coverage
sortedBam
a GRanges object containing all aligned reads, including
read sequence (qseq) and MD tag (MD), as returned by the
readSortedBam function
method
a character, either set to "mrn" or to "cwt" to compute
clusters using the mini-rank norm or the wavelet transform-based algorithm,
respectively. Default is "mrn" (recommended).
cores
integer, the number of cores to be used for parallel
evaluation. Default is 1.
threshold
numeric, if method = "mrn", the difference in
coverage to be considered noise. If not specified, a Gaussian mixture model
is used to learn a threshold from the data. Empirically, 10% of the minimum
coverage required at substitutions (see argument minCov in the
getHighConfSub function) might suffice to provide highly resolved
clusters. However, if minCov is much lower than the median
strand-specific coverage at substitutions m, which can be computed
using summary(elementMetadata(highConfSub)[, 'coverage'])['Median']),
10% of m might represent an optimal choice.
step
numeric, if method = "cwt", step size of window shift. If
two high-confidence substitution sites are located within a distance less
than step, the wavelet transform is computed only once. Default: 1,
i.e. each high-confidence substitution site is considered independently.
snr
numeric, if method = "cwt", signal-to-noise ratio
controlling the peak calling as performed by wavCWTPeaks implemented
in the wmtsa package. Default: 3.
Value
GRanges object containing the identified cluster boundaries.
Note
Clusters returned by this function need to be further merged by the
function filterClusters, which also computes all relevant cluster
statistics.
Sievers C, Schlumpf T, Sawarkar R, Comoglio F and Paro R. (2012) Mixture
models and wavelet transforms reveal high confidence RNA-protein interaction
sites in MOV10 PAR-CLIP data, Nucleic Acids Res. 40(20):e160. doi:
10.1093/nar/gks697
Comoglio F, Sievers C and Paro R (2015) Sensitive and highly resolved identification
of RNA-protein interaction sites in PAR-CLIP data, BMC Bioinformatics 16, 32.
R version 3.3.1 (2016-06-21) -- "Bug in Your Hair"
Copyright (C) 2016 The R Foundation for Statistical Computing
Platform: x86_64-pc-linux-gnu (64-bit)
R is free software and comes with ABSOLUTELY NO WARRANTY.
You are welcome to redistribute it under certain conditions.
Type 'license()' or 'licence()' for distribution details.
R is a collaborative project with many contributors.
Type 'contributors()' for more information and
'citation()' on how to cite R or R packages in publications.
Type 'demo()' for some demos, 'help()' for on-line help, or
'help.start()' for an HTML browser interface to help.
Type 'q()' to quit R.
> library(wavClusteR)
Loading required package: GenomicRanges
Loading required package: BiocGenerics
Loading required package: parallel
Attaching package: 'BiocGenerics'
The following objects are masked from 'package:parallel':
clusterApply, clusterApplyLB, clusterCall, clusterEvalQ,
clusterExport, clusterMap, parApply, parCapply, parLapply,
parLapplyLB, parRapply, parSapply, parSapplyLB
The following objects are masked from 'package:stats':
IQR, mad, xtabs
The following objects are masked from 'package:base':
Filter, Find, Map, Position, Reduce, anyDuplicated, append,
as.data.frame, cbind, colnames, do.call, duplicated, eval, evalq,
get, grep, grepl, intersect, is.unsorted, lapply, lengths, mapply,
match, mget, order, paste, pmax, pmax.int, pmin, pmin.int, rank,
rbind, rownames, sapply, setdiff, sort, table, tapply, union,
unique, unsplit
Loading required package: S4Vectors
Loading required package: stats4
Attaching package: 'S4Vectors'
The following objects are masked from 'package:base':
colMeans, colSums, expand.grid, rowMeans, rowSums
Loading required package: IRanges
Loading required package: GenomeInfoDb
Loading required package: Rsamtools
Loading required package: Biostrings
Loading required package: XVector
> png(filename="/home/ddbj/snapshot/RGM3/R_BC/result/wavClusteR/getClusters.Rd_%03d_medium.png", width=480, height=480)
> ### Name: getClusters
> ### Title: Identify clusters containing high-confidence substitutions and
> ### resolve boundaries at high resolution
> ### Aliases: getClusters
> ### Keywords: core
>
> ### ** Examples
>
>
> filename <- system.file( "extdata", "example.bam", package = "wavClusteR" )
> example <- readSortedBam( filename = filename )
> countTable <- getAllSub( example, minCov = 10, cores = 1 )
Loading required package: doMC
Loading required package: foreach
Loading required package: iterators
Considering substitutions, n = 497, processing in 1 chunks
chunk #: 1
considering the + strand
Computing local coverage at substitutions...
considering the - strand
Computing local coverage at substitutions...
> highConfSub <- getHighConfSub( countTable, supportStart = 0.2, supportEnd = 0.7, substitution = "TC" )
> coverage <- coverage( example )
> clusters <- getClusters( highConfSub = highConfSub,
+ coverage = coverage,
+ sortedBam = example,
+ method = 'mrn',
+ cores = 1,
+ threshold = 2 )
Computing start/end read positions
Number of chromosomes exhibiting high confidence transitions: 1
...Processing = chrX
>
>
>
>
>
>
> dev.off()
null device
1
>