R Graphical Manual


Last data update: 2014.03.03


filterClusters

R Documentation

Merge clusters and compute all relevant cluster statistics

Description

If clusters were identified with the mini-rank norm (MRN) algorithm (`method = 'mrn'` in getClusters), cluster statistics are computed directly. If instead the continuous wavelet transform (CWT)-based cluster identification algorithm was used, clusters are first filtered to retain only those containing both a wavelet peak and a high-confidence substitution site within their boundaries, and statistics are then computed for the retained clusters.
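The CWT filtering rule can be illustrated with a minimal base-R sketch. It assumes that clusters, wavelet peaks and substitution sites are given as plain integer coordinates on a single chromosome and strand. The function `filter_cwt_clusters` and the toy positions are hypothetical and are not wavClusteR internals. In practice, overlap queries from IRanges/GenomicRanges (e.g. `overlapsAny`) would be the natural tool.

```r
# Hypothetical sketch (not wavClusteR's internal code): keep clusters whose
# closed interval [start, end] contains at least one wavelet peak AND at least
# one high-confidence substitution site. Assumes one chromosome and one strand.
filter_cwt_clusters <- function(clusters, peak_pos, sub_pos) {
  # TRUE if any position in `pos` lies within [s, e]
  contains_any <- function(s, e, pos) any(pos >= s & pos <= e)
  keep <- vapply(seq_len(nrow(clusters)), function(i) {
    s <- clusters$start[i]
    e <- clusters$end[i]
    contains_any(s, e, peak_pos) && contains_any(s, e, sub_pos)
  }, logical(1))
  clusters[keep, , drop = FALSE]
}

# Toy input: three clusters, made-up peak and substitution positions
clusters <- data.frame(start = c(100, 200, 300), end = c(120, 230, 310))
peaks <- c(110, 305)
subs  <- c(115, 210, 308)
filter_cwt_clusters(clusters, peaks, subs)
```

Here the second cluster is dropped because it contains a substitution site (210) but no wavelet peak.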

Usage

filterClusters(clusters, highConfSub, coverage, model, genome,
refBase = 'T', minWidth = 12, verbose = TRUE)

Arguments

`clusters`	GRanges object containing individual clusters as identified by the getClusters function
`highConfSub`	GRanges object containing high-confidence substitution sites as returned by the getHighConfSub function
`coverage`	An Rle object containing the coverage at each genomic position, as returned by a call to `coverage`
`model`	List of 5 items containing the estimated mixture model as returned by the fitMixtureModel function
`genome`	BSgenome object of the relevant reference genome (e.g. `Hsapiens` for the human genome hg19)
`refBase`	A character specifying the base in the reference genome at which transitions are experimentally induced (e.g. 4-SU treatment, the standard in PAR-CLIP experiments, induces T-to-C transitions, hence `refBase = "T"`). Default is "T"
`minWidth`	An integer giving the minimum width of reported clusters. Clusters shorter than `minWidth` are extended to `minWidth`, starting from the cluster center. Default is 12
`verbose`	Logical; if TRUE, processing steps are printed. Default is TRUE
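The `minWidth` extension described above can be sketched in base R. This sketch assumes 1-based closed intervals and centers the extended cluster on the integer midpoint of the original one. How wavClusteR rounds odd/even widths is not specified here, so that detail is an assumption. The function name `extend_to_min_width` is hypothetical. For IRanges objects, `resize(x, width = minWidth, fix = "center")` provides comparable centered resizing, which would be applied to the shorter clusters only.

```r
# Hypothetical sketch: widen clusters shorter than min_width to exactly
# min_width positions around their (integer) center; longer clusters are kept.
# Coordinates are 1-based and closed, as in IRanges. No clipping at
# chromosome boundaries is performed.
extend_to_min_width <- function(start, end, min_width = 12L) {
  width  <- end - start + 1L
  short  <- width < min_width
  center <- (start + end) %/% 2L
  new_start <- ifelse(short, center - (min_width - 1L) %/% 2L, start)
  new_end   <- ifelse(short, new_start + min_width - 1L, end)
  data.frame(start = new_start, end = new_end,
             width = new_end - new_start + 1L)
}

# A 6-nt cluster is widened to 12 nt; a 31-nt cluster is left unchanged
extend_to_min_width(start = c(1000L, 2000L), end = c(1005L, 2030L))
```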

Value

A GRanges object containing the clusters identified transcriptome-wide, with the following metadata columns:

`Ntransitions`	The number of high-confidence transitions within the cluster
`MeanCov`	The mean coverage within the cluster
`NbasesInRef`	The number of genomic positions within the cluster corresponding to `refBase`
`CrossLinkEff`	The crosslinking efficiency within the cluster, estimated as the ratio between the number of high-confidence transitions within the cluster and the total number of genomic positions therein corresponding to `refBase` (reported to two decimal places in the example output)
`Sequence`	The genomic sequence underlying the cluster (plus strand)
`SumLogOdds`	The sum of the log-odds values within the cluster
`RelLogOdds`	The sum of the log-odds divided by the number of genomic positions within the cluster corresponding to `refBase`, i.e. `SumLogOdds / NbasesInRef` (for the first cluster in the example output, 2.979765 / 6 ≈ 0.4966). This variable can be regarded as a proxy for statistical significance and can therefore be used to rank clusters. See Comoglio, Sievers and Paro (2015) for details.
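The relationships among these columns can be checked against the example output in the Results section. There, `CrossLinkEff` matches `Ntransitions / NbasesInRef` rounded to two decimals, `RelLogOdds` matches `SumLogOdds / NbasesInRef` (e.g. 2.979765 / 6 ≈ 0.4966 for cluster [1]), and, for minus-strand clusters, `NbasesInRef` matches the number of A's (the complement of `refBase = "T"`) in the plus-strand `Sequence`. The base-R sketch below reproduces those observed relationships. The helper names `count_ref_bases` and `cluster_stats` are hypothetical, and this is not wavClusteR's own code.

```r
# Hypothetical sketch reproducing relationships observed in the example output.
# Assumption: NbasesInRef counts refBase on the cluster's own strand, so for
# "-" clusters the complement of refBase is counted in the plus-strand Sequence.
count_ref_bases <- function(sequence, strand, ref_base = "T") {
  complement <- c(A = "T", C = "G", G = "C", T = "A")
  target <- if (strand == "-") complement[[ref_base]] else ref_base
  sum(strsplit(sequence, "")[[1]] == target)
}

cluster_stats <- function(sequence, strand, n_transitions, sum_log_odds,
                          ref_base = "T") {
  n_ref <- count_ref_bases(sequence, strand, ref_base)
  c(NbasesInRef  = n_ref,
    CrossLinkEff = round(n_transitions / n_ref, 2),
    RelLogOdds   = sum_log_odds / n_ref)
}

# Clusters [1] and [36] from the example output (both on the "-" strand)
s1  <- cluster_stats("AGGATTATTTGACTACTGGCCCA", "-", 1, 2.979765)
s36 <- cluster_stats("CTGGTGAGGTTTTTCTTTATATG", "-", 2, 5.607608)
s1
s36
```

Small differences in the last printed digit of `RelLogOdds` are expected, because `SumLogOdds` is itself shown rounded in the output.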

Note

1) This function calls the appropriate processing function according to the method used to compute the clusters. This information is stored as a list in the metadata(ranges(clusters)) slot.

2) Note that `genome` must be the reference genome of the organism in which the experiments were carried out. For example, `genome = Hsapiens` is used for the human reference genome (assembly hg19), where `Hsapiens` is provided by the BSgenome.Hsapiens.UCSC.hg19 package.
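The dispatch described in Note 1 can be pictured with a minimal `switch()` sketch. It assumes that the stored list holds the clustering method under an element named `method`, with values `"mrn"` or `"cwt"`. The element name, the `"cwt"` value and the handler functions are hypothetical placeholders, not wavClusteR internals.

```r
# Hypothetical handlers standing in for the two processing paths
process_mrn <- function(clusters) "mrn: compute statistics directly"
process_cwt <- function(clusters) "cwt: filter by wavelet peak and substitution, then compute statistics"

# Dispatch on the clustering method recorded in the (assumed) metadata list
dispatch_filter <- function(clusters, cluster_metadata) {
  method <- cluster_metadata$method
  switch(method,
         mrn = process_mrn(clusters),
         cwt = process_cwt(clusters),
         stop("unknown clustering method: ", method))
}

dispatch_filter(clusters = NULL, cluster_metadata = list(method = "mrn"))
```

Failing loudly on an unrecognized method avoids silently returning unfiltered clusters.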

Author(s)

Federico Comoglio and Cem Sievers

References

Pagès H. BSgenome: Infrastructure for Biostrings-based genome data packages. Bioconductor R package.

Sievers C, Schlumpf T, Sawarkar R, Comoglio F and Paro R. (2012) Mixture models and wavelet transforms reveal high confidence RNA-protein interaction sites in MOV10 PAR-CLIP data, Nucleic Acids Res. 40(20):e160. doi: 10.1093/nar/gks697

Comoglio F, Sievers C and Paro R (2015) Sensitive and highly resolved identification of RNA-protein interaction sites in PAR-CLIP data, BMC Bioinformatics 16, 32.

Examples


library(wavClusteR)
library(BSgenome.Hsapiens.UCSC.hg19)  # provides the Hsapiens genome object

# Example mixture model shipped with wavClusteR
data(model, package = "wavClusteR")

# Read the example BAM file and identify high-confidence T>C substitutions
filename <- system.file("extdata", "example.bam", package = "wavClusteR")
example <- readSortedBam(filename = filename)
countTable <- getAllSub(example, minCov = 10, cores = 1)
highConfSub <- getHighConfSub(countTable, supportStart = 0.2, supportEnd = 0.7,
                              substitution = "TC")
coverage <- coverage(example)

# Identify clusters with the mini-rank norm algorithm
clusters <- getClusters(highConfSub = highConfSub,
                        coverage = coverage,
                        sortedBam = example,
                        method = 'mrn',
                        cores = 1,
                        threshold = 2)

# Merge clusters and compute cluster statistics
fclusters <- filterClusters(clusters = clusters,
                            highConfSub = highConfSub,
                            coverage = coverage,
                            model = model,
                            genome = Hsapiens,
                            refBase = 'T',
                            minWidth = 12)
fclusters
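Since `RelLogOdds` can be used to rank clusters, a natural follow-up to the example is to sort the result by it. On the GRanges itself this could be written as `fclusters[order(fclusters$RelLogOdds, decreasing = TRUE)]`. The self-contained sketch below does the same on a plain data.frame holding the first five clusters as printed in the Results section. In a live session, `as.data.frame(fclusters)` would give a comparable table. The `printed` data.frame and its `id` column are illustrative.

```r
# First five clusters as printed in the Results section (id = printed row index)
printed <- data.frame(
  id           = 1:5,
  start        = c(24002046, 24002318, 24002668, 24002710, 24002732),
  end          = c(24002068, 24002348, 24002693, 24002729, 24002761),
  Ntransitions = c(1, 3, 2, 1, 2),
  RelLogOdds   = c(0.4966274, 0.9490773, 0.5012998, 0.3220061, 0.4338433)
)

# Rank clusters by RelLogOdds, highest (most confident) first
ranked <- printed[order(printed$RelLogOdds, decreasing = TRUE), ]
head(ranked, 3)
```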

Results


R version 3.3.1 (2016-06-21) -- "Bug in Your Hair"
Copyright (C) 2016 The R Foundation for Statistical Computing
Platform: x86_64-pc-linux-gnu (64-bit)


> library(wavClusteR)
> require(BSgenome.Hsapiens.UCSC.hg19)
Loading required package: BSgenome.Hsapiens.UCSC.hg19
Loading required package: BSgenome
Loading required package: rtracklayer
> 
> data( model, package = "wavClusteR" ) 
> 
> filename <- system.file( "extdata", "example.bam", package = "wavClusteR" )
> example <- readSortedBam( filename = filename )
> countTable <- getAllSub( example, minCov = 10, cores = 1 )
Loading required package: doMC
Loading required package: foreach
Loading required package: iterators
Considering substitutions, n = 497, processing in 1 chunks
   chunk #: 1
   considering the + strand
Computing local coverage at substitutions...
   considering the - strand
Computing local coverage at substitutions...
> highConfSub <- getHighConfSub( countTable, supportStart = 0.2, supportEnd = 0.7, substitution = "TC" )
> coverage <- coverage( example )
> clusters <- getClusters( highConfSub = highConfSub, 
+                          coverage = coverage, 
+                          sortedBam = example, 
+ 	                 method = 'mrn', 
+ 	                 cores = 1, 
+ 	                 threshold = 2 ) 
Computing start/end read positions
Number of chromosomes exhibiting high confidence transitions: 1
...Processing = chrX
> 
> fclusters <- filterClusters( clusters = clusters, 
+ 		             highConfSub = highConfSub, 
+         		     coverage = coverage,
+ 			     model = model, 
+ 			     genome = Hsapiens, 
+ 		             refBase = 'T', 
+ 		             minWidth = 12 )
Computing log odds...
Refining cluster sizes...
Combining clusters...
Quantifying transitions within clusters...
Computing statistics...
   |======================================================================| 100%
Consolidating results...
> fclusters
GRanges object with 39 ranges and 7 metadata columns:
       seqnames               ranges strand | Ntransitions   MeanCov
          <Rle>            <IRanges>  <Rle> |    <integer> <numeric>
   [1]     chrX [24002046, 24002068]      - |            1  10.00000
   [2]     chrX [24002318, 24002348]      - |            3  10.06452
   [3]     chrX [24002668, 24002693]      - |            2  13.80769
   [4]     chrX [24002710, 24002729]      - |            1 153.15000
   [5]     chrX [24002732, 24002761]      - |            2  28.66667
   ...      ...                  ...    ... .          ...       ...
  [35]     chrX [24005918, 24005945]      - |            1  14.71429
  [36]     chrX [24005949, 24005971]      - |            2  16.26087
  [37]     chrX [24006167, 24006191]      - |            2  11.04000
  [38]     chrX [24006533, 24006565]      - |            1  33.54545
  [39]     chrX [24007059, 24007083]      - |            3  16.36000
       NbasesInRef CrossLinkEff                          Sequence SumLogOdds
         <integer>    <numeric>                          <factor>  <numeric>
   [1]           6         0.17           AGGATTATTTGACTACTGGCCCA   2.979765
   [2]           9         0.33   TGTGTAATATTGAAGTTATACGGTGTACTGA   8.541695
   [3]          11         0.18        CTTTAAATTATGAATTCTCAAAAGAG   5.514298
   [4]           9         0.11              GATAGCTTATAAACTGAAAT   2.898055
   [5]          13         0.15    CAATTTATATTATAAACTGAAATGTTATGA   5.639963
   ...         ...          ...                               ...        ...
  [35]           9         0.11      CAATGTTAGACCAATGGCTTTGATAGTA   3.010309
  [36]           3         0.67           CTGGTGAGGTTTTTCTTTATATG   5.607608
  [37]           8         0.25         AGTGAGGATGGAATCGCTGTAATGA   5.511650
  [38]           8         0.12 GGAGGTGGAAGATGAGGTGATTCCACCGGTGAT   2.902204
  [39]           8         0.38         TGCTGGTGAACATTCTGAAAGTAAT   8.298195
       RelLogOdds
        <numeric>
   [1]  0.4966274
   [2]  0.9490773
   [3]  0.5012998
   [4]  0.3220061
   [5]  0.4338433
   ...        ...
  [35]  0.3344788
  [36]  1.8692025
  [37]  0.6889563
  [38]  0.3627755
  [39]  1.0372744
  -------
  seqinfo: 1 sequence from an unspecified genome; no seqlengths