R: Merge clusters and compute all relevant cluster statistics
filterClusters
R Documentation
Merge clusters and compute all relevant cluster statistics
Description
If clusters have been identified using the mini-rank norm algorithm, cluster
statistics are computed. In contrast, if the CWT-based cluster
identification algorithm was used, clusters are first filtered to retain
only those instances containing a wavelet peak and a high-confidence
substitution site within their cluster boundaries.
GRanges object containing individual clusters as identified
by the getClusters function
highConfSub
GRanges object containing high-confidence substitution
sites as returned by the getHighConfSub function
coverage
An Rle object containing the coverage at each genomic
position as returned by a call to coverage
model
List of 5 items containing the estimated mixture model as
returned by the fitMixtureModel function
genome
BSgenome object of the relevant reference genome (e.g.
Hsapiens for the human genome hg19)
refBase
A character specifying the base in the reference genome for
which transitions are experimentally induced (e.g. 4-SU treatment - a
standard in PAR-CLIP experiments - induces T to C transitions and hence
refBase = "T" in this case). Default is "T"
minWidth
An integer corresponding to the minimum width of reported
clusters. Shorter clusters are extended to minWidth starting from the
cluster center
verbose
Logical, if TRUE processing steps are printed
Value
GRanges object containing the transcriptome-wide identified
clusters, having metadata:
Ntransitions
The number of high-confidence
transitions within the cluster
MeanCov
The mean coverage within the
cluster
NbasesInRef
The number of genomic positions within the
cluster corresponding to refBase
CrossLinkEff
The
crosslinking efficiency within the cluster, estimated as the ratio between
the number of high-confidence transitions within the cluster and the total
number of genomic positions therein corresponding to refBase
Sequence
The genomic sequence undelying the cluster (plus strand)
SumLogOdds
The sum of the log-odd values within the cluster
RelLogOdds
The sum of the log-odds divided by the number of
high-confidence transitions within the cluster. This variable can be
regarded as a proxy for statistical significance and can be therefore used
to rank clusters. See Comoglio, Sievers and Paro for details.
Note
1) This function calls the appropriate processing function according
to the method used to compute clusters. This information is stored in the
metadata(ranges(clusters)) slot as an object of type list.
2) Notice that genome corresponds to the according reference genome
matching the organism in which experiments have been carried out. For
example genome = Hsapiens is used for the human reference genome
(assembly 19), where Hsapiens is provided by
BSgenome.Hsapiens.UCSC.hg19.
Author(s)
Federico Comoglio and Cem Sievers
References
Herve Pages, BSgenome: Infrastructure for Biostrings-based
genome data packages
Sievers C, Schlumpf T, Sawarkar R, Comoglio F and Paro R. (2012) Mixture
models and wavelet transforms reveal high confidence RNA-protein interaction
sites in MOV10 PAR-CLIP data, Nucleic Acids Res. 40(20):e160. doi:
10.1093/nar/gks697
Comoglio F, Sievers C and Paro R (2015) Sensitive and highly resolved identification
of RNA-protein interaction sites in PAR-CLIP data, BMC Bioinformatics 16, 32.