R: Summarize the reads assigned to a SplicingGraphs object
countReads-methods
R Documentation
Summarize the reads assigned to a SplicingGraphs object
Description
countReads counts the reads assigned to a SplicingGraphs object.
The counting can be done by splicing graph edge, reduced splicing
graph edge, transcript, or gene.
reportReads is similar to countReads but returns right before
the final counting step, that is, the returned DataFrame contains the reads
instead of their counts.
Can be "sgedge", "rsgedge", "tx", or "gene".
Specifies the level of resolution that summarization should be
performed at. See Details section below.
Details
Levels of resolution
countReads and reportReads allow summarization of the reads
at different levels of resolution. The level of resolution is determined
by the type of feature that one chooses via the by argument.
The supported resolutions are (from highest to lowest resolution):
by="sgedge" for summarization at the splicing graph edge
level (i.e. at the exons/intron level);
by="rsgedge" for summarization at the reduced
splicing graph edge level;
by="tx" for summarization at the transcript level;
by="gene" for summarization at the gene level.
Relationship between levels of resolution
There is a parent-child relationship between the features
corresponding to a given level of resolution (the parent features)
and those corresponding to a higher level of resolution (the child
features).
For example, in the case of the 2 first levels of resolution listed
above, the parent-child relationship is the following: the parent
features are the reduced splicing graph edges, the child features
are the splicing graph edges, and each parent feature is obtained by
merging one or more child features together.
Similarly, transcripts can be seen as parent features of reduced
splicing graph edges, and genes as parent features of transcripts.
Note that, the rsgedge/sgedge and gene/tx relationships are one-to-many,
but the tx/rsgedge relationship is many-to-many because a given edge can
belong to more than one transcript.
Finally the parent-child relationships between 2 arbitrary levels of
resolution is defined by combining the relationships between consecutive
levels. All possible parent-child relationships are summarized in the
following table:
An important distinction needs to be made between a read that hits a
given feature multiple times and a read that hits more than one feature.
If the former, the read is counted/reported only once for that feature.
For example, when summarizing at the transcript level, a read is
counted/reported only once for a given transcript, even if that read
hits more than one splicing graph edge (or reduced splicing graph
edge) associated with that transcript.
If the latter, the read is said to be ambiguous. An ambiguous read
is currently counted/reported for each feature where it has a hit.
This is a temporary situation: in the near future the user will be offered
options to handle ambiguous reads in different ways.
Ambiguous reads and levels of resolution
A read might be ambiguous at one level of resolution but not at the other.
Also the number of ambiguous reads is typically affected by the level
of resolution. However, even though higher resolution generally means
more ambiguous reads, this is only true when the switch from one level
of resolution to the other implies a parent-child relationship between
features that is one-to-many.
So, based on the above table, this is always true, except when
switching from using by="tx" to using by="sgedge" or
by="rsgedge". In those cases, the switch can produce more
ambiguities but it can also produce less.
Value
A DataFrame object with one row per:
unique splicing graph edge, if by="sgedge";
unique reduced splicing graph edge, if by="rsgedge";
transcript if by="tx";
gene if by="gene".
And with one column per sample (containing the counts for that sample for
countReads, and the reads for that sample for reportReads),
plus the two following left columns:
if by="sgedge": "sgedge_id", containing the
global splicing graph edge ids, and "ex_or_in",
containing the type of edge (exon or intron);
if by="rsgedge": "rsgedge_id", containing the
global reduced splicing graph edge ids, and
"ex_or_in", containing the type of edge (exon, intron,
or mixed);
if by="tx": "tx_id" and "gene_id";
if by="gene": "gene_id" and "tx_id".
For countReads, each column of counts is of type integer and
is named after the corresponding sample.
For reportReads, each column of reads is a CharacterList object
and its name is the name of the corresponding sample with the
".hits" suffix added to it.
In both cases, the name of the sample is the name that was passed to
assignReads when the reads of a given sample were initially
assigned. See ?assignReads for more information.
Author(s)
H. Pages
See Also
This man page is part of the SplicingGraphs package.
Please see ?`SplicingGraphs-package` for an overview of the
package and for an index of its man pages.
Examples
## ---------------------------------------------------------------------
## 1. Make SplicingGraphs object 'sg' from toy gene model and assign toy
## reads to it (see '?assignReads')
## ---------------------------------------------------------------------
example(assignReads)
## ---------------------------------------------------------------------
## 2. Summarize the reads by splicing graph edge
## ---------------------------------------------------------------------
countReads(sg)
reportReads(sg)
## ---------------------------------------------------------------------
## 3. Summarize the reads by reduced splicing graph edge
## ---------------------------------------------------------------------
countReads(sg, by="rsgedge")
reportReads(sg, by="rsgedge")
## ---------------------------------------------------------------------
## 4. Summarize the reads by transcript
## ---------------------------------------------------------------------
countReads(sg, by="tx")
reportReads(sg, by="tx")
## ---------------------------------------------------------------------
## 5. Summarize the reads by gene
## ---------------------------------------------------------------------
countReads(sg, by="gene")
reportReads(sg, by="gene")
## ---------------------------------------------------------------------
## 6. A close look at ambiguous reads
## ---------------------------------------------------------------------
resolutions <- c("sgedge", "rsgedge", "tx", "gene")
reported_reads <- lapply(resolutions,
function(by) {
reported_reads <- reportReads(sg, by=by)
unlist(reported_reads$TOYREADS.hits)
})
## The set of reported reads is the same at all levels of resolution:
unique_reported_reads <- lapply(reported_reads, unique)
stopifnot(identical(unique_reported_reads,
rep(unique_reported_reads[1], 4)))
## Extract ambigous reads for each level of resolution:
ambiguous_reads <- lapply(reported_reads,
function(x) unique(x[duplicated(x)]))
names(ambiguous_reads) <- resolutions
ambiguous_reads
## Reads that are ambiguous at the "rsgedge" level must also be
## ambiguous at the "sgedge" level:
stopifnot(all(ambiguous_reads$rsgedge %in% ambiguous_reads$sgedge))
## However, there is no reason why reads that are ambiguous at the
## "tx" level should also be ambiguous at the "sgedge" or "rsgedge"
## level!
## ---------------------------------------------------------------------
## 7. Remove the reads from 'sg'.
## ---------------------------------------------------------------------
sg <- removeReads(sg)
countReads(sg)
Results
R version 3.3.1 (2016-06-21) -- "Bug in Your Hair"
Copyright (C) 2016 The R Foundation for Statistical Computing
Platform: x86_64-pc-linux-gnu (64-bit)
R is free software and comes with ABSOLUTELY NO WARRANTY.
You are welcome to redistribute it under certain conditions.
Type 'license()' or 'licence()' for distribution details.
R is a collaborative project with many contributors.
Type 'contributors()' for more information and
'citation()' on how to cite R or R packages in publications.
Type 'demo()' for some demos, 'help()' for on-line help, or
'help.start()' for an HTML browser interface to help.
Type 'q()' to quit R.
> library(SplicingGraphs)
Loading required package: GenomicFeatures
Loading required package: BiocGenerics
Loading required package: parallel
Attaching package: 'BiocGenerics'
The following objects are masked from 'package:parallel':
clusterApply, clusterApplyLB, clusterCall, clusterEvalQ,
clusterExport, clusterMap, parApply, parCapply, parLapply,
parLapplyLB, parRapply, parSapply, parSapplyLB
The following objects are masked from 'package:stats':
IQR, mad, xtabs
The following objects are masked from 'package:base':
Filter, Find, Map, Position, Reduce, anyDuplicated, append,
as.data.frame, cbind, colnames, do.call, duplicated, eval, evalq,
get, grep, grepl, intersect, is.unsorted, lapply, lengths, mapply,
match, mget, order, paste, pmax, pmax.int, pmin, pmin.int, rank,
rbind, rownames, sapply, setdiff, sort, table, tapply, union,
unique, unsplit
Loading required package: S4Vectors
Loading required package: stats4
Attaching package: 'S4Vectors'
The following objects are masked from 'package:base':
colMeans, colSums, expand.grid, rowMeans, rowSums
Loading required package: IRanges
Loading required package: GenomeInfoDb
Loading required package: GenomicRanges
Loading required package: AnnotationDbi
Loading required package: Biobase
Welcome to Bioconductor
Vignettes contain introductory material; view with
'browseVignettes()'. To cite Bioconductor, see
'citation("Biobase")', and for packages 'citation("pkgname")'.
Loading required package: GenomicAlignments
Loading required package: SummarizedExperiment
Loading required package: Biostrings
Loading required package: XVector
Loading required package: Rsamtools
Loading required package: Rgraphviz
Loading required package: graph
Attaching package: 'graph'
The following object is masked from 'package:Biostrings':
complement
Loading required package: grid
Attaching package: 'Rgraphviz'
The following objects are masked from 'package:IRanges':
from, to
The following objects are masked from 'package:S4Vectors':
from, to
Warning messages:
1: replacing previous import 'IRanges::from' by 'Rgraphviz::from' when loading 'SplicingGraphs'
2: replacing previous import 'IRanges::to' by 'Rgraphviz::to' when loading 'SplicingGraphs'
> png(filename="/home/ddbj/snapshot/RGM3/R_BC/result/SplicingGraphs/countReads-methods.Rd_%03d_medium.png", width=480, height=480)
> ### Name: countReads-methods
> ### Title: Summarize the reads assigned to a SplicingGraphs object
> ### Aliases: countReads-methods countReads countReads,SplicingGraphs-method
>
> ### ** Examples
>
> ## ---------------------------------------------------------------------
> ## 1. Make SplicingGraphs object 'sg' from toy gene model and assign toy
> ## reads to it (see '?assignReads')
> ## ---------------------------------------------------------------------
> example(assignReads)
assgnR> ## ---------------------------------------------------------------------
assgnR> ## 1. Make SplicingGraphs object 'sg' from toy gene model (see
assgnR> ## '?SplicingGraphs')
assgnR> ## ---------------------------------------------------------------------
assgnR> example(SplicingGraphs)
SplcnG> ## ---------------------------------------------------------------------
SplcnG> ## 1. Load a toy gene model as a TxDb object
SplcnG> ## ---------------------------------------------------------------------
SplcnG>
SplcnG> library(GenomicFeatures)
SplcnG> suppressWarnings(
SplcnG+ toy_genes_txdb <- makeTxDbFromGFF(toy_genes_gff())
SplcnG+ )
Import genomic features from the file as a GRanges object ... OK
Prepare the 'metadata' data frame ... OK
Make the TxDb object ... OK
SplcnG> ## ---------------------------------------------------------------------
SplcnG> ## 2. Compute all the splicing graphs (1 graph per gene) and return them
SplcnG> ## in a SplicingGraphs object
SplcnG> ## ---------------------------------------------------------------------
SplcnG>
SplcnG> ## Extract the exons grouped by transcript:
SplcnG> ex_by_tx <- exonsBy(toy_genes_txdb, by="tx", use.names=TRUE)
SplcnG> ## Extract the transcripts grouped by gene:
SplcnG> tx_by_gn <- transcriptsBy(toy_genes_txdb, by="gene")
SplcnG> sg <- SplicingGraphs(ex_by_tx, tx_by_gn)
SplcnG> sg
SplicingGraphs object with 5 gene(s) and 13 transcript(s)
SplcnG> ## Alternatively 'sg' can be constructed directly from the TxDb
SplcnG> ## object:
SplcnG> sg2 <- SplicingGraphs(toy_genes_txdb) # same as 'sg'
SplcnG> sg2
SplicingGraphs object with 5 gene(s) and 13 transcript(s)
SplcnG> ## Note that because SplicingGraphs objects have a slot that is an
SplcnG> ## environment (for caching the bubbles), they cannot be compared with
SplcnG> ## 'identical()' (will always return FALSE). 'all.equal()' should be
SplcnG> ## used instead:
SplcnG> stopifnot(isTRUE(all.equal(sg2, sg)))
SplcnG> ## 'sg' has 1 element per gene and 'names(sg)' gives the gene ids:
SplcnG> length(sg)
[1] 5
SplcnG> names(sg)
[1] "geneA" "geneB" "geneC" "geneD" "geneE"
SplcnG> ## ---------------------------------------------------------------------
SplcnG> ## 3. Basic manipulation of a SplicingGraphs object
SplcnG> ## ---------------------------------------------------------------------
SplcnG>
SplcnG> ## Basic accessors:
SplcnG> seqnames(sg)
geneA geneB geneC geneD geneE
chrX chrX chrX chrX chrX
Levels: chrX
SplcnG> strand(sg)
geneA geneB geneC geneD geneE
+ - + + +
Levels: + - *
SplcnG> seqinfo(sg)
Seqinfo object with 1 sequence from an unspecified genome; no seqlengths:
seqnames seqlengths isCircular genome
chrX NA NA <NA>
SplcnG> ## Number of transcripts per gene:
SplcnG> elementNROWS(sg)
geneA geneB geneC geneD geneE
2 2 3 4 2
SplcnG> ## The transcripts of a given gene can be extracted with [[. The result
SplcnG> ## is an *unnamed* GRangesList object containing the exons grouped by
SplcnG> ## transcript:
SplcnG> sg[["geneD"]]
GRangesList object of length 4:
[[1]]
GRanges object with 2 ranges and 5 metadata columns:
seqnames ranges strand | exon_id exon_name exon_rank start_SSid
<Rle> <IRanges> <Rle> | <integer> <character> <integer> <integer>
[1] chrX [601, 630] + | 10 Dx2 1 1
[2] chrX [666, 675] + | 12 Dx4 2 5
end_SSid
<integer>
[1] 3
[2] 6
[[2]]
GRanges object with 2 ranges and 5 metadata columns:
seqnames ranges strand | exon_id exon_name exon_rank start_SSid
[1] chrX [601, 620] + | 9 Dx1 1 1
[2] chrX [651, 700] + | 11 Dx3 2 4
end_SSid
[1] 2
[2] 8
[[3]]
GRanges object with 3 ranges and 5 metadata columns:
seqnames ranges strand | exon_id exon_name exon_rank start_SSid
[1] chrX [601, 620] + | 9 Dx1 1 1
[2] chrX [666, 675] + | 12 Dx4 2 5
[3] chrX [691, 700] + | 13 Dx5 3 7
end_SSid
[1] 2
[2] 6
[3] 8
...
<1 more element>
-------
seqinfo: 1 sequence from an unspecified genome; no seqlengths
SplcnG> ## See '?plotTranscripts' for how to plot those transcripts.
SplcnG>
SplcnG> ## The transcripts of all the genes can be extracted with unlist(). The
SplcnG> ## result is a *named* GRangesList object containing the exons grouped
SplcnG> ## by transcript. The names on the object are the gene ids:
SplcnG> ex_by_tx <- unlist(sg)
SplcnG> ex_by_tx
GRangesList object of length 13:
$geneA
GRanges object with 1 range and 5 metadata columns:
seqnames ranges strand | exon_id exon_name exon_rank start_SSid
<Rle> <IRanges> <Rle> | <integer> <character> <integer> <integer>
[1] chrX [11, 50] + | 2 Ax2 1 1
end_SSid
<integer>
[1] 3
$geneA
GRanges object with 2 ranges and 5 metadata columns:
seqnames ranges strand | exon_id exon_name exon_rank start_SSid
[1] chrX [11, 40] + | 1 Ax1 1 1
[2] chrX [71, 100] + | 3 Ax3 2 4
end_SSid
[1] 2
[2] 5
$geneB
GRanges object with 2 ranges and 5 metadata columns:
seqnames ranges strand | exon_id exon_name exon_rank start_SSid
[1] chrX [251, 300] - | 23 Bx1 1 3
[2] chrX [201, 230] - | 20 Bx2 2 6
end_SSid
[1] 1
[2] 4
...
<10 more elements>
-------
seqinfo: 1 sequence from an unspecified genome; no seqlengths
assgnR> sg
SplicingGraphs object with 5 gene(s) and 13 transcript(s)
assgnR> ## 'sg' has 1 element per gene and 'names(sg)' gives the gene ids.
assgnR> names(sg)
[1] "geneA" "geneB" "geneC" "geneD" "geneE"
assgnR> ## ---------------------------------------------------------------------
assgnR> ## 2. Load toy reads
assgnR> ## ---------------------------------------------------------------------
assgnR> ## Load toy reads (single-end) from a BAM file. We filter out secondary
assgnR> ## alignments, reads not passing quality controls, and PCR or optical
assgnR> ## duplicates (see ?scanBamFlag in the Rsamtools package for more
assgnR> ## information):
assgnR> flag0 <- scanBamFlag(isSecondaryAlignment=FALSE,
assgnR+ isNotPassingQualityControls=FALSE,
assgnR+ isDuplicate=FALSE)
assgnR> param0 <- ScanBamParam(flag=flag0)
assgnR> gal <- readGAlignments(toy_reads_bam(), use.names=TRUE, param=param0)
assgnR> gal
GAlignments object with 42 alignments and 0 metadata columns:
seqnames strand cigar qwidth start end width
<Rle> <Rle> <character> <integer> <integer> <integer> <integer>
r001 chrX - 10M 10 12 21 10
r002 chrX + 7M15N3M 10 15 39 25
r003 chrX + 10M 10 21 30 10
r004 chrX - 10M 10 26 35 10
r005 chrX + 10M 10 31 40 10
... ... ... ... ... ... ... ...
r038 chrX - 10M 10 278 287 10
r039 chrX + 10M 10 280 289 10
r040 chrX + 10M 10 285 294 10
r041 chrX - 5M12N5M 10 288 309 22
r042 chrX + 10M 10 290 299 10
njunc
<integer>
r001 0
r002 1
r003 0
r004 0
r005 0
... ...
r038 0
r039 0
r040 0
r041 1
r042 0
-------
seqinfo: 1 sequence from an unspecified genome
assgnR> ## ---------------------------------------------------------------------
assgnR> ## 3. Assign the reads to the exons and introns in 'sg'
assgnR> ## ---------------------------------------------------------------------
assgnR> ## The same read can be assigned to more than 1 exon or intron (e.g. a
assgnR> ## junction read with 1 gap can be assigned to 2 exons and 1 intron).
assgnR> sg <- assignReads(sg, gal, sample.name="TOYREADS")
assgnR> ## See the assignments to the splicing graph edges.
assgnR> edge_by_tx <- sgedgesByTranscript(sg, with.hits.mcols=TRUE)
assgnR> edge_data <- mcols(unlist(edge_by_tx))
assgnR> colnames(edge_data)
[1] "from" "to" "sgedge_id" "ex_or_in"
[5] "tx_id" "TOYREADS.hits"
assgnR> head(edge_data)
DataFrame with 6 rows and 6 columns
from to sgedge_id ex_or_in tx_id TOYREADS.hits
<character> <character> <character> <factor> <character> <CharacterList>
1 1 3 geneA:1,3 ex A1 r001,r003,r004,...
2 1 2 geneA:1,2 ex A2 r001,r003,r004,...
3 2 4 geneA:2,4 in A2 r006
4 4 5 geneA:4,5 ex A2 r006,r008,r009,...
5 1 3 geneB:1,3 ex B1 r027,r028,r029,...
6 3 4 geneB:3,4 in B1 r027,r028,r029
assgnR> edge_data[ , c("sgedge_id", "TOYREADS.hits")]
DataFrame with 51 rows and 2 columns
sgedge_id TOYREADS.hits
<character> <CharacterList>
1 geneA:1,3 r001,r003,r004,...
2 geneA:1,2 r001,r003,r004,...
3 geneA:2,4 r006
4 geneA:4,5 r006,r008,r009,...
5 geneB:1,3 r027,r028,r029,...
... ... ...
47 geneE:5,6
48 geneE:6,7
49 geneE:7,8
50 geneE:8,11
51 geneE:11,12
assgnR> edge_by_gene <- sgedgesByGene(sg, with.hits.mcols=TRUE)
assgnR> mcols(unlist(edge_by_gene))
DataFrame with 41 rows and 6 columns
from to sgedge_id ex_or_in tx_id
<character> <character> <character> <factor> <CharacterList>
1 1 3 geneA:1,3 ex A1
2 1 2 geneA:1,2 ex A2
3 2 4 geneA:2,4 in A2
4 4 5 geneA:4,5 ex A2
5 1 3 geneB:1,3 ex B1
... ... ... ... ... ...
37 11 12 geneE:11,12 ex E1,E2
38 2 5 geneE:2,5 in E2
39 6 7 geneE:6,7 in E2
40 7 8 geneE:7,8 ex E2
41 8 11 geneE:8,11 in E2
TOYREADS.hits
<CharacterList>
1 r001,r003,r004,...
2 r001,r003,r004,...
3 r006
4 r006,r008,r009,...
5 r027,r028,r029,...
... ...
37
38
39
40
41
assgnR> ## See the assignments to the reduced splicing graph edges.
assgnR> redge_by_gene <- rsgedgesByGene(sg, with.hits.mcols=TRUE)
assgnR> mcols(unlist(redge_by_gene))
DataFrame with 28 rows and 6 columns
from to rsgedge_id ex_or_in tx_id
<character> <character> <character> <factor> <CharacterList>
1 1 3 geneA:1,3 ex A1
2 1 5 geneA:1,2,4,5 mixed A2
3 1 3 geneB:1,3 ex B1
4 3 4 geneB:3,4 in B1,B2
5 4 6 geneB:4,6 ex B1
... ... ... ... ... ...
24 5 6 geneE:5,6 ex E1,E2
25 6 11 geneE:6,9,10,11 mixed E1
26 11 12 geneE:11,12 ex E1,E2
27 2 5 geneE:2,5 in E2
28 6 11 geneE:6,7,8,11 mixed E2
TOYREADS.hits
<CharacterList>
1 r001,r003,r004,...
2 r001,r003,r004,...
3 r027,r028,r029,...
4 r027,r028,r029
5 r014,r015,r016,...
... ...
24
25
26
27
28
assgnR> ## ---------------------------------------------------------------------
assgnR> ## 4. Summarize the reads assigned to 'sg' and eventually remove them
assgnR> ## ---------------------------------------------------------------------
assgnR> ## See '?countReads'.
assgnR>
assgnR>
assgnR>
>
> ## ---------------------------------------------------------------------
> ## 2. Summarize the reads by splicing graph edge
> ## ---------------------------------------------------------------------
> countReads(sg)
DataFrame with 41 rows and 3 columns
sgedge_id ex_or_in TOYREADS
<character> <factor> <integer>
1 geneA:1,3 ex 5
2 geneA:1,2 ex 5
3 geneA:2,4 in 1
4 geneA:4,5 ex 5
5 geneB:1,3 ex 14
... ... ... ...
37 geneE:11,12 ex 0
38 geneE:2,5 in 0
39 geneE:6,7 in 0
40 geneE:7,8 ex 0
41 geneE:8,11 in 0
> reportReads(sg)
DataFrame with 41 rows and 3 columns
sgedge_id ex_or_in TOYREADS.hits
<character> <factor> <CharacterList>
1 geneA:1,3 ex r001,r003,r004,...
2 geneA:1,2 ex r001,r003,r004,...
3 geneA:2,4 in r006
4 geneA:4,5 ex r006,r008,r009,...
5 geneB:1,3 ex r027,r028,r029,...
... ... ... ...
37 geneE:11,12 ex
38 geneE:2,5 in
39 geneE:6,7 in
40 geneE:7,8 ex
41 geneE:8,11 in
>
> ## ---------------------------------------------------------------------
> ## 3. Summarize the reads by reduced splicing graph edge
> ## ---------------------------------------------------------------------
> countReads(sg, by="rsgedge")
DataFrame with 28 rows and 3 columns
rsgedge_id ex_or_in TOYREADS
<character> <factor> <integer>
1 geneA:1,3 ex 5
2 geneA:1,2,4,5 mixed 9
3 geneB:1,3 ex 14
4 geneB:3,4 in 3
5 geneB:4,6 ex 13
... ... ... ...
24 geneE:5,6 ex 0
25 geneE:6,9,10,11 mixed 0
26 geneE:11,12 ex 0
27 geneE:2,5 in 0
28 geneE:6,7,8,11 mixed 0
> reportReads(sg, by="rsgedge")
DataFrame with 28 rows and 3 columns
rsgedge_id ex_or_in TOYREADS.hits
<character> <factor> <CharacterList>
1 geneA:1,3 ex r001,r003,r004,...
2 geneA:1,2,4,5 mixed r001,r003,r004,...
3 geneB:1,3 ex r027,r028,r029,...
4 geneB:3,4 in r027,r028,r029
5 geneB:4,6 ex r014,r015,r016,...
... ... ... ...
24 geneE:5,6 ex
25 geneE:6,9,10,11 mixed
26 geneE:11,12 ex
27 geneE:2,5 in
28 geneE:6,7,8,11 mixed
>
> ## ---------------------------------------------------------------------
> ## 4. Summarize the reads by transcript
> ## ---------------------------------------------------------------------
> countReads(sg, by="tx")
DataFrame with 13 rows and 3 columns
tx_id gene_id TOYREADS
<character> <character> <integer>
1 A1 geneA 5
2 A2 geneA 9
3 B1 geneB 24
4 B2 geneB 11
5 C1 geneC 0
... ... ... ...
9 D1 geneD 0
10 D2 geneD 0
11 D3 geneD 0
12 E1 geneE 0
13 E2 geneE 0
> reportReads(sg, by="tx")
DataFrame with 13 rows and 3 columns
tx_id gene_id TOYREADS.hits
<character> <character> <CharacterList>
1 A1 geneA r001,r003,r004,...
2 A2 geneA r001,r003,r004,...
3 B1 geneB r027,r028,r029,...
4 B2 geneB r027,r028,r029,...
5 C1 geneC
... ... ... ...
9 D1 geneD
10 D2 geneD
11 D3 geneD
12 E1 geneE
13 E2 geneE
>
> ## ---------------------------------------------------------------------
> ## 5. Summarize the reads by gene
> ## ---------------------------------------------------------------------
> countReads(sg, by="gene")
DataFrame with 5 rows and 3 columns
gene_id tx_id TOYREADS
<character> <CharacterList> <integer>
geneA geneA A1,A2 10
geneB geneB B1,B2 24
geneC geneC C1,C2,C3 0
geneD geneD D4,D3,D2,... 0
geneE geneE E1,E2 0
> reportReads(sg, by="gene")
DataFrame with 5 rows and 3 columns
gene_id tx_id TOYREADS.hits
<character> <CharacterList> <CharacterList>
geneA geneA A1,A2 r001,r003,r004,...
geneB geneB B1,B2 r027,r028,r029,...
geneC geneC C1,C2,C3
geneD geneD D4,D3,D2,...
geneE geneE E1,E2
>
> ## ---------------------------------------------------------------------
> ## 6. A close look at ambiguous reads
> ## ---------------------------------------------------------------------
> resolutions <- c("sgedge", "rsgedge", "tx", "gene")
>
> reported_reads <- lapply(resolutions,
+ function(by) {
+ reported_reads <- reportReads(sg, by=by)
+ unlist(reported_reads$TOYREADS.hits)
+ })
>
> ## The set of reported reads is the same at all levels of resolution:
> unique_reported_reads <- lapply(reported_reads, unique)
> stopifnot(identical(unique_reported_reads,
+ rep(unique_reported_reads[1], 4)))
>
> ## Extract ambigous reads for each level of resolution:
> ambiguous_reads <- lapply(reported_reads,
+ function(x) unique(x[duplicated(x)]))
> names(ambiguous_reads) <- resolutions
> ambiguous_reads
$sgedge
[1] "r001" "r003" "r004" "r005" "r006" "r027" "r028" "r029" "r031" "r032"
[11] "r033" "r034" "r035" "r021" "r022" "r023"
$rsgedge
[1] "r001" "r003" "r004" "r005" "r027" "r028" "r029" "r031" "r032" "r033"
[11] "r034" "r035" "r021" "r022" "r023"
$tx
[1] "r001" "r003" "r004" "r005" "r027" "r028" "r029" "r031" "r032" "r033"
[11] "r034" "r035" "r021" "r022" "r023"
$gene
character(0)
>
> ## Reads that are ambiguous at the "rsgedge" level must also be
> ## ambiguous at the "sgedge" level:
> stopifnot(all(ambiguous_reads$rsgedge %in% ambiguous_reads$sgedge))
>
> ## However, there is no reason why reads that are ambiguous at the
> ## "tx" level should also be ambiguous at the "sgedge" or "rsgedge"
> ## level!
>
> ## ---------------------------------------------------------------------
> ## 7. Remove the reads from 'sg'.
> ## ---------------------------------------------------------------------
> sg <- removeReads(sg)
> countReads(sg)
DataFrame with 41 rows and 2 columns
sgedge_id ex_or_in
<character> <factor>
1 geneA:1,3 ex
2 geneA:1,2 ex
3 geneA:2,4 in
4 geneA:4,5 ex
5 geneB:1,3 ex
... ... ...
37 geneE:11,12 ex
38 geneE:2,5 in
39 geneE:6,7 in
40 geneE:7,8 ex
41 geneE:8,11 in
>
>
>
>
>
> dev.off()
null device
1
>