R: Extract the transcript paths of a splicing graph
txpath-methods
R Documentation
Extract the transcript paths of a splicing graph
Description
txpath extracts the transcript paths of the splicing graph
of a given gene from a SplicingGraphs object.
Usage
txpath(x, as.matrix=FALSE)
## Related utility:
txweight(x)
txweight(x) <- value
Arguments
x
A SplicingGraphs object of length 1
or a GRangesList object for txpath.
A SplicingGraphs object for txweight.
as.matrix
TODO
value
A numeric vector containing the weights to assign to each
transcript in x.
Details
TODO
Value
A named list-like object with one list element per transcript in the gene.
Each list element is an integer vector that describes the path
of the transcript i.e. the Splicing Site ids that it goes thru.
Author(s)
H. Pages
See Also
This man page is part of the SplicingGraphs package.
Please see ?`SplicingGraphs-package` for an overview of the
package and for an index of its man pages.
Other topics related to this man page and documented in other packages:
The GRangesList class defined in the
GenomicRanges package.
The GAlignments and
GAlignmentPairs classes
defined in the GenomicAlignments package.
findOverlaps-methods in the
GenomicRanges package.
encodeOverlaps-methods in the
GenomicAlignments package.
The ScanBamParam function defined in the
Rsamtools package.
Examples
## ---------------------------------------------------------------------
## 1. Make SplicingGraphs object 'sg' from toy gene model (see
## '?SplicingGraphs')
## ---------------------------------------------------------------------
example(SplicingGraphs)
sg
## 'sg' has 1 element per gene and 'names(sg)' gives the gene ids.
names(sg)
## ---------------------------------------------------------------------
## 2. txpath()
## ---------------------------------------------------------------------
## Note that the list elements in the returned IntegerList object
## always consist of an even number of Splicing Site ids in ascending
## order.
txpath(sg["geneB"])
txpath(sg["geneD"])
strand(sg)
txpath(sg["geneD"], as.matrix=TRUE) # splicing matrix
## ---------------------------------------------------------------------
## 3. txweight()
## ---------------------------------------------------------------------
txweight(sg)
plot(sg["geneD"])
txweight(sg) <- 5
txweight(sg)
plot(sg["geneD"]) # FIXME: Edges not rendered with correct width!
plot(sgraph(sg["geneD"], as.igraph=TRUE)) # need to use this for now
txweight(sg)[8:11] <- 5 * (4:1)
txweight(sg)
plot(sgraph(sg["geneD"], tx_id.as.edge.label=TRUE, as.igraph=TRUE))
## ---------------------------------------------------------------------
## 4. An advanced example
## ---------------------------------------------------------------------
## [TODO: Counting "unambiguous compatible hits" per transcript should be
## supported by countReads(). Simplify the code below when countReads()
## supports this.]
## Here we show how to find "unambiguous compatible hits" between a set
## of RNA-seq reads and a set of transcripts, that is, hits that are
## compatible with the splicing of exactly 1 transcript. Then we set the
## transcript weights based on the number of unambiguous compatible hits
## they received and we finally plot some splicing graphs that show
## the weighted transcripts.
## Note that the code we use to find the unambiguous compatible hits
## uses findOverlaps() and encodeOverlaps() defined in the IRanges and
## GenomicRanges packages. It only requires that the transcripts are
## represented as a GRangesList object and the reads as a GAlignments
## (single-end) or GAlignmentPairs (paired-end) object, and therefore is
## not specific to SplicingGraphs.
## First we load toy reads (single-end) from a BAM file. We filter out
## secondary alignments, reads not passing quality controls, and PCR or
## optical duplicates (see ?scanBamFlag in the Rsamtools package for
## more information):
flag0 <- scanBamFlag(isSecondaryAlignment=FALSE,
isNotPassingQualityControls=FALSE,
isDuplicate=FALSE)
param0 <- ScanBamParam(flag=flag0)
gal <- readGAlignments(toy_reads_bam(), use.names=TRUE, param=param0)
gal
## Put the reads in a GRangesList object:
grl <- grglist(gal, order.as.in.query=TRUE)
## Put the transcripts in a GRangesList object (made of exons grouped
## by transcript):
ex_by_tx <- unlist(sg)
## Most of the times the RNA-seq protocol is unstranded so the strand
## reported in the BAM file (and propagated to 'grl') for each alignment
## is meaningless. Thus we should call findOverlaps() with
## 'ignore.strand=TRUE':
ov0 <- findOverlaps(grl, ex_by_tx, ignore.strand=TRUE)
## Encode the overlaps (this compare the fragmentation of the reads with
## the splicing of the transcripts):
ovenc0 <- encodeOverlaps(grl, ex_by_tx, hits=ov0,
flip.query.if.wrong.strand=TRUE)
ov0_is_compat <- isCompatibleWithSplicing(ovenc0)
## Keep compatible overlaps only:
ov1 <- ov0[ov0_is_compat]
## Only keep overlaps that are compatible with exactly 1 transcript:
ov2 <- ov1[queryHits(ov1) %in% which(countQueryHits(ov1) == 1L)]
nhit_per_tx <- countSubjectHits(ov2)
names(nhit_per_tx) <- names(txweight(sg))
nhit_per_tx
txweight(sg) <- 2 * nhit_per_tx
plot(sgraph(sg["geneA"], tx_id.as.edge.label=TRUE, as.igraph=TRUE))
plot(sgraph(sg["geneB"], tx_id.as.edge.label=TRUE, as.igraph=TRUE))
Results
R version 3.3.1 (2016-06-21) -- "Bug in Your Hair"
Copyright (C) 2016 The R Foundation for Statistical Computing
Platform: x86_64-pc-linux-gnu (64-bit)
R is free software and comes with ABSOLUTELY NO WARRANTY.
You are welcome to redistribute it under certain conditions.
Type 'license()' or 'licence()' for distribution details.
R is a collaborative project with many contributors.
Type 'contributors()' for more information and
'citation()' on how to cite R or R packages in publications.
Type 'demo()' for some demos, 'help()' for on-line help, or
'help.start()' for an HTML browser interface to help.
Type 'q()' to quit R.
> library(SplicingGraphs)
Loading required package: GenomicFeatures
Loading required package: BiocGenerics
Loading required package: parallel
Attaching package: 'BiocGenerics'
The following objects are masked from 'package:parallel':
clusterApply, clusterApplyLB, clusterCall, clusterEvalQ,
clusterExport, clusterMap, parApply, parCapply, parLapply,
parLapplyLB, parRapply, parSapply, parSapplyLB
The following objects are masked from 'package:stats':
IQR, mad, xtabs
The following objects are masked from 'package:base':
Filter, Find, Map, Position, Reduce, anyDuplicated, append,
as.data.frame, cbind, colnames, do.call, duplicated, eval, evalq,
get, grep, grepl, intersect, is.unsorted, lapply, lengths, mapply,
match, mget, order, paste, pmax, pmax.int, pmin, pmin.int, rank,
rbind, rownames, sapply, setdiff, sort, table, tapply, union,
unique, unsplit
Loading required package: S4Vectors
Loading required package: stats4
Attaching package: 'S4Vectors'
The following objects are masked from 'package:base':
colMeans, colSums, expand.grid, rowMeans, rowSums
Loading required package: IRanges
Loading required package: GenomeInfoDb
Loading required package: GenomicRanges
Loading required package: AnnotationDbi
Loading required package: Biobase
Welcome to Bioconductor
Vignettes contain introductory material; view with
'browseVignettes()'. To cite Bioconductor, see
'citation("Biobase")', and for packages 'citation("pkgname")'.
Loading required package: GenomicAlignments
Loading required package: SummarizedExperiment
Loading required package: Biostrings
Loading required package: XVector
Loading required package: Rsamtools
Loading required package: Rgraphviz
Loading required package: graph
Attaching package: 'graph'
The following object is masked from 'package:Biostrings':
complement
Loading required package: grid
Attaching package: 'Rgraphviz'
The following objects are masked from 'package:IRanges':
from, to
The following objects are masked from 'package:S4Vectors':
from, to
Warning messages:
1: replacing previous import 'IRanges::from' by 'Rgraphviz::from' when loading 'SplicingGraphs'
2: replacing previous import 'IRanges::to' by 'Rgraphviz::to' when loading 'SplicingGraphs'
> png(filename="/home/ddbj/snapshot/RGM3/R_BC/result/SplicingGraphs/txpath-methods.Rd_%03d_medium.png", width=480, height=480)
> ### Name: txpath-methods
> ### Title: Extract the transcript paths of a splicing graph
> ### Aliases: txpath-methods txpath txpath,GRangesList-method
> ### txpath,SplicingGraphs-method txweight txweight,SplicingGraphs-method
> ### txweight<- txweight<-,SplicingGraphs-method
>
> ### ** Examples
>
> ## ---------------------------------------------------------------------
> ## 1. Make SplicingGraphs object 'sg' from toy gene model (see
> ## '?SplicingGraphs')
> ## ---------------------------------------------------------------------
> example(SplicingGraphs)
SplcnG> ## ---------------------------------------------------------------------
SplcnG> ## 1. Load a toy gene model as a TxDb object
SplcnG> ## ---------------------------------------------------------------------
SplcnG>
SplcnG> library(GenomicFeatures)
SplcnG> suppressWarnings(
SplcnG+ toy_genes_txdb <- makeTxDbFromGFF(toy_genes_gff())
SplcnG+ )
Import genomic features from the file as a GRanges object ... OK
Prepare the 'metadata' data frame ... OK
Make the TxDb object ... OK
SplcnG> ## ---------------------------------------------------------------------
SplcnG> ## 2. Compute all the splicing graphs (1 graph per gene) and return them
SplcnG> ## in a SplicingGraphs object
SplcnG> ## ---------------------------------------------------------------------
SplcnG>
SplcnG> ## Extract the exons grouped by transcript:
SplcnG> ex_by_tx <- exonsBy(toy_genes_txdb, by="tx", use.names=TRUE)
SplcnG> ## Extract the transcripts grouped by gene:
SplcnG> tx_by_gn <- transcriptsBy(toy_genes_txdb, by="gene")
SplcnG> sg <- SplicingGraphs(ex_by_tx, tx_by_gn)
SplcnG> sg
SplicingGraphs object with 5 gene(s) and 13 transcript(s)
SplcnG> ## Alternatively 'sg' can be constructed directly from the TxDb
SplcnG> ## object:
SplcnG> sg2 <- SplicingGraphs(toy_genes_txdb) # same as 'sg'
SplcnG> sg2
SplicingGraphs object with 5 gene(s) and 13 transcript(s)
SplcnG> ## Note that because SplicingGraphs objects have a slot that is an
SplcnG> ## environment (for caching the bubbles), they cannot be compared with
SplcnG> ## 'identical()' (will always return FALSE). 'all.equal()' should be
SplcnG> ## used instead:
SplcnG> stopifnot(isTRUE(all.equal(sg2, sg)))
SplcnG> ## 'sg' has 1 element per gene and 'names(sg)' gives the gene ids:
SplcnG> length(sg)
[1] 5
SplcnG> names(sg)
[1] "geneA" "geneB" "geneC" "geneD" "geneE"
SplcnG> ## ---------------------------------------------------------------------
SplcnG> ## 3. Basic manipulation of a SplicingGraphs object
SplcnG> ## ---------------------------------------------------------------------
SplcnG>
SplcnG> ## Basic accessors:
SplcnG> seqnames(sg)
geneA geneB geneC geneD geneE
chrX chrX chrX chrX chrX
Levels: chrX
SplcnG> strand(sg)
geneA geneB geneC geneD geneE
+ - + + +
Levels: + - *
SplcnG> seqinfo(sg)
Seqinfo object with 1 sequence from an unspecified genome; no seqlengths:
seqnames seqlengths isCircular genome
chrX NA NA <NA>
SplcnG> ## Number of transcripts per gene:
SplcnG> elementNROWS(sg)
geneA geneB geneC geneD geneE
2 2 3 4 2
SplcnG> ## The transcripts of a given gene can be extracted with [[. The result
SplcnG> ## is an *unnamed* GRangesList object containing the exons grouped by
SplcnG> ## transcript:
SplcnG> sg[["geneD"]]
GRangesList object of length 4:
[[1]]
GRanges object with 2 ranges and 5 metadata columns:
seqnames ranges strand | exon_id exon_name exon_rank start_SSid
<Rle> <IRanges> <Rle> | <integer> <character> <integer> <integer>
[1] chrX [601, 630] + | 10 Dx2 1 1
[2] chrX [666, 675] + | 12 Dx4 2 5
end_SSid
<integer>
[1] 3
[2] 6
[[2]]
GRanges object with 2 ranges and 5 metadata columns:
seqnames ranges strand | exon_id exon_name exon_rank start_SSid
[1] chrX [601, 620] + | 9 Dx1 1 1
[2] chrX [651, 700] + | 11 Dx3 2 4
end_SSid
[1] 2
[2] 8
[[3]]
GRanges object with 3 ranges and 5 metadata columns:
seqnames ranges strand | exon_id exon_name exon_rank start_SSid
[1] chrX [601, 620] + | 9 Dx1 1 1
[2] chrX [666, 675] + | 12 Dx4 2 5
[3] chrX [691, 700] + | 13 Dx5 3 7
end_SSid
[1] 2
[2] 6
[3] 8
...
<1 more element>
-------
seqinfo: 1 sequence from an unspecified genome; no seqlengths
SplcnG> ## See '?plotTranscripts' for how to plot those transcripts.
SplcnG>
SplcnG> ## The transcripts of all the genes can be extracted with unlist(). The
SplcnG> ## result is a *named* GRangesList object containing the exons grouped
SplcnG> ## by transcript. The names on the object are the gene ids:
SplcnG> ex_by_tx <- unlist(sg)
SplcnG> ex_by_tx
GRangesList object of length 13:
$geneA
GRanges object with 1 range and 5 metadata columns:
seqnames ranges strand | exon_id exon_name exon_rank start_SSid
<Rle> <IRanges> <Rle> | <integer> <character> <integer> <integer>
[1] chrX [11, 50] + | 2 Ax2 1 1
end_SSid
<integer>
[1] 3
$geneA
GRanges object with 2 ranges and 5 metadata columns:
seqnames ranges strand | exon_id exon_name exon_rank start_SSid
[1] chrX [11, 40] + | 1 Ax1 1 1
[2] chrX [71, 100] + | 3 Ax3 2 4
end_SSid
[1] 2
[2] 5
$geneB
GRanges object with 2 ranges and 5 metadata columns:
seqnames ranges strand | exon_id exon_name exon_rank start_SSid
[1] chrX [251, 300] - | 23 Bx1 1 3
[2] chrX [201, 230] - | 20 Bx2 2 6
end_SSid
[1] 1
[2] 4
...
<10 more elements>
-------
seqinfo: 1 sequence from an unspecified genome; no seqlengths
> sg
SplicingGraphs object with 5 gene(s) and 13 transcript(s)
>
> ## 'sg' has 1 element per gene and 'names(sg)' gives the gene ids.
> names(sg)
[1] "geneA" "geneB" "geneC" "geneD" "geneE"
>
> ## ---------------------------------------------------------------------
> ## 2. txpath()
> ## ---------------------------------------------------------------------
> ## Note that the list elements in the returned IntegerList object
> ## always consist of an even number of Splicing Site ids in ascending
> ## order.
> txpath(sg["geneB"])
IntegerList of length 2
[["B1"]] 1 3 4 6
[["B2"]] 2 3 4 5
> txpath(sg["geneD"])
IntegerList of length 4
[["D4"]] 1 3 5 6
[["D1"]] 1 2 4 8
[["D2"]] 1 2 5 6 7 8
[["D3"]] 1 3 4 8
> strand(sg)
geneA geneB geneC geneD geneE
+ - + + +
Levels: + - *
>
> txpath(sg["geneD"], as.matrix=TRUE) # splicing matrix
R 1 2 3 4 5 6 7 8 L
D4 TRUE TRUE FALSE TRUE FALSE TRUE TRUE FALSE FALSE TRUE
D1 TRUE TRUE TRUE FALSE TRUE FALSE FALSE FALSE TRUE TRUE
D2 TRUE TRUE TRUE FALSE FALSE TRUE TRUE TRUE TRUE TRUE
D3 TRUE TRUE FALSE TRUE TRUE FALSE FALSE FALSE TRUE TRUE
>
> ## ---------------------------------------------------------------------
> ## 3. txweight()
> ## ---------------------------------------------------------------------
> txweight(sg)
NULL
> plot(sg["geneD"])
>
> txweight(sg) <- 5
> txweight(sg)
A1 A2 B1 B2 C1 C2 C3 D4 D1 D2 D3 E1 E2
5 5 5 5 5 5 5 5 5 5 5 5 5
> plot(sg["geneD"]) # FIXME: Edges not rendered with correct width!
> plot(sgraph(sg["geneD"], as.igraph=TRUE)) # need to use this for now
Attaching package: 'igraph'
The following objects are masked from 'package:graph':
degree, edges, intersection, union
The following object is masked from 'package:Rsamtools':
path
The following object is masked from 'package:Biostrings':
union
The following object is masked from 'package:GenomicRanges':
union
The following objects are masked from 'package:IRanges':
simplify, union
The following objects are masked from 'package:S4Vectors':
compare, union
The following objects are masked from 'package:BiocGenerics':
normalize, union
The following objects are masked from 'package:stats':
decompose, spectrum
The following object is masked from 'package:base':
union
>
> txweight(sg)[8:11] <- 5 * (4:1)
> txweight(sg)
A1 A2 B1 B2 C1 C2 C3 D4 D1 D2 D3 E1 E2
5 5 5 5 5 5 5 20 15 10 5 5 5
> plot(sgraph(sg["geneD"], tx_id.as.edge.label=TRUE, as.igraph=TRUE))
>
> ## ---------------------------------------------------------------------
> ## 4. An advanced example
> ## ---------------------------------------------------------------------
> ## [TODO: Counting "unambiguous compatible hits" per transcript should be
> ## supported by countReads(). Simplify the code below when countReads()
> ## supports this.]
> ## Here we show how to find "unambiguous compatible hits" between a set
> ## of RNA-seq reads and a set of transcripts, that is, hits that are
> ## compatible with the splicing of exactly 1 transcript. Then we set the
> ## transcript weights based on the number of unambiguous compatible hits
> ## they received and we finally plot some splicing graphs that show
> ## the weighted transcripts.
> ## Note that the code we use to find the unambiguous compatible hits
> ## uses findOverlaps() and encodeOverlaps() defined in the IRanges and
> ## GenomicRanges packages. It only requires that the transcripts are
> ## represented as a GRangesList object and the reads as a GAlignments
> ## (single-end) or GAlignmentPairs (paired-end) object, and therefore is
> ## not specific to SplicingGraphs.
>
> ## First we load toy reads (single-end) from a BAM file. We filter out
> ## secondary alignments, reads not passing quality controls, and PCR or
> ## optical duplicates (see ?scanBamFlag in the Rsamtools package for
> ## more information):
> flag0 <- scanBamFlag(isSecondaryAlignment=FALSE,
+ isNotPassingQualityControls=FALSE,
+ isDuplicate=FALSE)
> param0 <- ScanBamParam(flag=flag0)
> gal <- readGAlignments(toy_reads_bam(), use.names=TRUE, param=param0)
> gal
GAlignments object with 42 alignments and 0 metadata columns:
seqnames strand cigar qwidth start end width
<Rle> <Rle> <character> <integer> <integer> <integer> <integer>
r001 chrX - 10M 10 12 21 10
r002 chrX + 7M15N3M 10 15 39 25
r003 chrX + 10M 10 21 30 10
r004 chrX - 10M 10 26 35 10
r005 chrX + 10M 10 31 40 10
... ... ... ... ... ... ... ...
r038 chrX - 10M 10 278 287 10
r039 chrX + 10M 10 280 289 10
r040 chrX + 10M 10 285 294 10
r041 chrX - 5M12N5M 10 288 309 22
r042 chrX + 10M 10 290 299 10
njunc
<integer>
r001 0
r002 1
r003 0
r004 0
r005 0
... ...
r038 0
r039 0
r040 0
r041 1
r042 0
-------
seqinfo: 1 sequence from an unspecified genome
>
> ## Put the reads in a GRangesList object:
> grl <- grglist(gal, order.as.in.query=TRUE)
>
> ## Put the transcripts in a GRangesList object (made of exons grouped
> ## by transcript):
> ex_by_tx <- unlist(sg)
>
> ## Most of the times the RNA-seq protocol is unstranded so the strand
> ## reported in the BAM file (and propagated to 'grl') for each alignment
> ## is meaningless. Thus we should call findOverlaps() with
> ## 'ignore.strand=TRUE':
> ov0 <- findOverlaps(grl, ex_by_tx, ignore.strand=TRUE)
>
> ## Encode the overlaps (this compare the fragmentation of the reads with
> ## the splicing of the transcripts):
> ovenc0 <- encodeOverlaps(grl, ex_by_tx, hits=ov0,
+ flip.query.if.wrong.strand=TRUE)
> ov0_is_compat <- isCompatibleWithSplicing(ovenc0)
>
> ## Keep compatible overlaps only:
> ov1 <- ov0[ov0_is_compat]
>
> ## Only keep overlaps that are compatible with exactly 1 transcript:
> ov2 <- ov1[queryHits(ov1) %in% which(countQueryHits(ov1) == 1L)]
> nhit_per_tx <- countSubjectHits(ov2)
> names(nhit_per_tx) <- names(txweight(sg))
> nhit_per_tx
A1 A2 B1 B2 C1 C2 C3 D4 D1 D2 D3 E1 E2
1 5 13 0 0 0 0 0 0 0 0 0 0
>
> txweight(sg) <- 2 * nhit_per_tx
> plot(sgraph(sg["geneA"], tx_id.as.edge.label=TRUE, as.igraph=TRUE))
> plot(sgraph(sg["geneB"], tx_id.as.edge.label=TRUE, as.igraph=TRUE))
>
>
>
>
>
> dev.off()
null device
1
>