Last data update: 2014.03.03

R: SplicingGraphs objects
SplicingGraphs-classR Documentation

SplicingGraphs objects

Description

The SplicingGraphs class is a container for storing splicing graphs together with the gene model that they are based on.

Usage

## Constructor:

SplicingGraphs(x, grouping=NULL, min.ntx=2, max.ntx=NA, check.introns=TRUE)

## SplicingGraphs basic API:

## S4 method for signature 'SplicingGraphs'
length(x)

## S4 method for signature 'SplicingGraphs'
names(x)

## S4 method for signature 'SplicingGraphs'
seqnames(x)

## S4 method for signature 'SplicingGraphs'
strand(x)

## S4 method for signature 'SplicingGraphs,ANY,ANY'
x[i, j, ... , drop=TRUE]

## S4 method for signature 'SplicingGraphs,ANY,ANY'
x[[i, j, ...]]

## S4 method for signature 'SplicingGraphs'
elementNROWS(x)

## S4 method for signature 'SplicingGraphs'
unlist(x, recursive=TRUE, use.names=TRUE)

## S4 method for signature 'SplicingGraphs'
seqinfo(x)

Arguments

x

For SplicingGraphs: A GRangesList object containing the exons of one or more genes grouped by transcript. Alternatively, x can be a TxDb object. See Details section below.

For the methods in the SplicingGraphs basic API: A SplicingGraphs object.

grouping

An optional object that represents the grouping by gene of the top-level elements (i.e. the transcripts) in x. See Details section below.

min.ntx, max.ntx

Single integers (or NA for max.ntx) specifying the minimum and maximum number of transcripts a gene must have to be considered for inclusion in the object returned by SplicingGraphs. A value of NA for max.ntx means no maximum.

check.introns

If TRUE, SplicingGraphs checks that, within each transcript, exons are ordered from 5' to 3' with gaps of at least 1 nucleotide between them.

i, j, ..., drop

A SplicingGraphs object is a list-like object and therefore it can be subsetted like a list. When subsetting with [, the result is another SplicingGraphs object containing only the selected genes. When subsetting with [[, the result is an unnamed GRangesList object containing the exons grouped by transcript. Like for list, subsetting only accepts 1 argument (i). The drop argument is ignored and trying to pass any additional argument (to j or in ...) will raise an error.

recursive, use.names

A SplicingGraphs object is a list-like object and therefore it can be unlisted with unlist. The result is a GRangesList object containing the exons grouped by transcript. By default this object has names on it, and the names are the gene ids. Note that because each element in this object represents a transcript (and not a gene), the names are not unique. If use.names=FALSE is used, the result has no names on it. The recursive agument is ignored.

Details

The Splicing graph theory only applies to genes that have all the exons of all their transcripts on the same chromosome and strand. In particular, in its current form, the splicing graph theory cannot describe trans-splicing events. The SplicingGraphs constructor will reject genes that do not satisfy this.

The first argument of the SplicingGraphs constructor, x, can be either a GRangesList object or a TxDb object.

When x is a GRangesList object, it must contain the exons of one or more genes grouped by transcripts. More precisely, each top-level element in x must contain the genomic ranges of the exons for a particular transcript. Typically x will be obtained from a TxDb object txdb with exonsBy(txdb, by="tx", use.names=TRUE).

grouping is an optional argument that is only supported when x is a GRangesList object. It represents the grouping by gene of the top-level elements (i.e. the transcripts) in GRangesList object x. It can be either:

  • Missing (i.e. NULL). In that case, all the transcripts in x are considered to belong to the same gene and the SplicingGraphs object returned by SplicingGraphs will be unnamed.

  • A list of integer or character vectors, or an IntegerList, or a CharacterList object, of length the number of genes to process, and where grouping[[i]] is a vector of valid subscripts in x pointing to all the transcripts of the i-th gene.

  • A factor, character vector, or integer vector, of the same length as x and 1 level per gene.

  • A named GRangesList object containing transcripts grouped by genes i.e. each top-level element in grouping contains the genomic ranges of the transcripts for a particular gene. In that case, the grouping is inferred from the tx_id (or alternatively tx_name) metadata column of unlist(grouping) and all the values in that column must be in names(x). If x was obtained with exonsBy(txdb, by="tx", use.names=TRUE), then the GRangesList object used for grouping would typically be obtained with transcriptsBy(txdb, by="gene").

  • A data.frame or DataFrame with 2 character vector columns: a gene_id column (factor, character vector, or integer vector), and a tx_id (or alternatively tx_name) column. In that case, x must be named and all the values in the tx_id (or tx_name) column must be in names(x).

Value

For SplicingGraphs: a SplicingGraphs object with 1 element per gene.

For length: the number of genes in x, which is also the number of splicing graphs in x.

For names: the gene ids. Note that the names on a SplicingGraphs object are always unique and cannot be modified.

For seqnames: a named factor of the length of x containing the name of the chromosome for each gene.

For strand: a named factor of the length of x containing the strand for each gene.

For elementNROWS: the number of transcripts per gene.

For seqinfo: the seqinfo of the GRangesList or TxDb object that was used to construct the SplicingGraphs object.

Author(s)

H. Pages

References

Heber, S., Alekseyev, M., Sze, S., Tang, H., and Pevzner, P. A. Splicing graphs and EST assembly problem Bioinformatics Date: Jul 2002 Vol: 18 Pages: S181-S188

Sammeth, M. (2009) Complete Alternative Splicing Events Are Bubbles in Splicing Graphs J. Comput. Biol. Date: Aug 2009 Vol: 16 Pages: 1117-1140

See Also

This man page is part of the SplicingGraphs package. Please see ?`SplicingGraphs-package` for an overview of the package and for an index of its man pages.

Other topics related to this man page and documented in other packages:

  • The exonsBy and transcriptsBy functions, and the TxDb class, defined in the GenomicFeatures package.

  • The GRangesList class defined in the GenomicRanges package.

  • The IntegerList and CharacterList classes defined in the IRanges package.

  • The DataFrame class defined in the S4Vectors package.

Examples

## ---------------------------------------------------------------------
## 1. Load a toy gene model as a TxDb object
## ---------------------------------------------------------------------

library(GenomicFeatures)
suppressWarnings(
  toy_genes_txdb <- makeTxDbFromGFF(toy_genes_gff())
)

## ---------------------------------------------------------------------
## 2. Compute all the splicing graphs (1 graph per gene) and return them
##    in a SplicingGraphs object
## ---------------------------------------------------------------------

## Extract the exons grouped by transcript:
ex_by_tx <- exonsBy(toy_genes_txdb, by="tx", use.names=TRUE)

## Extract the transcripts grouped by gene:
tx_by_gn <- transcriptsBy(toy_genes_txdb, by="gene")

sg <- SplicingGraphs(ex_by_tx, tx_by_gn)
sg

## Alternatively 'sg' can be constructed directly from the TxDb
## object:
sg2 <- SplicingGraphs(toy_genes_txdb)  # same as 'sg'
sg2

## Note that because SplicingGraphs objects have a slot that is an
## environment (for caching the bubbles), they cannot be compared with
## 'identical()' (will always return FALSE). 'all.equal()' should be
## used instead:
stopifnot(isTRUE(all.equal(sg2, sg)))

## 'sg' has 1 element per gene and 'names(sg)' gives the gene ids:
length(sg)
names(sg)

## ---------------------------------------------------------------------
## 3. Basic manipulation of a SplicingGraphs object
## ---------------------------------------------------------------------

## Basic accessors:
seqnames(sg)
strand(sg)
seqinfo(sg)

## Number of transcripts per gene:
elementNROWS(sg)

## The transcripts of a given gene can be extracted with [[. The result
## is an *unnamed* GRangesList object containing the exons grouped by
## transcript:
sg[["geneD"]]

## See '?plotTranscripts' for how to plot those transcripts.

## The transcripts of all the genes can be extracted with unlist(). The
## result is a *named* GRangesList object containing the exons grouped
## by transcript. The names on the object are the gene ids:
ex_by_tx <- unlist(sg)
ex_by_tx

Results


R version 3.3.1 (2016-06-21) -- "Bug in Your Hair"
Copyright (C) 2016 The R Foundation for Statistical Computing
Platform: x86_64-pc-linux-gnu (64-bit)

R is free software and comes with ABSOLUTELY NO WARRANTY.
You are welcome to redistribute it under certain conditions.
Type 'license()' or 'licence()' for distribution details.

R is a collaborative project with many contributors.
Type 'contributors()' for more information and
'citation()' on how to cite R or R packages in publications.

Type 'demo()' for some demos, 'help()' for on-line help, or
'help.start()' for an HTML browser interface to help.
Type 'q()' to quit R.

> library(SplicingGraphs)
Loading required package: GenomicFeatures
Loading required package: BiocGenerics
Loading required package: parallel

Attaching package: 'BiocGenerics'

The following objects are masked from 'package:parallel':

    clusterApply, clusterApplyLB, clusterCall, clusterEvalQ,
    clusterExport, clusterMap, parApply, parCapply, parLapply,
    parLapplyLB, parRapply, parSapply, parSapplyLB

The following objects are masked from 'package:stats':

    IQR, mad, xtabs

The following objects are masked from 'package:base':

    Filter, Find, Map, Position, Reduce, anyDuplicated, append,
    as.data.frame, cbind, colnames, do.call, duplicated, eval, evalq,
    get, grep, grepl, intersect, is.unsorted, lapply, lengths, mapply,
    match, mget, order, paste, pmax, pmax.int, pmin, pmin.int, rank,
    rbind, rownames, sapply, setdiff, sort, table, tapply, union,
    unique, unsplit

Loading required package: S4Vectors
Loading required package: stats4

Attaching package: 'S4Vectors'

The following objects are masked from 'package:base':

    colMeans, colSums, expand.grid, rowMeans, rowSums

Loading required package: IRanges
Loading required package: GenomeInfoDb
Loading required package: GenomicRanges
Loading required package: AnnotationDbi
Loading required package: Biobase
Welcome to Bioconductor

    Vignettes contain introductory material; view with
    'browseVignettes()'. To cite Bioconductor, see
    'citation("Biobase")', and for packages 'citation("pkgname")'.

Loading required package: GenomicAlignments
Loading required package: SummarizedExperiment
Loading required package: Biostrings
Loading required package: XVector
Loading required package: Rsamtools
Loading required package: Rgraphviz
Loading required package: graph

Attaching package: 'graph'

The following object is masked from 'package:Biostrings':

    complement

Loading required package: grid

Attaching package: 'Rgraphviz'

The following objects are masked from 'package:IRanges':

    from, to

The following objects are masked from 'package:S4Vectors':

    from, to

Warning messages:
1: replacing previous import 'IRanges::from' by 'Rgraphviz::from' when loading 'SplicingGraphs' 
2: replacing previous import 'IRanges::to' by 'Rgraphviz::to' when loading 'SplicingGraphs' 
> png(filename="/home/ddbj/snapshot/RGM3/R_BC/result/SplicingGraphs/SplicingGraphs-class.Rd_%03d_medium.png", width=480, height=480)
> ### Name: SplicingGraphs-class
> ### Title: SplicingGraphs objects
> ### Aliases: class:GeneModel GeneModel-class class:SplicingGraphs
> ###   SplicingGraphs-class seqnames,GeneModel-method
> ###   strand,GeneModel-method seqinfo,GeneModel-method
> ###   isCircular<-,GeneModel-method length,SplicingGraphs-method
> ###   names,SplicingGraphs-method [,SplicingGraphs,ANY,ANY-method
> ###   [[,SplicingGraphs,ANY,ANY-method elementNROWS,SplicingGraphs-method
> ###   unlist,SplicingGraphs-method seqnames,SplicingGraphs-method
> ###   strand,SplicingGraphs-method seqinfo,SplicingGraphs-method
> ###   isCircular<-,SplicingGraphs-method show,SplicingGraphs-method
> ###   SplicingGraphs SplicingGraphs,GRangesList-method
> ###   SplicingGraphs,TxDb-method
> 
> ### ** Examples
> 
> ## ---------------------------------------------------------------------
> ## 1. Load a toy gene model as a TxDb object
> ## ---------------------------------------------------------------------
> 
> library(GenomicFeatures)
> suppressWarnings(
+   toy_genes_txdb <- makeTxDbFromGFF(toy_genes_gff())
+ )
Import genomic features from the file as a GRanges object ... OK
Prepare the 'metadata' data frame ... OK
Make the TxDb object ... OK
> 
> ## ---------------------------------------------------------------------
> ## 2. Compute all the splicing graphs (1 graph per gene) and return them
> ##    in a SplicingGraphs object
> ## ---------------------------------------------------------------------
> 
> ## Extract the exons grouped by transcript:
> ex_by_tx <- exonsBy(toy_genes_txdb, by="tx", use.names=TRUE)
> 
> ## Extract the transcripts grouped by gene:
> tx_by_gn <- transcriptsBy(toy_genes_txdb, by="gene")
> 
> sg <- SplicingGraphs(ex_by_tx, tx_by_gn)
> sg
SplicingGraphs object with 5 gene(s) and 13 transcript(s)
> 
> ## Alternatively 'sg' can be constructed directly from the TxDb
> ## object:
> sg2 <- SplicingGraphs(toy_genes_txdb)  # same as 'sg'
> sg2
SplicingGraphs object with 5 gene(s) and 13 transcript(s)
> 
> ## Note that because SplicingGraphs objects have a slot that is an
> ## environment (for caching the bubbles), they cannot be compared with
> ## 'identical()' (will always return FALSE). 'all.equal()' should be
> ## used instead:
> stopifnot(isTRUE(all.equal(sg2, sg)))
> 
> ## 'sg' has 1 element per gene and 'names(sg)' gives the gene ids:
> length(sg)
[1] 5
> names(sg)
[1] "geneA" "geneB" "geneC" "geneD" "geneE"
> 
> ## ---------------------------------------------------------------------
> ## 3. Basic manipulation of a SplicingGraphs object
> ## ---------------------------------------------------------------------
> 
> ## Basic accessors:
> seqnames(sg)
geneA geneB geneC geneD geneE 
 chrX  chrX  chrX  chrX  chrX 
Levels: chrX
> strand(sg)
geneA geneB geneC geneD geneE 
    +     -     +     +     + 
Levels: + - *
> seqinfo(sg)
Seqinfo object with 1 sequence from an unspecified genome; no seqlengths:
  seqnames seqlengths isCircular genome
  chrX             NA         NA   <NA>
> 
> ## Number of transcripts per gene:
> elementNROWS(sg)
geneA geneB geneC geneD geneE 
    2     2     3     4     2 
> 
> ## The transcripts of a given gene can be extracted with [[. The result
> ## is an *unnamed* GRangesList object containing the exons grouped by
> ## transcript:
> sg[["geneD"]]
GRangesList object of length 4:
[[1]] 
GRanges object with 2 ranges and 5 metadata columns:
      seqnames     ranges strand |   exon_id   exon_name exon_rank start_SSid
         <Rle>  <IRanges>  <Rle> | <integer> <character> <integer>  <integer>
  [1]     chrX [601, 630]      + |        10         Dx2         1          1
  [2]     chrX [666, 675]      + |        12         Dx4         2          5
       end_SSid
      <integer>
  [1]         3
  [2]         6

[[2]] 
GRanges object with 2 ranges and 5 metadata columns:
      seqnames     ranges strand | exon_id exon_name exon_rank start_SSid
  [1]     chrX [601, 620]      + |       9       Dx1         1          1
  [2]     chrX [651, 700]      + |      11       Dx3         2          4
      end_SSid
  [1]        2
  [2]        8

[[3]] 
GRanges object with 3 ranges and 5 metadata columns:
      seqnames     ranges strand | exon_id exon_name exon_rank start_SSid
  [1]     chrX [601, 620]      + |       9       Dx1         1          1
  [2]     chrX [666, 675]      + |      12       Dx4         2          5
  [3]     chrX [691, 700]      + |      13       Dx5         3          7
      end_SSid
  [1]        2
  [2]        6
  [3]        8

...
<1 more element>
-------
seqinfo: 1 sequence from an unspecified genome; no seqlengths
> 
> ## See '?plotTranscripts' for how to plot those transcripts.
> 
> ## The transcripts of all the genes can be extracted with unlist(). The
> ## result is a *named* GRangesList object containing the exons grouped
> ## by transcript. The names on the object are the gene ids:
> ex_by_tx <- unlist(sg)
> ex_by_tx
GRangesList object of length 13:
$geneA 
GRanges object with 1 range and 5 metadata columns:
      seqnames    ranges strand |   exon_id   exon_name exon_rank start_SSid
         <Rle> <IRanges>  <Rle> | <integer> <character> <integer>  <integer>
  [1]     chrX  [11, 50]      + |         2         Ax2         1          1
       end_SSid
      <integer>
  [1]         3

$geneA 
GRanges object with 2 ranges and 5 metadata columns:
      seqnames    ranges strand | exon_id exon_name exon_rank start_SSid
  [1]     chrX [11,  40]      + |       1       Ax1         1          1
  [2]     chrX [71, 100]      + |       3       Ax3         2          4
      end_SSid
  [1]        2
  [2]        5

$geneB 
GRanges object with 2 ranges and 5 metadata columns:
      seqnames     ranges strand | exon_id exon_name exon_rank start_SSid
  [1]     chrX [251, 300]      - |      23       Bx1         1          3
  [2]     chrX [201, 230]      - |      20       Bx2         2          6
      end_SSid
  [1]        1
  [2]        4

...
<10 more elements>
-------
seqinfo: 1 sequence from an unspecified genome; no seqlengths
> 
> 
> 
> 
> 
> dev.off()
null device 
          1 
>