Last data update: 2014.03.03

TxDb-class    R Documentation

TxDb objects

Description

The TxDb class is a container for storing transcript annotations.

See ?FeatureDb for a more generic container for storing the genomic locations of arbitrary types of genomic features.

See ?makeTxDbFromUCSC and ?makeTxDbFromBiomart for convenient ways to make TxDb objects from UCSC or BioMart online resources.

See ?makeTxDbFromGFF for making a TxDb object from annotations available as a GFF3 or GTF file.

Methods

In the code snippets below, x is a TxDb object.

metadata(x): Return x's metadata in a data frame.
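
As a small illustration of the metadata accessor, the sketch below looks up one entry by name. It assumes the sample database shipped with GenomicFeatures (the same file used in the Examples) and that the returned data frame has "name" and "value" columns; that layout is an assumption, not something this page states.

```r
library(GenomicFeatures)

# Load the small sample TxDb that ships with GenomicFeatures.
txdb_file <- system.file("extdata", "Biomart_Ensembl_sample.sqlite",
                         package="GenomicFeatures")
txdb <- loadDb(txdb_file)

# metadata() returns an ordinary data frame.
md <- metadata(txdb)
str(md)

# Assumption: entries are stored as name/value pairs, so a single
# entry can be looked up by its name.
organism <- md$value[md$name == "Organism"]
organism
```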

seqlevels0(x): Get the sequence levels originally in x. This ignores any change the user might have made to the sequence levels with the seqlevels setter.

seqlevels(x), seqlevels(x) <- value: Get or set the sequence levels in x.
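
The interplay between seqlevels and seqlevels0 can be sketched as follows. This is a minimal example, assuming the sample database from the Examples section; the variable names n_all, n_y, and n_restored are only illustrative.

```r
library(GenomicFeatures)

txdb <- loadDb(system.file("extdata", "Biomart_Ensembl_sample.sqlite",
                           package="GenomicFeatures"))

# Count transcripts with all sequence levels in place.
n_all <- length(transcripts(txdb))

# Restrict the object to chromosome Y; extractors now only see
# features located on that sequence.
seqlevels(txdb) <- "Y"
n_y <- length(transcripts(txdb))

# seqlevels0() still reports the levels stored in the database,
# so it can be used to undo the restriction.
seqlevels(txdb) <- seqlevels0(txdb)
n_restored <- length(transcripts(txdb))

c(all=n_all, Y=n_y, restored=n_restored)
```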

seqinfo(x), seqinfo(x) <- value: Get or set the information about the underlying sequences. Note that, for now, the setter only supports replacing the sequence names: the supplied Seqinfo object (value) and the current one (seqinfo(x)) must be identical except for their sequence names, which are accessed with seqnames(value) and seqnames(seqinfo(x)), respectively.

isActiveSeq(x): Return the currently active sequences for this TxDb object as a named logical vector. Only active sequences are queried by the supplied accessor methods; inactive sequences are ignored. By default, all available sequences are active.

isActiveSeq(x) <- value: Change which sequences the accessor methods will query by altering the contents of this named logical vector.

seqlevelsStyle(x), seqlevelsStyle(x) <- value: Get or set the seqname style for x. See the seqlevelsStyle generic getter and setter in the GenomeInfoDb package for more information.
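
A minimal sketch of switching the naming style, assuming the sample database from the Examples section. Sequence names that have no counterpart in the target style (for example, alternate-locus names) may be left unchanged, so the result is printed rather than assumed here.

```r
library(GenomicFeatures)

txdb <- loadDb(system.file("extdata", "Biomart_Ensembl_sample.sqlite",
                           package="GenomicFeatures"))

# Style(s) detected from the current sequence names.
seqlevelsStyle(txdb)

# Switch to UCSC-style names (e.g. "Y" becomes "chrY").
seqlevelsStyle(txdb) <- "UCSC"

# Inspect the renamed sequence levels.
seqlevels(txdb)
```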

as.list(x): Dump the entire db into a list of data frames, say txdb_dump, that can then be used to recreate the original db with do.call(makeTxDb, txdb_dump) with no loss of information (except possibly for some of the metadata). Note that the transcripts are dumped in the same order in all the data frames.
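
Before rebuilding a database from the dump, it can help to look at its shape. The sketch below assumes the sample database from the Examples section; it only inspects the list and does not modify it.

```r
library(GenomicFeatures)

txdb <- loadDb(system.file("extdata", "Biomart_Ensembl_sample.sqlite",
                           package="GenomicFeatures"))

# Dump the database into a list of data frames.
txdb_dump <- as.list(txdb)

# Names and dimensions of the components.
names(txdb_dump)
lapply(txdb_dump, dim)

# Each row of 'splicings' refers back to a transcript through tx_id,
# so tabulating tx_id gives the number of exons per transcript.
exons_per_tx <- table(txdb_dump$splicings$tx_id)
exons_per_tx
```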

Author(s)

Hervé Pagès, Marc Carlson

See Also

  • makeTxDbFromUCSC, makeTxDbFromBiomart, makeTxDbFromGRanges, and makeTxDbFromGFF, for convenient ways to make a TxDb object from UCSC or BioMart online resources, or from a GRanges object, or from a GFF or GTF file.

  • saveDb and loadDb in the AnnotationDbi package for saving and loading a TxDb object as an SQLite file.

  • transcripts, transcriptsBy, and transcriptsByOverlaps, for how to extract genomic features from a TxDb object.

  • transcriptLengths for extracting the transcript lengths from a TxDb object.

  • select-methods for how to use the simple "select" interface to extract information from a TxDb object.

  • The FeatureDb class for storing the genomic locations of arbitrary types of genomic features.

  • The Seqinfo class in the GenomeInfoDb package.
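
Several of the functions listed above can be tried together. The following is a minimal sketch, assuming the sample database from the Examples section; the temporary file and the variable names are only illustrative.

```r
library(GenomicFeatures)

txdb <- loadDb(system.file("extdata", "Biomart_Ensembl_sample.sqlite",
                           package="GenomicFeatures"))

# Save the TxDb to a temporary SQLite file and load it back.
tmp <- tempfile(fileext=".sqlite")
saveDb(txdb, file=tmp)
txdb2 <- loadDb(tmp)

# Compare the annotations carried by the original and reloaded objects.
same <- identical(as.list(txdb2), as.list(txdb))
same

# Group the transcripts by gene and tabulate the transcript lengths.
tx_by_gene <- transcriptsBy(txdb, by="gene")
tx_lens <- transcriptLengths(txdb)
head(tx_lens)
```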

Examples

txdb_file <- system.file("extdata", "Biomart_Ensembl_sample.sqlite",
                         package="GenomicFeatures")
txdb <- loadDb(txdb_file)
txdb

## Use of seqinfo():
seqlevelsStyle(txdb)
seqinfo(txdb)
seqlevels(txdb)
seqlengths(txdb)  # shortcut for 'seqlengths(seqinfo(txdb))'
isCircular(txdb)  # shortcut for 'isCircular(seqinfo(txdb))'
names(which(isCircular(txdb)))

## You can set user-supplied seqlevels on 'txdb' to restrict any further
## operations to a subset of chromosomes:
seqlevels(txdb) <- c("Y", "6")
## Then you can restore the seqlevels stored in the db:
seqlevels(txdb) <- seqlevels0(txdb)

## Use of as.list():
txdb_dump <- as.list(txdb)
txdb_dump
txdb1 <- do.call(makeTxDb, txdb_dump)
stopifnot(identical(as.list(txdb1), txdb_dump))

Results


R version 3.3.1 (2016-06-21) -- "Bug in Your Hair"
Copyright (C) 2016 The R Foundation for Statistical Computing
Platform: x86_64-pc-linux-gnu (64-bit)

> library(GenomicFeatures)
Loading required package: BiocGenerics
Loading required package: parallel

Attaching package: 'BiocGenerics'

The following objects are masked from 'package:parallel':

    clusterApply, clusterApplyLB, clusterCall, clusterEvalQ,
    clusterExport, clusterMap, parApply, parCapply, parLapply,
    parLapplyLB, parRapply, parSapply, parSapplyLB

The following objects are masked from 'package:stats':

    IQR, mad, xtabs

The following objects are masked from 'package:base':

    Filter, Find, Map, Position, Reduce, anyDuplicated, append,
    as.data.frame, cbind, colnames, do.call, duplicated, eval, evalq,
    get, grep, grepl, intersect, is.unsorted, lapply, lengths, mapply,
    match, mget, order, paste, pmax, pmax.int, pmin, pmin.int, rank,
    rbind, rownames, sapply, setdiff, sort, table, tapply, union,
    unique, unsplit

Loading required package: S4Vectors
Loading required package: stats4

Attaching package: 'S4Vectors'

The following objects are masked from 'package:base':

    colMeans, colSums, expand.grid, rowMeans, rowSums

Loading required package: IRanges
Loading required package: GenomeInfoDb
Loading required package: GenomicRanges
Loading required package: AnnotationDbi
Loading required package: Biobase
Welcome to Bioconductor

    Vignettes contain introductory material; view with
    'browseVignettes()'. To cite Bioconductor, see
    'citation("Biobase")', and for packages 'citation("pkgname")'.

> ### Name: TxDb-class
> ### Title: TxDb objects
> ### Aliases: TxDb-class class:TxDb TxDb species,TxDb-method
> ###   organism,TxDb-method seqlevels0,TxDb-method seqlevels<-,TxDb-method
> ###   seqinfo,TxDb-method isActiveSeq isActiveSeq<- isActiveSeq,TxDb-method
> ###   isActiveSeq<-,TxDb-method show,TxDb-method as.list,TxDb-method
> ### Keywords: methods classes
> 
> ### ** Examples
> 
> txdb_file <- system.file("extdata", "Biomart_Ensembl_sample.sqlite",
+                          package="GenomicFeatures")
> txdb <- loadDb(txdb_file)
> txdb
TxDb object:
# Db type: TxDb
# Supporting package: GenomicFeatures
# Data source: BioMart
# Organism: Homo sapiens
# Taxonomy ID: 9606
# Resource URL: www.ensembl.org:80
# BioMart database: ENSEMBL_MART_ENSEMBL
# BioMart database version: Ensembl Genes 83
# BioMart dataset: hsapiens_gene_ensembl
# BioMart dataset description: Homo sapiens genes (GRCh38.p5)
# BioMart dataset version: GRCh38.p5
# Full dataset: no
# miRBase build ID: NA
# transcript_nrow: 6
# exon_nrow: 56
# cds_nrow: 48
# Db created by: GenomicFeatures package from Bioconductor
# Creation time: 2015-12-16 00:22:26 -0800 (Wed, 16 Dec 2015)
# GenomicFeatures version at creation time: 1.22.6
# RSQLite version at creation time: 1.0.0
# DBSCHEMAVERSION: 1.1
> 
> ## Use of seqinfo():
> seqlevelsStyle(txdb)
[1] "NCBI"    "Ensembl"
> seqinfo(txdb)
Seqinfo object with 6 sequences from an unspecified genome:
  seqnames                seqlengths isCircular genome
  CHR_HSCHR6_MHC_APD_CTG1  170845044      FALSE   <NA>
  3                        198295559      FALSE   <NA>
  6                        170805979      FALSE   <NA>
  13                       114364328      FALSE   <NA>
  16                        90338345      FALSE   <NA>
  Y                         57227415      FALSE   <NA>
> seqlevels(txdb)
[1] "CHR_HSCHR6_MHC_APD_CTG1" "3"                      
[3] "6"                       "13"                     
[5] "16"                      "Y"                      
> seqlengths(txdb)  # shortcut for 'seqlengths(seqinfo(txdb))'
CHR_HSCHR6_MHC_APD_CTG1                       3                       6 
              170845044               198295559               170805979 
                     13                      16                       Y 
              114364328                90338345                57227415 
> isCircular(txdb)  # shortcut for 'isCircular(seqinfo(txdb))'
CHR_HSCHR6_MHC_APD_CTG1                       3                       6 
                  FALSE                   FALSE                   FALSE 
                     13                      16                       Y 
                  FALSE                   FALSE                   FALSE 
> names(which(isCircular(txdb)))
character(0)
> 
> ## You can set user-supplied seqlevels on 'txdb' to restrict any further
> ## operations to a subset of chromosomes:
> seqlevels(txdb) <- c("Y", "6")
> ## Then you can restore the seqlevels stored in the db:
> seqlevels(txdb) <- seqlevels0(txdb)
> 
> ## Use of as.list():
> txdb_dump <- as.list(txdb)
> txdb_dump
$transcripts
  tx_id         tx_name                 tx_type                tx_chrom
1     1 ENST00000435657          protein_coding CHR_HSCHR6_MHC_APD_CTG1
2     2 ENST00000013894 nonsense_mediated_decay                       3
3     3 ENST00000313243          protein_coding                       6
4     4 ENST00000384428                misc_RNA                      13
5     5 ENST00000268655          protein_coding                      16
6     6 ENST00000478783    processed_transcript                       Y
  tx_strand tx_start   tx_end
1         - 31844536 31862971
2         + 43690951 43711864
3         - 10762723 10838495
4         - 23152586 23152686
5         +  3401420  3409370
6         +  2977819  2979350

$splicings
   tx_id exon_rank exon_id       exon_name              exon_chrom exon_strand
1      1         1      30 ENSE00001949146 CHR_HSCHR6_MHC_APD_CTG1           -
2      1         2      29 ENSE00002223385 CHR_HSCHR6_MHC_APD_CTG1           -
3      1         3      28 ENSE00003787816 CHR_HSCHR6_MHC_APD_CTG1           -
4      1         4      27 ENSE00003298651 CHR_HSCHR6_MHC_APD_CTG1           -
5      1         5      26 ENSE00003689458 CHR_HSCHR6_MHC_APD_CTG1           -
6      1         6      25 ENSE00003556077 CHR_HSCHR6_MHC_APD_CTG1           -
7      1         7      24 ENSE00003657481 CHR_HSCHR6_MHC_APD_CTG1           -
8      1         8      23 ENSE00003622036 CHR_HSCHR6_MHC_APD_CTG1           -
9      1         9      22 ENSE00003464449 CHR_HSCHR6_MHC_APD_CTG1           -
10     1        10      21 ENSE00003473326 CHR_HSCHR6_MHC_APD_CTG1           -
11     1        11      20 ENSE00001604430 CHR_HSCHR6_MHC_APD_CTG1           -
12     1        12      19 ENSE00001679938 CHR_HSCHR6_MHC_APD_CTG1           -
13     1        13      18 ENSE00003672925 CHR_HSCHR6_MHC_APD_CTG1           -
14     1        14      17 ENSE00003659779 CHR_HSCHR6_MHC_APD_CTG1           -
15     1        15      16 ENSE00001730983 CHR_HSCHR6_MHC_APD_CTG1           -
16     1        16      15 ENSE00003230483 CHR_HSCHR6_MHC_APD_CTG1           -
17     1        17      14 ENSE00003672609 CHR_HSCHR6_MHC_APD_CTG1           -
18     1        18      13 ENSE00003543668 CHR_HSCHR6_MHC_APD_CTG1           -
19     1        19      12 ENSE00003534752 CHR_HSCHR6_MHC_APD_CTG1           -
20     1        20      11 ENSE00003569257 CHR_HSCHR6_MHC_APD_CTG1           -
21     1        21      10 ENSE00003510150 CHR_HSCHR6_MHC_APD_CTG1           -
22     1        22       9 ENSE00003625416 CHR_HSCHR6_MHC_APD_CTG1           -
23     1        23       8 ENSE00003681445 CHR_HSCHR6_MHC_APD_CTG1           -
24     1        24       7 ENSE00003535202 CHR_HSCHR6_MHC_APD_CTG1           -
25     1        25       6 ENSE00001749230 CHR_HSCHR6_MHC_APD_CTG1           -
26     1        26       5 ENSE00001652388 CHR_HSCHR6_MHC_APD_CTG1           -
27     1        27       4 ENSE00001759190 CHR_HSCHR6_MHC_APD_CTG1           -
28     1        28       3 ENSE00003688015 CHR_HSCHR6_MHC_APD_CTG1           -
29     1        29       2 ENSE00001797329 CHR_HSCHR6_MHC_APD_CTG1           -
30     1        30       1 ENSE00003574628 CHR_HSCHR6_MHC_APD_CTG1           -
31     2         1      31 ENSE00001633386                       3           +
32     2         2      32 ENSE00003527253                       3           +
33     2         3      33 ENSE00001728436                       3           +
34     2         4      34 ENSE00001611146                       3           +
35     2         5      35 ENSE00003565985                       3           +
36     2         6      36 ENSE00001740024                       3           +
37     3         1      50 ENSE00001481374                       6           -
38     3         2      49 ENSE00003696993                       6           -
39     3         3      48 ENSE00003697955                       6           -
40     3         4      47 ENSE00003694962                       6           -
41     3         5      46 ENSE00003699609                       6           -
42     3         6      45 ENSE00003699541                       6           -
43     3         7      44 ENSE00003695477                       6           -
44     3         8      43 ENSE00003701805                       6           -
45     3         9      42 ENSE00003712208                       6           -
46     3        10      41 ENSE00003700335                       6           -
47     3        11      40 ENSE00003698201                       6           -
48     3        12      39 ENSE00003700727                       6           -
49     3        13      38 ENSE00003699774                       6           -
50     3        14      37 ENSE00003702156                       6           -
51     4         1      51 ENSE00001499436                      13           -
52     5         1      52 ENSE00001838202                      16           +
53     5         2      53 ENSE00003488193                      16           +
54     5         3      54 ENSE00000666899                      16           +
55     6         1      55 ENSE00001900413                       Y           +
56     6         2      56 ENSE00001880607                       Y           +
   exon_start exon_end cds_id cds_start  cds_end
1    31862564 31862971     NA        NA       NA
2    31861849 31862268     29  31861849 31862235
3    31860004 31860138     28  31860004 31860138
4    31859775 31859913     27  31859775 31859913
5    31859410 31859534     26  31859410 31859534
6    31859235 31859319     25  31859235 31859319
7    31858889 31858989     24  31858889 31858989
8    31858628 31858755     23  31858628 31858755
9    31852588 31852752     22  31852588 31852752
10   31852265 31852346     21  31852265 31852346
11   31851614 31851733     20  31851614 31851733
12   31851413 31851521     19  31851413 31851521
13   31851233 31851327     18  31851233 31851327
14   31850114 31850204     17  31850114 31850204
15   31849739 31849863     16  31849739 31849863
16   31849548 31849651     15  31849548 31849651
17   31849303 31849461     14  31849303 31849461
18   31849105 31849195     13  31849105 31849195
19   31848865 31848970     12  31848865 31848970
20   31848709 31848779     11  31848709 31848779
21   31848499 31848624     10  31848499 31848624
22   31848037 31848141      9  31848037 31848141
23   31847888 31847956      8  31847888 31847956
24   31847723 31847801      7  31847723 31847801
25   31847459 31847586      6  31847459 31847586
26   31847016 31847171      5  31847016 31847171
27   31846626 31846832      4  31846626 31846832
28   31846443 31846554      3  31846443 31846554
29   31845985 31846310      2  31845985 31846310
30   31844536 31844680      1  31844612 31844680
31   43690951 43691039     30  43690993 43691039
32   43699276 43699361     31  43699276 43699361
33   43699502 43699533     32  43699502 43699503
34   43701912 43702008     NA        NA       NA
35   43702215 43702587     NA        NA       NA
36   43711709 43711864     NA        NA       NA
37   10838342 10838495     NA        NA       NA
38   10830548 10830877     45  10830548 10830648
39   10818886 10818940     44  10818886 10818940
40   10817850 10817971     43  10817850 10817971
41   10813644 10813723     42  10813644 10813723
42   10808810 10808942     41  10808810 10808942
43   10803720 10803891     40  10803720 10803891
44   10801892 10802059     39  10801892 10802059
45   10795998 10796309     38  10795998 10796309
46   10791675 10791847     37  10791675 10791847
47   10784424 10784572     36  10784424 10784572
48   10775328 10775459     35  10775328 10775459
49   10770111 10770230     34  10770111 10770230
50   10762723 10764606     33  10764452 10764606
51   23152586 23152686     NA        NA       NA
52    3401420  3402406     46   3402005  3402406
53    3404426  3404648     47   3404426  3404648
54    3408321  3409370     48   3408321  3408919
55    2977819  2978080     NA        NA       NA
56    2978810  2979350     NA        NA       NA

$genes
  tx_id         gene_id
1     1 ENSG00000231116
2     2 ENSG00000011198
3     3 ENSG00000111837
4     4 ENSG00000207157
5     5 ENSG00000103343
6     6 ENSG00000067646

$chrominfo
                    chrom    length is_circular
1 CHR_HSCHR6_MHC_APD_CTG1 170845044       FALSE
2                       3 198295559       FALSE
3                       6 170805979       FALSE
4                      13 114364328       FALSE
5                      16  90338345       FALSE
6                       Y  57227415       FALSE

> txdb1 <- do.call(makeTxDb, txdb_dump)
> stopifnot(identical(as.list(txdb1), txdb_dump))