The TxDb class is a container for storing transcript annotations.
See ?FeatureDb for a more generic container for storing
genomic locations of an arbitrary type of genomic features.
See ?makeTxDbFromUCSC and
?makeTxDbFromBiomart for convenient ways to
make TxDb objects from UCSC or BioMart online resources.
See ?makeTxDbFromGFF for making a TxDb
object from annotations available as a GFF3 or GTF file.
Methods
In the code snippets below, x is a TxDb object.
metadata(x):
Return x's metadata in a data frame.
seqlevels0(x):
Get the sequence levels originally in x. This ignores any
change the user might have made to the sequence levels with the
seqlevels setter.
seqlevels(x), seqlevels(x) <- value:
Get or set the sequence levels in x.
seqinfo(x), seqinfo(x) <- value:
Get or set the information about the underlying sequences.
Note that, for now, the setter only supports replacing the
sequence names: apart from their sequence names (accessed with
seqnames(value) and seqnames(seqinfo(x)), respectively),
the supplied Seqinfo object value and the current
seqinfo(x) must be identical.
isActiveSeq(x):
Return the currently active sequences for this TxDb object as a
named logical vector. Only active sequences are queried by the
accessor methods; inactive sequences are ignored.
By default, all available sequences are active.
isActiveSeq(x) <- value:
Allows the user to change which sequences will be actively
accessed by the accessor methods by altering the contents of this
named logical vector.
seqlevelsStyle(x), seqlevelsStyle(x) <- value:
Get or set the seqname style for x.
See the seqlevelsStyle generic getter and setter
in the GenomeInfoDb package for more information.
as.list(x):
Dump the entire db into a list of data frames, say txdb_dump,
that can then be used to recreate the original db with
do.call(makeTxDb, txdb_dump) with no loss of information
(except possibly for some of the metadata).
Note that the transcripts are dumped in the same order in all the
data frames.
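As a small illustration of the accessors listed above, the sketch below looks up one metadata field and checks that seqlevels0() is unaffected by the seqlevels setter. It assumes the GenomicFeatures package is installed and reuses the sample database shipped with it. The variable names (md, organism, original) are illustrative only and not part of the package.

```r
library(GenomicFeatures)

# Load the small sample TxDb that ships with GenomicFeatures.
txdb_file <- system.file("extdata", "Biomart_Ensembl_sample.sqlite",
                         package="GenomicFeatures")
txdb <- loadDb(txdb_file)

# metadata() returns a data frame with one row per field,
# in columns 'name' and 'value'.
md <- metadata(txdb)
organism <- md$value[md$name == "Organism"]

# Restrict the sequence levels, then confirm that seqlevels0() still
# reports the levels originally stored in the db.
original <- seqlevels0(txdb)
seqlevels(txdb) <- c("Y", "6")
stopifnot(identical(seqlevels0(txdb), original))

# Restore the original sequence levels.
seqlevels(txdb) <- seqlevels0(txdb)
```

Filtering the metadata data frame by its name column is just one convenient way to pull out a single field; printing the TxDb object shows the same fields.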
Author(s)
Hervé Pagès, Marc Carlson
See Also
makeTxDbFromUCSC, makeTxDbFromBiomart,
makeTxDbFromGRanges, and makeTxDbFromGFF,
for convenient ways to make a TxDb object from UCSC or BioMart
online resources, or from a GRanges object,
or from a GFF or GTF file.
saveDb and
loadDb in the AnnotationDbi
package for saving and loading a TxDb object as an SQLite file.
transcripts, transcriptsBy,
and transcriptsByOverlaps,
for how to extract genomic features from a TxDb object.
transcriptLengths for extracting the transcript
lengths from a TxDb object.
select-methods for how to use the
simple "select" interface to extract information from a
TxDb object.
The FeatureDb class for storing genomic locations
of an arbitrary type of genomic features.
The Seqinfo class in the GenomeInfoDb
package.
Examples
txdb_file <- system.file("extdata", "Biomart_Ensembl_sample.sqlite",
package="GenomicFeatures")
txdb <- loadDb(txdb_file)
txdb
## Use of seqinfo():
seqlevelsStyle(txdb)
seqinfo(txdb)
seqlevels(txdb)
seqlengths(txdb) # shortcut for 'seqlengths(seqinfo(txdb))'
isCircular(txdb) # shortcut for 'isCircular(seqinfo(txdb))'
names(which(isCircular(txdb)))
## You can set user-supplied seqlevels on 'txdb' to restrict any further
## operations to a subset of chromosomes:
seqlevels(txdb) <- c("Y", "6")
## Then you can restore the seqlevels stored in the db:
seqlevels(txdb) <- seqlevels0(txdb)
## Use of as.list():
txdb_dump <- as.list(txdb)
txdb_dump
txdb1 <- do.call(makeTxDb, txdb_dump)
stopifnot(identical(as.list(txdb1), txdb_dump))
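To make the effect of restricting the seqlevels more concrete, the following sketch counts the transcripts returned by transcripts() before, during, and after the restriction. It assumes the same sample database as above; the names n_all, tx_subset, n_subset, chroms, and n_restored are illustrative only.

```r
library(GenomicFeatures)

txdb_file <- system.file("extdata", "Biomart_Ensembl_sample.sqlite",
                         package="GenomicFeatures")
txdb <- loadDb(txdb_file)

# Number of transcripts with all sequence levels in place.
n_all <- length(transcripts(txdb))

# Restrict to chromosomes Y and 6: extractors such as transcripts()
# now only return features located on these sequences.
seqlevels(txdb) <- c("Y", "6")
tx_subset <- transcripts(txdb)
n_subset <- length(tx_subset)
chroms <- unique(as.character(seqnames(tx_subset)))

# Restore the seqlevels stored in the db; all transcripts are visible again.
seqlevels(txdb) <- seqlevels0(txdb)
n_restored <- length(transcripts(txdb))
```

The restriction changes only what the extractors return. The underlying SQLite file is untouched, which is why restoring the seqlevels brings back every transcript.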
Results
R version 3.3.1 (2016-06-21) -- "Bug in Your Hair"
> library(GenomicFeatures)
Loading required package: BiocGenerics
Loading required package: parallel
Attaching package: 'BiocGenerics'
The following objects are masked from 'package:parallel':
clusterApply, clusterApplyLB, clusterCall, clusterEvalQ,
clusterExport, clusterMap, parApply, parCapply, parLapply,
parLapplyLB, parRapply, parSapply, parSapplyLB
The following objects are masked from 'package:stats':
IQR, mad, xtabs
The following objects are masked from 'package:base':
Filter, Find, Map, Position, Reduce, anyDuplicated, append,
as.data.frame, cbind, colnames, do.call, duplicated, eval, evalq,
get, grep, grepl, intersect, is.unsorted, lapply, lengths, mapply,
match, mget, order, paste, pmax, pmax.int, pmin, pmin.int, rank,
rbind, rownames, sapply, setdiff, sort, table, tapply, union,
unique, unsplit
Loading required package: S4Vectors
Loading required package: stats4
Attaching package: 'S4Vectors'
The following objects are masked from 'package:base':
colMeans, colSums, expand.grid, rowMeans, rowSums
Loading required package: IRanges
Loading required package: GenomeInfoDb
Loading required package: GenomicRanges
Loading required package: AnnotationDbi
Loading required package: Biobase
Welcome to Bioconductor
Vignettes contain introductory material; view with
'browseVignettes()'. To cite Bioconductor, see
'citation("Biobase")', and for packages 'citation("pkgname")'.
> ### Name: TxDb-class
> ### Title: TxDb objects
> ### Aliases: TxDb-class class:TxDb TxDb species,TxDb-method
> ### organism,TxDb-method seqlevels0,TxDb-method seqlevels<-,TxDb-method
> ### seqinfo,TxDb-method isActiveSeq isActiveSeq<- isActiveSeq,TxDb-method
> ### isActiveSeq<-,TxDb-method show,TxDb-method as.list,TxDb-method
> ### Keywords: methods classes
>
> ### ** Examples
>
> txdb_file <- system.file("extdata", "Biomart_Ensembl_sample.sqlite",
+ package="GenomicFeatures")
> txdb <- loadDb(txdb_file)
> txdb
TxDb object:
# Db type: TxDb
# Supporting package: GenomicFeatures
# Data source: BioMart
# Organism: Homo sapiens
# Taxonomy ID: 9606
# Resource URL: www.ensembl.org:80
# BioMart database: ENSEMBL_MART_ENSEMBL
# BioMart database version: Ensembl Genes 83
# BioMart dataset: hsapiens_gene_ensembl
# BioMart dataset description: Homo sapiens genes (GRCh38.p5)
# BioMart dataset version: GRCh38.p5
# Full dataset: no
# miRBase build ID: NA
# transcript_nrow: 6
# exon_nrow: 56
# cds_nrow: 48
# Db created by: GenomicFeatures package from Bioconductor
# Creation time: 2015-12-16 00:22:26 -0800 (Wed, 16 Dec 2015)
# GenomicFeatures version at creation time: 1.22.6
# RSQLite version at creation time: 1.0.0
# DBSCHEMAVERSION: 1.1
>
> ## Use of seqinfo():
> seqlevelsStyle(txdb)
[1] "NCBI" "Ensembl"
> seqinfo(txdb)
Seqinfo object with 6 sequences from an unspecified genome:
  seqnames                seqlengths isCircular genome
  CHR_HSCHR6_MHC_APD_CTG1  170845044      FALSE   <NA>
  3                        198295559      FALSE   <NA>
  6                        170805979      FALSE   <NA>
  13                       114364328      FALSE   <NA>
  16                        90338345      FALSE   <NA>
  Y                         57227415      FALSE   <NA>
> seqlevels(txdb)
[1] "CHR_HSCHR6_MHC_APD_CTG1" "3"
[3] "6" "13"
[5] "16" "Y"
> seqlengths(txdb) # shortcut for 'seqlengths(seqinfo(txdb))'
CHR_HSCHR6_MHC_APD_CTG1                       3                       6
              170845044               198295559               170805979
                     13                      16                       Y
              114364328                90338345                57227415
> isCircular(txdb) # shortcut for 'isCircular(seqinfo(txdb))'
CHR_HSCHR6_MHC_APD_CTG1                       3                       6
                  FALSE                   FALSE                   FALSE
                     13                      16                       Y
                  FALSE                   FALSE                   FALSE
> names(which(isCircular(txdb)))
character(0)
>
> ## You can set user-supplied seqlevels on 'txdb' to restrict any further
> ## operations to a subset of chromosomes:
> seqlevels(txdb) <- c("Y", "6")
> ## Then you can restore the seqlevels stored in the db:
> seqlevels(txdb) <- seqlevels0(txdb)
>
> ## Use of as.list():
> txdb_dump <- as.list(txdb)
> txdb_dump
$transcripts
tx_id tx_name tx_type tx_chrom
1 1 ENST00000435657 protein_coding CHR_HSCHR6_MHC_APD_CTG1
2 2 ENST00000013894 nonsense_mediated_decay 3
3 3 ENST00000313243 protein_coding 6
4 4 ENST00000384428 misc_RNA 13
5 5 ENST00000268655 protein_coding 16
6 6 ENST00000478783 processed_transcript Y
tx_strand tx_start tx_end
1 - 31844536 31862971
2 + 43690951 43711864
3 - 10762723 10838495
4 - 23152586 23152686
5 + 3401420 3409370
6 + 2977819 2979350
$splicings
tx_id exon_rank exon_id exon_name exon_chrom exon_strand
1 1 1 30 ENSE00001949146 CHR_HSCHR6_MHC_APD_CTG1 -
2 1 2 29 ENSE00002223385 CHR_HSCHR6_MHC_APD_CTG1 -
3 1 3 28 ENSE00003787816 CHR_HSCHR6_MHC_APD_CTG1 -
4 1 4 27 ENSE00003298651 CHR_HSCHR6_MHC_APD_CTG1 -
5 1 5 26 ENSE00003689458 CHR_HSCHR6_MHC_APD_CTG1 -
6 1 6 25 ENSE00003556077 CHR_HSCHR6_MHC_APD_CTG1 -
7 1 7 24 ENSE00003657481 CHR_HSCHR6_MHC_APD_CTG1 -
8 1 8 23 ENSE00003622036 CHR_HSCHR6_MHC_APD_CTG1 -
9 1 9 22 ENSE00003464449 CHR_HSCHR6_MHC_APD_CTG1 -
10 1 10 21 ENSE00003473326 CHR_HSCHR6_MHC_APD_CTG1 -
11 1 11 20 ENSE00001604430 CHR_HSCHR6_MHC_APD_CTG1 -
12 1 12 19 ENSE00001679938 CHR_HSCHR6_MHC_APD_CTG1 -
13 1 13 18 ENSE00003672925 CHR_HSCHR6_MHC_APD_CTG1 -
14 1 14 17 ENSE00003659779 CHR_HSCHR6_MHC_APD_CTG1 -
15 1 15 16 ENSE00001730983 CHR_HSCHR6_MHC_APD_CTG1 -
16 1 16 15 ENSE00003230483 CHR_HSCHR6_MHC_APD_CTG1 -
17 1 17 14 ENSE00003672609 CHR_HSCHR6_MHC_APD_CTG1 -
18 1 18 13 ENSE00003543668 CHR_HSCHR6_MHC_APD_CTG1 -
19 1 19 12 ENSE00003534752 CHR_HSCHR6_MHC_APD_CTG1 -
20 1 20 11 ENSE00003569257 CHR_HSCHR6_MHC_APD_CTG1 -
21 1 21 10 ENSE00003510150 CHR_HSCHR6_MHC_APD_CTG1 -
22 1 22 9 ENSE00003625416 CHR_HSCHR6_MHC_APD_CTG1 -
23 1 23 8 ENSE00003681445 CHR_HSCHR6_MHC_APD_CTG1 -
24 1 24 7 ENSE00003535202 CHR_HSCHR6_MHC_APD_CTG1 -
25 1 25 6 ENSE00001749230 CHR_HSCHR6_MHC_APD_CTG1 -
26 1 26 5 ENSE00001652388 CHR_HSCHR6_MHC_APD_CTG1 -
27 1 27 4 ENSE00001759190 CHR_HSCHR6_MHC_APD_CTG1 -
28 1 28 3 ENSE00003688015 CHR_HSCHR6_MHC_APD_CTG1 -
29 1 29 2 ENSE00001797329 CHR_HSCHR6_MHC_APD_CTG1 -
30 1 30 1 ENSE00003574628 CHR_HSCHR6_MHC_APD_CTG1 -
31 2 1 31 ENSE00001633386 3 +
32 2 2 32 ENSE00003527253 3 +
33 2 3 33 ENSE00001728436 3 +
34 2 4 34 ENSE00001611146 3 +
35 2 5 35 ENSE00003565985 3 +
36 2 6 36 ENSE00001740024 3 +
37 3 1 50 ENSE00001481374 6 -
38 3 2 49 ENSE00003696993 6 -
39 3 3 48 ENSE00003697955 6 -
40 3 4 47 ENSE00003694962 6 -
41 3 5 46 ENSE00003699609 6 -
42 3 6 45 ENSE00003699541 6 -
43 3 7 44 ENSE00003695477 6 -
44 3 8 43 ENSE00003701805 6 -
45 3 9 42 ENSE00003712208 6 -
46 3 10 41 ENSE00003700335 6 -
47 3 11 40 ENSE00003698201 6 -
48 3 12 39 ENSE00003700727 6 -
49 3 13 38 ENSE00003699774 6 -
50 3 14 37 ENSE00003702156 6 -
51 4 1 51 ENSE00001499436 13 -
52 5 1 52 ENSE00001838202 16 +
53 5 2 53 ENSE00003488193 16 +
54 5 3 54 ENSE00000666899 16 +
55 6 1 55 ENSE00001900413 Y +
56 6 2 56 ENSE00001880607 Y +
exon_start exon_end cds_id cds_start cds_end
1 31862564 31862971 NA NA NA
2 31861849 31862268 29 31861849 31862235
3 31860004 31860138 28 31860004 31860138
4 31859775 31859913 27 31859775 31859913
5 31859410 31859534 26 31859410 31859534
6 31859235 31859319 25 31859235 31859319
7 31858889 31858989 24 31858889 31858989
8 31858628 31858755 23 31858628 31858755
9 31852588 31852752 22 31852588 31852752
10 31852265 31852346 21 31852265 31852346
11 31851614 31851733 20 31851614 31851733
12 31851413 31851521 19 31851413 31851521
13 31851233 31851327 18 31851233 31851327
14 31850114 31850204 17 31850114 31850204
15 31849739 31849863 16 31849739 31849863
16 31849548 31849651 15 31849548 31849651
17 31849303 31849461 14 31849303 31849461
18 31849105 31849195 13 31849105 31849195
19 31848865 31848970 12 31848865 31848970
20 31848709 31848779 11 31848709 31848779
21 31848499 31848624 10 31848499 31848624
22 31848037 31848141 9 31848037 31848141
23 31847888 31847956 8 31847888 31847956
24 31847723 31847801 7 31847723 31847801
25 31847459 31847586 6 31847459 31847586
26 31847016 31847171 5 31847016 31847171
27 31846626 31846832 4 31846626 31846832
28 31846443 31846554 3 31846443 31846554
29 31845985 31846310 2 31845985 31846310
30 31844536 31844680 1 31844612 31844680
31 43690951 43691039 30 43690993 43691039
32 43699276 43699361 31 43699276 43699361
33 43699502 43699533 32 43699502 43699503
34 43701912 43702008 NA NA NA
35 43702215 43702587 NA NA NA
36 43711709 43711864 NA NA NA
37 10838342 10838495 NA NA NA
38 10830548 10830877 45 10830548 10830648
39 10818886 10818940 44 10818886 10818940
40 10817850 10817971 43 10817850 10817971
41 10813644 10813723 42 10813644 10813723
42 10808810 10808942 41 10808810 10808942
43 10803720 10803891 40 10803720 10803891
44 10801892 10802059 39 10801892 10802059
45 10795998 10796309 38 10795998 10796309
46 10791675 10791847 37 10791675 10791847
47 10784424 10784572 36 10784424 10784572
48 10775328 10775459 35 10775328 10775459
49 10770111 10770230 34 10770111 10770230
50 10762723 10764606 33 10764452 10764606
51 23152586 23152686 NA NA NA
52 3401420 3402406 46 3402005 3402406
53 3404426 3404648 47 3404426 3404648
54 3408321 3409370 48 3408321 3408919
55 2977819 2978080 NA NA NA
56 2978810 2979350 NA NA NA
$genes
tx_id gene_id
1 1 ENSG00000231116
2 2 ENSG00000011198
3 3 ENSG00000111837
4 4 ENSG00000207157
5 5 ENSG00000103343
6 6 ENSG00000067646
$chrominfo
chrom length is_circular
1 CHR_HSCHR6_MHC_APD_CTG1 170845044 FALSE
2 3 198295559 FALSE
3 6 170805979 FALSE
4 13 114364328 FALSE
5 16 90338345 FALSE
6 Y 57227415 FALSE
> txdb1 <- do.call(makeTxDb, txdb_dump)
> stopifnot(identical(as.list(txdb1), txdb_dump))