Last data update: 2014.03.03

R: MultiDb and OrganismDb objects
MultiDb-classR Documentation

MultiDb and OrganismDb objects

Description

The OrganismDb class is a container for storing knowledge about existing Annotation packages and the relationships between these resources. The purpose of this object and it's associated methods is to provide a means by which users can conveniently query for data from several different annotation resources at the same time using a familiar interface.

The supporting methods select, columns and keys are used together to extract data from an OrganismDb object in a manner that should be consistent with how these are used on the supporting annotation resources.

The family of seqinfo style getters (seqinfo, seqlevels, seqlengths, isCircular, genome, and seqnameStyle) is also supported for OrganismDb objects provided that the object in question has an embedded TxDb object.

Methods

In the code snippets below, x is a OrganismDb object.

keytypes(x): allows the user to discover which keytypes can be passed in to select or keys and the keytype argument.

keys(x, keytype, pattern, column, fuzzy): Return keys for the database contained in the TxDb object .

The keytype argument specifies the kind of keys that will be returned and is always required.

If keys is used with pattern, it will pattern match on the keytype.

But if the column argument is also provided along with the pattern argument, then pattern will be matched against the values in column instead.

If keys is called with column and no pattern argument, then it will return all keys that have corresponding values in the column argument.

Thus, the behavior of keys all depends on how many arguments are specified.

Use of the fuzzy argument will toggle fuzzy matching to TRUE or FALSE. If pattern is not used, fuzzy is ignored.

columns(x): shows which kinds of data can be returned for the OrganismDb object.

select(x, keys, columns, keytype): When all the appropriate arguments are specifiedm select will retrieve the matching data as a data.frame based on parameters for selected keys and columns and keytype arguments.

mapIds(x, keys, columns, keytype, ..., multiVals): When all the appropriate arguments are specifiedm mapIds will retrieve the matching data as a vector or list based on parameters for selected keys and columns and keytype arguments. The multiVals argument can be used to choose the format of the values returned. Possible values for multiVals are:

first:

This value means that when there are multiple matches only the 1st thing that comes back will be returned. This is the default behavior

list:

This will just returns a list object to the end user

filter:

This will remove all elements that contain multiple matches and will therefore return a shorter vector than what came in whenever some of the keys match more than one value

asNA:

This will return an NA value whenever there are multiple matches

CharacterList:

This just returns a SimpleCharacterList object

FUN:

You can also supply a function to the multiVals argument for custom behaviors. The function must take a single argument and return a single value. This function will be applied to all the elements and will serve a 'rule' that for which thing to keep when there is more than one element. So for example this example function will always grab the last element in each result: last <- function(x){x[[length(x)]]}

selectByRanges(x, ranges, columns, overlaps, ignore.strand): When all the appropriate arguments are specified, selectByRanges will return an annotated GRanges object that has been generated based on what you passed in to the ranges argument and whether that overlapped with what you specified in the overlaps argument. Internally this function will get annotation features and overlaps by calling the appropriate annotation methods indicated by the overlaps argument. The value for overlaps can be any of: gene, tx, exons, cds, 5utr, introns or 3utr. The default value is 'tx' which will return to you, your annotated ranges based on whether the overlapped with the transcript ranges of any gene in the associated TxDb object based on the gene models it contains. Also: the number of ranges returned to you will match the number of genes that your ranges argument overlapped for the type of overlap that you specified. So if some of your ranges are large and overlap several features then you will get many duplicated ranges returned with one for each gene that has an overlapping feature. The columns values that you request will be returned in the mcols for the annotated GRanges object that is the return value for this function. Finally, the ignore.strand argument is provided to indicate whether or not findOverlaps should ignore or respect the strand.

selectRangesById(x, keys, columns, keytype, feature): When all the appropriate arguments are specified, selectRangesById will return a GRangesList object that correspond to gene models GRanges for the keys that you specify with the keys and keytype arguments. The annotation ranges retrieved for this will be specified by the feature argument and can be: gene, tx, exon or cds. The default is 'tx' which will return the transcript ranges for each gene as a GRanges object in the list. Extra data can also be returned in the mcols values for those GRanges by using the columns argument.

resources(x): shows where the db files are for resources that are used to store the data for the OrganismDb object.

TxDb(x): Accessor for the TxDb object of a OrganismDb object.

TxDb(x) <- value: Allows you to swap in an alternative TxDb for a given OrganismDb object. This is most often useful when combined with saveDb(TxDb, file), which returns the saved TxDb, so that you can save a TxDb to disc and then assign the saved version right into your OrganismDb object.

Author(s)

Marc Carlson

See Also

  • AnnotationDb-class for more descriptsion of methods select,keytypes,keys and columns.

  • makeOrganismPackage for functions used to generate an OrganismDb based package.

  • rangeBasedAccessors for the range based methods used in extracting data from a OrganismDb object.

  • seqlevels .

  • seqlengths .

  • isCircular .

  • genome .

Examples

## load a package that creates an OrganismDb
library(Homo.sapiens)
ls(2)
## then the methods can be used on this object.
columns <- columns(Homo.sapiens)[c(7,10,11,12)]
keys <- head(keys(org.Hs.eg.db, "ENTREZID"))
keytype <- "ENTREZID"
res <- select(Homo.sapiens, keys, columns, keytype)
head(res)
res <-  mapIds(Homo.sapiens, keys=c('1','10'), column='ALIAS',
               keytype='ENTREZID',  multiVals="CharacterList")

## get symbols for ranges in question:
ranges <-  GRanges(seqnames=Rle(c('chr11'), c(2)),
                    IRanges(start=c(107899550, 108025550),
                            end=c(108291889, 108050000)), strand='*',
                    seqinfo=seqinfo(Homo.sapiens))
selectByRanges(Homo.sapiens, ranges, 'SYMBOL')

## Or extract the gene model for the 'A1BG' gene:
selectRangesById(Homo.sapiens, 'A1BG', keytype='SYMBOL')


## Get the DB connections or DB file paths associated with those for
## each.
dbconn(Homo.sapiens)
dbfile(Homo.sapiens)

## extract the taxonomyId
taxonomyId(Homo.sapiens)

##extract the resources
resources(Homo.sapiens)

Results


R version 3.3.1 (2016-06-21) -- "Bug in Your Hair"
Copyright (C) 2016 The R Foundation for Statistical Computing
Platform: x86_64-pc-linux-gnu (64-bit)

R is free software and comes with ABSOLUTELY NO WARRANTY.
You are welcome to redistribute it under certain conditions.
Type 'license()' or 'licence()' for distribution details.

R is a collaborative project with many contributors.
Type 'contributors()' for more information and
'citation()' on how to cite R or R packages in publications.

Type 'demo()' for some demos, 'help()' for on-line help, or
'help.start()' for an HTML browser interface to help.
Type 'q()' to quit R.

> library(OrganismDbi)
Loading required package: BiocGenerics
Loading required package: parallel

Attaching package: 'BiocGenerics'

The following objects are masked from 'package:parallel':

    clusterApply, clusterApplyLB, clusterCall, clusterEvalQ,
    clusterExport, clusterMap, parApply, parCapply, parLapply,
    parLapplyLB, parRapply, parSapply, parSapplyLB

The following objects are masked from 'package:stats':

    IQR, mad, xtabs

The following objects are masked from 'package:base':

    Filter, Find, Map, Position, Reduce, anyDuplicated, append,
    as.data.frame, cbind, colnames, do.call, duplicated, eval, evalq,
    get, grep, grepl, intersect, is.unsorted, lapply, lengths, mapply,
    match, mget, order, paste, pmax, pmax.int, pmin, pmin.int, rank,
    rbind, rownames, sapply, setdiff, sort, table, tapply, union,
    unique, unsplit

Loading required package: AnnotationDbi
Loading required package: stats4
Loading required package: Biobase
Welcome to Bioconductor

    Vignettes contain introductory material; view with
    'browseVignettes()'. To cite Bioconductor, see
    'citation("Biobase")', and for packages 'citation("pkgname")'.

Loading required package: IRanges
Loading required package: S4Vectors

Attaching package: 'S4Vectors'

The following objects are masked from 'package:base':

    colMeans, colSums, expand.grid, rowMeans, rowSums

Loading required package: GenomicFeatures
Loading required package: GenomeInfoDb
Loading required package: GenomicRanges
> png(filename="/home/ddbj/snapshot/RGM3/R_BC/result/OrganismDbi/OrganismDb.Rd_%03d_medium.png", width=480, height=480)
> ### Name: MultiDb-class
> ### Title: MultiDb and OrganismDb objects
> ### Aliases: MultiDb-class class:MultiDb MultiDb OrganismDb-class
> ###   class:OrganismDb OrganismDb columns,MultiDb-method
> ###   keytypes,MultiDb-method keys,MultiDb-method select,MultiDb-method
> ###   mapIds,MultiDb-method selectByRanges selectByRanges,MultiDb-method
> ###   selectRangesById selectRangesById,MultiDb-method
> ###   metadata,MultiDb-method dbconn,MultiDb-method dbfile,MultiDb-method
> ###   taxonomyId,MultiDb-method seqinfo,MultiDb-method TxDb
> ###   TxDb,OrganismDb-method TxDb<- TxDb<-,OrganismDb-method resources
> ###   resources,MultiDb-method
> 
> ### ** Examples
> 
> ## load a package that creates an OrganismDb
> library(Homo.sapiens)
Loading required package: GO.db

Loading required package: org.Hs.eg.db

Loading required package: TxDb.Hsapiens.UCSC.hg19.knownGene
> ls(2)
[1] "Homo.sapiens"
> ## then the methods can be used on this object.
> columns <- columns(Homo.sapiens)[c(7,10,11,12)]
> keys <- head(keys(org.Hs.eg.db, "ENTREZID"))
> keytype <- "ENTREZID"
> res <- select(Homo.sapiens, keys, columns, keytype)
'select()' returned 1:many mapping between keys and columns
> head(res)
  ENTREZID         ENSEMBL ENSEMBLPROT ENSEMBLTRANS CDSSTART
1        1 ENSG00000121410        <NA>         <NA> 58864770
2        1 ENSG00000121410        <NA>         <NA> 58864658
3        1 ENSG00000121410        <NA>         <NA> 58864294
4        1 ENSG00000121410        <NA>         <NA> 58863649
5        1 ENSG00000121410        <NA>         <NA> 58862757
6        1 ENSG00000121410        <NA>         <NA> 58861736
> res <-  mapIds(Homo.sapiens, keys=c('1','10'), column='ALIAS',
+                keytype='ENTREZID',  multiVals="CharacterList")
'select()' returned 1:many mapping between keys and columns
> 
> ## get symbols for ranges in question:
> ranges <-  GRanges(seqnames=Rle(c('chr11'), c(2)),
+                     IRanges(start=c(107899550, 108025550),
+                             end=c(108291889, 108050000)), strand='*',
+                     seqinfo=seqinfo(Homo.sapiens))
> selectByRanges(Homo.sapiens, ranges, 'SYMBOL')
'select()' returned 1:1 mapping between keys and columns
'select()' returned 1:1 mapping between keys and columns
GRanges object with 6 ranges and 1 metadata column:
      seqnames                 ranges strand |          SYMBOL
         <Rle>              <IRanges>  <Rle> | <CharacterList>
  [1]    chr11 [107899550, 108291889]      * |        C11orf65
  [2]    chr11 [107899550, 108291889]      * |           ACAT1
  [3]    chr11 [107899550, 108291889]      * |             ATM
  [4]    chr11 [107899550, 108291889]      * |            NPAT
  [5]    chr11 [107899550, 108291889]      * |            CUL5
  [6]    chr11 [108025550, 108050000]      * |            NPAT
  -------
  seqinfo: 93 sequences (1 circular) from hg19 genome
> 
> ## Or extract the gene model for the 'A1BG' gene:
> selectRangesById(Homo.sapiens, 'A1BG', keytype='SYMBOL')
'select()' returned 1:1 mapping between keys and columns
'select()' returned 1:1 mapping between keys and columns
GRangesList object of length 1:
$A1BG 
GRanges object with 2 ranges and 2 metadata columns:
      seqnames               ranges strand |     tx_id     tx_name
         <Rle>            <IRanges>  <Rle> | <integer> <character>
  [1]    chr19 [58858172, 58864865]      - |     70455  uc002qsd.4
  [2]    chr19 [58859832, 58874214]      - |     70456  uc002qsf.2

-------
seqinfo: 93 sequences (1 circular) from hg19 genome
> 
> 
> ## Get the DB connections or DB file paths associated with those for
> ## each.
> dbconn(Homo.sapiens)
$GO.db
<SQLiteConnection>

$org.Hs.eg.db
<SQLiteConnection>

$TxDb.Hsapiens.UCSC.hg19.knownGene
<SQLiteConnection>

> dbfile(Homo.sapiens)
                                                                                                                GO.db 
                                                           "/home/ddbj/local/lib64/R/library/GO.db/extdata/GO.sqlite" 
                                                                                                         org.Hs.eg.db 
                                             "/home/ddbj/local/lib64/R/library/org.Hs.eg.db/extdata/org.Hs.eg.sqlite" 
                                                                                    TxDb.Hsapiens.UCSC.hg19.knownGene 
"/home/ddbj/local/lib64/R/library/TxDb.Hsapiens.UCSC.hg19.knownGene/extdata/TxDb.Hsapiens.UCSC.hg19.knownGene.sqlite" 
> 
> ## extract the taxonomyId
> taxonomyId(Homo.sapiens)
[1] 9606
> 
> ##extract the resources
> resources(Homo.sapiens)
                            GO.db                      org.Hs.eg.db 
                               ""                                "" 
TxDb.Hsapiens.UCSC.hg19.knownGene 
                               "" 
> 
> 
> 
> 
> 
> dev.off()
null device 
          1 
>