Last data update: 2014.03.03

makeOrganismDbFromUCSC                R Documentation

Make an OrganismDb object from annotations available at the UCSC Genome Browser

Description

The makeOrganismDbFromUCSC function makes an OrganismDb object from transcript annotations available at the UCSC Genome Browser.

Usage

makeOrganismDbFromUCSC(
        genome="hg19",
        tablename="knownGene",
        transcript_ids=NULL,
        circ_seqs=DEFAULT_CIRC_SEQS,
        url="http://genome.ucsc.edu/cgi-bin/",
        goldenPath_url="http://hgdownload.cse.ucsc.edu/goldenPath",
        miRBaseBuild=NA)

Arguments

genome

genome abbreviation used by UCSC, as listed by ucscGenomes()[ , "db"]. For example: "hg19".

tablename

name of the UCSC table containing the transcript annotations to retrieve. Use the supportedUCSCtables utility function to get the list of supported tables. Note that not all tables are available for all genomes.

transcript_ids

optionally, retrieve transcript annotation data only for the specified set of transcript ids. When this argument is supplied, the metadata displayed for the resulting OrganismDb object says 'Full dataset: no'. Otherwise it says 'Full dataset: yes'.

circ_seqs

a character vector listing the chromosomes that should be marked as circular.

url,goldenPath_url

use these to specify the location of an alternate UCSC Genome Browser.

miRBaseBuild

a string naming the build information from mirbase.db to use for microRNAs. Call supportedMiRBaseBuildValues to list the valid values. The default, NA, deactivates the microRNAs accessor.
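
The genome and tablename arguments can be checked against the values UCSC reports before any download starts. The following is a minimal sketch and not part of OrganismDbi: check_ucsc_args is a hypothetical helper, and the commented usage assumes that supportedUCSCtables() returns the table names as row names, as in the output shown under Results. Passing this check does not guarantee that the table exists for the chosen genome, since not all tables are available for all genomes.

```r
## Hypothetical helper (not provided by OrganismDbi or GenomicFeatures):
## checks a genome/table pair against caller-supplied vectors of valid names.
check_ucsc_args <- function(genome, tablename, genomes, tables) {
    stopifnot(is.character(genome), length(genome) == 1L,
              is.character(tablename), length(tablename) == 1L)
    if (!genome %in% genomes)
        stop("unknown UCSC genome: ", genome)
    if (!tablename %in% tables)
        stop("unsupported UCSC table: ", tablename)
    invisible(TRUE)
}

## Usage (needs network access and the rtracklayer and GenomicFeatures
## packages; the use of rownames() here is an assumption, see above):
## library(rtracklayer)
## library(GenomicFeatures)
## check_ucsc_args("hg19", "knownGene",
##                 genomes = ucscGenomes()[ , "db"],
##                 tables  = rownames(supportedUCSCtables()))
```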

Details

makeOrganismDbFromUCSC is a convenience function that feeds data from the UCSC source to the lower-level OrganismDb function. See ?makeOrganismDbFromBiomart for a similar function that feeds data from a BioMart database.
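
The call trace under Results shows makeOrganismDbFromUCSC calling makeTxDbFromUCSC. A rough two-step equivalent might look like the sketch below. It assumes that GenomicFeatures exports makeTxDbFromUCSC() and that OrganismDbi exports makeOrganismDbFromTxDb(), as in package versions contemporary with this page; make_odb_two_step is a hypothetical name, and the real function may pass further arguments along.

```r
## Sketch only: builds a TxDb from UCSC first, then wraps it in an
## OrganismDb. Calling this function needs network access and both packages.
make_odb_two_step <- function(genome = "hg19", tablename = "knownGene") {
    ## Step 1: fetch the transcript annotations from UCSC.
    txdb <- GenomicFeatures::makeTxDbFromUCSC(genome = genome,
                                              tablename = tablename)
    ## Step 2: combine the TxDb with the matching annotation resources.
    OrganismDbi::makeOrganismDbFromTxDb(txdb)
}

## Usage (not run here):
## odb <- make_odb_two_step("hg19", "knownGene")
```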

Value

An OrganismDb object.

Author(s)

M. Carlson and H. Pages

See Also

  • makeOrganismDbFromBiomart for convenient ways to make an OrganismDb object from BioMart online resources.

  • ucscGenomes in the rtracklayer package.

  • DEFAULT_CIRC_SEQS.

  • The supportedMiRBaseBuildValues function for listing all the possible values for the miRBaseBuild argument.

  • The OrganismDb class.

Examples

## Display the list of genomes available at UCSC:
library(rtracklayer)
ucscGenomes()[ , "db"]

## Display the list of tables supported by makeOrganismDbFromUCSC():
supportedUCSCtables()

## Not run: 
## Retrieving a full transcript dataset for Yeast from UCSC:
odb1 <- makeOrganismDbFromUCSC(genome="sacCer2", tablename="ensGene")

## End(Not run)

## Retrieving an incomplete transcript dataset for Mouse from UCSC
## (only transcripts linked to Entrez Gene ID 22290):
transcript_ids <- c(
    "uc009uzf.1",
    "uc009uzg.1",
    "uc009uzh.1",
    "uc009uzi.1",
    "uc009uzj.1"
)

odb2 <- makeOrganismDbFromUCSC(genome="mm9", tablename="knownGene",
                               transcript_ids=transcript_ids)
odb2

Results


R version 3.3.1 (2016-06-21) -- "Bug in Your Hair"
Copyright (C) 2016 The R Foundation for Statistical Computing
Platform: x86_64-pc-linux-gnu (64-bit)

> library(OrganismDbi)
Loading required package: BiocGenerics
Loading required package: parallel

Attaching package: 'BiocGenerics'

The following objects are masked from 'package:parallel':

    clusterApply, clusterApplyLB, clusterCall, clusterEvalQ,
    clusterExport, clusterMap, parApply, parCapply, parLapply,
    parLapplyLB, parRapply, parSapply, parSapplyLB

The following objects are masked from 'package:stats':

    IQR, mad, xtabs

The following objects are masked from 'package:base':

    Filter, Find, Map, Position, Reduce, anyDuplicated, append,
    as.data.frame, cbind, colnames, do.call, duplicated, eval, evalq,
    get, grep, grepl, intersect, is.unsorted, lapply, lengths, mapply,
    match, mget, order, paste, pmax, pmax.int, pmin, pmin.int, rank,
    rbind, rownames, sapply, setdiff, sort, table, tapply, union,
    unique, unsplit

Loading required package: AnnotationDbi
Loading required package: stats4
Loading required package: Biobase
Welcome to Bioconductor

    Vignettes contain introductory material; view with
    'browseVignettes()'. To cite Bioconductor, see
    'citation("Biobase")', and for packages 'citation("pkgname")'.

Loading required package: IRanges
Loading required package: S4Vectors

Attaching package: 'S4Vectors'

The following objects are masked from 'package:base':

    colMeans, colSums, expand.grid, rowMeans, rowSums

Loading required package: GenomicFeatures
Loading required package: GenomeInfoDb
Loading required package: GenomicRanges
> ### Name: makeOrganismDbFromUCSC
> ### Title: Make a OrganismDb object from annotations available at the UCSC
> ###   Genome Browser
> ### Aliases: makeOrganismDbFromUCSC
> 
> ### ** Examples
> 
> ## Display the list of genomes available at UCSC:
> library(rtracklayer)
> ucscGenomes()[ , "db"]
  [1] "hg38"     "hg19"     "hg18"     "hg17"     "hg16"     "vicPac2" 
  [7] "vicPac1"  "dasNov3"  "papHam1"  "panPan1"  "aptMan1"  "otoGar3" 
 [13] "papAnu2"  "felCat8"  "felCat5"  "felCat4"  "felCat3"  "panTro4" 
 [19] "panTro3"  "panTro2"  "panTro1"  "criGri1"  "bosTau8"  "bosTau7" 
 [25] "bosTau6"  "bosTau4"  "bosTau3"  "bosTau2"  "macFas5"  "canFam3" 
 [31] "canFam2"  "canFam1"  "turTru2"  "loxAfr3"  "musFur1"  "nomLeu3" 
 [37] "nomLeu2"  "nomLeu1"  "gorGor4"  "gorGor3"  "cavPor3"  "eriEur2" 
 [43] "eriEur1"  "equCab2"  "equCab1"  "dipOrd1"  "triMan1"  "calJac3" 
 [49] "calJac1"  "pteVam1"  "myoLuc2"  "balAcu1"  "mm10"     "mm9"     
 [55] "mm8"      "mm7"      "micMur2"  "micMur1"  "hetGla2"  "hetGla1" 
 [61] "monDom5"  "monDom4"  "monDom1"  "ponAbe2"  "ailMel1"  "susScr3" 
 [67] "susScr2"  "ochPri3"  "ochPri2"  "ornAna2"  "ornAna1"  "oryCun2" 
 [73] "rn6"      "rn5"      "rn4"      "rn3"      "rheMac8"  "rheMac3" 
 [79] "rheMac2"  "proCap1"  "oviAri3"  "oviAri1"  "sorAra2"  "sorAra1" 
 [85] "choHof1"  "speTri2"  "saiBol1"  "tarSyr2"  "tarSyr1"  "sarHar1" 
 [91] "echTel2"  "echTel1"  "tupBel1"  "macEug2"  "cerSim1"  "allMis1" 
 [97] "gadMor1"  "melUnd1"  "galGal4"  "galGal3"  "galGal2"  "latCha1" 
[103] "calMil1"  "fr3"      "fr2"      "fr1"      "petMar2"  "petMar1" 
[109] "anoCar2"  "anoCar1"  "oryLat2"  "geoFor1"  "oreNil2"  "chrPic1" 
[115] "gasAcu1"  "tetNig2"  "tetNig1"  "melGal1"  "xenTro7"  "xenTro3" 
[121] "xenTro2"  "xenTro1"  "taeGut2"  "taeGut1"  "danRer10" "danRer7" 
[127] "danRer6"  "danRer5"  "danRer4"  "danRer3"  "ci2"      "ci1"     
[133] "braFlo1"  "strPur2"  "strPur1"  "apiMel2"  "apiMel1"  "anoGam1" 
[139] "droAna2"  "droAna1"  "droEre1"  "droGri1"  "dm6"      "dm3"     
[145] "dm2"      "dm1"      "droMoj2"  "droMoj1"  "droPer1"  "dp3"     
[151] "dp2"      "droSec1"  "droSim1"  "droVir2"  "droVir1"  "droYak2" 
[157] "droYak1"  "caePb2"   "caePb1"   "cb3"      "cb1"      "ce11"    
[163] "ce10"     "ce6"      "ce4"      "ce2"      "caeJap1"  "caeRem3" 
[169] "caeRem2"  "priPac1"  "aplCal1"  "sacCer3"  "sacCer2"  "sacCer1" 
[175] "eboVir3" 
> 
> ## Display the list of tables supported by makeOrganismDbFromUCSC():
> supportedUCSCtables()
                                               track           subtrack
knownGene                                 UCSC Genes               <NA>
knownGeneOld3                         Old UCSC Genes               <NA>
ccdsGene                                        CCDS               <NA>
refGene                                 RefSeq Genes               <NA>
xenoRefGene                             Other RefSeq               <NA>
vegaGene                                  Vega Genes Vega Protein Genes
vegaPseudoGene                            Vega Genes   Vega Pseudogenes
ensGene                                Ensembl Genes               <NA>
acembly                                AceView Genes               <NA>
sibGene                                    SIB Genes               <NA>
nscanPasaGene                                 N-SCAN    N-SCAN PASA-EST
nscanGene                                     N-SCAN             N-SCAN
sgpGene                                    SGP Genes               <NA>
geneid                                  Geneid Genes               <NA>
genscan                                Genscan Genes               <NA>
exoniphy                                    Exoniphy               <NA>
augustusHints                               Augustus     Augustus Hints
augustusXRA                                 Augustus   Augustus De Novo
augustusAbinitio                            Augustus Augustus Ab Initio
acescan                                      ACEScan               <NA>
lincRNAsTranscripts              lincRNAsTranscripts               <NA>
wgEncodeGencodeManualV3                Gencode Genes     Gencode Manual
wgEncodeGencodeAutoV3                  Gencode Genes       Gencode Auto
wgEncodeGencodePolyaV3                 Gencode Genes      Gencode PolyA
wgEncodeGencodeBasicV19            GENCODE Genes V19               <NA>
wgEncodeGencodeCompV19             GENCODE Genes V19               <NA>
wgEncodeGencodePseudoGeneV19       GENCODE Genes V19               <NA>
wgEncodeGencode2wayConsPseudoV19   GENCODE Genes V19               <NA>
wgEncodeGencodePolyaV19            GENCODE Genes V19               <NA>
wgEncodeGencodeBasicV17            GENCODE Genes V17               <NA>
wgEncodeGencodeCompV17             GENCODE Genes V17               <NA>
wgEncodeGencodePseudoGeneV17       GENCODE Genes V17               <NA>
wgEncodeGencode2wayConsPseudoV17   GENCODE Genes V17               <NA>
wgEncodeGencodePolyaV17            GENCODE Genes V17               <NA>
wgEncodeGencodeBasicV14            GENCODE Genes V14               <NA>
wgEncodeGencodeCompV14             GENCODE Genes V14               <NA>
wgEncodeGencodePseudoGeneV14       GENCODE Genes V14               <NA>
wgEncodeGencode2wayConsPseudoV14   GENCODE Genes V14               <NA>
wgEncodeGencodePolyaV14            GENCODE Genes V14               <NA>
wgEncodeGencodeBasicV7              GENCODE Genes V7               <NA>
wgEncodeGencodeCompV7               GENCODE Genes V7               <NA>
wgEncodeGencodePseudoGeneV7         GENCODE Genes V7               <NA>
wgEncodeGencode2wayConsPseudoV7     GENCODE Genes V7               <NA>
wgEncodeGencodePolyaV7              GENCODE Genes V7               <NA>
flyBaseGene                            FlyBase Genes               <NA>
sgdGene                                    SGD Genes               <NA>
> 
> ## Not run: 
> ##D ## Retrieving a full transcript dataset for Yeast from UCSC:
> ##D odb1 <- makeOrganismDbFromUCSC(genome="sacCer2", tablename="ensGene")
> ## End(Not run)
> 
> ## Retrieving an incomplete transcript dataset for Mouse from UCSC
> ## (only transcripts linked to Entrez Gene ID 22290):
> transcript_ids <- c(
+     "uc009uzf.1",
+     "uc009uzg.1",
+     "uc009uzh.1",
+     "uc009uzi.1",
+     "uc009uzj.1"
+ )
> 
> odb2 <- makeOrganismDbFromUCSC(genome="mm9", tablename="knownGene",
+                           transcript_ids=transcript_ids)
Error in `genome<-`(`*tmp*`, value = "mm9") : 
  Failed to set session genome to 'mm9'
Calls: makeOrganismDbFromUCSC -> makeTxDbFromUCSC -> genome<- -> genome<-
Execution halted
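
The run above halts because the UCSC session could not be set to 'mm9'. In a script that should continue when UCSC is unreachable or rejects the genome, the call can be wrapped in tryCatch. This is a minimal sketch: with_ucsc_fallback is a hypothetical helper, and the failure in the demonstration is simulated with stop() rather than produced by a real UCSC request.

```r
## Hypothetical helper: evaluates an expression and returns NULL, with a
## message, if it signals an error, instead of halting the session.
with_ucsc_fallback <- function(expr) {
    tryCatch(
        expr,
        error = function(e) {
            message("UCSC retrieval failed: ", conditionMessage(e))
            NULL
        }
    )
}

## Simulated failure, reusing the error text from the log above:
res <- with_ucsc_fallback(stop("Failed to set session genome to 'mm9'"))
is.null(res)

## Intended usage (needs network access and OrganismDbi):
## odb2 <- with_ucsc_fallback(
##     makeOrganismDbFromUCSC(genome="mm9", tablename="knownGene",
##                            transcript_ids=transcript_ids))
```

A NULL result then signals to the caller that the OrganismDb object was not built, so later steps can be skipped or retried.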