Last data update: 2014.03.03

BSgenome.Dmelanogaster.UCSC.dm3    R Documentation

Full genome sequences for Drosophila melanogaster (UCSC version dm3)

Description

Full genome sequences for Drosophila melanogaster (Fly) as provided by UCSC (dm3, Apr. 2006) and stored in Biostrings objects.

Note

This BSgenome data package was made from the following source data files:

chromFa.tar.gz from http://hgdownload.cse.ucsc.edu/goldenPath/dm3/bigZips/
  

See ?BSgenomeForge and the BSgenomeForge vignette (vignette("BSgenomeForge")) in the BSgenome software package for how to make a BSgenome data package.

Author(s)

The Bioconductor Dev Team

See Also

  • BSgenome objects and the available.genomes function in the BSgenome software package.

  • DNAString objects in the Biostrings package.

  • The BSgenomeForge vignette (vignette("BSgenomeForge")) in the BSgenome software package for how to make a BSgenome data package.

Examples

BSgenome.Dmelanogaster.UCSC.dm3
genome <- BSgenome.Dmelanogaster.UCSC.dm3
seqlengths(genome)
genome$chr2L  # same as genome[["chr2L"]]

## ---------------------------------------------------------------------
## Upstream sequences
## ---------------------------------------------------------------------
## Starting with BioC 3.0, the upstream1000, upstream2000, and
## upstream5000 sequences for dm3 are not included in the BSgenome data
## package anymore. However they can easily be extracted from the full
## genome sequences with something like:

library(TxDb.Dmelanogaster.UCSC.dm3.ensGene)
txdb <- TxDb.Dmelanogaster.UCSC.dm3.ensGene
gn <- sort(genes(txdb))
up1000 <- flank(gn, width=1000)
up1000seqs <- getSeq(genome, up1000)

## IMPORTANT: Make sure you use a TxDb package (or TxDb object) that
## contains a gene model based on the exact same reference genome as
## the BSgenome object you pass to getSeq(). Note that you can make
## your own custom TxDb object from various annotation resources.
## See the makeTxDbFromUCSC(), makeTxDbFromBiomart(), and
## makeTxDbFromGFF() functions in the GenomicFeatures package
## (named makeTranscriptDbFrom*() in older Bioconductor releases,
## where the TxDb class was called TranscriptDb).

## ---------------------------------------------------------------------
## Genome-wide motif searching
## ---------------------------------------------------------------------
## See the GenomeSearching vignette in the BSgenome software
## package for some examples of genome-wide motif searching using
## Biostrings and the BSgenome data packages:
if (interactive())
    vignette("GenomeSearching", package="BSgenome")
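
The IMPORTANT note in the examples asks that the gene model and the BSgenome object share the same reference genome. One way to sanity-check this is to compare the Seqinfo of both objects. The sketch below is only an illustration: same_reference() is a hypothetical helper, not a function from any package, and the toy Seqinfo objects use made-up sequence names and lengths.

```r
library(GenomeInfoDb)

## Hypothetical helper (not part of any package): returns TRUE when the
## sequences shared by two Seqinfo objects agree on genome build and length.
same_reference <- function(x, y) {
    common <- intersect(seqnames(x), seqnames(y))
    if (length(common) == 0L)
        return(FALSE)
    identical(unname(genome(x)[common]), unname(genome(y)[common])) &&
        identical(unname(seqlengths(x)[common]), unname(seqlengths(y)[common]))
}

## Toy Seqinfo objects with made-up names, lengths, and genome labels.
si_a <- Seqinfo(c("chrA", "chrB"), seqlengths = c(1000L, 2000L),
                genome = rep("toy1", 2))
si_b <- Seqinfo(c("chrA", "chrB"), seqlengths = c(1000L, 2000L),
                genome = rep("toy1", 2))
si_c <- Seqinfo(c("chrA", "chrB"), seqlengths = c(1000L, 2500L),
                genome = rep("toy2", 2))

same_reference(si_a, si_b)  # same build and lengths
same_reference(si_a, si_c)  # different build and one different length
```

With the objects from the examples, the analogous call would presumably be same_reference(seqinfo(genome), seqinfo(txdb)). A FALSE result would suggest that the annotation and the sequences come from different assemblies.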

Results


R version 3.3.1 (2016-06-21) -- "Bug in Your Hair"
Copyright (C) 2016 The R Foundation for Statistical Computing
Platform: x86_64-pc-linux-gnu (64-bit)

> library(BSgenome.Dmelanogaster.UCSC.dm3)
Loading required package: BSgenome
Loading required package: BiocGenerics
Loading required package: parallel

Attaching package: 'BiocGenerics'

The following objects are masked from 'package:parallel':

    clusterApply, clusterApplyLB, clusterCall, clusterEvalQ,
    clusterExport, clusterMap, parApply, parCapply, parLapply,
    parLapplyLB, parRapply, parSapply, parSapplyLB

The following objects are masked from 'package:stats':

    IQR, mad, xtabs

The following objects are masked from 'package:base':

    Filter, Find, Map, Position, Reduce, anyDuplicated, append,
    as.data.frame, cbind, colnames, do.call, duplicated, eval, evalq,
    get, grep, grepl, intersect, is.unsorted, lapply, lengths, mapply,
    match, mget, order, paste, pmax, pmax.int, pmin, pmin.int, rank,
    rbind, rownames, sapply, setdiff, sort, table, tapply, union,
    unique, unsplit

Loading required package: S4Vectors
Loading required package: stats4

Attaching package: 'S4Vectors'

The following objects are masked from 'package:base':

    colMeans, colSums, expand.grid, rowMeans, rowSums

Loading required package: IRanges
Loading required package: GenomeInfoDb
Loading required package: GenomicRanges
Loading required package: Biostrings
Loading required package: XVector
Loading required package: rtracklayer
> ### Name: BSgenome.Dmelanogaster.UCSC.dm3
> ### Title: Full genome sequences for Drosophila melanogaster (UCSC version
> ###   dm3)
> ### Aliases: BSgenome.Dmelanogaster.UCSC.dm3-package
> ###   BSgenome.Dmelanogaster.UCSC.dm3 Dmelanogaster
> ### Keywords: package data
> 
> ### ** Examples
> 
> BSgenome.Dmelanogaster.UCSC.dm3
Fly genome:
# organism: Drosophila melanogaster (Fly)
# provider: UCSC
# provider version: dm3
# release date: Apr. 2006
# release name: BDGP Release 5
# 15 sequences:
#   chr2L     chr2R     chr3L     chr3R     chr4      chrX      chrU     
#   chrM      chr2LHet  chr2RHet  chr3LHet  chr3RHet  chrXHet   chrYHet  
#   chrUextra                                                            
# (use 'seqnames()' to see all the sequence names, use the '$' or '[[' operator
# to access a given sequence)
> genome <- BSgenome.Dmelanogaster.UCSC.dm3
> seqlengths(genome)
    chr2L     chr2R     chr3L     chr3R      chr4      chrX      chrU      chrM 
 23011544  21146708  24543557  27905053   1351857  22422827  10049037     19517 
 chr2LHet  chr2RHet  chr3LHet  chr3RHet   chrXHet   chrYHet chrUextra 
   368872   3288761   2555491   2517507    204112    347038  29004656 
> genome$chr2L  # same as genome[["chr2L"]]
  23011544-letter "DNAString" instance
seq: CGACAATGCACGACAGAGGAAGCAGAACAGATATTT...GCATATTTGCAAATTTTGATGAACCCCCCTTTCAAA
> 
> ## ---------------------------------------------------------------------
> ## Upstream sequences
> ## ---------------------------------------------------------------------
> ## Starting with BioC 3.0, the upstream1000, upstream2000, and
> ## upstream5000 sequences for dm3 are not included in the BSgenome data
> ## package anymore. However they can easily be extracted from the full
> ## genome sequences with something like:
> 
> library(TxDb.Dmelanogaster.UCSC.dm3.ensGene)
Loading required package: GenomicFeatures
Loading required package: AnnotationDbi
Loading required package: Biobase
Welcome to Bioconductor

    Vignettes contain introductory material; view with
    'browseVignettes()'. To cite Bioconductor, see
    'citation("Biobase")', and for packages 'citation("pkgname")'.

> txdb <- TxDb.Dmelanogaster.UCSC.dm3.ensGene
> gn <- sort(genes(txdb))
> up1000 <- flank(gn, width=1000)
Warning message:
In valid.GenomicRanges.seqinfo(x, suggest.trim = TRUE) :
  GRanges object contains 3 out-of-bound ranges located on sequences
  chr3R, chr3LHet, and chrYHet. Note that only ranges located on a
  non-circular sequence whose length is not NA can be considered
  out-of-bound (use seqlengths() and isCircular() to get the lengths and
  circularity flags of the underlying sequences). You can use trim() to
  trim these ranges. See ?`trim,GenomicRanges-method` for more
  information.
> up1000seqs <- getSeq(genome, up1000)
Error in loadFUN(x, seqname, ranges) : 
  trying to load regions beyond the boundaries of non-circular sequence "chr3R"
Calls: getSeq ... loadSubseqsFromStrandedSequence -> loadFUN -> loadFUN
Execution halted
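
The run above stops because flank() produced three upstream ranges that extend past the ends of their chromosomes, and getSeq() refuses to load regions beyond a non-circular sequence. The warning itself suggests trim(). The following minimal sketch reproduces the situation on a toy chromosome (the name chrToy, its length, and the gene coordinates are invented for illustration) and shows how trim() clips the flanks back into bounds.

```r
library(GenomicRanges)

## Toy reference: one 5000-bp, non-circular chromosome (hypothetical values).
toy_seqinfo <- Seqinfo("chrToy", seqlengths = 5000L, isCircular = FALSE)

## Two made-up genes: a plus-strand gene near the chromosome start and a
## minus-strand gene near the end, so both 1000-bp upstream flanks overshoot.
genes_toy <- GRanges(
    seqnames = "chrToy",
    ranges   = IRanges(start = c(301L, 4501L), end = c(800L, 4700L)),
    strand   = c("+", "-"),
    seqinfo  = toy_seqinfo
)

## flank() warns about out-of-bound ranges; the warning is expected here,
## so it is silenced for this demonstration only.
up_raw <- suppressWarnings(flank(genes_toy, width = 1000L))

## trim() clips every range to the interval [1, seqlength].
up_trimmed <- trim(up_raw)

start(up_raw)      # includes a start below 1
end(up_raw)        # includes an end beyond 5000
width(up_trimmed)  # trimmed flanks are shorter than 1000
```

Applied to the example above, the analogous step would be up1000 <- trim(up1000) before calling getSeq(genome, up1000). The affected upstream sequences would then be shorter than 1000 bases, which downstream code should be prepared for.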