BSgenome.Dmelanogaster.UCSC.dm3
R Documentation
Full genome sequences for Drosophila melanogaster (UCSC version dm3)
Description
Full genome sequences for Drosophila melanogaster (Fly) as provided by UCSC (dm3, Apr. 2006) and stored in Biostrings objects.
Note
This BSgenome data package was made from the following source data files:
chromFa.tar.gz from http://hgdownload.cse.ucsc.edu/goldenPath/dm3/bigZips/
See ?BSgenomeForge and the BSgenomeForge vignette (vignette("BSgenomeForge")) in the BSgenome software package for how to make a BSgenome data package.
Author(s)
The Bioconductor Dev Team
See Also
BSgenome objects and the available.genomes function in the BSgenome software package.
DNAString objects in the Biostrings package.
The BSgenomeForge vignette (vignette("BSgenomeForge")) in the BSgenome software package for how to make a BSgenome data package.
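As a minimal sketch of how the functions listed above might be used to locate and load this data package, assuming the BSgenome software package is installed and, for available.genomes(), that the Bioconductor repositories are reachable (the variable names installed and avail are illustrative):

```r
library(BSgenome)

## Names of the BSgenome data packages already installed locally.
installed <- installed.genomes()

## Names of all BSgenome data packages offered by the Bioconductor
## repositories (requires network access).
avail <- available.genomes()
grep("Dmelanogaster", avail, value=TRUE)

## Load the dm3 genome by package name; this fails if the data package
## is not installed.
genome <- getBSgenome("BSgenome.Dmelanogaster.UCSC.dm3")
```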
Examples
BSgenome.Dmelanogaster.UCSC.dm3
genome <- BSgenome.Dmelanogaster.UCSC.dm3
seqlengths(genome)
genome$chr2L # same as genome[["chr2L"]]
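Once a chromosome has been extracted it is an ordinary DNAString, so the usual Biostrings accessors apply. The following is only a sketch of a few common operations; the choice of chr2L and of the first 36 bases is arbitrary.

```r
library(BSgenome.Dmelanogaster.UCSC.dm3)

genome <- BSgenome.Dmelanogaster.UCSC.dm3
chr2L <- genome$chr2L

## Number of bases in the chromosome.
length(chr2L)

## First 36 bases of the chromosome.
first36 <- subseq(chr2L, start=1, end=36)
first36

## Counts of A, C, G, T (plus a combined "other" column).
alphabetFrequency(chr2L, baseOnly=TRUE)

## Fraction of G and C bases over the whole chromosome.
letterFrequency(chr2L, "GC", as.prob=TRUE)
```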
## ---------------------------------------------------------------------
## Upstream sequences
## ---------------------------------------------------------------------
## Starting with BioC 3.0, the upstream1000, upstream2000, and
## upstream5000 sequences for dm3 are not included in the BSgenome data
## package anymore. However they can easily be extracted from the full
## genome sequences with something like:
library(TxDb.Dmelanogaster.UCSC.dm3.ensGene)
txdb <- TxDb.Dmelanogaster.UCSC.dm3.ensGene
gn <- sort(genes(txdb))
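up1000 <- flank(gn, width=1000)
## A few flanking ranges can extend past the end of their chromosome, which
## makes getSeq() fail. trim() clips them to the chromosome boundaries, so
## the corresponding upstream sequences are shorter than 1000 bp.
up1000 <- trim(up1000)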
up1000seqs <- getSeq(genome, up1000)
## IMPORTANT: Make sure you use a TxDb package (or TxDb object) that
## contains a gene model based on the exact same reference genome as the
## BSgenome object you pass to getSeq(). Note that you can make your own
## custom TxDb object from various annotation resources. See the
## makeTxDbFromUCSC(), makeTxDbFromBiomart(), and makeTxDbFromGFF()
## functions in the GenomicFeatures package (named makeTranscriptDbFrom*()
## in older releases).
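One way to check the requirement above before calling getSeq() is to compare the sequence information recorded in the two objects. This is a rough sanity check rather than a guarantee of compatibility, and the names dm3, common, and same_length are illustrative.

```r
library(BSgenome.Dmelanogaster.UCSC.dm3)
library(TxDb.Dmelanogaster.UCSC.dm3.ensGene)

dm3  <- BSgenome.Dmelanogaster.UCSC.dm3
txdb <- TxDb.Dmelanogaster.UCSC.dm3.ensGene

## Assembly names recorded in each object; both are expected to say "dm3".
unique(genome(dm3))
unique(genome(txdb))

## Chromosome lengths should agree for every sequence the two objects share.
common <- intersect(seqlevels(txdb), seqlevels(dm3))
same_length <- seqlengths(txdb)[common] == seqlengths(dm3)[common]
all(same_length)
```

If the assembly names or any of the shared lengths differ, the gene coordinates most likely refer to a different genome build and should not be used with this BSgenome object.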
## ---------------------------------------------------------------------
## Genome-wide motif searching
## ---------------------------------------------------------------------
## See the GenomeSearching vignette in the BSgenome software
## package for some examples of genome-wide motif searching using
## Biostrings and the BSgenome data packages:
if (interactive())
    vignette("GenomeSearching", package="BSgenome")
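For a quick impression of what such a search looks like without opening the vignette, here is a minimal sketch that looks for exact matches of a short motif on one chromosome. The motif "TATAAA" and the choice of chr4 are arbitrary assumptions made for illustration; the vignette covers more complete, genome-wide approaches.

```r
library(BSgenome.Dmelanogaster.UCSC.dm3)

genome <- BSgenome.Dmelanogaster.UCSC.dm3
motif <- DNAString("TATAAA")   # hypothetical motif, chosen only as an example
chr4 <- genome$chr4

## Exact matches on the plus strand.
plus_hits <- matchPattern(motif, chr4)

## Matches on the minus strand are found by searching the plus strand
## for the reverse complement of the motif.
minus_hits <- matchPattern(reverseComplement(motif), chr4)

## Number of hits on each strand and the first few plus-strand positions.
length(plus_hits)
length(minus_hits)
head(start(plus_hits))
```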
Results
R version 3.3.1 (2016-06-21) -- "Bug in Your Hair"
Copyright (C) 2016 The R Foundation for Statistical Computing
Platform: x86_64-pc-linux-gnu (64-bit)
R is free software and comes with ABSOLUTELY NO WARRANTY.
You are welcome to redistribute it under certain conditions.
Type 'license()' or 'licence()' for distribution details.
R is a collaborative project with many contributors.
Type 'contributors()' for more information and
'citation()' on how to cite R or R packages in publications.
Type 'demo()' for some demos, 'help()' for on-line help, or
'help.start()' for an HTML browser interface to help.
Type 'q()' to quit R.
> library(BSgenome.Dmelanogaster.UCSC.dm3)
Loading required package: BSgenome
Loading required package: BiocGenerics
Loading required package: parallel
Attaching package: 'BiocGenerics'
The following objects are masked from 'package:parallel':
clusterApply, clusterApplyLB, clusterCall, clusterEvalQ,
clusterExport, clusterMap, parApply, parCapply, parLapply,
parLapplyLB, parRapply, parSapply, parSapplyLB
The following objects are masked from 'package:stats':
IQR, mad, xtabs
The following objects are masked from 'package:base':
Filter, Find, Map, Position, Reduce, anyDuplicated, append,
as.data.frame, cbind, colnames, do.call, duplicated, eval, evalq,
get, grep, grepl, intersect, is.unsorted, lapply, lengths, mapply,
match, mget, order, paste, pmax, pmax.int, pmin, pmin.int, rank,
rbind, rownames, sapply, setdiff, sort, table, tapply, union,
unique, unsplit
Loading required package: S4Vectors
Loading required package: stats4
Attaching package: 'S4Vectors'
The following objects are masked from 'package:base':
colMeans, colSums, expand.grid, rowMeans, rowSums
Loading required package: IRanges
Loading required package: GenomeInfoDb
Loading required package: GenomicRanges
Loading required package: Biostrings
Loading required package: XVector
Loading required package: rtracklayer
> ### Name: BSgenome.Dmelanogaster.UCSC.dm3
> ### Title: Full genome sequences for Drosophila melanogaster (UCSC version
> ### dm3)
> ### Aliases: BSgenome.Dmelanogaster.UCSC.dm3-package
> ### BSgenome.Dmelanogaster.UCSC.dm3 Dmelanogaster
> ### Keywords: package data
>
> ### ** Examples
>
> BSgenome.Dmelanogaster.UCSC.dm3
Fly genome:
# organism: Drosophila melanogaster (Fly)
# provider: UCSC
# provider version: dm3
# release date: Apr. 2006
# release name: BDGP Release 5
# 15 sequences:
# chr2L chr2R chr3L chr3R chr4 chrX chrU
# chrM chr2LHet chr2RHet chr3LHet chr3RHet chrXHet chrYHet
# chrUextra
# (use 'seqnames()' to see all the sequence names, use the '$' or '[[' operator
# to access a given sequence)
> genome <- BSgenome.Dmelanogaster.UCSC.dm3
> seqlengths(genome)
    chr2L     chr2R     chr3L     chr3R      chr4      chrX      chrU      chrM
 23011544  21146708  24543557  27905053   1351857  22422827  10049037     19517
 chr2LHet  chr2RHet  chr3LHet  chr3RHet   chrXHet   chrYHet chrUextra
   368872   3288761   2555491   2517507    204112    347038  29004656
> genome$chr2L # same as genome[["chr2L"]]
23011544-letter "DNAString" instance
seq: CGACAATGCACGACAGAGGAAGCAGAACAGATATTT...GCATATTTGCAAATTTTGATGAACCCCCCTTTCAAA
>
> ## ---------------------------------------------------------------------
> ## Upstream sequences
> ## ---------------------------------------------------------------------
> ## Starting with BioC 3.0, the upstream1000, upstream2000, and
> ## upstream5000 sequences for dm3 are not included in the BSgenome data
> ## package anymore. However they can easily be extracted from the full
> ## genome sequences with something like:
>
> library(TxDb.Dmelanogaster.UCSC.dm3.ensGene)
Loading required package: GenomicFeatures
Loading required package: AnnotationDbi
Loading required package: Biobase
Welcome to Bioconductor
Vignettes contain introductory material; view with
'browseVignettes()'. To cite Bioconductor, see
'citation("Biobase")', and for packages 'citation("pkgname")'.
> txdb <- TxDb.Dmelanogaster.UCSC.dm3.ensGene
> gn <- sort(genes(txdb))
> up1000 <- flank(gn, width=1000)
Warning message:
In valid.GenomicRanges.seqinfo(x, suggest.trim = TRUE) :
GRanges object contains 3 out-of-bound ranges located on sequences
chr3R, chr3LHet, and chrYHet. Note that only ranges located on a
non-circular sequence whose length is not NA can be considered
out-of-bound (use seqlengths() and isCircular() to get the lengths and
circularity flags of the underlying sequences). You can use trim() to
trim these ranges. See ?`trim,GenomicRanges-method` for more
information.
> up1000seqs <- getSeq(genome, up1000)
Error in loadFUN(x, seqname, ranges) :
trying to load regions beyond the boundaries of non-circular sequence "chr3R"
Calls: getSeq ... loadSubseqsFromStrandedSequence -> loadFUN -> loadFUN
Execution halted
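The run above stops because getSeq() is given ranges that extend past the end of chr3R. One possible workaround, sketched here under the assumption that dropping the affected genes is acceptable, keeps only the flanks that lie entirely within their chromosome, so that every extracted sequence is exactly 1000 bp long. The names chrom_len, oob, and up1000_full are illustrative.

```r
library(BSgenome.Dmelanogaster.UCSC.dm3)
library(TxDb.Dmelanogaster.UCSC.dm3.ensGene)

genome <- BSgenome.Dmelanogaster.UCSC.dm3
gn <- sort(genes(TxDb.Dmelanogaster.UCSC.dm3.ensGene))

## flank() warns about out-of-bound ranges; the warning is expected here
## because those ranges are filtered out explicitly below.
up1000 <- suppressWarnings(flank(gn, width=1000))

## Length of the chromosome each flank sits on.
chrom_len <- unname(seqlengths(genome)[as.character(seqnames(up1000))])

## Flag flanks that start before position 1 or end past the chromosome end.
oob <- start(up1000) < 1L | end(up1000) > chrom_len
sum(oob)

## Keep only the in-bound flanks and extract their sequences.
up1000_full <- up1000[!oob]
up1000seqs <- getSeq(genome, up1000_full)
```

The alternative suggested by the warning message, trim(), keeps every gene but returns shorter sequences for the clipped ranges.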