R: Full genome sequences for Homo sapiens (UCSC version hg18)
BSgenome.Hsapiens.UCSC.hg18
R Documentation
Description
Full genome sequences for Homo sapiens (Human) as provided by UCSC (hg18, Mar. 2006) and stored in Biostrings objects.
Note
This BSgenome data package was made from the following source data files:
chromFa.zip from http://hgdownload.cse.ucsc.edu/goldenPath/hg18/bigZips/
See ?BSgenomeForge and the BSgenomeForge
vignette (vignette("BSgenomeForge")) in the BSgenome
software package for how to make a BSgenome data package.
Author(s)
The Bioconductor Dev Team
See Also
BSgenome objects and the
available.genomes function
in the BSgenome software package.
DNAString objects in the Biostrings
package.
The BSgenomeForge vignette (vignette("BSgenomeForge"))
in the BSgenome software package for how to make a BSgenome
data package.
Examples
BSgenome.Hsapiens.UCSC.hg18
genome <- BSgenome.Hsapiens.UCSC.hg18
seqlengths(genome)
genome$chr1 # same as genome[["chr1"]]
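Each sequence returned by `$` or `[[` is a DNAString, so the usual Biostrings accessors apply to it. The sketch below is illustrative only. To avoid loading a whole chromosome it uses a toy stand-in built from the first 36 letters of chr1 as printed in the Results section, and the variable names are made up for the example.

```r
library(Biostrings)

# Toy stand-in for genome$chr1 (assumption: first 36 letters of hg18 chr1,
# copied from the output shown on this page).
chr1_head <- DNAString("TAACCCTAACCCTAACCCTAACCCTAACCCTAACCC")

# Extract a subsequence by 1-based, inclusive coordinates.
first_repeat <- subseq(chr1_head, start = 1, end = 6)

# Count bases; baseOnly = TRUE groups all non-ACGT letters under "other".
base_counts <- alphabetFrequency(chr1_head, baseOnly = TRUE)

# Reverse complement, e.g. for a feature on the minus strand.
rc <- reverseComplement(first_repeat)
```

The same calls should work unchanged on `genome$chr1`, at the cost of loading the full chromosome into memory.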
## ---------------------------------------------------------------------
## Upstream sequences
## ---------------------------------------------------------------------
## Starting with BioC 3.0, the upstream1000, upstream2000, and
## upstream5000 sequences for hg18 are not included in the BSgenome data
## package anymore. However they can easily be extracted from the full
## genome sequences with something like:
library(TxDb.Hsapiens.UCSC.hg18.knownGene)
txdb <- TxDb.Hsapiens.UCSC.hg18.knownGene
gn <- sort(genes(txdb))
up1000 <- flank(gn, width=1000)
up1000seqs <- getSeq(genome, up1000)
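To make the coordinate arithmetic behind the flank() call explicit, here is a rough base R sketch of what an upstream window looks like for each strand. The helper `upstream_window()` and the toy coordinates are hypothetical and exist only for illustration; in practice flank() from GenomicRanges does this work.

```r
# Hypothetical helper, for illustration only: computes the window of
# `width` bases immediately upstream of each gene, taking strand into
# account. Assumes every strand value is "+" or "-".
upstream_window <- function(start, end, strand, width = 1000L) {
  plus <- strand == "+"
  data.frame(
    start  = ifelse(plus, start - width, end + 1L),
    end    = ifelse(plus, start - 1L, end + width),
    strand = strand
  )
}

# Two toy genes with made-up coordinates, one per strand.
toy <- upstream_window(
  start  = c(5000L, 20000L),
  end    = c(8000L, 26000L),
  strand = c("+", "-")
)
```

For a gene on the plus strand the window ends one base before the gene start. For a gene on the minus strand it begins one base after the gene end. A window computed this way can run past the end of a chromosome for genes near the edge; GenomicRanges provides trim() to clip ranges to the sequence bounds when the sequence lengths are known.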
## IMPORTANT: Make sure you use a TxDb package (or TxDb object, formerly
## called TranscriptDb) that contains a gene model based on the exact
## same reference genome as the BSgenome object you pass to getSeq().
## Note that you can make your own custom TxDb object from various
## annotation resources. See the makeTxDbFromUCSC(),
## makeTxDbFromBiomart(), and makeTxDbFromGFF() functions in the
## GenomicFeatures package.
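One possible way to check the requirement above before calling getSeq() is to compare the assembly names recorded in the two objects. This is a sketch, not a complete validation: it assumes both data packages are installed, uses the genome() accessor from GenomeInfoDb, and the variable names `bsg`, `bs_build`, and `tx_build` are made up for the example.

```r
library(BSgenome.Hsapiens.UCSC.hg18)
library(TxDb.Hsapiens.UCSC.hg18.knownGene)

bsg  <- BSgenome.Hsapiens.UCSC.hg18
txdb <- TxDb.Hsapiens.UCSC.hg18.knownGene

# genome() returns the assembly name recorded for each sequence;
# unique() collapses that to the distinct assembly names.
bs_build <- unique(genome(bsg))
tx_build <- unique(genome(txdb))

# Stop early if the gene model and the sequences disagree on the assembly.
if (!identical(bs_build, tx_build))
    stop("TxDb and BSgenome objects are based on different genomes")
```

Matching assembly names are a necessary condition only. Comparing the sequence names and lengths of both objects as well would be a stricter check.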
## ---------------------------------------------------------------------
## Genome-wide motif searching
## ---------------------------------------------------------------------
## See the GenomeSearching vignette in the BSgenome software
## package for some examples of genome-wide motif searching using
## Biostrings and the BSgenome data packages:
if (interactive())
    vignette("GenomeSearching", package="BSgenome")
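As a small taste of what such a search looks like, the sketch below runs an exact-match search with matchPattern() from Biostrings on both strands. It is not taken from the vignette: the subject is a toy stand-in (the first 36 letters of chr1 as printed in the Results section) and the motif is an arbitrary choice for illustration.

```r
library(Biostrings)

# Toy subject (assumption: first 36 letters of hg18 chr1 as shown on this
# page). Replace with genome$chr1 to search a real chromosome.
subject <- DNAString("TAACCCTAACCCTAACCCTAACCCTAACCCTAACCC")
motif   <- DNAString("CCCTAA")

# Exact matches on the plus strand.
hits_plus <- matchPattern(motif, subject)

# Minus-strand matches: search for the reverse complement of the motif.
# Hits are reported in plus-strand coordinates.
hits_minus <- matchPattern(reverseComplement(motif), subject)

# Start positions of the plus-strand hits, and the number of hits.
plus_starts <- start(hits_plus)
n_plus      <- countPattern(motif, subject)
```

A genome-wide search would repeat this for each sequence name returned by seqnames(genome), loading one chromosome at a time.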
Results
R version 3.3.1 (2016-06-21) -- "Bug in Your Hair"
Platform: x86_64-pc-linux-gnu (64-bit)
> library(BSgenome.Hsapiens.UCSC.hg18)
Loading required package: BSgenome
Loading required package: BiocGenerics
Loading required package: parallel
Attaching package: 'BiocGenerics'
The following objects are masked from 'package:parallel':
clusterApply, clusterApplyLB, clusterCall, clusterEvalQ,
clusterExport, clusterMap, parApply, parCapply, parLapply,
parLapplyLB, parRapply, parSapply, parSapplyLB
The following objects are masked from 'package:stats':
IQR, mad, xtabs
The following objects are masked from 'package:base':
Filter, Find, Map, Position, Reduce, anyDuplicated, append,
as.data.frame, cbind, colnames, do.call, duplicated, eval, evalq,
get, grep, grepl, intersect, is.unsorted, lapply, lengths, mapply,
match, mget, order, paste, pmax, pmax.int, pmin, pmin.int, rank,
rbind, rownames, sapply, setdiff, sort, table, tapply, union,
unique, unsplit
Loading required package: S4Vectors
Loading required package: stats4
Attaching package: 'S4Vectors'
The following objects are masked from 'package:base':
colMeans, colSums, expand.grid, rowMeans, rowSums
Loading required package: IRanges
Loading required package: GenomeInfoDb
Loading required package: GenomicRanges
Loading required package: Biostrings
Loading required package: XVector
Loading required package: rtracklayer
> ### Name: BSgenome.Hsapiens.UCSC.hg18
> ### Title: Full genome sequences for Homo sapiens (UCSC version hg18)
> ### Aliases: BSgenome.Hsapiens.UCSC.hg18-package
> ### BSgenome.Hsapiens.UCSC.hg18 Hsapiens
> ### Keywords: package data
>
> ### ** Examples
>
> BSgenome.Hsapiens.UCSC.hg18
Human genome:
# organism: Homo sapiens (Human)
# provider: UCSC
# provider version: hg18
# release date: Mar. 2006
# release name: NCBI Build 36.1
# 49 sequences:
# chr1          chr2          chr3          chr4          chr5
# chr6          chr7          chr8          chr9          chr10
# chr11         chr12         chr13         chr14         chr15
# chr16         chr17         chr18         chr19         chr20
# chr21         chr22         chrX          chrY          chrM
# chr5_h2_hap1  chr6_cox_hap1 chr6_qbl_hap2 chr22_h2_hap1 chr1_random
# chr2_random   chr3_random   chr4_random   chr5_random   chr6_random
# chr7_random   chr8_random   chr9_random   chr10_random  chr11_random
# chr13_random  chr15_random  chr16_random  chr17_random  chr18_random
# chr19_random  chr21_random  chr22_random  chrX_random
# (use 'seqnames()' to see all the sequence names, use the '$' or '[[' operator
# to access a given sequence)
> genome <- BSgenome.Hsapiens.UCSC.hg18
> seqlengths(genome)
         chr1          chr2          chr3          chr4          chr5
    247249719     242951149     199501827     191273063     180857866
         chr6          chr7          chr8          chr9         chr10
    170899992     158821424     146274826     140273252     135374737
        chr11         chr12         chr13         chr14         chr15
    134452384     132349534     114142980     106368585     100338915
        chr16         chr17         chr18         chr19         chr20
     88827254      78774742      76117153      63811651      62435964
        chr21         chr22          chrX          chrY          chrM
     46944323      49691432     154913754      57772954         16571
 chr5_h2_hap1 chr6_cox_hap1 chr6_qbl_hap2 chr22_h2_hap1   chr1_random
      1794870       4731698       4565931         63661       1663265
  chr2_random   chr3_random   chr4_random   chr5_random   chr6_random
       185571        749256        842648        143687       1875562
  chr7_random   chr8_random   chr9_random  chr10_random  chr11_random
       549659        943810       1146434        113275        215294
 chr13_random  chr15_random  chr16_random  chr17_random  chr18_random
       186858        784346        105485       2617613          4262
 chr19_random  chr21_random  chr22_random   chrX_random
       301858       1679693        257318       1719168
> genome$chr1 # same as genome[["chr1"]]
247249719-letter "DNAString" instance
seq: TAACCCTAACCCTAACCCTAACCCTAACCCTAACCC...NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
>
> ## ---------------------------------------------------------------------
> ## Upstream sequences
> ## ---------------------------------------------------------------------
> ## Starting with BioC 3.0, the upstream1000, upstream2000, and
> ## upstream5000 sequences for hg18 are not included in the BSgenome data
> ## package anymore. However they can easily be extracted from the full
> ## genome sequences with something like:
>
> library(TxDb.Hsapiens.UCSC.hg18.knownGene)
Loading required package: GenomicFeatures
Loading required package: AnnotationDbi
Loading required package: Biobase
Welcome to Bioconductor
Vignettes contain introductory material; view with
'browseVignettes()'. To cite Bioconductor, see
'citation("Biobase")', and for packages 'citation("pkgname")'.
> txdb <- TxDb.Hsapiens.UCSC.hg18.knownGene
> gn <- sort(genes(txdb))
> up1000 <- flank(gn, width=1000)
> up1000seqs <- getSeq(genome, up1000)
>
> ## IMPORTANT: Make sure you use a TxDb package (or TranscriptDb object),
> ## that contains a gene model based on the exact same reference genome
> ## as the BSgenome object you pass to getSeq(). Note that you can make
> ## your own custom TranscriptDb object from various annotation resources.
> ## See the makeTranscriptDbFromUCSC(), makeTranscriptDbFromBiomart(),
> ## and makeTranscriptDbFromGFF() functions in the GenomicFeatures
> ## package.
>
> ## ---------------------------------------------------------------------
> ## Genome-wide motif searching
> ## ---------------------------------------------------------------------
> ## See the GenomeSearching vignette in the BSgenome software
> ## package for some examples of genome-wide motif searching using
> ## Biostrings and the BSgenome data packages:
> #if (interactive())
> vignette("GenomeSearching", package="BSgenome")