BSgenome.Hsapiens.UCSC.hg19
R Documentation
Full genome sequences for Homo sapiens (UCSC version hg19)
Description
Full genome sequences for Homo sapiens (Human) as provided by UCSC (hg19, Feb. 2009) and stored in Biostrings objects.
Note
This BSgenome data package was made from the following source data files:
chromFa.zip from http://hgdownload.cse.ucsc.edu/goldenPath/hg19/bigZips/
See ?BSgenomeForge and the BSgenomeForge vignette (vignette("BSgenomeForge")) in the BSgenome software package for how to make a BSgenome data package.
Author(s)
The Bioconductor Dev Team
See Also
BSgenome objects and the available.genomes function in the BSgenome software package.
DNAString objects in the Biostrings package.
The BSgenomeForge vignette (vignette("BSgenomeForge")) in the BSgenome software package for how to make a BSgenome data package.
Examples
library(BSgenome.Hsapiens.UCSC.hg19)
BSgenome.Hsapiens.UCSC.hg19
genome <- BSgenome.Hsapiens.UCSC.hg19
seqlengths(genome)
genome$chr1 # same as genome[["chr1"]]
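Once a chromosome has been pulled out as a DNAString, it can be inspected with ordinary Biostrings functions. The following is a minimal sketch, not part of the package's own examples: the variable name hg19 and the window coordinates are arbitrary choices made for illustration.

```r
library(BSgenome.Hsapiens.UCSC.hg19)
hg19 <- BSgenome.Hsapiens.UCSC.hg19   # illustrative variable name

# Load chr1 as a DNAString
chr1 <- hg19$chr1

# Count A, C, G, T and "other" letters (the "other" bin includes N)
freq <- alphabetFrequency(chr1, baseOnly=TRUE)
freq

# Extract a 50-base window; the coordinates are arbitrary
window <- subseq(chr1, start=1000001, end=1000050)
window

# getSeq() extracts the same region directly from the BSgenome object
window2 <- getSeq(hg19, "chr1", start=1000001, end=1000050)
```

The letter counts should add up to the chr1 length reported by seqlengths(), and both extraction routes should yield the same 50 bases.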
## ---------------------------------------------------------------------
## Upstream sequences
## ---------------------------------------------------------------------
## Starting with BioC 3.0, the upstream1000, upstream2000, and
## upstream5000 sequences for hg19 are not included in the BSgenome data
## package anymore. However they can easily be extracted from the full
## genome sequences with something like:
library(TxDb.Hsapiens.UCSC.hg19.knownGene)
txdb <- TxDb.Hsapiens.UCSC.hg19.knownGene
gn <- sort(genes(txdb))
up1000 <- flank(gn, width=1000)
up1000seqs <- getSeq(genome, up1000)
## IMPORTANT: Make sure you use a TxDb package (or TxDb object) that
## contains a gene model based on the exact same reference genome as
## the BSgenome object you pass to getSeq(). Note that you can make
## your own custom TxDb object from various annotation resources.
## See the makeTxDbFromUCSC(), makeTxDbFromBiomart(), and
## makeTxDbFromGFF() functions in the GenomicFeatures package.
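One way to act on the warning above is to compare the genome build reported by both objects before extracting sequences. The sketch below is an assumption about how such a check could look, not something the package prescribes. It also calls trim() defensively, in case flank() produces ranges that run past the end of a sequence. The variable name hg19 is illustrative.

```r
library(BSgenome.Hsapiens.UCSC.hg19)
library(TxDb.Hsapiens.UCSC.hg19.knownGene)

hg19 <- BSgenome.Hsapiens.UCSC.hg19   # illustrative variable name
txdb <- TxDb.Hsapiens.UCSC.hg19.knownGene

# Both objects should report the same genome build for their sequences
stopifnot(identical(unique(unname(genome(txdb))),
                    unique(unname(genome(hg19)))))

# Upstream flanks, clipped to the sequence bounds as a precaution
gn <- sort(genes(txdb))
up1000 <- trim(suppressWarnings(flank(gn, width=1000)))

# One extracted sequence per flank
up1000seqs <- getSeq(hg19, up1000)
```

If the builds differ, the stopifnot() call stops the script before any coordinates are applied to the wrong assembly.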
## ---------------------------------------------------------------------
## Genome-wide motif searching
## ---------------------------------------------------------------------
## See the GenomeSearching vignette in the BSgenome software
## package for some examples of genome-wide motif searching using
## Biostrings and the BSgenome data packages:
if (interactive())
    vignette("GenomeSearching", package="BSgenome")
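For a quick feel of what a motif search looks like before reading the vignette, here is a minimal sketch using matchPattern() from Biostrings on a single small sequence. The motif is hypothetical and chosen only for illustration, and chrM is used simply because it is short. The vignette remains the reference for genome-wide searches.

```r
library(BSgenome.Hsapiens.UCSC.hg19)
hg19 <- BSgenome.Hsapiens.UCSC.hg19   # illustrative variable name

# Hypothetical motif, chosen only for illustration
motif <- DNAString("TATAAAA")

# Exact matches on the forward strand of chrM
hits <- matchPattern(motif, hg19$chrM)
length(hits)
start(hits)

# Matches on the reverse strand: search for the reverse complement
rc_hits <- matchPattern(reverseComplement(motif), hg19$chrM)
length(rc_hits)
```

Repeating the same two calls for each sequence name of interest extends this to more of the genome, at the cost of loading each sequence in turn.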
Results
> library(BSgenome.Hsapiens.UCSC.hg19)
Loading required package: BSgenome
Loading required package: BiocGenerics
Loading required package: parallel
Attaching package: 'BiocGenerics'
The following objects are masked from 'package:parallel':
clusterApply, clusterApplyLB, clusterCall, clusterEvalQ,
clusterExport, clusterMap, parApply, parCapply, parLapply,
parLapplyLB, parRapply, parSapply, parSapplyLB
The following objects are masked from 'package:stats':
IQR, mad, xtabs
The following objects are masked from 'package:base':
Filter, Find, Map, Position, Reduce, anyDuplicated, append,
as.data.frame, cbind, colnames, do.call, duplicated, eval, evalq,
get, grep, grepl, intersect, is.unsorted, lapply, lengths, mapply,
match, mget, order, paste, pmax, pmax.int, pmin, pmin.int, rank,
rbind, rownames, sapply, setdiff, sort, table, tapply, union,
unique, unsplit
Loading required package: S4Vectors
Loading required package: stats4
Attaching package: 'S4Vectors'
The following objects are masked from 'package:base':
colMeans, colSums, expand.grid, rowMeans, rowSums
Loading required package: IRanges
Loading required package: GenomeInfoDb
Loading required package: GenomicRanges
Loading required package: Biostrings
Loading required package: XVector
Loading required package: rtracklayer
> ### Name: BSgenome.Hsapiens.UCSC.hg19
> ### Title: Full genome sequences for Homo sapiens (UCSC version hg19)
> ### Aliases: BSgenome.Hsapiens.UCSC.hg19-package
> ### BSgenome.Hsapiens.UCSC.hg19 Hsapiens
> ### Keywords: package data
>
> ### ** Examples
>
> BSgenome.Hsapiens.UCSC.hg19
Human genome:
# organism: Homo sapiens (Human)
# provider: UCSC
# provider version: hg19
# release date: Feb. 2009
# release name: Genome Reference Consortium GRCh37
# 93 sequences:
# chr1 chr2 chr3
# chr4 chr5 chr6
# chr7 chr8 chr9
# chr10 chr11 chr12
# chr13 chr14 chr15
# ... ... ...
# chrUn_gl000235 chrUn_gl000236 chrUn_gl000237
# chrUn_gl000238 chrUn_gl000239 chrUn_gl000240
# chrUn_gl000241 chrUn_gl000242 chrUn_gl000243
# chrUn_gl000244 chrUn_gl000245 chrUn_gl000246
# chrUn_gl000247 chrUn_gl000248 chrUn_gl000249
# (use 'seqnames()' to see all the sequence names, use the '$' or '[[' operator
# to access a given sequence)
> genome <- BSgenome.Hsapiens.UCSC.hg19
> seqlengths(genome)
chr1 chr2 chr3
249250621 243199373 198022430
chr4 chr5 chr6
191154276 180915260 171115067
chr7 chr8 chr9
159138663 146364022 141213431
chr10 chr11 chr12
135534747 135006516 133851895
chr13 chr14 chr15
115169878 107349540 102531392
chr16 chr17 chr18
90354753 81195210 78077248
chr19 chr20 chr21
59128983 63025520 48129895
chr22 chrX chrY
51304566 155270560 59373566
chrM chr1_gl000191_random chr1_gl000192_random
16571 106433 547496
chr4_ctg9_hap1 chr4_gl000193_random chr4_gl000194_random
590426 189789 191469
chr6_apd_hap1 chr6_cox_hap2 chr6_dbb_hap3
4622290 4795371 4610396
chr6_mann_hap4 chr6_mcf_hap5 chr6_qbl_hap6
4683263 4833398 4611984
chr6_ssto_hap7 chr7_gl000195_random chr8_gl000196_random
4928567 182896 38914
chr8_gl000197_random chr9_gl000198_random chr9_gl000199_random
37175 90085 169874
chr9_gl000200_random chr9_gl000201_random chr11_gl000202_random
187035 36148 40103
chr17_ctg5_hap1 chr17_gl000203_random chr17_gl000204_random
1680828 37498 81310
chr17_gl000205_random chr17_gl000206_random chr18_gl000207_random
174588 41001 4262
chr19_gl000208_random chr19_gl000209_random chr21_gl000210_random
92689 159169 27682
chrUn_gl000211 chrUn_gl000212 chrUn_gl000213
166566 186858 164239
chrUn_gl000214 chrUn_gl000215 chrUn_gl000216
137718 172545 172294
chrUn_gl000217 chrUn_gl000218 chrUn_gl000219
172149 161147 179198
chrUn_gl000220 chrUn_gl000221 chrUn_gl000222
161802 155397 186861
chrUn_gl000223 chrUn_gl000224 chrUn_gl000225
180455 179693 211173
chrUn_gl000226 chrUn_gl000227 chrUn_gl000228
15008 128374 129120
chrUn_gl000229 chrUn_gl000230 chrUn_gl000231
19913 43691 27386
chrUn_gl000232 chrUn_gl000233 chrUn_gl000234
40652 45941 40531
chrUn_gl000235 chrUn_gl000236 chrUn_gl000237
34474 41934 45867
chrUn_gl000238 chrUn_gl000239 chrUn_gl000240
39939 33824 41933
chrUn_gl000241 chrUn_gl000242 chrUn_gl000243
42152 43523 43341
chrUn_gl000244 chrUn_gl000245 chrUn_gl000246
39929 36651 38154
chrUn_gl000247 chrUn_gl000248 chrUn_gl000249
36422 39786 38502
> genome$chr1 # same as genome[["chr1"]]
249250621-letter "DNAString" instance
seq: NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN...NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
>
> ## ---------------------------------------------------------------------
> ## Upstream sequences
> ## ---------------------------------------------------------------------
> ## Starting with BioC 3.0, the upstream1000, upstream2000, and
> ## upstream5000 sequences for hg19 are not included in the BSgenome data
> ## package anymore. However they can easily be extracted from the full
> ## genome sequences with something like:
>
> library(TxDb.Hsapiens.UCSC.hg19.knownGene)
Loading required package: GenomicFeatures
Loading required package: AnnotationDbi
Loading required package: Biobase
Welcome to Bioconductor
Vignettes contain introductory material; view with
'browseVignettes()'. To cite Bioconductor, see
'citation("Biobase")', and for packages 'citation("pkgname")'.
> txdb <- TxDb.Hsapiens.UCSC.hg19.knownGene
> gn <- sort(genes(txdb))
> up1000 <- flank(gn, width=1000)
> up1000seqs <- getSeq(genome, up1000)
>
> ## IMPORTANT: Make sure you use a TxDb package (or TranscriptDb object),
> ## that contains a gene model based on the exact same reference genome
> ## as the BSgenome object you pass to getSeq(). Note that you can make
> ## your own custom TranscriptDb object from various annotation resources.
> ## See the makeTranscriptDbFromUCSC(), makeTranscriptDbFromBiomart(),
> ## and makeTranscriptDbFromGFF() functions in the GenomicFeatures
> ## package.
>
> ## ---------------------------------------------------------------------
> ## Genome-wide motif searching
> ## ---------------------------------------------------------------------
> ## See the GenomeSearching vignette in the BSgenome software
> ## package for some examples of genome-wide motif searching using
> ## Biostrings and the BSgenome data packages:
> #if (interactive())
> vignette("GenomeSearching", package="BSgenome")