Last data update: 2014.03.03

BSgenome.Hsapiens.UCSC.hg19    R Documentation

Full genome sequences for Homo sapiens (UCSC version hg19)

Description

Full genome sequences for Homo sapiens (Human) as provided by UCSC (hg19, Feb. 2009) and stored in Biostrings objects.

Note

This BSgenome data package was made from the following source data files:

chromFa.zip from http://hgdownload.cse.ucsc.edu/goldenPath/hg19/bigZips/
  

See ?BSgenomeForge and the BSgenomeForge vignette (vignette("BSgenomeForge")) in the BSgenome software package for how to make a BSgenome data package.

Author(s)

The Bioconductor Dev Team

See Also

  • BSgenome objects and the available.genomes function in the BSgenome software package.

  • DNAString objects in the Biostrings package.

  • The BSgenomeForge vignette (vignette("BSgenomeForge")) in the BSgenome software package for how to make a BSgenome data package.
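
As a small illustration of the DNAString class listed above (the class of each chromosome returned by the `$` or `[[` operator), the following sketch works on a short made-up sequence rather than on genome data. The toy sequence and the variable name `dna` are assumptions chosen for illustration; only the Biostrings package is needed.

```r
library(Biostrings)

# A short made-up sequence; the two trailing Ns mimic the unsequenced
# stretches found at the ends of real chromosomes.
dna <- DNAString("ACGTTGCANN")

length(dna)                               # number of letters: 10
as.character(subseq(dna, start = 2, end = 5))  # "CGTT"
as.character(reverseComplement(dna))      # "NNTGCAACGT"

# Counts of A, C, G, T, with all remaining letters (here N) pooled in 'other'.
freq <- alphabetFrequency(dna, baseOnly = TRUE)
freq
```

The same functions apply unchanged to a full chromosome such as the one returned by genome$chr1, although they then operate on hundreds of millions of letters.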

Examples

library(BSgenome.Hsapiens.UCSC.hg19)
BSgenome.Hsapiens.UCSC.hg19
genome <- BSgenome.Hsapiens.UCSC.hg19
seqlengths(genome)
genome$chr1  # same as genome[["chr1"]]

## ---------------------------------------------------------------------
## Upstream sequences
## ---------------------------------------------------------------------
## Starting with BioC 3.0, the upstream1000, upstream2000, and
## upstream5000 sequences for hg19 are no longer included in the BSgenome
## data package. However, they can easily be extracted from the full
## genome sequences with something like:

library(TxDb.Hsapiens.UCSC.hg19.knownGene)
txdb <- TxDb.Hsapiens.UCSC.hg19.knownGene
gn <- sort(genes(txdb))
up1000 <- flank(gn, width=1000)
up1000seqs <- getSeq(genome, up1000)

## IMPORTANT: Make sure you use a TxDb package (or TxDb object) that
## contains a gene model based on the exact same reference genome as the
## BSgenome object you pass to getSeq(). Note that you can make your own
## custom TxDb object from various annotation resources. See the
## makeTxDbFromUCSC(), makeTxDbFromBiomart(), and makeTxDbFromGFF()
## functions in the GenomicFeatures package.

## ---------------------------------------------------------------------
## Genome-wide motif searching
## ---------------------------------------------------------------------
## See the GenomeSearching vignette in the BSgenome software
## package for some examples of genome-wide motif searching using
## Biostrings and the BSgenome data packages:
if (interactive())
    vignette("GenomeSearching", package="BSgenome")

Results


R version 3.3.1 (2016-06-21) -- "Bug in Your Hair"
Copyright (C) 2016 The R Foundation for Statistical Computing
Platform: x86_64-pc-linux-gnu (64-bit)

> library(BSgenome.Hsapiens.UCSC.hg19)
Loading required package: BSgenome
Loading required package: BiocGenerics
Loading required package: parallel

Attaching package: 'BiocGenerics'

The following objects are masked from 'package:parallel':

    clusterApply, clusterApplyLB, clusterCall, clusterEvalQ,
    clusterExport, clusterMap, parApply, parCapply, parLapply,
    parLapplyLB, parRapply, parSapply, parSapplyLB

The following objects are masked from 'package:stats':

    IQR, mad, xtabs

The following objects are masked from 'package:base':

    Filter, Find, Map, Position, Reduce, anyDuplicated, append,
    as.data.frame, cbind, colnames, do.call, duplicated, eval, evalq,
    get, grep, grepl, intersect, is.unsorted, lapply, lengths, mapply,
    match, mget, order, paste, pmax, pmax.int, pmin, pmin.int, rank,
    rbind, rownames, sapply, setdiff, sort, table, tapply, union,
    unique, unsplit

Loading required package: S4Vectors
Loading required package: stats4

Attaching package: 'S4Vectors'

The following objects are masked from 'package:base':

    colMeans, colSums, expand.grid, rowMeans, rowSums

Loading required package: IRanges
Loading required package: GenomeInfoDb
Loading required package: GenomicRanges
Loading required package: Biostrings
Loading required package: XVector
Loading required package: rtracklayer
> ### Name: BSgenome.Hsapiens.UCSC.hg19
> ### Title: Full genome sequences for Homo sapiens (UCSC version hg19)
> ### Aliases: BSgenome.Hsapiens.UCSC.hg19-package
> ###   BSgenome.Hsapiens.UCSC.hg19 Hsapiens
> ### Keywords: package data
> 
> ### ** Examples
> 
> BSgenome.Hsapiens.UCSC.hg19
Human genome:
# organism: Homo sapiens (Human)
# provider: UCSC
# provider version: hg19
# release date: Feb. 2009
# release name: Genome Reference Consortium GRCh37
# 93 sequences:
#   chr1                  chr2                  chr3                 
#   chr4                  chr5                  chr6                 
#   chr7                  chr8                  chr9                 
#   chr10                 chr11                 chr12                
#   chr13                 chr14                 chr15                
#   ...                   ...                   ...                  
#   chrUn_gl000235        chrUn_gl000236        chrUn_gl000237       
#   chrUn_gl000238        chrUn_gl000239        chrUn_gl000240       
#   chrUn_gl000241        chrUn_gl000242        chrUn_gl000243       
#   chrUn_gl000244        chrUn_gl000245        chrUn_gl000246       
#   chrUn_gl000247        chrUn_gl000248        chrUn_gl000249       
# (use 'seqnames()' to see all the sequence names, use the '$' or '[[' operator
# to access a given sequence)
> genome <- BSgenome.Hsapiens.UCSC.hg19
> seqlengths(genome)
                 chr1                  chr2                  chr3 
            249250621             243199373             198022430 
                 chr4                  chr5                  chr6 
            191154276             180915260             171115067 
                 chr7                  chr8                  chr9 
            159138663             146364022             141213431 
                chr10                 chr11                 chr12 
            135534747             135006516             133851895 
                chr13                 chr14                 chr15 
            115169878             107349540             102531392 
                chr16                 chr17                 chr18 
             90354753              81195210              78077248 
                chr19                 chr20                 chr21 
             59128983              63025520              48129895 
                chr22                  chrX                  chrY 
             51304566             155270560              59373566 
                 chrM  chr1_gl000191_random  chr1_gl000192_random 
                16571                106433                547496 
       chr4_ctg9_hap1  chr4_gl000193_random  chr4_gl000194_random 
               590426                189789                191469 
        chr6_apd_hap1         chr6_cox_hap2         chr6_dbb_hap3 
              4622290               4795371               4610396 
       chr6_mann_hap4         chr6_mcf_hap5         chr6_qbl_hap6 
              4683263               4833398               4611984 
       chr6_ssto_hap7  chr7_gl000195_random  chr8_gl000196_random 
              4928567                182896                 38914 
 chr8_gl000197_random  chr9_gl000198_random  chr9_gl000199_random 
                37175                 90085                169874 
 chr9_gl000200_random  chr9_gl000201_random chr11_gl000202_random 
               187035                 36148                 40103 
      chr17_ctg5_hap1 chr17_gl000203_random chr17_gl000204_random 
              1680828                 37498                 81310 
chr17_gl000205_random chr17_gl000206_random chr18_gl000207_random 
               174588                 41001                  4262 
chr19_gl000208_random chr19_gl000209_random chr21_gl000210_random 
                92689                159169                 27682 
       chrUn_gl000211        chrUn_gl000212        chrUn_gl000213 
               166566                186858                164239 
       chrUn_gl000214        chrUn_gl000215        chrUn_gl000216 
               137718                172545                172294 
       chrUn_gl000217        chrUn_gl000218        chrUn_gl000219 
               172149                161147                179198 
       chrUn_gl000220        chrUn_gl000221        chrUn_gl000222 
               161802                155397                186861 
       chrUn_gl000223        chrUn_gl000224        chrUn_gl000225 
               180455                179693                211173 
       chrUn_gl000226        chrUn_gl000227        chrUn_gl000228 
                15008                128374                129120 
       chrUn_gl000229        chrUn_gl000230        chrUn_gl000231 
                19913                 43691                 27386 
       chrUn_gl000232        chrUn_gl000233        chrUn_gl000234 
                40652                 45941                 40531 
       chrUn_gl000235        chrUn_gl000236        chrUn_gl000237 
                34474                 41934                 45867 
       chrUn_gl000238        chrUn_gl000239        chrUn_gl000240 
                39939                 33824                 41933 
       chrUn_gl000241        chrUn_gl000242        chrUn_gl000243 
                42152                 43523                 43341 
       chrUn_gl000244        chrUn_gl000245        chrUn_gl000246 
                39929                 36651                 38154 
       chrUn_gl000247        chrUn_gl000248        chrUn_gl000249 
                36422                 39786                 38502 
> genome$chr1  # same as genome[["chr1"]]
  249250621-letter "DNAString" instance
seq: NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN...NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
> 
> ## ---------------------------------------------------------------------
> ## Upstream sequences
> ## ---------------------------------------------------------------------
> ## Starting with BioC 3.0, the upstream1000, upstream2000, and
> ## upstream5000 sequences for hg19 are not included in the BSgenome data
> ## package anymore. However they can easily be extracted from the full
> ## genome sequences with something like:
> 
> library(TxDb.Hsapiens.UCSC.hg19.knownGene)
Loading required package: GenomicFeatures
Loading required package: AnnotationDbi
Loading required package: Biobase
Welcome to Bioconductor

    Vignettes contain introductory material; view with
    'browseVignettes()'. To cite Bioconductor, see
    'citation("Biobase")', and for packages 'citation("pkgname")'.

> txdb <- TxDb.Hsapiens.UCSC.hg19.knownGene
> gn <- sort(genes(txdb))
> up1000 <- flank(gn, width=1000)
> up1000seqs <- getSeq(genome, up1000)
> 
> ## IMPORTANT: Make sure you use a TxDb package (or TranscriptDb object),
> ## that contains a gene model based on the exact same reference genome
> ## as the BSgenome object you pass to getSeq(). Note that you can make
> ## your own custom TranscriptDb object from various annotation resources.
> ## See the makeTranscriptDbFromUCSC(), makeTranscriptDbFromBiomart(),
> ## and makeTranscriptDbFromGFF() functions in the GenomicFeatures
> ## package.
> 
> ## ---------------------------------------------------------------------
> ## Genome-wide motif searching
> ## ---------------------------------------------------------------------
> ## See the GenomeSearching vignette in the BSgenome software
> ## package for some examples of genome-wide motif searching using
> ## Biostrings and the BSgenome data packages:
> #if (interactive())
>     vignette("GenomeSearching", package="BSgenome")