Last data update: 2014.03.03

BSgenome.Hsapiens.UCSC.hg18    R Documentation

Full genome sequences for Homo sapiens (UCSC version hg18)

Description

Full genome sequences for Homo sapiens (Human) as provided by UCSC (hg18, Mar. 2006) and stored in Biostrings objects.

Note

This BSgenome data package was made from the following source data files:

  • chromFa.zip from http://hgdownload.cse.ucsc.edu/goldenPath/hg18/bigZips/

See ?BSgenomeForge and the BSgenomeForge vignette (vignette("BSgenomeForge")) in the BSgenome software package for how to make a BSgenome data package.

Author(s)

The Bioconductor Dev Team

See Also

  • BSgenome objects and the available.genomes function in the BSgenome software package.

  • DNAString objects in the Biostrings package.

  • The BSgenomeForge vignette (vignette("BSgenomeForge")) in the BSgenome software package for how to make a BSgenome data package.

Examples

BSgenome.Hsapiens.UCSC.hg18
genome <- BSgenome.Hsapiens.UCSC.hg18
seqlengths(genome)
genome$chr1  # same as genome[["chr1"]]
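
A chromosome returned by genome$chr1 is an ordinary DNAString, so the usual Biostrings accessors apply to it. The following is a minimal sketch on a toy sequence rather than on hg18 itself; gc_fraction is a hypothetical helper name used only for illustration, not a function from any package.

```r
library(Biostrings)

# Hypothetical helper: fraction of G or C letters in a DNAString.
# Letters such as N still count toward the total length.
gc_fraction <- function(x) {
    unname(letterFrequency(x, letters = "GC", as.prob = TRUE))
}

toy <- DNAString("GGCCAATT")               # stand-in for genome$chr1
first4 <- subseq(toy, start = 1, end = 4)  # extract positions 1 to 4
gc <- gc_fraction(toy)
```

With the data package loaded, the same calls should apply unchanged to a real chromosome, for example gc_fraction(genome$chrM).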

## ---------------------------------------------------------------------
## Upstream sequences
## ---------------------------------------------------------------------
## Starting with BioC 3.0, the upstream1000, upstream2000, and
## upstream5000 sequences for hg18 are no longer included in the BSgenome
## data package. However, they can easily be extracted from the full
## genome sequences with something like:

library(TxDb.Hsapiens.UCSC.hg18.knownGene)
txdb <- TxDb.Hsapiens.UCSC.hg18.knownGene
gn <- sort(genes(txdb))
up1000 <- flank(gn, width=1000)
up1000seqs <- getSeq(genome, up1000)
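
flank() computes the upstream window from coordinates alone, so for a gene close to a sequence end (more likely with wider windows such as 5000) the window may extend past the boundary, which getSeq() is not expected to accept. The sketch below, on a toy GRanges with a made-up sequence "chrA", shows one way to clip such windows with trim() before extracting sequences; upstream_ranges is a hypothetical helper name.

```r
library(GenomicRanges)

# Hypothetical helper: strand-aware upstream windows, clipped to the
# sequence bounds recorded in seqlengths().
upstream_ranges <- function(genes, width) {
    # flank() may warn about out-of-bound ranges; trim() fixes them next.
    up <- suppressWarnings(flank(genes, width = width))
    trim(up)
}

# Toy data: one "+" gene and one "-" gene on a 1000-letter sequence.
toy_genes <- GRanges("chrA",
                     IRanges(start = c(201, 501), end = c(300, 600)),
                     strand = c("+", "-"),
                     seqlengths = c(chrA = 1000))
up <- upstream_ranges(toy_genes, width = 1000)
```

On the real data, the equivalent call would be getSeq(genome, upstream_ranges(gn, 5000)); note that clipped windows are shorter than the requested width.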

## IMPORTANT: Make sure you use a TxDb package (or TxDb object, called
## TranscriptDb in older releases) that contains a gene model based on
## the exact same reference genome as the BSgenome object you pass to
## getSeq(). Note that you can make your own custom TxDb object from
## various annotation resources. See the makeTxDbFromUCSC(),
## makeTxDbFromBiomart(), and makeTxDbFromGFF() functions in the
## GenomicFeatures package (named makeTranscriptDbFrom*() in older
## releases).
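
The reference-genome requirement above can be checked programmatically before calling getSeq(). This is a minimal sketch, assuming both objects report a genome tag through the genome() getter from GenomeInfoDb; same_genome is a hypothetical helper, and the toy Seqinfo objects only stand in for a TxDb and a BSgenome object.

```r
library(GenomeInfoDb)

# Hypothetical helper: TRUE only if both objects carry one and the same
# non-missing genome tag (e.g. "hg18").
same_genome <- function(a, b) {
    ga <- unique(genome(a))
    gb <- unique(genome(b))
    length(ga) == 1L && length(gb) == 1L &&
        !is.na(ga) && !is.na(gb) && ga == gb
}

# Toy stand-ins for the annotation and the genome sequences.
si_hg18 <- Seqinfo(c("chr1", "chrM"), genome = "hg18")
si_hg19 <- Seqinfo("chr1", genome = "hg19")
ok  <- same_genome(si_hg18, si_hg18)
bad <- same_genome(si_hg18, si_hg19)
```

In a real session, one might guard the extraction with stopifnot(same_genome(txdb, genome)).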

## ---------------------------------------------------------------------
## Genome-wide motif searching
## ---------------------------------------------------------------------
## See the GenomeSearching vignette in the BSgenome software
## package for some examples of genome-wide motif searching using
## Biostrings and the BSgenome data packages:
if (interactive())
    vignette("GenomeSearching", package="BSgenome")
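
As a small taste of what the vignette covers, the sketch below searches one sequence for an exact motif with matchPattern() from Biostrings. matchPattern() only scans the strand it is given, so the minus strand is covered by searching for the reverse complement of the motif. find_motif is a hypothetical helper name, and the toy subject stands in for a chromosome.

```r
library(Biostrings)

# Hypothetical helper: start positions of an exact motif on both strands.
# Minus-strand hits are reported in plus-strand coordinates.
find_motif <- function(subject, motif) {
    motif <- DNAString(motif)
    plus  <- matchPattern(motif, subject)
    minus <- matchPattern(reverseComplement(motif), subject)
    list(plus = start(plus), minus = start(minus))
}

toy <- DNAString("TTTATAAAGGTATAAA")   # stand-in for genome$chr1
hits <- find_motif(toy, "TATAAA")
```

With the data package loaded, find_motif(genome$chr1, "TATAAA") would run the same search on a full chromosome.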

Results


R version 3.3.1 (2016-06-21) -- "Bug in Your Hair"
> library(BSgenome.Hsapiens.UCSC.hg18)
Loading required package: BSgenome
Loading required package: BiocGenerics
Loading required package: parallel

Attaching package: 'BiocGenerics'

The following objects are masked from 'package:parallel':

    clusterApply, clusterApplyLB, clusterCall, clusterEvalQ,
    clusterExport, clusterMap, parApply, parCapply, parLapply,
    parLapplyLB, parRapply, parSapply, parSapplyLB

The following objects are masked from 'package:stats':

    IQR, mad, xtabs

The following objects are masked from 'package:base':

    Filter, Find, Map, Position, Reduce, anyDuplicated, append,
    as.data.frame, cbind, colnames, do.call, duplicated, eval, evalq,
    get, grep, grepl, intersect, is.unsorted, lapply, lengths, mapply,
    match, mget, order, paste, pmax, pmax.int, pmin, pmin.int, rank,
    rbind, rownames, sapply, setdiff, sort, table, tapply, union,
    unique, unsplit

Loading required package: S4Vectors
Loading required package: stats4

Attaching package: 'S4Vectors'

The following objects are masked from 'package:base':

    colMeans, colSums, expand.grid, rowMeans, rowSums

Loading required package: IRanges
Loading required package: GenomeInfoDb
Loading required package: GenomicRanges
Loading required package: Biostrings
Loading required package: XVector
Loading required package: rtracklayer
> ### Name: BSgenome.Hsapiens.UCSC.hg18
> ### Title: Full genome sequences for Homo sapiens (UCSC version hg18)
> ### Aliases: BSgenome.Hsapiens.UCSC.hg18-package
> ###   BSgenome.Hsapiens.UCSC.hg18 Hsapiens
> ### Keywords: package data
> 
> ### ** Examples
> 
> BSgenome.Hsapiens.UCSC.hg18
Human genome:
# organism: Homo sapiens (Human)
# provider: UCSC
# provider version: hg18
# release date: Mar. 2006
# release name: NCBI Build 36.1
# 49 sequences:
#   chr1          chr2          chr3          chr4          chr5         
#   chr6          chr7          chr8          chr9          chr10        
#   chr11         chr12         chr13         chr14         chr15        
#   chr16         chr17         chr18         chr19         chr20        
#   chr21         chr22         chrX          chrY          chrM         
#   chr5_h2_hap1  chr6_cox_hap1 chr6_qbl_hap2 chr22_h2_hap1 chr1_random  
#   chr2_random   chr3_random   chr4_random   chr5_random   chr6_random  
#   chr7_random   chr8_random   chr9_random   chr10_random  chr11_random 
#   chr13_random  chr15_random  chr16_random  chr17_random  chr18_random 
#   chr19_random  chr21_random  chr22_random  chrX_random                
# (use 'seqnames()' to see all the sequence names, use the '$' or '[[' operator
# to access a given sequence)
> genome <- BSgenome.Hsapiens.UCSC.hg18
> seqlengths(genome)
         chr1          chr2          chr3          chr4          chr5 
    247249719     242951149     199501827     191273063     180857866 
         chr6          chr7          chr8          chr9         chr10 
    170899992     158821424     146274826     140273252     135374737 
        chr11         chr12         chr13         chr14         chr15 
    134452384     132349534     114142980     106368585     100338915 
        chr16         chr17         chr18         chr19         chr20 
     88827254      78774742      76117153      63811651      62435964 
        chr21         chr22          chrX          chrY          chrM 
     46944323      49691432     154913754      57772954         16571 
 chr5_h2_hap1 chr6_cox_hap1 chr6_qbl_hap2 chr22_h2_hap1   chr1_random 
      1794870       4731698       4565931         63661       1663265 
  chr2_random   chr3_random   chr4_random   chr5_random   chr6_random 
       185571        749256        842648        143687       1875562 
  chr7_random   chr8_random   chr9_random  chr10_random  chr11_random 
       549659        943810       1146434        113275        215294 
 chr13_random  chr15_random  chr16_random  chr17_random  chr18_random 
       186858        784346        105485       2617613          4262 
 chr19_random  chr21_random  chr22_random   chrX_random 
       301858       1679693        257318       1719168 
> genome$chr1  # same as genome[["chr1"]]
  247249719-letter "DNAString" instance
seq: TAACCCTAACCCTAACCCTAACCCTAACCCTAACCC...NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
> 
> ## ---------------------------------------------------------------------
> ## Upstream sequences
> ## ---------------------------------------------------------------------
> ## Starting with BioC 3.0, the upstream1000, upstream2000, and
> ## upstream5000 sequences for hg18 are not included in the BSgenome data
> ## package anymore. However they can easily be extracted from the full
> ## genome sequences with something like:
> 
> library(TxDb.Hsapiens.UCSC.hg18.knownGene)
Loading required package: GenomicFeatures
Loading required package: AnnotationDbi
Loading required package: Biobase
Welcome to Bioconductor

    Vignettes contain introductory material; view with
    'browseVignettes()'. To cite Bioconductor, see
    'citation("Biobase")', and for packages 'citation("pkgname")'.

> txdb <- TxDb.Hsapiens.UCSC.hg18.knownGene
> gn <- sort(genes(txdb))
> up1000 <- flank(gn, width=1000)
> up1000seqs <- getSeq(genome, up1000)
> 
> ## IMPORTANT: Make sure you use a TxDb package (or TranscriptDb object),
> ## that contains a gene model based on the exact same reference genome
> ## as the BSgenome object you pass to getSeq(). Note that you can make
> ## your own custom TranscriptDb object from various annotation resources.
> ## See the makeTranscriptDbFromUCSC(), makeTranscriptDbFromBiomart(),
> ## and makeTranscriptDbFromGFF() functions in the GenomicFeatures
> ## package.
> 
> ## ---------------------------------------------------------------------
> ## Genome-wide motif searching
> ## ---------------------------------------------------------------------
> ## See the GenomeSearching vignette in the BSgenome software
> ## package for some examples of genome-wide motif searching using
> ## Biostrings and the BSgenome data packages:
> #if (interactive())
>     vignette("GenomeSearching", package="BSgenome")
> 