Last data update: 2014.03.03

BSgenome.Hsapiens.UCSC.hg18.masked    R Documentation

Full masked genome sequences for Homo sapiens (UCSC version hg18)

Description

Full genome sequences for Homo sapiens (Human) as provided by UCSC (hg18, Mar. 2006) and stored in Biostrings objects. The sequences are the same as in BSgenome.Hsapiens.UCSC.hg18, except that each of them carries the following 4 masks: (1) the mask of assembly gaps (AGAPS mask), (2) the mask of intra-contig ambiguities (AMB mask), (3) the mask of repeats from RepeatMasker (RM mask), and (4) the mask of repeats from Tandem Repeats Finder (TRF mask). Only the AGAPS and AMB masks are "active" by default; the RM and TRF masks are present but inactive until explicitly activated.
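
As a rough illustration of what "active" means here, the sketch below builds a small toy sequence instead of loading a full chromosome. The sequence, the mask coordinates, and the variable names (toy_masks, agaps_mask, repeat_mask) are made up for the example; only the mask names AGAPS and RM are borrowed from this package. It assumes the Biostrings package is installed.

```r
library(Biostrings)

# Toy stand-in for a chromosome: 4 gap letters (N) followed by 16 bases.
x <- DNAString("NNNNACGTACGTACACACAC")

# Two hypothetical masks, named after the package's AGAPS and RM masks.
agaps_mask  <- Mask(mask.width = length(x), start = 1,  width = 4)
repeat_mask <- Mask(mask.width = length(x), start = 13, width = 8)
toy_masks <- append(agaps_mask, repeat_mask)
names(toy_masks) <- c("AGAPS", "RM")
active(toy_masks) <- c(TRUE, FALSE)   # mimic the defaults: AGAPS on, RM off

masks(x) <- toy_masks                 # x is now a MaskedDNAString
width_default <- maskedwidth(x)       # counts letters hidden by active masks only

# Turn the repeat mask on; the same call should apply to genome$chr1.
active(masks(x))["RM"] <- TRUE
width_with_rm <- maskedwidth(x)

print(c(default = width_default, with_rm = width_with_rm))
```

With the real data, the same pattern would presumably be chr1 <- genome$chr1 followed by active(masks(chr1))["RM"] <- TRUE, at the cost of loading the whole chromosome.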

Note

The masks in this BSgenome data package were made from the following source data files:

AGAPS masks: all the chr*_gap.txt.gz files from ftp://hgdownload.cse.ucsc.edu/goldenPath/hg18/database/
RM masks: http://hgdownload.cse.ucsc.edu/goldenPath/hg18/bigZips/chromOut.zip
TRF masks: http://hgdownload.cse.ucsc.edu/goldenPath/hg18/bigZips/chromTrf.zip

  

See ?BSgenome.Hsapiens.UCSC.hg18 in the BSgenome.Hsapiens.UCSC.hg18 package for information about how the sequences were obtained.

See ?BSgenomeForge and the BSgenomeForge vignette (vignette("BSgenomeForge")) in the BSgenome software package for how to make a BSgenome data package.

Author(s)

The Bioconductor Dev Team

See Also

  • BSgenome.Hsapiens.UCSC.hg18 in the BSgenome.Hsapiens.UCSC.hg18 package for information about how the sequences were obtained.

  • BSgenome objects and the available.genomes function in the BSgenome software package.

  • MaskedDNAString objects in the Biostrings package.

  • The BSgenomeForge vignette (vignette("BSgenomeForge")) in the BSgenome software package for how to make a BSgenome data package.

Examples

BSgenome.Hsapiens.UCSC.hg18.masked
genome <- BSgenome.Hsapiens.UCSC.hg18.masked
seqlengths(genome)
genome$chr1  # a MaskedDNAString object!
## To get rid of the masks altogether:
unmasked(genome$chr1)  # same as BSgenome.Hsapiens.UCSC.hg18$chr1

if ("AGAPS" %in% masknames(genome)) {

  ## Check that the assembly gaps contain only Ns:
  checkOnlyNsInGaps <- function(seq)
  {
    ## Replace all masks by the inverted AGAPS mask
    masks(seq) <- gaps(masks(seq)["AGAPS"])
    unique_letters <- uniqueLetters(seq)
    if (any(unique_letters != "N"))
        stop("assembly gaps contain more than just Ns")
  }

  ## A message will be printed each time a sequence is removed
  ## from the cache:
  options(verbose=TRUE)

  for (seqname in seqnames(genome)) {
    cat("Checking sequence", seqname, "... ")
    seq <- genome[[seqname]]
    checkOnlyNsInGaps(seq)
    cat("OK\n")
  }
}

## See the GenomeSearching vignette in the BSgenome software
## package for some examples of genome-wide motif searching using
## Biostrings and the BSgenome data packages:
if (interactive())
    vignette("GenomeSearching", package="BSgenome")
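
The gap check in the examples above relies on inverting the AGAPS mask so that only the gap regions remain visible. The following sketch replays that trick on toy sequences, so it can be tried without the genome package. The helper names letters_in_gaps and make_toy, the sequences, and the gap coordinates are all hypothetical; it assumes Biostrings is installed.

```r
library(Biostrings)

# Return the distinct letters found inside the AGAPS regions.
letters_in_gaps <- function(seq) {
  # Keep only the AGAPS mask and invert it, so that everything
  # except the assembly gaps is hidden.
  masks(seq) <- gaps(masks(seq)["AGAPS"])
  uniqueLetters(seq)
}

# Build a toy masked sequence with gaps declared at positions 1-4 and 9-10.
make_toy <- function(dna) {
  x <- DNAString(dna)
  agaps <- Mask(mask.width = length(x), start = c(1, 9), width = c(4, 2))
  names(agaps) <- "AGAPS"
  masks(x) <- agaps
  x
}

clean <- make_toy("NNNNACGTNNACGT")   # gaps hold only Ns
dirty <- make_toy("NNNNACGTNAACGT")   # position 10 is an A inside a gap

print(letters_in_gaps(clean))
print(letters_in_gaps(dirty))
```

Returning the letters instead of calling stop() makes it easier to see which unexpected letters turned up; the original checkOnlyNsInGaps stops at the first offending sequence.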

Results


R version 3.3.1 (2016-06-21) -- "Bug in Your Hair"
Copyright (C) 2016 The R Foundation for Statistical Computing
Platform: x86_64-pc-linux-gnu (64-bit)

> library(BSgenome.Hsapiens.UCSC.hg18.masked)
Loading required package: BSgenome
Loading required package: BiocGenerics
Loading required package: parallel

Attaching package: 'BiocGenerics'

The following objects are masked from 'package:parallel':

    clusterApply, clusterApplyLB, clusterCall, clusterEvalQ,
    clusterExport, clusterMap, parApply, parCapply, parLapply,
    parLapplyLB, parRapply, parSapply, parSapplyLB

The following objects are masked from 'package:stats':

    IQR, mad, xtabs

The following objects are masked from 'package:base':

    Filter, Find, Map, Position, Reduce, anyDuplicated, append,
    as.data.frame, cbind, colnames, do.call, duplicated, eval, evalq,
    get, grep, grepl, intersect, is.unsorted, lapply, lengths, mapply,
    match, mget, order, paste, pmax, pmax.int, pmin, pmin.int, rank,
    rbind, rownames, sapply, setdiff, sort, table, tapply, union,
    unique, unsplit

Loading required package: S4Vectors
Loading required package: stats4

Attaching package: 'S4Vectors'

The following objects are masked from 'package:base':

    colMeans, colSums, expand.grid, rowMeans, rowSums

Loading required package: IRanges
Loading required package: GenomeInfoDb
Loading required package: GenomicRanges
Loading required package: Biostrings
Loading required package: XVector
Loading required package: rtracklayer
Loading required package: BSgenome.Hsapiens.UCSC.hg18
> ### Name: BSgenome.Hsapiens.UCSC.hg18.masked
> ### Title: Full masked genome sequences for Homo sapiens (UCSC version
> ###   hg18)
> ### Aliases: BSgenome.Hsapiens.UCSC.hg18.masked-package
> ###   BSgenome.Hsapiens.UCSC.hg18.masked
> ### Keywords: package data
> 
> ### ** Examples
> 
> BSgenome.Hsapiens.UCSC.hg18.masked
Human genome:
# organism: Homo sapiens (Human)
# provider: UCSC
# provider version: hg18
# release date: Mar. 2006
# release name: NCBI Build 36.1
# 49 sequences:
#   chr1          chr2          chr3          chr4          chr5         
#   chr6          chr7          chr8          chr9          chr10        
#   chr11         chr12         chr13         chr14         chr15        
#   chr16         chr17         chr18         chr19         chr20        
#   chr21         chr22         chrX          chrY          chrM         
#   chr5_h2_hap1  chr6_cox_hap1 chr6_qbl_hap2 chr22_h2_hap1 chr1_random  
#   chr2_random   chr3_random   chr4_random   chr5_random   chr6_random  
#   chr7_random   chr8_random   chr9_random   chr10_random  chr11_random 
#   chr13_random  chr15_random  chr16_random  chr17_random  chr18_random 
#   chr19_random  chr21_random  chr22_random  chrX_random                
# (use 'seqnames()' to see all the sequence names, use the '$' or '[[' operator
# to access a given sequence)
> genome <- BSgenome.Hsapiens.UCSC.hg18.masked
> seqlengths(genome)
         chr1          chr2          chr3          chr4          chr5 
    247249719     242951149     199501827     191273063     180857866 
         chr6          chr7          chr8          chr9         chr10 
    170899992     158821424     146274826     140273252     135374737 
        chr11         chr12         chr13         chr14         chr15 
    134452384     132349534     114142980     106368585     100338915 
        chr16         chr17         chr18         chr19         chr20 
     88827254      78774742      76117153      63811651      62435964 
        chr21         chr22          chrX          chrY          chrM 
     46944323      49691432     154913754      57772954         16571 
 chr5_h2_hap1 chr6_cox_hap1 chr6_qbl_hap2 chr22_h2_hap1   chr1_random 
      1794870       4731698       4565931         63661       1663265 
  chr2_random   chr3_random   chr4_random   chr5_random   chr6_random 
       185571        749256        842648        143687       1875562 
  chr7_random   chr8_random   chr9_random  chr10_random  chr11_random 
       549659        943810       1146434        113275        215294 
 chr13_random  chr15_random  chr16_random  chr17_random  chr18_random 
       186858        784346        105485       2617613          4262 
 chr19_random  chr21_random  chr22_random   chrX_random 
       301858       1679693        257318       1719168 
> genome$chr1  # a MaskedDNAString object!
  247249719-letter "MaskedDNAString" instance (# for masking)
seq: TAACCCTAACCCTAACCCTAACCCTAACCCTAACCC...####################################
masks:
  maskedwidth maskedratio active names                               desc
1    22250000 0.089989991   TRUE AGAPS                      assembly gaps
2           0 0.000000000   TRUE   AMB   intra-contig ambiguities (empty)
3   109628227 0.443390704  FALSE    RM                       RepeatMasker
4     1513562 0.006121592  FALSE   TRF Tandem Repeats Finder [period<=12]
all masks together:
  maskedwidth maskedratio
    131963053   0.5337238
all active masks together:
  maskedwidth maskedratio
     22250000  0.08998999
> ## To get rid of the masks altogether:
> unmasked(genome$chr1)  # same as BSgenome.Hsapiens.UCSC.hg18$chr1
  247249719-letter "DNAString" instance
seq: TAACCCTAACCCTAACCCTAACCCTAACCCTAACCC...NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
> 
> if ("AGAPS" %in% masknames(genome)) {
+ 
+   ## Check that the assembly gaps contain only Ns:
+   checkOnlyNsInGaps <- function(seq)
+   {
+     ## Replace all masks by the inverted AGAPS mask
+     masks(seq) <- gaps(masks(seq)["AGAPS"])
+     unique_letters <- uniqueLetters(seq)
+     if (any(unique_letters != "N"))
+         stop("assembly gaps contain more than just Ns")
+   }
+ 
+   ## A message will be printed each time a sequence is removed
+   ## from the cache:
+   options(verbose=TRUE)
+ 
+   for (seqname in seqnames(genome)) {
+     cat("Checking sequence", seqname, "... ")
+     seq <- genome[[seqname]]
+     checkOnlyNsInGaps(seq)
+     cat("OK\n")
+   }
+ }
Checking sequence chr1 ... OK
Checking sequence chr2 ... caching chr2
OK
Checking sequence chr3 ... caching chr3
OK
Checking sequence chr4 ... uncaching chr2
caching chr4
OK
Checking sequence chr5 ... uncaching chr3
caching chr5
OK
Checking sequence chr6 ... caching chr6
OK
Checking sequence chr7 ... caching chr7
OK
Checking sequence chr8 ... uncaching chr6
uncaching chr5
uncaching chr4
caching chr8
OK
Checking sequence chr9 ... caching chr9
OK
Checking sequence chr10 ... caching chr10
OK
Checking sequence chr11 ... caching chr11
OK
Checking sequence chr12 ... uncaching chr10
uncaching chr9
uncaching chr8
uncaching chr7
caching chr12
OK
Checking sequence chr13 ... caching chr13
OK
Checking sequence chr14 ... caching chr14
OK
Checking sequence chr15 ... caching chr15
OK
Checking sequence chr16 ... caching chr16
OK
Checking sequence chr17 ... caching chr17
OK
Checking sequence chr18 ... caching chr18
OK
Checking sequence chr19 ... caching chr19
OK
Checking sequence chr20 ... caching chr20
OK
Checking sequence chr21 ... caching chr21
OK
Checking sequence chr22 ... uncaching chr20
uncaching chr19
uncaching chr18
uncaching chr17
uncaching chr16
uncaching chr15
uncaching chr14
uncaching chr13
uncaching chr12
uncaching chr11
caching chr22
OK
Checking sequence chrX ... caching chrX
OK
Checking sequence chrY ... caching chrY
OK
Checking sequence chrM ... caching chrM
OK
Checking sequence chr5_h2_hap1 ... caching chr5_h2_hap1
OK
Checking sequence chr6_cox_hap1 ... caching chr6_cox_hap1
OK
Checking sequence chr6_qbl_hap2 ... caching chr6_qbl_hap2
OK
Checking sequence chr22_h2_hap1 ... caching chr22_h2_hap1
OK
Checking sequence chr1_random ... caching chr1_random
OK
Checking sequence chr2_random ... caching chr2_random
OK
Checking sequence chr3_random ... caching chr3_random
OK
Checking sequence chr4_random ... caching chr4_random
OK
Checking sequence chr5_random ... caching chr5_random
OK
Checking sequence chr6_random ... caching chr6_random
OK
Checking sequence chr7_random ... caching chr7_random
OK
Checking sequence chr8_random ... caching chr8_random
OK
Checking sequence chr9_random ... caching chr9_random
OK
Checking sequence chr10_random ... caching chr10_random
OK
Checking sequence chr11_random ... caching chr11_random
OK
Checking sequence chr13_random ... caching chr13_random
OK
Checking sequence chr15_random ... caching chr15_random
uncaching chr13_random
uncaching chr11_random
uncaching chr10_random
uncaching chr9_random
uncaching chr8_random
uncaching chr7_random
uncaching chr6_random
uncaching chr5_random
uncaching chr4_random
uncaching chr3_random
uncaching chr2_random
uncaching chr1_random
uncaching chr22_h2_hap1
uncaching chr6_qbl_hap2
uncaching chr6_cox_hap1
uncaching chr5_h2_hap1
uncaching chrM
uncaching chrY
uncaching chrX
OK
Checking sequence chr16_random ... caching chr16_random
OK
Checking sequence chr17_random ... caching chr17_random
OK
Checking sequence chr18_random ... caching chr18_random
OK
Checking sequence chr19_random ... caching chr19_random
OK
Checking sequence chr21_random ... caching chr21_random
OK
Checking sequence chr22_random ... caching chr22_random
OK
Checking sequence chrX_random ... caching chrX_random
OK
> 
> ## See the GenomeSearching vignette in the BSgenome software
> ## package for some examples of genome-wide motif searching using
> ## Biostrings and the BSgenome data packages:
> #if (interactive())
>     vignette("GenomeSearching", package="BSgenome")
> 
> 
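
The mask table printed for chr1 in the output above can be checked with plain arithmetic: each maskedratio is the maskedwidth divided by the sequence length. The sketch below uses only base R and numbers copied from that output; the variable names are chosen for illustration.

```r
# Values copied from the chr1 output above.
chr1_length  <- 247249719
masked_width <- c(AGAPS = 22250000, AMB = 0, RM = 109628227, TRF = 1513562)

# maskedratio is maskedwidth divided by the sequence length.
masked_ratio <- masked_width / chr1_length
print(round(masked_ratio, 9))

# Masks may overlap, so the width reported under "all masks together"
# can be smaller than the sum of the individual widths.
all_masks_width <- 131963053
excess <- sum(masked_width) - all_masks_width
print(excess)
print(all_masks_width / chr1_length)
```

The difference between the summed widths and the "all masks together" width is the number of letters counted more than once, which suggests that the RM and TRF masks share some positions on chr1.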