Last data update: 2014.03.03

BSgenome.Hsapiens.UCSC.hg19.masked    R Documentation

Full masked genome sequences for Homo sapiens (UCSC version hg19)

Description

Full genome sequences for Homo sapiens (Human) as provided by UCSC (hg19, Feb. 2009) and stored in Biostrings objects. The sequences are the same as in BSgenome.Hsapiens.UCSC.hg19, except that each of them carries the following four masks: (1) the mask of assembly gaps (AGAPS mask), (2) the mask of intra-contig ambiguities (AMB mask), (3) the mask of repeats from RepeatMasker (RM mask), and (4) the mask of repeats from Tandem Repeats Finder (TRF mask). Only the AGAPS and AMB masks are "active" by default; the RM and TRF masks are present but inactive.
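
To make the active/inactive distinction concrete, here is a minimal sketch on a toy sequence rather than on hg19 itself. The sequence, the mask positions, and the mask names GAP and REP are hypothetical and chosen only for illustration; the calls themselves (Mask, append, active, masks, nchar) come from the IRanges and Biostrings packages.

```r
library(Biostrings)

## Hypothetical 20-letter toy sequence standing in for a chromosome.
x <- DNAString("NNNNACGTACGTTTTTACGT")

## Two hypothetical masks: one over the leading Ns, one over the run of Ts.
gap_mask <- Mask(mask.width=length(x), start=1, end=4)
rep_mask <- Mask(mask.width=length(x), start=12, end=16)
toy_masks <- append(gap_mask, rep_mask)
names(toy_masks) <- c("GAP", "REP")

## Mimic the situation described above: the first mask is active,
## the second one is present but inactive.
active(toy_masks)["REP"] <- FALSE
masks(x) <- toy_masks

## nchar() on a masked sequence counts the letters not hidden by an
## active mask.
n_gap_only <- nchar(x)

## Activating the second mask hides the repeat region as well.
active(masks(x))["REP"] <- TRUE
n_both <- nchar(x)

c(gap_only=n_gap_only, both=n_both)
```

The same `active(masks(seq))["RM"] <- TRUE` idiom should apply to a chromosome taken from this package, since the mask names listed above are the names of the entries in the mask collection.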

Note

The masks in this BSgenome data package were made from the following source data files:

AGAPS masks: http://hgdownload.cse.ucsc.edu/goldenPath/hg19/database/gap.txt.gz
RM masks: http://hgdownload.cse.ucsc.edu/goldenPath/hg19/bigZips/chromOut.tar.gz
TRF masks: http://hgdownload.cse.ucsc.edu/goldenPath/hg19/bigZips/chromTrf.tar.gz
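
As an illustration of how a gap table of this kind can be turned into a mask, the sketch below builds a one-mask collection from two hypothetical rows laid out in the column order of the UCSC gap table (bin, chrom, chromStart, chromEnd, ix, n, size, type, bridge). The rows, the sequence name chrToy, and the sequence length are invented, and this is not necessarily the procedure used to build the package. It assumes the UCSC convention of 0-based, half-open coordinates.

```r
library(IRanges)

## Hypothetical rows in the column layout of a UCSC 'gap' table.
gap_txt <- "585\tchrToy\t0\t4\t1\tN\t4\ttelomere\tno
585\tchrToy\t10\t13\t2\tN\t3\tcontig\tno"
gap <- read.table(text=gap_txt, sep="\t", stringsAsFactors=FALSE,
                  col.names=c("bin", "chrom", "chromStart", "chromEnd",
                              "ix", "n", "size", "type", "bridge"))

## Hypothetical length of the toy sequence.
toy_length <- 20L

## UCSC starts are 0-based and ends are exclusive, so only the start
## needs shifting to obtain 1-based, closed intervals.
agaps <- Mask(mask.width=toy_length,
              start=gap$chromStart + 1L,
              end=gap$chromEnd)
names(agaps) <- "AGAPS"

agaps
maskedwidth(agaps)  # total number of masked positions
```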

  

See ?BSgenome.Hsapiens.UCSC.hg19 in the BSgenome.Hsapiens.UCSC.hg19 package for information about how the sequences were obtained.

See ?BSgenomeForge and the BSgenomeForge vignette (vignette("BSgenomeForge")) in the BSgenome software package for how to make a BSgenome data package.

Author(s)

The Bioconductor Dev Team

See Also

  • BSgenome.Hsapiens.UCSC.hg19 in the BSgenome.Hsapiens.UCSC.hg19 package for information about how the sequences were obtained.

  • BSgenome objects and the available.genomes function in the BSgenome software package.

  • MaskedDNAString objects in the Biostrings package.

  • The BSgenomeForge vignette (vignette("BSgenomeForge")) in the BSgenome software package for how to make a BSgenome data package.

Examples

library(BSgenome.Hsapiens.UCSC.hg19.masked)
BSgenome.Hsapiens.UCSC.hg19.masked
genome <- BSgenome.Hsapiens.UCSC.hg19.masked
seqlengths(genome)
genome$chr1  # a MaskedDNAString object!
## To get rid of the masks altogether:
unmasked(genome$chr1)  # same as BSgenome.Hsapiens.UCSC.hg19$chr1

if ("AGAPS" %in% masknames(genome)) {

  ## Check that the assembly gaps contain only Ns:
  checkOnlyNsInGaps <- function(seq)
  {
    ## Replace all masks by the inverted AGAPS mask
    masks(seq) <- gaps(masks(seq)["AGAPS"])
    unique_letters <- uniqueLetters(seq)
    if (any(unique_letters != "N"))
        stop("assembly gaps contain more than just Ns")
  }

  ## A message will be printed each time a sequence is removed
  ## from the cache:
  options(verbose=TRUE)

  for (seqname in seqnames(genome)) {
    cat("Checking sequence", seqname, "... ")
    seq <- genome[[seqname]]
    checkOnlyNsInGaps(seq)
    cat("OK\n")
  }
}

## See the GenomeSearching vignette in the BSgenome software
## package for some examples of genome-wide motif searching using
## Biostrings and the BSgenome data packages:
if (interactive())
    vignette("GenomeSearching", package="BSgenome")
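
The gap check above relies on inverting a mask so that only the region it used to hide remains visible. A small sketch of that idea on a toy sequence follows; the sequences, the mask positions, and the helper name letters_under_mask are hypothetical, and only the Biostrings/IRanges calls are taken from the example above.

```r
library(Biostrings)

## Hypothetical helper: report which letters lie under the named mask
## by replacing all masks with the inverse of that mask.
letters_under_mask <- function(seq, maskname)
{
  masks(seq) <- gaps(masks(seq)[maskname])
  uniqueLetters(seq)
}

## Toy sequence with Ns at positions 5-10 and 15-16.
x <- DNAString("ACGTNNNNNNACGTNNAC")

## A mask that covers exactly the Ns.
clean <- x
clean_mask <- Mask(mask.width=length(x), start=c(5, 15), end=c(10, 16))
names(clean_mask) <- "AGAPS"
masks(clean) <- clean_mask
letters_clean <- letters_under_mask(clean, "AGAPS")

## A mask that starts one position too early and so also covers a T.
dirty <- x
dirty_mask <- Mask(mask.width=length(x), start=c(4, 15), end=c(10, 16))
names(dirty_mask) <- "AGAPS"
masks(dirty) <- dirty_mask
letters_dirty <- letters_under_mask(dirty, "AGAPS")

letters_clean
letters_dirty
```

With the first mask only N should be reported, which is the condition checkOnlyNsInGaps tests for; with the second mask the extra letter would trigger its stop() call.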

Results


R version 3.3.1 (2016-06-21) -- "Bug in Your Hair"
Copyright (C) 2016 The R Foundation for Statistical Computing
Platform: x86_64-pc-linux-gnu (64-bit)

> library(BSgenome.Hsapiens.UCSC.hg19.masked)
Loading required package: BSgenome
Loading required package: BiocGenerics
Loading required package: parallel

Attaching package: 'BiocGenerics'

The following objects are masked from 'package:parallel':

    clusterApply, clusterApplyLB, clusterCall, clusterEvalQ,
    clusterExport, clusterMap, parApply, parCapply, parLapply,
    parLapplyLB, parRapply, parSapply, parSapplyLB

The following objects are masked from 'package:stats':

    IQR, mad, xtabs

The following objects are masked from 'package:base':

    Filter, Find, Map, Position, Reduce, anyDuplicated, append,
    as.data.frame, cbind, colnames, do.call, duplicated, eval, evalq,
    get, grep, grepl, intersect, is.unsorted, lapply, lengths, mapply,
    match, mget, order, paste, pmax, pmax.int, pmin, pmin.int, rank,
    rbind, rownames, sapply, setdiff, sort, table, tapply, union,
    unique, unsplit

Loading required package: S4Vectors
Loading required package: stats4

Attaching package: 'S4Vectors'

The following objects are masked from 'package:base':

    colMeans, colSums, expand.grid, rowMeans, rowSums

Loading required package: IRanges
Loading required package: GenomeInfoDb
Loading required package: GenomicRanges
Loading required package: Biostrings
Loading required package: XVector
Loading required package: rtracklayer
Loading required package: BSgenome.Hsapiens.UCSC.hg19
> ### Name: BSgenome.Hsapiens.UCSC.hg19.masked
> ### Title: Full masked genome sequences for Homo sapiens (UCSC version
> ###   hg19)
> ### Aliases: BSgenome.Hsapiens.UCSC.hg19.masked-package
> ###   BSgenome.Hsapiens.UCSC.hg19.masked
> ### Keywords: package data
> 
> ### ** Examples
> 
> BSgenome.Hsapiens.UCSC.hg19.masked
Human genome:
# organism: Homo sapiens (Human)
# provider: UCSC
# provider version: hg19
# release date: Feb. 2009
# release name: Genome Reference Consortium GRCh37
# 93 sequences:
#   chr1                  chr2                  chr3                 
#   chr4                  chr5                  chr6                 
#   chr7                  chr8                  chr9                 
#   chr10                 chr11                 chr12                
#   chr13                 chr14                 chr15                
#   ...                   ...                   ...                  
#   chrUn_gl000235        chrUn_gl000236        chrUn_gl000237       
#   chrUn_gl000238        chrUn_gl000239        chrUn_gl000240       
#   chrUn_gl000241        chrUn_gl000242        chrUn_gl000243       
#   chrUn_gl000244        chrUn_gl000245        chrUn_gl000246       
#   chrUn_gl000247        chrUn_gl000248        chrUn_gl000249       
# (use 'seqnames()' to see all the sequence names, use the '$' or '[[' operator
# to access a given sequence)
> genome <- BSgenome.Hsapiens.UCSC.hg19.masked
> seqlengths(genome)
                 chr1                  chr2                  chr3 
            249250621             243199373             198022430 
                 chr4                  chr5                  chr6 
            191154276             180915260             171115067 
                 chr7                  chr8                  chr9 
            159138663             146364022             141213431 
                chr10                 chr11                 chr12 
            135534747             135006516             133851895 
                chr13                 chr14                 chr15 
            115169878             107349540             102531392 
                chr16                 chr17                 chr18 
             90354753              81195210              78077248 
                chr19                 chr20                 chr21 
             59128983              63025520              48129895 
                chr22                  chrX                  chrY 
             51304566             155270560              59373566 
                 chrM  chr1_gl000191_random  chr1_gl000192_random 
                16571                106433                547496 
       chr4_ctg9_hap1  chr4_gl000193_random  chr4_gl000194_random 
               590426                189789                191469 
        chr6_apd_hap1         chr6_cox_hap2         chr6_dbb_hap3 
              4622290               4795371               4610396 
       chr6_mann_hap4         chr6_mcf_hap5         chr6_qbl_hap6 
              4683263               4833398               4611984 
       chr6_ssto_hap7  chr7_gl000195_random  chr8_gl000196_random 
              4928567                182896                 38914 
 chr8_gl000197_random  chr9_gl000198_random  chr9_gl000199_random 
                37175                 90085                169874 
 chr9_gl000200_random  chr9_gl000201_random chr11_gl000202_random 
               187035                 36148                 40103 
      chr17_ctg5_hap1 chr17_gl000203_random chr17_gl000204_random 
              1680828                 37498                 81310 
chr17_gl000205_random chr17_gl000206_random chr18_gl000207_random 
               174588                 41001                  4262 
chr19_gl000208_random chr19_gl000209_random chr21_gl000210_random 
                92689                159169                 27682 
       chrUn_gl000211        chrUn_gl000212        chrUn_gl000213 
               166566                186858                164239 
       chrUn_gl000214        chrUn_gl000215        chrUn_gl000216 
               137718                172545                172294 
       chrUn_gl000217        chrUn_gl000218        chrUn_gl000219 
               172149                161147                179198 
       chrUn_gl000220        chrUn_gl000221        chrUn_gl000222 
               161802                155397                186861 
       chrUn_gl000223        chrUn_gl000224        chrUn_gl000225 
               180455                179693                211173 
       chrUn_gl000226        chrUn_gl000227        chrUn_gl000228 
                15008                128374                129120 
       chrUn_gl000229        chrUn_gl000230        chrUn_gl000231 
                19913                 43691                 27386 
       chrUn_gl000232        chrUn_gl000233        chrUn_gl000234 
                40652                 45941                 40531 
       chrUn_gl000235        chrUn_gl000236        chrUn_gl000237 
                34474                 41934                 45867 
       chrUn_gl000238        chrUn_gl000239        chrUn_gl000240 
                39939                 33824                 41933 
       chrUn_gl000241        chrUn_gl000242        chrUn_gl000243 
                42152                 43523                 43341 
       chrUn_gl000244        chrUn_gl000245        chrUn_gl000246 
                39929                 36651                 38154 
       chrUn_gl000247        chrUn_gl000248        chrUn_gl000249 
                36422                 39786                 38502 
> genome$chr1  # a MaskedDNAString object!
  249250621-letter "MaskedDNAString" instance (# for masking)
seq: ####################################...####################################
masks:
  maskedwidth maskedratio active names                               desc
1    23970000  0.09616827   TRUE AGAPS                      assembly gaps
2           0  0.00000000   TRUE   AMB   intra-contig ambiguities (empty)
3   114014472  0.45742904  FALSE    RM                       RepeatMasker
4     1581889  0.00634658  FALSE   TRF Tandem Repeats Finder [period<=12]
all masks together:
  maskedwidth maskedratio
    138071094   0.5539448
all active masks together:
  maskedwidth maskedratio
     23970000  0.09616827
> ## To get rid of the masks altogether:
> unmasked(genome$chr1)  # same as BSgenome.Hsapiens.UCSC.hg19$chr1
  249250621-letter "DNAString" instance
seq: NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN...NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
> 
> if ("AGAPS" %in% masknames(genome)) {
+ 
+   ## Check that the assembly gaps contain only Ns:
+   checkOnlyNsInGaps <- function(seq)
+   {
+     ## Replace all masks by the inverted AGAPS mask
+     masks(seq) <- gaps(masks(seq)["AGAPS"])
+     unique_letters <- uniqueLetters(seq)
+     if (any(unique_letters != "N"))
+         stop("assembly gaps contain more than just Ns")
+   }
+ 
+   ## A message will be printed each time a sequence is removed
+   ## from the cache:
+   options(verbose=TRUE)
+ 
+   for (seqname in seqnames(genome)) {
+     cat("Checking sequence", seqname, "... ")
+     seq <- genome[[seqname]]
+     checkOnlyNsInGaps(seq)
+     cat("OK\n")
+   }
+ }
Checking sequence chr1 ... OK
Checking sequence chr2 ... caching chr2
OK
Checking sequence chr3 ... caching chr3
OK
Checking sequence chr4 ... uncaching chr2
caching chr4
OK
Checking sequence chr5 ... uncaching chr3
caching chr5
OK
Checking sequence chr6 ... caching chr6
OK
Checking sequence chr7 ... caching chr7
OK
Checking sequence chr8 ... uncaching chr6
uncaching chr5
uncaching chr4
caching chr8
OK
Checking sequence chr9 ... caching chr9
OK
Checking sequence chr10 ... caching chr10
OK
Checking sequence chr11 ... caching chr11
OK
Checking sequence chr12 ... uncaching chr10
uncaching chr9
uncaching chr8
uncaching chr7
caching chr12
OK
Checking sequence chr13 ... caching chr13
OK
Checking sequence chr14 ... caching chr14
OK
Checking sequence chr15 ... caching chr15
OK
Checking sequence chr16 ... caching chr16
OK
Checking sequence chr17 ... caching chr17
OK
Checking sequence chr18 ... caching chr18
OK
Checking sequence chr19 ... caching chr19
OK
Checking sequence chr20 ... caching chr20
OK
Checking sequence chr21 ... uncaching chr19
uncaching chr18
uncaching chr17
uncaching chr16
uncaching chr15
uncaching chr14
uncaching chr13
caching chr21
OK
Checking sequence chr22 ... caching chr22
OK
Checking sequence chrX ... caching chrX
OK
Checking sequence chrY ... caching chrY
OK
Checking sequence chrM ... caching chrM
OK
Checking sequence chr1_gl000191_random ... caching chr1_gl000191_random
OK
Checking sequence chr1_gl000192_random ... caching chr1_gl000192_random
OK
Checking sequence chr4_ctg9_hap1 ... caching chr4_ctg9_hap1
OK
Checking sequence chr4_gl000193_random ... caching chr4_gl000193_random
OK
Checking sequence chr4_gl000194_random ... caching chr4_gl000194_random
OK
Checking sequence chr6_apd_hap1 ... caching chr6_apd_hap1
OK
Checking sequence chr6_cox_hap2 ... caching chr6_cox_hap2
OK
Checking sequence chr6_dbb_hap3 ... caching chr6_dbb_hap3
OK
Checking sequence chr6_mann_hap4 ... caching chr6_mann_hap4
OK
Checking sequence chr6_mcf_hap5 ... uncaching chr6_dbb_hap3
uncaching chr6_cox_hap2
uncaching chr6_apd_hap1
uncaching chr4_gl000194_random
uncaching chr4_gl000193_random
uncaching chr4_ctg9_hap1
uncaching chr1_gl000192_random
uncaching chr1_gl000191_random
uncaching chrM
uncaching chrY
uncaching chrX
uncaching chr22
caching chr6_mcf_hap5
OK
Checking sequence chr6_qbl_hap6 ... caching chr6_qbl_hap6
OK
Checking sequence chr6_ssto_hap7 ... caching chr6_ssto_hap7
OK
Checking sequence chr7_gl000195_random ... caching chr7_gl000195_random
OK
Checking sequence chr8_gl000196_random ... caching chr8_gl000196_random
OK
Checking sequence chr8_gl000197_random ... caching chr8_gl000197_random
OK
Checking sequence chr9_gl000198_random ... caching chr9_gl000198_random
OK
Checking sequence chr9_gl000199_random ... caching chr9_gl000199_random
OK
Checking sequence chr9_gl000200_random ... caching chr9_gl000200_random
OK
Checking sequence chr9_gl000201_random ... caching chr9_gl000201_random
OK
Checking sequence chr11_gl000202_random ... caching chr11_gl000202_random
OK
Checking sequence chr17_ctg5_hap1 ... caching chr17_ctg5_hap1
OK
Checking sequence chr17_gl000203_random ... caching chr17_gl000203_random
OK
Checking sequence chr17_gl000204_random ... caching chr17_gl000204_random
OK
Checking sequence chr17_gl000205_random ... uncaching chr17_gl000203_random
uncaching chr17_ctg5_hap1
uncaching chr11_gl000202_random
uncaching chr9_gl000201_random
uncaching chr9_gl000200_random
uncaching chr9_gl000199_random
uncaching chr9_gl000198_random
uncaching chr8_gl000197_random
uncaching chr8_gl000196_random
uncaching chr7_gl000195_random
uncaching chr6_ssto_hap7
uncaching chr6_qbl_hap6
uncaching chr6_mcf_hap5
uncaching chr6_mann_hap4
caching chr17_gl000205_random
OK
Checking sequence chr17_gl000206_random ... caching chr17_gl000206_random
OK
Checking sequence chr18_gl000207_random ... caching chr18_gl000207_random
OK
Checking sequence chr19_gl000208_random ... caching chr19_gl000208_random
OK
Checking sequence chr19_gl000209_random ... caching chr19_gl000209_random
OK
Checking sequence chr21_gl000210_random ... caching chr21_gl000210_random
OK
Checking sequence chrUn_gl000211 ... caching chrUn_gl000211
OK
Checking sequence chrUn_gl000212 ... caching chrUn_gl000212
OK
Checking sequence chrUn_gl000213 ... caching chrUn_gl000213
OK
Checking sequence chrUn_gl000214 ... caching chrUn_gl000214
OK
Checking sequence chrUn_gl000215 ... caching chrUn_gl000215
OK
Checking sequence chrUn_gl000216 ... caching chrUn_gl000216
OK
Checking sequence chrUn_gl000217 ... caching chrUn_gl000217
OK
Checking sequence chrUn_gl000218 ... caching chrUn_gl000218
uncaching chrUn_gl000217
uncaching chrUn_gl000216
uncaching chrUn_gl000215
uncaching chrUn_gl000214
uncaching chrUn_gl000213
uncaching chrUn_gl000212
uncaching chrUn_gl000211
uncaching chr21_gl000210_random
uncaching chr19_gl000209_random
uncaching chr19_gl000208_random
uncaching chr18_gl000207_random
uncaching chr17_gl000206_random
OK
Checking sequence chrUn_gl000219 ... caching chrUn_gl000219
OK
Checking sequence chrUn_gl000220 ... caching chrUn_gl000220
OK
Checking sequence chrUn_gl000221 ... caching chrUn_gl000221
OK
Checking sequence chrUn_gl000222 ... caching chrUn_gl000222
OK
Checking sequence chrUn_gl000223 ... caching chrUn_gl000223
OK
Checking sequence chrUn_gl000224 ... caching chrUn_gl000224
OK
Checking sequence chrUn_gl000225 ... caching chrUn_gl000225
OK
Checking sequence chrUn_gl000226 ... caching chrUn_gl000226
OK
Checking sequence chrUn_gl000227 ... caching chrUn_gl000227
OK
Checking sequence chrUn_gl000228 ... caching chrUn_gl000228
OK
Checking sequence chrUn_gl000229 ... caching chrUn_gl000229
OK
Checking sequence chrUn_gl000230 ... caching chrUn_gl000230
OK
Checking sequence chrUn_gl000231 ... caching chrUn_gl000231
OK
Checking sequence chrUn_gl000232 ... caching chrUn_gl000232
uncaching chrUn_gl000231
uncaching chrUn_gl000230
uncaching chrUn_gl000229
uncaching chrUn_gl000228
uncaching chrUn_gl000227
uncaching chrUn_gl000226
uncaching chrUn_gl000225
uncaching chrUn_gl000224
uncaching chrUn_gl000223
uncaching chrUn_gl000222
uncaching chrUn_gl000221
uncaching chrUn_gl000220
uncaching chrUn_gl000219
OK
Checking sequence chrUn_gl000233 ... caching chrUn_gl000233
OK
Checking sequence chrUn_gl000234 ... caching chrUn_gl000234
OK
Checking sequence chrUn_gl000235 ... caching chrUn_gl000235
OK
Checking sequence chrUn_gl000236 ... caching chrUn_gl000236
OK
Checking sequence chrUn_gl000237 ... caching chrUn_gl000237
OK
Checking sequence chrUn_gl000238 ... caching chrUn_gl000238
OK
Checking sequence chrUn_gl000239 ... caching chrUn_gl000239
OK
Checking sequence chrUn_gl000240 ... caching chrUn_gl000240
OK
Checking sequence chrUn_gl000241 ... caching chrUn_gl000241
OK
Checking sequence chrUn_gl000242 ... caching chrUn_gl000242
OK
Checking sequence chrUn_gl000243 ... caching chrUn_gl000243
OK
Checking sequence chrUn_gl000244 ... caching chrUn_gl000244
OK
Checking sequence chrUn_gl000245 ... caching chrUn_gl000245
OK
Checking sequence chrUn_gl000246 ... uncaching chrUn_gl000244
uncaching chrUn_gl000243
uncaching chrUn_gl000242
uncaching chrUn_gl000241
uncaching chrUn_gl000240
uncaching chrUn_gl000239
uncaching chrUn_gl000238
uncaching chrUn_gl000237
uncaching chrUn_gl000236
uncaching chrUn_gl000235
uncaching chrUn_gl000234
uncaching chrUn_gl000233
caching chrUn_gl000246
OK
Checking sequence chrUn_gl000247 ... caching chrUn_gl000247
OK
Checking sequence chrUn_gl000248 ... caching chrUn_gl000248
OK
Checking sequence chrUn_gl000249 ... caching chrUn_gl000249
OK
> 
> ## See the GenomeSearching vignette in the BSgenome software
> ## package for some examples of genome-wide motif searching using
> ## Biostrings and the BSgenome data packages:
> #if (interactive())
>     vignette("GenomeSearching", package="BSgenome")
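
The figures in the mask summary printed for genome$chr1 in the transcript above can be cross-checked with plain arithmetic. The sketch below copies the widths from that output and recomputes the ratios; the variable names are illustrative only.

```r
## Values copied from the genome$chr1 mask summary in the transcript.
chr1_length <- 249250621
maskedwidth <- c(AGAPS=23970000, AMB=0, RM=114014472, TRF=1581889)
union_width <- 138071094   # "all masks together"

## Per-mask ratio is simply masked width over sequence length.
ratio <- maskedwidth / chr1_length
union_ratio <- union_width / chr1_length

## The individual widths add up to more than the union, so some
## positions are covered by more than one mask.
overlap <- sum(maskedwidth) - union_width

signif(ratio, 7)
signif(union_ratio, 7)
overlap
```

The difference of 1495267 positions is the amount by which the four widths double count. Since the AMB mask is empty, the overlap is most plausibly between the RM and TRF masks, which both annotate repeats, although the summary alone does not say which masks overlap.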