Last data update: 2014.03.03

BSgenome.Mmusculus.UCSC.mm10.masked    R Documentation

Full masked genome sequences for Mus musculus (UCSC version mm10)

Description

Full genome sequences for Mus musculus (Mouse) as provided by UCSC (mm10, Dec. 2011) and stored in Biostrings objects. The sequences are the same as in BSgenome.Mmusculus.UCSC.mm10, except that each of them carries the following 2 masks: (1) the mask of assembly gaps (AGAPS mask), and (2) the mask of intra-contig ambiguities (AMB mask).
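
To illustrate what a mask does without loading the full mouse genome, here is a minimal sketch on a toy sequence. The 12-letter string and the mask positions are made up for illustration; only the name "AGAPS" is borrowed from this package. It assumes the Biostrings package is installed.

```r
library(Biostrings)

# Toy sequence: the run of 4 Ns stands in for an assembly gap (invented data).
x <- DNAString("ACGTNNNNACGT")

# Build a mask covering positions 5 to 8 and name it like the AGAPS mask.
agaps <- Mask(mask.width = length(x), start = 5, end = 8)
names(agaps) <- "AGAPS"

# Putting a mask on a DNAString turns it into a MaskedDNAString.
masks(x) <- agaps
x

# Number of letters hidden by the active masks.
maskedwidth(x)

# Letter counts only consider the unmasked regions.
alphabetFrequency(x, baseOnly = TRUE)

# Dropping the masks gives back the plain DNAString.
unmasked(x)
```

Masked regions are skipped by most Biostrings operations, which is why the sequences in this package behave differently from those in BSgenome.Mmusculus.UCSC.mm10 even though the underlying letters are identical.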

Note

The masks in this BSgenome data package were made from the following source data files:

AGAPS masks: http://hgdownload.cse.ucsc.edu/goldenPath/mm10/database/gap.txt.gz

See ?BSgenome.Mmusculus.UCSC.mm10 in the BSgenome.Mmusculus.UCSC.mm10 package for information about how the sequences were obtained.

See ?BSgenomeForge and the BSgenomeForge vignette (vignette("BSgenomeForge")) in the BSgenome software package for how to make a BSgenome data package.
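
The AGAPS source file listed above is a UCSC "gap" table. As a rough sketch of how such a table could be turned into 1-based ranges suitable for a mask, the code below parses two invented rows for a hypothetical sequence "chrT". The column names follow the UCSC gap table schema; the rows are not real mm10 data, and this is not necessarily how BSgenomeForge processes the file.

```r
# Column names of the UCSC "gap" table (the file itself has no header row).
gap_cols <- c("bin", "chrom", "chromStart", "chromEnd",
              "ix", "n", "size", "type", "bridge")

# Invented toy rows for a hypothetical sequence "chrT"; not real mm10 data.
toy_lines <- c("585\tchrT\t0\t10\t1\tN\t10\ttelomere\tno",
               "585\tchrT\t25\t30\t3\tN\t5\tcontig\tno")

gaps_df <- read.delim(text = toy_lines, header = FALSE,
                      col.names = gap_cols, stringsAsFactors = FALSE)

# UCSC tables use 0-based, half-open intervals; R ranges are 1-based, closed.
gap_ranges <- data.frame(seqname = gaps_df$chrom,
                         start   = gaps_df$chromStart + 1L,
                         end     = gaps_df$chromEnd)
gap_ranges$width <- gap_ranges$end - gap_ranges$start + 1L

gap_ranges
sum(gap_ranges$width)   # total gap width in the toy data
```

For the real file, the same conversion would be applied per sequence after reading the downloaded table instead of `toy_lines`.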

Author(s)

The Bioconductor Dev Team

See Also

  • BSgenome.Mmusculus.UCSC.mm10 in the BSgenome.Mmusculus.UCSC.mm10 package for information about how the sequences were obtained.

  • BSgenome objects and the available.genomes function in the BSgenome software package.

  • MaskedDNAString objects in the Biostrings package.

  • The BSgenomeForge vignette (vignette("BSgenomeForge")) in the BSgenome software package for how to make a BSgenome data package.

Examples

library(BSgenome.Mmusculus.UCSC.mm10.masked)
BSgenome.Mmusculus.UCSC.mm10.masked
genome <- BSgenome.Mmusculus.UCSC.mm10.masked
seqlengths(genome)
genome$chr1  # a MaskedDNAString object!
## To get rid of the masks altogether:
unmasked(genome$chr1)  # same as BSgenome.Mmusculus.UCSC.mm10$chr1

if ("AGAPS" %in% masknames(genome)) {

  ## Check that the assembly gaps contain only Ns:
  checkOnlyNsInGaps <- function(seq)
  {
    ## Replace all masks by the inverted AGAPS mask
    masks(seq) <- gaps(masks(seq)["AGAPS"])
    unique_letters <- uniqueLetters(seq)
    if (any(unique_letters != "N"))
        stop("assembly gaps contain more than just Ns")
  }

  ## A message will be printed each time a sequence is removed
  ## from the cache:
  options(verbose=TRUE)

  for (seqname in seqnames(genome)) {
    cat("Checking sequence", seqname, "... ")
    seq <- genome[[seqname]]
    checkOnlyNsInGaps(seq)
    cat("OK\n")
  }
}

## See the GenomeSearching vignette in the BSgenome software
## package for some examples of genome-wide motif searching using
## Biostrings and the BSgenome data packages:
if (interactive())
    vignette("GenomeSearching", package="BSgenome")
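
The chr1 summary in the Results below reports a maskedwidth of 3562779 and a maskedratio of 0.01822655 for a 195471971-letter sequence. The ratio is simply the masked width divided by the sequence length, which can be checked with plain arithmetic (the numbers are copied from that output; the variable names are illustrative):

```r
# Values copied from the chr1 summary in the Results section.
masked_width <- 3562779
seq_length   <- 195471971

# Fraction of chr1 hidden by the AGAPS mask.
masked_ratio <- masked_width / seq_length
round(masked_ratio, 8)

# Number of letters left visible once the mask is applied.
unmasked_width <- seq_length - masked_width
unmasked_width
```

Because the AMB mask is empty for chr1, the "all masks together" figures are the same as those of the AGAPS mask alone.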

Results


R version 3.3.1 (2016-06-21) -- "Bug in Your Hair"
Copyright (C) 2016 The R Foundation for Statistical Computing
Platform: x86_64-pc-linux-gnu (64-bit)

> library(BSgenome.Mmusculus.UCSC.mm10.masked)
Loading required package: BSgenome
Loading required package: BiocGenerics
Loading required package: parallel

Attaching package: 'BiocGenerics'

The following objects are masked from 'package:parallel':

    clusterApply, clusterApplyLB, clusterCall, clusterEvalQ,
    clusterExport, clusterMap, parApply, parCapply, parLapply,
    parLapplyLB, parRapply, parSapply, parSapplyLB

The following objects are masked from 'package:stats':

    IQR, mad, xtabs

The following objects are masked from 'package:base':

    Filter, Find, Map, Position, Reduce, anyDuplicated, append,
    as.data.frame, cbind, colnames, do.call, duplicated, eval, evalq,
    get, grep, grepl, intersect, is.unsorted, lapply, lengths, mapply,
    match, mget, order, paste, pmax, pmax.int, pmin, pmin.int, rank,
    rbind, rownames, sapply, setdiff, sort, table, tapply, union,
    unique, unsplit

Loading required package: S4Vectors
Loading required package: stats4

Attaching package: 'S4Vectors'

The following objects are masked from 'package:base':

    colMeans, colSums, expand.grid, rowMeans, rowSums

Loading required package: IRanges
Loading required package: GenomeInfoDb
Loading required package: GenomicRanges
Loading required package: Biostrings
Loading required package: XVector
Loading required package: rtracklayer
Loading required package: BSgenome.Mmusculus.UCSC.mm10
> ### Name: BSgenome.Mmusculus.UCSC.mm10.masked
> ### Title: Full masked genome sequences for Mus musculus (UCSC version
> ###   mm10)
> ### Aliases: BSgenome.Mmusculus.UCSC.mm10.masked-package
> ###   BSgenome.Mmusculus.UCSC.mm10.masked
> ### Keywords: package data
> 
> ### ** Examples
> 
> BSgenome.Mmusculus.UCSC.mm10.masked
Mouse genome:
# organism: Mus musculus (Mouse)
# provider: UCSC
# provider version: mm10
# release date: Dec. 2011
# release name: Genome Reference Consortium GRCm38
# 66 sequences:
#   chr1                 chr2                 chr3                
#   chr4                 chr5                 chr6                
#   chr7                 chr8                 chr9                
#   chr10                chr11                chr12               
#   chr13                chr14                chr15               
#   ...                  ...                  ...                 
#   chrUn_GL456372       chrUn_GL456378       chrUn_GL456379      
#   chrUn_GL456381       chrUn_GL456382       chrUn_GL456383      
#   chrUn_GL456385       chrUn_GL456387       chrUn_GL456389      
#   chrUn_GL456390       chrUn_GL456392       chrUn_GL456393      
#   chrUn_GL456394       chrUn_GL456396       chrUn_JH584304      
# (use 'seqnames()' to see all the sequence names, use the '$' or '[[' operator
# to access a given sequence)
> genome <- BSgenome.Mmusculus.UCSC.mm10.masked
> seqlengths(genome)
                chr1                 chr2                 chr3 
           195471971            182113224            160039680 
                chr4                 chr5                 chr6 
           156508116            151834684            149736546 
                chr7                 chr8                 chr9 
           145441459            129401213            124595110 
               chr10                chr11                chr12 
           130694993            122082543            120129022 
               chr13                chr14                chr15 
           120421639            124902244            104043685 
               chr16                chr17                chr18 
            98207768             94987271             90702639 
               chr19                 chrX                 chrY 
            61431566            171031299             91744698 
                chrM chr1_GL456210_random chr1_GL456211_random 
               16299               169725               241735 
chr1_GL456212_random chr1_GL456213_random chr1_GL456221_random 
              153618                39340               206961 
chr4_GL456216_random chr4_GL456350_random chr4_JH584292_random 
               66673               227966                14945 
chr4_JH584293_random chr4_JH584294_random chr4_JH584295_random 
              207968               191905                 1976 
chr5_GL456354_random chr5_JH584296_random chr5_JH584297_random 
              195993               199368               205776 
chr5_JH584298_random chr5_JH584299_random chr7_GL456219_random 
              184189               953012               175968 
chrX_GL456233_random chrY_JH584300_random chrY_JH584301_random 
              336933               182347               259875 
chrY_JH584302_random chrY_JH584303_random       chrUn_GL456239 
              155838               158099                40056 
      chrUn_GL456359       chrUn_GL456360       chrUn_GL456366 
               22974                31704                47073 
      chrUn_GL456367       chrUn_GL456368       chrUn_GL456370 
               42057                20208                26764 
      chrUn_GL456372       chrUn_GL456378       chrUn_GL456379 
               28664                31602                72385 
      chrUn_GL456381       chrUn_GL456382       chrUn_GL456383 
               25871                23158                38659 
      chrUn_GL456385       chrUn_GL456387       chrUn_GL456389 
               35240                24685                28772 
      chrUn_GL456390       chrUn_GL456392       chrUn_GL456393 
               24668                23629                55711 
      chrUn_GL456394       chrUn_GL456396       chrUn_JH584304 
               24323                21240               114452 
> genome$chr1  # a MaskedDNAString object!
  195471971-letter "MaskedDNAString" instance (# for masking)
seq: ####################################...####################################
masks:
  maskedwidth maskedratio active names                             desc
1     3562779  0.01822655   TRUE AGAPS                    assembly gaps
2           0  0.00000000   TRUE   AMB intra-contig ambiguities (empty)
all masks together:
  maskedwidth maskedratio
      3562779  0.01822655
> ## To get rid of the masks altogether:
> unmasked(genome$chr1)  # same as BSgenome.Mmusculus.UCSC.mm10$chr1
  195471971-letter "DNAString" instance
seq: NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN...NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
> 
> if ("AGAPS" %in% masknames(genome)) {
+ 
+   ## Check that the assembly gaps contain only Ns:
+   checkOnlyNsInGaps <- function(seq)
+   {
+     ## Replace all masks by the inverted AGAPS mask
+     masks(seq) <- gaps(masks(seq)["AGAPS"])
+     unique_letters <- uniqueLetters(seq)
+     if (any(unique_letters != "N"))
+         stop("assembly gaps contain more than just Ns")
+   }
+ 
+   ## A message will be printed each time a sequence is removed
+   ## from the cache:
+   options(verbose=TRUE)
+ 
+   for (seqname in seqnames(genome)) {
+     cat("Checking sequence", seqname, "... ")
+     seq <- genome[[seqname]]
+     checkOnlyNsInGaps(seq)
+     cat("OK\n")
+   }
+ }
Checking sequence chr1 ... OK
Checking sequence chr2 ... caching chr2
OK
Checking sequence chr3 ... caching chr3
OK
Checking sequence chr4 ... uncaching chr2
caching chr4
OK
Checking sequence chr5 ... uncaching chr3
caching chr5
OK
Checking sequence chr6 ... caching chr6
OK
Checking sequence chr7 ... caching chr7
OK
Checking sequence chr8 ... uncaching chr6
uncaching chr5
uncaching chr4
caching chr8
OK
Checking sequence chr9 ... caching chr9
OK
Checking sequence chr10 ... caching chr10
OK
Checking sequence chr11 ... uncaching chr9
uncaching chr8
uncaching chr7
caching chr11
OK
Checking sequence chr12 ... caching chr12
OK
Checking sequence chr13 ... caching chr13
OK
Checking sequence chr14 ... caching chr14
OK
Checking sequence chr15 ... caching chr15
OK
Checking sequence chr16 ... caching chr16
OK
Checking sequence chr17 ... uncaching chr15
uncaching chr14
uncaching chr13
uncaching chr12
uncaching chr11
uncaching chr10
caching chr17
OK
Checking sequence chr18 ... caching chr18
OK
Checking sequence chr19 ... caching chr19
OK
Checking sequence chrX ... caching chrX
OK
Checking sequence chrY ... caching chrY
OK
Checking sequence chrM ... caching chrM
OK
Checking sequence chr1_GL456210_random ... caching chr1_GL456210_random
OK
Checking sequence chr1_GL456211_random ... caching chr1_GL456211_random
OK
Checking sequence chr1_GL456212_random ... caching chr1_GL456212_random
OK
Checking sequence chr1_GL456213_random ... caching chr1_GL456213_random
OK
Checking sequence chr1_GL456221_random ... caching chr1_GL456221_random
OK
Checking sequence chr4_GL456216_random ... caching chr4_GL456216_random
OK
Checking sequence chr4_GL456350_random ... caching chr4_GL456350_random
OK
Checking sequence chr4_JH584292_random ... caching chr4_JH584292_random
OK
Checking sequence chr4_JH584293_random ... uncaching chr4_GL456350_random
uncaching chr4_GL456216_random
uncaching chr1_GL456221_random
uncaching chr1_GL456213_random
uncaching chr1_GL456212_random
uncaching chr1_GL456211_random
uncaching chr1_GL456210_random
uncaching chrM
uncaching chrY
uncaching chrX
uncaching chr19
uncaching chr18
caching chr4_JH584293_random
OK
Checking sequence chr4_JH584294_random ... caching chr4_JH584294_random
OK
Checking sequence chr4_JH584295_random ... caching chr4_JH584295_random
OK
Checking sequence chr5_GL456354_random ... caching chr5_GL456354_random
OK
Checking sequence chr5_JH584296_random ... caching chr5_JH584296_random
OK
Checking sequence chr5_JH584297_random ... caching chr5_JH584297_random
OK
Checking sequence chr5_JH584298_random ... caching chr5_JH584298_random
OK
Checking sequence chr5_JH584299_random ... caching chr5_JH584299_random
OK
Checking sequence chr7_GL456219_random ... caching chr7_GL456219_random
OK
Checking sequence chrX_GL456233_random ... caching chrX_GL456233_random
OK
Checking sequence chrY_JH584300_random ... caching chrY_JH584300_random
OK
Checking sequence chrY_JH584301_random ... caching chrY_JH584301_random
OK
Checking sequence chrY_JH584302_random ... caching chrY_JH584302_random
OK
Checking sequence chrY_JH584303_random ... caching chrY_JH584303_random
OK
Checking sequence chrUn_GL456239 ... uncaching chrY_JH584302_random
uncaching chrY_JH584301_random
uncaching chrY_JH584300_random
uncaching chrX_GL456233_random
uncaching chr7_GL456219_random
uncaching chr5_JH584299_random
uncaching chr5_JH584298_random
uncaching chr5_JH584297_random
uncaching chr5_JH584296_random
uncaching chr5_GL456354_random
uncaching chr4_JH584295_random
uncaching chr4_JH584294_random
uncaching chr4_JH584293_random
uncaching chr4_JH584292_random
caching chrUn_GL456239
OK
Checking sequence chrUn_GL456359 ... caching chrUn_GL456359
OK
Checking sequence chrUn_GL456360 ... caching chrUn_GL456360
OK
Checking sequence chrUn_GL456366 ... caching chrUn_GL456366
OK
Checking sequence chrUn_GL456367 ... caching chrUn_GL456367
OK
Checking sequence chrUn_GL456368 ... caching chrUn_GL456368
OK
Checking sequence chrUn_GL456370 ... caching chrUn_GL456370
OK
Checking sequence chrUn_GL456372 ... caching chrUn_GL456372
OK
Checking sequence chrUn_GL456378 ... caching chrUn_GL456378
OK
Checking sequence chrUn_GL456379 ... caching chrUn_GL456379
OK
Checking sequence chrUn_GL456381 ... caching chrUn_GL456381
OK
Checking sequence chrUn_GL456382 ... caching chrUn_GL456382
OK
Checking sequence chrUn_GL456383 ... caching chrUn_GL456383
OK
Checking sequence chrUn_GL456385 ... caching chrUn_GL456385
uncaching chrUn_GL456383
uncaching chrUn_GL456382
uncaching chrUn_GL456381
uncaching chrUn_GL456379
uncaching chrUn_GL456378
uncaching chrUn_GL456372
uncaching chrUn_GL456370
uncaching chrUn_GL456368
uncaching chrUn_GL456367
uncaching chrUn_GL456366
uncaching chrUn_GL456360
uncaching chrUn_GL456359
OK
Checking sequence chrUn_GL456387 ... caching chrUn_GL456387
OK
Checking sequence chrUn_GL456389 ... caching chrUn_GL456389
OK
Checking sequence chrUn_GL456390 ... caching chrUn_GL456390
OK
Checking sequence chrUn_GL456392 ... caching chrUn_GL456392
OK
Checking sequence chrUn_GL456393 ... caching chrUn_GL456393
OK
Checking sequence chrUn_GL456394 ... caching chrUn_GL456394
OK
Checking sequence chrUn_GL456396 ... caching chrUn_GL456396
OK
Checking sequence chrUn_JH584304 ... caching chrUn_JH584304
OK
> 
> ## See the GenomeSearching vignette in the BSgenome software
> ## package for some examples of genome-wide motif searching using
> ## Biostrings and the BSgenome data packages:
> #if (interactive())
>     vignette("GenomeSearching", package="BSgenome")