Last data update: 2014.03.03

BSgenome.Mmusculus.UCSC.mm9.masked    R Documentation

Full masked genome sequences for Mus musculus (UCSC version mm9)

Description

Full genome sequences for Mus musculus (Mouse) as provided by UCSC (mm9, Jul. 2007) and stored in Biostrings objects. The sequences are the same as in BSgenome.Mmusculus.UCSC.mm9, except that each of them carries the following four masks: (1) the mask of assembly gaps (AGAPS mask), (2) the mask of intra-contig ambiguities (AMB mask), (3) the mask of repeats from RepeatMasker (RM mask), and (4) the mask of repeats from Tandem Repeats Finder (TRF mask). Only the AGAPS and AMB masks are "active" by default.
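
Because the RM and TRF masks are inactive by default, repeats are masked only after one of these masks is activated on a copy of a sequence. The following is a minimal sketch, assuming the active() and masks() accessors (and their replacement forms) behave as documented for MaskCollection and MaskedXString objects; chr19 is chosen purely for illustration.

```r
library(BSgenome.Mmusculus.UCSC.mm9.masked)

genome <- BSgenome.Mmusculus.UCSC.mm9.masked
chr19 <- genome$chr19    # local copy; the package data itself is not modified

## Named logical vector with one element per mask (AGAPS, AMB, RM, TRF):
active(masks(chr19))

## Additionally activate the RepeatMasker mask on the local copy:
active(masks(chr19))["RM"] <- TRUE

## Masked width and ratio of the union of all currently active masks:
maskedwidth(chr19)
maskedratio(chr19)
```

Working on a local copy keeps the defaults intact for any other code that accesses genome$chr19 afterwards.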

Note

The masks in this BSgenome data package were made from the following source data files:

AGAPS masks: all the chr*_gap.txt.gz files from ftp://hgdownload.cse.ucsc.edu/goldenPath/mm9/database/
RM masks: http://hgdownload.cse.ucsc.edu/goldenPath/mm9/bigZips/chromOut.tar.gz
TRF masks: http://hgdownload.cse.ucsc.edu/goldenPath/mm9/bigZips/chromTrf.tar.gz

  

See ?BSgenome.Mmusculus.UCSC.mm9 in the BSgenome.Mmusculus.UCSC.mm9 package for information about how the sequences were obtained.

See ?BSgenomeForge and the BSgenomeForge vignette (vignette("BSgenomeForge")) in the BSgenome software package for how to make a BSgenome data package.

Author(s)

The Bioconductor Dev Team

See Also

  • BSgenome.Mmusculus.UCSC.mm9 in the BSgenome.Mmusculus.UCSC.mm9 package for information about how the sequences were obtained.

  • BSgenome objects and the available.genomes function in the BSgenome software package.

  • MaskedDNAString objects in the Biostrings package.

  • The BSgenomeForge vignette (vignette("BSgenomeForge")) in the BSgenome software package for how to make a BSgenome data package.
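
As a complement to the references above, the masked genome can also be located and loaded through the BSgenome software package instead of attaching the data package directly. This is only a sketch: it assumes that installed.genomes() and getBSgenome() are available in the installed BSgenome version, that the data package is already installed, and that "mm9" resolves to exactly one installed masked genome.

```r
library(BSgenome)

## Names of the BSgenome data packages installed locally:
installed.genomes()

## Look up the masked mm9 genome by its provider version
## (assumption: exactly one installed masked package matches "mm9"):
genome <- getBSgenome("mm9", masked = TRUE)
genome
```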

Examples

BSgenome.Mmusculus.UCSC.mm9.masked
genome <- BSgenome.Mmusculus.UCSC.mm9.masked
seqlengths(genome)
genome$chr1  # a MaskedDNAString object!
## To get rid of the masks altogether:
unmasked(genome$chr1)  # same as BSgenome.Mmusculus.UCSC.mm9$chr1

if ("AGAPS" %in% masknames(genome)) {

  ## Check that the assembly gaps contain only Ns:
  checkOnlyNsInGaps <- function(seq)
  {
    ## Replace all masks by the inverted AGAPS mask
    masks(seq) <- gaps(masks(seq)["AGAPS"])
    unique_letters <- uniqueLetters(seq)
    if (any(unique_letters != "N"))
        stop("assembly gaps contain more than just Ns")
  }

  ## A message will be printed each time a sequence is removed
  ## from the cache:
  options(verbose=TRUE)

  for (seqname in seqnames(genome)) {
    cat("Checking sequence", seqname, "... ")
    seq <- genome[[seqname]]
    checkOnlyNsInGaps(seq)
    cat("OK\n")
  }
}

## See the GenomeSearching vignette in the BSgenome software
## package for some examples of genome-wide motif searching using
## Biostrings and the BSgenome data packages:
if (interactive())
    vignette("GenomeSearching", package="BSgenome")
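
Before turning to the vignette, the effect of the masks on a pattern search can be illustrated with a short sketch. It assumes that countPattern() accepts a MaskedDNAString subject and reports only matches lying entirely within unmasked regions; the motif below is hypothetical and chosen only for illustration.

```r
library(BSgenome.Mmusculus.UCSC.mm9.masked)

genome <- BSgenome.Mmusculus.UCSC.mm9.masked
pattern <- DNAString("TATAAAA")   # hypothetical motif, for illustration only

chr19 <- genome$chr19

## Default masks (AGAPS and AMB active):
n_default <- countPattern(pattern, chr19)

## Also activate the RepeatMasker mask, so that matches touching
## annotated repeats are no longer reported:
active(masks(chr19))["RM"] <- TRUE
n_repeatmasked <- countPattern(pattern, chr19)

c(default = n_default, repeatmasked = n_repeatmasked)
```

Since activating an additional mask can only remove candidate regions, the second count should never exceed the first.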

Results


> library(BSgenome.Mmusculus.UCSC.mm9.masked)
Loading required package: BSgenome
Loading required package: BiocGenerics
Loading required package: parallel

Attaching package: 'BiocGenerics'

The following objects are masked from 'package:parallel':

    clusterApply, clusterApplyLB, clusterCall, clusterEvalQ,
    clusterExport, clusterMap, parApply, parCapply, parLapply,
    parLapplyLB, parRapply, parSapply, parSapplyLB

The following objects are masked from 'package:stats':

    IQR, mad, xtabs

The following objects are masked from 'package:base':

    Filter, Find, Map, Position, Reduce, anyDuplicated, append,
    as.data.frame, cbind, colnames, do.call, duplicated, eval, evalq,
    get, grep, grepl, intersect, is.unsorted, lapply, lengths, mapply,
    match, mget, order, paste, pmax, pmax.int, pmin, pmin.int, rank,
    rbind, rownames, sapply, setdiff, sort, table, tapply, union,
    unique, unsplit

Loading required package: S4Vectors
Loading required package: stats4

Attaching package: 'S4Vectors'

The following objects are masked from 'package:base':

    colMeans, colSums, expand.grid, rowMeans, rowSums

Loading required package: IRanges
Loading required package: GenomeInfoDb
Loading required package: GenomicRanges
Loading required package: Biostrings
Loading required package: XVector
Loading required package: rtracklayer
Loading required package: BSgenome.Mmusculus.UCSC.mm9
> ### Name: BSgenome.Mmusculus.UCSC.mm9.masked
> ### Title: Full masked genome sequences for Mus musculus (UCSC version mm9)
> ### Aliases: BSgenome.Mmusculus.UCSC.mm9.masked-package
> ###   BSgenome.Mmusculus.UCSC.mm9.masked
> ### Keywords: package data
> 
> ### ** Examples
> 
> BSgenome.Mmusculus.UCSC.mm9.masked
Mouse genome:
# organism: Mus musculus (Mouse)
# provider: UCSC
# provider version: mm9
# release date: Jul. 2007
# release name: NCBI Build 37
# 35 sequences:
#   chr1         chr2         chr3         chr4         chr5        
#   chr6         chr7         chr8         chr9         chr10       
#   chr11        chr12        chr13        chr14        chr15       
#   chr16        chr17        chr18        chr19        chrX        
#   chrY         chrM         chr1_random  chr3_random  chr4_random 
#   chr5_random  chr7_random  chr8_random  chr9_random  chr13_random
#   chr16_random chr17_random chrX_random  chrY_random  chrUn_random
# (use 'seqnames()' to see all the sequence names, use the '$' or '[[' operator
# to access a given sequence)
> genome <- BSgenome.Mmusculus.UCSC.mm9.masked
> seqlengths(genome)
        chr1         chr2         chr3         chr4         chr5         chr6 
   197195432    181748087    159599783    155630120    152537259    149517037 
        chr7         chr8         chr9        chr10        chr11        chr12 
   152524553    131738871    124076172    129993255    121843856    121257530 
       chr13        chr14        chr15        chr16        chr17        chr18 
   120284312    125194864    103494974     98319150     95272651     90772031 
       chr19         chrX         chrY         chrM  chr1_random  chr3_random 
    61342430    166650296     15902555        16299      1231697        41899 
 chr4_random  chr5_random  chr7_random  chr8_random  chr9_random chr13_random 
      160594       357350       362490       849593       449403       400311 
chr16_random chr17_random  chrX_random  chrY_random chrUn_random 
        3994       628739      1785075     58682461      5900358 
> genome$chr1  # a MaskedDNAString object!
  197195432-letter "MaskedDNAString" instance (# for masking)
seq: ####################################...GTAAAGAATTTGGTATTAAACTTAAAACTGGAATTC
masks:
  maskedwidth  maskedratio active names                               desc
1     5717956 2.899639e-02   TRUE AGAPS                      assembly gaps
2          47 2.383422e-07   TRUE   AMB           intra-contig ambiguities
3    84650265 4.292709e-01  FALSE    RM                       RepeatMasker
4     4014755 2.035927e-02  FALSE   TRF Tandem Repeats Finder [period<=12]
all masks together:
  maskedwidth maskedratio
     90481616   0.4588424
all active masks together:
  maskedwidth maskedratio
      5718003  0.02899663
> ## To get rid of the masks altogether:
> unmasked(genome$chr1)  # same as BSgenome.Mmusculus.UCSC.mm9$chr1
  197195432-letter "DNAString" instance
seq: NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN...GTAAAGAATTTGGTATTAAACTTAAAACTGGAATTC
> 
> if ("AGAPS" %in% masknames(genome)) {
+ 
+   ## Check that the assembly gaps contain only Ns:
+   checkOnlyNsInGaps <- function(seq)
+   {
+     ## Replace all masks by the inverted AGAPS mask
+     masks(seq) <- gaps(masks(seq)["AGAPS"])
+     unique_letters <- uniqueLetters(seq)
+     if (any(unique_letters != "N"))
+         stop("assembly gaps contain more than just Ns")
+   }
+ 
+   ## A message will be printed each time a sequence is removed
+   ## from the cache:
+   options(verbose=TRUE)
+ 
+   for (seqname in seqnames(genome)) {
+     cat("Checking sequence", seqname, "... ")
+     seq <- genome[[seqname]]
+     checkOnlyNsInGaps(seq)
+     cat("OK\n")
+   }
+ }
Checking sequence chr1 ... OK
Checking sequence chr2 ... caching chr2
OK
Checking sequence chr3 ... caching chr3
OK
Checking sequence chr4 ... uncaching chr2
caching chr4
OK
Checking sequence chr5 ... uncaching chr3
caching chr5
OK
Checking sequence chr6 ... caching chr6
OK
Checking sequence chr7 ... caching chr7
OK
Checking sequence chr8 ... uncaching chr6
uncaching chr5
uncaching chr4
caching chr8
OK
Checking sequence chr9 ... caching chr9
OK
Checking sequence chr10 ... caching chr10
OK
Checking sequence chr11 ... uncaching chr9
uncaching chr8
uncaching chr7
caching chr11
OK
Checking sequence chr12 ... caching chr12
OK
Checking sequence chr13 ... caching chr13
OK
Checking sequence chr14 ... caching chr14
OK
Checking sequence chr15 ... caching chr15
OK
Checking sequence chr16 ... caching chr16
OK
Checking sequence chr17 ... uncaching chr15
uncaching chr14
uncaching chr13
uncaching chr12
uncaching chr11
uncaching chr10
caching chr17
OK
Checking sequence chr18 ... caching chr18
OK
Checking sequence chr19 ... caching chr19
OK
Checking sequence chrX ... caching chrX
OK
Checking sequence chrY ... caching chrY
OK
Checking sequence chrM ... caching chrM
OK
Checking sequence chr1_random ... caching chr1_random
OK
Checking sequence chr3_random ... caching chr3_random
OK
Checking sequence chr4_random ... caching chr4_random
OK
Checking sequence chr5_random ... caching chr5_random
OK
Checking sequence chr7_random ... caching chr7_random
OK
Checking sequence chr8_random ... caching chr8_random
OK
Checking sequence chr9_random ... caching chr9_random
OK
Checking sequence chr13_random ... caching chr13_random
OK
Checking sequence chr16_random ... uncaching chr9_random
uncaching chr8_random
uncaching chr7_random
uncaching chr5_random
uncaching chr4_random
uncaching chr3_random
uncaching chr1_random
uncaching chrM
uncaching chrY
uncaching chrX
uncaching chr19
uncaching chr18
caching chr16_random
OK
Checking sequence chr17_random ... caching chr17_random
OK
Checking sequence chrX_random ... caching chrX_random
OK
Checking sequence chrY_random ... caching chrY_random
OK
Checking sequence chrUn_random ... caching chrUn_random
OK
> 
> ## See the GenomeSearching vignette in the BSgenome software
> ## package for some examples of genome-wide motif searching using
> ## Biostrings and the BSgenome data packages:
> #if (interactive())
>     vignette("GenomeSearching", package="BSgenome")