BSgenome.Hsapiens.UCSC.hg19.masked
R Documentation
Full masked genome sequences for Homo sapiens (UCSC version hg19)
Description
Full genome sequences for Homo sapiens (Human) as provided by UCSC (hg19, Feb. 2009) and stored in Biostrings objects. The sequences are the same as in BSgenome.Hsapiens.UCSC.hg19, except that each of them has the 4 following masks on top: (1) the mask of assembly gaps (AGAPS mask), (2) the mask of intra-contig ambiguities (AMB mask), (3) the mask of repeats from RepeatMasker (RM mask), and (4) the mask of repeats from Tandem Repeats Finder (TRF mask). Only the AGAPS and AMB masks are "active" by default.
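The difference between active and inactive masks can be tried on a small sequence without loading the full genome. The following is a minimal sketch, assuming the Biostrings package is installed. The toy sequence, the mask positions, and the variable names are made up for illustration; only the mask names AGAPS and RM are borrowed from this package.

```r
library(Biostrings)

# Hypothetical 20-letter sequence standing in for a chromosome:
# 4 Ns (a "gap"), 8 ordinary letters, a run of 5 As (a "repeat"), 3 letters.
x <- DNAString("NNNNACGTACGTAAAAACGT")

# Two hypothetical masks covering the gap and the repeat.
gap_mask    <- Mask(mask.width = length(x), start = 1,  width = 4)
repeat_mask <- Mask(mask.width = length(x), start = 13, width = 5)
toy_masks <- append(gap_mask, repeat_mask)
names(toy_masks) <- c("AGAPS", "RM")

# Mimic this package's default: the repeat mask is present but inactive.
active(toy_masks)["RM"] <- FALSE
masks(x) <- toy_masks

# Only letters under an active mask count as masked.
masked_default <- maskedwidth(x)

# Activate the repeat mask as well.
active(masks(x))["RM"] <- TRUE
masked_with_repeats <- maskedwidth(x)

c(default = masked_default, with_repeats = masked_with_repeats)
```

On a real chromosome the same `active(masks(...))` replacement should apply, for example to switch on the RM or TRF mask before searching for motifs.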
Note
The masks in this BSgenome data package were made from the following
source data files:
See ?BSgenome.Hsapiens.UCSC.hg19 in the
BSgenome.Hsapiens.UCSC.hg19 package for information about how the sequences
were obtained.
See ?BSgenomeForge and the BSgenomeForge
vignette (vignette("BSgenomeForge")) in the BSgenome
software package for how to make a BSgenome data package.
Author(s)
The Bioconductor Dev Team
See Also
BSgenome.Hsapiens.UCSC.hg19 in the BSgenome.Hsapiens.UCSC.hg19 package
for information about how the sequences were obtained.
BSgenome objects and the available.genomes function
in the BSgenome software package.
MaskedDNAString objects in the Biostrings
package.
The BSgenomeForge vignette (vignette("BSgenomeForge"))
in the BSgenome software package for how to make a BSgenome
data package.
Examples
library(BSgenome.Hsapiens.UCSC.hg19.masked)
BSgenome.Hsapiens.UCSC.hg19.masked
genome <- BSgenome.Hsapiens.UCSC.hg19.masked
seqlengths(genome)
genome$chr1  # a MaskedDNAString object!

## To get rid of the masks altogether:
unmasked(genome$chr1)  # same as BSgenome.Hsapiens.UCSC.hg19$chr1

if ("AGAPS" %in% masknames(genome)) {

  ## Check that the assembly gaps contain only Ns:
  checkOnlyNsInGaps <- function(seq)
  {
    ## Replace all masks by the inverted AGAPS mask
    masks(seq) <- gaps(masks(seq)["AGAPS"])
    unique_letters <- uniqueLetters(seq)
    if (any(unique_letters != "N"))
      stop("assembly gaps contain more than just Ns")
  }

  ## A message will be printed each time a sequence is removed
  ## from the cache:
  options(verbose=TRUE)

  for (seqname in seqnames(genome)) {
    cat("Checking sequence", seqname, "... ")
    seq <- genome[[seqname]]
    checkOnlyNsInGaps(seq)
    cat("OK\n")
  }
}

## See the GenomeSearching vignette in the BSgenome software
## package for some examples of genome-wide motif searching using
## Biostrings and the BSgenome data packages:
if (interactive())
  vignette("GenomeSearching", package="BSgenome")
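The mask-inversion step used by checkOnlyNsInGaps can be followed more easily on a short sequence. This is a sketch only, assuming Biostrings is installed; the toy sequence and the single hypothetical mask named AGAPS are invented for the illustration and do not come from hg19.

```r
library(Biostrings)

# Hypothetical sequence with a 4-letter "assembly gap" in the middle.
toy <- DNAString("ACGTNNNNACGT")

# Mask the gap (positions 5 to 8) and name the mask like the real one.
gap_mask <- Mask(mask.width = length(toy), start = 5, width = 4)
names(gap_mask) <- "AGAPS"
masks(toy) <- gap_mask

# With the gap masked, only the letters outside the gap are visible.
letters_outside <- uniqueLetters(toy)

# Inverting the mask hides everything except the gap, so the visible
# letters are now exactly those inside the gap.
inverted <- toy
masks(inverted) <- gaps(masks(inverted)["AGAPS"])
letters_inside <- uniqueLetters(inverted)

letters_outside
letters_inside
```

If the gap contained anything other than N, `letters_inside` would hold more than one letter, which is the condition the example function reports with `stop()`.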
Results
R version 3.3.1 (2016-06-21) -- "Bug in Your Hair"
Copyright (C) 2016 The R Foundation for Statistical Computing
Platform: x86_64-pc-linux-gnu (64-bit)
> library(BSgenome.Hsapiens.UCSC.hg19.masked)
Loading required package: BSgenome
Loading required package: BiocGenerics
Loading required package: parallel
Attaching package: 'BiocGenerics'
The following objects are masked from 'package:parallel':
clusterApply, clusterApplyLB, clusterCall, clusterEvalQ,
clusterExport, clusterMap, parApply, parCapply, parLapply,
parLapplyLB, parRapply, parSapply, parSapplyLB
The following objects are masked from 'package:stats':
IQR, mad, xtabs
The following objects are masked from 'package:base':
Filter, Find, Map, Position, Reduce, anyDuplicated, append,
as.data.frame, cbind, colnames, do.call, duplicated, eval, evalq,
get, grep, grepl, intersect, is.unsorted, lapply, lengths, mapply,
match, mget, order, paste, pmax, pmax.int, pmin, pmin.int, rank,
rbind, rownames, sapply, setdiff, sort, table, tapply, union,
unique, unsplit
Loading required package: S4Vectors
Loading required package: stats4
Attaching package: 'S4Vectors'
The following objects are masked from 'package:base':
colMeans, colSums, expand.grid, rowMeans, rowSums
Loading required package: IRanges
Loading required package: GenomeInfoDb
Loading required package: GenomicRanges
Loading required package: Biostrings
Loading required package: XVector
Loading required package: rtracklayer
Loading required package: BSgenome.Hsapiens.UCSC.hg19
> ### Name: BSgenome.Hsapiens.UCSC.hg19.masked
> ### Title: Full masked genome sequences for Homo sapiens (UCSC version
> ### hg19)
> ### Aliases: BSgenome.Hsapiens.UCSC.hg19.masked-package
> ### BSgenome.Hsapiens.UCSC.hg19.masked
> ### Keywords: package data
>
> ### ** Examples
>
> BSgenome.Hsapiens.UCSC.hg19.masked
Human genome:
# organism: Homo sapiens (Human)
# provider: UCSC
# provider version: hg19
# release date: Feb. 2009
# release name: Genome Reference Consortium GRCh37
# 93 sequences:
# chr1 chr2 chr3
# chr4 chr5 chr6
# chr7 chr8 chr9
# chr10 chr11 chr12
# chr13 chr14 chr15
# ... ... ...
# chrUn_gl000235 chrUn_gl000236 chrUn_gl000237
# chrUn_gl000238 chrUn_gl000239 chrUn_gl000240
# chrUn_gl000241 chrUn_gl000242 chrUn_gl000243
# chrUn_gl000244 chrUn_gl000245 chrUn_gl000246
# chrUn_gl000247 chrUn_gl000248 chrUn_gl000249
# (use 'seqnames()' to see all the sequence names, use the '$' or '[[' operator
# to access a given sequence)
> genome <- BSgenome.Hsapiens.UCSC.hg19.masked
> seqlengths(genome)
chr1 chr2 chr3
249250621 243199373 198022430
chr4 chr5 chr6
191154276 180915260 171115067
chr7 chr8 chr9
159138663 146364022 141213431
chr10 chr11 chr12
135534747 135006516 133851895
chr13 chr14 chr15
115169878 107349540 102531392
chr16 chr17 chr18
90354753 81195210 78077248
chr19 chr20 chr21
59128983 63025520 48129895
chr22 chrX chrY
51304566 155270560 59373566
chrM chr1_gl000191_random chr1_gl000192_random
16571 106433 547496
chr4_ctg9_hap1 chr4_gl000193_random chr4_gl000194_random
590426 189789 191469
chr6_apd_hap1 chr6_cox_hap2 chr6_dbb_hap3
4622290 4795371 4610396
chr6_mann_hap4 chr6_mcf_hap5 chr6_qbl_hap6
4683263 4833398 4611984
chr6_ssto_hap7 chr7_gl000195_random chr8_gl000196_random
4928567 182896 38914
chr8_gl000197_random chr9_gl000198_random chr9_gl000199_random
37175 90085 169874
chr9_gl000200_random chr9_gl000201_random chr11_gl000202_random
187035 36148 40103
chr17_ctg5_hap1 chr17_gl000203_random chr17_gl000204_random
1680828 37498 81310
chr17_gl000205_random chr17_gl000206_random chr18_gl000207_random
174588 41001 4262
chr19_gl000208_random chr19_gl000209_random chr21_gl000210_random
92689 159169 27682
chrUn_gl000211 chrUn_gl000212 chrUn_gl000213
166566 186858 164239
chrUn_gl000214 chrUn_gl000215 chrUn_gl000216
137718 172545 172294
chrUn_gl000217 chrUn_gl000218 chrUn_gl000219
172149 161147 179198
chrUn_gl000220 chrUn_gl000221 chrUn_gl000222
161802 155397 186861
chrUn_gl000223 chrUn_gl000224 chrUn_gl000225
180455 179693 211173
chrUn_gl000226 chrUn_gl000227 chrUn_gl000228
15008 128374 129120
chrUn_gl000229 chrUn_gl000230 chrUn_gl000231
19913 43691 27386
chrUn_gl000232 chrUn_gl000233 chrUn_gl000234
40652 45941 40531
chrUn_gl000235 chrUn_gl000236 chrUn_gl000237
34474 41934 45867
chrUn_gl000238 chrUn_gl000239 chrUn_gl000240
39939 33824 41933
chrUn_gl000241 chrUn_gl000242 chrUn_gl000243
42152 43523 43341
chrUn_gl000244 chrUn_gl000245 chrUn_gl000246
39929 36651 38154
chrUn_gl000247 chrUn_gl000248 chrUn_gl000249
36422 39786 38502
> genome$chr1 # a MaskedDNAString object!
249250621-letter "MaskedDNAString" instance (# for masking)
seq: ####################################...####################################
masks:
  maskedwidth maskedratio active names                               desc
1    23970000  0.09616827   TRUE AGAPS                      assembly gaps
2           0  0.00000000   TRUE   AMB   intra-contig ambiguities (empty)
3   114014472  0.45742904  FALSE    RM                       RepeatMasker
4     1581889  0.00634658  FALSE   TRF Tandem Repeats Finder [period<=12]
all masks together:
  maskedwidth maskedratio
    138071094   0.5539448
all active masks together:
  maskedwidth maskedratio
     23970000  0.09616827
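
The figures in the chr1 mask summary can be cross-checked with plain arithmetic. The sketch below uses base R only and copies its inputs from the output above; the variable names are chosen here for illustration. It shows that maskedratio is maskedwidth divided by the sequence length, and that the "all masks together" width is smaller than the sum of the individual widths, which is consistent with some positions being covered by more than one mask.

```r
# Values copied from the chr1 mask summary above.
chr1_length <- 249250621
masked_width <- c(AGAPS = 23970000, AMB = 0, RM = 114014472, TRF = 1581889)
all_masks_width <- 138071094  # "all masks together"

# maskedratio is maskedwidth divided by the sequence length.
masked_ratio <- masked_width / chr1_length
round(masked_ratio, 8)

# Adding up the per-mask widths counts positions under several masks
# more than once, so the sum exceeds the combined width.
double_counted <- sum(masked_width) - all_masks_width
double_counted
```
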
> ## To get rid of the masks altogether:
> unmasked(genome$chr1) # same as BSgenome.Hsapiens.UCSC.hg19$chr1
249250621-letter "DNAString" instance
seq: NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN...NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
>
> if ("AGAPS" %in% masknames(genome)) {
+
+ ## Check that the assembly gaps contain only Ns:
+ checkOnlyNsInGaps <- function(seq)
+ {
+ ## Replace all masks by the inverted AGAPS mask
+ masks(seq) <- gaps(masks(seq)["AGAPS"])
+ unique_letters <- uniqueLetters(seq)
+ if (any(unique_letters != "N"))
+ stop("assembly gaps contain more than just Ns")
+ }
+
+ ## A message will be printed each time a sequence is removed
+ ## from the cache:
+ options(verbose=TRUE)
+
+ for (seqname in seqnames(genome)) {
+ cat("Checking sequence", seqname, "... ")
+ seq <- genome[[seqname]]
+ checkOnlyNsInGaps(seq)
+ cat("OK\n")
+ }
+ }
Checking sequence chr1 ... OK
Checking sequence chr2 ... caching chr2
OK
Checking sequence chr3 ... caching chr3
OK
Checking sequence chr4 ... uncaching chr2
caching chr4
OK
Checking sequence chr5 ... uncaching chr3
caching chr5
OK
Checking sequence chr6 ... caching chr6
OK
Checking sequence chr7 ... caching chr7
OK
Checking sequence chr8 ... uncaching chr6
uncaching chr5
uncaching chr4
caching chr8
OK
Checking sequence chr9 ... caching chr9
OK
Checking sequence chr10 ... caching chr10
OK
Checking sequence chr11 ... caching chr11
OK
Checking sequence chr12 ... uncaching chr10
uncaching chr9
uncaching chr8
uncaching chr7
caching chr12
OK
Checking sequence chr13 ... caching chr13
OK
Checking sequence chr14 ... caching chr14
OK
Checking sequence chr15 ... caching chr15
OK
Checking sequence chr16 ... caching chr16
OK
Checking sequence chr17 ... caching chr17
OK
Checking sequence chr18 ... caching chr18
OK
Checking sequence chr19 ... caching chr19
OK
Checking sequence chr20 ... caching chr20
OK
Checking sequence chr21 ... uncaching chr19
uncaching chr18
uncaching chr17
uncaching chr16
uncaching chr15
uncaching chr14
uncaching chr13
caching chr21
OK
Checking sequence chr22 ... caching chr22
OK
Checking sequence chrX ... caching chrX
OK
Checking sequence chrY ... caching chrY
OK
Checking sequence chrM ... caching chrM
OK
Checking sequence chr1_gl000191_random ... caching chr1_gl000191_random
OK
Checking sequence chr1_gl000192_random ... caching chr1_gl000192_random
OK
Checking sequence chr4_ctg9_hap1 ... caching chr4_ctg9_hap1
OK
Checking sequence chr4_gl000193_random ... caching chr4_gl000193_random
OK
Checking sequence chr4_gl000194_random ... caching chr4_gl000194_random
OK
Checking sequence chr6_apd_hap1 ... caching chr6_apd_hap1
OK
Checking sequence chr6_cox_hap2 ... caching chr6_cox_hap2
OK
Checking sequence chr6_dbb_hap3 ... caching chr6_dbb_hap3
OK
Checking sequence chr6_mann_hap4 ... caching chr6_mann_hap4
OK
Checking sequence chr6_mcf_hap5 ... uncaching chr6_dbb_hap3
uncaching chr6_cox_hap2
uncaching chr6_apd_hap1
uncaching chr4_gl000194_random
uncaching chr4_gl000193_random
uncaching chr4_ctg9_hap1
uncaching chr1_gl000192_random
uncaching chr1_gl000191_random
uncaching chrM
uncaching chrY
uncaching chrX
uncaching chr22
caching chr6_mcf_hap5
OK
Checking sequence chr6_qbl_hap6 ... caching chr6_qbl_hap6
OK
Checking sequence chr6_ssto_hap7 ... caching chr6_ssto_hap7
OK
Checking sequence chr7_gl000195_random ... caching chr7_gl000195_random
OK
Checking sequence chr8_gl000196_random ... caching chr8_gl000196_random
OK
Checking sequence chr8_gl000197_random ... caching chr8_gl000197_random
OK
Checking sequence chr9_gl000198_random ... caching chr9_gl000198_random
OK
Checking sequence chr9_gl000199_random ... caching chr9_gl000199_random
OK
Checking sequence chr9_gl000200_random ... caching chr9_gl000200_random
OK
Checking sequence chr9_gl000201_random ... caching chr9_gl000201_random
OK
Checking sequence chr11_gl000202_random ... caching chr11_gl000202_random
OK
Checking sequence chr17_ctg5_hap1 ... caching chr17_ctg5_hap1
OK
Checking sequence chr17_gl000203_random ... caching chr17_gl000203_random
OK
Checking sequence chr17_gl000204_random ... caching chr17_gl000204_random
OK
Checking sequence chr17_gl000205_random ... uncaching chr17_gl000203_random
uncaching chr17_ctg5_hap1
uncaching chr11_gl000202_random
uncaching chr9_gl000201_random
uncaching chr9_gl000200_random
uncaching chr9_gl000199_random
uncaching chr9_gl000198_random
uncaching chr8_gl000197_random
uncaching chr8_gl000196_random
uncaching chr7_gl000195_random
uncaching chr6_ssto_hap7
uncaching chr6_qbl_hap6
uncaching chr6_mcf_hap5
uncaching chr6_mann_hap4
caching chr17_gl000205_random
OK
Checking sequence chr17_gl000206_random ... caching chr17_gl000206_random
OK
Checking sequence chr18_gl000207_random ... caching chr18_gl000207_random
OK
Checking sequence chr19_gl000208_random ... caching chr19_gl000208_random
OK
Checking sequence chr19_gl000209_random ... caching chr19_gl000209_random
OK
Checking sequence chr21_gl000210_random ... caching chr21_gl000210_random
OK
Checking sequence chrUn_gl000211 ... caching chrUn_gl000211
OK
Checking sequence chrUn_gl000212 ... caching chrUn_gl000212
OK
Checking sequence chrUn_gl000213 ... caching chrUn_gl000213
OK
Checking sequence chrUn_gl000214 ... caching chrUn_gl000214
OK
Checking sequence chrUn_gl000215 ... caching chrUn_gl000215
OK
Checking sequence chrUn_gl000216 ... caching chrUn_gl000216
OK
Checking sequence chrUn_gl000217 ... caching chrUn_gl000217
OK
Checking sequence chrUn_gl000218 ... caching chrUn_gl000218
uncaching chrUn_gl000217
uncaching chrUn_gl000216
uncaching chrUn_gl000215
uncaching chrUn_gl000214
uncaching chrUn_gl000213
uncaching chrUn_gl000212
uncaching chrUn_gl000211
uncaching chr21_gl000210_random
uncaching chr19_gl000209_random
uncaching chr19_gl000208_random
uncaching chr18_gl000207_random
uncaching chr17_gl000206_random
OK
Checking sequence chrUn_gl000219 ... caching chrUn_gl000219
OK
Checking sequence chrUn_gl000220 ... caching chrUn_gl000220
OK
Checking sequence chrUn_gl000221 ... caching chrUn_gl000221
OK
Checking sequence chrUn_gl000222 ... caching chrUn_gl000222
OK
Checking sequence chrUn_gl000223 ... caching chrUn_gl000223
OK
Checking sequence chrUn_gl000224 ... caching chrUn_gl000224
OK
Checking sequence chrUn_gl000225 ... caching chrUn_gl000225
OK
Checking sequence chrUn_gl000226 ... caching chrUn_gl000226
OK
Checking sequence chrUn_gl000227 ... caching chrUn_gl000227
OK
Checking sequence chrUn_gl000228 ... caching chrUn_gl000228
OK
Checking sequence chrUn_gl000229 ... caching chrUn_gl000229
OK
Checking sequence chrUn_gl000230 ... caching chrUn_gl000230
OK
Checking sequence chrUn_gl000231 ... caching chrUn_gl000231
OK
Checking sequence chrUn_gl000232 ... caching chrUn_gl000232
uncaching chrUn_gl000231
uncaching chrUn_gl000230
uncaching chrUn_gl000229
uncaching chrUn_gl000228
uncaching chrUn_gl000227
uncaching chrUn_gl000226
uncaching chrUn_gl000225
uncaching chrUn_gl000224
uncaching chrUn_gl000223
uncaching chrUn_gl000222
uncaching chrUn_gl000221
uncaching chrUn_gl000220
uncaching chrUn_gl000219
OK
Checking sequence chrUn_gl000233 ... caching chrUn_gl000233
OK
Checking sequence chrUn_gl000234 ... caching chrUn_gl000234
OK
Checking sequence chrUn_gl000235 ... caching chrUn_gl000235
OK
Checking sequence chrUn_gl000236 ... caching chrUn_gl000236
OK
Checking sequence chrUn_gl000237 ... caching chrUn_gl000237
OK
Checking sequence chrUn_gl000238 ... caching chrUn_gl000238
OK
Checking sequence chrUn_gl000239 ... caching chrUn_gl000239
OK
Checking sequence chrUn_gl000240 ... caching chrUn_gl000240
OK
Checking sequence chrUn_gl000241 ... caching chrUn_gl000241
OK
Checking sequence chrUn_gl000242 ... caching chrUn_gl000242
OK
Checking sequence chrUn_gl000243 ... caching chrUn_gl000243
OK
Checking sequence chrUn_gl000244 ... caching chrUn_gl000244
OK
Checking sequence chrUn_gl000245 ... caching chrUn_gl000245
OK
Checking sequence chrUn_gl000246 ... uncaching chrUn_gl000244
uncaching chrUn_gl000243
uncaching chrUn_gl000242
uncaching chrUn_gl000241
uncaching chrUn_gl000240
uncaching chrUn_gl000239
uncaching chrUn_gl000238
uncaching chrUn_gl000237
uncaching chrUn_gl000236
uncaching chrUn_gl000235
uncaching chrUn_gl000234
uncaching chrUn_gl000233
caching chrUn_gl000246
OK
Checking sequence chrUn_gl000247 ... caching chrUn_gl000247
OK
Checking sequence chrUn_gl000248 ... caching chrUn_gl000248
OK
Checking sequence chrUn_gl000249 ... caching chrUn_gl000249
OK
>
> ## See the GenomeSearching vignette in the BSgenome software
> ## package for some examples of genome-wide motif searching using
> ## Biostrings and the BSgenome data packages:
> #if (interactive())
> vignette("GenomeSearching", package="BSgenome")