Last data update: 2014.03.03
R: BSgenome utilities
BSgenome-utils R Documentation
BSgenome utilities
Description
Utilities for BSgenome objects.
Usage
## S4 method for signature 'BSgenome'
matchPWM(pwm, subject, min.score = "80%", exclude = "",
maskList = logical(0))
## S4 method for signature 'BSgenome'
countPWM(pwm, subject, min.score = "80%", exclude = "",
maskList = logical(0))
## S4 method for signature 'BSgenome'
vmatchPattern(pattern, subject, max.mismatch = 0, min.mismatch = 0,
with.indels = FALSE, fixed = TRUE, algorithm = "auto",
exclude = "", maskList = logical(0), userMask =
RangesList(), invertUserMask = FALSE)
## S4 method for signature 'BSgenome'
vcountPattern(pattern, subject, max.mismatch = 0, min.mismatch = 0,
with.indels = FALSE, fixed = TRUE, algorithm = "auto",
exclude = "", maskList = logical(0), userMask =
RangesList(), invertUserMask = FALSE)
## S4 method for signature 'BSgenome'
vmatchPDict(pdict, subject, max.mismatch = 0, min.mismatch = 0,
fixed = TRUE, algorithm = "auto", verbose = FALSE,
exclude = "", maskList = logical(0))
## S4 method for signature 'BSgenome'
vcountPDict(pdict, subject, max.mismatch = 0, min.mismatch = 0,
fixed = TRUE, algorithm = "auto", collapse = FALSE,
weight = 1L, verbose = FALSE, exclude = "", maskList = logical(0))
Arguments
pwm
A numeric matrix with row names A, C, G and T representing a Position
Weight Matrix.
pattern
A DNAString object containing the pattern sequence.
pdict
A DNAStringSet object containing the pattern sequences.
subject
A BSgenome object containing the subject sequences.
min.score
The minimum score for counting a match.
Can be given as a character string containing a percentage (e.g.
"85%"
) of the highest possible score or as a single number.
max.mismatch, min.mismatch
The maximum and minimum number of mismatching letters allowed (see
?`lowlevel-matching`
for the details).
If non-zero, an inexact matching algorithm is used.
with.indels
If TRUE
then indels are allowed. In that case, min.mismatch
must be 0
and max.mismatch
is interpreted as the maximum
"edit distance" allowed between any pattern and any of its matches
(see ?`matchPattern`
for the details).
fixed
If FALSE
then IUPAC extended letters are interpreted as ambiguities
(see ?`lowlevel-matching`
for the details).
algorithm
For vmatchPattern
and vcountPattern
one of the following:
"auto"
, "naive-exact"
, "naive-inexact"
,
"boyer-moore"
, "shift-or"
, or "indels"
.
For vmatchPDict
and vcountPDict
one of the following:
"auto"
, "naive-exact"
, "naive-inexact"
,
"boyer-moore"
, or "shift-or"
.
collapse, weight
ignored arguments.
verbose
TRUE
or FALSE
.
exclude
A character vector with strings that will be used to filter out
chromosomes whose names match these strings.
maskList
A named logical vector of maskStates preferred when used with a
BSGenome object. When using the bsapply function, the masks will
be set to the states in this vector.
userMask
A RangesList, containing a mask to be applied to each
chromosome. See bsapply
.
invertUserMask
Whether the userMask
should be inverted.
Value
A GRanges object for matchPWM
with two
elementMetadata columns: "score" (numeric), and "string" (DNAStringSet).
A GRanges object for vmatchPattern
.
A GRanges object for vmatchPDict
with
one elementMetadata column: "index", which represents a mapping to a
position in the original pattern dictionary.
A data.frame object for countPWM
and vcountPattern
with three columns: "seqname" (factor), "strand" (factor), and
"count" (integer).
A DataFrame object for vcountPDict
with four
columns: "seqname" ('factor' Rle), "strand" ('factor' Rle),
"index" (integer) and "count" ('integer' Rle). As with vmatchPDict
the index column represents a mapping to a position in the original
pattern dictionary.
Author(s)
P. Aboyoun
See Also
matchPWM
,
matchPattern
,
matchPDict
,
bsapply
Examples
library(BSgenome.Celegans.UCSC.ce2)
data(HNF4alpha)
pwm <- PWM(HNF4alpha)
matchPWM(pwm, Celegans)
countPWM(pwm, Celegans)
pattern <- consensusString(HNF4alpha)
vmatchPattern(pattern, Celegans, fixed = "subject")
vcountPattern(pattern, Celegans, fixed = "subject")
vmatchPDict(HNF4alpha[1:10], Celegans)
vcountPDict(HNF4alpha[1:10], Celegans)
Results
R version 3.3.1 (2016-06-21) -- "Bug in Your Hair"
Copyright (C) 2016 The R Foundation for Statistical Computing
Platform: x86_64-pc-linux-gnu (64-bit)
R is free software and comes with ABSOLUTELY NO WARRANTY.
You are welcome to redistribute it under certain conditions.
Type 'license()' or 'licence()' for distribution details.
R is a collaborative project with many contributors.
Type 'contributors()' for more information and
'citation()' on how to cite R or R packages in publications.
Type 'demo()' for some demos, 'help()' for on-line help, or
'help.start()' for an HTML browser interface to help.
Type 'q()' to quit R.
> library(BSgenome)
Loading required package: BiocGenerics
Loading required package: parallel
Attaching package: 'BiocGenerics'
The following objects are masked from 'package:parallel':
clusterApply, clusterApplyLB, clusterCall, clusterEvalQ,
clusterExport, clusterMap, parApply, parCapply, parLapply,
parLapplyLB, parRapply, parSapply, parSapplyLB
The following objects are masked from 'package:stats':
IQR, mad, xtabs
The following objects are masked from 'package:base':
Filter, Find, Map, Position, Reduce, anyDuplicated, append,
as.data.frame, cbind, colnames, do.call, duplicated, eval, evalq,
get, grep, grepl, intersect, is.unsorted, lapply, lengths, mapply,
match, mget, order, paste, pmax, pmax.int, pmin, pmin.int, rank,
rbind, rownames, sapply, setdiff, sort, table, tapply, union,
unique, unsplit
Loading required package: S4Vectors
Loading required package: stats4
Attaching package: 'S4Vectors'
The following objects are masked from 'package:base':
colMeans, colSums, expand.grid, rowMeans, rowSums
Loading required package: IRanges
Loading required package: GenomeInfoDb
Loading required package: GenomicRanges
Loading required package: Biostrings
Loading required package: XVector
Loading required package: rtracklayer
> png(filename="/home/ddbj/snapshot/RGM3/R_BC/result/BSgenome/BSgenome-utils.Rd_%03d_medium.png", width=480, height=480)
> ### Name: BSgenome-utils
> ### Title: BSgenome utilities
> ### Aliases: BSgenome-utils matchPWM,BSgenome-method
> ### countPWM,BSgenome-method vmatchPattern,BSgenome-method
> ### vcountPattern,BSgenome-method vmatchPDict,BSgenome-method
> ### vcountPDict,BSgenome-method
> ### Keywords: methods utilities
>
> ### ** Examples
>
> library(BSgenome.Celegans.UCSC.ce2)
> data(HNF4alpha)
>
> pwm <- PWM(HNF4alpha)
> matchPWM(pwm, Celegans)
GRanges object with 154706 ranges and 2 metadata columns:
seqnames ranges strand | score string
<Rle> <IRanges> <Rle> | <numeric> <DNAStringSet>
[1] chrI [ 3596, 3608] + | 0.8017528 GGGGCAAAATTAG
[2] chrI [ 3880, 3892] + | 0.8291304 GGATTAGAGTTCT
[3] chrI [ 5158, 5170] + | 0.8208024 ATTTCAAAGTTTT
[4] chrI [ 6128, 6140] + | 0.8540097 GGTGCAAACGTCT
[5] chrI [10039, 10051] + | 0.8078732 GGTCTAAAATTCT
... ... ... ... . ... ...
[154702] chrM [ 2382, 2394] - | 0.8014127 CTGGCAAACTCCA
[154703] chrM [ 2578, 2590] - | 0.8197826 TGGTAAAAGTTTA
[154704] chrM [ 7934, 7946] - | 0.9522937 AGATCAAAGTCCA
[154705] chrM [10543, 10555] - | 0.8020190 ATCACAAAGGTCG
[154706] chrM [10816, 10828] - | 0.8612522 AGGGCAGAAGCCA
-------
seqinfo: 7 sequences from an unspecified genome
Warning messages:
1: In .Call2("XString_match_PWM", pwm, subject, min.score, count.only, :
'subject' contains letters not in [ACGT] ==> assigned weight 0 to them
2: In .Call2("XString_match_PWM", pwm, subject, min.score, count.only, :
'subject' contains letters not in [ACGT] ==> assigned weight 0 to them
3: In .Call2("XString_match_PWM", pwm, subject, min.score, count.only, :
'subject' contains letters not in [ACGT] ==> assigned weight 0 to them
4: In .Call2("XString_match_PWM", pwm, subject, min.score, count.only, :
'subject' contains letters not in [ACGT] ==> assigned weight 0 to them
> countPWM(pwm, Celegans)
seqname strand count
1 chrI + 11234
2 chrI - 11168
3 chrII + 12082
4 chrII - 12203
5 chrIII + 10342
6 chrIII - 10360
7 chrIV + 12878
8 chrIV - 13188
9 chrV + 16688
10 chrV - 16372
11 chrX + 14139
12 chrX - 14040
13 chrM + 7
14 chrM - 5
Warning messages:
1: In .Call2("XString_match_PWM", pwm, subject, min.score, count.only, :
'subject' contains letters not in [ACGT] ==> assigned weight 0 to them
2: In .Call2("XString_match_PWM", pwm, subject, min.score, count.only, :
'subject' contains letters not in [ACGT] ==> assigned weight 0 to them
3: In .Call2("XString_match_PWM", pwm, subject, min.score, count.only, :
'subject' contains letters not in [ACGT] ==> assigned weight 0 to them
4: In .Call2("XString_match_PWM", pwm, subject, min.score, count.only, :
'subject' contains letters not in [ACGT] ==> assigned weight 0 to them
>
> pattern <- consensusString(HNF4alpha)
> vmatchPattern(pattern, Celegans, fixed = "subject")
GRanges object with 123 ranges and 0 metadata columns:
seqnames ranges strand
<Rle> <IRanges> <Rle>
[1] chrI [5249651, 5249663] +
[2] chrI [5858867, 5858879] +
[3] chrI [6956753, 6956765] +
[4] chrI [8927423, 8927435] +
[5] chrI [9260962, 9260974] +
... ... ... ...
[119] chrX [ 9436287, 9436299] -
[120] chrX [10737397, 10737409] -
[121] chrX [12030747, 12030759] -
[122] chrX [13784772, 13784784] -
[123] chrX [16717792, 16717804] -
-------
seqinfo: 7 sequences from an unspecified genome
> vcountPattern(pattern, Celegans, fixed = "subject")
seqname strand count
1 chrI + 7
2 chrI - 7
3 chrII + 12
4 chrII - 8
5 chrIII + 8
6 chrIII - 9
7 chrIV + 15
8 chrIV - 9
9 chrV + 7
10 chrV - 13
11 chrX + 15
12 chrX - 13
13 chrM + 0
14 chrM - 0
>
> vmatchPDict(HNF4alpha[1:10], Celegans)
GRanges object with 14 ranges and 1 metadata column:
seqnames ranges strand | index
<Rle> <IRanges> <Rle> | <Rle>
[1] chrI [10714238, 10714250] + | 1
[2] chrI [10242063, 10242075] - | 1
[3] chrI [ 995608, 995620] - | 3
[4] chrIII [ 360758, 360770] + | 1
[5] chrIII [ 9996856, 9996868] - | 1
... ... ... ... . ...
[10] chrV [19656881, 19656893] + | 2
[11] chrV [ 2789885, 2789897] - | 1
[12] chrV [ 6905706, 6905718] - | 1
[13] chrV [ 8901596, 8901608] - | 1
[14] chrX [ 8870857, 8870869] + | 8
-------
seqinfo: 7 sequences from an unspecified genome
> vcountPDict(HNF4alpha[1:10], Celegans)
DataFrame with 140 rows and 4 columns
seqname strand index count
<Rle> <Rle> <integer> <Rle>
1 chrI + 1 1
2 chrI + 2 0
3 chrI + 3 0
4 chrI + 4 0
5 chrI + 5 0
... ... ... ... ...
136 chrM - 6 0
137 chrM - 7 0
138 chrM - 8 0
139 chrM - 9 0
140 chrM - 10 0
>
>
>
>
>
> dev.off()
null device
1
>