Last data update: 2014.03.03

R: BSgenome utilities
BSgenome-utilsR Documentation

BSgenome utilities

Description

Utilities for BSgenome objects.

Usage

## S4 method for signature 'BSgenome'
matchPWM(pwm, subject, min.score = "80%", exclude = "",
       maskList = logical(0))
## S4 method for signature 'BSgenome'
countPWM(pwm, subject, min.score = "80%", exclude = "", 
       maskList = logical(0))
## S4 method for signature 'BSgenome'
vmatchPattern(pattern, subject, max.mismatch = 0, min.mismatch = 0,
            with.indels = FALSE, fixed = TRUE, algorithm = "auto",
            exclude = "", maskList = logical(0),  userMask =
               RangesList(), invertUserMask = FALSE)
## S4 method for signature 'BSgenome'
vcountPattern(pattern, subject, max.mismatch = 0, min.mismatch = 0,
            with.indels = FALSE, fixed = TRUE, algorithm = "auto",
            exclude = "", maskList = logical(0),  userMask =
               RangesList(), invertUserMask = FALSE)
## S4 method for signature 'BSgenome'
vmatchPDict(pdict, subject, max.mismatch = 0, min.mismatch = 0,
          fixed = TRUE, algorithm = "auto", verbose = FALSE,
          exclude = "", maskList = logical(0))
## S4 method for signature 'BSgenome'
vcountPDict(pdict, subject, max.mismatch = 0, min.mismatch = 0,
          fixed = TRUE, algorithm = "auto", collapse = FALSE,
          weight = 1L, verbose = FALSE, exclude = "", maskList = logical(0))

Arguments

pwm

A numeric matrix with row names A, C, G and T representing a Position Weight Matrix.

pattern

A DNAString object containing the pattern sequence.

pdict

A DNAStringSet object containing the pattern sequences.

subject

A BSgenome object containing the subject sequences.

min.score

The minimum score for counting a match. Can be given as a character string containing a percentage (e.g. "85%") of the highest possible score or as a single number.

max.mismatch, min.mismatch

The maximum and minimum number of mismatching letters allowed (see ?`lowlevel-matching` for the details). If non-zero, an inexact matching algorithm is used.

with.indels

If TRUE then indels are allowed. In that case, min.mismatch must be 0 and max.mismatch is interpreted as the maximum "edit distance" allowed between any pattern and any of its matches (see ?`matchPattern` for the details).

fixed

If FALSE then IUPAC extended letters are interpreted as ambiguities (see ?`lowlevel-matching` for the details).

algorithm

For vmatchPattern and vcountPattern one of the following: "auto", "naive-exact", "naive-inexact", "boyer-moore", "shift-or", or "indels".

For vmatchPDict and vcountPDict one of the following: "auto", "naive-exact", "naive-inexact", "boyer-moore", or "shift-or".

collapse, weight

ignored arguments.

verbose

TRUE or FALSE.

exclude

A character vector with strings that will be used to filter out chromosomes whose names match these strings.

maskList

A named logical vector of maskStates preferred when used with a BSGenome object. When using the bsapply function, the masks will be set to the states in this vector.

userMask

A RangesList, containing a mask to be applied to each chromosome. See bsapply.

invertUserMask

Whether the userMask should be inverted.

Value

A GRanges object for matchPWM with two elementMetadata columns: "score" (numeric), and "string" (DNAStringSet).

A GRanges object for vmatchPattern.

A GRanges object for vmatchPDict with one elementMetadata column: "index", which represents a mapping to a position in the original pattern dictionary.

A data.frame object for countPWM and vcountPattern with three columns: "seqname" (factor), "strand" (factor), and "count" (integer).

A DataFrame object for vcountPDict with four columns: "seqname" ('factor' Rle), "strand" ('factor' Rle), "index" (integer) and "count" ('integer' Rle). As with vmatchPDict the index column represents a mapping to a position in the original pattern dictionary.

Author(s)

P. Aboyoun

See Also

matchPWM, matchPattern, matchPDict, bsapply

Examples

  library(BSgenome.Celegans.UCSC.ce2)
  data(HNF4alpha)

  pwm <- PWM(HNF4alpha)
  matchPWM(pwm, Celegans)
  countPWM(pwm, Celegans)

  pattern <- consensusString(HNF4alpha)
  vmatchPattern(pattern, Celegans, fixed = "subject")
  vcountPattern(pattern, Celegans, fixed = "subject")

  vmatchPDict(HNF4alpha[1:10], Celegans)
  vcountPDict(HNF4alpha[1:10], Celegans)

Results


R version 3.3.1 (2016-06-21) -- "Bug in Your Hair"
Copyright (C) 2016 The R Foundation for Statistical Computing
Platform: x86_64-pc-linux-gnu (64-bit)

R is free software and comes with ABSOLUTELY NO WARRANTY.
You are welcome to redistribute it under certain conditions.
Type 'license()' or 'licence()' for distribution details.

R is a collaborative project with many contributors.
Type 'contributors()' for more information and
'citation()' on how to cite R or R packages in publications.

Type 'demo()' for some demos, 'help()' for on-line help, or
'help.start()' for an HTML browser interface to help.
Type 'q()' to quit R.

> library(BSgenome)
Loading required package: BiocGenerics
Loading required package: parallel

Attaching package: 'BiocGenerics'

The following objects are masked from 'package:parallel':

    clusterApply, clusterApplyLB, clusterCall, clusterEvalQ,
    clusterExport, clusterMap, parApply, parCapply, parLapply,
    parLapplyLB, parRapply, parSapply, parSapplyLB

The following objects are masked from 'package:stats':

    IQR, mad, xtabs

The following objects are masked from 'package:base':

    Filter, Find, Map, Position, Reduce, anyDuplicated, append,
    as.data.frame, cbind, colnames, do.call, duplicated, eval, evalq,
    get, grep, grepl, intersect, is.unsorted, lapply, lengths, mapply,
    match, mget, order, paste, pmax, pmax.int, pmin, pmin.int, rank,
    rbind, rownames, sapply, setdiff, sort, table, tapply, union,
    unique, unsplit

Loading required package: S4Vectors
Loading required package: stats4

Attaching package: 'S4Vectors'

The following objects are masked from 'package:base':

    colMeans, colSums, expand.grid, rowMeans, rowSums

Loading required package: IRanges
Loading required package: GenomeInfoDb
Loading required package: GenomicRanges
Loading required package: Biostrings
Loading required package: XVector
Loading required package: rtracklayer
> png(filename="/home/ddbj/snapshot/RGM3/R_BC/result/BSgenome/BSgenome-utils.Rd_%03d_medium.png", width=480, height=480)
> ### Name: BSgenome-utils
> ### Title: BSgenome utilities
> ### Aliases: BSgenome-utils matchPWM,BSgenome-method
> ###   countPWM,BSgenome-method vmatchPattern,BSgenome-method
> ###   vcountPattern,BSgenome-method vmatchPDict,BSgenome-method
> ###   vcountPDict,BSgenome-method
> ### Keywords: methods utilities
> 
> ### ** Examples
> 
>   library(BSgenome.Celegans.UCSC.ce2)
>   data(HNF4alpha)
> 
>   pwm <- PWM(HNF4alpha)
>   matchPWM(pwm, Celegans)
GRanges object with 154706 ranges and 2 metadata columns:
           seqnames         ranges strand |     score         string
              <Rle>      <IRanges>  <Rle> | <numeric> <DNAStringSet>
       [1]     chrI [ 3596,  3608]      + | 0.8017528  GGGGCAAAATTAG
       [2]     chrI [ 3880,  3892]      + | 0.8291304  GGATTAGAGTTCT
       [3]     chrI [ 5158,  5170]      + | 0.8208024  ATTTCAAAGTTTT
       [4]     chrI [ 6128,  6140]      + | 0.8540097  GGTGCAAACGTCT
       [5]     chrI [10039, 10051]      + | 0.8078732  GGTCTAAAATTCT
       ...      ...            ...    ... .       ...            ...
  [154702]     chrM [ 2382,  2394]      - | 0.8014127  CTGGCAAACTCCA
  [154703]     chrM [ 2578,  2590]      - | 0.8197826  TGGTAAAAGTTTA
  [154704]     chrM [ 7934,  7946]      - | 0.9522937  AGATCAAAGTCCA
  [154705]     chrM [10543, 10555]      - | 0.8020190  ATCACAAAGGTCG
  [154706]     chrM [10816, 10828]      - | 0.8612522  AGGGCAGAAGCCA
  -------
  seqinfo: 7 sequences from an unspecified genome
Warning messages:
1: In .Call2("XString_match_PWM", pwm, subject, min.score, count.only,  :
  'subject' contains letters not in [ACGT] ==> assigned weight 0 to them
2: In .Call2("XString_match_PWM", pwm, subject, min.score, count.only,  :
  'subject' contains letters not in [ACGT] ==> assigned weight 0 to them
3: In .Call2("XString_match_PWM", pwm, subject, min.score, count.only,  :
  'subject' contains letters not in [ACGT] ==> assigned weight 0 to them
4: In .Call2("XString_match_PWM", pwm, subject, min.score, count.only,  :
  'subject' contains letters not in [ACGT] ==> assigned weight 0 to them
>   countPWM(pwm, Celegans)
   seqname strand count
1     chrI      + 11234
2     chrI      - 11168
3    chrII      + 12082
4    chrII      - 12203
5   chrIII      + 10342
6   chrIII      - 10360
7    chrIV      + 12878
8    chrIV      - 13188
9     chrV      + 16688
10    chrV      - 16372
11    chrX      + 14139
12    chrX      - 14040
13    chrM      +     7
14    chrM      -     5
Warning messages:
1: In .Call2("XString_match_PWM", pwm, subject, min.score, count.only,  :
  'subject' contains letters not in [ACGT] ==> assigned weight 0 to them
2: In .Call2("XString_match_PWM", pwm, subject, min.score, count.only,  :
  'subject' contains letters not in [ACGT] ==> assigned weight 0 to them
3: In .Call2("XString_match_PWM", pwm, subject, min.score, count.only,  :
  'subject' contains letters not in [ACGT] ==> assigned weight 0 to them
4: In .Call2("XString_match_PWM", pwm, subject, min.score, count.only,  :
  'subject' contains letters not in [ACGT] ==> assigned weight 0 to them
> 
>   pattern <- consensusString(HNF4alpha)
>   vmatchPattern(pattern, Celegans, fixed = "subject")
GRanges object with 123 ranges and 0 metadata columns:
        seqnames               ranges strand
           <Rle>            <IRanges>  <Rle>
    [1]     chrI   [5249651, 5249663]      +
    [2]     chrI   [5858867, 5858879]      +
    [3]     chrI   [6956753, 6956765]      +
    [4]     chrI   [8927423, 8927435]      +
    [5]     chrI   [9260962, 9260974]      +
    ...      ...                  ...    ...
  [119]     chrX [ 9436287,  9436299]      -
  [120]     chrX [10737397, 10737409]      -
  [121]     chrX [12030747, 12030759]      -
  [122]     chrX [13784772, 13784784]      -
  [123]     chrX [16717792, 16717804]      -
  -------
  seqinfo: 7 sequences from an unspecified genome
>   vcountPattern(pattern, Celegans, fixed = "subject")
   seqname strand count
1     chrI      +     7
2     chrI      -     7
3    chrII      +    12
4    chrII      -     8
5   chrIII      +     8
6   chrIII      -     9
7    chrIV      +    15
8    chrIV      -     9
9     chrV      +     7
10    chrV      -    13
11    chrX      +    15
12    chrX      -    13
13    chrM      +     0
14    chrM      -     0
> 
>   vmatchPDict(HNF4alpha[1:10], Celegans)
GRanges object with 14 ranges and 1 metadata column:
       seqnames               ranges strand | index
          <Rle>            <IRanges>  <Rle> | <Rle>
   [1]     chrI [10714238, 10714250]      + |     1
   [2]     chrI [10242063, 10242075]      - |     1
   [3]     chrI [  995608,   995620]      - |     3
   [4]   chrIII [  360758,   360770]      + |     1
   [5]   chrIII [ 9996856,  9996868]      - |     1
   ...      ...                  ...    ... .   ...
  [10]     chrV [19656881, 19656893]      + |     2
  [11]     chrV [ 2789885,  2789897]      - |     1
  [12]     chrV [ 6905706,  6905718]      - |     1
  [13]     chrV [ 8901596,  8901608]      - |     1
  [14]     chrX [ 8870857,  8870869]      + |     8
  -------
  seqinfo: 7 sequences from an unspecified genome
>   vcountPDict(HNF4alpha[1:10], Celegans)
DataFrame with 140 rows and 4 columns
    seqname strand     index count
      <Rle>  <Rle> <integer> <Rle>
1      chrI      +         1     1
2      chrI      +         2     0
3      chrI      +         3     0
4      chrI      +         4     0
5      chrI      +         5     0
...     ...    ...       ...   ...
136    chrM      -         6     0
137    chrM      -         7     0
138    chrM      -         8     0
139    chrM      -         9     0
140    chrM      -        10     0
> 
> 
> 
> 
> 
> dev.off()
null device 
          1 
>