Last data update: 2014.03.03

motifScanScores    R Documentation

Motif scanning scores for a set of ordered sequences

Description

Provides motif scanning scores along the full length of each sequence in a set of equal-length sequences, ordered by a provided index. The motif is specified by a position weight matrix (PWM) that contains the estimated probability of base b at position i and is usually constructed via a call to the PWM function. Scanning scores are returned as a two-dimensional matrix in which the rows are sequences ordered by the specified index and the columns are relative positions within the sequence. Each cell contains the score of the specified motif in the given sequence starting at the given position. The resulting matrix can be used to visualise motif occurrences and their strength in an ordered set of sequences centered at a common reference point.

Usage

motifScanScores(regionsSeq, motifPWM, seqOrder = c(1:length(regionsSeq)),
        asPercentage = TRUE)

Arguments

regionsSeq

A DNAStringSet object. Set of sequences of the same length to be scanned with the motif.

motifPWM

A numeric matrix representing the Position Weight Matrix (PWM), such as that returned by the PWM function. It can contain either probabilities or log2 probability ratios of base b at position i.

seqOrder

Integer vector specifying the order of the provided input sequences. Its length must equal the number of sequences in regionsSeq. The default value keeps the sequences in the order in which they appear in the input regionsSeq object.

asPercentage

Logical. Should the scores be expressed as a percentage of the maximal motif PWM score (TRUE) or as raw scores (FALSE)?
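
As a rough illustration of what a percentage score means, the sketch below rescales raw scores by the largest score a PWM can produce (the sum of the per-column maxima). The toy matrix and the raw scores are invented for illustration, and the formula is an assumption about how the rescaling works, not a copy of the package's code.

```r
# Toy PWM (hypothetical values): rows are bases, columns are motif positions.
pwm <- matrix(c( 1, -1,  1, -1,
                -1, -1, -1, -1,
                -1, -1, -1, -1,
                -1,  1, -1,  1),
              nrow = 4, byrow = TRUE,
              dimnames = list(c("A", "C", "G", "T"), NULL))

# The best possible score takes the highest weight in every column.
max_score <- sum(apply(pwm, 2, max))

# Made-up raw scores for three start positions.
raw_scores <- c(4, 0, -4)

# Assumed rescaling: raw score as a percentage of the maximal score.
pct_scores <- 100 * raw_scores / max_score
print(pct_scores)
```

With log-ratio weights the raw score can be negative, so a percentage computed this way is not guaranteed to lie between 0 and 100.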

Details

This function uses the PWMscoreStartingAt function to compute the score of the given motif starting at each position (nucleotide) in each input sequence. Input sequences must all be of the same length and are ordered according to the index provided in the seqOrder argument, giving a matrix with n rows, where n is the number of sequences, and one column per scanned start position along sequences of length m. Each cell in the matrix contains the score of the specified motif in the given sequence starting at the given position. Note that a motif of width w fits only at start positions 1 to m - w + 1, so the matrix can have fewer than m columns; the example output on this page has 993 columns.
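
The scanning procedure described above can be sketched in base R without any Bioconductor dependency. This is a minimal illustration under stated assumptions, not the package's implementation: `scan_one` and `scan_scores` are hypothetical helper names, the sequences are plain character strings rather than a DNAStringSet, and the toy PWM values are invented.

```r
# Toy PWM (hypothetical values) that favours the motif "ATAT".
pwm <- matrix(c( 1, -1,  1, -1,
                -1, -1, -1, -1,
                -1, -1, -1, -1,
                -1,  1, -1,  1),
              nrow = 4, byrow = TRUE,
              dimnames = list(c("A", "C", "G", "T"), NULL))

# Three toy sequences of equal length.
seqs <- c(s1 = "GGATATGG", s2 = "ATATCCCC", s3 = "CCCCCCCC")

# Score the motif at every start position where it fits in one sequence.
scan_one <- function(s, pwm) {
  bases <- strsplit(s, "")[[1]]
  w <- ncol(pwm)
  starts <- seq_len(length(bases) - w + 1)
  vapply(starts, function(i) {
    window <- bases[i:(i + w - 1)]
    # Pick the weight of each observed base at its motif position and sum.
    sum(pwm[cbind(match(window, rownames(pwm)), seq_len(w))])
  }, numeric(1))
}

# Build the sequences-by-positions matrix, with rows in the requested order.
scan_scores <- function(seqs, pwm, seq_order = seq_along(seqs)) {
  stopifnot(length(unique(nchar(seqs))) == 1,
            length(seq_order) == length(seqs))
  n_starts <- nchar(seqs[[1]]) - ncol(pwm) + 1
  t(vapply(seqs[seq_order], scan_one, numeric(n_starts), pwm = pwm))
}

scores <- scan_scores(seqs, pwm, seq_order = c(2, 1, 3))
print(dim(scores))
print(scores)
```

Each row corresponds to one sequence in the order given by `seq_order`, and each column to a start position, mirroring the layout of the matrix described above.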

Value

The function returns a matrix with motif scanning scores for each position in the set of input sequences.
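
One simple way to explore a score matrix of this shape is to threshold it and list the positions that pass. The sketch below uses a small invented matrix of percentage scores and an arbitrary 80% cutoff; it is only an illustration and makes no claim about how motifScanHits identifies hits.

```r
# Made-up percentage scores: 3 sequences (rows) by 4 start positions (columns).
scores <- matrix(c(35, 92, 40, 81,
                   88, 20, 15, 60,
                   10, 30, 25, 45),
                 nrow = 3, byrow = TRUE)

# Arbitrary cutoff chosen for illustration only.
threshold <- 80

# Row and column indices of every cell at or above the cutoff.
hits <- which(scores >= threshold, arr.ind = TRUE)
hits <- hits[order(hits[, "row"], hits[, "col"]), , drop = FALSE]
print(hits)

# Number of positions passing the cutoff in each sequence.
hits_per_seq <- rowSums(scores >= threshold)
print(hits_per_seq)
```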

Author(s)

Vanja Haberle

See Also

plotMotifScanScores
motifScanHits

Examples

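library(seqPattern)
library(GenomicRanges)
# Load the example promoter sequences shipped with seqPattern
load(system.file("data", "zebrafishPromoters.RData", package="seqPattern"))
# Promoter width is used below to order the sequences
promoterWidth <- elementMetadata(zebrafishPromoters)$interquantileWidth

# Load the example PWM shipped with seqPattern
load(system.file("data", "TBPpwm.RData", package="seqPattern"))

# Scan every sequence with the PWM; rows are ordered by increasing promoter width
motifScores <- motifScanScores(regionsSeq = zebrafishPromoters,
                            motifPWM = TBPpwm, seqOrder = order(promoterWidth),
                            asPercentage = TRUE)
# Inspect the dimensions and the top-left corner of the score matrix
dim(motifScores)
motifScores[1:10,1:10]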

Results


R version 3.3.1 (2016-06-21) -- "Bug in Your Hair"

> library(seqPattern)
> ### Name: motifScanScores
> ### Title: Motif scanning scores for a set of ordered sequences
> ### Aliases: motifScanScores motifScanScores,DNAStringSet,matrix-method
> 
> ### ** Examples
> 
> library(GenomicRanges)
> load(system.file("data", "zebrafishPromoters.RData", package="seqPattern"))
> promoterWidth <- elementMetadata(zebrafishPromoters)$interquantileWidth
> 
> load(system.file("data", "TBPpwm.RData", package="seqPattern"))
> 
> motifScores <- motifScanScores(regionsSeq = zebrafishPromoters,
+                             motifPWM = TBPpwm, seqOrder = order(promoterWidth),
+                             asPercentage = TRUE)
There were 12 warnings (use warnings() to see them)
> dim(motifScores)
[1] 1000  993
> motifScores[1:10,1:10]
          [,1]     [,2]     [,3]     [,4]     [,5]     [,6]     [,7]     [,8]
 [1,] 65.51940 46.58582 66.84336 40.83923 66.98917 33.95287 66.98917 33.95287
 [2,] 41.05448 75.32756 47.61203 66.91131 41.05448 75.32756 47.61203 66.91131
 [3,] 34.12331 51.64957 61.58158 48.81109 69.67650 63.34393 53.19385 61.45189
 [4,] 64.14539 66.49088 48.89598 48.92937 66.98139 61.57374 66.54518 70.45384
 [5,] 54.13310 36.16330 55.59755 33.38717 69.08229 41.12145 85.11572 53.17832
 [6,] 52.83823 40.70590 54.79243 29.31002 48.70010 38.52361 40.74647 37.75773
 [7,] 72.58866 52.26739 65.75099 74.04721 79.17474 68.47018 67.30881 75.77011
 [8,] 64.06429 43.04457 69.61121 43.40458 48.93457 60.83204 60.61853 72.03804
 [9,] 46.00557 48.28349 58.66297 70.25927 45.78619 69.61317 46.38162 66.98139
[10,] 53.55870 48.51381 43.25460 50.25474 37.45123 58.85735 55.18240 45.15990
          [,9]    [,10]
 [1,] 66.98917 34.64860
 [2,] 41.05448 75.32756
 [3,] 63.78743 64.05097
 [4,] 63.55884 75.19737
 [5,] 81.84003 71.48858
 [6,] 35.71067 28.42917
 [7,] 52.60913 80.70319
 [8,] 64.70370 69.69974
 [9,] 65.32296 71.76799
[10,] 66.94353 53.16711