Last data update: 2014.03.03

R: Occurrence of sequence patterns in a set of ordered sequences
getPatternOccurrenceListR Documentation

Occurrence of sequence patterns in a set of ordered sequences

Description

Finds positions of specified sequence patterns in a list of sequences of the same length ordered by a provided index. Sequence patterns can be consensus sequences of variable length and can contain IUPAC ambiguity code. Position of each pattern occurrence is specified in two-dimensional matrix, i.e. the first coordinate provides the ordinal number of the sequence and the second coordinate gives the position within the sequence where the pattern occurs.

Usage

getPatternOccurrenceList(regionsSeq, patterns, seqOrder =
        c(1:length(regionsSeq)), useMulticore = FALSE, nrCores = NULL)

Arguments

regionsSeq

A DNAStringSet object. Set of sequences of the same length in which to search for the patterns.

patterns

Character vector specifying one or more DNA sequence patterns (oligonucleotides). IUPAC ambiguity codes can be used and will match any letter in the subject that is associated with the code.

seqOrder

Integer vector specifying the order of the provided input sequences. Must have the same length as the number of sequences in the regionSeq. The default value will order the sequences as they are ordered in the regionSeq object.

useMulticore

Logical, should multicore be used. useMulticore = TRUE is supported only on Unix-like platforms.

nrCores

Number of cores to use when useMulticore = TRUE. Default value NULL uses all detected cores.

Details

This function uses the matchPattern function to find occurrences of given sequence patterns in a set of input sequences. Input sequences must all be of the same length and are ordered according to the index provided in the seqOrder argument, creating a n * m matrix, where n is the number of sequences and m is the length of the sequences. Positions of pattern matches in the resulting matrix are returned as two-dimensional coordinates.

Value

The function returns a named list with one element for each sequence pattern specified in the patterns argument. Each element of the list is a data.frame with positions of the corresponding pattern in the set of input sequences. The input sequences of the same length are sorted according to the index in seqOrder argument and the positions of pattern matches in the resulting n * m matrix (where n is the number of sequences and m is the length of the sequence) are provided. The sequence column in the data.frame provides the ordinal number of the sequence in the ordered list of sequences and the position column provides the start position of the pattern match within that sequence.

Author(s)

Vanja Haberle

See Also

plotPatternDensityMap
motifScanHits

Examples

library(GenomicRanges)
load(system.file("data", "zebrafishPromoters.RData", package="seqPattern"))
promoterWidth <- elementMetadata(zebrafishPromoters)$interquantileWidth

# dinucleotide patterns
patternsOccurrence <- getPatternOccurrenceList(regionsSeq = zebrafishPromoters,
                    patterns = c("TA", "GC"), seqOrder = order(promoterWidth))
names(patternsOccurrence)
head(patternsOccurrence[["GC"]])

# motif consensus sequence
patternsOccurrence <- getPatternOccurrenceList(regionsSeq = zebrafishPromoters,
                    patterns = "TATAWAWR", seqOrder = order(promoterWidth))
names(patternsOccurrence)
head(patternsOccurrence[["TATAWAWR"]])

Results


R version 3.3.1 (2016-06-21) -- "Bug in Your Hair"
Copyright (C) 2016 The R Foundation for Statistical Computing
Platform: x86_64-pc-linux-gnu (64-bit)

R is free software and comes with ABSOLUTELY NO WARRANTY.
You are welcome to redistribute it under certain conditions.
Type 'license()' or 'licence()' for distribution details.

R is a collaborative project with many contributors.
Type 'contributors()' for more information and
'citation()' on how to cite R or R packages in publications.

Type 'demo()' for some demos, 'help()' for on-line help, or
'help.start()' for an HTML browser interface to help.
Type 'q()' to quit R.

> library(seqPattern)
> png(filename="/home/ddbj/snapshot/RGM3/R_BC/result/seqPattern/getPatternOccurrenceList.Rd_%03d_medium.png", width=480, height=480)
> ### Name: getPatternOccurrenceList
> ### Title: Occurrence of sequence patterns in a set of ordered sequences
> ### Aliases: getPatternOccurrenceList
> ###   getPatternOccurrenceList,DNAStringSet-method
> 
> ### ** Examples
> 
> library(GenomicRanges)
Loading required package: BiocGenerics
Loading required package: parallel

Attaching package: 'BiocGenerics'

The following objects are masked from 'package:parallel':

    clusterApply, clusterApplyLB, clusterCall, clusterEvalQ,
    clusterExport, clusterMap, parApply, parCapply, parLapply,
    parLapplyLB, parRapply, parSapply, parSapplyLB

The following objects are masked from 'package:stats':

    IQR, mad, xtabs

The following objects are masked from 'package:base':

    Filter, Find, Map, Position, Reduce, anyDuplicated, append,
    as.data.frame, cbind, colnames, do.call, duplicated, eval, evalq,
    get, grep, grepl, intersect, is.unsorted, lapply, lengths, mapply,
    match, mget, order, paste, pmax, pmax.int, pmin, pmin.int, rank,
    rbind, rownames, sapply, setdiff, sort, table, tapply, union,
    unique, unsplit

Loading required package: S4Vectors
Loading required package: stats4

Attaching package: 'S4Vectors'

The following objects are masked from 'package:base':

    colMeans, colSums, expand.grid, rowMeans, rowSums

Loading required package: IRanges
Loading required package: GenomeInfoDb
> load(system.file("data", "zebrafishPromoters.RData", package="seqPattern"))
> promoterWidth <- elementMetadata(zebrafishPromoters)$interquantileWidth
> 
> # dinucleotide patterns
> patternsOccurrence <- getPatternOccurrenceList(regionsSeq = zebrafishPromoters,
+                     patterns = c("TA", "GC"), seqOrder = order(promoterWidth))
> names(patternsOccurrence)
[1] "TA" "GC"
> head(patternsOccurrence[["GC"]])
  sequence position value
1        1       16     1
2        1       33     1
3        1       48     1
4        1       98     1
5        1      114     1
6        1      118     1
> 
> # motif consensus sequence
> patternsOccurrence <- getPatternOccurrenceList(regionsSeq = zebrafishPromoters,
+                     patterns = "TATAWAWR", seqOrder = order(promoterWidth))
> names(patternsOccurrence)
[1] "TATAWAWR"
> head(patternsOccurrence[["TATAWAWR"]])
  sequence position value
1        2       54     1
2        2       56     1
3        2       58     1
4        2       60     1
5        2       62     1
6        2       64     1
> 
> 
> 
> 
> 
> dev.off()
null device 
          1 
>