R: Occurrence of sequence patterns in a set of ordered sequences
getPatternOccurrenceList
R Documentation
Occurrence of sequence patterns in a set of ordered sequences
Description
Finds positions of specified sequence patterns in a list of sequences of the
same length ordered by a provided index. Sequence patterns can be consensus
sequences of variable length and can contain IUPAC ambiguity code. Position of
each pattern occurrence is specified in two-dimensional matrix, i.e. the
first coordinate provides the ordinal number of the sequence and the second
coordinate gives the position within the sequence where the pattern occurs.
A DNAStringSet object. Set of sequences of the same length
in which to search for the patterns.
patterns
Character vector specifying one or more DNA sequence patterns
(oligonucleotides). IUPAC ambiguity codes can be used and will match any
letter in the subject that is associated with the code.
seqOrder
Integer vector specifying the order of the provided input sequences.
Must have the same length as the number of sequences in the
regionSeq. The default value will order the sequences as they are
ordered in the regionSeq object.
useMulticore
Logical, should multicore be used. useMulticore = TRUE is
supported only on Unix-like platforms.
nrCores
Number of cores to use when useMulticore = TRUE. Default value
NULL uses all detected cores.
Details
This function uses the matchPattern function to find
occurrences of given sequence patterns in a set of input sequences.
Input sequences must all be of the same length and are ordered according to
the index provided in the seqOrder argument, creating a n * m
matrix, where n is the number of sequences and m is the length
of the sequences. Positions of pattern matches in the resulting matrix are
returned as two-dimensional coordinates.
Value
The function returns a named list with one element for each sequence pattern
specified in the patterns argument. Each element of the list is a
data.frame with positions of the corresponding pattern in the set of
input sequences. The input sequences of the same length are sorted according
to the index in seqOrder argument and the positions of pattern
matches in the resulting n * m matrix (where n is the number
of sequences and m is the length of the sequence) are provided. The
sequence column in the data.frame provides the ordinal number of the
sequence in the ordered list of sequences and the position column
provides the start position of the pattern match within that sequence.
R version 3.3.1 (2016-06-21) -- "Bug in Your Hair"
Copyright (C) 2016 The R Foundation for Statistical Computing
Platform: x86_64-pc-linux-gnu (64-bit)
R is free software and comes with ABSOLUTELY NO WARRANTY.
You are welcome to redistribute it under certain conditions.
Type 'license()' or 'licence()' for distribution details.
R is a collaborative project with many contributors.
Type 'contributors()' for more information and
'citation()' on how to cite R or R packages in publications.
Type 'demo()' for some demos, 'help()' for on-line help, or
'help.start()' for an HTML browser interface to help.
Type 'q()' to quit R.
> library(seqPattern)
> png(filename="/home/ddbj/snapshot/RGM3/R_BC/result/seqPattern/getPatternOccurrenceList.Rd_%03d_medium.png", width=480, height=480)
> ### Name: getPatternOccurrenceList
> ### Title: Occurrence of sequence patterns in a set of ordered sequences
> ### Aliases: getPatternOccurrenceList
> ### getPatternOccurrenceList,DNAStringSet-method
>
> ### ** Examples
>
> library(GenomicRanges)
Loading required package: BiocGenerics
Loading required package: parallel
Attaching package: 'BiocGenerics'
The following objects are masked from 'package:parallel':
clusterApply, clusterApplyLB, clusterCall, clusterEvalQ,
clusterExport, clusterMap, parApply, parCapply, parLapply,
parLapplyLB, parRapply, parSapply, parSapplyLB
The following objects are masked from 'package:stats':
IQR, mad, xtabs
The following objects are masked from 'package:base':
Filter, Find, Map, Position, Reduce, anyDuplicated, append,
as.data.frame, cbind, colnames, do.call, duplicated, eval, evalq,
get, grep, grepl, intersect, is.unsorted, lapply, lengths, mapply,
match, mget, order, paste, pmax, pmax.int, pmin, pmin.int, rank,
rbind, rownames, sapply, setdiff, sort, table, tapply, union,
unique, unsplit
Loading required package: S4Vectors
Loading required package: stats4
Attaching package: 'S4Vectors'
The following objects are masked from 'package:base':
colMeans, colSums, expand.grid, rowMeans, rowSums
Loading required package: IRanges
Loading required package: GenomeInfoDb
> load(system.file("data", "zebrafishPromoters.RData", package="seqPattern"))
> promoterWidth <- elementMetadata(zebrafishPromoters)$interquantileWidth
>
> # dinucleotide patterns
> patternsOccurrence <- getPatternOccurrenceList(regionsSeq = zebrafishPromoters,
+ patterns = c("TA", "GC"), seqOrder = order(promoterWidth))
> names(patternsOccurrence)
[1] "TA" "GC"
> head(patternsOccurrence[["GC"]])
sequence position value
1 1 16 1
2 1 33 1
3 1 48 1
4 1 98 1
5 1 114 1
6 1 118 1
>
> # motif consensus sequence
> patternsOccurrence <- getPatternOccurrenceList(regionsSeq = zebrafishPromoters,
+ patterns = "TATAWAWR", seqOrder = order(promoterWidth))
> names(patternsOccurrence)
[1] "TATAWAWR"
> head(patternsOccurrence[["TATAWAWR"]])
sequence position value
1 2 54 1
2 2 56 1
3 2 58 1
4 2 60 1
5 2 62 1
6 2 64 1
>
>
>
>
>
> dev.off()
null device
1
>