plotPatternDensityMap
R Documentation
Plotting density maps of sequence pattern occurrence
Description
Plots the density of sequence pattern occurrences in an ordered set of sequences of
the same length as a two-dimensional map centered at a common
reference position. Multiple sequence patterns can be processed at once; one
plot per pattern is created, with the same color scale across all plots,
allowing visual comparison of densities across different patterns.
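To illustrate what a shared color scale means, here is a minimal base R sketch (not seqPattern's internal code): the density matrices of all patterns are cut with one common set of breaks spanning the global range, so the same color stands for the same density in every plot. The helper sharedColorLevels(), the toy matrices and the white-to-blue palette are hypothetical.

```r
# Shared color scale across patterns (illustrative sketch, base R only).
# The palette and the number of color levels are hypothetical choices.
sharedColorLevels <- function(densityList, nColors = 100) {
  # one set of breaks spanning the global range over ALL patterns
  globalRange <- range(unlist(densityList))
  breaks <- seq(globalRange[1], globalRange[2], length.out = nColors + 1)
  # assign each density value to a color level using the same breaks
  lapply(densityList, function(d) {
    idx <- findInterval(d, breaks, rightmost.closed = TRUE, all.inside = TRUE)
    matrix(idx, nrow = nrow(d), ncol = ncol(d))
  })
}

# toy 2 x 2 density matrices; "GC" is uniformly denser than "TA"
dens <- list(TA = matrix(c(0, 0.05, 0.12, 0.25), nrow = 2),
             GC = matrix(c(0.55, 0.65, 0.85, 1.0), nrow = 2))
lev <- sharedColorLevels(dens, nColors = 10)
pal <- colorRampPalette(c("white", "darkblue"))(10)
colsTA <- matrix(pal[as.vector(lev$TA)], nrow = 2)  # colors for the "TA" map
```

Because the breaks are global, every "TA" cell lands on a lighter color than every "GC" cell; with per-pattern breaks both maps would span the full palette and could not be compared.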
Arguments
regionsSeq
A DNAStringSet object. Set of sequences of the same length
for which the pattern occurrence density should be visualised.
patterns
Character vector specifying one or more DNA sequence patterns
(oligonucleotides). IUPAC ambiguity codes can be used and will match any
letter in the subject that is associated with the code.
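A minimal base R sketch of IUPAC-aware matching, assuming patterns are translated into regular expressions; seqPattern works on DNAStringSet objects from Biostrings and may match differently. iupacToRegex() and matchStarts() are hypothetical helpers, and the zero-width lookahead is what lets overlapping occurrences be reported.

```r
# IUPAC-aware pattern matching with base R regular expressions
# (illustrative sketch; seqPattern/Biostrings may match differently).
iupac <- c(A = "A", C = "C", G = "G", T = "T",
           R = "[AG]", Y = "[CT]", S = "[CG]", W = "[AT]",
           K = "[GT]", M = "[AC]", B = "[CGT]", D = "[AGT]",
           H = "[ACT]", V = "[ACG]", N = "[ACGT]")

# translate e.g. "TATAWAWR" into "TATA[AT]A[AT][AG]"
iupacToRegex <- function(pattern) {
  codes <- strsplit(toupper(pattern), "")[[1]]
  parts <- iupac[codes]
  if (anyNA(parts)) stop("pattern contains a non-IUPAC letter")
  paste(parts, collapse = "")
}

# 1-based start positions of all (possibly overlapping) matches
matchStarts <- function(pattern, subject) {
  rx <- paste0("(?=", iupacToRegex(pattern), ")")  # zero-width lookahead
  hits <- gregexpr(rx, subject, perl = TRUE)[[1]]
  if (hits[1] == -1) integer(0) else as.integer(hits)
}

matchStarts("TATAWAWR", "GGTATAAAAGCC")  # 3
matchStarts("AA", "AAAA")                # 1 2 3
```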
seqOrder
Integer vector specifying the order of the provided input sequences.
Must have the same length as the number of sequences in
regionsSeq. Input sequences will be sorted according to this index
in ascending order from the top to the bottom of the plot, i.e.
the sequence labeled with the lowest number will appear at the top of
the plot. The default value orders the sequences as they are ordered
in the input regionsSeq object.
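As a sketch of how an ordering vector such as order(promoterWidth) from the examples arranges the rows of the map, assuming seqOrder is applied as an index into the input set (the widths and names below are made up):

```r
# Made-up promoter widths for four sequences named p1..p4
promoterWidth <- c(35, 5, 120, 12)
seqNames <- c("p1", "p2", "p3", "p4")

# order() returns the indices that sort the widths increasingly,
# i.e. the narrowest promoter first
seqOrder <- order(promoterWidth)   # 2 4 1 3

# assuming seqOrder indexes the input set, rows from top to bottom are:
topToBottom <- seqNames[seqOrder]
topToBottom                        # "p2" "p4" "p1" "p3"
```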
flankUp, flankDown
The number of base-pairs upstream and downstream of the reference
position in the provided sequences, respectively.
The sum flankUp + flankDown must equal the length of the sequences.
If no values are provided, both flankUp and flankDown are
set to half of the length of the input sequences, i.e. the
reference position is assumed to be in the middle of the sequences.
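The documented flank defaults and the length constraint can be expressed as a small helper. resolveFlanks() is hypothetical, not part of seqPattern; how the package treats a call that supplies only one of the two flanks is not described here, so the sketch simply rejects that case.

```r
# Hypothetical helper mirroring the documented defaults and constraint
resolveFlanks <- function(seqLength, flankUp = NULL, flankDown = NULL) {
  if (is.null(flankUp) && is.null(flankDown)) {
    # documented default: reference position in the middle of the sequences
    flankUp <- flankDown <- seqLength / 2
  }
  if (is.null(flankUp) || is.null(flankDown)) {
    stop("provide both flankUp and flankDown, or neither")  # sketch's own choice
  }
  if (flankUp + flankDown != seqLength) {
    stop("flankUp + flankDown must equal the sequence length")
  }
  c(flankUp = flankUp, flankDown = flankDown)
}

resolveFlanks(1000)            # flankUp 500, flankDown 500
resolveFlanks(1000, 400, 600)  # the 400/600 split used in the examples
```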
nBin
Numeric vector with two values containing the number of equally spaced
points in each direction over which the density is to be estimated. The
first value specifies the number of bins along the x-axis, i.e. along the
nucleotides in the sequence, and the second value specifies the number
of bins along the y-axis, i.e. across the ordered input sequences. The values
are passed on to the gridsize argument of the
bkde2D function to compute a 2D binned kernel density
estimate. If nBin is not specified, it defaults to
c(n, m), where n is the number of input sequences and
m is the length of sequences.
bandWidth
Numeric vector of length 2, containing the bandwidth to be used in each
coordinate direction. The first value specifies the bandwidth along the
x-axis, i.e. along the nucleotides in the sequence, and the
second value specifies the bandwidth along the y-axis, i.e. across the
ordered input sequences. The values are passed on to the
bandwidth argument of the bkde2D function to
compute a 2D binned kernel density estimate and are used as standard
deviation of the bivariate Gaussian kernel. If bandWidth is not
specified, it defaults to c(3, 3).
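The density step named under nBin and bandWidth can be tried directly with KernSmooth::bkde2D on simulated occurrence coordinates. This is a sketch: the simulated data, the choice of one grid point per nucleotide along x and one per sequence along y, and the explicit range.x are assumptions, and seqPattern's own preprocessing may differ.

```r
library(KernSmooth)

set.seed(1)
seqLength <- 200  # x-axis: nucleotide positions 1..seqLength
nSeq <- 50        # y-axis: ordered sequences 1..nSeq

# simulated occurrences: column 1 = position in sequence, column 2 = sequence
occ <- cbind(as.numeric(sample(seqLength, 300, replace = TRUE)),
             as.numeric(sample(nSeq, 300, replace = TRUE)))

nBin <- c(seqLength, nSeq)  # grid points along x, then along y
bandWidth <- c(3, 3)        # the documented default

est <- bkde2D(occ, bandwidth = bandWidth, gridsize = nBin,
              range.x = list(c(1, seqLength), c(1, nSeq)))
dim(est$fhat)  # 200 50: density on the x-by-y grid
```

Each bandWidth value acts as the standard deviation of the Gaussian kernel in grid units of the corresponding axis, so larger values smooth occurrences over more neighbouring nucleotides or sequences.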
color
Character specifying the color palette for the density plot. One of the
following color palettes can be specified: "blue", "brown",
"cyan", "gold", "gray", "green", "pink", "purple", "red". Please refer
to the vignette for the appearance of these palettes.
transf
The function mapping the density scale to the color scale. See Details.
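The Details for transf are not reproduced in this section, so the following is only a generic base R illustration of why a monotone transformation matters: applied before the density range is split into colors, a square root gives low densities more distinct colors. toColorLevel(), the five-color split and the use of sqrt are hypothetical, not seqPattern's defaults.

```r
# Hypothetical helper: split the (transformed) density range into
# nColors equal intervals and return the color level of each value.
toColorLevel <- function(d, transf = identity, nColors = 5) {
  td <- transf(d)
  breaks <- seq(min(td), max(td), length.out = nColors + 1)
  findInterval(td, breaks, rightmost.closed = TRUE, all.inside = TRUE)
}

d <- c(0, 0.01, 0.09, 0.25, 1)
toColorLevel(d)        # 1 1 1 2 5: low densities share the lowest color
toColorLevel(d, sqrt)  # 1 1 2 3 5: low densities are spread over more colors
```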
xTicks
Character vector of labels to be placed at the tick-marks on the x-axis.
The default NULL value produces five tick-marks: one at the
reference point and two equally spaced tick-marks both upstream and
downstream of the reference point.
xTicksAt
Numeric vector of positions of the tick-marks on the x-axis. The values
can range from 1 (the position of the first base-pair in the sequence)
to the input sequence length. The default NULL value produces five
tick-marks: one at the reference point and two equally spaced tick-marks
both upstream and downstream of the reference point.
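When overriding the default ticks, xTicks and xTicksAt must describe the same tick-marks. The hypothetical helper xTickSpec() builds five of each for a given flank split, assuming the reference point lies at position flankUp and clamping the upstream end to position 1.

```r
# Hypothetical helper: five matching tick labels and positions.
# Assumes the reference point is at position flankUp (1-based).
xTickSpec <- function(flankUp, flankDown) {
  rel <- c(-flankUp, -flankUp / 2, 0, flankDown / 2, flankDown)
  # convert to 1-based positions and keep them within 1..sequence length
  at <- pmin(pmax(rel + flankUp, 1), flankUp + flankDown)
  list(xTicks = as.character(rel), xTicksAt = at)
}

ticks <- xTickSpec(400, 600)
ticks$xTicks    # "-400" "-200" "0" "300" "600"
ticks$xTicksAt  # 1 200 400 700 1000
```

The two vectors could then be passed as xTicks = ticks$xTicks and xTicksAt = ticks$xTicksAt to plotPatternDensityMap().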
xLabel
The label for the x-axis. The default is no label, i.e. empty
string.
yTicks
Character vector of labels to be placed at the tick-marks on the y-axis.
The default NULL value produces no tick-marks and labels.
yTicksAt
Numeric vector of positions of the tick-marks on the y-axis. The values
can range from 1 (the position of the last sequence at the bottom of the
plot) to the number of input sequences (the position of the first sequence
at the top of the plot). The default NULL value produces no
tick-marks.
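Since the first (top) sequence sits at the highest y coordinate and the last sequence at 1, a row's rank in the ordered set converts to a yTicksAt position as sketched below; yFromRank() is hypothetical and assumes one y unit per sequence.

```r
# Hypothetical helper: rank in the ordered set (1 = top row) -> y position
yFromRank <- function(rank, nSeq) nSeq - rank + 1

nSeq <- 500
yFromRank(c(1, 250, 500), nSeq)  # 500 251 1
```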
yLabel
The label for the y-axis. The default is no label, i.e. empty
string.
cexAxis
The magnification to be used for axis annotation.
plotScale
Logical, should the scale bar be plotted in the lower left corner of
the plot.
scaleLength
The length of the scale bar to be plotted. Used only when
plotScale = TRUE. If no value is provided, it defaults to one
fifth of the input sequence length.
scaleWidth
The width of the line for the scale bar. Used only when
plotScale = TRUE.
addPatternLabel
Logical, should the pattern label be written in the upper left corner
of the plot.
cexLabel
The magnification to be used for the pattern label.
labelCol
The color to be used for the pattern label and the scale bar.
addReferenceLine
Logical, should a vertical dashed line be drawn at the reference
point.
plotColorLegend
Logical, should the color legend for the pattern density be plotted. If
TRUE, a separate PNG file named <outFile>.ColorLegend.png
will be created, showing the mapping of pattern density values to colors.
outFile
Character vector specifying the base name of the output plot files. The
final name of the plot file for each pattern will be
<outFile>.<pattern>.png.
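A sketch of the file names described for outFile and plotColorLegend, assuming a "." separator between the base name, the pattern and the suffix; expectedPlotFiles() and the base name "myMaps" are hypothetical.

```r
# Hypothetical helper listing the PNG files a call is expected to write
expectedPlotFiles <- function(outFile, patterns, plotColorLegend = TRUE) {
  files <- paste(outFile, patterns, "png", sep = ".")  # one map per pattern
  if (plotColorLegend) {
    files <- c(files, paste(outFile, "ColorLegend.png", sep = "."))
  }
  files
}

expectedPlotFiles("myMaps", c("TA", "GC"))
# "myMaps.TA.png" "myMaps.GC.png" "myMaps.ColorLegend.png"
```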
plotWidth, plotHeight
Width and height of the density plot(s) in pixels.
useMulticore
Logical, should multicore be used. useMulticore = TRUE is
supported only on Unix-like platforms.
nrCores
Number of cores to use when useMulticore = TRUE. Default value
NULL uses all detected cores.
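Per-pattern parallelism of the kind useMulticore enables can be sketched with the parallel package. mclapply() relies on forking, which is why this approach is tied to Unix-like systems. runPerPattern() and processPattern() are hypothetical, and silently falling back to lapply() elsewhere is a design choice of this sketch, not documented package behavior.

```r
library(parallel)

runPerPattern <- function(patterns, FUN, useMulticore = FALSE, nrCores = NULL) {
  if (useMulticore && .Platform$OS.type == "unix") {
    # NULL means: use all detected cores (at least one if detection fails)
    if (is.null(nrCores)) nrCores <- max(1L, detectCores(), na.rm = TRUE)
    mclapply(patterns, FUN, mc.cores = nrCores)
  } else {
    lapply(patterns, FUN)  # serial fallback, e.g. on Windows
  }
}

processPattern <- function(p) nchar(p)  # stand-in for the real per-pattern work
res <- runPerPattern(c("TA", "GC", "TATAWAWR"), processPattern,
                     useMulticore = TRUE, nrCores = 2)
unlist(res)  # 2 2 8
```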
Value
The function produces PNG files in the working directory, visualising
the density of pattern occurrences in the set of ordered input sequences.
One file (plot) is created per specified pattern.
Author(s)
Vanja Haberle
References
Haberle et al. (2014) Two independent transcription initiation codes
overlap on vertebrate core promoters, Nature 507:381-385.
Examples
> library(seqPattern)
> library(GenomicRanges)
> load(system.file("data", "zebrafishPromoters.RData", package="seqPattern"))
>
> promoterWidth <- elementMetadata(zebrafishPromoters)$interquantileWidth
>
> # dinucleotide patterns
> plotPatternDensityMap(regionsSeq = zebrafishPromoters, patterns = c("TA", "GC"),
+ seqOrder = order(promoterWidth), flankUp = 400, flankDown = 600,
+ color = "blue")
Getting oligonucleotide occurrence matrix...
Calculating density...
->TA
->GC
Plotting...
->TA
->GC
>
> # motif consensus sequence
> plotPatternDensityMap(regionsSeq = zebrafishPromoters, patterns = "TATAWAWR",
+ seqOrder = order(promoterWidth), flankUp = 400, flankDown = 600,
+ color = "cyan")
Getting oligonucleotide occurrence matrix...
Calculating density...
->TATAWAWR
Plotting...
->TATAWAWR