Functions for visualizing prediction profiles,
cross validation result, grid search performance parameters and
receiver operating characteristics
Usage
## S4 method for signature 'PredictionProfile,missing'
plot(x, sel = NULL, col = c("red",
"blue"), standardize = TRUE, shades = NULL, legend = "default",
legendPos = "topright", ylim = NULL, xlab = "", ylab = "weight",
lwd.profile = 1, lwd.axis = 1, las = 1, heptads = FALSE,
annotate = FALSE, markOffset = TRUE, windowSize = 1, ...)
## S4 method for signature 'CrossValidationResult,missing'
plot(x, col = "springgreen")
## S4 method for signature 'ModelSelectionResult,missing'
plot(x, sel = c("ACC", "BACC", "MCC",
"AUC"))
## S4 method for signature 'ROCData,missing'
plot(x, lwd = 2, aucDigits = 3, cex = 0.8,
side = 1, line = -3, adj = 0.9, ...)
Arguments
x
for the first method above a prediction profile object of class
PredictionProfile containing the profiles to be plotted,
for the second method a cross validation result object usually taken
from the trained kebabs model object
sel
an integer vector with one or two entries to select samples
of the prediction profile matrix for plotting, if this parameter is not
supplied by the user the frist one or two samples are selected.
col
a character vector with one or two color names used for plotting
the samples. Default=c("red", "blue").
standardize
logical. If FALSE, the profile values s_i
are displayed as they are with the value y=-b/L superimposed as
a light gray line. If TRUE (default), the whole profile is
shifted by -b/L and the light gray line is displayed at y=0.
shades
vector of at least two color specifications; If not NULL,
the background area above and below the base line y=-b/L are shaded
in colors shades[1] and shades[2], respectively. Default=NULL
legend
a character vector with one or two character strings
containing the legend/description of the profile. If set to an empty vector
or to NULL, no legend is displayed.
legendPos
position specification for the legend(if legend is
specified). Can either be a vector with coordinates or a single keyword like
“topright” (see legend).
ylim
argument that allows the user to preset the y-range of the
profile plot.
xlab
label of horizontal axis, empty by default.
ylab
label of vertical axis, defaults to "weight".
lwd.profile
profile line width as described for parameter lwd
in par
lwd.axis
axis line width as described for parameter lwd
in par
las
see par
heptads
logical indicating whether for proteins with heptad
annotation (i.e. characters a to g, usually in periodic repetition) the
heptad structure should be indicated through vertical lightgray lines
each heptad. Default=FALSE
annotate
logical indicating whether annotation information should
be shown in the center of the plot; Default=FALSE
markOffset
logical indicating whether the start positions in the
sequences according to the assigned offset elmement metadata values should
be shown near the sequence characters; for the upper sequence the first
position is marked by "^" below the respective character, for the
lower sequence it is marked by "v" above the sequence. If no offset
element metadata is assigned to the sequences the marks are suppressed.
Default=TRUE
windowSize
length of sliding window. When the parameter is set to
the default value 1 the contributions of each position are plotted
as step function. For kernels with multiple patterns at one position
(mismatch, gappy pair and motif kernel) the weight contributions of all
patterns at the position are summed up. Values larger than 1 define the
length of a sliding window. All contributions within the window are
averaged and the resulting value is displayed at the center position of
the window. For positions within half of the window size from the start
and end of the sequence the averaging cannot be performed over the full
window but just the remaining positions. This means that the variation
of the averaged weight contributions is higher in these border regions.
If an even value is specified for this parameter one is added to the
parameter value. When the parameter is set to Inf (infinite)
instead of averages cumulative values along the sequence are used, i.e.
at each position the sum of all contributions up to this position is
displayed. In this case the plot shows how the standardized or
unstandardized value (see parameter standardize) of the
discrimination function builds up along the sequence. Default=1
...
all other arguments are passed to the standard
plot command that is called internally to
display the graphics window.
lwd
see par
aucDigits
number of decimal places of AUC to be printed into
the ROC plot. If this parameter is set to 0 the AUC will not be added to
the plot. Default=3
cex
see mtext
side
see mtext
line
see mtext
adj
see mtext
Details
Plotting of Prediction Profiles
The first variant of the plot method mentioned in the usage section
displays one or two prediction profiles as a step
function with the steps connected by vertical lines. The parameter
sel allows to select the sample(s) if the prediction profile object
contains the profiles of more than two samples. The alignment of the
step functions is impacted by offset metadata assigned to the sequences.
When offset values are assigned one sequence if shifted horizontally to
align the start position 1 pointed to by the offset value for each sequence.
(see also parameter markOffset). If no offset metadata is available
for the sequences both step functions start at their first position on the
left side of the plot. The vertical plot range can be determined by the
rng argument. If the plot is generated for one profile, the sequence
is is visualized above the plot, for two sequences the first sequence is
shown above, the second sequence below the plot. Matching characters at a
position are shown in the same color (by default in "black", the
non-matching characters in the sample-specific colors (see parameter
col). Annotation information can also be visualized along with the
step function. A call with two prediction profiles should facilitate the
comparison of profiles (e.g. wild type versus mutated sequence).
The baseline for the step function of a single sample represents the offset
b of the model distributed equally to all sequence positions according to
the following reformulation of the discriminant function
f(x) = b + sum(si(x)) = sum(si(x) -(-b/L))
for i = 1, ... L
For standardized plots (see parameter standardize this baseline value
is subtracted from the weight contribution at each position. When sequences
of different length are plotted together only a standardized plot gives
compareable y ranges for both step functions. For sequences of equal length
the visualization can be done in non-standardized or standardized form
showing the lightgray horizontal baseline at positon y=-b/L or at
y=0. If the area between the step function and the baseline lying
above the baseline is larger than the area below the baseline the sample
is predicted as belonging to the class assciated with positive values
of the discrimination function, otherwise to the opposite class. (For
multiclass problems prediction profiles can only be generated from the
feature weights related to one of the classifiers in the pairwise or
one-against-rest approaches leaving only two classes for the profile
plot.)
When plotting to a pdf it is recommended to use a height to width ratio
of around 1:(max sequence length/25), e.g. for a maximum sequence length
of 500 bases or amino acids select height=10 and width=200 when opening the
pdf document for plotting.
Plotting of CrossValidation Result
The second variant of plot method shown in the usage section
displays the cross validation result as boxplot.
Plotting of Grid Performance Values
The third variant of plot method shown in the usage section
plots grid performance data as grid with the color of each rectange
corresponding to the preformance value of the grid point.
Plotting of Receiver Operating Characteristics (ROC)
The fourth variant of plot method shown in the usage section
plots the receiver operating characteristics for the given ROC data.
J. Palme, S. Hochreiter, and U. Bodenhofer (2015) KeBABS: an R package
for kernel-based analysis of biological sequences.
Bioinformatics, 31(15):2574-2576, 2015.
DOI: 10.1093/bioinformatics/btv176.
## set seed for random generator, included here only to make results
## reproducable for this example
set.seed(456)
## load transcription factor binding site data
data(TFBS)
enhancerFB
## select 70% of the samples for training and the rest for test
train <- sample(1:length(enhancerFB), length(enhancerFB) * 0.7)
test <- c(1:length(enhancerFB))[-train]
## create the kernel object for gappy pair kernel with normalization
gappy <- gappyPairKernel(k=1, m=3)
## show details of kernel object
gappy
## run training with explicit representation
model <- kbsvm(x=enhancerFB[train], y=yFB[train], kernel=gappy,
pkg="LiblineaR", svm="C-svc", cost=80, explicit="yes",
featureWeights="yes")
## compute and plot ROC for test sequences
preddec <- predict(model, enhancerFB[test], predictionType="decision")
rocdata <- computeROCandAUC(preddec, yFB[test], allLabels=unique(yFB))
plot(rocdata)
## generate prediction profile for the first three test sequences
predProf <- getPredictionProfile(enhancerFB, gappy, featureWeights(model),
modelOffset(model), sel=test[1:3])
## show prediction profiles
predProf
## plot prediction profile to pdf
## As sequences are usually very long select a ratio of height to width
## for the pdf which takes care of the maximum sequence length which is
## plotted. Only single or pairs of prediction profiles can be plotted.
## Plot profile for window size 1 (default) and 50. Load package Biobase
## for openPDF
## Not run:
library(Biobase)
pdf(file="PredictionProfile1_w1.pdf", height=10, width=200)
plot(predProf, sel=c(1,3))
dev.off()
openPDF("PredictionProfile1_w1.pdf")
pdf(file="PredictionProfile1_w50.pdf", height=10, width=200)
plot(predProf, sel=c(1,3), windowSize=50)
dev.off()
openPDF("PredictionProfile1_w50.pdf")
pdf(file="PredictionProfile2_w1.pdf", height=10, width=200)
plot(predProf, sel=c(2,3))
dev.off()
openPDF("PredictionProfile2_w1.pdf")
pdf(file="PredictionProfile2_w50.pdf", height=10, width=200)
plot(predProf, sel=c(2,3), windowSize=50)
dev.off()
openPDF("PredictionProfile2_w50.pdf")
## End(Not run)
Results
R version 3.3.1 (2016-06-21) -- "Bug in Your Hair"
Copyright (C) 2016 The R Foundation for Statistical Computing
Platform: x86_64-pc-linux-gnu (64-bit)
R is free software and comes with ABSOLUTELY NO WARRANTY.
You are welcome to redistribute it under certain conditions.
Type 'license()' or 'licence()' for distribution details.
R is a collaborative project with many contributors.
Type 'contributors()' for more information and
'citation()' on how to cite R or R packages in publications.
Type 'demo()' for some demos, 'help()' for on-line help, or
'help.start()' for an HTML browser interface to help.
Type 'q()' to quit R.
> library(kebabs)
Loading required package: Biostrings
Loading required package: BiocGenerics
Loading required package: parallel
Attaching package: 'BiocGenerics'
The following objects are masked from 'package:parallel':
clusterApply, clusterApplyLB, clusterCall, clusterEvalQ,
clusterExport, clusterMap, parApply, parCapply, parLapply,
parLapplyLB, parRapply, parSapply, parSapplyLB
The following objects are masked from 'package:stats':
IQR, mad, xtabs
The following objects are masked from 'package:base':
Filter, Find, Map, Position, Reduce, anyDuplicated, append,
as.data.frame, cbind, colnames, do.call, duplicated, eval, evalq,
get, grep, grepl, intersect, is.unsorted, lapply, lengths, mapply,
match, mget, order, paste, pmax, pmax.int, pmin, pmin.int, rank,
rbind, rownames, sapply, setdiff, sort, table, tapply, union,
unique, unsplit
Loading required package: S4Vectors
Loading required package: stats4
Attaching package: 'S4Vectors'
The following objects are masked from 'package:base':
colMeans, colSums, expand.grid, rowMeans, rowSums
Loading required package: IRanges
Loading required package: XVector
Loading required package: kernlab
Attaching package: 'kernlab'
The following object is masked from 'package:Biostrings':
type
> png(filename="/home/ddbj/snapshot/RGM3/R_BC/result/kebabs/plot-methods.Rd_%03d_medium.png", width=480, height=480)
> ### Name: plot,PredictionProfile,missing-method
> ### Title: Plot Prediction Profiles, Cross Validation Result, Grid Search
> ### Performance Parameters and Receiver Operating Characteristics
> ### Aliases: plot plot,CrossValidationResult,missing-method
> ### plot,CrossValidationResult-method
> ### plot,ModelSelectionResult,missing-method
> ### plot,ModelSelectionResult-method
> ### plot,PredictionProfile,missing-method plot,PredictionProfile-method
> ### plot,ROCData,missing-method plot,ROCData-method
> ### Keywords: methods plot prediction profile
>
> ### ** Examples
>
> ## set seed for random generator, included here only to make results
> ## reproducable for this example
> set.seed(456)
> ## load transcription factor binding site data
> data(TFBS)
> enhancerFB
A DNAStringSet instance of length 500
width seq names
[1] 827 ACTAAACAACATCAATCAATAC...TTCTCTAGGCAAAATCCTGACA chr19.21240050.21...
[2] 678 GAATATAGACCCTTGGTGGTGG...GGAGGAAGTTATATTAATTTAT chr2.144463827.14...
[3] 878 CACCCACATGGTGGCTCTCAAC...TCTGCTCCCCACTGTATCACTC chr7.38525408.385...
[4] 927 TGCCGTAGTGTGCCAGCTTTTC...TTCTCTGATTCTCCTGGATCAA chr4.90747075.907...
[5] 953 CTTCATATACCTATTAATTCAG...AATAACCTAATTTAAAAAGGGG chr4.9148679.9149631
... ... ...
[496] 478 ACGAGAATGAGGGAAAGCACTG...TTGCAGTCTGCCTTCCGAGTGA chr8.45499713.455...
[497] 1552 AGGGGACTGGAAGAAAGGAACC...AGATCTCTCTTCTAAGCGAGCA chr8.94657200.946...
[498] 503 GATGAACGTAAAATGCAAGCTG...GAACCTAATGACTCCCTTCTGA chr17.73900306.73...
[499] 1577 ACCCCTCTGAGACCAAGAGGTG...GGCAAGTAGGTCTGTATTCCTG chr9.90587075.905...
[500] 778 CACTGTCATAACTTGCTTTCTA...GGCTTGTAGAGTAGCTGCTCTC chr17.48680137.48...
> ## select 70% of the samples for training and the rest for test
> train <- sample(1:length(enhancerFB), length(enhancerFB) * 0.7)
> test <- c(1:length(enhancerFB))[-train]
> ## create the kernel object for gappy pair kernel with normalization
> gappy <- gappyPairKernel(k=1, m=3)
> ## show details of kernel object
> gappy
gappy pair kernel: k=1, m=3
>
> ## run training with explicit representation
> model <- kbsvm(x=enhancerFB[train], y=yFB[train], kernel=gappy,
+ pkg="LiblineaR", svm="C-svc", cost=80, explicit="yes",
+ featureWeights="yes")
>
> ## compute and plot ROC for test sequences
> preddec <- predict(model, enhancerFB[test], predictionType="decision")
> rocdata <- computeROCandAUC(preddec, yFB[test], allLabels=unique(yFB))
> plot(rocdata)
>
> ## generate prediction profile for the first three test sequences
> predProf <- getPredictionProfile(enhancerFB, gappy, featureWeights(model),
+ modelOffset(model), sel=test[1:3])
>
> ## show prediction profiles
> predProf
An object of class "PredictionProfile"
Sequences:
A DNAStringSet instance of length 3
width seq names
[1] 878 CACCCACATGGTGGCTCTCAACC...CTCTGCTCCCCACTGTATCACTC chr7.38525408.385...
[2] 927 TGCCGTAGTGTGCCAGCTTTTCG...TTTCTCTGATTCTCCTGGATCAA chr4.90747075.907...
[3] 803 TACCTTAAGGAAATACAGACAAG...GCTCTAGCAGGGCTTTGTGAAGA chr5.113189184.11...
gappy pair kernel: k=1, m=3
Baselines: 0.01223163 0.01158508 0.01337406
Profiles:
Pos 1 Pos 2 Pos 926 Pos 927
chr7.38525408.385... 0.02608203 -0.00241429 ... 0.00000000 0.00000000
chr4.90747075.907... -0.00542039 0.03740036 ... 0.03051739 0.01590914
chr5.113189184.11... -0.01496562 -0.04417322 ... 0.00000000 0.00000000
>
> ## plot prediction profile to pdf
> ## As sequences are usually very long select a ratio of height to width
> ## for the pdf which takes care of the maximum sequence length which is
> ## plotted. Only single or pairs of prediction profiles can be plotted.
> ## Plot profile for window size 1 (default) and 50. Load package Biobase
> ## for openPDF
> ## Not run:
> ##D library(Biobase)
> ##D pdf(file="PredictionProfile1_w1.pdf", height=10, width=200)
> ##D plot(predProf, sel=c(1,3))
> ##D dev.off()
> ##D openPDF("PredictionProfile1_w1.pdf")
> ##D pdf(file="PredictionProfile1_w50.pdf", height=10, width=200)
> ##D plot(predProf, sel=c(1,3), windowSize=50)
> ##D dev.off()
> ##D openPDF("PredictionProfile1_w50.pdf")
> ##D pdf(file="PredictionProfile2_w1.pdf", height=10, width=200)
> ##D plot(predProf, sel=c(2,3))
> ##D dev.off()
> ##D openPDF("PredictionProfile2_w1.pdf")
> ##D pdf(file="PredictionProfile2_w50.pdf", height=10, width=200)
> ##D plot(predProf, sel=c(2,3), windowSize=50)
> ##D dev.off()
> ##D openPDF("PredictionProfile2_w50.pdf")
> ## End(Not run)
>
>
>
>
>
> dev.off()
null device
1
>