R Graphical Manual

Browse All

Last data update: 2014.03.03

R: Plot Prediction Profiles, Cross Validation Result, Grid...

plot,PredictionProfile,missing-method

R Documentation

Plot Prediction Profiles, Cross Validation Result, Grid Search Performance Parameters and Receiver Operating Characteristics

Description

Functions for visualizing prediction profiles, cross validation result, grid search performance parameters and receiver operating characteristics

Usage

## S4 method for signature 'PredictionProfile,missing'
plot(x, sel = NULL, col = c("red",
  "blue"), standardize = TRUE, shades = NULL, legend = "default",
  legendPos = "topright", ylim = NULL, xlab = "", ylab = "weight",
  lwd.profile = 1, lwd.axis = 1, las = 1, heptads = FALSE,
  annotate = FALSE, markOffset = TRUE, windowSize = 1, ...)

## S4 method for signature 'CrossValidationResult,missing'
plot(x, col = "springgreen")

## S4 method for signature 'ModelSelectionResult,missing'
plot(x, sel = c("ACC", "BACC", "MCC",
  "AUC"))

## S4 method for signature 'ROCData,missing'
plot(x, lwd = 2, aucDigits = 3, cex = 0.8,
  side = 1, line = -3, adj = 0.9, ...)

Arguments

`x`	for the first method above a prediction profile object of class `PredictionProfile` containing the profiles to be plotted, for the second method a cross validation result object usually taken from the trained kebabs model object
`sel`	an integer vector with one or two entries to select samples of the prediction profile matrix for plotting, if this parameter is not supplied by the user the frist one or two samples are selected.
`col`	a character vector with one or two color names used for plotting the samples. Default=c("red", "blue").
`standardize`	logical. If `FALSE`, the profile values `s_i` are displayed as they are with the value y=-b/L superimposed as a light gray line. If `TRUE` (default), the whole profile is shifted by -b/L and the light gray line is displayed at `y=0`.
`shades`	vector of at least two color specifications; If not NULL, the background area above and below the base line `y=-b/L` are shaded in colors `shades[1]` and `shades[2]`, respectively. Default=NULL
`legend`	a character vector with one or two character strings containing the legend/description of the profile. If set to an empty vector or to NULL, no legend is displayed.
`legendPos`	position specification for the legend(if `legend` is specified). Can either be a vector with coordinates or a single keyword like “topright” (see `legend`).
`ylim`	argument that allows the user to preset the y-range of the profile plot.
`xlab`	label of horizontal axis, empty by default.
`ylab`	label of vertical axis, defaults to "weight".
`lwd.profile`	profile line width as described for parameter `lwd` in `par`
`lwd.axis`	axis line width as described for parameter `lwd` in `par`
`las`	see `par`
`heptads`	logical indicating whether for proteins with heptad annotation (i.e. characters a to g, usually in periodic repetition) the heptad structure should be indicated through vertical lightgray lines each heptad. Default=FALSE
`annotate`	logical indicating whether annotation information should be shown in the center of the plot; Default=FALSE
`markOffset`	logical indicating whether the start positions in the sequences according to the assigned offset elmement metadata values should be shown near the sequence characters; for the upper sequence the first position is marked by `"^"` below the respective character, for the lower sequence it is marked by `"v"` above the sequence. If no offset element metadata is assigned to the sequences the marks are suppressed. Default=TRUE
`windowSize`	length of sliding window. When the parameter is set to the default value 1 the contributions of each position are plotted as step function. For kernels with multiple patterns at one position (mismatch, gappy pair and motif kernel) the weight contributions of all patterns at the position are summed up. Values larger than 1 define the length of a sliding window. All contributions within the window are averaged and the resulting value is displayed at the center position of the window. For positions within half of the window size from the start and end of the sequence the averaging cannot be performed over the full window but just the remaining positions. This means that the variation of the averaged weight contributions is higher in these border regions. If an even value is specified for this parameter one is added to the parameter value. When the parameter is set to `Inf` (infinite) instead of averages cumulative values along the sequence are used, i.e. at each position the sum of all contributions up to this position is displayed. In this case the plot shows how the standardized or unstandardized value (see parameter `standardize`) of the discrimination function builds up along the sequence. Default=1
`...`	all other arguments are passed to the standard `plot` command that is called internally to display the graphics window.
`lwd`	see `par`
`aucDigits`	number of decimal places of AUC to be printed into the ROC plot. If this parameter is set to 0 the AUC will not be added to the plot. Default=3
`cex`	see `mtext`
`side`	see `mtext`
`line`	see `mtext`
`adj`	see `mtext`

Details

Plotting of Prediction Profiles

The first variant of the plot method mentioned in the usage section displays one or two prediction profiles as a step function with the steps connected by vertical lines. The parameter sel allows to select the sample(s) if the prediction profile object contains the profiles of more than two samples. The alignment of the step functions is impacted by offset metadata assigned to the sequences. When offset values are assigned one sequence if shifted horizontally to align the start position 1 pointed to by the offset value for each sequence. (see also parameter markOffset). If no offset metadata is available for the sequences both step functions start at their first position on the left side of the plot. The vertical plot range can be determined by the rng argument. If the plot is generated for one profile, the sequence is is visualized above the plot, for two sequences the first sequence is shown above, the second sequence below the plot. Matching characters at a position are shown in the same color (by default in "black", the non-matching characters in the sample-specific colors (see parameter col). Annotation information can also be visualized along with the step function. A call with two prediction profiles should facilitate the comparison of profiles (e.g. wild type versus mutated sequence).

The baseline for the step function of a single sample represents the offset b of the model distributed equally to all sequence positions according to the following reformulation of the discriminant function

f(x) = b + sum(si(x)) = sum(si(x) -(-b/L)) for i = 1, ... L

For standardized plots (see parameter standardize this baseline value is subtracted from the weight contribution at each position. When sequences of different length are plotted together only a standardized plot gives compareable y ranges for both step functions. For sequences of equal length the visualization can be done in non-standardized or standardized form showing the lightgray horizontal baseline at positon y=-b/L or at y=0. If the area between the step function and the baseline lying above the baseline is larger than the area below the baseline the sample is predicted as belonging to the class assciated with positive values of the discrimination function, otherwise to the opposite class. (For multiclass problems prediction profiles can only be generated from the feature weights related to one of the classifiers in the pairwise or one-against-rest approaches leaving only two classes for the profile plot.)

When plotting to a pdf it is recommended to use a height to width ratio of around 1:(max sequence length/25), e.g. for a maximum sequence length of 500 bases or amino acids select height=10 and width=200 when opening the pdf document for plotting.

Plotting of CrossValidation Result

The second variant of plot method shown in the usage section displays the cross validation result as boxplot.

Plotting of Grid Performance Values

The third variant of plot method shown in the usage section plots grid performance data as grid with the color of each rectange corresponding to the preformance value of the grid point.

Plotting of Receiver Operating Characteristics (ROC)

The fourth variant of plot method shown in the usage section plots the receiver operating characteristics for the given ROC data.

Value

see details above

Author(s)

Johannes Palme <kebabs@bioinf.jku.at>

References

http://www.bioinf.jku.at/software/kebabs

J. Palme, S. Hochreiter, and U. Bodenhofer (2015) KeBABS: an R package for kernel-based analysis of biological sequences. Bioinformatics, 31(15):2574-2576, 2015. DOI: 10.1093/bioinformatics/btv176.

Examples

## set seed for random generator, included here only to make results
## reproducable for this example
set.seed(456)
## load transcription factor binding site data
data(TFBS)
enhancerFB
## select 70% of the samples for training and the rest for test
train <- sample(1:length(enhancerFB), length(enhancerFB) * 0.7)
test <- c(1:length(enhancerFB))[-train]
## create the kernel object for gappy pair kernel with normalization
gappy <- gappyPairKernel(k=1, m=3)
## show details of kernel object
gappy

## run training with explicit representation
model <- kbsvm(x=enhancerFB[train], y=yFB[train], kernel=gappy,
               pkg="LiblineaR", svm="C-svc", cost=80, explicit="yes",
               featureWeights="yes")

## compute and plot ROC for test sequences
preddec <- predict(model, enhancerFB[test], predictionType="decision")
rocdata <- computeROCandAUC(preddec, yFB[test], allLabels=unique(yFB))
plot(rocdata)

## generate prediction profile for the first three test sequences
predProf <- getPredictionProfile(enhancerFB, gappy, featureWeights(model),
                                 modelOffset(model), sel=test[1:3])

## show prediction profiles
predProf

## plot prediction profile to pdf
## As sequences are usually very long select a ratio of height to width
## for the pdf which takes care of the maximum sequence length which is
## plotted. Only single or pairs of prediction profiles can be plotted.
## Plot profile for window size 1 (default) and 50. Load package Biobase
## for openPDF
## Not run: 
library(Biobase)
pdf(file="PredictionProfile1_w1.pdf", height=10, width=200)
plot(predProf, sel=c(1,3))
dev.off()
openPDF("PredictionProfile1_w1.pdf")
pdf(file="PredictionProfile1_w50.pdf", height=10, width=200)
plot(predProf, sel=c(1,3), windowSize=50)
dev.off()
openPDF("PredictionProfile1_w50.pdf")
pdf(file="PredictionProfile2_w1.pdf", height=10, width=200)
plot(predProf, sel=c(2,3))
dev.off()
openPDF("PredictionProfile2_w1.pdf")
pdf(file="PredictionProfile2_w50.pdf", height=10, width=200)
plot(predProf, sel=c(2,3), windowSize=50)
dev.off()
openPDF("PredictionProfile2_w50.pdf")

## End(Not run)

Results


R version 3.3.1 (2016-06-21) -- "Bug in Your Hair"
Copyright (C) 2016 The R Foundation for Statistical Computing
Platform: x86_64-pc-linux-gnu (64-bit)

R is free software and comes with ABSOLUTELY NO WARRANTY.
You are welcome to redistribute it under certain conditions.
Type 'license()' or 'licence()' for distribution details.

R is a collaborative project with many contributors.
Type 'contributors()' for more information and
'citation()' on how to cite R or R packages in publications.

Type 'demo()' for some demos, 'help()' for on-line help, or
'help.start()' for an HTML browser interface to help.
Type 'q()' to quit R.

> library(kebabs)
Loading required package: Biostrings
Loading required package: BiocGenerics
Loading required package: parallel

Attaching package: 'BiocGenerics'

The following objects are masked from 'package:parallel':

    clusterApply, clusterApplyLB, clusterCall, clusterEvalQ,
    clusterExport, clusterMap, parApply, parCapply, parLapply,
    parLapplyLB, parRapply, parSapply, parSapplyLB

The following objects are masked from 'package:stats':

    IQR, mad, xtabs

The following objects are masked from 'package:base':

    Filter, Find, Map, Position, Reduce, anyDuplicated, append,
    as.data.frame, cbind, colnames, do.call, duplicated, eval, evalq,
    get, grep, grepl, intersect, is.unsorted, lapply, lengths, mapply,
    match, mget, order, paste, pmax, pmax.int, pmin, pmin.int, rank,
    rbind, rownames, sapply, setdiff, sort, table, tapply, union,
    unique, unsplit

Loading required package: S4Vectors
Loading required package: stats4

Attaching package: 'S4Vectors'

The following objects are masked from 'package:base':

    colMeans, colSums, expand.grid, rowMeans, rowSums

Loading required package: IRanges
Loading required package: XVector
Loading required package: kernlab

Attaching package: 'kernlab'

The following object is masked from 'package:Biostrings':

    type

> png(filename="/home/ddbj/snapshot/RGM3/R_BC/result/kebabs/plot-methods.Rd_%03d_medium.png", width=480, height=480)
> ### Name: plot,PredictionProfile,missing-method
> ### Title: Plot Prediction Profiles, Cross Validation Result, Grid Search
> ###   Performance Parameters and Receiver Operating Characteristics
> ### Aliases: plot plot,CrossValidationResult,missing-method
> ###   plot,CrossValidationResult-method
> ###   plot,ModelSelectionResult,missing-method
> ###   plot,ModelSelectionResult-method
> ###   plot,PredictionProfile,missing-method plot,PredictionProfile-method
> ###   plot,ROCData,missing-method plot,ROCData-method
> ### Keywords: methods plot prediction profile
> 
> ### ** Examples
> 
> ## set seed for random generator, included here only to make results
> ## reproducable for this example
> set.seed(456)
> ## load transcription factor binding site data
> data(TFBS)
> enhancerFB
  A DNAStringSet instance of length 500
      width seq                                             names               
  [1]   827 ACTAAACAACATCAATCAATAC...TTCTCTAGGCAAAATCCTGACA chr19.21240050.21...
  [2]   678 GAATATAGACCCTTGGTGGTGG...GGAGGAAGTTATATTAATTTAT chr2.144463827.14...
  [3]   878 CACCCACATGGTGGCTCTCAAC...TCTGCTCCCCACTGTATCACTC chr7.38525408.385...
  [4]   927 TGCCGTAGTGTGCCAGCTTTTC...TTCTCTGATTCTCCTGGATCAA chr4.90747075.907...
  [5]   953 CTTCATATACCTATTAATTCAG...AATAACCTAATTTAAAAAGGGG chr4.9148679.9149631
  ...   ... ...
[496]   478 ACGAGAATGAGGGAAAGCACTG...TTGCAGTCTGCCTTCCGAGTGA chr8.45499713.455...
[497]  1552 AGGGGACTGGAAGAAAGGAACC...AGATCTCTCTTCTAAGCGAGCA chr8.94657200.946...
[498]   503 GATGAACGTAAAATGCAAGCTG...GAACCTAATGACTCCCTTCTGA chr17.73900306.73...
[499]  1577 ACCCCTCTGAGACCAAGAGGTG...GGCAAGTAGGTCTGTATTCCTG chr9.90587075.905...
[500]   778 CACTGTCATAACTTGCTTTCTA...GGCTTGTAGAGTAGCTGCTCTC chr17.48680137.48...
> ## select 70% of the samples for training and the rest for test
> train <- sample(1:length(enhancerFB), length(enhancerFB) * 0.7)
> test <- c(1:length(enhancerFB))[-train]
> ## create the kernel object for gappy pair kernel with normalization
> gappy <- gappyPairKernel(k=1, m=3)
> ## show details of kernel object
> gappy
gappy pair kernel: k=1, m=3
> 
> ## run training with explicit representation
> model <- kbsvm(x=enhancerFB[train], y=yFB[train], kernel=gappy,
+                pkg="LiblineaR", svm="C-svc", cost=80, explicit="yes",
+                featureWeights="yes")
> 
> ## compute and plot ROC for test sequences
> preddec <- predict(model, enhancerFB[test], predictionType="decision")
> rocdata <- computeROCandAUC(preddec, yFB[test], allLabels=unique(yFB))
> plot(rocdata)
> 
> ## generate prediction profile for the first three test sequences
> predProf <- getPredictionProfile(enhancerFB, gappy, featureWeights(model),
+                                  modelOffset(model), sel=test[1:3])
> 
> ## show prediction profiles
> predProf
An object of class  "PredictionProfile" 

Sequences:

  A DNAStringSet instance of length 3
    width seq                                               names               
[1]   878 CACCCACATGGTGGCTCTCAACC...CTCTGCTCCCCACTGTATCACTC chr7.38525408.385...
[2]   927 TGCCGTAGTGTGCCAGCTTTTCG...TTTCTCTGATTCTCCTGGATCAA chr4.90747075.907...
[3]   803 TACCTTAAGGAAATACAGACAAG...GCTCTAGCAGGGCTTTGTGAAGA chr5.113189184.11...

gappy pair kernel: k=1, m=3

Baselines:  0.01223163 0.01158508 0.01337406 

Profiles:
                           Pos 1       Pos 2          Pos 926     Pos 927 
chr7.38525408.385...  0.02608203 -0.00241429  ...  0.00000000  0.00000000
chr4.90747075.907... -0.00542039  0.03740036  ...  0.03051739  0.01590914
chr5.113189184.11... -0.01496562 -0.04417322  ...  0.00000000  0.00000000

> 
> ## plot prediction profile to pdf
> ## As sequences are usually very long select a ratio of height to width
> ## for the pdf which takes care of the maximum sequence length which is
> ## plotted. Only single or pairs of prediction profiles can be plotted.
> ## Plot profile for window size 1 (default) and 50. Load package Biobase
> ## for openPDF
> ## Not run: 
> ##D library(Biobase)
> ##D pdf(file="PredictionProfile1_w1.pdf", height=10, width=200)
> ##D plot(predProf, sel=c(1,3))
> ##D dev.off()
> ##D openPDF("PredictionProfile1_w1.pdf")
> ##D pdf(file="PredictionProfile1_w50.pdf", height=10, width=200)
> ##D plot(predProf, sel=c(1,3), windowSize=50)
> ##D dev.off()
> ##D openPDF("PredictionProfile1_w50.pdf")
> ##D pdf(file="PredictionProfile2_w1.pdf", height=10, width=200)
> ##D plot(predProf, sel=c(2,3))
> ##D dev.off()
> ##D openPDF("PredictionProfile2_w1.pdf")
> ##D pdf(file="PredictionProfile2_w50.pdf", height=10, width=200)
> ##D plot(predProf, sel=c(2,3), windowSize=50)
> ##D dev.off()
> ##D openPDF("PredictionProfile2_w50.pdf")
> ## End(Not run)
> 
> 
> 
> 
> 
> dev.off()
null device 
          1 
>