R: Calculation of Prediction Profiles for Mixture Kernels

getPredProfMixture,BioVector-method

R Documentation

Calculation of Prediction Profiles for Mixture Kernels

Description

Compute prediction profiles for a given set of biological sequences from a model trained with mixture kernels.

Usage

## S4 method for signature 'BioVector'
getPredProfMixture(object, trainseqs, mixModel, kernels,
  mixCoef, svmIndex = 1, sel = 1:length(object),
  weightLimit = .Machine$double.eps)

## S4 method for signature 'XStringSet'
getPredProfMixture(object, trainseqs, mixModel, kernels,
  mixCoef, svmIndex = 1, sel = 1:length(object),
  weightLimit = .Machine$double.eps)

## S4 method for signature 'XString'
getPredProfMixture(object, trainseqs, mixModel, kernels,
  mixCoef, svmIndex = 1, sel = 1, weightLimit = .Machine$double.eps)

Arguments

`object`	a single biological sequence in the form of a `DNAString`, `RNAString` or `AAString`, or multiple biological sequences as `DNAStringSet`, `RNAStringSet`, `AAStringSet` (or as `BioVector`).
`trainseqs`	training sequences on which the mixture model was trained as `DNAStringSet`, `RNAStringSet`, `AAStringSet` (or as `BioVector`).
`mixModel`	model object of class `KBModel` trained with kernel mixture.
`kernels`	a list of sequence kernel objects of class `SequenceKernel`. The same kernels must be used as in training.
`mixCoef`	mixing coefficients for the kernel mixture. The same mixing coefficient values must be used as in training.
`svmIndex`	integer value selecting one of the pairwise SVMs in case of pairwise multiclass classification. Default=1
`sel`	subset of indices into `object` as integer vector. When this parameter is present, the prediction profiles are computed for the specified subset of samples only. Default=`1:length(object)` (`1` for a single `XString`)
`weightLimit`	the feature weight limit is a single numeric value and allows pruning of feature weights. All feature weights with an absolute value below this limit are set to 0 and are not considered for the prediction profile computation. This parameter is only relevant when feature weights are calculated in KeBABS during training. Default=.Machine$double.eps
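
The pruning rule described for `weightLimit` can be illustrated with a minimal base R sketch. The weight vector and its feature names below are made-up values for illustration only and do not come from a KeBABS model; the sketch merely applies the stated rule that weights with an absolute value below the limit are set to 0.

```r
## Hypothetical feature weights (illustrative values, not from a real model)
w <- c(AA = 0.8, AC = 1e-20, CA = -3e-17, CC = -0.25)

## Same default as in the usage section above
weightLimit <- .Machine$double.eps

## Set every weight whose absolute value is below the limit to 0
pruned <- w
pruned[abs(pruned) < weightLimit] <- 0

pruned
```

With the default limit, only weights that are numerically indistinguishable from zero are dropped; a larger limit would remove weak but non-zero features as well.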

Details

With this method, prediction profiles can be generated explicitly for a given set of sequences from a model that was trained on a precomputed kernel matrix formed as a mixture of multiple kernels.
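
As a rough illustration of what "mixture of multiple kernels" means here, the following base R sketch forms a weighted sum of two kernel matrices. The 2 x 2 matrices are invented toy values, and `kernelMatrices` is a name introduced only for this sketch; with real data the matrices would come from evaluating the sequence kernels on the training sequences.

```r
## Two toy kernel matrices (symmetric, positive semidefinite); values are made up
K1 <- matrix(c(1.0, 0.5, 0.5, 1.0), nrow = 2)
K2 <- matrix(c(1.0, 0.1, 0.1, 1.0), nrow = 2)
kernelMatrices <- list(K1, K2)

## Mixing coefficients, one per kernel
mixCoef <- c(0.7, 0.3)

## Weighted sum of the kernel matrices
Kmix <- Reduce(`+`, Map(function(coef, K) coef * K, mixCoef, kernelMatrices))

Kmix
## With non-negative coefficients the mixture stays symmetric and
## positive semidefinite, so it is again a valid kernel matrix
isSymmetric(Kmix)
min(eigen(Kmix, symmetric = TRUE)$values) >= 0
```

Because the trained model only sees the mixed matrix, the individual kernels and coefficients have to be supplied again when profiles are computed, which is why `kernels` and `mixCoef` must match the values used in training.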

Value

Upon successful completion, the function returns a set of prediction profiles for the sequences as an object of class `PredictionProfile`.
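
To make the idea of a per-position profile concrete, here is a simplified base R sketch. It is not the KeBABS implementation: it assumes plain k-mer features instead of gappy pairs, uses hypothetical weights, and assumes that the weight of each pattern occurrence is split evenly across the positions it covers.

```r
## Toy sequence and hypothetical k-mer feature weights (not from a real model)
s <- "ACGAC"
k <- 2
w <- c(AC = 0.6, CG = -0.2, GA = 0.1)

chars <- strsplit(s, "")[[1]]
L <- length(chars)
profile <- numeric(L)

## Slide a window of length k over the sequence and distribute the weight
## of the k-mer found there equally over the k positions it covers
for (start in seq_len(L - k + 1)) {
  idx <- start:(start + k - 1)
  kmer <- paste(chars[idx], collapse = "")
  wt <- if (kmer %in% names(w)) w[[kmer]] else 0
  profile[idx] <- profile[idx] + wt / k
}
names(profile) <- paste("Pos", seq_len(L))

profile
## The position contributions add up to the summed weights of all
## k-mer occurrences in the sequence
sum(profile)
```

Under these assumptions, positions with large positive or negative values are the ones that push the prediction most strongly toward one class or the other.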

Author(s)

Johannes Palme <kebabs@bioinf.jku.at>

References

http://www.bioinf.jku.at/software/kebabs

(Mahrenholz, 2011) – C.C. Mahrenholz, I.G. Abfalter, U. Bodenhofer, R. Volkmer, and S. Hochreiter. Complex networks govern coiled coil oligomerization - predicting and profiling by means of a machine learning approach.

(Bodenhofer, 2009) – U. Bodenhofer, K. Schwarzbauer, S. Ionescu, and S. Hochreiter. Modeling Position Specificity in Sequence Kernels by Fuzzy Equivalence Relations.

J. Palme, S. Hochreiter, and U. Bodenhofer (2015). KeBABS: an R package for kernel-based analysis of biological sequences. Bioinformatics, 31(15):2574-2576. DOI: 10.1093/bioinformatics/btv176.

Examples

## load the package (Biostrings is attached as a dependency)
library(kebabs)

## set random generator seed to make the results of this example
## reproducible
set.seed(123)

## load coiled coil data
data(CCoil)
gappya1 <- gappyPairKernel(k=1,m=11, annSpec=TRUE)
gappya2 <- gappyPairKernel(k=2,m=9, annSpec=TRUE)
kernels <- list(gappya1, gappya2)
mixCoef <- c(0.7,0.3)

## precompute mixed kernel matrix
km <- as.KernelMatrix(mixCoef[1]*gappya1(ccseq) +
                      mixCoef[2]*gappya2(ccseq))
mixModel <- kbsvm(x=km, y=as.numeric(yCC),
               pkg="e1071", svm="C-svc", cost=15)

## define two new sequences to be predicted
GCN4 <- AAStringSet(c("MKQLEDKVEELLSKNYHLENEVARLKKLV",
                      "MKQLEDKVEELLSKYYHTENEVARLKKLV"))
names(GCN4) <- c("GCN4wt", "GCN_N16Y,L19T")
## assign annotation metadata
annCharset <- annotationCharset(ccseq)
annot <- c("abcdefgabcdefgabcdefgabcdefga",
           "abcdefgabcdefgabcdefgabcdefga")
annotationMetadata(GCN4, annCharset=annCharset) <- annot

## compute prediction profiles
predProf <- getPredProfMixture(GCN4, ccseq, mixModel,
                               kernels, mixCoef)

## show prediction profiles
predProf

## plot prediction profile of both aa sequences
plot(predProf, sel=c(1,2), ylim=c(-0.4, 0.2), heptads=TRUE, annotate=TRUE)

Results


> library(kebabs)
> ## show prediction profiles
> predProf
An object of class  "PredictionProfile" 

Sequences:

  A AAStringSet instance of length 2
    width seq                                               names               
[1]    29 MKQLEDKVEELLSKNYHLENEVARLKKLV                     GCN4wt
[2]    29 MKQLEDKVEELLSKYYHTENEVARLKKLV                     GCN_N16Y,L19T

[[1]]
gappy pair kernel: k=1, m=11, annSpec=TRUE

[[2]]
gappy pair kernel: k=2, m=9, annSpec=TRUE


Baselines:  0.0232767 0.0232767 

Profiles:
                     Pos 1        Pos 2            Pos 28       Pos 29 
       GCN4wt  0.054902666 -0.106325083  ...  0.080479901  0.116184775
GCN_N16Y,L19T  0.056571588 -0.109831082  ...  0.071846565  0.105417596

> 
> ## plot prediction profile of both aa sequences
> plot(predProf, sel=c(1,2), ylim=c(-0.4, 0.2), heptads=TRUE, annotate=TRUE)