Last data update: 2014.03.03

R: Calculation of Prediction Profiles for Mixture Kernels
getPredProfMixture,BioVector-method    R Documentation

Calculation of Prediction Profiles for Mixture Kernels

Description

Computes prediction profiles for a given set of biological sequences from a model trained with mixture kernels.

Usage

## S4 method for signature 'BioVector'
getPredProfMixture(object, trainseqs, mixModel, kernels,
  mixCoef, svmIndex = 1, sel = 1:length(object),
  weightLimit = .Machine$double.eps)

## S4 method for signature 'XStringSet'
getPredProfMixture(object, trainseqs, mixModel, kernels,
  mixCoef, svmIndex = 1, sel = 1:length(object),
  weightLimit = .Machine$double.eps)

## S4 method for signature 'XString'
getPredProfMixture(object, trainseqs, mixModel, kernels,
  mixCoef, svmIndex = 1, sel = 1, weightLimit = .Machine$double.eps)

Arguments

object

a single biological sequence in the form of a DNAString, RNAString or AAString, or multiple biological sequences as DNAStringSet, RNAStringSet, AAStringSet (or as BioVector).

trainseqs

training sequences on which the mixture model was trained as DNAStringSet, RNAStringSet, AAStringSet (or as BioVector).

mixModel

model object of class KBModel trained with a kernel mixture.

kernels

a list of sequence kernel objects of class SequenceKernel. The same kernels must be used as in training.

mixCoef

mixing coefficients for the kernel mixture. The same mixing coefficient values must be used as in training.

svmIndex

integer value selecting one of the pairwise SVMs in case of pairwise multiclass classification. Default=1
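As a rough illustration of what "one of the pairwise SVMs" means: pairwise (one-vs-one) multiclass classification trains one binary SVM per unordered pair of classes, so k classes give k(k-1)/2 SVMs. The sketch below uses base R only; the class labels are hypothetical, and the enumeration order shown is an assumption, since the order actually used depends on the SVM backend.

```r
# Hypothetical class labels for a three-class problem (not from the package).
classes <- c("A", "B", "C")

# One-vs-one classification trains one binary SVM per unordered class pair.
pairs <- t(combn(classes, 2))
colnames(pairs) <- c("class1", "class2")
nPairs <- nrow(pairs)

# ASSUMPTION: pairs are numbered in combn() order; a given SVM backend
# may enumerate its pairwise SVMs differently.
pairTable <- data.frame(svmIndex = seq_len(nPairs), pairs)
print(pairTable)
```

For a binary problem there is only one SVM, which is consistent with the default of 1.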

sel

subset of indices into object as integer vector. When this parameter is present, the prediction profiles are computed for the specified subset of samples only. Default=1:length(object), i.e. all sequences (1 for a single XString)

weightLimit

the feature weight limit is a single numeric value and allows pruning of feature weights. All feature weights with an absolute value below this limit are set to 0 and are not considered for the prediction profile computation. This parameter is only relevant when feature weights are calculated in KeBABS during training. Default=.Machine$double.eps
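The pruning rule described for weightLimit can be sketched in base R as follows. The feature names and weight values are made up for illustration, and this is not the package's internal code, only the thresholding rule stated above.

```r
# Hypothetical feature weights; names and values are illustrative only.
w <- c(AB = 0.8, AC = -1e-20, BA = 3e-17, BB = -0.25, CA = 0)

# Same default as in the Usage section.
weightLimit <- .Machine$double.eps

# Set every weight whose absolute value is below the limit to 0.
pruned <- w
pruned[abs(pruned) < weightLimit] <- 0

# Features that still contribute after pruning.
keep <- names(pruned)[pruned != 0]
print(keep)
```

A larger weightLimit removes more features from the profile computation, trading some accuracy for less work.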

Details

With this method, prediction profiles can be generated explicitly for a given set of sequences with a model trained on a precomputed kernel matrix as a mixture of multiple kernels.
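To make "mixture of multiple kernels" concrete, here is a minimal base R sketch with two small, made-up feature matrices (they are not KeBABS objects). It shows that a weighted sum of two linear kernel matrices equals the kernel of the concatenated feature maps, each scaled by the square root of its mixing coefficient. This is a general property of kernel sums, not a description of the package internals.

```r
# Hypothetical explicit feature representations of three samples
# under two different feature maps (rows = samples).
X1 <- rbind(c(1, 0, 2),
            c(0, 1, 1),
            c(1, 1, 0))
X2 <- rbind(c(2, 1),
            c(0, 3),
            c(1, 1))

# Linear kernel matrices K = X %*% t(X).
K1 <- tcrossprod(X1)
K2 <- tcrossprod(X2)

# Mixing coefficients, same values as in the Examples section.
mixCoef <- c(0.7, 0.3)

# Mixture kernel matrix as a weighted sum of the individual kernels.
Kmix <- mixCoef[1] * K1 + mixCoef[2] * K2

# Equivalent view: concatenate the feature maps, each scaled by
# the square root of its mixing coefficient.
Xmix <- cbind(sqrt(mixCoef[1]) * X1, sqrt(mixCoef[2]) * X2)
print(all.equal(Kmix, tcrossprod(Xmix)))
```

This equivalence is one way to see why the same kernels and the same mixing coefficients as in training must be supplied: they define the feature space in which the model was fitted.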

Value

upon successful completion, the function returns a set of prediction profiles for the sequences as class PredictionProfile.

Author(s)

Johannes Palme <kebabs@bioinf.jku.at>

References

http://www.bioinf.jku.at/software/kebabs

(Mahrenholz, 2011) – C.C. Mahrenholz, I.G. Abfalter, U. Bodenhofer, R. Volkmer, and S. Hochreiter. Complex networks govern coiled coil oligomerization - predicting and profiling by means of a machine learning approach.

(Bodenhofer, 2009) – U. Bodenhofer, K. Schwarzbauer, S. Ionescu, and S. Hochreiter. Modeling Position Specificity in Sequence Kernels by Fuzzy Equivalence Relations.

J. Palme, S. Hochreiter, and U. Bodenhofer (2015) KeBABS: an R package for kernel-based analysis of biological sequences. Bioinformatics, 31(15):2574-2576. DOI: 10.1093/bioinformatics/btv176.

See Also

PredictionProfile, predict, plot, featureWeights, getPredictionProfile

Examples

## set random generator seed to make the results of this example
## reproducible
set.seed(123)

## load coiled coil data
data(CCoil)
gappya1 <- gappyPairKernel(k=1,m=11, annSpec=TRUE)
gappya2 <- gappyPairKernel(k=2,m=9, annSpec=TRUE)
kernels <- list(gappya1, gappya2)
mixCoef <- c(0.7,0.3)

## precompute mixed kernel matrix
km <- as.KernelMatrix(mixCoef[1]*gappya1(ccseq) +
                      mixCoef[2]*gappya2(ccseq))
mixModel <- kbsvm(x=km, y=as.numeric(yCC),
               pkg="e1071", svm="C-svc", cost=15)

## define two new sequences to be predicted
GCN4 <- AAStringSet(c("MKQLEDKVEELLSKNYHLENEVARLKKLV",
                      "MKQLEDKVEELLSKYYHTENEVARLKKLV"))
names(GCN4) <- c("GCN4wt", "GCN_N16Y,L19T")
## assign annotation metadata
annCharset <- annotationCharset(ccseq)
annot <- c("abcdefgabcdefgabcdefgabcdefga",
           "abcdefgabcdefgabcdefgabcdefga")
annotationMetadata(GCN4, annCharset=annCharset) <- annot

## compute prediction profiles
predProf <- getPredProfMixture(GCN4, ccseq, mixModel,
                               kernels, mixCoef)

## show prediction profiles
predProf

## plot prediction profile of both aa sequences
plot(predProf, sel=c(1,2), ylim=c(-0.4, 0.2), heptads=TRUE, annotate=TRUE)

Results


> library(kebabs)
> png(filename="/home/ddbj/snapshot/RGM3/R_BC/result/kebabs/getPredProfMixture-methods.Rd_%03d_medium.png", width=480, height=480)
> ### Name: getPredProfMixture,BioVector-method
> ### Title: Calculation of Prediction Profiles for Mixture Kernels
> ### Aliases: getPredProfMixture getPredProfMixture,BioVector-method
> ###   getPredProfMixture,XString-method
> ###   getPredProfMixture,XStringSet-method
> ### Keywords: feature methods prediction profile weights
> 
> ### ** Examples
> 
> ## set random generator seed to make the results of this example
> ## reproducible
> set.seed(123)
> 
> ## load coiled coil data
> data(CCoil)
> gappya1 <- gappyPairKernel(k=1,m=11, annSpec=TRUE)
> gappya2 <- gappyPairKernel(k=2,m=9, annSpec=TRUE)
> kernels <- list(gappya1, gappya2)
> mixCoef <- c(0.7,0.3)
> 
> ## precompute mixed kernel matrix
> km <- as.KernelMatrix(mixCoef[1]*gappya1(ccseq) +
+                       mixCoef[2]*gappya2(ccseq))
> mixModel <- kbsvm(x=km, y=as.numeric(yCC),
+                pkg="e1071", svm="C-svc", cost=15)
> 
> ## define two new sequences to be predicted
> GCN4 <- AAStringSet(c("MKQLEDKVEELLSKNYHLENEVARLKKLV",
+                       "MKQLEDKVEELLSKYYHTENEVARLKKLV"))
> names(GCN4) <- c("GCN4wt", "GCN_N16Y,L19T")
> ## assign annotation metadata
> annCharset <- annotationCharset(ccseq)
> annot <- c("abcdefgabcdefgabcdefgabcdefga",
+            "abcdefgabcdefgabcdefgabcdefga")
> annotationMetadata(GCN4, annCharset=annCharset) <- annot
> 
> ## compute prediction profiles
> predProf <- getPredProfMixture(GCN4, ccseq, mixModel,
+                                kernels, mixCoef)
> 
> ## show prediction profiles
> predProf
An object of class  "PredictionProfile" 

Sequences:

  A AAStringSet instance of length 2
    width seq                                               names               
[1]    29 MKQLEDKVEELLSKNYHLENEVARLKKLV                     GCN4wt
[2]    29 MKQLEDKVEELLSKYYHTENEVARLKKLV                     GCN_N16Y,L19T

[[1]]
gappy pair kernel: k=1, m=11, annSpec=TRUE

[[2]]
gappy pair kernel: k=2, m=9, annSpec=TRUE


Baselines:  0.0232767 0.0232767 

Profiles:
                     Pos 1        Pos 2            Pos 28       Pos 29 
       GCN4wt  0.054902666 -0.106325083  ...  0.080479901  0.116184775
GCN_N16Y,L19T  0.056571588 -0.109831082  ...  0.071846565  0.105417596

> 
> ## plot prediction profile of both aa sequences
> plot(predProf, sel=c(1,2), ylim=c(-0.4, 0.2), heptads=TRUE, annotate=TRUE)
> dev.off()
null device 
          1 
>