Last data update: 2014.03.03

predictClassify

R Documentation

Extract Predictions From `classify()` objects

Description

This function predicts the class labels of test data for a given model.

Usage

predictClassify(model, test.data)

Arguments

`model`	a fitted model of class `MLSeq`, as returned by `classify()`.
`test.data`	a `DESeqDataSet` instance containing the new observations.

Details

`predictClassify` returns a vector of predicted class labels for the test data set. The vector is a factor.

Value

predicted

a vector of predicted classes of test data. See details.
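
Because the returned value is a factor, it can be cross-tabulated against known labels when these are available. The following is a minimal sketch in base R, not part of the package: `predicted` and `truth` are hypothetical stand-ins for the output of `predictClassify()` and for the true classes of the test samples, and the level order of `truth` is deliberately different to show why aligning levels first keeps the table square and its diagonal meaningful.

```r
# Hypothetical stand-in for the factor returned by predictClassify()
predicted <- factor(c("T", "N", "N", "T"), levels = c("T", "N"))

# Hypothetical true labels, stored with a different level order
truth <- factor(c("T", "N", "T", "T"), levels = c("N", "T"))

# Align the level order so rows and columns of the table match
truth <- factor(truth, levels = levels(predicted))

# Confusion matrix: rows are predictions, columns are true classes
confusion <- table(Predicted = predicted, Actual = truth)

# Proportion of samples on the diagonal (correctly classified)
accuracy <- sum(diag(confusion)) / sum(confusion)

print(confusion)
print(accuracy)
```

With real data, `truth` would come from the `colData` of the test set rather than being typed in by hand.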

Author(s)

Gokmen Zararsiz, Dincer Goksuluk, Selcuk Korkmaz, Vahap Eldem, Izzet Parug Duru, Turgay Unver, Ahmet Ozturk

References

Kuhn M. (2008). Building predictive models in R using the caret package. Journal of Statistical Software, 28(5). http://www.jstatsoft.org/v28/i05/

Anders S., Huber W. (2010). Differential expression analysis for sequence count data. Genome Biology, 11:R106.

Witten DM. (2011). Classification and clustering of sequencing data using a Poisson model. The Annals of Applied Statistics, 5(4), 2493-2518.

Law CW. et al. (2014). voom: precision weights unlock linear model analysis tools for RNA-seq read counts. Genome Biology, 15:R29, doi:10.1186/gb-2014-15-2-r29.

Witten D. et al. (2010). Ultra-high throughput sequencing-based small RNA discovery and discrete statistical biomarker analysis in a collection of cervical tumours and matched controls. BMC Biology, 8:58.

Robinson MD, Oshlack A. (2010). A scaling normalization method for differential expression analysis of RNA-Seq data. Genome Biology, 11:R25, doi:10.1186/gb-2010-11-3-r25.

Examples

library(MLSeq)  # also attaches DESeq2, which provides DESeqDataSetFromMatrix() and DESeq()

data(cervical)

data = cervical[c(1:150),]  # a subset of cervical data with first 150 features.

class = data.frame(condition=factor(rep(c("N","T"),c(29,29))))  # defining sample classes.

n = ncol(data)  # number of samples
p = nrow(data)  # number of features

nTest = ceiling(n*0.2)  # number of samples for test set (20% test, 80% train).
ind = sample(n,nTest,FALSE)

# train set
data.train = data[,-ind]
data.train = as.matrix(data.train + 1)
classtr = data.frame(condition=class[-ind,])

# train set in S4 class
data.trainS4 <- DESeqDataSetFromMatrix(countData = data.train,
colData = classtr, formula(~ condition))
data.trainS4 <- DESeq(data.trainS4, fitType="local")

# test set
data.test = data[,ind]
data.test = as.matrix(data.test + 1)
classts = data.frame(condition=class[ind,])

# test set in S4 class
data.testS4 = DESeqDataSetFromMatrix(countData = data.test,
colData = classts, formula(~ condition))
data.testS4 = DESeq(data.testS4, fitType="local")

## Number of repeats (rpt) might change model accuracies ##

# Classification and Regression Tree (CART) Classification
cart = classify(data = data.trainS4, method = "cart", normalize = "deseq", deseqTransform = "vst", cv = 5, rpt = 3, ref="T")
cart

# Random Forest (RF) Classification
rf = classify(data = data.trainS4, method = "randomforest", normalize = "deseq", deseqTransform = "vst", cv = 5, rpt = 3, ref="T")
rf

# predicted classes of test samples for CART method
pred.cart = predictClassify(cart, data.testS4)
pred.cart

# predicted classes of test samples for RF method
pred.rf = predictClassify(rf, data.testS4)
pred.rf
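
The example above draws the test indices with `sample()` without fixing a random seed, so the train/test split (and therefore the predictions) will differ between runs. The following is a minimal sketch of a reproducible 80/20 split in base R. The matrix `toy` is simulated, hypothetical data with the same number of samples (58) as the example, and the seed value is arbitrary.

```r
# Fix the seed so the same samples are drawn on every run (value is arbitrary)
set.seed(2128)

# Hypothetical count matrix: 10 features (rows) by 58 samples (columns)
toy <- matrix(rpois(10 * 58, lambda = 20), nrow = 10,
              dimnames = list(paste0("gene", 1:10), paste0("S", 1:58)))

n <- ncol(toy)                             # number of samples
nTest <- ceiling(n * 0.2)                  # 20% of the samples go to the test set
ind <- sample(n, nTest, replace = FALSE)   # column indices of the test samples

toy.train <- toy[, -ind]   # training samples
toy.test  <- toy[, ind]    # test samples

dim(toy.train)
dim(toy.test)
```

With 58 samples, `ceiling(58 * 0.2)` gives 12 test samples and leaves 46 for training.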

Results


> library(MLSeq)
> data(cervical)
> 
> data = cervical[c(1:150),]  # a subset of cervical data with first 150 features.
> 
> class = data.frame(condition=factor(rep(c("N","T"),c(29,29))))# defining sample classes.
> 
> n = ncol(data)  # number of samples
> p = nrow(data)  # number of features
> 
> nTest = ceiling(n*0.2)  # number of samples for test set (20% test, 80% train).
> ind = sample(n,nTest,FALSE)
> 
> # train set
> data.train = data[,-ind]
> data.train = as.matrix(data.train + 1)
> classtr = data.frame(condition=class[-ind,])
> 
> # train set in S4 class
> data.trainS4 <- DESeqDataSetFromMatrix(countData = data.train,
+ colData = classtr, formula(~ condition))
converting counts to integer mode
> data.trainS4 <- DESeq(data.trainS4, fitType="local")
estimating size factors
estimating dispersions
gene-wise dispersion estimates
mean-dispersion relationship
final dispersion estimates
fitting model and testing
-- replacing outliers and refitting for 15 genes
-- DESeq argument 'minReplicatesForReplace' = 7 
-- original counts are preserved in counts(dds)
estimating dispersions
fitting model and testing
> 
> # test set
> data.test = data[,ind]
> data.test = as.matrix(data.test + 1)
> classts = data.frame(condition=class[ind,])
> 
> # test set in S4 
> data.testS4 = DESeqDataSetFromMatrix(countData = data.test,
+ colData = classts, formula(~ condition))
converting counts to integer mode
> data.testS4 = DESeq(data.testS4, fitType="local")
estimating size factors
estimating dispersions
gene-wise dispersion estimates
mean-dispersion relationship
final dispersion estimates
fitting model and testing
-- replacing outliers and refitting for 7 genes
-- DESeq argument 'minReplicatesForReplace' = 7 
-- original counts are preserved in counts(dds)
estimating dispersions
fitting model and testing
> 
> ## Number of repeats (rpt) might change model accuracies ##
> 
> # Classification and Regression Tree (CART) Classification
> cart = classify(data = data.trainS4, method = "cart", normalize = "deseq", deseqTransform = "vst", cv = 5, rpt = 3, ref="T")
found already estimated dispersions, replacing these
gene-wise dispersion estimates
mean-dispersion relationship
final dispersion estimates
Loading required package: rpart
> cart

  An object of class  MLSeq 

            Method  :  cart 

       Accuracy(%)  :  95.65 
    Sensitivity(%)  :  96 
    Specificity(%)  :  95.24 

  Reference Class   :  T 

> 
> # Random Forest (RF) Classification
> rf = classify(data = data.trainS4, method = "randomforest", normalize = "deseq", deseqTransform = "vst", cv = 5, rpt = 3, ref="T")
found already estimated dispersions, replacing these
gene-wise dispersion estimates
mean-dispersion relationship
final dispersion estimates
> rf

  An object of class  MLSeq 

            Method  :  randomforest 

       Accuracy(%)  :  100 
    Sensitivity(%)  :  100 
    Specificity(%)  :  100 

  Reference Class   :  T 

> 
> # predicted classes of test samples for CART method
> pred.cart = predictClassify(cart, data.testS4)
found already estimated dispersions, replacing these
gene-wise dispersion estimates
mean-dispersion relationship
final dispersion estimates
> pred.cart
 [1] T N N T T T N T N T N N
Levels: T N
> 
> # predicted classes of test samples for RF method
> pred.rf = predictClassify(rf, data.testS4)
found already estimated dispersions, replacing these
gene-wise dispersion estimates
mean-dispersion relationship
final dispersion estimates
> pred.rf
 [1] N N N T T T N T N T N N
Levels: T N
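
The two prediction vectors printed above can be compared directly, since both are factors with the same levels. The following is a minimal sketch in base R that re-enters the printed values by hand (the true labels of this random test split are not shown in the output, so only agreement between the two classifiers is computed, not accuracy).

```r
# Values copied from the printed output of pred.cart and pred.rf
pred.cart <- factor(c("T","N","N","T","T","T","N","T","N","T","N","N"),
                    levels = c("T", "N"))
pred.rf   <- factor(c("N","N","N","T","T","T","N","T","N","T","N","N"),
                    levels = c("T", "N"))

# Cross-tabulate the two sets of predictions
agreement <- table(CART = pred.cart, RF = pred.rf)

# Share of test samples on which both models agree
agree.rate <- mean(pred.cart == pred.rf)

# Positions of the test samples where the models disagree
disagree.at <- which(pred.cart != pred.rf)

print(agreement)
print(agree.rate)
print(disagree.at)
```

In this run the two models differ only on the first test sample, where CART predicts T and random forest predicts N.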