Details
Value
Author(s)
Gokmen Zararsiz, Dincer Goksuluk, Selcuk Korkmaz, Vahap Eldem, Izzet Parug Duru, Turgay Unver, Ahmet Ozturk

References
Kuhn M. (2008). Building predictive models in R using the caret package. Journal of Statistical Software, 28(5) (http://www.jstatsoft.org/v28/i05/).
Anders S, Huber W. (2010). Differential expression analysis for sequence count data. Genome Biology, 11:R106.
Witten DM. (2011). Classification and clustering of sequencing data using a Poisson model. The Annals of Applied Statistics, 5(4), 2493-2518.
Law CW. et al. (2014). Voom: precision weights unlock linear model analysis tools for RNA-seq read counts. Genome Biology, 15:R29, doi:10.1186/gb-2014-15-2-r29
Witten D. et al. (2010). Ultra-high throughput sequencing-based small RNA discovery and discrete statistical biomarker analysis in a collection of cervical tumours and matched controls. BMC Biology, 8:58.
Robinson MD, Oshlack A. (2010). A scaling normalization method for differential expression analysis of RNA-Seq data. Genome Biology, 11:R25, doi:10.1186/gb-2010-11-3-r25

See Also
Examples
library(MLSeq)  # also attaches DESeq2 and caret, which the calls below rely on

data(cervical)

data = cervical[c(1:150),]  # a subset of the cervical data: the first 150 features.

class = data.frame(condition=factor(rep(c("N","T"),c(29,29))))  # sample classes: 29 "N", then 29 "T".

n = ncol(data)  # number of samples
p = nrow(data)  # number of features

nTest = ceiling(n*0.2)  # number of samples for the test set (20% test, 80% train).
ind = sample(n,nTest,FALSE)  # column indices of the test samples, drawn without replacement

# train set
data.train = data[,-ind]
data.train = as.matrix(data.train + 1)
classtr = data.frame(condition=class[-ind,])

# train set in S4 class
data.trainS4 <- DESeqDataSetFromMatrix(countData = data.train,
                   colData = classtr, formula(~ condition))
data.trainS4 <- DESeq(data.trainS4, fitType="local")

# test set
data.test = data[,ind]
data.test = as.matrix(data.test + 1)
classts = data.frame(condition=class[ind,])

# test set in S4 class
data.testS4 = DESeqDataSetFromMatrix(countData = data.test,
                  colData = classts, formula(~ condition))
data.testS4 = DESeq(data.testS4, fitType="local")

## Number of repeats (rpt) might change model accuracies ##

# Classification and Regression Tree (CART) Classification
cart = classify(data = data.trainS4, method = "cart", normalize = "deseq",
                deseqTransform = "vst", cv = 5, rpt = 3, ref="T")
cart

# Random Forest (RF) Classification
rf = classify(data = data.trainS4, method = "randomforest", normalize = "deseq",
              deseqTransform = "vst", cv = 5, rpt = 3, ref="T")
rf

# predicted classes of test samples for CART method
pred.cart = predictClassify(cart, data.testS4)
pred.cart

# predicted classes of test samples for RF method
pred.rf = predictClassify(rf, data.testS4)
pred.rf

Results
R version 3.3.1 (2016-06-21) -- "Bug in Your Hair"
Platform: x86_64-pc-linux-gnu (64-bit)
> library(MLSeq)
Loading required package: caret
Loading required package: lattice
Loading required package: ggplot2
Loading required package: DESeq2
Loading required package: S4Vectors
Loading required package: stats4
Loading required package: BiocGenerics
Loading required package: parallel

Attaching package: 'BiocGenerics'

The following objects are masked from 'package:parallel':

    clusterApply, clusterApplyLB, clusterCall, clusterEvalQ,
    clusterExport, clusterMap, parApply, parCapply, parLapply,
    parLapplyLB, parRapply, parSapply, parSapplyLB

The following objects are masked from 'package:stats':

    IQR, mad, xtabs

The following objects are masked from 'package:base':

    Filter, Find, Map, Position, Reduce, anyDuplicated, append,
    as.data.frame, cbind, colnames, do.call, duplicated, eval, evalq,
    get, grep, grepl, intersect, is.unsorted, lapply, lengths, mapply,
    match, mget, order, paste, pmax, pmax.int, pmin, pmin.int, rank,
    rbind, rownames, sapply, setdiff, sort, table, tapply, union,
    unique, unsplit

Attaching package: 'S4Vectors'

The following objects are masked from 'package:base':

    colMeans, colSums, expand.grid, rowMeans, rowSums

Loading required package: IRanges
Loading required package: GenomicRanges
Loading required package: GenomeInfoDb
Loading required package: SummarizedExperiment
Loading required package: Biobase
Welcome to Bioconductor

    Vignettes contain introductory material; view with
    'browseVignettes()'. To cite Bioconductor, see
    'citation("Biobase")', and for packages 'citation("pkgname")'.
Loading required package: limma

Attaching package: 'limma'

The following object is masked from 'package:DESeq2':

    plotMA

The following object is masked from 'package:BiocGenerics':

    plotMA

Loading required package: randomForest
randomForest 4.6-12
Type rfNews() to see new features/changes/bug fixes.

Attaching package: 'randomForest'

The following object is masked from 'package:Biobase':

    combine

The following object is masked from 'package:BiocGenerics':

    combine

The following object is masked from 'package:ggplot2':

    margin

Loading required package: edgeR
> png(filename="/home/ddbj/snapshot/RGM3/R_BC/result/MLSeq/predictClassify.Rd_%03d_medium.png", width=480, height=480)
> ### Name: predictClassify
> ### Title: Extract Predictions From 'classify()' objects
> ### Aliases: predictClassify
>
> ### ** Examples
>
> data(cervical)
>
> data = cervical[c(1:150),]  # a subset of cervical data with first 150 features.
>
> class = data.frame(condition=factor(rep(c("N","T"),c(29,29))))# defining sample classes.
>
> n = ncol(data)  # number of samples
> p = nrow(data)  # number of features
>
> nTest = ceiling(n*0.2)  # number of samples for test set (20% test, 80% train).
> ind = sample(n,nTest,FALSE)
>
> # train set
> data.train = data[,-ind]
> data.train = as.matrix(data.train + 1)
> classtr = data.frame(condition=class[-ind,])
>
> # train set in S4 class
> data.trainS4 <- DESeqDataSetFromMatrix(countData = data.train,
+                    colData = classtr, formula(~ condition))
converting counts to integer mode
> data.trainS4 <- DESeq(data.trainS4, fitType="local")
estimating size factors
estimating dispersions
gene-wise dispersion estimates
mean-dispersion relationship
final dispersion estimates
fitting model and testing
-- replacing outliers and refitting for 15 genes
-- DESeq argument 'minReplicatesForReplace' = 7
-- original counts are preserved in counts(dds)
estimating dispersions
fitting model and testing
>
> # test set
> data.test = data[,ind]
> data.test = as.matrix(data.test + 1)
> classts = data.frame(condition=class[ind,])
>
> # test set in S4
> data.testS4 = DESeqDataSetFromMatrix(countData = data.test,
+                   colData = classts, formula(~ condition))
converting counts to integer mode
> data.testS4 = DESeq(data.testS4, fitType="local")
estimating size factors
estimating dispersions
gene-wise dispersion estimates
mean-dispersion relationship
final dispersion estimates
fitting model and testing
-- replacing outliers and refitting for 7 genes
-- DESeq argument 'minReplicatesForReplace' = 7
-- original counts are preserved in counts(dds)
estimating dispersions
fitting model and testing
>
> ## Number of repeats (rpt) might change model accuracies ##
>
> # Classification and Regression Tree (CART) Classification
> cart = classify(data = data.trainS4, method = "cart", normalize = "deseq", deseqTransform = "vst", cv = 5, rpt = 3, ref="T")
found already estimated dispersions, replacing these
gene-wise dispersion estimates
mean-dispersion relationship
final dispersion estimates
Loading required package: rpart
> cart

  An object of class MLSeq

            Method  :  cart

       Accuracy(%)  :  95.65
    Sensitivity(%)  :  96
    Specificity(%)  :  95.24

  Reference Class   :  T

>
> # Random Forest (RF) Classification
> rf = classify(data = data.trainS4, method = "randomforest", normalize = "deseq", deseqTransform = "vst", cv = 5, rpt = 3, ref="T")
found already estimated dispersions, replacing these
gene-wise dispersion estimates
mean-dispersion relationship
final dispersion estimates
> rf

  An object of class MLSeq

            Method  :  randomforest

       Accuracy(%)  :  100
    Sensitivity(%)  :  100
    Specificity(%)  :  100

  Reference Class   :  T

>
> # predicted classes of test samples for CART method
> pred.cart = predictClassify(cart, data.testS4)
found already estimated dispersions, replacing these
gene-wise dispersion estimates
mean-dispersion relationship
final dispersion estimates
> pred.cart
 [1] T N N T T T N T N T N N
Levels: T N
>
> # predicted classes of test samples for RF method
> pred.rf = predictClassify(rf, data.testS4)
found already estimated dispersions, replacing these
gene-wise dispersion estimates
mean-dispersion relationship
final dispersion estimates
> pred.rf
 [1] N N N T T T N T N T N N
Levels: T N
>
> dev.off()
null device
          1
>
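The two prediction vectors printed in the run above can be compared directly. The following is a minimal base R sketch, not part of MLSeq: it re-enters the 12 predicted labels exactly as printed and cross-tabulates them. The names `agreement`, `agree.rate` and `disagree` are illustrative only.

```r
# Predicted classes copied from the printed output of this particular run
# (levels ordered T, N as printed). A different random split would give
# different vectors.
pred.cart <- factor(c("T","N","N","T","T","T","N","T","N","T","N","N"),
                    levels = c("T","N"))
pred.rf   <- factor(c("N","N","N","T","T","T","N","T","N","T","N","N"),
                    levels = c("T","N"))

# Cross-tabulate the two classifiers' calls on the same test samples.
agreement <- table(CART = pred.cart, RF = pred.rf)
print(agreement)

# Proportion of test samples on which the two models give the same class.
agree.rate <- mean(pred.cart == pred.rf)

# Positions (within the test set) where the two models disagree.
disagree <- which(pred.cart != pred.rf)
```

For this run the two models agree on 11 of the 12 test samples and differ only on the first. In a live session the true test labels are held in `classts$condition`, so a call such as `table(Predicted = pred.cart, Actual = classts$condition)` should give the test-set confusion table; that call is shown here as a suggestion and was not part of the original run.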