library(MLSeq)
data(cervical)
data = cervical[c(1:150),] # subset of the cervical data: the first 150 features.
class = data.frame(condition=factor(rep(c("N","T"),c(29,29)))) # sample classes: 29 "N" followed by 29 "T".
n = ncol(data) # number of samples
p = nrow(data) # number of features
nTest = ceiling(n*0.2) # number of samples for the test set (20% test, 80% train).
ind = sample(n,nTest,FALSE) # indices of the test samples, drawn without replacement
# train set
data.train = data[,-ind]
data.train = as.matrix(data.train + 1) # add 1 to every count
classtr = data.frame(condition=class[-ind,])
# train set as an S4 object (DESeqDataSet)
data.trainS4 = DESeqDataSetFromMatrix(countData = data.train,
colData = classtr, design = formula(~ condition))
data.trainS4 = DESeq(data.trainS4, fitType="local")
# Random Forest (RF) classification
rf = classify(data = data.trainS4, method = "randomforest", normalize = "deseq",
deseqTransform = "vst", cv = 5, rpt = 3, ref="T")
# accessor: returns the normalization method stored in the fitted MLSeq object
normalization(rf)
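The example above draws the test indices but only ever builds the training set, and the split changes on every run because no seed is set. As a minimal sketch of a reproducible 80/20 split that keeps both halves, the block below uses a simulated count matrix in place of `cervical`; the seed value, the simulated data, and the names `counts`, `condition`, `test_idx`, `counts_train`, `counts_test`, `class_train`, and `class_test` are assumptions made for illustration and are not part of MLSeq.

```r
# Reproducible 80/20 train/test split on a simulated count matrix
# (rows = features, columns = samples). All names and the seed are
# illustrative assumptions, not part of the MLSeq example.
set.seed(2128)

n_features <- 150
n_samples  <- 58
counts <- matrix(rpois(n_features * n_samples, lambda = 10),
                 nrow = n_features, ncol = n_samples,
                 dimnames = list(paste0("g", seq_len(n_features)),
                                 paste0("s", seq_len(n_samples))))
condition <- factor(rep(c("N", "T"), c(29, 29)))

# 20% of the samples, rounded up, go to the test set
n_test   <- ceiling(n_samples * 0.2)
test_idx <- sample(n_samples, n_test, replace = FALSE)

# drop = FALSE keeps matrix shape even if only one column is selected
counts_train <- counts[, -test_idx, drop = FALSE]
counts_test  <- counts[,  test_idx, drop = FALSE]
class_train  <- data.frame(condition = condition[-test_idx])
class_test   <- data.frame(condition = condition[test_idx])

dim(counts_train)  # 150 features x 46 samples
dim(counts_test)   # 150 features x 12 samples
```

Keeping `counts_test` and `class_test` alongside the training objects makes it possible to evaluate a fitted classifier on samples it has not seen.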
Results
R version 3.3.1 (2016-06-21) -- "Bug in Your Hair"
Copyright (C) 2016 The R Foundation for Statistical Computing
Platform: x86_64-pc-linux-gnu (64-bit)
> library(MLSeq)
Loading required package: caret
Loading required package: lattice
Loading required package: ggplot2
Loading required package: DESeq2
Loading required package: S4Vectors
Loading required package: stats4
Loading required package: BiocGenerics
Loading required package: parallel
Attaching package: 'BiocGenerics'
The following objects are masked from 'package:parallel':
clusterApply, clusterApplyLB, clusterCall, clusterEvalQ,
clusterExport, clusterMap, parApply, parCapply, parLapply,
parLapplyLB, parRapply, parSapply, parSapplyLB
The following objects are masked from 'package:stats':
IQR, mad, xtabs
The following objects are masked from 'package:base':
Filter, Find, Map, Position, Reduce, anyDuplicated, append,
as.data.frame, cbind, colnames, do.call, duplicated, eval, evalq,
get, grep, grepl, intersect, is.unsorted, lapply, lengths, mapply,
match, mget, order, paste, pmax, pmax.int, pmin, pmin.int, rank,
rbind, rownames, sapply, setdiff, sort, table, tapply, union,
unique, unsplit
Attaching package: 'S4Vectors'
The following objects are masked from 'package:base':
colMeans, colSums, expand.grid, rowMeans, rowSums
Loading required package: IRanges
Loading required package: GenomicRanges
Loading required package: GenomeInfoDb
Loading required package: SummarizedExperiment
Loading required package: Biobase
Welcome to Bioconductor
Vignettes contain introductory material; view with
'browseVignettes()'. To cite Bioconductor, see
'citation("Biobase")', and for packages 'citation("pkgname")'.
Loading required package: limma
Attaching package: 'limma'
The following object is masked from 'package:DESeq2':
plotMA
The following object is masked from 'package:BiocGenerics':
plotMA
Loading required package: randomForest
randomForest 4.6-12
Type rfNews() to see new features/changes/bug fixes.
Attaching package: 'randomForest'
The following object is masked from 'package:Biobase':
combine
The following object is masked from 'package:BiocGenerics':
combine
The following object is masked from 'package:ggplot2':
margin
Loading required package: edgeR
> png(filename="/home/ddbj/snapshot/RGM3/R_BC/result/MLSeq/normalization.Rd_%03d_medium.png", width=480, height=480)
> ### Name: normalization-methods
> ### Title: Accessors for the 'normalization' slot of an MLSeq object
> ### Aliases: normalization normalization,MLSeq-method
>
> ### ** Examples
>
> data(cervical)
>
> data = cervical[c(1:150),] # a subset of cervical data with first 150 features.
>
> class = data.frame(condition=factor(rep(c("N","T"),c(29,29))))# defining sample classes.
>
> n = ncol(data) # number of samples
> p = nrow(data) # number of features
>
> nTest = ceiling(n*0.2) # number of samples for test set (20% test, 80% train).
> ind = sample(n,nTest,FALSE)
>
> # train set
> data.train = data[,-ind]
> data.train = as.matrix(data.train + 1)
> classtr = data.frame(condition=class[-ind,])
>
> # train set in S4 class
> data.trainS4 = DESeqDataSetFromMatrix(countData = data.train,
+ colData = classtr, formula(~ condition))
converting counts to integer mode
> data.trainS4 = DESeq(data.trainS4, fitType="local")
estimating size factors
estimating dispersions
gene-wise dispersion estimates
mean-dispersion relationship
final dispersion estimates
fitting model and testing
-- replacing outliers and refitting for 9 genes
-- DESeq argument 'minReplicatesForReplace' = 7
-- original counts are preserved in counts(dds)
estimating dispersions
fitting model and testing
>
> # Random Forest (RF) Classification
> rf = classify(data = data.trainS4, method = "randomforest", normalize = "deseq", deseqTransform = "vst", cv = 5, rpt = 3, ref="T")
found already estimated dispersions, replacing these
gene-wise dispersion estimates
mean-dispersion relationship
final dispersion estimates
>
> normalization(rf)
[1] "deseq"
>
> dev.off()
null device
1
>
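The help topic in the log is titled "Accessors for the 'normalization' slot of an MLSeq object", and the call returns the string stored at fit time. As a rough sketch of how such an S4 slot accessor is typically defined, the block below uses a toy class; `ToyModel` and `toyNormalization` are hypothetical names invented here, and the real MLSeq class has more slots and its own generic.

```r
library(methods)

# Hypothetical toy class with a single character slot. This is an
# illustration of the accessor pattern, not the MLSeq class definition.
setClass("ToyModel", representation(normalization = "character"))

# Generic plus a method that simply returns the slot value
setGeneric("toyNormalization",
           function(object) standardGeneric("toyNormalization"))
setMethod("toyNormalization", "ToyModel",
          function(object) object@normalization)

fit <- new("ToyModel", normalization = "deseq")
toyNormalization(fit)  # returns "deseq"
```

Going through an accessor rather than reading `fit@normalization` directly keeps calling code independent of how the class stores the value internally.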