R: Iterative Bayesian Model Averaging: training and prediction
iterateBMAglm.train.predict
R Documentation
Iterative Bayesian Model Averaging: training and prediction
Description
Classification and variable selection on microarray data.
This is a multivariate technique to select a small number
of relevant variables (typically genes) to classify
microarray samples. This function performs the training
and prediction steps. The data is assumed to consist of
two classes. Logistic regression is used for classification.
Arguments
train.expr.set
an ExpressionSet object.
We assume the rows in the expression data represent variables (genes),
while the columns represent
samples or experiments. This training data is used to
select relevant genes (variables) for classification.
test.expr.set
an ExpressionSet object.
We assume the rows in the expression data represent variables (genes),
while the columns represent samples or experiments.
The variables selected using the
training data are used to classify the samples in this test data.
train.class
class vector for the observations (samples or
experiments) in the training data.
Class numbers are assumed to start from 0,
and the length of this class vector should be equal
to the number of samples (columns) in train.expr.set.
Since we assume 2-class data, we expect the class vector
to consist of zeros and ones.
p
a number indicating the maximum number of top univariate genes
used in the iterative BMA algorithm. This number is assumed to be
less than the total number of genes in the training data.
A larger p usually requires longer computational time as more
iterations of the BMA algorithm are potentially applied.
The default is 100.
nbest
a number specifying the number of models of each size
returned to bic.glm in the BMA package.
The default is 10.
maxNvar
a number indicating the maximum number of variables used in
each iteration of bic.glm from the BMA package.
The default is 30.
maxIter
a number indicating the maximum number of iterations of
bic.glm. The default is 20000.
thresProbne0
a number specifying the threshold (in percent) for the
posterior probability that the coefficient of each
variable (gene) is non-zero. Variables (genes) whose
posterior probability falls below this threshold are
dropped in the iterative application of bic.glm. The
default is 1 percent.
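The assumptions stated for the arguments above can be checked before the function is called. The following is a minimal sketch only: check_bma_inputs is a hypothetical helper, not part of iterativeBMA, and it works on a plain expression matrix (genes in rows, samples in columns) rather than on an ExpressionSet object.

```r
# Hypothetical helper (not part of iterativeBMA): checks the documented
# assumptions on the training matrix, the class vector, and p.
check_bma_inputs <- function(train.mat, train.class, p = 100) {
  # rows are genes, columns are samples
  stopifnot(is.matrix(train.mat), is.numeric(train.mat))
  # one class label per sample (column)
  stopifnot(length(train.class) == ncol(train.mat))
  # two-class data coded as 0 and 1
  stopifnot(all(train.class %in% c(0, 1)))
  # p is assumed to be less than the total number of genes
  stopifnot(p < nrow(train.mat))
  invisible(TRUE)
}

# Toy data (made up for illustration): 200 genes, 10 samples
set.seed(1)
toy.mat <- matrix(rnorm(200 * 10), nrow = 200, ncol = 10)
toy.class <- rep(c(0, 1), each = 5)
check_bma_inputs(toy.mat, toy.class, p = 100)
```

A failed check stops with an error naming the violated condition, which is usually easier to read than a failure deep inside the model fitting.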
Details
This function consists of the training phase and the prediction
phase. The training phase consists of first
ordering all the variables (genes) by a univariate measure
called between-groups to within-groups sums-of-squares (BSS/WSS)
ratio, and then iteratively applying the bic.glm algorithm
from the BMA package. The prediction phase uses the variables
(genes) selected in the training phase to classify the samples
in the test set.
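The univariate ranking step can be illustrated with a short sketch. It uses the usual definition of the BSS/WSS ratio on a plain matrix with genes in rows. The function name bss_wss and the toy data are assumptions made for illustration, and this is not the package's own implementation.

```r
# Illustrative BSS/WSS ratio for each gene (row) of an expression matrix.
# bss_wss is a hypothetical name; this is not the package's code.
bss_wss <- function(expr.mat, class.vec) {
  overall.mean <- rowMeans(expr.mat)
  bss <- numeric(nrow(expr.mat))
  wss <- numeric(nrow(expr.mat))
  for (k in unique(class.vec)) {
    in.k <- class.vec == k
    class.mat <- expr.mat[, in.k, drop = FALSE]
    class.mean <- rowMeans(class.mat)
    # between-groups: class mean vs. overall mean, counted once per sample
    bss <- bss + sum(in.k) * (class.mean - overall.mean)^2
    # within-groups: each sample vs. the mean of its own class
    wss <- wss + rowSums((class.mat - class.mean)^2)
  }
  bss / wss
}

# Toy data (made up): 50 genes, 12 samples, two classes
set.seed(42)
toy.mat <- matrix(rnorm(50 * 12), nrow = 50)
toy.class <- rep(c(0, 1), each = 6)
# shift gene 7 in class 1 so that it separates the classes strongly
toy.mat[7, toy.class == 1] <- toy.mat[7, toy.class == 1] + 10

ratio <- bss_wss(toy.mat, toy.class)
# order genes from most to least discriminative and keep the top p
p <- 10
top.genes <- order(ratio, decreasing = TRUE)[1:p]
```

Genes with a large ratio differ more between the classes than within them, so the top p genes of this ordering are the candidates passed on to the iterative bic.glm step.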
Value
A vector of predicted probabilities, one per test sample, giving
the probability that the sample belongs to class 1.
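A vector of this form can be turned into class labels or summarized by hand, as in the sketch below. The probabilities and true classes are made up, the 0.5 cutoff is an assumption, and the Brier score is taken here as the unnormalized sum of squared differences. Whether brier.score in the package divides by the number of samples is not stated on this page.

```r
# Made-up predicted probabilities of class 1 for five test samples
pred.prob <- c(0.92, 0.15, 0.60, 0.03, 0.48)
true.class <- c(1, 0, 1, 0, 1)

# Assign class 1 when the predicted probability exceeds 0.5 (assumed cutoff)
pred.class <- as.integer(pred.prob > 0.5)
n.errors <- sum(pred.class != true.class)

# Sum of squared differences between the true class and the probability
# (assumption: no division by the number of samples)
brier.sum <- sum((true.class - pred.prob)^2)
```

Unlike the error count, the Brier score also penalizes correct but uncertain predictions, such as the third sample here.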
Note
The BMA and Biobase packages are required.
References
Raftery, A.E. (1995). Bayesian model selection in social research (with Discussion). Sociological Methodology 1995 (Peter V. Marsden, ed.), pp. 111-196, Cambridge, Mass.: Blackwells.
Yeung, K.Y., Bumgarner, R.E. and Raftery, A.E. (2005). Bayesian Model Averaging: Development of an improved multi-class, gene selection and classification tool for microarray data. Bioinformatics 21: 2394-2402.
Examples
library(Biobase)
library(BMA)
library(iterativeBMA)
## load the training and test data
data(trainData)
data(trainClass)
data(testData)
## train on trainData, then predict class 1 probabilities for testData
ret.vec <- iterateBMAglm.train.predict(train.expr.set=trainData, test.expr.set=testData, train.class=trainClass, p=100)
## compute the Brier Score
data(testClass)
brier.score(ret.vec, testClass)
Results
> library (Biobase)
> library (BMA)
> library (iterativeBMA)
> data(trainData)
> data(trainClass)
> data (testData)
>
> ret.vec <- iterateBMAglm.train.predict (train.expr.set=trainData, test.expr.set=testData, trainClass, p=100)
[1] "5: explored up to variable ## 100"
There were 50 or more warnings (use warnings() to see the first 50)
>
> ## compute the Brier Score
> data (testClass)
> brier.score (ret.vec, testClass)
[1] 2.221017