This function repeatedly calls bic.glm from the
BMA package until all variables are exhausted.
The data is assumed to consist of
two classes. Logistic regression is used for classification.
Usage
iterateBMAglm.wrapper (sortedA, y, nbest=10, maxNvar=30, maxIter=20000, thresProbne0=1)
Arguments
sortedA
data matrix where columns are variables and rows are
observations. The variables (columns) are assumed to
be sorted using a univariate measure.
In the case of gene expression data, the columns (variables)
represent genes, while the rows (observations) represent
samples or experiments.
y
class vector for the observations (samples or
experiments) in the training data.
Class numbers are assumed to start from 0,
and the length of this class vector should be equal
to the number of rows in sortedA.
Since we assume 2-class data, we expect the class vector
consists of zero's and one's.
nbest
a number specifying the number of models of each size
returned to bic.glm in the BMA package.
The default is 10.
maxNvar
a number indicating the maximum number of variables used in
each iteration of bic.glm from the BMA package.
The default is 30.
maxIter
a number indicating the maximum of iterations of
bic.glm. The default is 20000.
thresProbne0
a number specifying the threshold for the posterior
probability that each variable (gene) is non-zero (in
percent). Variables (genes) with such posterior
probability less than this threshold are dropped in
the iterative application of bic.glm. The default
is 1 percent.
Details
In this function, the variables are assumed to be sorted, and
bic.glm is called repeatedly. In the first application of
the bic.glm algorithm, the top maxNvar univariate
ranked genes are used. After each application of the bic.glm
algorithm, the genes with probne0 < thresProbne0
are dropped, and the next univariate ordered genes are added
to the BMA window.
The function iterateBMAglm.train calls BssWssFast before
calling this function.
Using this function, users can experiment with alternative
univariate measures.
Value
If all variables are exhausted, an object of class
bic.glm returned by the last iteration
of bic.glm. Otherwise, -1 is returned.
The object of class bic.glm is a list consisting
of the following components:
namesx
the names of the variables in the last iteration of
bic.glm.
postprob
the posterior probabilities of the models selected.
deviance
the estimated model deviances.
label
labels identifying the models selected.
bic
values of BIC for the models.
size
the number of independent variables in each of the models.
which
a logical matrix with one row per model and one column per
variable indicating whether that variable is in the model.
probne0
the posterior probability that each variable is non-zero
(in percent).
postmean
the posterior mean of each coefficient (from model averaging).
postsd
the posterior standard deviation of each coefficient
(from model averaging).
condpostmean
the posterior mean of each coefficient conditional on
the variable being included in the model.
condpostsd
the posterior standard deviation of each coefficient
conditional on the variable being included in the model.
mle
matrix with one row per model and one column per variable giving
the maximum likelihood estimate of each coefficient for each model.
se
matrix with one row per model and one column per variable giving
the standard error of each coefficient for each model.
reduced
a logical indicating whether any variables were dropped
before model averaging.
dropped
a vector containing the names of those variables dropped
before model averaging.
call
the matched call that created the bma.lm object.
Note
The BMA and Biobase packages are required.
References
Raftery, A.E. (1995).
Bayesian model selection in social research (with Discussion). Sociological Methodology 1995 (Peter V. Marsden, ed.), pp. 111-196, Cambridge, Mass.: Blackwells.
Yeung, K.Y., Bumgarner, R.E. and Raftery, A.E. (2005)
Bayesian Model Averaging: Development of an improved multi-class, gene selection and classification tool for microarray data.
Bioinformatics 21: 2394-2402.
library (Biobase)
library (BMA)
library (iterativeBMA)
data(trainData)
data(trainClass)
## Use the BSS/WSS ratio to rank all genes in the training data
sorted.vec <- BssWssFast (t(exprs(trainData)), trainClass, numClass = 2)
## get the top ranked 50 genes
sorted.train.dat <- t(exprs(trainData[sorted.vec$ix[1:50], ]))
## run iterative bic.glm
ret.bic.glm <- iterateBMAglm.wrapper (sorted.train.dat, y=trainClass)
## The above commands are equivalent to the following
ret.bic.glm <- iterateBMAglm.train (train.expr.set=trainData, trainClass, p=50)
Results
R version 3.3.1 (2016-06-21) -- "Bug in Your Hair"
Copyright (C) 2016 The R Foundation for Statistical Computing
Platform: x86_64-pc-linux-gnu (64-bit)
R is free software and comes with ABSOLUTELY NO WARRANTY.
You are welcome to redistribute it under certain conditions.
Type 'license()' or 'licence()' for distribution details.
R is a collaborative project with many contributors.
Type 'contributors()' for more information and
'citation()' on how to cite R or R packages in publications.
Type 'demo()' for some demos, 'help()' for on-line help, or
'help.start()' for an HTML browser interface to help.
Type 'q()' to quit R.
> library(iterativeBMA)
Loading required package: BMA
Loading required package: survival
Loading required package: leaps
Loading required package: robustbase
Attaching package: 'robustbase'
The following object is masked from 'package:survival':
heart
Loading required package: inline
Loading required package: rrcov
Scalable Robust Estimators with High Breakdown Point (version 1.3-11)
Loading required package: Biobase
Loading required package: BiocGenerics
Loading required package: parallel
Attaching package: 'BiocGenerics'
The following objects are masked from 'package:parallel':
clusterApply, clusterApplyLB, clusterCall, clusterEvalQ,
clusterExport, clusterMap, parApply, parCapply, parLapply,
parLapplyLB, parRapply, parSapply, parSapplyLB
The following objects are masked from 'package:stats':
IQR, mad, xtabs
The following objects are masked from 'package:base':
Filter, Find, Map, Position, Reduce, anyDuplicated, append,
as.data.frame, cbind, colnames, do.call, duplicated, eval, evalq,
get, grep, grepl, intersect, is.unsorted, lapply, lengths, mapply,
match, mget, order, paste, pmax, pmax.int, pmin, pmin.int, rank,
rbind, rownames, sapply, setdiff, sort, table, tapply, union,
unique, unsplit
Welcome to Bioconductor
Vignettes contain introductory material; view with
'browseVignettes()'. To cite Bioconductor, see
'citation("Biobase")', and for packages 'citation("pkgname")'.
Attaching package: 'Biobase'
The following object is masked from 'package:robustbase':
rowMedians
> png(filename="/home/ddbj/snapshot/RGM3/R_BC/result/iterativeBMA/iterateBMAglm_wrapper.Rd_%03d_medium.png", width=480, height=480)
> ### Name: iterateBMAglm.wrapper
> ### Title: Iterative Bayesian Model Averaging
> ### Aliases: iterateBMAglm.wrapper
> ### Keywords: multivariate classif
>
> ### ** Examples
>
> library (Biobase)
> library (BMA)
> library (iterativeBMA)
> data(trainData)
> data(trainClass)
>
> ## Use the BSS/WSS ratio to rank all genes in the training data
> sorted.vec <- BssWssFast (t(exprs(trainData)), trainClass, numClass = 2)
> ## get the top ranked 50 genes
> sorted.train.dat <- t(exprs(trainData[sorted.vec$ix[1:50], ]))
>
> ## run iterative bic.glm
> ret.bic.glm <- iterateBMAglm.wrapper (sorted.train.dat, y=trainClass)
[1] "2: explored up to variable ## 50"
There were 50 or more warnings (use warnings() to see the first 50)
>
> ## The above commands are equivalent to the following
> ret.bic.glm <- iterateBMAglm.train (train.expr.set=trainData, trainClass, p=50)
[1] "2: explored up to variable ## 50"
There were 50 or more warnings (use warnings() to see the first 50)
>
>
>
>
>
>
> dev.off()
null device
1
>