R Graphical Manual


Last data update: 2014.03.03


iterateBMAglm.train

R Documentation

Iterative Bayesian Model Averaging: training step

Description

Classification and variable selection on microarray data. This is a multivariate technique to select a small number of relevant variables (typically genes) to classify microarray samples. This function performs the training phase. The data are assumed to consist of two classes, and logistic regression is used for classification.

Usage

iterateBMAglm.train (train.expr.set, train.class, p=100, nbest=10, maxNvar=30, maxIter=20000, thresProbne0=1)

Arguments

`train.expr.set`	an `ExpressionSet` object. We assume the rows in the expression data represent variables (genes), while the columns represent samples or experiments. This training data is used to select relevant genes (variables) for classification.
`train.class`	class vector for the observations (samples or experiments) in the training data. Class numbers are assumed to start from 0, and the length of this class vector should equal the number of samples (columns) in `train.expr.set`. Since we assume 2-class data, we expect the class vector to consist of zeros and ones.
`p`	a number indicating the maximum number of top univariate genes used in the iterative BMA algorithm. This number is assumed to be less than the total number of genes in the training data. A larger p usually requires longer computational time as more iterations of the BMA algorithm are potentially applied. The default is 100.
`nbest`	a number specifying the number of models of each size returned to `bic.glm` in the `BMA` package. The default is 10.
`maxNvar`	a number indicating the maximum number of variables used in each iteration of `bic.glm` from the `BMA` package. The default is 30.
`maxIter`	a number indicating the maximum number of iterations of `bic.glm`. The default is 20000.
`thresProbne0`	a number specifying the threshold, in percent, for the posterior probability that each variable (gene) has a non-zero coefficient. Variables (genes) whose posterior probability is below this threshold are dropped in the iterative application of `bic.glm`. The default is 1 percent.
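The input requirements above can be checked before training. The sketch below is a hypothetical helper, `check.train.args`, which is not part of iterativeBMA; it assumes the expression matrix has been extracted with `exprs()` from Biobase, so that genes are rows and samples are columns.

```r
## Hypothetical pre-flight check (not part of iterativeBMA) mirroring the
## argument requirements: one 0/1 label per sample, and p below the gene count.
## `expr.mat` is a genes x samples matrix, e.g. exprs(train.expr.set).
check.train.args <- function(expr.mat, train.class, p = 100) {
  if (length(train.class) != ncol(expr.mat))
    stop("train.class must have one entry per sample (column)")
  if (!all(train.class %in% c(0, 1)))
    stop("train.class must contain only 0 and 1")
  if (p >= nrow(expr.mat))
    stop("p should be less than the number of genes (rows)")
  invisible(TRUE)
}
```

For example, `check.train.args(exprs(trainData), trainClass, p = 100)` could be run before calling `iterateBMAglm.train`.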

Details

The training phase consists of first ordering all the variables (genes) by a univariate measure, the ratio of between-groups to within-groups sums of squares (BSS/WSS), and then iteratively applying the `bic.glm` algorithm from the `BMA` package. In the first application of `bic.glm`, the top `maxNvar` genes in the univariate ranking are used. After each application of `bic.glm`, the genes with `probne0` < `thresProbne0` are dropped, and the next genes in the univariate ranking are added to the BMA window. This continues until the top `p` genes have been considered.
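The two steps of the training phase can be sketched in R as follows. This is an illustrative re-implementation under stated assumptions, not the package's source: `bss.wss.ratio` and `iterate.window` are hypothetical names, `fit.fn` stands in for one call to `bic.glm` that returns `probne0` for the genes currently in the window, and the rule that forces a drop when every gene clears the threshold is an assumption of this sketch.

```r
## BSS/WSS ratio for every gene (hypothetical helper).
## x: genes x samples matrix; y: class vector of 0/1 labels.
bss.wss.ratio <- function(x, y) {
  overall <- rowMeans(x)
  bss <- numeric(nrow(x))
  wss <- numeric(nrow(x))
  for (k in unique(y)) {
    xk <- x[, y == k, drop = FALSE]
    mk <- rowMeans(xk)
    bss <- bss + ncol(xk) * (mk - overall)^2   # between-group sum of squares
    wss <- wss + rowSums((xk - mk)^2)          # within-group sum of squares
  }
  bss / wss
}

## Sliding BMA window (hypothetical helper). `fit.fn` is a stand-in for one
## bic.glm() call: given gene names, it returns their probne0 (in percent).
iterate.window <- function(ranked, fit.fn, p = 100, maxNvar = 30,
                           thresProbne0 = 1, maxIter = 20000) {
  ranked <- ranked[seq_len(min(p, length(ranked)))]        # top p genes only
  window <- ranked[seq_len(min(maxNvar, length(ranked)))]  # first window
  next.idx <- length(window) + 1                           # next gene to enter
  for (iter in seq_len(maxIter)) {
    probne0 <- fit.fn(window)
    ## Stop once every top-p gene has entered, or the fit budget is used up.
    if (next.idx > length(ranked) || iter == maxIter) break
    keep <- probne0 >= thresProbne0
    ## Assumption of this sketch: if no gene falls below the threshold,
    ## drop the least probable one so the window can advance.
    if (all(keep)) keep[which.min(probne0)] <- FALSE
    window <- window[keep]
    ## Refill the window with the next genes in the univariate ranking.
    n.add <- min(maxNvar - length(window), length(ranked) - next.idx + 1)
    window <- c(window, ranked[next.idx:(next.idx + n.add - 1)])
    next.idx <- next.idx + n.add
  }
  list(window = window, probne0 = probne0)
}
```

In practice the ranking would come from something like `rownames(x)[order(bss.wss.ratio(x, y), decreasing = TRUE)]` with `x <- exprs(train.expr.set)`. Passing `fit.fn` as an argument keeps the window bookkeeping testable without fitting any models.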

Value

An object of class bic.glm returned by the last iteration of bic.glm. The object is a list consisting of the following components:

`namesx`	the names of the variables in the last iteration of `bic.glm`.
`postprob`	the posterior probabilities of the models selected.
`deviance`	the estimated model deviances.
`label`	labels identifying the models selected.
`bic`	values of BIC for the models.
`size`	the number of independent variables in each of the models.
`which`	a logical matrix with one row per model and one column per variable indicating whether that variable is in the model.
`probne0`	the posterior probability that each variable is non-zero (in percent).
`postmean`	the posterior mean of each coefficient (from model averaging).
`postsd`	the posterior standard deviation of each coefficient (from model averaging).
`condpostmean`	the posterior mean of each coefficient conditional on the variable being included in the model.
`condpostsd`	the posterior standard deviation of each coefficient conditional on the variable being included in the model.
`mle`	matrix with one row per model and one column per variable giving the maximum likelihood estimate of each coefficient for each model.
`se`	matrix with one row per model and one column per variable giving the standard error of each coefficient for each model.
`reduced`	a logical indicating whether any variables were dropped before model averaging.
`dropped`	a vector containing the names of those variables dropped before model averaging.
`call`	the matched call that created the `bic.glm` object.
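Several of these components are related. Assuming that `probne0` is the posterior mass of the models containing each variable, that the first column of `mle` is the intercept, and that `mle` holds 0 for a variable absent from a model, the two hypothetical helpers below recover `probne0` from `which` and `postprob`, and compute a model-averaged predicted probability for one sample. We assume `bma.predict`, used in the Examples, performs a posterior-weighted average of this kind, but this is a sketch, not its source.

```r
## Hypothetical helper: posterior inclusion probability (percent) of each
## variable, i.e. 100 * total postprob of the models that contain it.
## bic.glm may report probne0 rounded.
inclusion.prob <- function(which, postprob) {
  100 * as.vector(t(which) %*% postprob)
}

## Hypothetical helper: model-averaged P(class 1 | x) for one sample x.
## Assumes column 1 of `mle` is the intercept, the remaining columns follow
## the order of `x`, and absent variables have coefficient 0.
averaged.prob <- function(x, postprob, mle) {
  eta <- as.vector(mle %*% c(1, x))   # linear predictor under each model
  sum(postprob * plogis(eta))         # weight each logistic fit by postprob
}
```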

Note

The BMA and Biobase packages are required.

References

Raftery, A.E. (1995). Bayesian model selection in social research (with Discussion). Sociological Methodology 1995 (Peter V. Marsden, ed.), pp. 111-196, Cambridge, Mass.: Blackwells.

Yeung, K.Y., Bumgarner, R.E. and Raftery, A.E. (2005). Bayesian Model Averaging: Development of an improved multi-class, gene selection and classification tool for microarray data. Bioinformatics 21: 2394-2402.

Examples

library (Biobase)
library (BMA)
library (iterativeBMA)
data(trainData)
data(trainClass)

## training phase: select relevant genes
ret.bic.glm <- iterateBMAglm.train (train.expr.set=trainData, trainClass, p=100)

## get the selected genes with probne0 > 0
ret.gene.names <- ret.bic.glm$namesx[ret.bic.glm$probne0 > 0]

## show the posterior probabilities of selected models
ret.bic.glm$postprob

data (testData)

## get the subset of test data restricted to the selected genes (samples as rows)
curr.test.dat <- t(exprs(testData)[ret.gene.names,])

## compute the predicted probabilities for the test samples
y.pred.test <- apply (curr.test.dat, 1, bma.predict, postprobArr=ret.bic.glm$postprob, mleArr=ret.bic.glm$mle)

## compute the Brier Score if the class labels of the test samples are known
data (testClass)
brier.score (y.pred.test, testClass)
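The Brier score printed under Results (2.106802) is larger than 1, which suggests that `brier.score` sums the squared differences between predicted probabilities and true labels over the test samples rather than averaging them. The hypothetical helpers below follow that reading, and also tabulate hard predictions at an assumed 0.5 cutoff; neither is part of iterativeBMA.

```r
## Hypothetical helper: Brier score as a sum of squared errors.
brier.sum <- function(pred, truth) sum((truth - pred)^2)

## Hypothetical helper: 2 x 2 table of hard predictions (assumed cutoff 0.5)
## against the known 0/1 labels; factor levels keep both rows and columns.
confusion <- function(pred, truth, cutoff = 0.5) {
  table(predicted = factor(as.integer(pred > cutoff), levels = 0:1),
        actual = factor(truth, levels = 0:1))
}
```

With the objects from the Examples, `brier.sum(y.pred.test, testClass)` should agree with `brier.score(y.pred.test, testClass)` if this reading is correct, and `confusion(y.pred.test, testClass)` shows how many test samples fall on each side of the cutoff.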

Results


R version 3.3.1 (2016-06-21) -- "Bug in Your Hair"
> library (Biobase)
> library (BMA)
> library (iterativeBMA)
> data(trainData)
> data(trainClass)
> 
> ## training phase: select relevant genes
> ret.bic.glm <- iterateBMAglm.train (train.expr.set=trainData, trainClass, p=100)
[1] "5: explored up to variable ## 100"
There were 50 or more warnings (use warnings() to see the first 50)
> 
> ## get the selected genes with probne0 > 0
> ret.gene.names <- ret.bic.glm$namesx[ret.bic.glm$probne0 > 0]
> 
> ## show the posterior probabilities of selected models
> ret.bic.glm$postprob
 [1] 0.40650525 0.06594386 0.06594386 0.06594386 0.06594386 0.06594386
 [7] 0.06594386 0.06594386 0.06594386 0.06594386
> 
> data (testData)
> 
> ## get the subset of test data with the genes from the last iteration of bic.glm
> curr.test.dat <- t(exprs(testData)[ret.gene.names,])
> 
> ## to compute the predicted probabilities for the test samples
> y.pred.test <- apply (curr.test.dat, 1, bma.predict, postprobArr=ret.bic.glm$postprob, mleArr=ret.bic.glm$mle)
> 
> ## compute the Brier Score if the class labels of the test samples are known
> data (testClass)
> brier.score (y.pred.test, testClass)
[1] 2.106802