R Graphical Manual

Browse All

Last data update: 2014.03.03

R: The Iterative Bayesian Model Averaging (BMA) algorithm for...

iterativeBMAsurv-package

R Documentation

The Iterative Bayesian Model Averaging (BMA) algorithm for survival analysis

Description

The iterative Bayesian Model Averaging (BMA) algorithm for survival analysis is a variable selection method for applying survival analysis to microarray data..

Details

Package:	iterativeBMAsurv
Type:	Package
Version:	0.1.0
Date:	2008-3-24
License:	GPL version 2 or higher

The function iterateBMAsurv.train selects relevant variables by iteratively applying the bic.surv function from the BMA package until all variables in the training data are exhausted. The variables are assumed to be pre-sorted by rank when this function is called. The function iterateBMAsurv.train.wrapper acts as a wrapper for iterateBMAsurv.train, returning the names of the selected variables and an object of class bic.surv if the iterations exhaust all variables in the training set (-1 otherwise). Again, the variables are assumed to be pre-sorted by rank, so calling this function allows users to experiment with different univariate ranking measures. The function iterateBMAsurv.train.predict.assess combines the training, prediction, and assessment phases. It returns a list consisting of the numbers of selected genes and models from the training phase, the predicted risk scores of the test samples, and the overall survival analysis statistics indicating the difference between risk groups (p-value, chi-square statistic, and variance matrix). It also writes a Kaplan-Meier survival analysis curve to file, which serves as a pictorial nonparametric estimator of the difference between risk groups. The variables are not assumed to be pre-sorted by rank when this function is called. iterateBMAsurv.train.predict.assess calls singleGeneCoxph, which ranks the genes based on their log likelihood scores using Cox Proportional Hazards Regression. iterateBMAsurv.train.predict.assess calls iterateBMAsurv.train.wrapper in its training phase, so if Cox Proportional Hazards Regression is the desired univariate ranking algorithm, then calling this function with the training and testing sets is all that is necessary for a complete survival analysis run. The function crossVal performs k runs of n-fold cross validation on a training data set, where k and n are specified by the user. crossVal calls iterateBMAsurv.train.predict.assess during each fold, so Cox Proportional Hazards Regression is the univariate ranking measure for this function.

Author(s)

Ka Yee Yeung, University of Washington, Seattle, WA, Amalia Annest, University of Washington, Tacoma, WA

Maintainer: Ka Yee Yeung <kayee@u.washington.edu> Amalia Annest <amanu@u.washington.edu>

References

Annest, A., Yeung, K.Y., Bumgarner, R.E., and Raftery, A.E. (2008). Iterative Bayesian Model Averaging for Survival Analysis. Manuscript in Progress.

Raftery, A.E. (1995). Bayesian model selection in social research (with Discussion). Sociological Methodology 1995 (Peter V. Marsden, ed.), pp. 111-196, Cambridge, Mass.: Blackwells.

Volinsky, C., Madigan, D., Raftery, A., and Kronmal, R. (1997) Bayesian Model Averaging in Proprtional Hazard Models: Assessing the Risk of a Stroke. Applied Statistics 46: 433-448.

Yeung, K.Y., Bumgarner, R.E. and Raftery, A.E. (2005) Bayesian Model Averaging: Development of an improved multi-class, gene selection and classification tool for microarray data. Bioinformatics 21: 2394-2402.

Examples

library (BMA)
library (iterativeBMAsurv)
data(trainData)
data(trainSurv)
data(trainCens)
data(testData)
data(testSurv)
data(testCens)

## Use p=10 genes and nbest=5 for fast computation
ret.bma <- iterateBMAsurv.train.predict.assess (train.dat=trainData, test.dat=testData, surv.time.train=trainSurv, surv.time.test=testSurv, cens.vec.train=trainCens, cens.vec.test=testCens, p=10, nbest=5)

## Extract the statistics from this survival analysis run
number.genes <- ret.bma$nvar
number.models <- ret.bma$nmodel
evaluate.success <- ret.bma$statistics

## Perform 1 run of 2-fold cross validation on the training set, using p=10 genes and nbest=5 for fast computation
cv <- crossVal(exset=trainData, survTime=trainSurv, censor=trainCens, diseaseType="DLBCL", noFolds=2, noRuns=1, p=10, nbest=5)

Results


R version 3.3.1 (2016-06-21) -- "Bug in Your Hair"
Copyright (C) 2016 The R Foundation for Statistical Computing
Platform: x86_64-pc-linux-gnu (64-bit)

R is free software and comes with ABSOLUTELY NO WARRANTY.
You are welcome to redistribute it under certain conditions.
Type 'license()' or 'licence()' for distribution details.

R is a collaborative project with many contributors.
Type 'contributors()' for more information and
'citation()' on how to cite R or R packages in publications.

Type 'demo()' for some demos, 'help()' for on-line help, or
'help.start()' for an HTML browser interface to help.
Type 'q()' to quit R.

> library(iterativeBMAsurv)
Loading required package: BMA
Loading required package: survival
Loading required package: leaps
Loading required package: robustbase

Attaching package: 'robustbase'

The following object is masked from 'package:survival':

    heart

Loading required package: inline
Loading required package: rrcov
Scalable Robust Estimators with High Breakdown Point (version 1.3-11)

Loading required package: splines
> png(filename="/home/ddbj/snapshot/RGM3/R_BC/result/iterativeBMAsurv/iterativeBMAsurv-package.Rd_%03d_medium.png", width=480, height=480)
> ### Name: iterativeBMAsurv-package
> ### Title: The Iterative Bayesian Model Averaging (BMA) algorithm for
> ###   survival analysis
> ### Aliases: iterativeBMAsurv-package iterativeBMAsurv
> ### Keywords: multivariate survival
> 
> ### ** Examples
> 
> library (BMA)
> library (iterativeBMAsurv)
> data(trainData)
> data(trainSurv)
> data(trainCens)
> data(testData)
> data(testSurv)
> data(testCens)
> 
> ## Use p=10 genes and nbest=5 for fast computation
> ret.bma <- iterateBMAsurv.train.predict.assess (train.dat=trainData, test.dat=testData, surv.time.train=trainSurv, surv.time.test=testSurv, cens.vec.train=trainCens, cens.vec.test=testCens, p=10, nbest=5)
1: Explored up to variable # 10
Iterate bic.surv is done!
Selected genes:
 [1] "X33310" "X28197" "X19373" "X16527" "X27415" "X24394" "X28531" "X27585"
 [9] "X27766" "X26940"
Posterior probabilities of selected genes:
 [1] 100.0  18.9  16.9   0.0  19.6   0.0  47.0  12.6   0.0   4.3
# selected genes = 7
# selected models = 11
Risk Table:
           cens.vec.test
             0  1
  High Risk  7  9
  Low Risk   8 12
Call:
survdiff(formula = mySurv.obj ~ unlist(risk.groups))

                               N Observed Expected (O-E)^2/E (O-E)^2/V
unlist(risk.groups)=High Risk 16        9     8.21    0.0752     0.126
unlist(risk.groups)=Low Risk  20       12    12.79    0.0483     0.126

 Chisq= 0.1  on 1 degrees of freedom, p= 0.722 
          [,1]      [,2]
[1,]  4.891446 -4.891446
[2,] -4.891446  4.891446
[1] 0.1262768
There were 50 or more warnings (use warnings() to see the first 50)
> 
> ## Extract the statistics from this survival analysis run
> number.genes <- ret.bma$nvar
> number.models <- ret.bma$nmodel
> evaluate.success <- ret.bma$statistics
> 
> ## Perform 1 run of 2-fold cross validation on the training set, using p=10 genes and nbest=5 for fast computation
> cv <- crossVal(exset=trainData, survTime=trainSurv, censor=trainCens, diseaseType="DLBCL", noFolds=2, noRuns=1, p=10, nbest=5)
******************** BEGINNING CV RUN 1 ********************
---------- BEGINNING FOLD 1 ----------
1: Explored up to variable # 10
Iterate bic.surv is done!
Selected genes:
 [1] "X25054" "X19373" "X28197" "X24394" "X19327" "X27766" "X16527" "X27731"
 [9] "X27585" "X26940"
Posterior probabilities of selected genes:
 [1] 100.0   7.7  11.7  67.8   1.1  17.7   8.5  20.6   8.3  27.7
# selected genes = 10
# selected models = 20
Risk Table:
           cens.vec.test
             0  1
  High Risk  2  8
  Low Risk  15  7
Call:
survdiff(formula = mySurv.obj ~ unlist(risk.groups))

                               N Observed Expected (O-E)^2/E (O-E)^2/V
unlist(risk.groups)=High Risk 10        8     3.92      4.26      5.87
unlist(risk.groups)=Low Risk  22        7    11.08      1.50      5.87

 Chisq= 5.9  on 1 degrees of freedom, p= 0.0154 
          [,1]      [,2]
[1,]  2.841607 -2.841607
[2,] -2.841607  2.841607
[1] 5.867927
---------- BEGINNING FOLD 2 ----------
1: Explored up to variable # 10
Iterate bic.surv is done!
Selected genes:
 [1] "X33310" "X26627" "X34729" "X26423" "X29650" "X26586" "X26512" "X33166"
 [9] "X33014" "X28718"
Posterior probabilities of selected genes:
 [1] 59.5 25.4 98.1  9.6 59.2  4.9 18.2  0.0 17.9 95.3
# selected genes = 9
# selected models = 24
Risk Table:
           cens.vec.test
             0  1
  High Risk  3  8
  Low Risk  10 12
Call:
survdiff(formula = mySurv.obj ~ unlist(risk.groups))

                               N Observed Expected (O-E)^2/E (O-E)^2/V
unlist(risk.groups)=High Risk 11        8     4.45     2.821      3.85
unlist(risk.groups)=Low Risk  22       12    15.55     0.809      3.85

 Chisq= 3.8  on 1 degrees of freedom, p= 0.0499 
          [,1]      [,2]
[1,]  3.268738 -3.268738
[2,] -3.268738  3.268738
[1] 3.845127
******************** END CV RUN 1 ********************
Overall results from this run:
          Censored Uncensored Percent Uncensored
Low Risk        25         19           43.18182
High Risk        5         16           76.19048
Overall average result matrix over all runs:
          Censored Uncensored Percent Uncensored
Low Risk        25         19           43.18182
High Risk        5         16           76.19048
Average p-value across all folds and all runs:
[1] 0.03265496
Standard deviation of p-values across all folds and all runs:
[1] 0.02437505
Average chi-square value across all folds and all runs:
[1] 4.856527
Standard deviation of chi-square across all folds and all runs:
[1] 1.430335
There were 50 or more warnings (use warnings() to see the first 50)
> 
> 
> 
> 
> 
> 
> dev.off()
null device 
          1 
>