R: Risk Groups: assignment of patient test samples
predictiveAssessCategory
R Documentation
Risk Groups: assignment of patient test samples
Description
This function assigns a risk group (high-risk or low-risk) to each
patient sample in the test set based on the value of the patient's
predicted risk score. The cutPoint between high-risk and
low-risk is designated by the user.
A vector containing the predicted risk scores of the test samples.
y.pred.train
A vector containing the computed risk scores of the training samples.
cens.vec.test
A vector of censor data for the patient samples in the
test set. In general, 0 = censored and 1 = uncensored.
cutPoint
Threshold percent for separating high- from low-risk groups.
The default is 50.
Details
This function begins by using the computed risk scores of the training
set (y.pred.train) to define a real-number empirical cutoff point
between high- and low-risk groups. The cutoff point is determined by
the percentile cutPoint as designated by the user. The predicted
risk scores from the test samples are then matched against this cutoff
point to determine whether they belong in the high-risk or the low-risk
category.
Value
A list consisting of 2 components:
assign.risk
A 2 x 2 table indicating the number of test samples in each
category (high-risk/censored, high-risk/uncensored,
low-risk/censored, low-risk/uncensored).
groups
A list of all patient samples in the test set with their
corresponding 'High-risk' or 'Low-risk' designations.
References
Annest, A., Yeung, K.Y., Bumgarner, R.E., and Raftery, A.E. (2008).
Iterative Bayesian Model Averaging for Survival Analysis.
Manuscript in Progress.
Raftery, A.E. (1995).
Bayesian model selection in social research (with Discussion). Sociological Methodology 1995 (Peter V. Marsden, ed.), pp. 111-196, Cambridge, Mass.: Blackwells.
Volinsky, C., Madigan, D., Raftery, A., and Kronmal, R. (1997)
Bayesian Model Averaging in Proprtional Hazard Models: Assessing the Risk of a Stroke.
Applied Statistics 46: 433-448.
Yeung, K.Y., Bumgarner, R.E. and Raftery, A.E. (2005)
Bayesian Model Averaging: Development of an improved multi-class, gene selection and classification tool for microarray data.
Bioinformatics 21: 2394-2402.
library(BMA)
library(iterativeBMAsurv)
data(trainData)
data(trainSurv)
data(trainCens)
data(testData)
data(testSurv)
data(testCens)
## Training should be pre-sorted before beginning
## Initialize the matrix for the active bic.surv window with variables 1 through maxNvar
maxNvar <- 25
curr.mat <- trainData[, 1:maxNvar]
nextVar <- maxNvar + 1
## Training phase: select relevant genes, using nbest=5 for fast computation
ret.bic.surv <- iterateBMAsurv.train (x=trainData, surv.time=trainSurv, cens.vec=trainCens, curr.mat, stopVar=0, nextVar, maxNvar=25, nbest=5)
# Apply bic.surv again using selected genes
ret.bma <- bic.surv (x=ret.bic.surv$curr.mat, surv.t=trainSurv, cens=trainCens, nbest=5, maxCol=(maxNvar+1))
## Get the matrix for genes with probne0 > 0
ret.gene.mat <- ret.bic.surv$curr.mat[ret.bma$probne0 > 0]
## Get the gene names from ret.gene.mat
selected.genes <- dimnames(ret.gene.mat)[[2]]
## Show the posterior probabilities of selected models
ret.bma$postprob
## Get the subset of test data with the genes from the last iteration of 'bic.surv'
curr.test.dat <- testData[, selected.genes]
## Compute the predicted risk scores for the test samples
y.pred.test <- apply (curr.test.dat, 1, predictBicSurv, postprob.vec=ret.bma$postprob, mle.mat=ret.bma$mle)
## Compute the risk scores in the training set
y.pred.train <- apply (trainData[, selected.genes], 1, predictBicSurv, postprob.vec=ret.bma$postprob, mle.mat=ret.bma$mle)
## Assign risk categories for test samples
ret.table <- predictiveAssessCategory (y.pred.test, y.pred.train, testCens, cutPoint=50)
## Extract risk group vector and risk group table
risk.list <- ret.table$groups
risk.table <- ret.table$assign.risk
## Create a survival object from the test set
mySurv.obj <- Surv(testSurv, testCens)
## Extract statistics including p-value and chi-square
stats <- survdiff(mySurv.obj ~ unlist(risk.list))
## The entire block of code above can be executed simply by calling
## 'iterateBMAsurv.train.predict.assess'
Results
R version 3.3.1 (2016-06-21) -- "Bug in Your Hair"
Copyright (C) 2016 The R Foundation for Statistical Computing
Platform: x86_64-pc-linux-gnu (64-bit)
R is free software and comes with ABSOLUTELY NO WARRANTY.
You are welcome to redistribute it under certain conditions.
Type 'license()' or 'licence()' for distribution details.
R is a collaborative project with many contributors.
Type 'contributors()' for more information and
'citation()' on how to cite R or R packages in publications.
Type 'demo()' for some demos, 'help()' for on-line help, or
'help.start()' for an HTML browser interface to help.
Type 'q()' to quit R.
> library(iterativeBMAsurv)
Loading required package: BMA
Loading required package: survival
Loading required package: leaps
Loading required package: robustbase
Attaching package: 'robustbase'
The following object is masked from 'package:survival':
heart
Loading required package: inline
Loading required package: rrcov
Scalable Robust Estimators with High Breakdown Point (version 1.3-11)
Loading required package: splines
> png(filename="/home/ddbj/snapshot/RGM3/R_BC/result/iterativeBMAsurv/predictiveAssessCategory.Rd_%03d_medium.png", width=480, height=480)
> ### Name: predictiveAssessCategory
> ### Title: Risk Groups: assignment of patient test samples
> ### Aliases: predictiveAssessCategory
> ### Keywords: survival
>
> ### ** Examples
>
> library(BMA)
> library(iterativeBMAsurv)
> data(trainData)
> data(trainSurv)
> data(trainCens)
> data(testData)
> data(testSurv)
> data(testCens)
>
> ## Training should be pre-sorted before beginning
>
> ## Initialize the matrix for the active bic.surv window with variables 1 through maxNvar
> maxNvar <- 25
> curr.mat <- trainData[, 1:maxNvar]
> nextVar <- maxNvar + 1
>
> ## Training phase: select relevant genes, using nbest=5 for fast computation
> ret.bic.surv <- iterateBMAsurv.train (x=trainData, surv.time=trainSurv, cens.vec=trainCens, curr.mat, stopVar=0, nextVar, maxNvar=25, nbest=5)
17: Explored up to variable # 100
Iterate bic.surv is done!
Selected genes:
[1] "X31687" "X33840" "X31242" "X16948" "X31471" "X17154" "X28531" "X19241"
[9] "X26146" "X17804" "X27332" "X17241" "X32212" "X29911" "X33558" "X33013"
[17] "X27884" "X33706" "X16817" "X31968" "X30209" "X29650" "X25054" "X16988"
[25] "X32904"
Posterior probabilities of selected genes:
[1] 100.0 47.5 47.3 2.4 38.5 28.5 40.1 96.7 2.8 1.7 0.0 59.9
[13] 0.0 0.0 10.0 0.0 2.5 58.3 2.1 98.8 28.4 7.1 95.1 0.0
[25] 100.0
>
> # Apply bic.surv again using selected genes
> ret.bma <- bic.surv (x=ret.bic.surv$curr.mat, surv.t=trainSurv, cens=trainCens, nbest=5, maxCol=(maxNvar+1))
>
> ## Get the matrix for genes with probne0 > 0
> ret.gene.mat <- ret.bic.surv$curr.mat[ret.bma$probne0 > 0]
>
> ## Get the gene names from ret.gene.mat
> selected.genes <- dimnames(ret.gene.mat)[[2]]
>
> ## Show the posterior probabilities of selected models
> ret.bma$postprob
[1] 0.075782322 0.068183539 0.062240254 0.056227073 0.045761712 0.044794588
[7] 0.043328132 0.042831731 0.039567629 0.039285627 0.038997242 0.034867824
[13] 0.032225236 0.030210326 0.026904418 0.025508701 0.025052995 0.024869256
[19] 0.021711946 0.021061750 0.020689119 0.020114454 0.017345536 0.017179713
[25] 0.017104052 0.015294500 0.014059561 0.014050900 0.012658966 0.010182444
[31] 0.008768581 0.007844758 0.007014883 0.006609877 0.006555310 0.005115046
>
> ## Get the subset of test data with the genes from the last iteration of 'bic.surv'
> curr.test.dat <- testData[, selected.genes]
>
> ## Compute the predicted risk scores for the test samples
> y.pred.test <- apply (curr.test.dat, 1, predictBicSurv, postprob.vec=ret.bma$postprob, mle.mat=ret.bma$mle)
There were 50 or more warnings (use warnings() to see the first 50)
>
> ## Compute the risk scores in the training set
> y.pred.train <- apply (trainData[, selected.genes], 1, predictBicSurv, postprob.vec=ret.bma$postprob, mle.mat=ret.bma$mle)
There were 50 or more warnings (use warnings() to see the first 50)
>
> ## Assign risk categories for test samples
> ret.table <- predictiveAssessCategory (y.pred.test, y.pred.train, testCens, cutPoint=50)
>
> ## Extract risk group vector and risk group table
> risk.list <- ret.table$groups
> risk.table <- ret.table$assign.risk
>
> ## Create a survival object from the test set
> mySurv.obj <- Surv(testSurv, testCens)
>
> ## Extract statistics including p-value and chi-square
> stats <- survdiff(mySurv.obj ~ unlist(risk.list))
>
> ## The entire block of code above can be executed simply by calling
> ## 'iterateBMAsurv.train.predict.assess'
>
>
>
>
>
>
> dev.off()
null device
1
>