R Graphical Manual

Browse All

Last data update: 2014.03.03

R: Risk Groups: assignment of patient test samples

predictiveAssessCategory

R Documentation

Risk Groups: assignment of patient test samples

Description

This function assigns a risk group (high-risk or low-risk) to each patient sample in the test set based on the value of the patient's predicted risk score. The cutPoint between high-risk and low-risk is designated by the user.

Usage

predictiveAssessCategory (y.pred.test, y.pred.train, cens.vec.test, cutPoint=50)

Arguments

`y.pred.test`	A vector containing the predicted risk scores of the test samples.
`y.pred.train`	A vector containing the computed risk scores of the training samples.
`cens.vec.test`	A vector of censor data for the patient samples in the test set. In general, 0 = censored and 1 = uncensored.
`cutPoint`	Threshold percent for separating high- from low-risk groups. The default is 50.

Details

This function begins by using the computed risk scores of the training set (y.pred.train) to define a real-number empirical cutoff point between high- and low-risk groups. The cutoff point is determined by the percentile cutPoint as designated by the user. The predicted risk scores from the test samples are then matched against this cutoff point to determine whether they belong in the high-risk or the low-risk category.

Value

A list consisting of 2 components:

`assign.risk`	A 2 x 2 table indicating the number of test samples in each category (high-risk/censored, high-risk/uncensored, low-risk/censored, low-risk/uncensored).
`groups`	A list of all patient samples in the test set with their corresponding 'High-risk' or 'Low-risk' designations.

References

Annest, A., Yeung, K.Y., Bumgarner, R.E., and Raftery, A.E. (2008). Iterative Bayesian Model Averaging for Survival Analysis. Manuscript in Progress.

Raftery, A.E. (1995). Bayesian model selection in social research (with Discussion). Sociological Methodology 1995 (Peter V. Marsden, ed.), pp. 111-196, Cambridge, Mass.: Blackwells.

Volinsky, C., Madigan, D., Raftery, A., and Kronmal, R. (1997) Bayesian Model Averaging in Proprtional Hazard Models: Assessing the Risk of a Stroke. Applied Statistics 46: 433-448.

Yeung, K.Y., Bumgarner, R.E. and Raftery, A.E. (2005) Bayesian Model Averaging: Development of an improved multi-class, gene selection and classification tool for microarray data. Bioinformatics 21: 2394-2402.

Examples

library(BMA)
library(iterativeBMAsurv)
data(trainData)
data(trainSurv)
data(trainCens)
data(testData)
data(testSurv)
data(testCens)

## Training should be pre-sorted before beginning

## Initialize the matrix for the active bic.surv window with variables 1 through maxNvar
maxNvar <- 25
curr.mat <- trainData[, 1:maxNvar]
nextVar <- maxNvar + 1

## Training phase: select relevant genes, using nbest=5 for fast computation
ret.bic.surv <- iterateBMAsurv.train (x=trainData, surv.time=trainSurv, cens.vec=trainCens, curr.mat, stopVar=0, nextVar, maxNvar=25, nbest=5)

# Apply bic.surv again using selected genes
ret.bma <- bic.surv (x=ret.bic.surv$curr.mat, surv.t=trainSurv, cens=trainCens, nbest=5, maxCol=(maxNvar+1))

## Get the matrix for genes with probne0 > 0
ret.gene.mat <- ret.bic.surv$curr.mat[ret.bma$probne0 > 0]

## Get the gene names from ret.gene.mat
selected.genes <- dimnames(ret.gene.mat)[[2]]

## Show the posterior probabilities of selected models
ret.bma$postprob

## Get the subset of test data with the genes from the last iteration of 'bic.surv'
curr.test.dat <- testData[, selected.genes]

## Compute the predicted risk scores for the test samples
y.pred.test <- apply (curr.test.dat, 1, predictBicSurv, postprob.vec=ret.bma$postprob, mle.mat=ret.bma$mle)

## Compute the risk scores in the training set
y.pred.train <- apply (trainData[, selected.genes], 1, predictBicSurv, postprob.vec=ret.bma$postprob, mle.mat=ret.bma$mle)

## Assign risk categories for test samples
ret.table <- predictiveAssessCategory (y.pred.test, y.pred.train, testCens, cutPoint=50) 

## Extract risk group vector and risk group table
risk.list <- ret.table$groups
risk.table <- ret.table$assign.risk

## Create a survival object from the test set
mySurv.obj <- Surv(testSurv, testCens)

## Extract statistics including p-value and chi-square
stats <- survdiff(mySurv.obj ~ unlist(risk.list))

## The entire block of code above can be executed simply by calling
## 'iterateBMAsurv.train.predict.assess'

Results


R version 3.3.1 (2016-06-21) -- "Bug in Your Hair"
Copyright (C) 2016 The R Foundation for Statistical Computing
Platform: x86_64-pc-linux-gnu (64-bit)

R is free software and comes with ABSOLUTELY NO WARRANTY.
You are welcome to redistribute it under certain conditions.
Type 'license()' or 'licence()' for distribution details.

R is a collaborative project with many contributors.
Type 'contributors()' for more information and
'citation()' on how to cite R or R packages in publications.

Type 'demo()' for some demos, 'help()' for on-line help, or
'help.start()' for an HTML browser interface to help.
Type 'q()' to quit R.

> library(iterativeBMAsurv)
Loading required package: BMA
Loading required package: survival
Loading required package: leaps
Loading required package: robustbase

Attaching package: 'robustbase'

The following object is masked from 'package:survival':

    heart

Loading required package: inline
Loading required package: rrcov
Scalable Robust Estimators with High Breakdown Point (version 1.3-11)

Loading required package: splines
> png(filename="/home/ddbj/snapshot/RGM3/R_BC/result/iterativeBMAsurv/predictiveAssessCategory.Rd_%03d_medium.png", width=480, height=480)
> ### Name: predictiveAssessCategory
> ### Title: Risk Groups: assignment of patient test samples
> ### Aliases: predictiveAssessCategory
> ### Keywords: survival
> 
> ### ** Examples
> 
> library(BMA)
> library(iterativeBMAsurv)
> data(trainData)
> data(trainSurv)
> data(trainCens)
> data(testData)
> data(testSurv)
> data(testCens)
> 
> ## Training should be pre-sorted before beginning
> 
> ## Initialize the matrix for the active bic.surv window with variables 1 through maxNvar
> maxNvar <- 25
> curr.mat <- trainData[, 1:maxNvar]
> nextVar <- maxNvar + 1
> 
> ## Training phase: select relevant genes, using nbest=5 for fast computation
> ret.bic.surv <- iterateBMAsurv.train (x=trainData, surv.time=trainSurv, cens.vec=trainCens, curr.mat, stopVar=0, nextVar, maxNvar=25, nbest=5)
17: Explored up to variable # 100
Iterate bic.surv is done!
Selected genes:
 [1] "X31687" "X33840" "X31242" "X16948" "X31471" "X17154" "X28531" "X19241"
 [9] "X26146" "X17804" "X27332" "X17241" "X32212" "X29911" "X33558" "X33013"
[17] "X27884" "X33706" "X16817" "X31968" "X30209" "X29650" "X25054" "X16988"
[25] "X32904"
Posterior probabilities of selected genes:
 [1] 100.0  47.5  47.3   2.4  38.5  28.5  40.1  96.7   2.8   1.7   0.0  59.9
[13]   0.0   0.0  10.0   0.0   2.5  58.3   2.1  98.8  28.4   7.1  95.1   0.0
[25] 100.0
> 
> # Apply bic.surv again using selected genes
> ret.bma <- bic.surv (x=ret.bic.surv$curr.mat, surv.t=trainSurv, cens=trainCens, nbest=5, maxCol=(maxNvar+1))
> 
> ## Get the matrix for genes with probne0 > 0
> ret.gene.mat <- ret.bic.surv$curr.mat[ret.bma$probne0 > 0]
> 
> ## Get the gene names from ret.gene.mat
> selected.genes <- dimnames(ret.gene.mat)[[2]]
> 
> ## Show the posterior probabilities of selected models
> ret.bma$postprob
 [1] 0.075782322 0.068183539 0.062240254 0.056227073 0.045761712 0.044794588
 [7] 0.043328132 0.042831731 0.039567629 0.039285627 0.038997242 0.034867824
[13] 0.032225236 0.030210326 0.026904418 0.025508701 0.025052995 0.024869256
[19] 0.021711946 0.021061750 0.020689119 0.020114454 0.017345536 0.017179713
[25] 0.017104052 0.015294500 0.014059561 0.014050900 0.012658966 0.010182444
[31] 0.008768581 0.007844758 0.007014883 0.006609877 0.006555310 0.005115046
> 
> ## Get the subset of test data with the genes from the last iteration of 'bic.surv'
> curr.test.dat <- testData[, selected.genes]
> 
> ## Compute the predicted risk scores for the test samples
> y.pred.test <- apply (curr.test.dat, 1, predictBicSurv, postprob.vec=ret.bma$postprob, mle.mat=ret.bma$mle)
There were 50 or more warnings (use warnings() to see the first 50)
> 
> ## Compute the risk scores in the training set
> y.pred.train <- apply (trainData[, selected.genes], 1, predictBicSurv, postprob.vec=ret.bma$postprob, mle.mat=ret.bma$mle)
There were 50 or more warnings (use warnings() to see the first 50)
> 
> ## Assign risk categories for test samples
> ret.table <- predictiveAssessCategory (y.pred.test, y.pred.train, testCens, cutPoint=50) 
> 
> ## Extract risk group vector and risk group table
> risk.list <- ret.table$groups
> risk.table <- ret.table$assign.risk
> 
> ## Create a survival object from the test set
> mySurv.obj <- Surv(testSurv, testCens)
> 
> ## Extract statistics including p-value and chi-square
> stats <- survdiff(mySurv.obj ~ unlist(risk.list))
> 
> ## The entire block of code above can be executed simply by calling
> ## 'iterateBMAsurv.train.predict.assess' 
> 
> 
> 
> 
> 
> 
> dev.off()
null device 
          1 
>