Last data update: 2014.03.03

R: Formatting and printing 'TitanCNA' results.
Formatting and output of Titan results (R Documentation)

Formatting and printing TitanCNA results.

Description

Function to format TitanCNA results into a data.frame and write the results to a tab-delimited file.

Usage

  outputTitanResults(data, convergeParams, optimalPath, filename = NULL,
                     posteriorProbs = FALSE, subcloneProfiles = TRUE)
  outputModelParameters(convergeParams, results, filename,
                        S_Dbw.scale = 1, S_Dbw.method = "Tong")

Arguments

data

list object that contains the components of the data to be analyzed: chr, posn, ref, and tumDepth, which can be obtained using loadAlleleCounts, and logR, which can be obtained using correctReadDepth and getPositionOverlap (see Examples).

convergeParams

list object that is returned from the function runEMclonalCN in TitanCNA.

optimalPath

numeric array containing the optimal TitanCNA genotype and clonal cluster states for each data point in the analysis. optimalPath is obtained from running viterbiClonalCN.

results

Formatted TitanCNA results output from outputTitanResults.

filename

Path of the file to write the TitanCNA results.

posteriorProbs

Logical; if TRUE, the posterior marginal probabilities are included in the output written to filename.

subcloneProfiles

Logical; if TRUE, the subclone profiles are added to the output data.frame. Currently, this only works for 1 or 2 clonal clusters.

S_Dbw.scale

The S_Dbw validity index can be adjusted to account for differences between datasets. S_Dbw.scale can be used to penalize the S_Dbw dens.bw component. The default is 1.

S_Dbw.method

Method used to compute the S_Dbw validity index, either "Halkidi" or "Tong". The default is "Tong". See computeSDbwIndex.

Details

outputModelParameters writes a file containing the estimated TITAN model parameters and the model selection index. Each row contains information about a different parameter:

1) Normal contamination estimate - proportion of normal content in the sample; tumour content is 1 minus this number

2) Average tumour ploidy estimate - average number of estimated copies in the genome; 2 represents diploid

3) Clonal cluster cellular prevalence - Z denotes the number of clonal clusters; the space-delimited values that follow are the cellular prevalence estimates for each cluster. Cellular prevalence here is defined as the proportion of the tumour sample that contains the aberrant genotype.

4) Genotype binomial means for clonal cluster Z - set of 21 binomial estimated parameters for each specified cluster

5) Genotype Gaussian means for clonal cluster Z - set of 21 Gaussian estimated means for each specified cluster

6) Genotype Gaussian variance - set of 21 Gaussian estimated variances; variances are shared across all clusters

7) Number of iterations - number of EM iterations needed for convergence

8) Log likelihood - complete data log-likelihood for current cluster run

9) S_Dbw dens.bw - density component of S_Dbw index; see computeSDbwIndex

10) S_Dbw scat - scatter component of S_Dbw index; see computeSDbwIndex

11) S_Dbw validity index - used for model selection, where the run with the optimal number of clusters is the one with the lowest S_Dbw index. This value is slightly modified from the one computed by computeSDbwIndex. It is computed as S_Dbw = S_Dbw.scale * dens.bw + scat

12) S_Dbw dens.bw, scat, and validity index are computed for the LogRatio and AllelicRatio datatypes, as well as for the combination of Both. For Both, the values of the two datatypes are summed.
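
The formula in item 11 can be checked by hand. The following is a minimal base R sketch, not TitanCNA code: combine_sdbw is a hypothetical helper introduced only for illustration, and the two component values are the ones printed by outputModelParameters in the Results transcript at the end of this page.

```r
# Hypothetical helper (not part of TitanCNA): combine the two S_Dbw
# components as described in item 11 of the Details section.
combine_sdbw <- function(dens.bw, scat, scale = 1) {
  scale * dens.bw + scat
}

# Component values printed in the Results transcript of this page.
dens.bw <- 0.251006
scat    <- 0.1996239

# Default behaviour, S_Dbw.scale = 1.
s_dbw_default <- combine_sdbw(dens.bw, scat)

# A larger scale puts a heavier penalty on the dens.bw component.
s_dbw_penalized <- combine_sdbw(dens.bw, scat, scale = 10)

print(s_dbw_default)
print(s_dbw_penalized)
```

With the default scale the result agrees with the $S_Dbw value reported in the transcript (0.4506299), which suggests the transcript was produced with S_Dbw.scale = 1.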

outputTitanResults writes a file in a format similar to the data.frame described in the ‘Value’ section.
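
Because the output file is tab-delimited, it can be read back with base R. The sketch below does not call TitanCNA: it writes an invented toy data.frame (only a few of the documented columns, with made-up values) using write.table as a stand-in for the file that outputTitanResults produces, then reads it back. Whether the real file matches this layout exactly is an assumption.

```r
# Toy stand-in for the data.frame returned by outputTitanResults.
# All values are invented; only a few of the documented columns are shown.
toy <- data.frame(
  Chr       = c("1", "1", "X"),
  Position  = c(10583L, 13302L, 60425L),
  RefCount  = c(30L, 12L, 45L),
  Depth     = c(50L, 40L, 45L),
  TITANcall = c("HET", "DLOH", "NLOH"),
  stringsAsFactors = FALSE
)

# Write a tab-delimited file with a header row and no row names.
out_file <- tempfile(fileext = ".txt")
write.table(toy, file = out_file, sep = "\t", quote = FALSE, row.names = FALSE)

# Read it back. Forcing Chr to character keeps "X" and "Y" consistent
# with purely numeric chromosome names.
read_back <- read.delim(out_file, stringsAsFactors = FALSE,
                        colClasses = c(Chr = "character"))
str(read_back)
```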

Value

outputTitanResults also returns a data.frame, where each row corresponds to a position in the analysis, and with the following columns:

Chr

character denoting chromosome number. ChrX and ChrY use ‘X’ and ‘Y’.

Position

genomic coordinate

RefCount

number of reads matching the reference base

NRefCount

number of reads matching the non-reference base

Depth

total read depth at the position

AllelicRatio

RefCount/Depth

LogRatio

log2 ratio between normalized tumour and normal read depths

CopyNumber

predicted TitanCNA copy number

TITANstate

internal state number used by TitanCNA; see References

TITANcall

interpretable TitanCNA state; string (HOMD, DLOH, HET, NLOH, ALOH, ASCNA, BCNA, UBCNA); see References

ClonalCluster

predicted TitanCNA clonal cluster; lower cluster numbers represent clusters with higher cellular prevalence

CellularPrevalence

proportion of tumour cells containing the event; not to be mistaken for the proportion of the sample (including normal)
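
Some of the columns above are simple functions of others. The base R sketch below uses an invented toy data.frame to illustrate two of these relationships. It rests on two assumptions that are not stated in this page: that Depth equals RefCount + NRefCount, and that a cellular prevalence can be rescaled to a proportion of the whole sample by multiplying by the tumour content (1 minus the normal contamination estimate). The normal_contamination value is hypothetical.

```r
# Invented toy values; column names follow the Value section.
toy <- data.frame(
  RefCount           = c(30, 12, 45),
  NRefCount          = c(20, 28, 0),
  CellularPrevalence = c(1.0, 0.6, 0.6)
)

# Assumption: total depth is the sum of reference and non-reference reads.
toy$Depth <- toy$RefCount + toy$NRefCount

# AllelicRatio as documented: RefCount / Depth.
toy$AllelicRatio <- toy$RefCount / toy$Depth

# Hypothetical normal contamination estimate for this illustration.
normal_contamination <- 0.2

# Assumption: proportion of the whole sample (tumour plus normal) carrying
# the event = cellular prevalence * tumour content.
toy$SampleFraction <- toy$CellularPrevalence * (1 - normal_contamination)

print(toy)
```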

If subcloneProfiles is set to TRUE, then the subclone profiles will be appended to the output data.frame.

Subclone1.CopyNumber

Integer copy number for Subclone 1.

Subclone1.TITANcall

States for Subclone 1

Subclone1.Prevalence

The cellular prevalence of Subclone 1, sometimes referred to as the subclone fraction.

outputModelParameters returns a list containing the S_Dbw model selection index and its components:

dens.bw

density component of the S_Dbw index

scat

scatter component of the S_Dbw index

S_Dbw

S_Dbw.scale * dens.bw + scat
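
Model selection with this return value amounts to picking the run with the lowest S_Dbw. The following base R sketch assumes one list per run, shaped like the return value described above. The cluster2 entry reuses the values printed in the Results transcript of this page; the cluster1 and cluster3 entries are invented purely for illustration.

```r
# One list per run, shaped like the return value of outputModelParameters.
# cluster2 reuses the values from the Results transcript; cluster1 and
# cluster3 are invented for illustration only.
runs <- list(
  cluster1 = list(dens.bw = 0.40,     scat = 0.25,      S_Dbw = 0.65),
  cluster2 = list(dens.bw = 0.251006, scat = 0.1996239, S_Dbw = 0.4506299),
  cluster3 = list(dens.bw = 0.35,     scat = 0.21,      S_Dbw = 0.56)
)

# Extract the validity index of each run as a named numeric vector.
s_dbw <- vapply(runs, function(x) x$S_Dbw, numeric(1))

# The preferred run is the one with the lowest S_Dbw index.
best <- names(which.min(s_dbw))
print(s_dbw)
print(best)
```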

Author(s)

Gavin Ha <gavinha@gmail.com>

References

Ha, G., Roth, A., Khattra, J., Ho, J., Yap, D., Prentice, L. M., Melnyk, N., McPherson, A., Bashashati, A., Laks, E., Biele, J., Ding, J., Le, A., Rosner, J., Shumansky, K., Marra, M. A., Huntsman, D. G., McAlpine, J. N., Aparicio, S. A. J. R., and Shah, S. P. (2014). TITAN: Inference of copy number architectures in clonal cell populations from tumour whole genome sequence data. Genome Research, 24: 1881-1893. (PMID: 25060187)

See Also

runEMclonalCN, viterbiClonalCN, computeSDbwIndex

Examples

library(TitanCNA)

#### LOAD THE EXAMPLE DATA SHIPPED WITH THE PACKAGE ####
data(EMresults)

#### COMPUTE OPTIMAL STATE PATH USING VITERBI ####
optimalPath <- viterbiClonalCN(data, convergeParams)

#### FORMAT RESULTS ####
results <- outputTitanResults(data, convergeParams, optimalPath, 
                              filename = NULL, posteriorProbs = FALSE,
                              subcloneProfiles = TRUE)

#### OUTPUT RESULTS TO FILE ####
outparam <- "cluster2_params.txt"
outputModelParameters(convergeParams, results, outparam)
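
When this example is run without a registered foreach backend, viterbiClonalCN emits the warning "executing %dopar% sequentially: no parallel backend registered" (see the Results transcript). A minimal sketch of one way to avoid it is to register the sequential backend explicitly before running the example. registerDoSEQ and getDoParName are functions of the foreach package, which TitanCNA loads as a dependency; registering a real parallel backend instead (for example with the doParallel package) is another option that is not shown here.

```r
# Register the sequential foreach backend explicitly so that %dopar%
# no longer warns about a missing parallel backend.
# The guard keeps the snippet runnable when foreach is not installed.
if (requireNamespace("foreach", quietly = TRUE)) {
  foreach::registerDoSEQ()
  backend <- foreach::getDoParName()
  print(backend)
}
```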

Results


R version 3.3.1 (2016-06-21) -- "Bug in Your Hair"
Copyright (C) 2016 The R Foundation for Statistical Computing
Platform: x86_64-pc-linux-gnu (64-bit)

> library(TitanCNA)
Loading required package: foreach
Loading required package: IRanges
Loading required package: BiocGenerics
Loading required package: parallel

Attaching package: 'BiocGenerics'

The following objects are masked from 'package:parallel':

    clusterApply, clusterApplyLB, clusterCall, clusterEvalQ,
    clusterExport, clusterMap, parApply, parCapply, parLapply,
    parLapplyLB, parRapply, parSapply, parSapplyLB

The following objects are masked from 'package:stats':

    IQR, mad, xtabs

The following objects are masked from 'package:base':

    Filter, Find, Map, Position, Reduce, anyDuplicated, append,
    as.data.frame, cbind, colnames, do.call, duplicated, eval, evalq,
    get, grep, grepl, intersect, is.unsorted, lapply, lengths, mapply,
    match, mget, order, paste, pmax, pmax.int, pmin, pmin.int, rank,
    rbind, rownames, sapply, setdiff, sort, table, tapply, union,
    unique, unsplit

Loading required package: S4Vectors
Loading required package: stats4

Attaching package: 'S4Vectors'

The following objects are masked from 'package:base':

    colMeans, colSums, expand.grid, rowMeans, rowSums

Loading required package: GenomicRanges
Loading required package: GenomeInfoDb
Loading required package: Rsamtools
Loading required package: Biostrings
Loading required package: XVector
> ### Name: Formatting and output of Titan results
> ### Title: Formatting and printing 'TitanCNA' results.
> ### Aliases: outputTitanResults outputModelParameters
> ### Keywords: IO manip
> 
> ### ** Examples
> 
> data(EMresults)
> 
> #### COMPUTE OPTIMAL STATE PATH USING VITERBI ####
> optimalPath <- viterbiClonalCN(data, convergeParams)
Warning message:
executing %dopar% sequentially: no parallel backend registered 
> 
> #### FORMAT RESULTS ####
> results <- outputTitanResults(data, convergeParams, optimalPath, 
+                               filename = NULL, posteriorProbs = FALSE,
+                               subcloneProfiles = TRUE)
> 
> #### OUTPUT RESULTS TO FILE ####
> outparam <- paste("cluster2_params.txt", sep = "")
> outputModelParameters(convergeParams, results, outparam)
titan: Saving parameters to cluster2_params.txt
$dens.bw
[1] 0.251006

$scat
[1] 0.1996239

$S_Dbw
[1] 0.4506299
