Last data update: 2014.03.03

R: CBS segmentation
segmentationCBS    R Documentation

CBS segmentation

Description

The default segmentation function, based on circular binary segmentation (CBS). It is called via the fun.segmentation argument of runAbsoluteCN; additional arguments are passed to it via args.segmentation.
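The calling pattern described above can be sketched in plain R. This is only an illustration of how a function and a list of extra arguments are typically combined with do.call; toy_segmentation and run_with_segmentation are hypothetical names and do not reproduce the internals of runAbsoluteCN.

```r
# Hypothetical stand-in for a segmentation function; not PureCN code.
toy_segmentation <- function(log.ratio, alpha = 0.005, verbose = TRUE) {
    list(n = length(log.ratio), alpha = alpha, verbose = verbose)
}

# Hypothetical caller that mimics the fun.segmentation / args.segmentation
# pattern: the caller supplies the data, the user supplies extra arguments.
run_with_segmentation <- function(log.ratio,
                                  fun.segmentation = toy_segmentation,
                                  args.segmentation = list()) {
    args <- c(list(log.ratio = log.ratio), args.segmentation)
    do.call(fun.segmentation, args)
}

# Override only alpha; verbose keeps its default.
res <- run_with_segmentation(c(0.3, -0.2, 1.1),
                             args.segmentation = list(alpha = 0.001))
```

Keeping the user-supplied arguments in a separate list means that a replacement segmentation function can accept its own tuning parameters without changing the caller.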

Usage

segmentationCBS(normal, tumor, log.ratio, plot.cnv, 
    coverage.cutoff, sampleid = sampleid, exon.weight.file = NULL, 
    alpha = 0.005, vcf = NULL, tumor.id.in.vcf = 1, 
    verbose = TRUE)

Arguments

normal

GATK coverage data for normal sample.

tumor

GATK coverage data for tumor sample.

log.ratio

Copy number log-ratios, one for each exon in coverage file.

plot.cnv

Whether to generate segmentation plots.

coverage.cutoff

Minimum coverage in both normal and tumor.

sampleid

Sample id, used in output files.

exon.weight.file

Can be used to assign weights to exons.

alpha

Alpha value for CBS; see the documentation of the segment function in the DNAcopy package.

vcf

Optional VCF object with germline allelic ratios.

tumor.id.in.vcf

ID of the tumor sample, in case multiple samples are stored in the VCF.

verbose

Verbose output.
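To make the log.ratio and coverage.cutoff arguments more concrete, here is a minimal sketch with made-up coverage values. The normalization by total coverage and the use of >= for the cutoff are assumptions made for illustration; the package may compute and filter log-ratios differently.

```r
# Hypothetical per-exon average coverage for a matched normal and tumor.
normal_cov <- c(120, 80, 10, 200, 95)
tumor_cov  <- c(150, 40, 30, 210, 5)
coverage.cutoff <- 15

# Assumed normalization: the second term corrects for the difference in
# total coverage between the two samples.
log.ratio <- log2(tumor_cov / normal_cov) +
    log2(sum(normal_cov) / sum(tumor_cov))

# Keep only exons that reach the cutoff in both normal and tumor.
well_covered <- normal_cov >= coverage.cutoff & tumor_cov >= coverage.cutoff
log.ratio.filtered <- log.ratio[well_covered]
```

There is still one log-ratio per exon, as the log.ratio argument requires; the cutoff only decides which exons are passed on to segmentation.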

Value

A list with elements

seg

The segmentation.

size

The size of all segments (in base pairs).
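As a rough illustration of the returned structure, the sketch below builds such a list from the first three segments printed in the Results section. Computing size as loc.end - loc.start + 1 is an assumption about how segment sizes are derived, not a statement about the package source.

```r
# First three segments copied from the example output in the Results section.
seg <- data.frame(
    ID = "Sample1",
    chrom = c(1, 2, 2),
    loc.start = c(1216044.5, 1638036.0, 236403412.5),
    loc.end = c(248722319, 231775198, 241737117),
    num.mark = c(934, 708, 93),
    seg.mean = c(0.3272, -0.2446, 0.2648)
)

# Assumed definition of segment size in base pairs (inclusive coordinates).
size <- seg$loc.end - seg$loc.start + 1

# A list shaped like the documented return value.
result <- list(seg = seg, size = size)
```

A custom function supplied via fun.segmentation would presumably need to return a list with the same two elements.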

Author(s)

Markus Riester

Examples

gatk.normal.file <- system.file("extdata", "example_normal.txt", 
    package="PureCN")
gatk.tumor.file <- system.file("extdata", "example_tumor.txt", 
    package="PureCN")
vcf.file <- system.file("extdata", "example_vcf.vcf", 
    package="PureCN")
gc.gene.file <- system.file("extdata", "example_gc.gene.file.txt", 
    package="PureCN")

# Speed up runAbsoluteCN by using the stored grid search
# (purecn.example.output$candidates).
data(purecn.example.output)

# The max.candidate.solutions argument is set to a very low value only to 
# speed up this example. This is not a good idea for real samples.
ret <- runAbsoluteCN(gatk.normal.file=gatk.normal.file, 
    gatk.tumor.file=gatk.tumor.file, 
    vcf.file=vcf.file, sampleid='Sample1', gc.gene.file=gc.gene.file, 
    candidates=purecn.example.output$candidates, max.candidate.solutions=2,
    fun.segmentation=segmentationCBS, args.segmentation=list(alpha=0.001))

Results


R version 3.3.1 (2016-06-21) -- "Bug in Your Hair"
Copyright (C) 2016 The R Foundation for Statistical Computing
Platform: x86_64-pc-linux-gnu (64-bit)

> library(PureCN)
Loading required package: DNAcopy
Loading required package: VariantAnnotation
Loading required package: BiocGenerics
Loading required package: parallel

Attaching package: 'BiocGenerics'

The following objects are masked from 'package:parallel':

    clusterApply, clusterApplyLB, clusterCall, clusterEvalQ,
    clusterExport, clusterMap, parApply, parCapply, parLapply,
    parLapplyLB, parRapply, parSapply, parSapplyLB

The following objects are masked from 'package:stats':

    IQR, mad, xtabs

The following objects are masked from 'package:base':

    Filter, Find, Map, Position, Reduce, anyDuplicated, append,
    as.data.frame, cbind, colnames, do.call, duplicated, eval, evalq,
    get, grep, grepl, intersect, is.unsorted, lapply, lengths, mapply,
    match, mget, order, paste, pmax, pmax.int, pmin, pmin.int, rank,
    rbind, rownames, sapply, setdiff, sort, table, tapply, union,
    unique, unsplit

Loading required package: GenomeInfoDb
Loading required package: stats4
Loading required package: S4Vectors

Attaching package: 'S4Vectors'

The following objects are masked from 'package:base':

    colMeans, colSums, expand.grid, rowMeans, rowSums

Loading required package: IRanges
Loading required package: GenomicRanges
Loading required package: SummarizedExperiment
Loading required package: Biobase
Welcome to Bioconductor

    Vignettes contain introductory material; view with
    'browseVignettes()'. To cite Bioconductor, see
    'citation("Biobase")', and for packages 'citation("pkgname")'.

Loading required package: Rsamtools
Loading required package: Biostrings
Loading required package: XVector

Attaching package: 'VariantAnnotation'

The following object is masked from 'package:base':

    tabulate

> ### Name: segmentationCBS
> ### Title: CBS segmentation
> ### Aliases: segmentationCBS
> 
> ### ** Examples
> 
> gatk.normal.file <- system.file("extdata", "example_normal.txt", 
+     package="PureCN")
> gatk.tumor.file <- system.file("extdata", "example_tumor.txt", 
+     package="PureCN")
> vcf.file <- system.file("extdata", "example_vcf.vcf", 
+     package="PureCN")
> gc.gene.file <- system.file("extdata", "example_gc.gene.file.txt", 
+     package="PureCN")
> 
> # Speed up runAbsoluteCN by using the stored grid search
> # (purecn.example.output$candidates).
> data(purecn.example.output)
> 
> # The max.candidate.solutions argument is set to a very low value only to 
> # speed up this example. This is not a good idea for real samples.
> ret <- runAbsoluteCN(gatk.normal.file=gatk.normal.file, 
+     gatk.tumor.file=gatk.tumor.file, 
+     vcf.file=vcf.file, sampleid='Sample1', gc.gene.file=gc.gene.file, 
+     candidates=purecn.example.output$candidates, max.candidate.solutions=2,
+     fun.segmentation=segmentationCBS, args.segmentation=list(alpha=0.001))
Loading GATK coverage files...
Sex of sample: ?
Removing 7 small exons.
Removing 15 low/high GC exons.
Loading VCF...
Assuming LIB-02240e4 is tumor in VCF file.
Found 2331 variants in VCF file.
Removing 0 non heterozygous (in matched normal) germline SNPs.
Removing 62 SNPs with AF < 0.03 or AF >= 0.97 or less than 3 supporting reads or depth < 15.
Found SOMATIC annotation in VCF. Setting somatic prior probabilities for somatic variants to 0.999 or to 1e-04 otherwise.
Segmenting data...
Removing 267 low coverage exons.
Analyzing: Sample1 
Setting multi-figure configuration
Call:
segment(x = smoothed.CNA.obj, alpha = alpha, undo.splits = undo.splits, 
    undo.SD = sdundo, verbose = ifelse(verbose, 1, 0))

        ID chrom   loc.start   loc.end num.mark seg.mean
1  Sample1     1   1216044.5 248722319      934   0.3272
2  Sample1     2   1638036.0 231775198      708  -0.2446
3  Sample1     2 236403412.5 241737117       93   0.2648
4  Sample1     3  11832017.5 149470198      436   0.2969
5  Sample1     3 150264604.0 151542537       18   1.6288
6  Sample1     3 151545662.5 195938114       80   0.3072
7  Sample1     4    843512.0  70146580      133   0.3118
8  Sample1     4  75673305.5  77700146       39  -0.1997
9  Sample1     4  81188156.5 108831608       44  -1.2019
10 Sample1     4 110635592.5 186611721      139   0.2606
11 Sample1     5    442758.0  10761154       38   0.3255
12 Sample1     5  38869183.5 180687408      360  -0.2858
13 Sample1     6   2623865.0 144219759      293  -0.2464
14 Sample1     6 144224235.5 170862274      117   0.3914
15 Sample1     7    938572.5  14028656       56   0.3075
16 Sample1     7  23286512.0  23313764       11   1.7239
17 Sample1     7  26232167.0 156469232      310  -0.2223
18 Sample1     8   6264200.0 145537891      337  -0.2747
19 Sample1     9    214953.0 139440208      371  -0.2226
20 Sample1    10    323391.5  72576624      233   0.2804
21 Sample1    10  72604313.0  72645621       16  -0.2807
22 Sample1    10  72648289.5  75000741       30   0.2522
23 Sample1    10  82300671.5  82403794        7  -1.2827
24 Sample1    10  85982056.5  88768888       12   1.0633
25 Sample1    10  91066426.0  99790218       36  -1.1192
26 Sample1    10 102283640.5 102289566        5   1.2394
27 Sample1    10 103541552.5 121214530       73  -1.2211
28 Sample1    10 124591880.5 134121207       24   0.1328
29 Sample1    11   2291272.0  34378690      106  -0.2981
30 Sample1    11  36614927.0  44081430       15   0.4322
31 Sample1    11  46880700.5  57317514       71  -0.2062
32 Sample1    11  57947383.5  65172438       77   0.2423
33 Sample1    11  65340286.5  66335024       33  -0.3169
34 Sample1    11  66335504.5  71209486       26  -1.2807
35 Sample1    11  71847083.0  82549523       78   0.3824
36 Sample1    11  82550385.5 134134828      151  -0.2000
37 Sample1    12   1740561.0  99126272      373  -1.1729
38 Sample1    12 113537804.0 124428836      212   0.2830
39 Sample1    13  20398996.5 114438189      322  -1.2399
40 Sample1    14  20757846.0 101349088      319   0.2821
41 Sample1    15  27216709.5  99926272      394   0.3090
42 Sample1    16    230533.5  31123514      224   0.2804
43 Sample1    16  56899289.0  56947247       25   0.9000
44 Sample1    16  57507348.5  57722319       20  -0.2426
45 Sample1    16  66918983.0  90038049      183   0.3186
46 Sample1    17   1399145.5  76832320      621   0.3029
47 Sample1    17  77768896.0  80559278       24  -0.2538
48 Sample1    18   5394737.5  71825664      133  -0.2692
49 Sample1    19   1481982.5  57301280      498   0.3127
50 Sample1    20    207959.0  62610776      330   0.2918
51 Sample1    21  11098731.0  47865219      176   0.3147
52 Sample1    22  17443695.0  45996257      148   0.3021
53 Sample1    22  50703417.0  51066096       20  -0.4576
Mean standard deviation of log-ratios: 0.4
Optimizing purity and ploidy. Will take a minute or two...
Local optima: 0.65/1.6, 0.5/2.4, 0.9/2.4, 0.5/2
Testing local optimum at purity 0.65 and total ploidy 1.6.
Fitting SNVs for purity 0.65 and tumor ploidy 1.38.
Analyzing: Sample1 
Optimized purity: 0.65
Testing local optimum at purity 0.5 and total ploidy 2.4.
Fitting SNVs for purity 0.48 and tumor ploidy 2.77.
Analyzing: Sample1 
Optimized purity: 0.48
Testing local optimum at purity 0.9 and total ploidy 2.4.
Fitting SNVs for purity 0.95 and tumor ploidy 2.38.
Analyzing: Sample1 
Optimized purity: 0.95
Testing local optimum at purity 0.5 and total ploidy 2.
Fitting SNVs for purity 0.48 and tumor ploidy 2.77.
Analyzing: Sample1 
Optimized purity: 0.48
Remember, posterior probabilities assume a correct SCNA fit.
Warning message:
In runAbsoluteCN(gatk.normal.file = gatk.normal.file, gatk.tumor.file = gatk.tumor.file,  :
  Too many candidate solutions! Trying optimizing the top candidates.