Last data update: 2014.03.03

R: Segment total copy numbers and allele B fractions using the...
segmentByNonPairedPSCBSR Documentation

Segment total copy numbers and allele B fractions using the Non-paired PSCBS method

Description

Segment total copy numbers and allele B fractions using the Non-paired PSCBS method [1]. This method does not requires matched normals. This is a low-level segmentation method. It is intended to be applied to one tumor sample at the time.

Usage

## Default S3 method:
segmentByNonPairedPSCBS(CT, betaT, ..., flavor=c("tcn", "tcn&dh", "tcn,dh",
  "sqrt(tcn),dh", "sqrt(tcn)&dh"), tauA=NA, tauB=1 - tauA, verbose=FALSE)

Arguments

CT

A numeric vector of J tumor total copy number (TCN) ratios in [0,+Inf) (due to noise, small negative values are also allowed). The TCN ratios are typically scaled such that copy-neutral diploid loci have a mean of two.

betaT

A numeric vector of J tumor allele B fractions (BAFs) in [0,1] (due to noise, values may be slightly outside as well) or NA for non-polymorphic loci.

...

Additional arguments passed to segmentByPairedPSCBS().

flavor

A character specifying what type of segmentation and calling algorithm to be used.

tauA, tauB

Lower and upper thresholds (tauA < tauB for calling SNPs heterozygous based on the tumor allele B fractions (betaT). If NA, then they are estimates from data.

verbose

See Verbose.

Details

Internally segmentByPairedPSCBS() is used for segmentation. This segmentation method does not support weights.

Value

Returns the segmentation results as a NonPairedPSCBS object.

Reproducibility

The "DNAcopy::segment" implementation of CBS uses approximation through random sampling for some estimates. Because of this, repeated calls using the same signals may result in slightly different results, unless the random seed is set/fixed.

Whole-genome segmentation is preferred

Although it is possible to segment each chromosome independently using Paired PSCBS, we strongly recommend to segment whole-genome (TCN,BAF) data at once. The reason for this is that downstream CN-state calling methods, such as the AB and the LOH callers, performs much better on whole-genome data. In fact, they may fail to provide valid calls if done chromsome by chromosome.

Missing and non-finite values

The total copy number signals as well as any optional positions must not contain missing values, i.e. NAs or NaNs. If there are any, an informative error is thrown. Allele B fractions may contain missing values, because such are interpreted as representing non-polymorphic loci.

None of the input signals may have infinite values, i.e. -Inf or +Inf. If so, an informative error is thrown.

Non-Paired PSCBS with known genotypes

If allele B fractions for the matched normal (betaN) are not available, but genotypes (muN) are, then it is possible to run Paired PSCBS. See segmentByPairedPSCBS() for details.

Author(s)

Henrik Bengtsson

References

[1] A.B. Olshen, H. Bengtsson, P. Neuvial, P.T. Spellman, R.A. Olshen, V.E. Seshan, Parent-specific copy number in paired tumor-normal studies using circular binary segmentation, Bioinformatics, 2011
[2] H. Bengtsson, P. Neuvial and T.P. Speed, TumorBoost: Normalization of allele-specific tumor copy numbers from a single pair of tumor-normal genotyping microarrays, BMC Bioinformatics, 2010

See Also

To segment paired tumor-normal total copy numbers and allele B fractions, see segmentByPairedPSCBS().

To segment total copy numbers, or any other unimodal signals, see segmentByCBS().

Examples

verbose <- R.utils::Arguments$getVerbose(-10*interactive(), timestamp=TRUE)

# - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
# Load SNP microarray data
# - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
data <- PSCBS::exampleData("paired.chr01")
str(data)


# - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
# Paired PSCBS segmentation
# - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
# Drop single-locus outliers
dataS <- dropSegmentationOutliers(data)

# Speed up example by segmenting fewer loci
dataS <- dataS[seq(from=1, to=nrow(data), by=20),]

str(dataS)

R.oo::attachLocally(dataS)

# Non-Paired PSCBS segmentation
fit <- segmentByNonPairedPSCBS(CT, betaT=betaT,
                            chromosome=chromosome, x=x,
                            seed=0xBEEF, verbose=verbose)
print(fit)


# - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
# Bootstrap segment level estimates
# (used by the AB caller, which, if skipped here,
#  will do it automatically)
# - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
fit <- bootstrapTCNandDHByRegion(fit, B=100, verbose=verbose)
print(fit)


# - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
# Calling segments in allelic balance (AB)
# NOTE: Ideally, this should be done on whole-genome data
# - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
# Explicitly estimate the threshold in DH for calling AB
# (which be done by default by the caller, if skipped here)
deltaAB <- estimateDeltaAB(fit, flavor="qq(DH)", verbose=verbose)
print(deltaAB)

fit <- callAB(fit, delta=deltaAB, verbose=verbose)
print(fit)


# Even if not explicitly specified, the estimated
# threshold parameter is returned by the caller
stopifnot(fit$params$deltaAB == deltaAB)


# - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
# Calling segments in loss-of-heterozygosity (LOH)
# NOTE: Ideally, this should be done on whole-genome data
# - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
# Explicitly estimate the threshold in C1 for calling LOH
# (which be done by default by the caller, if skipped here)
deltaLOH <- estimateDeltaLOH(fit, flavor="minC1|nonAB", verbose=verbose)
print(deltaLOH)

fit <- callLOH(fit, delta=deltaLOH, verbose=verbose)
print(fit)
plotTracks(fit)

# Even if not explicitly specified, the estimated
# threshold parameter is returned by the caller
stopifnot(fit$params$deltaLOH == deltaLOH)

Results


R version 3.3.1 (2016-06-21) -- "Bug in Your Hair"
Copyright (C) 2016 The R Foundation for Statistical Computing
Platform: x86_64-pc-linux-gnu (64-bit)

R is free software and comes with ABSOLUTELY NO WARRANTY.
You are welcome to redistribute it under certain conditions.
Type 'license()' or 'licence()' for distribution details.

R is a collaborative project with many contributors.
Type 'contributors()' for more information and
'citation()' on how to cite R or R packages in publications.

Type 'demo()' for some demos, 'help()' for on-line help, or
'help.start()' for an HTML browser interface to help.
Type 'q()' to quit R.

> library(PSCBS)
PSCBS v0.61.0 (2016-02-03) successfully loaded. See ?PSCBS for help.

Attaching package: 'PSCBS'

The following objects are masked from 'package:base':

    append, load

> png(filename="/home/ddbj/snapshot/RGM3/R_CC/result/PSCBS/segmentByNonPairedPSCBS.Rd_%03d_medium.png", width=480, height=480)
> ### Name: segmentByNonPairedPSCBS
> ### Title: Segment total copy numbers and allele B fractions using the
> ###   Non-paired PSCBS method
> ### Aliases: segmentByNonPairedPSCBS.default segmentByNonPairedPSCBS
> ###   segmentByNonPairedPSCBS.data.frame
> ###   segmentByNonPairedPSCBS.PairedPSCBS segmentByNonPairedPSCBS
> ### Keywords: IO
> 
> ### ** Examples
> 
> verbose <- R.utils::Arguments$getVerbose(-10*interactive(), timestamp=TRUE)
> 
> # - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
> # Load SNP microarray data
> # - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
> data <- PSCBS::exampleData("paired.chr01")
> str(data)
'data.frame':	73346 obs. of  6 variables:
 $ chromosome: int  1 1 1 1 1 1 1 1 1 1 ...
 $ x         : int  1145994 2224111 2319424 2543484 2926730 2941694 3084986 3155127 3292731 3695086 ...
 $ CT        : num  1.625 1.071 1.406 1.18 0.856 ...
 $ betaT     : num  0.757 0.771 0.834 0.778 0.229 ...
 $ CN        : num  2.36 2.13 2.59 1.93 1.71 ...
 $ betaN     : num  0.827 0.875 0.887 0.884 0.103 ...
> 
> 
> # - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
> # Paired PSCBS segmentation
> # - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
> # Drop single-locus outliers
> dataS <- dropSegmentationOutliers(data)
> 
> # Speed up example by segmenting fewer loci
> dataS <- dataS[seq(from=1, to=nrow(data), by=20),]
> 
> str(dataS)
'data.frame':	3668 obs. of  6 variables:
 $ chromosome: int  1 1 1 1 1 1 1 1 1 1 ...
 $ x         : int  1145994 4276892 5034491 6266412 8418532 11211748 13928296 14370144 15014887 16589707 ...
 $ CT        : num  1.63 1.16 1.35 1.39 1.55 ...
 $ betaT     : num  0.7574 0.0576 0.8391 0.7917 0.8141 ...
 $ CN        : num  2.36 2.32 2.33 2.97 2.31 ...
 $ betaN     : num  0.8274 0.0421 0.9406 0.869 0.6045 ...
> 
> R.oo::attachLocally(dataS)
> 
> # Non-Paired PSCBS segmentation
> fit <- segmentByNonPairedPSCBS(CT, betaT=betaT,
+                             chromosome=chromosome, x=x,
+                             seed=0xBEEF, verbose=verbose)
> print(fit)
  chromosome tcnId dhId     start       end tcnNbrOfLoci tcnMean tcnNbrOfSNPs
1          1     1    1    554484 143663981         1880  1.3916          778
2          1     2    1 143663981 185240536          671  2.0925          275
3          1     3    1 185240536 246679946         1111  2.6545          417
  tcnNbrOfHets dhNbrOfLoci    dhMean    c1Mean    c2Mean
1          778         778 0.4009957 0.4167872 0.9748128
2          275         275 0.2344486 0.8009582 1.2915418
3          417         417 0.2819897 0.9529792 1.7015208
> 
> 
> # - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
> # Bootstrap segment level estimates
> # (used by the AB caller, which, if skipped here,
> #  will do it automatically)
> # - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
> fit <- bootstrapTCNandDHByRegion(fit, B=100, verbose=verbose)
> print(fit)
  chromosome tcnId dhId     start       end tcnNbrOfLoci tcnMean tcnNbrOfSNPs
1          1     1    1    554484 143663981         1880  1.3916          778
2          1     2    1 143663981 185240536          671  2.0925          275
3          1     3    1 185240536 246679946         1111  2.6545          417
  tcnNbrOfHets dhNbrOfLoci    dhMean    c1Mean    c2Mean
1          778         778 0.4009957 0.4167872 0.9748128
2          275         275 0.2344486 0.8009582 1.2915418
3          417         417 0.2819897 0.9529792 1.7015208
> 
> 
> # - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
> # Calling segments in allelic balance (AB)
> # NOTE: Ideally, this should be done on whole-genome data
> # - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
> # Explicitly estimate the threshold in DH for calling AB
> # (which be done by default by the caller, if skipped here)
> deltaAB <- estimateDeltaAB(fit, flavor="qq(DH)", verbose=verbose)
> print(deltaAB)
[1] 0.426037
> 
> fit <- callAB(fit, delta=deltaAB, verbose=verbose)
> print(fit)
  chromosome tcnId dhId     start       end tcnNbrOfLoci tcnMean tcnNbrOfSNPs
1          1     1    1    554484 143663981         1880  1.3916          778
2          1     2    1 143663981 185240536          671  2.0925          275
3          1     3    1 185240536 246679946         1111  2.6545          417
  tcnNbrOfHets dhNbrOfLoci    dhMean    c1Mean    c2Mean abCall
1          778         778 0.4009957 0.4167872 0.9748128   TRUE
2          275         275 0.2344486 0.8009582 1.2915418   TRUE
3          417         417 0.2819897 0.9529792 1.7015208   TRUE
> 
> 
> # Even if not explicitly specified, the estimated
> # threshold parameter is returned by the caller
> stopifnot(fit$params$deltaAB == deltaAB)
> 
> 
> # - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
> # Calling segments in loss-of-heterozygosity (LOH)
> # NOTE: Ideally, this should be done on whole-genome data
> # - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
> # Explicitly estimate the threshold in C1 for calling LOH
> # (which be done by default by the caller, if skipped here)
> deltaLOH <- estimateDeltaLOH(fit, flavor="minC1|nonAB", verbose=verbose)
Warning message:
In estimateDeltaLOHByMinC1ForNonAB.PairedPSCBS(this, ..., verbose = verbose) :
  All 3 segments are in allelic balance. Cannot estimate DeltaLOH, which requires that at least one segment must be in allelic imbalance.  Returning -Inf instead.
> print(deltaLOH)
[1] -Inf
> 
> fit <- callLOH(fit, delta=deltaLOH, verbose=verbose)
> print(fit)
  chromosome tcnId dhId     start       end tcnNbrOfLoci tcnMean tcnNbrOfSNPs
1          1     1    1    554484 143663981         1880  1.3916          778
2          1     2    1 143663981 185240536          671  2.0925          275
3          1     3    1 185240536 246679946         1111  2.6545          417
  tcnNbrOfHets dhNbrOfLoci    dhMean    c1Mean    c2Mean abCall lohCall
1          778         778 0.4009957 0.4167872 0.9748128   TRUE   FALSE
2          275         275 0.2344486 0.8009582 1.2915418   TRUE   FALSE
3          417         417 0.2819897 0.9529792 1.7015208   TRUE   FALSE
> plotTracks(fit)
> 
> # Even if not explicitly specified, the estimated
> # threshold parameter is returned by the caller
> stopifnot(fit$params$deltaLOH == deltaLOH)
> 
> 
> 
> 
> 
> dev.off()
null device 
          1 
>