R: Segment total copy numbers and allele B fractions using the...
segmentByPairedPSCBS
R Documentation
Segment total copy numbers and allele B fractions using the Paired PSCBS method
Description
Segment total copy numbers and allele B fractions using the Paired PSCBS method [1].
This method requires matched normals.
This is a low-level segmentation method.
It is intended to be applied to one tumor-normal sample at the time.
A numericvector of J tumor total copy number (TCN)
ratios in [0,+Inf) (due to noise, small negative values are
also allowed). The TCN ratios are typically scaled such that
copy-neutral diploid loci have a mean of two.
thetaT, thetaN
(alternative) As an alternative to specifying
tumor TCN ratios relative to the match normal by
argument CT, on may specify total tumor and normal
signals seperately, in which case the TCN ratios CT are
calculated as CT = 2*thetaT/thetaN.
betaT
A numericvector of J tumor allele B fractions (BAFs)
in [0,1] (due to noise, values may be slightly outside as well)
or NA for non-polymorphic loci.
betaN
A numericvector of J matched normal BAFs in [0,1]
(due to noise, values may be slightly outside as well) or NA
for non-polymorphic loci.
muN
An optional numericvector of J genotype calls in
{0,1/2,1} for AA, AB, and BB, respectively,
and NA for non-polymorphic loci.
If not given, they are estimated from the normal BAFs using
callNaiveGenotypes as described in [2].
rho
(alternative to betaT and betaN/muN)
A numericvector of J decrease-of-heterozygosity signals (DHs)
in [0,1] (due to noise, values may be slightly larger than one
as well). By definition, DH should be NA for homozygous loci
and for non-polymorphic loci.
chromosome
(Optional) An integer scalar (or a vector of length J),
which can be used to specify which chromosome each locus belongs to
in case multiple chromosomes are segments.
This argument is also used for annotation purposes.
x
Optional numericvector of J genomic locations.
If NULL, index locations 1:J are used.
alphaTCN, alphaDH
The significance levels for segmenting total
copy numbers (TCNs) and decrease-in-heterozygosity signals (DHs),
respectively.
undoTCN, undoDH
Non-negative numerics. If greater than 0,
then a cleanup of segmentions post segmentation is done.
See argument undo of segmentByCBS() for more
details.
avgTCN, avgDH
A character string specifying how to calculating
segment mean levels after change points have been
identified.
...
Additional arguments passed to segmentByCBS().
flavor
A character specifying what type of segmentation and
calling algorithm to be used.
tbn
If TRUE, betaT is normalized before segmentation
using the TumorBoost method [2], otherwise not.
preserveScale
Passed to normalizeTumorBoost,
which is only called if tbn is TRUE.
joinSegments
If TRUE, there are no gaps between neighboring
segments.
If FALSE, the boundaries of a segment are defined by the support
that the loci in the segments provides, i.e. there exist a locus
at each end point of each segment. This also means that there
is a gap between any neighboring segments, unless the change point
is in the middle of multiple loci with the same position.
The latter is what DNAcopy::segment() returns.
knownSegments
Optional data.frame specifying
non-overlapping known segments. These segments must
not share loci. See findLargeGaps() and gapsToSegments().
dropMissingCT
If TRUE, loci for which 'CT' is missing
are dropped, otherwise not.
seed
An (optional) integer specifying the random seed to be
set before calling the segmentation method. The random seed is
set to its original state when exiting. If NULL, it is not set.
verbose
See Verbose.
Details
Internally segmentByCBS() is used for segmentation.
The Paired PSCBS segmentation method does not support weights.
Value
Returns the segmentation results as a PairedPSCBS object.
Reproducibility
The "DNAcopy::segment" implementation of CBS uses approximation
through random sampling for some estimates. Because of this,
repeated calls using the same signals may result in slightly
different results, unless the random seed is set/fixed.
Whole-genome segmentation is preferred
Although it is possible to segment each chromosome independently
using Paired PSCBS, we strongly recommend to segment whole-genome
(TCN,BAF) data at once. The reason for this is that downstream
CN-state calling methods, such as the AB and the LOH callers,
performs much better on whole-genome data. In fact, they may
fail to provide valid calls if done chromsome by chromosome.
Missing and non-finite values
The total copy number signals as well as any optional positions
must not contain missing values, i.e. NAs or NaNs.
If there are any, an informative error is thrown.
Allele B fractions may contain missing values, because such are
interpreted as representing non-polymorphic loci.
None of the input signals may have infinite values, i.e. -Inf or +Inf.
If so, an informative error is thrown.
Paired PSCBS with only genotypes
If allele B fractions for the matched normal (betaN) are
not available, but genotypes (muN) are, then it is possible
to run a version of Paired PSCBS where TumorBoost normalization
of the tumor allele B fractions is skipped. In order for this
to work, argument tbn must be set to FALSE.
Author(s)
Henrik Bengtsson
References
[1] A.B. Olshen, H. Bengtsson, P. Neuvial, P.T. Spellman, R.A. Olshen, V.E. Seshan, Parent-specific copy number in paired tumor-normal studies using circular binary segmentation, Bioinformatics, 2011
[2] H. Bengtsson, P. Neuvial and T.P. Speed, TumorBoost: Normalization of allele-specific tumor copy numbers from a single pair of tumor-normal genotyping microarrays, BMC Bioinformatics, 2010
See Also
Internally, callNaiveGenotypes is used to
call naive genotypes, normalizeTumorBoost is
used for TumorBoost normalization, and segmentByCBS() is used
to segment TCN and DH separately.
To segment tumor total copy numbers and allele B fractions
without a matched normal, see segmentByNonPairedPSCBS().
To segment total copy-numbers, or any other unimodal signals,
see segmentByCBS().
Examples
verbose <- R.utils::Arguments$getVerbose(-10*interactive(), timestamp=TRUE)
# - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
# Load SNP microarray data
# - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
data <- PSCBS::exampleData("paired.chr01")
str(data)
# - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
# Paired PSCBS segmentation
# - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
# Drop single-locus outliers
dataS <- dropSegmentationOutliers(data)
# Speed up example by segmenting fewer loci
dataS <- dataS[seq(from=1, to=nrow(data), by=10),]
str(dataS)
R.oo::attachLocally(dataS)
# Paired PSCBS segmentation
fit <- segmentByPairedPSCBS(CT, betaT=betaT, betaN=betaN,
chromosome=chromosome, x=x,
seed=0xBEEF, verbose=verbose)
print(fit)
# Plot results
plotTracks(fit)
# - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
# Bootstrap segment level estimates
# (used by the AB caller, which, if skipped here,
# will do it automatically)
# - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
fit <- bootstrapTCNandDHByRegion(fit, B=100, verbose=verbose)
print(fit)
plotTracks(fit)
# - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
# Calling segments in allelic balance (AB)
# NOTE: Ideally, this should be done on whole-genome data
# - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
# Explicitly estimate the threshold in DH for calling AB
# (which be done by default by the caller, if skipped here)
deltaAB <- estimateDeltaAB(fit, flavor="qq(DH)", verbose=verbose)
print(deltaAB)
## [1] 0.1657131
fit <- callAB(fit, delta=deltaAB, verbose=verbose)
print(fit)
plotTracks(fit)
# Even if not explicitly specified, the estimated
# threshold parameter is returned by the caller
stopifnot(fit$params$deltaAB == deltaAB)
# - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
# Calling segments in loss-of-heterozygosity (LOH)
# NOTE: Ideally, this should be done on whole-genome data
# - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
# Explicitly estimate the threshold in C1 for calling LOH
# (which be done by default by the caller, if skipped here)
deltaLOH <- estimateDeltaLOH(fit, flavor="minC1|nonAB", verbose=verbose)
print(deltaLOH)
## [1] 0.625175
fit <- callLOH(fit, delta=deltaLOH, verbose=verbose)
print(fit)
plotTracks(fit)
# Even if not explicitly specified, the estimated
# threshold parameter is returned by the caller
stopifnot(fit$params$deltaLOH == deltaLOH)
Results
R version 3.3.1 (2016-06-21) -- "Bug in Your Hair"
Copyright (C) 2016 The R Foundation for Statistical Computing
Platform: x86_64-pc-linux-gnu (64-bit)
R is free software and comes with ABSOLUTELY NO WARRANTY.
You are welcome to redistribute it under certain conditions.
Type 'license()' or 'licence()' for distribution details.
R is a collaborative project with many contributors.
Type 'contributors()' for more information and
'citation()' on how to cite R or R packages in publications.
Type 'demo()' for some demos, 'help()' for on-line help, or
'help.start()' for an HTML browser interface to help.
Type 'q()' to quit R.
> library(PSCBS)
PSCBS v0.61.0 (2016-02-03) successfully loaded. See ?PSCBS for help.
Attaching package: 'PSCBS'
The following objects are masked from 'package:base':
append, load
> png(filename="/home/ddbj/snapshot/RGM3/R_CC/result/PSCBS/segmentByPairedPSCBS.Rd_%03d_medium.png", width=480, height=480)
> ### Name: segmentByPairedPSCBS
> ### Title: Segment total copy numbers and allele B fractions using the
> ### Paired PSCBS method
> ### Aliases: segmentByPairedPSCBS.default segmentByPairedPSCBS
> ### segmentByPairedPSCBS.data.frame segmentByPairedPSCBS.PairedPSCBS
> ### segmentByPairedPSCBS
> ### Keywords: IO
>
> ### ** Examples
>
> verbose <- R.utils::Arguments$getVerbose(-10*interactive(), timestamp=TRUE)
>
> # - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
> # Load SNP microarray data
> # - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
> data <- PSCBS::exampleData("paired.chr01")
> str(data)
'data.frame': 73346 obs. of 6 variables:
$ chromosome: int 1 1 1 1 1 1 1 1 1 1 ...
$ x : int 1145994 2224111 2319424 2543484 2926730 2941694 3084986 3155127 3292731 3695086 ...
$ CT : num 1.625 1.071 1.406 1.18 0.856 ...
$ betaT : num 0.757 0.771 0.834 0.778 0.229 ...
$ CN : num 2.36 2.13 2.59 1.93 1.71 ...
$ betaN : num 0.827 0.875 0.887 0.884 0.103 ...
>
>
> # - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
> # Paired PSCBS segmentation
> # - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
> # Drop single-locus outliers
> dataS <- dropSegmentationOutliers(data)
>
> # Speed up example by segmenting fewer loci
> dataS <- dataS[seq(from=1, to=nrow(data), by=10),]
>
> str(dataS)
'data.frame': 7335 obs. of 6 variables:
$ chromosome: int 1 1 1 1 1 1 1 1 1 1 ...
$ x : int 1145994 3710825 4276892 4714611 5034491 5534316 6266412 7560566 8418532 10275699 ...
$ CT : num 1.63 1.41 1.16 2.38 1.35 ...
$ betaT : num 0.7574 0.2357 0.0576 0.7228 0.8391 ...
$ CN : num 2.36 2.26 2.32 2.68 2.33 ...
$ betaN : num 0.8274 0.1671 0.0421 0.7877 0.9406 ...
>
> R.oo::attachLocally(dataS)
>
> # Paired PSCBS segmentation
> fit <- segmentByPairedPSCBS(CT, betaT=betaT, betaN=betaN,
+ chromosome=chromosome, x=x,
+ seed=0xBEEF, verbose=verbose)
Warning message:
In segmentByPairedPSCBS.default(CT, betaT = betaT, betaN = betaN, :
Argument 'preserveScale' for segmentByPairedPSCBS() now defaults to FALSE. Prior to PSCBS v0.50.0 (October 2015) the default was TRUE. To avoid this warning, explicitly specify this argument when calling segmentByPairedPSCBS() or make sure to set option 'PSCBS/preserveScale' to either TRUE or FALSE. This warning will be removed in a future version.
> print(fit)
chromosome tcnId dhId start end tcnNbrOfLoci tcnMean tcnNbrOfSNPs
1 1 1 1 554484 143691368 3766 1.3857 1047
2 1 2 1 143691368 185472128 1363 2.0701 388
3 1 3 1 185472128 246679946 2196 2.6482 664
tcnNbrOfHets dhNbrOfLoci dhMean c1Mean c2Mean
1 1047 1047 0.5059 0.3423372 1.043363
2 388 388 0.0976 0.9340291 1.136071
3 664 664 0.2313 1.0178357 1.630364
>
> # Plot results
> plotTracks(fit)
>
>
> # - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
> # Bootstrap segment level estimates
> # (used by the AB caller, which, if skipped here,
> # will do it automatically)
> # - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
> fit <- bootstrapTCNandDHByRegion(fit, B=100, verbose=verbose)
> print(fit)
chromosome tcnId dhId start end tcnNbrOfLoci tcnMean tcnNbrOfSNPs
1 1 1 1 554484 143691368 3766 1.3857 1047
2 1 2 1 143691368 185472128 1363 2.0701 388
3 1 3 1 185472128 246679946 2196 2.6482 664
tcnNbrOfHets dhNbrOfLoci dhMean c1Mean c2Mean
1 1047 1047 0.5059 0.3423372 1.043363
2 388 388 0.0976 0.9340291 1.136071
3 664 664 0.2313 1.0178357 1.630364
> plotTracks(fit)
>
>
> # - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
> # Calling segments in allelic balance (AB)
> # NOTE: Ideally, this should be done on whole-genome data
> # - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
> # Explicitly estimate the threshold in DH for calling AB
> # (which be done by default by the caller, if skipped here)
> deltaAB <- estimateDeltaAB(fit, flavor="qq(DH)", verbose=verbose)
> print(deltaAB)
[1] 0.2991486
> ## [1] 0.1657131
>
> fit <- callAB(fit, delta=deltaAB, verbose=verbose)
> print(fit)
chromosome tcnId dhId start end tcnNbrOfLoci tcnMean tcnNbrOfSNPs
1 1 1 1 554484 143691368 3766 1.3857 1047
2 1 2 1 143691368 185472128 1363 2.0701 388
3 1 3 1 185472128 246679946 2196 2.6482 664
tcnNbrOfHets dhNbrOfLoci dhMean c1Mean c2Mean abCall
1 1047 1047 0.5059 0.3423372 1.043363 FALSE
2 388 388 0.0976 0.9340291 1.136071 TRUE
3 664 664 0.2313 1.0178357 1.630364 TRUE
> plotTracks(fit)
>
> # Even if not explicitly specified, the estimated
> # threshold parameter is returned by the caller
> stopifnot(fit$params$deltaAB == deltaAB)
>
>
> # - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
> # Calling segments in loss-of-heterozygosity (LOH)
> # NOTE: Ideally, this should be done on whole-genome data
> # - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
> # Explicitly estimate the threshold in C1 for calling LOH
> # (which be done by default by the caller, if skipped here)
> deltaLOH <- estimateDeltaLOH(fit, flavor="minC1|nonAB", verbose=verbose)
> print(deltaLOH)
[1] 0.7799147
> ## [1] 0.625175
>
> fit <- callLOH(fit, delta=deltaLOH, verbose=verbose)
> print(fit)
chromosome tcnId dhId start end tcnNbrOfLoci tcnMean tcnNbrOfSNPs
1 1 1 1 554484 143691368 3766 1.3857 1047
2 1 2 1 143691368 185472128 1363 2.0701 388
3 1 3 1 185472128 246679946 2196 2.6482 664
tcnNbrOfHets dhNbrOfLoci dhMean c1Mean c2Mean abCall lohCall
1 1047 1047 0.5059 0.3423372 1.043363 FALSE TRUE
2 388 388 0.0976 0.9340291 1.136071 TRUE FALSE
3 664 664 0.2313 1.0178357 1.630364 TRUE FALSE
> plotTracks(fit)
>
> # Even if not explicitly specified, the estimated
> # threshold parameter is returned by the caller
> stopifnot(fit$params$deltaLOH == deltaLOH)
>
>
>
>
>
> dev.off()
null device
1
>