Segment genomic signals using the CBS method of the DNAcopy package.
This is a convenient low-level wrapper for the DNAcopy::segment()
method. It is intended to be applied to one sample at a time.
For more details on the Circular Binary Segmentation (CBS) method
see [1,2].
y
A numeric vector of J genomic signals to be segmented.
chromosome
Optional numeric vector of length J, specifying
the chromosome of each locus. If a scalar, it is expanded to
a vector of length J.
x
Optional numeric vector of J genomic locations.
If NULL, index locations 1:J are used.
index
An optional integer vector of length J specifying
the genomewide indices of the loci.
w
Optional numeric vector of J weights in [0,1].
undo
A non-negative numeric. If greater than zero, then
arguments undo.splits="sdundo" and undo.SD=undo
are passed to DNAcopy::segment().
In the special case when undo is +Inf, the segmentation
result will not contain any change points other than those
specified by argument knownSegments.
avg
A character string specifying how to calculate
segment mean levels after change points have been
identified.
...
Additional arguments passed to the DNAcopy::segment()
segmentation function.
joinSegments
If TRUE, there are no gaps between neighboring
segments.
If FALSE, the boundaries of a segment are defined by the support
that the loci in the segment provide, i.e. there exists a locus
at each end point of each segment. This also means that there
is a gap between any neighboring segments, unless the change point
is in the middle of multiple loci with the same position.
The latter is what DNAcopy::segment() returns.
knownSegments
Optional data.frame specifying
non-overlapping known segments. These segments must
not share loci. See findLargeGaps() and gapsToSegments().
seed
An (optional) integer specifying the random seed to be
set before calling the segmentation method. The random seed is
set to its original state when exiting. If NULL, it is not set.
verbose
See Verbose.
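The arguments above can be combined as in the following minimal sketch. It assumes that this help page documents PSCBS::segmentByCBS() and that the PSCBS package is installed; the simulated signals, the seed value, and the two known segments are made up for illustration only.

```r
library(PSCBS)

# Simulate J loci with one gain and one loss (illustrative data only)
set.seed(0xBEEF)
J <- 1000
mu <- double(J)
mu[200:300] <- mu[200:300] + 1
mu[650:800] <- mu[650:800] - 1
y <- mu + rnorm(J, sd = 1/2)
x <- sort(runif(J, max = J)) * 1e5

# A scalar chromosome is expanded to length J; a fixed seed makes the
# sampling-based estimates reproducible
fit <- segmentByCBS(y, chromosome = 1L, x = x, seed = 42L)
print(getSegments(fit))

# Segment two known, non-overlapping regions independently; no locus
# belongs to both regions
known <- data.frame(
  chromosome = c(1, 1),
  start      = x[c(1, 501)],
  end        = x[c(500, J)]
)
fit2 <- segmentByCBS(y, chromosome = 1L, x = x,
                     knownSegments = known, seed = 42L)
print(getSegments(fit2))
```

The second call forces a boundary between loci 500 and 501 regardless of what CBS would find on its own, which is the typical use of knownSegments for large gaps such as centromeres.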
Details
Internally, DNAcopy::segment() is used to
segment the signals.
This segmentation method supports weighted segmentation.
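A weighted call might look like the sketch below. It assumes PSCBS is installed and that this page documents PSCBS::segmentByCBS(); the data and the choice to down-weight loci 350 to 400 are illustrative assumptions, not a recommendation.

```r
library(PSCBS)

# Simulated signals with one gain (illustrative data only)
set.seed(1)
J <- 500
mu <- double(J)
mu[200:300] <- 1
y <- mu + rnorm(J, sd = 1/2)

# Make one stretch of loci much noisier, then down-weight it
noisy <- 350:400
y[noisy] <- y[noisy] + rnorm(length(noisy), sd = 2)
w <- rep(1, times = J)
w[noisy] <- 0.01   # weights must stay within [0,1]

# x is left NULL, so index locations 1:J are used
fit_w <- segmentByCBS(y, w = w, seed = 42L)
print(getSegments(fit_w))
```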
Value
Returns a CBS object.
Reproducibility
The DNAcopy::segment() implementation of CBS uses approximation
through random sampling for some estimates. Because of this,
repeated calls using the same signals may result in slightly
different results, unless the random seed is set/fixed.
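The seed handling described for argument seed (set the seed, run the sampling-based method, restore the previous RNG state on exit) can be illustrated in base R. This is a sketch of the general pattern only, not the package's actual implementation; with_fixed_seed() and noisy_estimate() are hypothetical names introduced for this example.

```r
# Hypothetical helper (not part of PSCBS or DNAcopy): call fn() under a
# fixed seed and restore the caller's RNG state afterwards.
with_fixed_seed <- function(seed, fn) {
  if (!is.null(seed)) {
    env <- globalenv()
    had_seed <- exists(".Random.seed", envir = env, inherits = FALSE)
    old_seed <- if (had_seed) get(".Random.seed", envir = env) else NULL
    on.exit({
      if (had_seed) {
        assign(".Random.seed", old_seed, envir = env)
      } else if (exists(".Random.seed", envir = env, inherits = FALSE)) {
        rm(list = ".Random.seed", envir = env)
      }
    }, add = TRUE)
    set.seed(seed)
  }
  fn()
}

# Stand-in for an estimate that relies on random sampling
noisy_estimate <- function() mean(sample(100, size = 10))

a <- with_fixed_seed(42L, noisy_estimate)
b <- with_fixed_seed(42L, noisy_estimate)
identical(a, b)  # TRUE: same seed, same result
```

Because the previous RNG state is restored, fixing the seed for the segmentation does not alter the random numbers drawn by the calling code afterwards.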
Missing and non-finite values
Signals may contain missing values (NA or NaN), but not
infinite values (+/-Inf). Loci with missing-value signals
are preserved and kept in the result.
Likewise, genomic positions may contain missing values.
However, if they do, such loci are silently excluded before
performing the segmentation, and are not kept in the results.
The mapping between the input locus-level data and the corresponding
locus-level data of the result can be inferred from the index column of
the latter.
None of the input data may contain infinite values,
i.e. -Inf or +Inf. If any do, an informative error is thrown.
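These rules can be mimicked in a few lines of base R. The helper prepare_loci() below is hypothetical and only mirrors the behavior described in this section; it is not the package's internal code.

```r
# Hypothetical helper mirroring the rules above: reject infinite values,
# keep loci with missing signals, drop loci with missing positions, and
# record the original index of each retained locus.
prepare_loci <- function(y, x = NULL) {
  J <- length(y)
  if (is.null(x)) x <- seq_len(J)
  stopifnot(length(x) == J)
  if (any(is.infinite(y)) || any(is.infinite(x))) {
    stop("Input data must not contain infinite values (-Inf or +Inf)")
  }
  data <- data.frame(index = seq_len(J), x = x, y = y)
  data[!is.na(data$x), , drop = FALSE]
}

y <- c(0.1, NA, 0.3, 1.2, 1.1)   # locus 2 has a missing signal
x <- c(10, 20, NA, 40, 50)       # locus 3 has a missing position
loci <- prepare_loci(y, x)
loci$index   # 1 2 4 5: locus 2 is kept, locus 3 is dropped
```

The index column is what lets each retained locus be matched back to its row in the input.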
Author(s)
Henrik Bengtsson
References
[1] A.B. Olshen, E.S. Venkatraman (aka Venkatraman E. Seshan), R. Lucito and M. Wigler, Circular binary segmentation for the analysis of array-based DNA copy number data, Biostatistics, 2004
[2] E.S. Venkatraman and A.B. Olshen, A faster circular binary segmentation algorithm for the analysis of array CGH data, Bioinformatics, 2007
See Also
To segment allele-specific tumor copy-number signals from a tumor
with a matched normal, see segmentByPairedPSCBS().
For the same without a matched normal,
see segmentByNonPairedPSCBS().
It is also possible to prune change points after segmentation (with
identical results) using
pruneBySdUndo().
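The two routes can be compared as in the sketch below. It assumes PSCBS is installed, that this page documents PSCBS::segmentByCBS(), and that pruneBySdUndo() takes the number of standard deviations via an argument named rho; the simulated data and the value rho = 3 are illustrative only.

```r
library(PSCBS)

# Simulated signals with one clear and one weak level shift
set.seed(2)
J <- 1000
mu <- double(J)
mu[200:300] <- 1
mu[650:800] <- -0.3
y <- mu + rnorm(J, sd = 1/2)

rho <- 3  # number of SDs required to keep a change point (illustrative)

# Route 1: undo splits during segmentation
fit_undo <- segmentByCBS(y, undo = rho, seed = 42L)

# Route 2: segment without undo, then prune afterwards
fit <- segmentByCBS(y, seed = 42L)
fit_pruned <- pruneBySdUndo(fit, rho = rho)

# Compare the number of segments from each route
print(c(no_undo = nbrOfSegments(fit),
        undo    = nbrOfSegments(fit_undo),
        pruned  = nbrOfSegments(fit_pruned)))
```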