R: Segmentation and classification of copy number profiles
HMMcopy Segmentation
R Documentation
Segmentation and classification of copy number profiles
Description
Takes in a copy number profile and segments it into predicted regions of
equal copy number, and assigns a biologically motivated copy number state
to each region using a Hidden Markov Model (HMM). This is an extension
to the HMM described in Shah et al., 2006.
If none is provided, will generate a reasonable set of parameters based on the
input data, which can optionally be returned for inspection and manual
adjustment by setting 'getparam' to TRUE.
See Details for more information on parameters.
A matrix with parameters values in columns for each state in rows:
e
Probability of extending a segment, increase to lengthen segments,
decrase to shorten segments. Range: (0, 1)
strength
Strength of initial e suggestion, reducing allows e to
change, increasing makes e undefiable. Range: [0, Inf)
mu
Suggested median for copy numbers in state, change to readjust
classification of states. Range: (-Inf, Inf)
lambda
Suggested precision (inversed variance) for copy numbers
in state, increase to reduce overlap between states. Range: [0, Inf)
nu
Suggested degree of freedom between states, increase to reduce
overlap between states. Range: [0, Inf)
kappa
Suggested distribution of states. Should sum to 1.
m
Optimal value for mu, difference from corresponding mu value
determines elasticity of the mu value. i.e. Set to identical value
as mu if you don't want mu to move much.
eta
Mobility of mu, increase to allow more movement. Range: [0, Inf)
gamma
Prior shape on lambda, gamma distribution. Effects flexibility
of lambda.
S
Prior scale on lambda, gamma distribution. Effects flexibility of
lambda.
autosomes
Array of LOGICAL values corresponding to the 'chr' argument where an
element is TRUE if the chromosome is an autosome, otherwise FALSE. If not
provided, will automatically set the following chromosomes to false:
"X", "Y", "23", "24", "chrX", chrY", "M", "MT", "chrM".
maxiter
The maximum number of iterations allows for the Maximum-Expectation algorithm,
reduce to decrease running time at the expense of robustness.
getparam
If TRUE, generates and returns parameters without running segmentation.
verbose
Set to FALSE if messages are not desired
Details
HMMsegment is a two stage algorithm that first runs an
Expectation-Maximization algorithm to find the optimal set of parameters
based on suggested parameter inputs, and allowed flexibilities. After
iteratively finding the optimal parameters, the actual segmentation of the
data is conducted with the Viterbi algorithm, finally output segmented
states. This is an extension to the hidden Markov model described in Shah
et al., 2006.
Parameters are divided into two main categories:
Initial parameters: e, mu, lambda, nu, kappa
Flexibility parameters: strength, m, eta, gamma, S
Where initial parameters are treated as starting suggestions for the
parameter optimization algorithm, and flexibility parameters (hyperparameters)
define how much the initial parameters are allowed to deviate during the
search for the optimal parameters.
With a good copy number dataset, in theory, given enough flexibility, the
algorithm should always find a similar set of optimal parameters regardless
of initial parameters (although running times will vary).
If for some reason you wish to manually set the parameters for the
final segmentation process, one should tune all flexibility parameters to
minimal values. For example, if you wish to increase the length of segments,
you could set:
Which suggests that segments should be very long, and gives minimal to
non-existant flexibility to your suggestion.
See vignette for diagrammed example:
vignette("HMMcopy")
Value
A list object containing multiple values, although in practice only the
state assigned to each copy number value in 'states' and the segments of
non-overlapping states in 'segs' are of interest.
By default, there are 6 states, which in a diploid sample corresponds to the
following chromosomal copies and biological state:
1
<=0 copies, homozogous deletion
2
1 copy, heterozogous deletion
3
2 copies, neutral
4
3 copies, gain
5
4 copies, amplification
6
>=5 copies, high level amplification
The full list of output is
as follows:
states
The state assigned to each copy number value
segs
Non-overlapping segments and medians of each segment
mus
Optimal median of of copy numbers in state
lambda
Optimal precision of copy numbers in state
pi
Optimal state distribution
loglik
The likelihood values of each EM algorithm iteration
rho
Posterior marginals (responsibilities) for each position and state
Author(s)
Daniel Lai, Gavin Ha, Sohrab Shah
References
Sohrab P Shah, Xiang Xuan, Ron J DeLeeuw, Mehrnoush Khojasteh, Wan L Lam, Raymond Ng, and
Kevin P Murphy. Integrating copy number polymorphisms into array cgh analysis using a robust hmm.
Bioinformatics, 22(14):e431-9, Jul 2006.
See Also
correctReadcount, to correct the readcounts prior to
segmentation and classification for better results.
R version 3.3.1 (2016-06-21) -- "Bug in Your Hair"
Copyright (C) 2016 The R Foundation for Statistical Computing
Platform: x86_64-pc-linux-gnu (64-bit)
R is free software and comes with ABSOLUTELY NO WARRANTY.
You are welcome to redistribute it under certain conditions.
Type 'license()' or 'licence()' for distribution details.
R is a collaborative project with many contributors.
Type 'contributors()' for more information and
'citation()' on how to cite R or R packages in publications.
Type 'demo()' for some demos, 'help()' for on-line help, or
'help.start()' for an HTML browser interface to help.
Type 'q()' to quit R.
> library(HMMcopy)
Loading required package: IRanges
Loading required package: BiocGenerics
Loading required package: parallel
Attaching package: 'BiocGenerics'
The following objects are masked from 'package:parallel':
clusterApply, clusterApplyLB, clusterCall, clusterEvalQ,
clusterExport, clusterMap, parApply, parCapply, parLapply,
parLapplyLB, parRapply, parSapply, parSapplyLB
The following objects are masked from 'package:stats':
IQR, mad, xtabs
The following objects are masked from 'package:base':
Filter, Find, Map, Position, Reduce, anyDuplicated, append,
as.data.frame, cbind, colnames, do.call, duplicated, eval, evalq,
get, grep, grepl, intersect, is.unsorted, lapply, lengths, mapply,
match, mget, order, paste, pmax, pmax.int, pmin, pmin.int, rank,
rbind, rownames, sapply, setdiff, sort, table, tapply, union,
unique, unsplit
Loading required package: S4Vectors
Loading required package: stats4
Attaching package: 'S4Vectors'
The following objects are masked from 'package:base':
colMeans, colSums, expand.grid, rowMeans, rowSums
Loading required package: geneplotter
Loading required package: Biobase
Welcome to Bioconductor
Vignettes contain introductory material; view with
'browseVignettes()'. To cite Bioconductor, see
'citation("Biobase")', and for packages 'citation("pkgname")'.
Loading required package: lattice
Loading required package: annotate
Loading required package: AnnotationDbi
Loading required package: XML
> png(filename="/home/ddbj/snapshot/RGM3/R_BC/result/HMMcopy/HMMsegment.Rd_%03d_medium.png", width=480, height=480)
> ### Name: HMMcopy Segmentation
> ### Title: Segmentation and classification of copy number profiles
> ### Aliases: HMMsegment
> ### Keywords: IO
>
> ### ** Examples
>
>
> data(tumour) # Load tumour_copy
> tumour_segments <- HMMsegment(tumour_copy)
Initialization
EM iteration: 1 Log likelihood: -Inf
Expectation
Maximization
EM iteration: 2 Log likelihood: 752.739516090235
Expectation
Maximization
EM iteration: 3 Log likelihood: 1828.75626798331
Expectation
Maximization
EM iteration: 4 Log likelihood: 2031.03705088824
Expectation
Maximization
EM iteration: 5 Log likelihood: 2078.6968179617
Expectation
Maximization
EM iteration: 6 Log likelihood: 2092.52391956819
Expectation
Maximization
EM iteration: 7 Log likelihood: 2098.16682904567
Expectation
Maximization
EM iteration: 8 Log likelihood: 2100.64426469102
Expectation
Maximization
EM iteration: 9 Log likelihood: 2101.2658084141
Expectation
Maximization
Re-calculating latest responsibilties for output
Optimal parameters found, segmenting and classifying
>
>
>
>
>
>
> dev.off()
null device
1
>