This function allows the detection of breakpoints in genomic profiles
obtained by array CGH technology and affects a status (gain, normal
or lost) to each clone.
If TRUE, a smoothing step over the whole
genome is performed and a "clustering throughout the genome" allows
to identify a cluster corresponding to the Normal DNA level. The threshold used in the daglad
function (deltaN, forceGL, amplicon, deletion) and then
compared to the median of this cluster.
normalrefcenter
If TRUE, the LogRatio are centered
through the median of the cluster identified during the genomestep.
OnlySmoothing
If TRUE, only segmentation is performed without optimization of breakpoints and calling.
OnlyOptimCall
If TRUE, the user can provide data which have been already segmented. In this case, profileCGH$profileValues must contain a field with the name "Smoothing". The daglad function skip the smoothing step but bith the optimization of breakpoints and calling are performed.
smoothfunc
Type of algorithm used to smooth LogRatio by a
piecewise constant function. Choose either lawsglad,
haarseg, aws or
laws (aws package).
lkern
lkern determines the location kernel to be used
(see laws in aws package for details).
model
model determines the distribution type of LogRatio
(see laws in aws package for details).
qlambda
qlambda determines the scale parameter qlambda for the
stochastic penalty (see laws in aws package for details).
base
If TRUE, the position of clone is the physical position onto
the chromosome, otherwise the rank position is used.
sigma
Value to be passed to either argument sigma2
of aws (see aws package) function or shape of
laws (see aws package). If NULL, sigma is calculated from
the data.
bandwidth
Set the maximal bandwidth hmax in the
aws or laws functions in aws package. For
example, if bandwidth=10 then the hmax value is set
to 10*X_N where X_N is the position of the last clone.
round
The smoothing results of either aws
or laws functions (in aws package) are rounded or not depending on
the round argument. The round value is passed to the
argument digits of the round function.
lambdabreak
Penalty term (λ') used during the
"Optimization of the number of breakpoints" step.
lambdaclusterGen
Penalty term (λ*) used during the "clustering throughout the genome" step.
param
Parameter of kernel used in the penalty term.
alpha
Risk alpha used for the "Outlier detection" step.
msize
The outliers MAD are calculated on regions with a
cardinality greater or equal to msize.
method
The agglomeration method to be used during the "clustering throughout the genome" steps.
nmin
Minimum number of clusters (N*max) allowed during
the "clustering throughout the genome" clustering step.
nmax
Maximum number of clusters (N*max) allowed during
the "clustering throughout the genome" clustering step.
region.size
The breakpoints which define regions with a number of probes lower or equal to
this value are discared.
amplicon
Level (and outliers) with a smoothing value (log-ratio
value) greater than this threshold are consider as amplicon. Note
that first, the data are centered on the normal reference value
computed during the "clustering throughout the genome" step.
deletion
Level (and outliers) with a smoothing value (log-ratio
value) lower than this threshold are consider as deletion. Note
that first, the data are centered on the normal reference value
computed during the "clustering throughout the genome" step.
deltaN
Region with smoothing values in between the interval
[-deltaN,+deltaN] are supposed to be normal.
forceGL
Level with smoothing value greater (lower) than
rangeGL[1] (rangeGL[2]) are considered as gain
(lost). Note that first, the data are centered on the normal reference value
computed during the "clustering throughout the genome" step.
nbsigma
For each breakpoints, a weight is calculated which is
a function of absolute value of the Gap between the smoothing values
of the two consecutive regions. Weight = 1-
kernelpen(abs(Gap),param=c(d=nbsigma*Sigma)) where Sigma is the
standard deviation of the LogRatio.
MinBkpWeight
Breakpoints which GNLchange==0 and
Weight less than MinBkpWeight are discarded.
DelBkpInAmp
If TRUE, the breakpoints identified inside
amplicon regions are deleted. For amplicon, the log-ratio values are highly
variable which lead to identification of false positive breakpoints.
DelBkpInDel
If TRUE, the breakpoints identified inside
deletion regions are deleted. For deletion, the log-ratio values are highly
variable which lead to identification of false positive breakpoints.
CheckBkpPos
If TRUE, the accuracy position of each
breakpoints is checked.
assignGNLOut
If FALSE the status (gain/normal/loss) is
not assigned for outliers.
breaksFdrQ
breaksFdrQ for HaarSeg algorithm.
haarStartLevel
haarStartLevel for
HaarSeg algorithm.
haarEndLevel
haarEndLevel for HaarSeg algorithm.
weights.name
The name of the fields which contains the weights
used for the haarseg algorithm. By default, the value is set to NULL
meaning that all the observations have the same weights. If provided,
the field must contain positive values.
An object of class "profileCGH" with the following
attributes:
profileValues
is a data.frame with the following information:
SmoothingThe smoothing values correspond to the median
of each Level
BreakpointsThe last position of a region with identical amount
of DNA is flagged by 1 otherwise it is 0. Note that during the
"Optimization of the number of breakpoints" step, removed
breakpoints are flagged by -1.
LevelEach position with equal smoothing value are labelled the
same way with an integer value starting from one. The label is
incremented by one when a new level occurs or when moving to the
next chromosome.
OutliersAwsEach AWS outliers are flagged -1 (if it is
in the α/2 lower tail of the distribution) or 1 (if it is
in the α/2 upper tail of the distribution)
otherwise it is 0.
OutliersMadEach MAD outliers are flagged -1 (if it is
in the α/2 lower tail of the distribution) or 1 (if it is
in the α/2 upper tail of the distribution)
otherwise it is 0.
OutliersTotOutliersAws + OutliersMad.
NormalRefClusters which have been used to set the normal
reference during the "clustering throughout the genome" step are
code by 0. Note that if genomestep=FALSE, all the value
are set to 0.
ZoneGNLStatus of each clone: Gain is coded by 1, Loss by -1,
Amplicon by 2, deletion by -10 and
Normal by 0.
BkpInfo
is a data.frame sum up the information for each
breakpoint:
ChromosomeChromosome name.
SmoothingSmoothing value for the breakpoint.
Gapabsolute value of the gap between the smoothing values
of the two consecutive regions.
SigmaThe estimation of the standard-deviation of the chromosome.
ZoneGNLStatus of the level where is the breakpoint.
GNLchangeTakes the value 1 if the ZoneGNL of the two
consecutive regions are different.
LogRatioTest over Reference log-ratio.
NormalRef
If genomestep=TRUE and
normalrefcenter=FALSE, then NormalRef is the median of the cluster which has been used to set the normal
reference during the "clustering throughout the genome"
step. Otherwise NormalRef is 0.
Note
People interested in tools dealing with array CGH analysis can
visit our web-page http://bioinfo.curie.fr.
Polzehl and Spokoiny (WIAS-Preprint 787, 2002): Local likelihood modelling by adaptive weights smoothing.
Ben-Yaacov and Eldar (Bioinformatics, 2008): A fast and flexible method for the segmentation of aCGH data.
See Also
glad.
Examples
data(snijders)
gm13330$Clone <- gm13330$BAC
profileCGH <- as.profileCGH(gm13330)
###########################################################
###
### daglad function
###
###########################################################
res <- daglad(profileCGH, mediancenter=FALSE, normalrefcenter=FALSE, genomestep=FALSE,
smoothfunc="lawsglad", lkern="Exponential", model="Gaussian",
qlambda=0.999, bandwidth=10, base=FALSE, round=1.5,
lambdabreak=8, lambdaclusterGen=40, param=c(d=6), alpha=0.001, msize=2,
method="centroid", nmin=1, nmax=8,
amplicon=1, deletion=-5, deltaN=0.10, forceGL=c(-0.15,0.15), nbsigma=3,
MinBkpWeight=0.35, CheckBkpPos=TRUE)
### data for cytoband
data(cytoband)
### Genomic profile on the whole genome
plotProfile(res, unit=3, Bkp=TRUE, labels=FALSE, Smoothing="Smoothing",
main="Breakpoints detection: DAGLAD analysis", cytoband = cytoband)
###Genomic profile for chromosome 1
plotProfile(res, unit=3, Bkp=TRUE, labels=TRUE, Chromosome=1,
Smoothing="Smoothing", main="Chromosome 1: DAGLAD analysis", cytoband = cytoband)
### The standard-deviation of LogRatio are:
res$SigmaC
### The list of breakpoints is:
res$BkpInfo
Results
R version 3.3.1 (2016-06-21) -- "Bug in Your Hair"
Copyright (C) 2016 The R Foundation for Statistical Computing
Platform: x86_64-pc-linux-gnu (64-bit)
R is free software and comes with ABSOLUTELY NO WARRANTY.
You are welcome to redistribute it under certain conditions.
Type 'license()' or 'licence()' for distribution details.
R is a collaborative project with many contributors.
Type 'contributors()' for more information and
'citation()' on how to cite R or R packages in publications.
Type 'demo()' for some demos, 'help()' for on-line help, or
'help.start()' for an HTML browser interface to help.
Type 'q()' to quit R.
> library(GLAD)
######################################################################################
Have fun with GLAD
For smoothing it is possible to use either
the AWS algorithm (Polzehl and Spokoiny, 2002,
or the HaarSeg algorithm (Ben-Yaacov and Eldar, Bioinformatics, 2008,
If you use the package with AWS, please cite:
Hupe et al. (Bioinformatics, 2004, and Polzehl and Spokoiny (2002,
If you use the package with HaarSeg, please cite:
Hupe et al. (Bioinformatics, 2004, and (Ben-Yaacov and Eldar, Bioinformatics, 2008,
For fast computation it is recommanded to use
the daglad function with smoothfunc=haarseg
######################################################################################
New options are available in daglad: see help for details.
> png(filename="/home/ddbj/snapshot/RGM3/R_BC/result/GLAD/daglad.Rd_%03d_medium.png", width=480, height=480)
> ### Name: daglad
> ### Title: Analysis of array CGH data
> ### Aliases: daglad daglad.profileCGH
> ### Keywords: models
>
> ### ** Examples
>
>
> data(snijders)
> gm13330$Clone <- gm13330$BAC
> profileCGH <- as.profileCGH(gm13330)
>
>
> ###########################################################
> ###
> ### daglad function
> ###
> ###########################################################
>
>
> res <- daglad(profileCGH, mediancenter=FALSE, normalrefcenter=FALSE, genomestep=FALSE,
+ smoothfunc="lawsglad", lkern="Exponential", model="Gaussian",
+ qlambda=0.999, bandwidth=10, base=FALSE, round=1.5,
+ lambdabreak=8, lambdaclusterGen=40, param=c(d=6), alpha=0.001, msize=2,
+ method="centroid", nmin=1, nmax=8,
+ amplicon=1, deletion=-5, deltaN=0.10, forceGL=c(-0.15,0.15), nbsigma=3,
+ MinBkpWeight=0.35, CheckBkpPos=TRUE)
[1] "Smoothing for each Chromosome"
[1] "Optimization of the Breakpoints and DNA copy number calling"
[1] "Check Breakpoints Position"
[1] "Results Preparation"
>
>
> ### data for cytoband
> data(cytoband)
>
> ### Genomic profile on the whole genome
> plotProfile(res, unit=3, Bkp=TRUE, labels=FALSE, Smoothing="Smoothing",
+ main="Breakpoints detection: DAGLAD analysis", cytoband = cytoband)
>
>
>
> ###Genomic profile for chromosome 1
> plotProfile(res, unit=3, Bkp=TRUE, labels=TRUE, Chromosome=1,
+ Smoothing="Smoothing", main="Chromosome 1: DAGLAD analysis", cytoband = cytoband)
>
>
> ### The standard-deviation of LogRatio are:
> res$SigmaC
Chromosome Value
1 1 0.07788535
2 2 0.06850393
3 3 0.06506617
4 4 0.07644038
5 5 0.08342531
6 6 0.07988090
7 7 0.08071331
8 8 0.07094549
9 9 0.06955477
10 10 0.06717754
11 11 0.06593160
12 12 0.07461950
13 13 0.06104140
14 14 0.08988050
15 15 0.09300629
16 16 0.06872091
17 17 0.07717453
18 18 0.08011845
19 19 0.09860941
20 20 0.07678643
21 21 0.06994379
22 22 0.05492636
23 23 0.05234015
>
>
> ### The list of breakpoints is:
> res$BkpInfo
Clone PosBase Chromosome Weight GNLchange Smoothing Gap
1 RP11-234m03 156276 1 1.0000000 1 0.0221935 0.4976845
2 RP11-272O03 173943 4 1.0000000 1 -0.0714645 0.7681605
3 RP11-23E05 23609 11 0.9537355 1 0.0083130 0.1705440
4 RP11-193O09 33075 11 0.8867582 1 -0.1622310 0.1586670
5 RP11-187A08 39389 11 0.6153643 0 -0.0035640 0.1282735
6 RP11-12d19 46795 11 0.9733512 0 -0.1318375 0.1757320
7 RP11-7H07 83201 11 0.8566989 0 0.0438945 0.1545125
8 RP11-2F21 120000 11 0.9471314 0 -0.1106180 0.1690835
Sigma LogRatio ZoneGNL PosOrder
1 0.07788535 -0.095741 0 91
2 0.07644038 -0.205004 0 468
3 0.06593160 -0.043762 0 1342
4 0.06593160 -0.164159 -1 1355
5 0.06593160 0.000318 0 1370
6 0.06593160 -0.056248 0 1378
7 0.06593160 -0.011897 0 1436
8 0.06593160 -0.113474 0 1465
>
>
>
>
>
>
>
> dev.off()
null device
1
>