This function allows the detection of breakpoints in genomic profiles
obtained by array CGH technology and affects a status (gain, normal
or lost) to each clone.
Type of algorithm used to smooth LogRatio by a
piecewise constant function. Choose either lawsglad,
haarseg, aws or
laws in aws package.
bandwidth
Set the maximal bandwidth hmax in the
aws or laws functions in aws package. For
example, if bandwidth=10 then the hmax value is set
to 10*X_N where X_N is the position of the last clone.
round
The smoothing results are rounded or not depending on
the round argument. The round value is passed to the
argument digits of the round function.
model
Determines the distribution type of the LogRatio. Keep
always the model as "Gaussian" (see laws in aws package).
lkern
Determines the location kernel to be used (see aws or
laws in aws package).
qlambda
Determines the scale parameter for the
stochastic penalty (see aws or
laws in aws package)
base
If TRUE, the position of clone is the physical position on
the chromosome, otherwise the rank position is used.
sigma
Value to be passed to either argument sigma2
ofaws function or shape of
laws (see aws package). If NULL, sigma is calculated from
the data.
lambdabreak
Penalty term (λ') used during the
Optimization of the number of breakpoints step.
lambdacluster
Penalty term (λ*) used during the MSHR
clustering by chromosome step.
lambdaclusterGen
Penalty term (λ*) used during the HCSR
clustering throughout the genome step.
type
Type of kernel function used in the penalty term during the Optimization of the
number of breakpoints step, the MSHR
clustering by chromosome step and the HCSR
clustering throughout the genome step.
param
Parameter of kernel used in the penalty term.
alpha
Risk alpha used for the Outlier detection step.
msize
The outliers MAD are calculated on regions with a
cardinality greater or equal to msize.
method
The agglomeration method to be used during the MSHR
clustering by chromosome and the HCSR
clustering throughout the genome clustering steps.
nmax
Maximum number of clusters (N*max) allowed during
the the MSHR
clustering by chromosome and the HCSR
clustering throughout the genome clustering steps.
assignGNLOut
If FALSE the status (gain/normal/loss) is
not assigned for outliers.
The principles of the GLAD algorithm:
First, the detection of breakpoints is based on the estimation of a
piecewise constant function with the Adaptive Weights Smoothing (AWS)
procedure (Polzehl and Spokoiny, 2002). Alternatively, it is possible
to use the HaarSeg algorithm (Ben-Yaacov and Eldar, Bioinformatics, 2008).
Then, a procedure based on penalyzed maximum likelihood optimizes the number of
breakpoints and removes the undesirable breakpoints.
Finally, based on the regions previously identified, a two-step
unsupervised classification (MSHR
clustering by chromosome and the HCSR
clustering throughout the genome) with model selection criteria
allows a status to be assigned for each region (gain, loss or
normal).
Main parameters to be tuned:
qlambda
if you want the smoothing to fit some very local effect, choose a smaller qlambda.
bandwidth
choose a bandwidth not to small otherwise you
will have a lot of little discontinuities.
lambdabreak
The higher the parameter is, the higher the number
of undesirable breakpoints is.
lambdacluster
The higher the parameter is, the higher is
the
number of the regions
within a chromosome which belong to the same cluster.
lambdaclusterGen
More the parameter is high more the
regions over the whole genome are supposed to belong to the same cluster.
Value
An object of class "profileCGH" with the following attributes:
profileValues:
a data.frame with the following added information:
SmoothingThe smoothing values correspond to the median
of each MSHR (i.e. Region).
BreakpointsThe last position of a region with identical amount
of DNA is flagged by 1 otherwise it is 0. Note that during the
"Optimization of the number of breakpoints" step, removed
breakpoints are flagged by -1.
RegionEach position between two breakpoints are labelled the
same way with an integer value starting from one. The label is
incremented by one when a new breakpoint is found or when moving to
the next chromosome. The variable region is what we call MSHR.
LevelEach position with equal smoothing value is labelled the
same way with an integer value starting from one. The label is
incremented by one when a new level is found or when moving to the
next chromosome.
OutliersAwsEach AWS outliers are flagged -1 or 1
otherwise it is 0.
OutliersMadEach MAD outliers are flagged -1 (if it is
in the α/2 lower tail of the distribution) or 1 (if it is
in the α/2 upper tail of the distribution)
otherwise it is 0.
OutliersTotOutliersAws + OutliersMad.
ZoneChrClusters identified after MSHR (i.e. Region)
clustering by chromosome.
ZoneGenClusters identified after HCSR clustering throughout the
genome.
ZoneGNLStatus of each clone : Gain is coded by 1, Loss by -1 and
Normal by 0.
BkpInfo:
the data.frame attribute BkpInfo which gives
the list of breakpoints:
PosOrderThe rank position of each clone on the genome.
PosBaseThe base position of each clone on the genome.
ChromosomeChromosome name.
SigmaC:
the data.frame attribute SigmaC gives the estimation of the LogRatio standard-deviation for each chromosome:
ChromosomeChromosome name.
ValueThe estimation is based on the Inter Quartile Range.
Note
People interested in tools dealing with array CGH analysis can
visit our web-page http://bioinfo.curie.fr.
R version 3.3.1 (2016-06-21) -- "Bug in Your Hair"
Copyright (C) 2016 The R Foundation for Statistical Computing
Platform: x86_64-pc-linux-gnu (64-bit)
R is free software and comes with ABSOLUTELY NO WARRANTY.
You are welcome to redistribute it under certain conditions.
Type 'license()' or 'licence()' for distribution details.
R is a collaborative project with many contributors.
Type 'contributors()' for more information and
'citation()' on how to cite R or R packages in publications.
Type 'demo()' for some demos, 'help()' for on-line help, or
'help.start()' for an HTML browser interface to help.
Type 'q()' to quit R.
> library(GLAD)
######################################################################################
Have fun with GLAD
For smoothing it is possible to use either
the AWS algorithm (Polzehl and Spokoiny, 2002,
or the HaarSeg algorithm (Ben-Yaacov and Eldar, Bioinformatics, 2008,
If you use the package with AWS, please cite:
Hupe et al. (Bioinformatics, 2004, and Polzehl and Spokoiny (2002,
If you use the package with HaarSeg, please cite:
Hupe et al. (Bioinformatics, 2004, and (Ben-Yaacov and Eldar, Bioinformatics, 2008,
For fast computation it is recommanded to use
the daglad function with smoothfunc=haarseg
######################################################################################
New options are available in daglad: see help for details.
> png(filename="/home/ddbj/snapshot/RGM3/R_BC/result/GLAD/glad.Rd_%03d_medium.png", width=480, height=480)
> ### Encoding: UTF-8
>
> ### Name: glad
> ### Title: Analysis of array CGH data
> ### Aliases: glad glad.profileCGH
> ### Keywords: models
>
> ### ** Examples
>
>
> data(snijders)
>
> ### Creation of "profileCGH" object
> gm13330$Clone <- gm13330$BAC
> profileCGH <- as.profileCGH(gm13330)
>
>
>
> ###########################################################
> ###
> ### glad function as described in Hupé et al. (2004)
> ###
> ###########################################################
>
>
> res <- glad(profileCGH, mediancenter=FALSE,
+ smoothfunc="lawsglad", bandwidth=10, round=1.5,
+ model="Gaussian", lkern="Exponential", qlambda=0.999,
+ base=FALSE,
+ lambdabreak=8, lambdacluster=8, lambdaclusterGen=40,
+ type="tricubic", param=c(d=6),
+ alpha=0.001, msize=5,
+ method="centroid", nmax=8,
+ verbose=FALSE)
[1] "Smoothing for each Chromosome"
[1] "Optimization of the Breakpoints"
[1] "Results Preparation"
>
> ### cytoband data to plot chromosomes
> data(cytoband)
>
> ### Genomic profile on the whole genome
> plotProfile(res, unit=3, Bkp=TRUE, labels=FALSE, Smoothing="Smoothing",
+ main="Breakpoints detection: GLAD analysis", cytoband = cytoband)
>
> ###Genomic profile for chromosome 1
> plotProfile(res, unit=3, Bkp=TRUE, labels=TRUE, Chromosome=1,
+ Smoothing="Smoothing", main="Chromosome 1: GLAD analysis", cytoband = cytoband)
>
> ### The standard-deviation of LogRatio are:
> res$SigmaC
Chromosome Value
1 1 0.07788535
2 2 0.06850393
3 3 0.06506617
4 4 0.07644038
5 5 0.08342531
6 6 0.07988090
7 7 0.08071331
8 8 0.07094549
9 9 0.06955477
10 10 0.06717754
11 11 0.06593160
12 12 0.07461950
13 13 0.06104140
14 14 0.08988050
15 15 0.09300629
16 16 0.06872091
17 17 0.07717453
18 18 0.08011845
19 19 0.09860941
20 20 0.07678643
21 21 0.06994379
22 22 0.05492636
23 23 0.05234015
>
> ### The list of breakpoints is:
> res$BkpInfo
PosOrder PosBase Clone Chromosome
82 91 156276 RP11-234m03 1
429 468 173943 RP11-272O03 4
>
>
>
>
>
>
> dev.off()
null device
1
>