R Graphical Manual

Browse All

Last data update: 2014.03.03

R: Analysis of array CGH data

glad	R Documentation

Analysis of array CGH data

Description

This function allows the detection of breakpoints in genomic profiles obtained by array CGH technology and affects a status (gain, normal or lost) to each clone.

Usage

## S3 method for class 'profileCGH'
glad(profileCGH, mediancenter=FALSE,
                smoothfunc="lawsglad", bandwidth=10, round=1.5,
                model="Gaussian", lkern="Exponential", qlambda=0.999,
                base=FALSE, sigma,
                lambdabreak=8, lambdacluster=8, lambdaclusterGen=40,
                type="tricubic", param=c(d=6),
                alpha=0.001, msize=5,
                method="centroid", nmax=8, assignGNLOut=TRUE,
                breaksFdrQ = 0.0001, haarStartLevel = 1, haarEndLevel = 5,
                verbose=FALSE, ...)

Arguments

`profileCGH`	Object of class `profileCGH`
`mediancenter`	If `TRUE`, LogRatio are centered on their median.
`smoothfunc`	Type of algorithm used to smooth `LogRatio` by a piecewise constant function. Choose either `lawsglad`, `haarseg`, `aws` or `laws` in aws package.
`bandwidth`	Set the maximal bandwidth `hmax` in the `aws` or `laws` functions in aws package. For example, if `bandwidth=10` then the `hmax` value is set to 10*X_N where X_N is the position of the last clone.
`round`	The smoothing results are rounded or not depending on the `round` argument. The `round` value is passed to the argument `digits` of the `round` function.
`model`	Determines the distribution type of the LogRatio. Keep always the model as "Gaussian" (see `laws` in aws package).
`lkern`	Determines the location kernel to be used (see `aws` or `laws` in aws package).
`qlambda`	Determines the scale parameter for the stochastic penalty (see `aws` or `laws` in aws package)
`base`	If `TRUE`, the position of clone is the physical position on the chromosome, otherwise the rank position is used.
`sigma`	Value to be passed to either argument `sigma2` of`aws` function or `shape` of `laws` (see aws package). If `NULL`, sigma is calculated from the data.
`lambdabreak`	Penalty term (λ') used during the Optimization of the number of breakpoints step.
`lambdacluster`	Penalty term (λ) used during the MSHR clustering by chromosome* step.
`lambdaclusterGen`	Penalty term (λ) used during the HCSR clustering throughout the genome* step.
`type`	Type of kernel function used in the penalty term during the Optimization of the number of breakpoints step, the MSHR clustering by chromosome step and the HCSR clustering throughout the genome step.
`param`	Parameter of kernel used in the penalty term.
`alpha`	Risk alpha used for the Outlier detection step.
`msize`	The outliers MAD are calculated on regions with a cardinality greater or equal to msize.
`method`	The agglomeration method to be used during the MSHR clustering by chromosome and the HCSR clustering throughout the genome clustering steps.
`nmax`	Maximum number of clusters (Nmax) allowed during the the MSHR clustering by chromosome* and the HCSR clustering throughout the genome clustering steps.
`assignGNLOut`	If `FALSE` the status (gain/normal/loss) is not assigned for outliers.
`breaksFdrQ`	breaksFdrQ for HaarSeg algorithm.
`haarStartLevel`	haarStartLevel for HaarSeg algorithm.
`haarEndLevel`	for HaarSeg algorithm.
`verbose`	If `TRUE` some information are printed
`...`	...

Details

The function glad implements the methodology which is described in the article: Analysis of array CGH data: from signal ratio to gain and loss of DNA regions (HupÃ© et al., Bioinformatics, 2004).

The principles of the GLAD algorithm: First, the detection of breakpoints is based on the estimation of a piecewise constant function with the Adaptive Weights Smoothing (AWS) procedure (Polzehl and Spokoiny, 2002). Alternatively, it is possible to use the HaarSeg algorithm (Ben-Yaacov and Eldar, Bioinformatics, 2008). Then, a procedure based on penalyzed maximum likelihood optimizes the number of breakpoints and removes the undesirable breakpoints. Finally, based on the regions previously identified, a two-step unsupervised classification (MSHR clustering by chromosome and the HCSR clustering throughout the genome) with model selection criteria allows a status to be assigned for each region (gain, loss or normal).

Main parameters to be tuned:

`qlambda`	if you want the smoothing to fit some very local effect, choose a smaller `qlambda`.
`bandwidth`	choose a bandwidth not to small otherwise you will have a lot of little discontinuities.
`lambdabreak`	The higher the parameter is, the higher the number of undesirable breakpoints is.
`lambdacluster`	The higher the parameter is, the higher is the number of the regions within a chromosome which belong to the same cluster.
`lambdaclusterGen`	More the parameter is high more the regions over the whole genome are supposed to belong to the same cluster.

Value

	An object of class "profileCGH" with the following attributes:
`profileValues:`	a data.frame with the following added information: SmoothingThe smoothing values correspond to the median of each MSHR (i.e. `Region`). BreakpointsThe last position of a region with identical amount of DNA is flagged by 1 otherwise it is 0. Note that during the "Optimization of the number of breakpoints" step, removed breakpoints are flagged by -1. RegionEach position between two breakpoints are labelled the same way with an integer value starting from one. The label is incremented by one when a new breakpoint is found or when moving to the next chromosome. The variable `region` is what we call MSHR. LevelEach position with equal smoothing value is labelled the same way with an integer value starting from one. The label is incremented by one when a new level is found or when moving to the next chromosome. OutliersAwsEach AWS outliers are flagged -1 or 1 otherwise it is 0. OutliersMadEach MAD outliers are flagged -1 (if it is in the α/2 lower tail of the distribution) or 1 (if it is in the α/2 upper tail of the distribution) otherwise it is 0. OutliersTotOutliersAws + OutliersMad. ZoneChrClusters identified after MSHR (i.e. `Region`) clustering by chromosome. ZoneGenClusters identified after HCSR clustering throughout the genome. ZoneGNLStatus of each clone : Gain is coded by 1, Loss by -1 and Normal by 0.
`BkpInfo:`	the data.frame attribute `BkpInfo` which gives the list of breakpoints: PosOrderThe rank position of each clone on the genome. PosBaseThe base position of each clone on the genome. ChromosomeChromosome name.
`SigmaC:`	the data.frame attribute `SigmaC` gives the estimation of the LogRatio standard-deviation for each chromosome: ChromosomeChromosome name. ValueThe estimation is based on the Inter Quartile Range.

Note

People interested in tools dealing with array CGH analysis can visit our web-page http://bioinfo.curie.fr.

Author(s)

References

Polzehl and Spokoiny (WIAS-Preprint 787, 2002)Local likelihood modelling by adaptive weights smoothing.
Ben-Yaacov and Eldar (Bioinformatics, 2008)A fast and flexible method for the segmentation of aCGH data.

Examples


data(snijders)

### Creation of "profileCGH" object
gm13330$Clone <- gm13330$BAC
profileCGH <- as.profileCGH(gm13330)



###########################################################
###
###  glad function as described in HupÃ© et al. (2004)
###
###########################################################


res <- glad(profileCGH, mediancenter=FALSE,
                smoothfunc="lawsglad", bandwidth=10, round=1.5,
                model="Gaussian", lkern="Exponential", qlambda=0.999,
                base=FALSE,
                lambdabreak=8, lambdacluster=8, lambdaclusterGen=40,
                type="tricubic", param=c(d=6),
                alpha=0.001, msize=5,
                method="centroid", nmax=8,
                verbose=FALSE)

### cytoband data to plot chromosomes
data(cytoband)

### Genomic profile on the whole genome
plotProfile(res, unit=3, Bkp=TRUE, labels=FALSE, Smoothing="Smoothing",
main="Breakpoints detection: GLAD analysis", cytoband = cytoband)

###Genomic profile for chromosome 1
plotProfile(res, unit=3, Bkp=TRUE, labels=TRUE, Chromosome=1,
Smoothing="Smoothing", main="Chromosome 1: GLAD analysis", cytoband = cytoband)

### The standard-deviation of LogRatio are:
res$SigmaC

### The list of breakpoints is:
res$BkpInfo

Results


R version 3.3.1 (2016-06-21) -- "Bug in Your Hair"
Copyright (C) 2016 The R Foundation for Statistical Computing
Platform: x86_64-pc-linux-gnu (64-bit)

R is free software and comes with ABSOLUTELY NO WARRANTY.
You are welcome to redistribute it under certain conditions.
Type 'license()' or 'licence()' for distribution details.

R is a collaborative project with many contributors.
Type 'contributors()' for more information and
'citation()' on how to cite R or R packages in publications.

Type 'demo()' for some demos, 'help()' for on-line help, or
'help.start()' for an HTML browser interface to help.
Type 'q()' to quit R.

> library(GLAD)

######################################################################################

Have fun with GLAD

For smoothing it is possible to use either
the AWS algorithm (Polzehl and Spokoiny, 2002,
or the HaarSeg algorithm (Ben-Yaacov and Eldar, Bioinformatics,  2008,

If you use the package with AWS, please cite:
Hupe et al. (Bioinformatics, 2004, and Polzehl and Spokoiny (2002,

If you use the package with HaarSeg, please cite:
Hupe et al. (Bioinformatics, 2004, and (Ben-Yaacov and Eldar, Bioinformatics, 2008,

For fast computation it is recommanded to use
the daglad function with smoothfunc=haarseg

######################################################################################

New options are available in daglad: see help for details.

> png(filename="/home/ddbj/snapshot/RGM3/R_BC/result/GLAD/glad.Rd_%03d_medium.png", width=480, height=480)
> ### Encoding: UTF-8
> 
> ### Name: glad
> ### Title: Analysis of array CGH data
> ### Aliases: glad glad.profileCGH
> ### Keywords: models
> 
> ### ** Examples
> 
> 
> data(snijders)
> 
> ### Creation of "profileCGH" object
> gm13330$Clone <- gm13330$BAC
> profileCGH <- as.profileCGH(gm13330)
> 
> 
> 
> ###########################################################
> ###
> ###  glad function as described in Hupé et al. (2004)
> ###
> ###########################################################
> 
> 
> res <- glad(profileCGH, mediancenter=FALSE,
+                 smoothfunc="lawsglad", bandwidth=10, round=1.5,
+                 model="Gaussian", lkern="Exponential", qlambda=0.999,
+                 base=FALSE,
+                 lambdabreak=8, lambdacluster=8, lambdaclusterGen=40,
+                 type="tricubic", param=c(d=6),
+                 alpha=0.001, msize=5,
+                 method="centroid", nmax=8,
+                 verbose=FALSE)
[1] "Smoothing for each Chromosome"
[1] "Optimization of the Breakpoints"
[1] "Results Preparation"
> 
> ### cytoband data to plot chromosomes
> data(cytoband)
> 
> ### Genomic profile on the whole genome
> plotProfile(res, unit=3, Bkp=TRUE, labels=FALSE, Smoothing="Smoothing",
+ main="Breakpoints detection: GLAD analysis", cytoband = cytoband)
> 
> ###Genomic profile for chromosome 1
> plotProfile(res, unit=3, Bkp=TRUE, labels=TRUE, Chromosome=1,
+ Smoothing="Smoothing", main="Chromosome 1: GLAD analysis", cytoband = cytoband)
> 
> ### The standard-deviation of LogRatio are:
> res$SigmaC
   Chromosome      Value
1           1 0.07788535
2           2 0.06850393
3           3 0.06506617
4           4 0.07644038
5           5 0.08342531
6           6 0.07988090
7           7 0.08071331
8           8 0.07094549
9           9 0.06955477
10         10 0.06717754
11         11 0.06593160
12         12 0.07461950
13         13 0.06104140
14         14 0.08988050
15         15 0.09300629
16         16 0.06872091
17         17 0.07717453
18         18 0.08011845
19         19 0.09860941
20         20 0.07678643
21         21 0.06994379
22         22 0.05492636
23         23 0.05234015
> 
> ### The list of breakpoints is:
> res$BkpInfo
    PosOrder PosBase       Clone Chromosome
82        91  156276 RP11-234m03          1
429      468  173943 RP11-272O03          4
> 
> 
> 
> 
> 
> 
> dev.off()
null device 
          1 
>