R Graphical Manual

Browse All

Last data update: 2014.03.03

R: Analysis of array CGH data

daglad

R Documentation

Analysis of array CGH data

Description

This function allows the detection of breakpoints in genomic profiles obtained by array CGH technology and affects a status (gain, normal or lost) to each clone.

Usage


## S3 method for class 'profileCGH'
daglad(profileCGH, mediancenter=FALSE,
	normalrefcenter=FALSE, genomestep=FALSE,
	OnlySmoothing = FALSE, OnlyOptimCall = FALSE, 
	smoothfunc="lawsglad", lkern="Exponential",
	model="Gaussian", qlambda=0.999, bandwidth=10, 
	sigma=NULL, base=FALSE, round=2,
	lambdabreak=8, lambdaclusterGen=40, param=c(d=6), 
	alpha=0.001, msize=2, method="centroid", nmin=1, nmax=8, region.size=2,
	amplicon=1, deletion=-5, deltaN=0.10,  forceGL=c(-0.15,0.15), 
	nbsigma=3, MinBkpWeight=0.35, DelBkpInAmp=TRUE, DelBkpInDel=TRUE,
	CheckBkpPos=TRUE, assignGNLOut=TRUE,
	breaksFdrQ = 0.0001, haarStartLevel = 1,
	haarEndLevel = 5, weights.name = NULL,
	verbose=FALSE, ...)

Arguments

`profileCGH`	Object of class `profileCGH`
`mediancenter`	If `TRUE`, LogRatio are center on their median.
`genomestep`	If `TRUE`, a smoothing step over the whole genome is performed and a "clustering throughout the genome" allows to identify a cluster corresponding to the Normal DNA level. The threshold used in the `daglad` function (`deltaN, forceGL, amplicon, deletion`) and then compared to the median of this cluster.
`normalrefcenter`	If `TRUE`, the LogRatio are centered through the median of the cluster identified during the `genomestep`.
`OnlySmoothing`	If `TRUE`, only segmentation is performed without optimization of breakpoints and calling.
`OnlyOptimCall`	If `TRUE`, the user can provide data which have been already segmented. In this case, profileCGH$profileValues must contain a field with the name "Smoothing". The daglad function skip the smoothing step but bith the optimization of breakpoints and calling are performed.
`smoothfunc`	Type of algorithm used to smooth `LogRatio` by a piecewise constant function. Choose either `lawsglad`, `haarseg`, `aws` or `laws` (aws package).
`lkern`	lkern determines the location kernel to be used (see `laws` in aws package for details).
`model`	model determines the distribution type of LogRatio (see `laws` in aws package for details).
`qlambda`	qlambda determines the scale parameter qlambda for the stochastic penalty (see `laws` in aws package for details).
`base`	If TRUE, the position of clone is the physical position onto the chromosome, otherwise the rank position is used.
`sigma`	Value to be passed to either argument `sigma2` of `aws` (see aws package) function or `shape` of `laws` (see aws package). If `NULL`, sigma is calculated from the data.
`bandwidth`	Set the maximal bandwidth `hmax` in the `aws` or `laws` functions in aws package. For example, if `bandwidth=10` then the `hmax` value is set to 10*X_N where X_N is the position of the last clone.
`round`	The smoothing results of either `aws` or `laws` functions (in aws package) are rounded or not depending on the `round` argument. The `round` value is passed to the argument `digits` of the `round` function.
`lambdabreak`	Penalty term (λ') used during the "Optimization of the number of breakpoints" step.
`lambdaclusterGen`	Penalty term (λ*) used during the "clustering throughout the genome" step.
`param`	Parameter of kernel used in the penalty term.
`alpha`	Risk alpha used for the "Outlier detection" step.
`msize`	The outliers MAD are calculated on regions with a cardinality greater or equal to msize.
`method`	The agglomeration method to be used during the "clustering throughout the genome" steps.
`nmin`	Minimum number of clusters (N*max) allowed during the "clustering throughout the genome" clustering step.
`nmax`	Maximum number of clusters (N*max) allowed during the "clustering throughout the genome" clustering step.
`region.size`	The breakpoints which define regions with a number of probes lower or equal to this value are discared.
`amplicon`	Level (and outliers) with a smoothing value (log-ratio value) greater than this threshold are consider as amplicon. Note that first, the data are centered on the normal reference value computed during the "clustering throughout the genome" step.
`deletion`	Level (and outliers) with a smoothing value (log-ratio value) lower than this threshold are consider as deletion. Note that first, the data are centered on the normal reference value computed during the "clustering throughout the genome" step.
`deltaN`	Region with smoothing values in between the interval [-deltaN,+deltaN] are supposed to be normal.
`forceGL`	Level with smoothing value greater (lower) than `rangeGL[1]` (`rangeGL[2]`) are considered as gain (lost). Note that first, the data are centered on the normal reference value computed during the "clustering throughout the genome" step.
`nbsigma`	For each breakpoints, a weight is calculated which is a function of absolute value of the Gap between the smoothing values of the two consecutive regions. Weight = 1- kernelpen(abs(Gap),param=c(d=nbsigma*Sigma)) where Sigma is the standard deviation of the LogRatio.
`MinBkpWeight`	Breakpoints which `GNLchange`==0 and `Weight` less than `MinBkpWeight` are discarded.
`DelBkpInAmp`	If TRUE, the breakpoints identified inside amplicon regions are deleted. For amplicon, the log-ratio values are highly variable which lead to identification of false positive breakpoints.
`DelBkpInDel`	If TRUE, the breakpoints identified inside deletion regions are deleted. For deletion, the log-ratio values are highly variable which lead to identification of false positive breakpoints.
`CheckBkpPos`	If `TRUE`, the accuracy position of each breakpoints is checked.
`assignGNLOut`	If `FALSE` the status (gain/normal/loss) is not assigned for outliers.
`breaksFdrQ`	breaksFdrQ for HaarSeg algorithm.
`haarStartLevel`	haarStartLevel for HaarSeg algorithm.
`haarEndLevel`	haarEndLevel for HaarSeg algorithm.
`weights.name`	The name of the fields which contains the weights used for the haarseg algorithm. By default, the value is set to NULL meaning that all the observations have the same weights. If provided, the field must contain positive values.
`verbose`	If `TRUE` some information are printed.
`...`	...

Details

The function daglad implements a slightly modified version of the methodology described in the article : Analysis of array CGH data: from signal ratio to gain and loss of DNA regions (HupÃ© et al., Bioinformatics, 2004). For smoothing, it is possible to use either the AWS algorithm (Polzehl and Spokoiny, 2002) or the HaarSeg algorithm (Ben-Yaacov and Eldar, Bioinformatics, 2008). The daglad function allows to choose some threshold to help the algorithm to identify the status of the genomic regions. The threshodls are given in the following parameters:

deltaN
forceGL
deletion
amplicon

Value

	An object of class "profileCGH" with the following attributes:
`profileValues`	is a data.frame with the following information: SmoothingThe smoothing values correspond to the median of each Level BreakpointsThe last position of a region with identical amount of DNA is flagged by 1 otherwise it is 0. Note that during the "Optimization of the number of breakpoints" step, removed breakpoints are flagged by -1. LevelEach position with equal smoothing value are labelled the same way with an integer value starting from one. The label is incremented by one when a new level occurs or when moving to the next chromosome. OutliersAwsEach AWS outliers are flagged -1 (if it is in the α/2 lower tail of the distribution) or 1 (if it is in the α/2 upper tail of the distribution) otherwise it is 0. OutliersMadEach MAD outliers are flagged -1 (if it is in the α/2 lower tail of the distribution) or 1 (if it is in the α/2 upper tail of the distribution) otherwise it is 0. OutliersTotOutliersAws + OutliersMad. NormalRefClusters which have been used to set the normal reference during the "clustering throughout the genome" step are code by 0. Note that if `genomestep=FALSE`, all the value are set to 0. ZoneGNLStatus of each clone: Gain is coded by 1, Loss by -1, Amplicon by 2, deletion by -10 and Normal by 0.
`BkpInfo`	is a data.frame sum up the information for each breakpoint: ChromosomeChromosome name. SmoothingSmoothing value for the breakpoint. Gapabsolute value of the gap between the smoothing values of the two consecutive regions. SigmaThe estimation of the standard-deviation of the chromosome. Weight1 - `kernelpen`(Gap, type, param=c(d=nbsigmaSigma)) ZoneGNLStatus of the level where is the breakpoint. GNLchangeTakes the value 1 if the ZoneGNL of the two consecutive regions are different. LogRatio*Test over Reference log-ratio.
`NormalRef`	If `genomestep=TRUE` and `normalrefcenter=FALSE`, then NormalRef is the median of the cluster which has been used to set the normal reference during the "clustering throughout the genome" step. Otherwise NormalRef is 0.

Note

People interested in tools dealing with array CGH analysis can visit our web-page http://bioinfo.curie.fr.

Author(s)

Philippe HupÃ©, glad@curie.fr.

References

Polzehl and Spokoiny (WIAS-Preprint 787, 2002): Local likelihood modelling by adaptive weights smoothing.

Ben-Yaacov and Eldar (Bioinformatics, 2008): A fast and flexible method for the segmentation of aCGH data.

Examples


data(snijders)
gm13330$Clone <- gm13330$BAC
profileCGH <- as.profileCGH(gm13330)


###########################################################
###
###  daglad function
###
###########################################################


res <- daglad(profileCGH, mediancenter=FALSE, normalrefcenter=FALSE, genomestep=FALSE,
              smoothfunc="lawsglad", lkern="Exponential", model="Gaussian",
              qlambda=0.999,  bandwidth=10, base=FALSE, round=1.5,
              lambdabreak=8, lambdaclusterGen=40, param=c(d=6), alpha=0.001, msize=2,
              method="centroid", nmin=1, nmax=8,
              amplicon=1, deletion=-5, deltaN=0.10,  forceGL=c(-0.15,0.15), nbsigma=3,
              MinBkpWeight=0.35, CheckBkpPos=TRUE)


### data for cytoband
data(cytoband)

### Genomic profile on the whole genome
plotProfile(res, unit=3, Bkp=TRUE, labels=FALSE, Smoothing="Smoothing",
main="Breakpoints detection: DAGLAD analysis", cytoband = cytoband)



###Genomic profile for chromosome 1
plotProfile(res, unit=3, Bkp=TRUE, labels=TRUE, Chromosome=1,
Smoothing="Smoothing", main="Chromosome 1: DAGLAD analysis", cytoband = cytoband)


### The standard-deviation of LogRatio are:
res$SigmaC


### The list of breakpoints is:
res$BkpInfo

Results


R version 3.3.1 (2016-06-21) -- "Bug in Your Hair"
Copyright (C) 2016 The R Foundation for Statistical Computing
Platform: x86_64-pc-linux-gnu (64-bit)

R is free software and comes with ABSOLUTELY NO WARRANTY.
You are welcome to redistribute it under certain conditions.
Type 'license()' or 'licence()' for distribution details.

R is a collaborative project with many contributors.
Type 'contributors()' for more information and
'citation()' on how to cite R or R packages in publications.

Type 'demo()' for some demos, 'help()' for on-line help, or
'help.start()' for an HTML browser interface to help.
Type 'q()' to quit R.

> library(GLAD)

######################################################################################

Have fun with GLAD

For smoothing it is possible to use either
the AWS algorithm (Polzehl and Spokoiny, 2002,
or the HaarSeg algorithm (Ben-Yaacov and Eldar, Bioinformatics,  2008,

If you use the package with AWS, please cite:
Hupe et al. (Bioinformatics, 2004, and Polzehl and Spokoiny (2002,

If you use the package with HaarSeg, please cite:
Hupe et al. (Bioinformatics, 2004, and (Ben-Yaacov and Eldar, Bioinformatics, 2008,

For fast computation it is recommanded to use
the daglad function with smoothfunc=haarseg

######################################################################################

New options are available in daglad: see help for details.

> png(filename="/home/ddbj/snapshot/RGM3/R_BC/result/GLAD/daglad.Rd_%03d_medium.png", width=480, height=480)
> ### Name: daglad
> ### Title: Analysis of array CGH data
> ### Aliases: daglad daglad.profileCGH
> ### Keywords: models
> 
> ### ** Examples
> 
> 
> data(snijders)
> gm13330$Clone <- gm13330$BAC
> profileCGH <- as.profileCGH(gm13330)
> 
> 
> ###########################################################
> ###
> ###  daglad function
> ###
> ###########################################################
> 
> 
> res <- daglad(profileCGH, mediancenter=FALSE, normalrefcenter=FALSE, genomestep=FALSE,
+               smoothfunc="lawsglad", lkern="Exponential", model="Gaussian",
+               qlambda=0.999,  bandwidth=10, base=FALSE, round=1.5,
+               lambdabreak=8, lambdaclusterGen=40, param=c(d=6), alpha=0.001, msize=2,
+               method="centroid", nmin=1, nmax=8,
+               amplicon=1, deletion=-5, deltaN=0.10,  forceGL=c(-0.15,0.15), nbsigma=3,
+               MinBkpWeight=0.35, CheckBkpPos=TRUE)
[1] "Smoothing for each Chromosome"
[1] "Optimization of the Breakpoints and DNA copy number calling"
[1] "Check Breakpoints Position"
[1] "Results Preparation"
> 
> 
> ### data for cytoband
> data(cytoband)
> 
> ### Genomic profile on the whole genome
> plotProfile(res, unit=3, Bkp=TRUE, labels=FALSE, Smoothing="Smoothing",
+ main="Breakpoints detection: DAGLAD analysis", cytoband = cytoband)
> 
> 
> 
> ###Genomic profile for chromosome 1
> plotProfile(res, unit=3, Bkp=TRUE, labels=TRUE, Chromosome=1,
+ Smoothing="Smoothing", main="Chromosome 1: DAGLAD analysis", cytoband = cytoband)
> 
> 
> ### The standard-deviation of LogRatio are:
> res$SigmaC
   Chromosome      Value
1           1 0.07788535
2           2 0.06850393
3           3 0.06506617
4           4 0.07644038
5           5 0.08342531
6           6 0.07988090
7           7 0.08071331
8           8 0.07094549
9           9 0.06955477
10         10 0.06717754
11         11 0.06593160
12         12 0.07461950
13         13 0.06104140
14         14 0.08988050
15         15 0.09300629
16         16 0.06872091
17         17 0.07717453
18         18 0.08011845
19         19 0.09860941
20         20 0.07678643
21         21 0.06994379
22         22 0.05492636
23         23 0.05234015
> 
> 
> ### The list of breakpoints is:
> res$BkpInfo
        Clone PosBase Chromosome    Weight GNLchange  Smoothing       Gap
1 RP11-234m03  156276          1 1.0000000         1  0.0221935 0.4976845
2 RP11-272O03  173943          4 1.0000000         1 -0.0714645 0.7681605
3  RP11-23E05   23609         11 0.9537355         1  0.0083130 0.1705440
4 RP11-193O09   33075         11 0.8867582         1 -0.1622310 0.1586670
5 RP11-187A08   39389         11 0.6153643         0 -0.0035640 0.1282735
6  RP11-12d19   46795         11 0.9733512         0 -0.1318375 0.1757320
7   RP11-7H07   83201         11 0.8566989         0  0.0438945 0.1545125
8   RP11-2F21  120000         11 0.9471314         0 -0.1106180 0.1690835
       Sigma  LogRatio ZoneGNL PosOrder
1 0.07788535 -0.095741       0       91
2 0.07644038 -0.205004       0      468
3 0.06593160 -0.043762       0     1342
4 0.06593160 -0.164159      -1     1355
5 0.06593160  0.000318       0     1370
6 0.06593160 -0.056248       0     1378
7 0.06593160 -0.011897       0     1436
8 0.06593160 -0.113474       0     1465
> 
> 
> 
> 
> 
> 
> 
> dev.off()
null device 
          1 
>