Last data update: 2014.03.03

R: WRAPPER FOR THE DMRforPairs ANALSYSIS
DMRforPairsR Documentation

WRAPPER FOR THE DMRforPairs ANALSYSIS

Description

Wrapper for the DMRforPairs analysis. Includes recoding of the probe classes, identification of sufficiently probe-dense regions and analysis (calculation of statistics and testing).

Usage

DMRforPairs(classes_gene,classes_island,targetID, chr, position,m.v,beta.v,min_n=4,min_distance=200,min_dM=1.4,recode=1,sep=";",method="fdr",debug.v=FALSE,gs,do.parallel=0)

Arguments

classes_gene

m x 1 data frame with the relation to gene for each probe. Illumina annotates this as Body, 5'UTR, 3'UTR, 1stExon, TSS1500 or TSS200. When the lumiMethyR function of the lumi package is used to import the data from GenomeStudio's final report, this information can be extracted via the fData(function), e.g. fData(data) $UCSC_REFGENE_GROUP where data is the MethyLumiM object resulting from the import. If these classes are unknown an m x 1 character vector with "unknown.gene" can be supplied and recode should be set to 2.

classes_island

m x 1 data frame with the relation to CpG island. Illumina annotates this as Island, N_Shelf, N_Shore, S_Shelf or S_Shore. Information can be accessed via (fData(data)$RELATION_TO_UCSC_CPG_ISLAND. If these classes are unknown an m x 1 character vector with "unknown.island" can be supplied and recode should be set to 2.

targetID

m x 1 data frame containing the identifier of each of the m probes, i.e. cgxxxxxxxx. Information can be accessed via (fData(data)$TargetID

chr

m x 1 data frame containing the chromosome each probe is annotated to. Information can be accessed via (fData(data)$CHR

position

m x 1 data frame containing the genomic position each probe is annotated to. Information can be accessed via (fData(data)$MAPINFO

m.v

m x n matrix or data frame containing the M-values for each probe for each sample. These can be directly extracted from the MethyLumiM using the exprs(estimateM()) functions form the lumi package. Alternatively any m x n matrix can be used (for example output from an external normalization algorithm that does not use the MethyLumiM format or methylation values from another platform.

beta.v

See m.v. Use and exprs(estimateBeta(data)) to extract beta values from the MethyLumiM object.

min_n

Minimal number of probes required to consider a group of subsequent probes for inclusion as a region (=potential DMR). Default (minimum) is 4 as the Mann Whitney U test requires a minimum of 7 samples to ever reach a value < 0.05 (2x4=8).

min_distance

Maximal distance between adjacent probes to accept when including them in one region. The default value of 200bp is based on the findings in Marimba et al. 2013 en Eckhardt et al 2006 regarding rapid drop of co-methylation of adjacent probes when these are further apart.

min_dM

Minimal median difference between M-values in a region to consider the region for formal testing. Default value of 1.4 is based on the findings in Du et al 2010.

recode

Recoding scheme to group or discard probes annotated to certain functional genomic regions (see also classes_gene and classes_island parameters and the merge_classes function). (default=1)

sep

Separator used in the second column of the recode parameter. Use ";" or do not specify if using the built-in schemes. (default=";")

method

Method to use for correction for multiple testing. See p.adjust() function in R for possible settings. Default is 'fdr' implicating correction according to Benjamini-Hochberg.

debug.v

For development. If TRUE, only the first chromosome is analyzed. (default=FALSE)

gs

m x 1 data frame with associated gene symbols. From a MethyLumiM object this information can be extracted via the fData(function), e.g. fData(data) $UCSC_REFGENE_NAME

do.parallel

Use parallel processes to compute statistics per region. 0=no parallelization, -1=use all available cores, n>1 use n cores (default=0)

Details

This wrapper subsequently:

  1. Recodes probe classes according to a custom or build in scheme. See recode parameter and merge_classes.

  2. Identifies regions with sufficient probe density (i.e. number of probes and proximity) over all genomic regions at which probes are annotated in the dataset. See regionfinder

  3. Calculates relevant statistics (e.g. median (difference in) M and beta values). If the median difference is sufficiently large (>=min_dM), the function performs formal testing of the difference. If indicated, pairwise testing is performed as well. See testregion and calc_stats.

Export & visualization are not directly included in the wrapper, as these require an internet connection and the analysis itself might take place on a compute server that is not connected to the internet. See export_data for more information about this. This wrapper does not store the results, but returns a list of data structures to be processed further or saved by the user.

Value

Returns a list of objects resulting from the different steps of the DMRforPairs algorithm. Please see the description of the associated functions for more information.

$classes

includes the results from merge_classes

$regions

includes the results from regionfinder

$tested

includes the results from testregion

Author(s)

Martin Rijlaarsdam, m.a.rijlaarsdam@gmail.com

References

  • Marabita, F., et al., An evaluation of analysis pipelines for DNA methylation profiling using the Illumina HumanMethylation450 BeadChip platform. Epigenetics, 2013. 8(3): p. 333-46.

  • Eckhardt, F., et al., DNA methylation profiling of human chromosomes 6, 20 and 22. Nat Genet, 2006. 38(12): p. 1378-85.

  • Du, P., et al., Comparison of Beta-value and M-value methods for quantifying methylation levels by microarray analysis. BMC Bioinformatics, 2010. 11: p. 587.

See Also

merge_classes, regionfinder, testregion, calc_stats.

Examples

#load vignette data & select a region of 4 MB on chr 7 for speed
data(DMRforPairs_data)
CL.methy=CL.methy[which(CL.methy$position<=1.07E+8 & 
                        CL.methy$position>=1.06E+8),]

output=DMRforPairs(classes_gene=CL.methy$class.gene.related, 
                classes_island=CL.methy$class.island.related, 
                targetID=CL.methy$targetID, 
                chr=CL.methy$chromosome, 
                position=CL.methy$position, 
                m.v=CL.methy[,c(7:8)], 
                beta.v=CL.methy[,c(11:12)],gs=CL.methy$gene.symbol,
				do.parallel=0)

#primary output of merge_classes()
head(output$classes$pclass) #orginal probe classes
head(output$classes$pclass_recoded) #recoded probe classes
head(output$classes$no.pclass) #row numbers of probes without a recoded class
head(output$classes$u_pclass) #classes used for recoding

#primary output of regionfinder()
#output listing the potential regions of interest identified
head(output$regions$boundaries)
#probes with associated class after recoding (=valid)
head(output$regions$valid.probes) 
#... and associated m values for all samples
head(output$regions$valid.m) 
#... and associated beta values for al samples
head(output$regions$valid.beta) 
#matrix of valid probes (rows) and recoded probe classes (columns) with 
#either NA if not included in any potential region of interest or the 
#ID of the region the probe is assigned to.
head(output$regions$perprobe,10) 

#primary output of testregion() and calc_stats()
#these results are similar to output$regions$boundaries 
#but are supplemented with descriptive statistics 
#and formal test results per region.
head(output$tested)

Results


R version 3.3.1 (2016-06-21) -- "Bug in Your Hair"
Copyright (C) 2016 The R Foundation for Statistical Computing
Platform: x86_64-pc-linux-gnu (64-bit)

R is free software and comes with ABSOLUTELY NO WARRANTY.
You are welcome to redistribute it under certain conditions.
Type 'license()' or 'licence()' for distribution details.

R is a collaborative project with many contributors.
Type 'contributors()' for more information and
'citation()' on how to cite R or R packages in publications.

Type 'demo()' for some demos, 'help()' for on-line help, or
'help.start()' for an HTML browser interface to help.
Type 'q()' to quit R.

> library(DMRforPairs)
Loading required package: Gviz
Loading required package: S4Vectors
Loading required package: stats4
Loading required package: BiocGenerics
Loading required package: parallel

Attaching package: 'BiocGenerics'

The following objects are masked from 'package:parallel':

    clusterApply, clusterApplyLB, clusterCall, clusterEvalQ,
    clusterExport, clusterMap, parApply, parCapply, parLapply,
    parLapplyLB, parRapply, parSapply, parSapplyLB

The following objects are masked from 'package:stats':

    IQR, mad, xtabs

The following objects are masked from 'package:base':

    Filter, Find, Map, Position, Reduce, anyDuplicated, append,
    as.data.frame, cbind, colnames, do.call, duplicated, eval, evalq,
    get, grep, grepl, intersect, is.unsorted, lapply, lengths, mapply,
    match, mget, order, paste, pmax, pmax.int, pmin, pmin.int, rank,
    rbind, rownames, sapply, setdiff, sort, table, tapply, union,
    unique, unsplit


Attaching package: 'S4Vectors'

The following objects are masked from 'package:base':

    colMeans, colSums, expand.grid, rowMeans, rowSums

Loading required package: IRanges
Loading required package: GenomicRanges
Loading required package: GenomeInfoDb
Loading required package: grid
Loading required package: R2HTML
> png(filename="/home/ddbj/snapshot/RGM3/R_BC/result/DMRforPairs/DMRforPairs.Rd_%03d_medium.png", width=480, height=480)
> ### Name: DMRforPairs
> ### Title: WRAPPER FOR THE DMRforPairs ANALSYSIS
> ### Aliases: DMRforPairs
> ### Keywords: analysis
> 
> ### ** Examples
> 
> #load vignette data & select a region of 4 MB on chr 7 for speed
> data(DMRforPairs_data)
> CL.methy=CL.methy[which(CL.methy$position<=1.07E+8 & 
+                         CL.methy$position>=1.06E+8),]
> 
> output=DMRforPairs(classes_gene=CL.methy$class.gene.related, 
+                 classes_island=CL.methy$class.island.related, 
+                 targetID=CL.methy$targetID, 
+                 chr=CL.methy$chromosome, 
+                 position=CL.methy$position, 
+                 m.v=CL.methy[,c(7:8)], 
+                 beta.v=CL.methy[,c(11:12)],gs=CL.methy$gene.symbol,
+ 				do.parallel=0)
Recoding annotation classes...
Body --> gene
5'UTR --> gene
3'UTR --> gene
1stExon --> gene
TSS1500 --> tss
TSS200 --> tss
Island --> island
N_Shelf --> island
N_Shore --> island
S_Shelf --> island
S_Shore --> island
Regionfinder: processing chr7 (1/1)
Calculating statistics and testing differences between samples per region. Please be patient.
> 
> #primary output of merge_classes()
> head(output$classes$pclass) #orginal probe classes
           [,1]                   
cg00139681 ";"                    
cg00493363 ";"                    
cg00502268 ";"                    
cg00510718 "5'UTR;1stExon;N_Shelf"
cg00530438 ";"                    
cg00604356 "5'UTR;N_Shore"        
> head(output$classes$pclass_recoded) #recoded probe classes
           [,1]            
cg00139681 "NA;NA;NA"      
cg00493363 "NA;NA;NA"      
cg00502268 "NA;NA;NA"      
cg00510718 "gene;NA;island"
cg00530438 "NA;NA;NA"      
cg00604356 "gene;NA;island"
> head(output$classes$no.pclass) #row numbers of probes without a recoded class
[1]  1  2  3  5 12 13
> head(output$classes$u_pclass) #classes used for recoding
[1] gene   tss    island
Levels: gene island tss
> 
> #primary output of regionfinder()
> #output listing the potential regions of interest identified
> head(output$regions$boundaries)
  chr  start_bp    end_bp length_bp n_probes regionID regionIDall   ClassAll
1   7 106505688 106505914       227        4        1           1        tss
2   7 106685015 106685033        19        4        2         2;6 tss;island
3   7 106808956 106809317       362        4        3         3;7 tss;island
4   7 106301534 106301767       234        5        4           4     island
5   7 106505688 106505961       274        5        5           5     island
> #probes with associated class after recoding (=valid)
> head(output$regions$valid.probes) 
           rowID    probeID chr  position         pClass
cg00510718     1 cg00510718   7 106505961 gene;NA;island
cg00604356     2 cg00604356   7 106507474 gene;NA;island
cg00641009     3 cg00641009   7 106685553 gene;NA;island
cg00661777     4 cg00661777   7 106511741 gene;NA;island
cg00722445     5 cg00722445   7 106685033  NA;tss;island
cg00888463     6 cg00888463   7 106686041 gene;NA;island
> #... and associated m values for all samples
> head(output$regions$valid.m) 
               A431.M     MCF7.M
cg00510718  4.0881599  2.1574129
cg00604356 -5.8483430  0.9878616
cg00641009 -3.2539855 -4.1970080
cg00661777  0.4914001  4.3152130
cg00722445 -7.8694567 -8.4304158
cg00888463  0.5938111 -2.3134735
> #... and associated beta values for al samples
> head(output$regions$valid.beta) 
             A431.beta   MCF7.beta
cg00510718 0.942722713 0.810021256
cg00604356 0.029456622 0.664791992
cg00641009 0.094881967 0.051866652
cg00661777 0.584313207 0.951622201
cg00722445 0.004425962 0.002932602
cg00888463 0.601380445 0.167484255
> #matrix of valid probes (rows) and recoded probe classes (columns) with 
> #either NA if not included in any potential region of interest or the 
> #ID of the region the probe is assigned to.
> head(output$regions$perprobe,10) 
           gene tss island
cg00510718   NA  NA      5
cg00604356   NA  NA     NA
cg00641009   NA  NA     NA
cg00661777   NA  NA     NA
cg00722445   NA   2      6
cg00888463   NA  NA     NA
cg01353569   NA  NA     NA
cg02392881   NA  NA     NA
cg02696742   NA  NA     NA
cg02873168   NA   3      7
> 
> #primary output of testregion() and calc_stats()
> #these results are similar to output$regions$boundaries 
> #but are supplemented with descriptive statistics 
> #and formal test results per region.
> head(output$tested)
  chr  start_bp    end_bp length_bp n_probes regionID regionIDall   ClassAll
1   7 106505688 106505914       227        4        1           1        tss
2   7 106685015 106685033        19        4        2         2;6 tss;island
3   7 106808956 106809317       362        4        3         3;7 tss;island
4   7 106301534 106301767       234        5        4           4     island
5   7 106505688 106505961       274        5        5           5     island
  beta.median.A431.beta beta.median.MCF7.beta m.median.A431.M m.median.MCF7.M
1           0.915774546           0.969722268        3.497561        5.020344
2           0.004506115           0.002804754       -7.832774       -8.521158
3           0.004657216           0.016171985       -7.860358       -5.973694
4           0.014250283           0.029839109       -6.125118       -5.028141
5           0.937348260           0.965548428        3.912801        4.811377
  median.delta.beta.A431.M.minus.MCF7.M median.delta.m.A431.M.minus.MCF7.M
1                          -0.053947722                         -1.5227827
2                           0.001044902                          0.4507334
3                          -0.012278616                         -1.8866638
4                          -0.015588826                         -1.0969761
5                          -0.028200168                         -0.8985752
  pairwise.p.A431.M.vs.MCF7.M max.abs.median.delta   p.value p.value.adjusted
1                          NA            1.5227827 0.4857143        0.4857143
2                          NA            0.4507334        NA               NA
3                          NA            1.8866638 0.2000000        0.4000000
4                          NA            1.0969761        NA               NA
5                          NA            0.8985752        NA               NA
> 
> 
> 
> 
> 
> dev.off()
null device 
          1 
>