Last data update: 2014.03.03

R: Calculate FDR q-values for differentially methylated regions...
dmrFdrR Documentation

Calculate FDR q-values for differentially methylated regions (DMRs)

Description

Estimate false discovery rate q-values for a set of differentially methylated regions (found using the dmrFinder function) using a permutation approach. For differentially methylated regions found using the dmrFind function, use the qval function instead.

Usage

dmrFdr(dmr, compare = 1, numPerms = 1000, seed = NULL, verbose = TRUE)

Arguments

dmr

a dmr object as returned by dmrFinder

compare

The dmr table for which to calculate DMRs. See details.

numPerms

Number of permutations

seed

Random seed (for reproducibility)

verbose

Boolean

Details

This function estimates false discovery rate q-values for a dmr object returned by dmrFinder. dmrFinder can return a set of DMR tables with one or more pair-wise comparisons between groups. dmrFdr currently only calculated q-values for one of these at a time. The dmr table to use (if the dmr object contains more than one) is specified by the compare option.

Value

a list object in the same format as the input, but with extra p-val and q-val columns for the tabs element.

Author(s)

Martin Aryee <aryee@jhu.edu>

See Also

qval, dmrFinder, dmrPlot, regionPlot

Examples

	if (require(charmData) & require(BSgenome.Hsapiens.UCSC.hg18)) {
		phenodataDir <- system.file("extdata", package="charmData")
		pd <- read.delim(file.path(phenodataDir, "phenodata.txt"))
		pd <- subset(pd, tissue %in% c("liver", "colon"))
		# Validate format of sample description file		
		res <- validatePd(pd)
		dataDir <- system.file("data", package="charmData")
		setwd(dataDir)
		# Read in raw data
		rawData <- readCharm(files=pd$filename, sampleKey=pd)
		# Find non-CpG control probes
		ctrlIdx <- getControlIndex(rawData, subject=Hsapiens)
		# Estimate methylation
		p <- methp(rawData, controlIndex=ctrlIdx)
		# Find differentially methylated regions
		grp <- pData(rawData)$tissue
		dmr <- dmrFinder(rawData, p=p, groups=grp, 
                        removeIf=expression(nprobes<4 | abs(diff)<.05 | abs(maxdiff)<.05),
			compare=c("liver", "colon"), cutoff=0.95)
		head(dmr$tabs[[1]])
		# Estimate false discovery rate for DMRs
		dmr <- dmrFdr(dmr, numPerms=3, seed=123) 
		head(dmr$tabs[[1]])

                ##Not run:
                ## Plot top 10 DMRs:
                #cpg.cur = read.delim("http://rafalab.jhsph.edu/CGI/model-based-cpg-islands-hg18.txt", as.is=TRUE)
                #dmrPlot(dmr=dmr, which.table=1, which.plot=1:5, legend.size=1, all.lines=TRUE, all.points=TRUE, colors.l=c("blue","black"), colors.p=c("blue","black"), outpath=".", cpg.islands=cpg.cur, Genome=Hsapiens)

                ## plot any given genomic regions using this data, supplying the regions in a data frame that must have columns with names "chr", "start", and "end": 
                #mytab = data.frame(chr=as.character(c(dmr$tabs[[1]]$chr[1],"chrY",dmr$tabs[[1]]$chr[-1])), start=as.numeric(c(dmr$tabs[[1]]$start[1],1,dmr$tabs[[1]]$start[-1])), end=as.numeric(c(dmr$tabs[[1]]$end[1],100,dmr$tabs[[1]]$end[-1])), stringsAsFactors=FALSE)[1:5,]
                #regionPlot(tab=mytab, dmr=dmr, cpg.islands=cpg.cur, Genome=Hsapiens, outfile="./myregions.pdf", which.plot=1:5, plot.these=c("liver","colon"), cl=c("blue","black"), legend.size=1, buffer=3000)
                ## note that region 2 is not plotted since it is not on the array.

                ## Example of paired analysis:
                pData(rawData)$pair = c(1,2,1,2) ## fake pairing information for this example.
                dmr2 <- dmrFinder(rawData, p=p, groups=grp, 
                                  compare=c("colon", "liver"),
                                  removeIf=expression(nprobes<4 | abs(diff)<.05 | abs(maxdiff)<.05),
                                  paired=TRUE, pairs=pData(rawData)$pair, cutoff=0.95)

                #dmrPlot(dmr=dmr2, which.table=1, which.plot=c(3), legend.size=1, all.lines=TRUE, all.points=TRUE, colors.l=c("black"), colors.p=c("black"), outpath=".", cpg.islands=cpg.cur, Genome=Hsapiens)
                #regionPlot(tab=mytab, dmr=dmr2, cpg.islands=cpg.cur, Genome=Hsapiens, outfile="myregions.pdf", which.plot=1:5, plot.these=c("colon-liver"), cl=c("black"), legend.size=1, buffer=3000)
	}

Results


R version 3.3.1 (2016-06-21) -- "Bug in Your Hair"
Copyright (C) 2016 The R Foundation for Statistical Computing
Platform: x86_64-pc-linux-gnu (64-bit)

R is free software and comes with ABSOLUTELY NO WARRANTY.
You are welcome to redistribute it under certain conditions.
Type 'license()' or 'licence()' for distribution details.

R is a collaborative project with many contributors.
Type 'contributors()' for more information and
'citation()' on how to cite R or R packages in publications.

Type 'demo()' for some demos, 'help()' for on-line help, or
'help.start()' for an HTML browser interface to help.
Type 'q()' to quit R.

> library(charm)
Loading required package: Biobase
Loading required package: BiocGenerics
Loading required package: parallel

Attaching package: 'BiocGenerics'

The following objects are masked from 'package:parallel':

    clusterApply, clusterApplyLB, clusterCall, clusterEvalQ,
    clusterExport, clusterMap, parApply, parCapply, parLapply,
    parLapplyLB, parRapply, parSapply, parSapplyLB

The following objects are masked from 'package:stats':

    IQR, mad, xtabs

The following objects are masked from 'package:base':

    Filter, Find, Map, Position, Reduce, anyDuplicated, append,
    as.data.frame, cbind, colnames, do.call, duplicated, eval, evalq,
    get, grep, grepl, intersect, is.unsorted, lapply, lengths, mapply,
    match, mget, order, paste, pmax, pmax.int, pmin, pmin.int, rank,
    rbind, rownames, sapply, setdiff, sort, table, tapply, union,
    unique, unsplit

Welcome to Bioconductor

    Vignettes contain introductory material; view with
    'browseVignettes()'. To cite Bioconductor, see
    'citation("Biobase")', and for packages 'citation("pkgname")'.

Loading required package: SQN
Loading required package: mclust
Package 'mclust' version 5.2
Type 'citation("mclust")' for citing this R package in publications.
Loading required package: nor1mix
Loading required package: fields
Loading required package: spam
Loading required package: grid
Spam version 1.3-0 (2015-10-24) is loaded.
Type 'help( Spam)' or 'demo( spam)' for a short introduction 
and overview of this package.
Help for individual functions is also obtained by adding the
suffix '.spam' to the function name, e.g. 'help( chol.spam)'.

Attaching package: 'spam'

The following objects are masked from 'package:base':

    backsolve, forwardsolve

Loading required package: maps

 # maps v3.1: updated 'world': all lakes moved to separate new #
 # 'lakes' database. Type '?world' or 'news(package="maps")'.  #



Attaching package: 'maps'

The following object is masked from 'package:mclust':

    map

Loading required package: RColorBrewer
Loading required package: genefilter
Welcome to charm version 2.18.0
> png(filename="/home/ddbj/snapshot/RGM3/R_BC/result/charm/dmrFdr.Rd_%03d_medium.png", width=480, height=480)
> ### Name: dmrFdr
> ### Title: Calculate FDR q-values for differentially methylated regions
> ###   (DMRs)
> ### Aliases: dmrFdr
> 
> ### ** Examples
> 
> 	if (require(charmData) & require(BSgenome.Hsapiens.UCSC.hg18)) {
+ 		phenodataDir <- system.file("extdata", package="charmData")
+ 		pd <- read.delim(file.path(phenodataDir, "phenodata.txt"))
+ 		pd <- subset(pd, tissue %in% c("liver", "colon"))
+ 		# Validate format of sample description file		
+ 		res <- validatePd(pd)
+ 		dataDir <- system.file("data", package="charmData")
+ 		setwd(dataDir)
+ 		# Read in raw data
+ 		rawData <- readCharm(files=pd$filename, sampleKey=pd)
+ 		# Find non-CpG control probes
+ 		ctrlIdx <- getControlIndex(rawData, subject=Hsapiens)
+ 		# Estimate methylation
+ 		p <- methp(rawData, controlIndex=ctrlIdx)
+ 		# Find differentially methylated regions
+ 		grp <- pData(rawData)$tissue
+ 		dmr <- dmrFinder(rawData, p=p, groups=grp, 
+                         removeIf=expression(nprobes<4 | abs(diff)<.05 | abs(maxdiff)<.05),
+ 			compare=c("liver", "colon"), cutoff=0.95)
+ 		head(dmr$tabs[[1]])
+ 		# Estimate false discovery rate for DMRs
+ 		dmr <- dmrFdr(dmr, numPerms=3, seed=123) 
+ 		head(dmr$tabs[[1]])
+ 
+                 ##Not run:
+                 ## Plot top 10 DMRs:
+                 #cpg.cur = read.delim("http://rafalab.jhsph.edu/CGI/model-based-cpg-islands-hg18.txt", as.is=TRUE)
+                 #dmrPlot(dmr=dmr, which.table=1, which.plot=1:5, legend.size=1, all.lines=TRUE, all.points=TRUE, colors.l=c("blue","black"), colors.p=c("blue","black"), outpath=".", cpg.islands=cpg.cur, Genome=Hsapiens)
+ 
+                 ## plot any given genomic regions using this data, supplying the regions in a data frame that must have columns with names "chr", "start", and "end": 
+                 #mytab = data.frame(chr=as.character(c(dmr$tabs[[1]]$chr[1],"chrY",dmr$tabs[[1]]$chr[-1])), start=as.numeric(c(dmr$tabs[[1]]$start[1],1,dmr$tabs[[1]]$start[-1])), end=as.numeric(c(dmr$tabs[[1]]$end[1],100,dmr$tabs[[1]]$end[-1])), stringsAsFactors=FALSE)[1:5,]
+                 #regionPlot(tab=mytab, dmr=dmr, cpg.islands=cpg.cur, Genome=Hsapiens, outfile="./myregions.pdf", which.plot=1:5, plot.these=c("liver","colon"), cl=c("blue","black"), legend.size=1, buffer=3000)
+                 ## note that region 2 is not plotted since it is not on the array.
+ 
+                 ## Example of paired analysis:
+                 pData(rawData)$pair = c(1,2,1,2) ## fake pairing information for this example.
+                 dmr2 <- dmrFinder(rawData, p=p, groups=grp, 
+                                   compare=c("colon", "liver"),
+                                   removeIf=expression(nprobes<4 | abs(diff)<.05 | abs(maxdiff)<.05),
+                                   paired=TRUE, pairs=pData(rawData)$pair, cutoff=0.95)
+ 
+                 #dmrPlot(dmr=dmr2, which.table=1, which.plot=c(3), legend.size=1, all.lines=TRUE, all.points=TRUE, colors.l=c("black"), colors.p=c("black"), outpath=".", cpg.islands=cpg.cur, Genome=Hsapiens)
+                 #regionPlot(tab=mytab, dmr=dmr2, cpg.islands=cpg.cur, Genome=Hsapiens, outfile="myregions.pdf", which.plot=1:5, plot.these=c("colon-liver"), cl=c("black"), legend.size=1, buffer=3000)
+ 	}
Loading required package: charmData
Loading required package: pd.charm.hg18.example
Loading required package: RSQLite
Loading required package: DBI
Loading required package: oligoClasses
Welcome to oligoClasses version 1.34.0
Loading required package: oligo
Loading required package: Biostrings
Loading required package: S4Vectors
Loading required package: stats4

Attaching package: 'stats4'

The following object is masked from 'package:spam':

    mle


Attaching package: 'S4Vectors'

The following objects are masked from 'package:spam':

    colMeans, colSums, rowMeans, rowSums

The following objects are masked from 'package:base':

    colMeans, colSums, expand.grid, rowMeans, rowSums

Loading required package: IRanges
Loading required package: XVector
================================================================================
Welcome to oligo version 1.36.1
================================================================================
Loading required package: BSgenome.Hsapiens.UCSC.hg18
Loading required package: BSgenome
Loading required package: GenomeInfoDb
Loading required package: GenomicRanges
Loading required package: rtracklayer
Filenames:
  fileNameColumn not specified. Trying to guess.

  OK - Found in column1
Sample names:
  sampleNameColumn column not specified. Trying to guess.
  OK - Found in column2
Group labels:
  groupColumn column not specified. Trying to guess.
  OK - Found in column3
The following columns in sampleKey contain discrepant values between channels and are being removed: 1
Platform design info loaded.
Checking designs for each XYS file... Done.
Allocating memory... Done.
Reading ./136421_532.xys.
Reading ./3788602_532.xys.
Reading ./5739902_532.xys.
Reading ./5875602_532.xys.
Checking designs for each XYS file... Done.
Allocating memory... Done.
Reading ./136421_635.xys.
Reading ./3788602_635.xys.
Reading ./5739902_635.xys.
Reading ./5875602_635.xys.
Spatial normalization
Background removal
Within sample normalization: loess
Between sample normalization: quantile
Estimating percentage methylation
Computing group medians and SDs for 2 groups:

1

2

Done.

Smoothing:

================================================================================
Done.
Finding DMRs for each pairwise comparison.

liver-colon
..740 DMR candidates found using cutoff=0.95.

Done
Calculating q-values for DMRs between liver-colon

Finding permuted data DMRs. Estimating time remaining
1/3 (0 minutes remaining)
2/3 (0 minutes remaining)
3/3 (0 minutes remaining)
Comparison 1 is just using raw differences!  May want to set lower CUTOFF.
Smoothing mean differences using window size of 15:
================================================================================
Finding DMRs for each pairwise comparison.

colon-liver
..31 DMR candidates found using cutoff=0.95.

Done
There were 34 warnings (use warnings() to see them)
> 
> 
> 
> 
> 
> dev.off()
null device 
          1 
>