Last data update: 2014.03.03

R: Calculate the threshold value corresponding to control FDR at...
getThresholdR Documentation

Calculate the threshold value corresponding to control FDR at a desired level

Description

Calculate the threshold value corresponding to control FDR at a desired level

Usage

getThreshold(winscores, permutatedScores, FDR)

Arguments

winscores

Numeric vector with score values obtained from the sigWin function

permutatedScores

Numeric vector with the permutated read-enrichment score values

FDR

Numeric value with the desired FDR control

Details

This is a very simple function to obtain the threshold value of our test statistic controlling FDR at a desired level. Other functions implemented in R (eg: multtest) could be more sophisticated. Basically, for each possible threshold value, the proportion of error type I is calculated assuming that the permutated score distribution is a optimal estimation of the score distribution under the null hypothesis. This is, the proportion of permutated scores exceding the considered threshold value is used as an estimation of the error type I of our statisitic. FDR is obtained as the ratio of the proportion of error type I by the proportion of significant tests.

Value

A table with the columns being:

threshold

The threshold value

p-value

The p-value obtained from the permutated score ditribution

FDR

The FDR control obtained using threshold

Author(s)

Jose M Muino, jose.muino@wur.nl

References

Muino et al. (submitted). Plant ChIP-seq Analyzer: An R package for the statistcal detection of protein-bound genomic regions.
Kaufmann et al.(2009).Target genes of the MADS transcription factor SEPALLATA3: integration of developmental and hormonal pathways in the Arabidopsis flower. PLoS Biology; 7(4):e1000090.

See Also

CSAR-package,getPermutatedWinScores, sigWin

Examples

##For this example we will use the a subset of the SEP3 ChIP-seq data (Kaufmann, 2009)
data("CSAR-dataset");
##We calculate the number of hits for each nucleotide posotion for the control and sample. We do that just for chromosome chr1, and for positions 1 to 10kb
nhitsS<-mappedReads2Nhits(sampleSEP3_test,file="sampleSEP3_test",chr=c("CHR1v01212004"),chrL=c(10000))
nhitsC<-mappedReads2Nhits(controlSEP3_test,file="controlSEP3_test",chr=c("CHR1v01212004"),chrL=c(10000))


##We calculate a score for each nucleotide position
test<-ChIPseqScore(control=nhitsC,sample=nhitsS)

##We calculate the candidate read-enriched regions
win<-sigWin(test)


##We calculate two sets of read-enrichment scores through permutation
permutatedWinScores(nn=1,sample=sampleSEP3_test,control=controlSEP3_test,fileOutput="test",chr=c("CHR1v01212004"),chrL=c(100000))
permutatedWinScores(nn=2,sample=sampleSEP3_test,control=controlSEP3_test,fileOutput="test",chr=c("CHR1v01212004"),chrL=c(100000))

###Next function will get all permutated score values generated by permutatedWinScores function. 
##This represent the score distribution under the null hypotesis and therefore it can be use to control the error of our test.
nulldist<-getPermutatedWinScores(file="test",nn=1:2)

##From this distribution, several cut-off values can be calculated to control the error of our test. 
##Several functions  in R can be used for this purpose.
##In this package we had implemented a simple method for the control of the error based on FDR"
getThreshold(winscores=values(win)$score,permutatedScores=nulldist,FDR=.01)

Results


R version 3.3.1 (2016-06-21) -- "Bug in Your Hair"
Copyright (C) 2016 The R Foundation for Statistical Computing
Platform: x86_64-pc-linux-gnu (64-bit)

R is free software and comes with ABSOLUTELY NO WARRANTY.
You are welcome to redistribute it under certain conditions.
Type 'license()' or 'licence()' for distribution details.

R is a collaborative project with many contributors.
Type 'contributors()' for more information and
'citation()' on how to cite R or R packages in publications.

Type 'demo()' for some demos, 'help()' for on-line help, or
'help.start()' for an HTML browser interface to help.
Type 'q()' to quit R.

> library(CSAR)
Loading required package: S4Vectors
Loading required package: stats4
Loading required package: BiocGenerics
Loading required package: parallel

Attaching package: 'BiocGenerics'

The following objects are masked from 'package:parallel':

    clusterApply, clusterApplyLB, clusterCall, clusterEvalQ,
    clusterExport, clusterMap, parApply, parCapply, parLapply,
    parLapplyLB, parRapply, parSapply, parSapplyLB

The following objects are masked from 'package:stats':

    IQR, mad, xtabs

The following objects are masked from 'package:base':

    Filter, Find, Map, Position, Reduce, anyDuplicated, append,
    as.data.frame, cbind, colnames, do.call, duplicated, eval, evalq,
    get, grep, grepl, intersect, is.unsorted, lapply, lengths, mapply,
    match, mget, order, paste, pmax, pmax.int, pmin, pmin.int, rank,
    rbind, rownames, sapply, setdiff, sort, table, tapply, union,
    unique, unsplit


Attaching package: 'S4Vectors'

The following objects are masked from 'package:base':

    colMeans, colSums, expand.grid, rowMeans, rowSums

Loading required package: IRanges
Loading required package: GenomeInfoDb
Loading required package: GenomicRanges
> png(filename="/home/ddbj/snapshot/RGM3/R_BC/result/CSAR/getThreshold.Rd_%03d_medium.png", width=480, height=480)
> ### Name: getThreshold
> ### Title: Calculate the threshold value corresponding to control FDR at a
> ###   desired level
> ### Aliases: getThreshold
> 
> ### ** Examples
> 
> ##For this example we will use the a subset of the SEP3 ChIP-seq data (Kaufmann, 2009)
> data("CSAR-dataset");
> ##We calculate the number of hits for each nucleotide posotion for the control and sample. We do that just for chromosome chr1, and for positions 1 to 10kb
> nhitsS<-mappedReads2Nhits(sampleSEP3_test,file="sampleSEP3_test",chr=c("CHR1v01212004"),chrL=c(10000))
mappedReads2Nhits has just finished   CHR1v01212004 ...
> nhitsC<-mappedReads2Nhits(controlSEP3_test,file="controlSEP3_test",chr=c("CHR1v01212004"),chrL=c(10000))
mappedReads2Nhits has just finished   CHR1v01212004 ...
> 
> 
> ##We calculate a score for each nucleotide position
> test<-ChIPseqScore(control=nhitsC,sample=nhitsS)
CHR1v01212004  done...
> 
> ##We calculate the candidate read-enriched regions
> win<-sigWin(test)
CHR1v01212004 done...
> 
> 
> ##We calculate two sets of read-enrichment scores through permutation
> permutatedWinScores(nn=1,sample=sampleSEP3_test,control=controlSEP3_test,fileOutput="test",chr=c("CHR1v01212004"),chrL=c(100000))
mappedReads2Nhits has just finished   CHR1v01212004 ...
mappedReads2Nhits has just finished   CHR1v01212004 ...
CHR1v01212004  done...
CHR1v01212004 done...
Win file for permutation 1 can be found at test-1.permutatedWin
> permutatedWinScores(nn=2,sample=sampleSEP3_test,control=controlSEP3_test,fileOutput="test",chr=c("CHR1v01212004"),chrL=c(100000))
mappedReads2Nhits has just finished   CHR1v01212004 ...
mappedReads2Nhits has just finished   CHR1v01212004 ...
CHR1v01212004  done...
CHR1v01212004 done...
Win file for permutation 2 can be found at test-2.permutatedWin
> 
> ###Next function will get all permutated score values generated by permutatedWinScores function. 
> ##This represent the score distribution under the null hypotesis and therefore it can be use to control the error of our test.
> nulldist<-getPermutatedWinScores(file="test",nn=1:2)
Read 74 items
Read 73 items
> 
> ##From this distribution, several cut-off values can be calculated to control the error of our test. 
> ##Several functions  in R can be used for this purpose.
> ##In this package we had implemented a simple method for the control of the error based on FDR"
> getThreshold(winscores=values(win)$score,permutatedScores=nulldist,FDR=.01)
   threshold Error_type_I FDR
28      3.56            0   0
Warning message:
In getThreshold(winscores = values(win)$score, permutatedScores = nulldist,  :
  The number of permutated scores is low.
> 
> 
> 
> 
> 
> dev.off()
null device 
          1 
>