R: Calculate the threshold value corresponding to control FDR at...
getThreshold
R Documentation
Calculate the threshold value corresponding to control FDR at a desired level
Description
Calculate the threshold value corresponding to control FDR at a desired level
Usage
getThreshold(winscores, permutatedScores, FDR)
Arguments
winscores
Numeric vector with score values obtained from the sigWin function
permutatedScores
Numeric vector with the permutated read-enrichment score values
FDR
Numeric value with the desired FDR control
Details
This is a very simple function to obtain the threshold value of our test statistic controlling FDR at a desired level. Other functions implemented in R (eg: multtest) could be more sophisticated.
Basically, for each possible threshold value, the proportion of error type I is calculated assuming that the permutated score distribution is a optimal estimation of the score distribution under the null hypothesis.
This is, the proportion of permutated scores exceding the considered threshold value is used as an estimation of the error type I of our statisitic.
FDR is obtained as the ratio of the proportion of error type I by the proportion of significant tests.
Value
A table with the columns being:
threshold
The threshold value
p-value
The p-value obtained from the permutated score ditribution
Muino et al. (submitted). Plant ChIP-seq Analyzer: An R package for the statistcal detection of protein-bound genomic regions. Kaufmann et al.(2009).Target genes of the MADS transcription factor SEPALLATA3: integration of developmental and hormonal pathways in the Arabidopsis flower. PLoS Biology; 7(4):e1000090.
See Also
CSAR-package,getPermutatedWinScores, sigWin
Examples
##For this example we will use the a subset of the SEP3 ChIP-seq data (Kaufmann, 2009)
data("CSAR-dataset");
##We calculate the number of hits for each nucleotide posotion for the control and sample. We do that just for chromosome chr1, and for positions 1 to 10kb
nhitsS<-mappedReads2Nhits(sampleSEP3_test,file="sampleSEP3_test",chr=c("CHR1v01212004"),chrL=c(10000))
nhitsC<-mappedReads2Nhits(controlSEP3_test,file="controlSEP3_test",chr=c("CHR1v01212004"),chrL=c(10000))
##We calculate a score for each nucleotide position
test<-ChIPseqScore(control=nhitsC,sample=nhitsS)
##We calculate the candidate read-enriched regions
win<-sigWin(test)
##We calculate two sets of read-enrichment scores through permutation
permutatedWinScores(nn=1,sample=sampleSEP3_test,control=controlSEP3_test,fileOutput="test",chr=c("CHR1v01212004"),chrL=c(100000))
permutatedWinScores(nn=2,sample=sampleSEP3_test,control=controlSEP3_test,fileOutput="test",chr=c("CHR1v01212004"),chrL=c(100000))
###Next function will get all permutated score values generated by permutatedWinScores function.
##This represent the score distribution under the null hypotesis and therefore it can be use to control the error of our test.
nulldist<-getPermutatedWinScores(file="test",nn=1:2)
##From this distribution, several cut-off values can be calculated to control the error of our test.
##Several functions in R can be used for this purpose.
##In this package we had implemented a simple method for the control of the error based on FDR"
getThreshold(winscores=values(win)$score,permutatedScores=nulldist,FDR=.01)
Results
R version 3.3.1 (2016-06-21) -- "Bug in Your Hair"
Copyright (C) 2016 The R Foundation for Statistical Computing
Platform: x86_64-pc-linux-gnu (64-bit)
R is free software and comes with ABSOLUTELY NO WARRANTY.
You are welcome to redistribute it under certain conditions.
Type 'license()' or 'licence()' for distribution details.
R is a collaborative project with many contributors.
Type 'contributors()' for more information and
'citation()' on how to cite R or R packages in publications.
Type 'demo()' for some demos, 'help()' for on-line help, or
'help.start()' for an HTML browser interface to help.
Type 'q()' to quit R.
> library(CSAR)
Loading required package: S4Vectors
Loading required package: stats4
Loading required package: BiocGenerics
Loading required package: parallel
Attaching package: 'BiocGenerics'
The following objects are masked from 'package:parallel':
clusterApply, clusterApplyLB, clusterCall, clusterEvalQ,
clusterExport, clusterMap, parApply, parCapply, parLapply,
parLapplyLB, parRapply, parSapply, parSapplyLB
The following objects are masked from 'package:stats':
IQR, mad, xtabs
The following objects are masked from 'package:base':
Filter, Find, Map, Position, Reduce, anyDuplicated, append,
as.data.frame, cbind, colnames, do.call, duplicated, eval, evalq,
get, grep, grepl, intersect, is.unsorted, lapply, lengths, mapply,
match, mget, order, paste, pmax, pmax.int, pmin, pmin.int, rank,
rbind, rownames, sapply, setdiff, sort, table, tapply, union,
unique, unsplit
Attaching package: 'S4Vectors'
The following objects are masked from 'package:base':
colMeans, colSums, expand.grid, rowMeans, rowSums
Loading required package: IRanges
Loading required package: GenomeInfoDb
Loading required package: GenomicRanges
> png(filename="/home/ddbj/snapshot/RGM3/R_BC/result/CSAR/getThreshold.Rd_%03d_medium.png", width=480, height=480)
> ### Name: getThreshold
> ### Title: Calculate the threshold value corresponding to control FDR at a
> ### desired level
> ### Aliases: getThreshold
>
> ### ** Examples
>
> ##For this example we will use the a subset of the SEP3 ChIP-seq data (Kaufmann, 2009)
> data("CSAR-dataset");
> ##We calculate the number of hits for each nucleotide posotion for the control and sample. We do that just for chromosome chr1, and for positions 1 to 10kb
> nhitsS<-mappedReads2Nhits(sampleSEP3_test,file="sampleSEP3_test",chr=c("CHR1v01212004"),chrL=c(10000))
mappedReads2Nhits has just finished CHR1v01212004 ...
> nhitsC<-mappedReads2Nhits(controlSEP3_test,file="controlSEP3_test",chr=c("CHR1v01212004"),chrL=c(10000))
mappedReads2Nhits has just finished CHR1v01212004 ...
>
>
> ##We calculate a score for each nucleotide position
> test<-ChIPseqScore(control=nhitsC,sample=nhitsS)
CHR1v01212004 done...
>
> ##We calculate the candidate read-enriched regions
> win<-sigWin(test)
CHR1v01212004 done...
>
>
> ##We calculate two sets of read-enrichment scores through permutation
> permutatedWinScores(nn=1,sample=sampleSEP3_test,control=controlSEP3_test,fileOutput="test",chr=c("CHR1v01212004"),chrL=c(100000))
mappedReads2Nhits has just finished CHR1v01212004 ...
mappedReads2Nhits has just finished CHR1v01212004 ...
CHR1v01212004 done...
CHR1v01212004 done...
Win file for permutation 1 can be found at test-1.permutatedWin
> permutatedWinScores(nn=2,sample=sampleSEP3_test,control=controlSEP3_test,fileOutput="test",chr=c("CHR1v01212004"),chrL=c(100000))
mappedReads2Nhits has just finished CHR1v01212004 ...
mappedReads2Nhits has just finished CHR1v01212004 ...
CHR1v01212004 done...
CHR1v01212004 done...
Win file for permutation 2 can be found at test-2.permutatedWin
>
> ###Next function will get all permutated score values generated by permutatedWinScores function.
> ##This represent the score distribution under the null hypotesis and therefore it can be use to control the error of our test.
> nulldist<-getPermutatedWinScores(file="test",nn=1:2)
Read 74 items
Read 73 items
>
> ##From this distribution, several cut-off values can be calculated to control the error of our test.
> ##Several functions in R can be used for this purpose.
> ##In this package we had implemented a simple method for the control of the error based on FDR"
> getThreshold(winscores=values(win)$score,permutatedScores=nulldist,FDR=.01)
threshold Error_type_I FDR
28 3.56 0 0
Warning message:
In getThreshold(winscores = values(win)$score, permutatedScores = nulldist, :
The number of permutated scores is low.
>
>
>
>
>
> dev.off()
null device
1
>