R: Statistical tools for the analysis of ChIP-seq data
CSAR-package
R Documentation
Statistical tools for the analysis of ChIP-seq data
Description
Statistical tools for ChIP-seq data analysis. The package is oriented toward plant organisms and is compatible with the standard file formats used in plant research.
Details
Package: CSAR
Type: Package
Version: 1.0
Date: 2009-11-09
License: Artistic-2.0
LazyLoad: yes
Author(s)
Jose M Muino
Maintainer: Jose M Muino <jose.muino@wur.nl>
References
Muino et al. (submitted). Plant ChIP-seq Analyzer: An R package for the statistical detection of protein-bound genomic regions.
Kaufmann et al. (2009). Target genes of the MADS transcription factor SEPALLATA3: integration of developmental and hormonal pathways in the Arabidopsis flower. PLoS Biology 7(4): e1000090.
Examples
##For this example we will use a subset of the SEP3 ChIP-seq data (Kaufmann, 2009)
data("CSAR-dataset");
##We calculate the number of hits at each nucleotide position for the control and the sample. We do this only for chromosome 1, positions 1 to 10 kb
nhitsS<-mappedReads2Nhits(sampleSEP3_test,file="sampleSEP3_test",chr=c("CHR1v01212004"),chrL=c(10000))
nhitsC<-mappedReads2Nhits(controlSEP3_test,file="controlSEP3_test",chr=c("CHR1v01212004"),chrL=c(10000))
##We calculate a score for each nucleotide position
test<-ChIPseqScore(control=nhitsC,sample=nhitsS)
##We calculate the candidate read-enriched regions
win<-sigWin(test)
##We generate a wig file of the results to visualize them in a genome browser
score2wig(test,file="test.wig")
##We calculate the positions of the read-enriched regions relative to gene positions
d<-distance2Genes(win=win,gff=TAIR8_genes_test)
##We calculate a table of genes with read-enriched regions, and their location
genes<-genesWithPeaks(d)
##We calculate two sets of read-enrichment scores through permutation
permutatedWinScores(nn=1,sample=sampleSEP3_test,control=controlSEP3_test,fileOutput="test",chr=c("CHR1v01212004"),chrL=c(100000))
permutatedWinScores(nn=2,sample=sampleSEP3_test,control=controlSEP3_test,fileOutput="test",chr=c("CHR1v01212004"),chrL=c(100000))
##The next function retrieves all permuted score values generated by the permutatedWinScores function.
##These values represent the score distribution under the null hypothesis and can therefore be used to control the error of our test.
nulldist<-getPermutatedWinScores(file="test",nn=1:2)
##From this distribution, several cut-off values can be calculated to control the error of our test.
##Several functions in R can be used for this purpose.
##In this package we have implemented a simple FDR-based method for controlling the error
getThreshold(winscores=values(win)$score,permutatedScores=nulldist,FDR=.01)
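##The idea of choosing a cut-off from a permutation null distribution can be sketched in base R.
##This is only an illustration under stated assumptions: sketchThreshold is a hypothetical helper,
##it is not part of CSAR, and it is not claimed to reproduce the internals of getThreshold.
##It uses a generic empirical estimator: expected false positives divided by the number of regions called.
sketchThreshold <- function(observed, null, fdr = 0.01) {
  ##Candidate cut-offs are the distinct observed scores, in increasing order
  cutoffs <- sort(unique(observed))
  ##Proportion of permuted scores at or above each cut-off (empirical type I error)
  typeI <- vapply(cutoffs, function(x) mean(null >= x), numeric(1))
  ##Number of observed scores at or above each cut-off (always at least 1)
  nCalled <- vapply(cutoffs, function(x) sum(observed >= x), numeric(1))
  ##Estimated FDR, capped at 1
  estFDR <- pmin(1, typeI * length(observed) / nCalled)
  ##Keep the smallest cut-off whose estimated FDR does not exceed the target
  ok <- which(estFDR <= fdr)
  if (length(ok) == 0) return(NULL)
  i <- ok[1]
  data.frame(threshold = cutoffs[i], Error_type_I = typeI[i], FDR = estFDR[i])
}
##Possible usage with the objects created above; the result may differ from getThreshold:
##sketchThreshold(values(win)$score, nulldist, fdr = 0.01)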
Results
> library(CSAR)
Loading required package: S4Vectors
Loading required package: stats4
Loading required package: BiocGenerics
Loading required package: parallel
Attaching package: 'BiocGenerics'
The following objects are masked from 'package:parallel':
clusterApply, clusterApplyLB, clusterCall, clusterEvalQ,
clusterExport, clusterMap, parApply, parCapply, parLapply,
parLapplyLB, parRapply, parSapply, parSapplyLB
The following objects are masked from 'package:stats':
IQR, mad, xtabs
The following objects are masked from 'package:base':
Filter, Find, Map, Position, Reduce, anyDuplicated, append,
as.data.frame, cbind, colnames, do.call, duplicated, eval, evalq,
get, grep, grepl, intersect, is.unsorted, lapply, lengths, mapply,
match, mget, order, paste, pmax, pmax.int, pmin, pmin.int, rank,
rbind, rownames, sapply, setdiff, sort, table, tapply, union,
unique, unsplit
Attaching package: 'S4Vectors'
The following objects are masked from 'package:base':
colMeans, colSums, expand.grid, rowMeans, rowSums
Loading required package: IRanges
Loading required package: GenomeInfoDb
Loading required package: GenomicRanges
> ### Name: CSAR-package
> ### Title: Statistical tools for the analysis of ChIP-seq data
> ### Aliases: CSAR-package
>
> ### ** Examples
>
>
> ##For this example we will use a subset of the SEP3 ChIP-seq data (Kaufmann, 2009)
> data("CSAR-dataset");
> ##We calculate the number of hits at each nucleotide position for the control and the sample. We do this only for chromosome 1, positions 1 to 10 kb
> nhitsS<-mappedReads2Nhits(sampleSEP3_test,file="sampleSEP3_test",chr=c("CHR1v01212004"),chrL=c(10000))
mappedReads2Nhits has just finished CHR1v01212004 ...
> nhitsC<-mappedReads2Nhits(controlSEP3_test,file="controlSEP3_test",chr=c("CHR1v01212004"),chrL=c(10000))
mappedReads2Nhits has just finished CHR1v01212004 ...
>
>
> ##We calculate a score for each nucleotide position
> test<-ChIPseqScore(control=nhitsC,sample=nhitsS)
CHR1v01212004 done...
>
> ##We calculate the candidate read-enriched regions
> win<-sigWin(test)
CHR1v01212004 done...
>
> ##We generate a wig file of the results to visualize them in a genome browser
> score2wig(test,file="test.wig")
CHR1v01212004 done...
>
> ##We calculate the positions of the read-enriched regions relative to gene positions
> d<-distance2Genes(win=win,gff=TAIR8_genes_test)
Starting CHR1v01212004 ...
>
> ##We calculate a table of genes with read-enriched regions, and their location
> genes<-genesWithPeaks(d)
>
> ##We calculate two sets of read-enrichment scores through permutation
> permutatedWinScores(nn=1,sample=sampleSEP3_test,control=controlSEP3_test,fileOutput="test",chr=c("CHR1v01212004"),chrL=c(100000))
mappedReads2Nhits has just finished CHR1v01212004 ...
mappedReads2Nhits has just finished CHR1v01212004 ...
CHR1v01212004 done...
CHR1v01212004 done...
Win file for permutation 1 can be found at test-1.permutatedWin
> permutatedWinScores(nn=2,sample=sampleSEP3_test,control=controlSEP3_test,fileOutput="test",chr=c("CHR1v01212004"),chrL=c(100000))
mappedReads2Nhits has just finished CHR1v01212004 ...
mappedReads2Nhits has just finished CHR1v01212004 ...
CHR1v01212004 done...
CHR1v01212004 done...
Win file for permutation 2 can be found at test-2.permutatedWin
>
> ##The next function retrieves all permuted score values generated by the permutatedWinScores function.
> ##These values represent the score distribution under the null hypothesis and can therefore be used to control the error of our test.
> nulldist<-getPermutatedWinScores(file="test",nn=1:2)
Read 66 items
Read 70 items
>
> ##From this distribution, several cut-off values can be calculated to control the error of our test.
> ##Several functions in R can be used for this purpose.
> ##In this package we have implemented a simple FDR-based method for controlling the error
> getThreshold(winscores=values(win)$score,permutatedScores=nulldist,FDR=.01)
   threshold Error_type_I FDR
21      3.29            0   0
Warning message:
In getThreshold(winscores = values(win)$score, permutatedScores = nulldist, :
The number of permutated scores is low.
>