R Graphical Manual

Last data update: 2014.03.03

R: make differential binding sites data frame

makeCountSet

R Documentation

make differential binding sites data frame

Description

This is a utility function that creates a data frame. The data frame contains the binding sites obtained by merging the peaks from two conditions, the ChIP read counts and smoothed control counts for each candidate region, and an indicator of which peaks are common to both conditions.

Usage

	makeCountSet(conf,design,filetype,species,peak.center=FALSE,peak.ext=0,binsize=50,mva.span=c(1000,5000,10000))

Arguments

`conf`	A data frame describing the ChIP experiments. It contains six columns: `sampleID`, `condition`, `factor`, `ipReads`, `ctReads`, and `peaks`. `condition` is the treatment condition or cell line; `factor` is the transcription factor or histone modification; `ipReads` is the ChIP sequence data in bam or bed format; `ctReads` is the control sequence data in bam or bed format; `peaks` is the set of peaks called by existing peak-calling software.
`design`	A two-column design matrix. The number of rows equals the number of ChIP samples across the two conditions. The first column is all 1s and represents the intercept in the regression model. The second column is 1 for samples from one condition and 0 for samples from the other.
`filetype`	Two sequence file types are supported (bed or bam).
`species`	Two species are supported (hg19 or mm9).
`peak.center`	This argument is coupled with `peak.ext`. The default is FALSE. Use it when the centered regions of the peaks are of greater interest.
`peak.ext`	This argument is coupled with `peak.center`. Default is 0.
`binsize`	Bin size in bp used to calculate the smoothed local lambda of the Poisson distribution. The default is 50 bp.
`mva.span`	Window sizes in bp, centered at the peak location in the control sample. The default uses 1 kb, 5 kb, and 10 kb windows.
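
To make the roles of `binsize` and `mva.span` more concrete, the following is a minimal base-R sketch of smoothing binned control counts with centered moving averages of several spans. It is not ChIPComp's implementation: the toy counts, the helper `moving_average`, and the choice to combine the spans by taking the per-bin maximum are all assumptions made for illustration.

```r
# Toy control read counts in consecutive bins (made-up values, not real data).
binsize  <- 1000                   # bin width in bp
mva.span <- c(1000, 5000, 10000)   # window sizes in bp, as in the default
control_bins <- c(0, 1, 0, 2, 5, 9, 4, 1, 0, 0, 1, 3, 2, 0, 1)

# Hypothetical helper: centered moving average over a window of `span` bp.
# Returns NA where the window would extend past either end of the data.
moving_average <- function(x, span, binsize) {
  k <- max(1, round(span / binsize))   # window width in bins
  if (k %% 2 == 0) k <- k + 1          # use an odd width so the window is centered
  as.numeric(stats::filter(x, rep(1 / k, k), sides = 2))
}

# One column of smoothed counts per span.
smoothed <- sapply(mva.span, function(s) moving_average(control_bins, s, binsize))
colnames(smoothed) <- paste0("span_", mva.span)

# Assumption: combine the spans by taking the largest available estimate per bin.
local_lambda <- apply(smoothed, 1, max, na.rm = TRUE)
print(round(smoothed, 2))
print(round(local_lambda, 2))
```

Wider spans give a smoother but less local background estimate, which is why several window sizes are often considered together.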

Value

A ChIPComp object. Columns chr, start, and end give the genomic coordinates of each binding site; columns ip_c(#condition)_r(#replicate) give the ChIP counts for replicate #replicate in condition #condition; columns ct_c(#condition)_r(#replicate) give the smoothed control counts for replicate #replicate in condition #condition; column commonPeak indicates the binding sites common to both conditions.
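
The column naming scheme above lends itself to pattern-based selection. The sketch below works on a toy stand-in table rather than a real ChIPComp object: the concrete names (such as `ip_c1_r1`), the 0/1 coding of `commonPeak`, and all values are assumptions for illustration, and how the table is stored inside the returned object is not covered here.

```r
# Toy stand-in for the binding-site table; every value below is made up.
sites <- data.frame(
  chr        = c("chr1", "chr1", "chr2"),
  start      = c(1000, 5000, 2000),
  end        = c(1500, 5600, 2400),
  ip_c1_r1   = c(12, 30, 7),
  ip_c1_r2   = c(15, 28, 9),
  ip_c2_r1   = c(40, 29, 6),
  ip_c2_r2   = c(38, 33, 8),
  ct_c1_r1   = c(3.1, 4.0, 2.2),
  ct_c1_r2   = c(3.1, 4.0, 2.2),
  ct_c2_r1   = c(2.8, 3.5, 2.0),
  ct_c2_r2   = c(2.8, 3.5, 2.0),
  commonPeak = c(0, 1, 1)   # assumed coding: 1 = peak present in both conditions
)

# Select ChIP and control count columns by their name pattern.
ip_cols <- grep("^ip_c[0-9]+_r[0-9]+$", names(sites), value = TRUE)
ct_cols <- grep("^ct_c[0-9]+_r[0-9]+$", names(sites), value = TRUE)
ip_counts <- as.matrix(sites[, ip_cols])
ct_counts <- as.matrix(sites[, ct_cols])

# Coordinates of the sites flagged as common to both conditions.
common_sites <- sites[sites$commonPeak == 1, c("chr", "start", "end")]
print(common_sites)
```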

Examples

	conf=data.frame(
		SampleID=1:4,
		condition=c("Helas3","Helas3","K562","K562"),
		factor=c("H3k27ac","H3k27ac","H3k27ac","H3k27ac"),
		ipReads=system.file("extdata",c("Helas3.ip1.bed","Helas3.ip2.bed","K562.ip1.bed","K562.ip2.bed"),package="ChIPComp"),
		ctReads=system.file("extdata",c("Helas3.ct.bed","Helas3.ct.bed","K562.ct.bed","K562.ct.bed"),package="ChIPComp"),
		peaks=system.file("extdata",c("Helas3.peak.bed","Helas3.peak.bed","K562.peak.bed","K562.peak.bed"),package="ChIPComp")
	)
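	# Encode condition and factor as 0-based integer codes. Wrapping each column in factor()
	# keeps this working when the columns are character vectors, as in R >= 4.0.0, where
	# data.frame() no longer converts strings to factors by default.
	design=as.data.frame(lapply(conf[,c("condition","factor")],function(x) as.numeric(factor(x))))-1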
	design=as.data.frame(model.matrix(~condition,design))
	countSet=makeCountSet(conf,design,filetype="bed", species="hg19",binsize=1000)
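
The design construction in the example can be explored in base R alone, without ChIPComp or the package's data files. The sketch below is a simplified variant that builds the two-column design directly from a condition factor. The second column is then named `conditionK562` rather than `condition`; whether `makeCountSet` depends on the column names is not stated in this page, so treat that difference as an open assumption.

```r
# Sample annotation mirroring the example: two replicates per cell line.
condition <- factor(c("Helas3", "Helas3", "K562", "K562"))

# model.matrix() expands the factor into an intercept column of 1s
# plus a 0/1 indicator column for the non-reference level (K562).
design <- as.data.frame(model.matrix(~ condition))
print(design)
```

The first column is all 1s (the intercept) and the second is 0 for the Helas3 samples and 1 for the K562 samples, which matches the layout described for the `design` argument.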

Results


R version 3.3.1 (2016-06-21) -- "Bug in Your Hair"
Copyright (C) 2016 The R Foundation for Statistical Computing
Platform: x86_64-pc-linux-gnu (64-bit)

> library(ChIPComp)
Loading required package: GenomicRanges
Loading required package: BiocGenerics
Loading required package: parallel

Attaching package: 'BiocGenerics'

The following objects are masked from 'package:parallel':

    clusterApply, clusterApplyLB, clusterCall, clusterEvalQ,
    clusterExport, clusterMap, parApply, parCapply, parLapply,
    parLapplyLB, parRapply, parSapply, parSapplyLB

The following objects are masked from 'package:stats':

    IQR, mad, xtabs

The following objects are masked from 'package:base':

    Filter, Find, Map, Position, Reduce, anyDuplicated, append,
    as.data.frame, cbind, colnames, do.call, duplicated, eval, evalq,
    get, grep, grepl, intersect, is.unsorted, lapply, lengths, mapply,
    match, mget, order, paste, pmax, pmax.int, pmin, pmin.int, rank,
    rbind, rownames, sapply, setdiff, sort, table, tapply, union,
    unique, unsplit

Loading required package: S4Vectors
Loading required package: stats4

Attaching package: 'S4Vectors'

The following objects are masked from 'package:base':

    colMeans, colSums, expand.grid, rowMeans, rowSums

Loading required package: IRanges
Loading required package: GenomeInfoDb
Loading required package: rtracklayer
> png(filename="/home/ddbj/snapshot/RGM3/R_BC/result/ChIPComp/makeCountSet.Rd_%03d_medium.png", width=480, height=480)
> ### Name: makeCountSet
> ### Title: make differential binding sites data frame
> ### Aliases: makeCountSet
> 
> ### ** Examples
> 
> 	conf=data.frame(
+ 		SampleID=1:4,
+ 		condition=c("Helas3","Helas3","K562","K562"),
+ 		factor=c("H3k27ac","H3k27ac","H3k27ac","H3k27ac"),
+ 		ipReads=system.file("extdata",c("Helas3.ip1.bed","Helas3.ip2.bed","K562.ip1.bed","K562.ip2.bed"),package="ChIPComp"),
+ 		ctReads=system.file("extdata",c("Helas3.ct.bed","Helas3.ct.bed","K562.ct.bed","K562.ct.bed"),package="ChIPComp"),
+ 		peaks=system.file("extdata",c("Helas3.peak.bed","Helas3.peak.bed","K562.peak.bed","K562.peak.bed"),package="ChIPComp")
+ 	)
> 	design=as.data.frame(lapply(conf[,c("condition","factor")],as.numeric))-1
> 	design=as.data.frame(model.matrix(~condition,design))
> 	countSet=makeCountSet(conf,design,filetype="bed", species="hg19",binsize=1000)
Making peak list......

Making ip counts......
Making control counts......

> 
> dev.off()
null device 
          1 
>