Last data update: 2014.03.03

R: make differential binding sites data frame
makeCountSet    R Documentation

make differential binding sites data frame

Description

This is a utility function that creates a data frame. It merges the peaks from two conditions into candidate binding sites, counts the ChIP reads and the smoothed control reads in each candidate region, and flags the binding sites that are common to both conditions.
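The merging step can be pictured with a small base R sketch. This is only an illustration under stated assumptions, not the package's implementation: the peak tables `peaks1` and `peaks2` are hypothetical, all peaks sit on one chromosome, and coding `commonPeak` as 0/1 is an assumption.

```r
# Hypothetical peak tables for two conditions on a single chromosome.
peaks1 <- data.frame(start = c(100, 500, 900), end = c(200, 600, 1000))
peaks2 <- data.frame(start = c(150, 1500), end = c(250, 1600))

# Pool the peaks, remember which condition each came from, and sort by start.
pooled <- rbind(cbind(peaks1, cond = 1L), cbind(peaks2, cond = 2L))
pooled <- pooled[order(pooled$start), ]

# A new candidate region begins whenever a peak starts after the
# furthest end seen so far among the preceding peaks.
prev_max_end <- c(-Inf, head(cummax(pooled$end), -1))
region <- cumsum(pooled$start > prev_max_end)

merged <- data.frame(
  start = as.numeric(tapply(pooled$start, region, min)),
  end = as.numeric(tapply(pooled$end, region, max)),
  # A region is flagged as common when peaks from both conditions fall in it
  # (0/1 coding is an assumption made for this sketch).
  commonPeak = as.integer(tapply(pooled$cond, region,
                                 function(x) length(unique(x)) == 2))
)
print(merged)
```

Here the overlapping peaks 100-200 and 150-250 collapse into one candidate region, which is the only one flagged as common.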

Usage

	makeCountSet(conf,design,filetype,species,peak.center=FALSE,peak.ext=0,binsize=50,mva.span=c(1000,5000,10000))

Arguments

conf

A data frame describing the ChIP experiments. It contains 6 columns: sampleID, condition, factor, ipReads, ctReads and peaks. condition is the treatment condition or cell line; factor is the transcription factor or histone modification; ipReads is the ChIP sequence data in bam or bed format; ctReads is the control sequence data in bam or bed format; peaks is the set of peaks called by existing peak-calling software.

design

A two-column design matrix. The number of rows equals the number of ChIP samples from the two conditions. The first column is all 1s and represents the intercept in the regression model. The second column is 1 for samples from one condition and 0 for samples from the other.
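A design matrix of this shape can be built with base R's model.matrix(). The sketch below reuses the condition labels from the Examples section; storing them as a factor is a choice made here so that the result does not depend on the R version.

```r
# Condition labels for four ChIP samples (two replicates per condition).
condition <- factor(c("Helas3", "Helas3", "K562", "K562"))

# model.matrix() adds the intercept column of 1s and codes the first
# factor level (the reference) as 0 and the second level as 1.
design <- as.data.frame(model.matrix(~ condition))
print(design)
```

The first column holds the intercept and the second column separates the two conditions, matching the layout described above.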

filetype

Two sequence file types are supported (bed or bam).

species

Two species are supported (hg19 or mm9).

peak.center

Logical, used together with peak.ext. The default is FALSE. Set it to TRUE when the regions centered on the peaks are of more interest than the full peaks.

peak.ext

This argument is coupled with peak.center. Default is 0.

binsize

Bin size, in bp, used to calculate the smoothed local lambda of the Poisson distribution. The default is 50 bp.
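To see what a bin size controls, consider the following base R sketch. It is an illustration only: the read positions are hypothetical, and the three-bin moving average is a stand-in smoother chosen for simplicity, not the smoothing that makeCountSet performs.

```r
# Hypothetical read start positions (bp) from a control sample.
read_pos <- c(12, 48, 51, 75, 99, 130, 149, 151, 410)
binsize <- 50  # the documented default

# Assign each read to a bin: bin 1 covers 0-49 bp, bin 2 covers 50-99 bp, ...
bin_id <- read_pos %/% binsize + 1
bin_counts <- tabulate(bin_id, nbins = max(bin_id))
print(bin_counts)

# Crude local rate estimate: average the count of each bin with its two
# neighbours. The ends are padded with zeros, so edge bins are biased low.
padded <- c(0, bin_counts, 0)
lambda_local <- (head(padded, -2) + bin_counts + tail(padded, -2)) / 3
print(lambda_local)
```

A larger binsize pools more reads per bin and gives a smoother but coarser rate estimate; a smaller one follows local variation more closely at the cost of noisier counts.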

mva.span

Window sizes, in bp, centered at the peak location in the control sample. The default is c(1000,5000,10000), that is, 1 kb, 5 kb and 10 kb windows.

Value

A ChIPComp object. Columns chr, start and end give the genomic coordinates of each binding site; columns ip_c(#condition)_r(#replicate) give the ChIP counts for replicate #replicate in condition #condition; columns ct_c(#condition)_r(#replicate) give the smoothed control counts for replicate #replicate in condition #condition; column commonPeak indicates the common binding sites.
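As a reading aid, the naming pattern above can be expanded for a design with two conditions and two replicates each. The sketch below only generates the names; the order in which the columns appear in the returned object is an assumption.

```r
n_conditions <- 2
n_replicates <- 2

# expand.grid() varies its first argument fastest, so replicates cycle
# within each condition.
grid <- expand.grid(r = seq_len(n_replicates), c = seq_len(n_conditions))
ip_cols <- paste0("ip_c", grid$c, "_r", grid$r)
ct_cols <- paste0("ct_c", grid$c, "_r", grid$r)

# Column order here is assumed for illustration.
print(c("chr", "start", "end", ip_cols, ct_cols, "commonPeak"))
```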

Examples

	conf=data.frame(
		SampleID=1:4,
		condition=c("Helas3","Helas3","K562","K562"),
		factor=c("H3k27ac","H3k27ac","H3k27ac","H3k27ac"),
		ipReads=system.file("extdata",c("Helas3.ip1.bed","Helas3.ip2.bed","K562.ip1.bed","K562.ip2.bed"),package="ChIPComp"),
		ctReads=system.file("extdata",c("Helas3.ct.bed","Helas3.ct.bed","K562.ct.bed","K562.ct.bed"),package="ChIPComp"),
		peaks=system.file("extdata",c("Helas3.peak.bed","Helas3.peak.bed","K562.peak.bed","K562.peak.bed"),package="ChIPComp")
	)
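	# factor() keeps the conversion working when the columns are character vectors rather than factors
	design=as.data.frame(lapply(conf[,c("condition","factor")],function(x) as.numeric(factor(x))))-1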
	design=as.data.frame(model.matrix(~condition,design))
	countSet=makeCountSet(conf,design,filetype="bed", species="hg19",binsize=1000)

Results


R version 3.3.1 (2016-06-21) -- "Bug in Your Hair"
Copyright (C) 2016 The R Foundation for Statistical Computing
Platform: x86_64-pc-linux-gnu (64-bit)

> library(ChIPComp)
Loading required package: GenomicRanges
Loading required package: BiocGenerics
Loading required package: parallel

Attaching package: 'BiocGenerics'

The following objects are masked from 'package:parallel':

    clusterApply, clusterApplyLB, clusterCall, clusterEvalQ,
    clusterExport, clusterMap, parApply, parCapply, parLapply,
    parLapplyLB, parRapply, parSapply, parSapplyLB

The following objects are masked from 'package:stats':

    IQR, mad, xtabs

The following objects are masked from 'package:base':

    Filter, Find, Map, Position, Reduce, anyDuplicated, append,
    as.data.frame, cbind, colnames, do.call, duplicated, eval, evalq,
    get, grep, grepl, intersect, is.unsorted, lapply, lengths, mapply,
    match, mget, order, paste, pmax, pmax.int, pmin, pmin.int, rank,
    rbind, rownames, sapply, setdiff, sort, table, tapply, union,
    unique, unsplit

Loading required package: S4Vectors
Loading required package: stats4

Attaching package: 'S4Vectors'

The following objects are masked from 'package:base':

    colMeans, colSums, expand.grid, rowMeans, rowSums

Loading required package: IRanges
Loading required package: GenomeInfoDb
Loading required package: rtracklayer
> png(filename="/home/ddbj/snapshot/RGM3/R_BC/result/ChIPComp/makeCountSet.Rd_%03d_medium.png", width=480, height=480)
> ### Name: makeCountSet
> ### Title: make differential binding sites data frame
> ### Aliases: makeCountSet
> 
> ### ** Examples
> 
> 	conf=data.frame(
+ 		SampleID=1:4,
+ 		condition=c("Helas3","Helas3","K562","K562"),
+ 		factor=c("H3k27ac","H3k27ac","H3k27ac","H3k27ac"),
+ 		ipReads=system.file("extdata",c("Helas3.ip1.bed","Helas3.ip2.bed","K562.ip1.bed","K562.ip2.bed"),package="ChIPComp"),
+ 		ctReads=system.file("extdata",c("Helas3.ct.bed","Helas3.ct.bed","K562.ct.bed","K562.ct.bed"),package="ChIPComp"),
+ 		peaks=system.file("extdata",c("Helas3.peak.bed","Helas3.peak.bed","K562.peak.bed","K562.peak.bed"),package="ChIPComp")
+ 	)
> 	design=as.data.frame(lapply(conf[,c("condition","factor")],as.numeric))-1
> 	design=as.data.frame(model.matrix(~condition,design))
> 	countSet=makeCountSet(conf,design,filetype="bed", species="hg19",binsize=1000)
Making peak list......

Making ip counts......
Making control counts......

> 
> dev.off()
null device 
          1 
>