Last data update: 2014.03.03

R: Calculates CpG enrichment of provided short reads compared to...
MEDIPS.CpGenrichR Documentation

Calculates CpG enrichment of provided short reads compared to the reference genome.

Description

As a quality check for the enrichment of CpG rich DNA fragments obtained by the immunoprecipitation step of a MeDIP experiment, this function provides the functionality to calculate CpG enrichment values. The main idea is to check, how strong the regions are enriched for CpGs compared to the reference genome. For this, the function counts the number of Cs, the number of Gs, the number CpGs, and the total number of bases within the stated reference genome. Subsequently, the function calculates the relative frequency of CpGs and the observed/expected ratio of CpGs present in the reference genome. Additionally, the function calculates the same for the DNA sequences underlying the given regions. The final enrichment values result by dividing the relative frequency of CpGs (or the observed/expected value, respectively) of the regions by the relative frequency of CpGs (or the observed/expected value, respectively) of the reference genome.

Usage

MEDIPS.CpGenrich(file=NULL, BSgenome=NULL, extend=0, shift=0, uniq=1e-3, chr.select=NULL, paired=F, bwa = FALSE)

Arguments

file

Path and file name of the input data

BSgenome

The reference genome name as defined by BSgenome

extend

defines the number of bases by which the region will be extended before the genome vector is calculated. Regions will be extended along the plus or the minus strand as defined by their provided strand information.

shift

As an alternative to the extend parameter, the shift parameter can be specified. Here, the reads are not extended but shifted by the specified number of nucleotides with respect to the given strand infomation. One of the two parameters extend or shift has to be 0.

uniq

The uniq parameter determines, if all reads mapping to exactly the same genomic position should be kept (uniq = 0), replaced by only one representative (uniq = 1), or if the number of stacked reads should be capped by a maximal number of stacked reads per genomic position determined by a poisson distribution of stacked reads genome wide and by a given p-value (1 > uniq > 0) (deafult: 1e-3).

chr.select

only data at the specified chromosomes will be processed.

paired

option for paired end reads

bwa

Indicates, if the alignment file has been generated by bwa (default=FALSE). Enabling bwa allows that the first mate of pairs can be the 'left' or the 'right' mate.

Value

regions.CG

the numbe of CpGs within the regions

regions.C

the number of Cs within the regions

regions.G

the number of Gs within the regions

regions.relH

the relative frequency of CpGs within the regions

regions.GoGe

the observed/expected ratio of CpGs within the regions

genome.CG

the numbe of CpGs within the reference genome

genome.C

the number of Cs within the reference genome

genome.G

the number of Gs within the reference genome

genome.relH

the relative frequency of CpGs within the reference genome

genome.GoGe

the observed/expected ratio of CpGs within the reference genome

enrichment.score.relH

regions.relH/genome.relH

enrichment.score.GoGe

regions.GoGe/genome.GoGe

Author(s)

Joern Dietrich and Matthias Lienhard

Examples


library(MEDIPSData)
library("BSgenome.Hsapiens.UCSC.hg19")
bam.file.hESCs.Rep1.MeDIP = system.file("extdata", "hESCs.MeDIP.Rep1.chr22.bam", package="MEDIPSData")

#er=MEDIPS.CpGenrich(file=bam.file.hESCs.Rep1.MeDIP, BSgenome="BSgenome.Hsapiens.UCSC.hg19", chr.select="chr22", extend=0, shift=0, uniq=1e-3)

Results


R version 3.3.1 (2016-06-21) -- "Bug in Your Hair"
Copyright (C) 2016 The R Foundation for Statistical Computing
Platform: x86_64-pc-linux-gnu (64-bit)

R is free software and comes with ABSOLUTELY NO WARRANTY.
You are welcome to redistribute it under certain conditions.
Type 'license()' or 'licence()' for distribution details.

R is a collaborative project with many contributors.
Type 'contributors()' for more information and
'citation()' on how to cite R or R packages in publications.

Type 'demo()' for some demos, 'help()' for on-line help, or
'help.start()' for an HTML browser interface to help.
Type 'q()' to quit R.

> library(MEDIPS)
Loading required package: BSgenome
Loading required package: BiocGenerics
Loading required package: parallel

Attaching package: 'BiocGenerics'

The following objects are masked from 'package:parallel':

    clusterApply, clusterApplyLB, clusterCall, clusterEvalQ,
    clusterExport, clusterMap, parApply, parCapply, parLapply,
    parLapplyLB, parRapply, parSapply, parSapplyLB

The following objects are masked from 'package:stats':

    IQR, mad, xtabs

The following objects are masked from 'package:base':

    Filter, Find, Map, Position, Reduce, anyDuplicated, append,
    as.data.frame, cbind, colnames, do.call, duplicated, eval, evalq,
    get, grep, grepl, intersect, is.unsorted, lapply, lengths, mapply,
    match, mget, order, paste, pmax, pmax.int, pmin, pmin.int, rank,
    rbind, rownames, sapply, setdiff, sort, table, tapply, union,
    unique, unsplit

Loading required package: S4Vectors
Loading required package: stats4

Attaching package: 'S4Vectors'

The following objects are masked from 'package:base':

    colMeans, colSums, expand.grid, rowMeans, rowSums

Loading required package: IRanges
Loading required package: GenomeInfoDb
Loading required package: GenomicRanges
Loading required package: Biostrings
Loading required package: XVector
Loading required package: rtracklayer
Loading required package: Rsamtools
> png(filename="/home/ddbj/snapshot/RGM3/R_BC/result/MEDIPS/MEDIPS.CpGenrich.Rd_%03d_medium.png", width=480, height=480)
> ### Name: MEDIPS.CpGenrich
> ### Title: Calculates CpG enrichment of provided short reads compared to
> ###   the reference genome.
> ### Aliases: MEDIPS.CpGenrich
> 
> ### ** Examples
> 
> 
> library(MEDIPSData)
> library("BSgenome.Hsapiens.UCSC.hg19")
> bam.file.hESCs.Rep1.MeDIP = system.file("extdata", "hESCs.MeDIP.Rep1.chr22.bam", package="MEDIPSData")
> 
> #er=MEDIPS.CpGenrich(file=bam.file.hESCs.Rep1.MeDIP, BSgenome="BSgenome.Hsapiens.UCSC.hg19", chr.select="chr22", extend=0, shift=0, uniq=1e-3)
> 
> 
> 
> 
> 
> dev.off()
null device 
          1 
>