Last data update: 2014.03.03

integrateData                R Documentation

Calculates a normalized correlation score from ChIP-seq and microarray gene expression data.

Description

This function calculates the product of the standardized differences between two conditions in the ChIP-seq data and the corresponding standardized differences in the gene expression data. A score close to zero means that at least one of the two data sets shows no (large) difference. A positive score means that the differences point in the same direction in both data sets; a negative score means that they have opposite signs.
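As a minimal illustration of how the sign of the product encodes agreement, the following base-R sketch multiplies standardized differences (group of interest minus reference) for four probes. The probe names and values are invented for illustration and do not come from epigenomix.

```r
# Hypothetical standardized differences (group of interest minus reference)
dExpr <- c(probe1 = 2.0, probe2 = -1.0, probe3 = 1.5, probe4 = 0.0)
dChip <- c(probe1 = 1.5, probe2 =  2.0, probe3 = -0.5, probe4 = 3.0)

# The score is the product of the two standardized differences
z <- dExpr * dChip

# +1: same direction in both data sets, -1: opposite directions,
#  0: no difference in at least one data set (probe4)
sign(z)
```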

Usage

integrateData(expr, chipseq, factor, reference)

Arguments

expr

An ExpressionSet holding the gene expression data.

chipseq

A ChIPseqSet holding the ChIP-seq data.

factor

A character string giving the name of the factor that describes the conditions to be compared. The factor must be present in the pheno data slot of both expr and chipseq, must have exactly two levels, and must use the same level names in both objects.

reference

Optionally, the name of the factor level that should be used as reference. If missing, the first level of factor in the object expr is used.
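The requirements on factor and reference can be checked before calling integrateData. The helper checkConditions() below is hypothetical (it is not part of epigenomix) and works on plain vectors rather than on the pheno data of ExpressionSet and ChIPseqSet objects; it is only a sketch that mirrors the rules stated above.

```r
# Hypothetical helper mirroring the documented rules; not part of epigenomix.
checkConditions <- function(exprStatus, chipStatus, reference) {
  exprStatus <- factor(exprStatus)
  chipStatus <- factor(chipStatus)
  # The factor must have exactly two levels in both data sets ...
  if (nlevels(exprStatus) != 2 || nlevels(chipStatus) != 2)
    stop("the factor must have exactly two levels")
  # ... and the level names must be the same in both objects
  if (!setequal(levels(exprStatus), levels(chipStatus)))
    stop("level names differ between expr and chipseq")
  # Default reference: first level of the factor in expr
  if (missing(reference)) reference <- levels(exprStatus)[1]
  if (!reference %in% levels(exprStatus))
    stop("reference is not a level of the factor")
  reference
}

exprStatus <- c("control", "control", "treated", "treated")
chipStatus <- c("control", "treated")
checkConditions(exprStatus, chipStatus)             # "control"
checkConditions(exprStatus, chipStatus, "treated")  # "treated"
```

Note that factor() sorts the levels of a character vector alphabetically, so "control" is the first level here; if the status column is already a factor with a different level order, its first level becomes the default instead.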

Details

Let A and B denote the gene expression values of one probe in the group of interest and in the reference group (defined by the argument reference), respectively, and let X and Y be the ChIP-seq values assigned to that probe in the same two groups. For each probe, the function returns

Z = (A - B) / σ_{ge} × (X - Y) / σ_{chip},

where σ_{ge} is the standard deviation estimated from all observed differences A - B in the gene expression data, and σ_{chip} is the standard deviation estimated from all observed differences X - Y in the ChIP-seq data.
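The formula can be checked by hand with a short base-R sketch. zScore() is a hypothetical helper, not the epigenomix implementation, and it assumes that σ_{ge} and σ_{chip} are the sample standard deviations returned by sd() (denominator n - 1). With the group averages from the Examples section, this assumption reproduces the z column printed in the Results section.

```r
# Hypothetical re-implementation of the score; not the epigenomix source.
zScore <- function(A, B, X, Y) {
  dExpr <- A - B  # expression difference per probe
  dChip <- X - Y  # ChIP-seq difference per probe
  # Standardize each difference by its sample SD across probes, then multiply
  (dExpr / sd(dExpr)) * (dChip / sd(dChip))
}

# Averages from the Examples: treated = group of interest, control = reference
z <- zScore(A = c(11.5, 10.5), B = c(5.0, 11.5),
            X = c(20, 22),     Y = c(10, 20))
z
```

Because σ is estimated across probes, the score of one probe depends on all other common probes; with only two probes, as here, each σ is determined by just two differences.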

If a group contains more than one sample in either data set, the replicates are averaged first, and the averages are then plugged into the formula above.
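Replicate averaging amounts to taking row means over the samples of each group. The sketch below uses a hypothetical helper groupMeans() (not the package's internal code) on the expression matrix from the Examples section.

```r
# Hypothetical helper: average the replicate columns belonging to one factor level
groupMeans <- function(m, f, level) {
  rowMeans(m[, f == level, drop = FALSE])
}

ge <- matrix(c(5, 12, 5, 11, 11, 10, 12, 11), nrow = 2,
             dimnames = list(c("100_at", "200_at"), c("c1", "c2", "t1", "t2")))
status <- factor(c("control", "control", "treated", "treated"))

A <- groupMeans(ge, status, "treated")  # group of interest: 11.5, 10.5
B <- groupMeans(ge, status, "control")  # reference: 5.0, 11.5
```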

Features in expr need not also be present in chipseq, and vice versa. Features present in only one of the two data sets are omitted.
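Restricting both data sets to their shared features can be sketched with intersect() on the feature names. The matrices and the probe IDs 300_at and 400_at are made up for illustration; in this sketch the common features follow the row order of the expression data, which the documentation above does not specify for integrateData itself.

```r
# Made-up data: each matrix has one feature the other lacks
exprMat <- matrix(1:6, nrow = 3,
                  dimnames = list(c("100_at", "200_at", "300_at"), NULL))
chipMat <- matrix(7:12, nrow = 3,
                  dimnames = list(c("200_at", "100_at", "400_at"), NULL))

# Keep only features present in both, aligned by name
common  <- intersect(rownames(exprMat), rownames(chipMat))
exprMat <- exprMat[common, , drop = FALSE]
chipMat <- chipMat[common, , drop = FALSE]
```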

Value

A matrix with five columns. The first four columns hold the (average) expression values and the (average) ChIP-seq values for each of the two conditions; the fifth column holds the correlation score. The row names are the feature names common to expr and chipseq.
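The returned matrix can be filtered and ranked with ordinary matrix indexing. So that this sketch runs without epigenomix, res is rebuilt by hand from the matrix printed in the Results section, with the column names copied from that output (they reflect the level names "treated" and "control" used there). In practice, res would be the return value of integrateData().

```r
# Rebuilt from the printed Results; normally res <- integrateData(...)
res <- matrix(c(11.5, 10.5, 5.0, 11.5, 20, 22, 10, 20, 65 / 30, -2 / 30),
              nrow = 2,
              dimnames = list(c("100_at", "200_at"),
                              c("expr_treated", "expr_control",
                                "chipseq_treated", "chipseq_control", "z")))

# Features whose expression and ChIP-seq changes point in the same direction
concordant <- res[res[, "z"] > 0, , drop = FALSE]

# All features ranked by the magnitude of the score
ranked <- res[order(abs(res[, "z"]), decreasing = TRUE), , drop = FALSE]
```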

Author(s)

Hans-Ulrich Klein (h.klein@uni-muenster.de)

See Also

summarizeReads, normalizeChIP

Examples

library(epigenomix)  # also attaches Biobase, IRanges, GenomicRanges and S4Vectors

# Gene expression: two probes, two control and two treated arrays
ge <- matrix(c(5,12,5,11,11,10,12,11), nrow=2)
row.names(ge) <- c("100_at", "200_at")
colnames(ge) <- c("c1", "c2", "t1", "t2")
geDf <- data.frame(status=factor(c("control", "control", "treated", "treated")),
  row.names=colnames(ge))
eSet <- ExpressionSet(ge, phenoData=AnnotatedDataFrame(geDf))

# ChIP-seq: one control and one treated sample for the same two features
chip <- matrix(c(10,20,20,22), nrow=2)
row.names(chip) <- c("100_at", "200_at")
colnames(chip) <- c("c", "t")
rowRanges <- GRanges(seqnames=c("1","1"), ranges=IRanges(start=c(10,50), end=c(20,60)))
names(rowRanges) <- c("100_at", "200_at")
chipDf <- DataFrame(status=factor(c("control", "treated")),
  totalCount=c(100, 100),
  row.names=colnames(chip))
cSet <- ChIPseqSet(chipVals=chip, rowRanges=rowRanges, colData=chipDf)

# Score each common feature, using "control" as the reference level
integrateData(eSet, cSet, factor="status", reference="control")

Results


R version 3.3.1 (2016-06-21) -- "Bug in Your Hair"

> library(epigenomix)
Loading required package: Biobase
Loading required package: BiocGenerics
Loading required package: parallel

Loading required package: S4Vectors
Loading required package: stats4

Loading required package: IRanges
Loading required package: GenomicRanges
Loading required package: GenomeInfoDb
Loading required package: SummarizedExperiment
> integrateData(eSet, cSet, factor="status", reference="control")
       expr_treated expr_control chipseq_treated chipseq_control           z
100_at         11.5          5.0              20              10  2.16666667
200_at         10.5         11.5              22              20 -0.06666667