Find ChIP-enriched regions on smoothed ExpressionSet


Description

Given an ExpressionSet of smoothed probe intensities, an environment with the mapping of probes to chromosomes, and a vector of thresholds for calling genomic sites enriched, this function finds the 'chers' (ChIP-enriched regions), which consist of enriched genomic positions with probes mapped to them. 'Adjacent' enriched positions are condensed into a single Cher.


Usage

findChersOnSmoothed(smoothedX, probeAnno, thresholds, allChr = NULL,
   distCutOff = 600, minProbesInRow = 3, cellType = NULL,
   antibodyColumn = NULL, checkUnique = TRUE, uniqueCodes = c(0),
   verbose = TRUE)



Arguments

smoothedX
Object of class ExpressionSet holding the smoothed probe intensities, e.g. the result of function computeRunningMedians.

probeAnno
environment containing the probe to genome mapping.

thresholds
numeric vector of thresholds above which smoothed probe intensities are considered to correspond to enriched probes. The vector has to be of length equal to the number of samples in smoothedX, with a single threshold for each sample.

allChr
character vector of all chromosomes on which enriched regions are sought. Every chromosome here has to have probes mapped to it in the probeAnno environment. By default (NULL) the chromosomeNames of the probeAnno object are used.

distCutOff
integer; maximum distance in base pairs between enriched probes for them to be condensed into one Cher.

minProbesInRow
integer; minimum number of enriched probes required for a Cher; see details for further explanation.

cellType
character; name of the cell type the data come from. It is either (a) of length one, naming the column of pData(smoothedX) that holds the cell type; (b) of length one, giving the common cell type of all samples in the ExpressionSet; or (c) of length equal to ncol(smoothedX), specifying the cell type of each sample individually.

antibodyColumn
the name or number of the column of pData(smoothedX) that holds the description of the antibody used for each sample. This information is used to annotate the ChIP-enriched regions found. If NULL (default), the sampleNames of smoothedX are used.

checkUnique
logical; indicates whether the uniqueness indicator of probe matches from the probeAnno environment should be used.

uniqueCodes
numeric; which numeric codes in the chromosome-wise match-uniqueness elements of the probeAnno environment indicate uniqueness?

verbose
logical; extended output to STDOUT?
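The three accepted forms of cellType can be told apart roughly as in the following sketch. It is only an illustration in base R: resolveCellType is a hypothetical helper, not a function of the package, a plain data.frame stands in for pData(smoothedX), and the order in which the cases are checked is an assumption rather than the package's documented behaviour. The NULL default is not handled here.

```r
# Hypothetical helper (not part of the package): expand 'cellType' to one
# value per sample. 'pheno' stands in for pData(smoothedX).
resolveCellType <- function(cellType, pheno) {
  n <- nrow(pheno)
  # (a) a single string that names a column of the phenotype data
  if (length(cellType) == 1L && cellType %in% colnames(pheno))
    return(as.character(pheno[[cellType]]))
  # (b) a single string used as the common cell type of all samples
  if (length(cellType) == 1L)
    return(rep(as.character(cellType), n))
  # (c) one cell type per sample
  if (length(cellType) == n)
    return(as.character(cellType))
  stop("'cellType' must be of length one or of length equal to the number of samples")
}

pheno <- data.frame(tissue = c("liver", "brain"), stringsAsFactors = FALSE)
resolveCellType("tissue", pheno)           # taken from the column 'tissue'
resolveCellType("human", pheno)            # same cell type for every sample
resolveCellType(c("HeLa", "K562"), pheno)  # given per sample
```

Checking for a matching column name first means that a cell type which happens to share its name with a phenotype column is read as form (a); this is a choice of the sketch.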


Details

Specifying a minimum number of probes for a Cher (argument minProbesInRow) guarantees that a Cher is supported by a reasonable number of measurements in probe-sparse regions. For example, if there is only one enriched probe within a certain genomic 1kb region and no other probes could be mapped to that region, this single probe arguably does not provide enough evidence for calling this genomic region enriched.
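How thresholds, distCutOff and minProbesInRow interact can be illustrated with a minimal base-R sketch for a single sample on a single chromosome. callRegions is a hypothetical helper written for this illustration and is not the package's implementation; in particular, it assumes that only the distance between consecutive enriched probes decides whether they are merged, and it ignores any non-enriched probes lying in between.

```r
# Hypothetical helper (not part of the package): condense enriched probe
# positions into regions on one chromosome for one sample.
# pos:   genomic positions of the probes
# score: smoothed intensity of each probe
callRegions <- function(pos, score, threshold,
                        distCutOff = 600, minProbesInRow = 3) {
  ord <- order(pos)
  pos <- pos[ord]
  score <- score[ord]
  enriched <- which(!is.na(score) & score > threshold)
  if (length(enriched) == 0L)
    return(data.frame(start = numeric(0), end = numeric(0),
                      nProbes = integer(0), maxLevel = numeric(0)))
  ePos <- pos[enriched]
  eScore <- score[enriched]
  # open a new region whenever the gap to the previous enriched probe
  # is larger than distCutOff
  regionId <- cumsum(c(TRUE, diff(ePos) > distCutOff))
  regions <- data.frame(
    start    = as.vector(tapply(ePos, regionId, min)),
    end      = as.vector(tapply(ePos, regionId, max)),
    nProbes  = as.vector(tapply(ePos, regionId, length)),
    maxLevel = as.vector(tapply(eScore, regionId, max)))
  # drop regions supported by too few enriched probes
  regions[regions$nProbes >= minProbesInRow, , drop = FALSE]
}

pos   <- c(100, 300, 500, 700, 2000, 2200, 5000, 5100, 5300)
score <- c(0.9, 1.2, 0.8, 0.1, 0.7, 0.6, 1.0, 1.5, 0.9)
regions <- callRegions(pos, score, threshold = 0.45)
regions
```

In this toy input the two enriched probes at positions 2000 and 2200 lie within distCutOff of each other but are discarded, because two probes fall short of minProbesInRow = 3.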


Value

A list of class cherList, holding objects of class cher that were found on the supplied data.


Author(s)

Joern Toedling

See Also

cherByThreshold, computeRunningMedians, relateChers


Examples

  exDir <- system.file("exData",package="Ringo")
  # load the example data used below (exampleX, exProbeAnno, exGFF)
  load(file.path(exDir,"exampleProbeAnno.rda"))
  load(file.path(exDir,"exampleX.rda"))
  smoothX <- computeRunningMedians(exampleX, probeAnno=exProbeAnno,
       modColumn = "Cy5", allChr = "9", winHalfSize = 400)
  chersX <- findChersOnSmoothed(smoothX, probeAnno=exProbeAnno,
       thresholds=0.45, allChr="9", distCutOff=600, cellType="human")
  if (interactive())
    plot(chersX[[1]], smoothX, probeAnno=exProbeAnno, gff=exGFF)
  chersX <- relateChers(chersX, exGFF)


>   exDir <- system.file("exData",package="Ringo")
>   load(file.path(exDir,"exampleProbeAnno.rda"))
>   load(file.path(exDir,"exampleX.rda"))
>   smoothX <- computeRunningMedians(exampleX, probeAnno=exProbeAnno,
+        modColumn = "Cy5", allChr = "9", winHalfSize = 400)

Chromosome 9 ...
Suz12_vs_total ... 
Construction result ExpressionSet...Done.
>   chersX <- findChersOnSmoothed(smoothX, probeAnno=exProbeAnno,
+        thresholds=0.45, allChr="9", distCutOff=600, cellType="human")

Sample: ...

Chr: 9 ...
> #  if (interactive())
>     plot(chersX[[1]], smoothX, probeAnno=exProbeAnno, gff=exGFF)
>   chersX <- relateChers(chersX, exGFF)
Relating 2 ChIP-enriched regions to GFF:
                                name chr    start      end cellType
1   9 34318954 34319928    human
2   9 34579010 34582430    human
1 ENST00000379158 ENST00000379154 ENST00000379155 ENST00000346365 ENST00000337747
2                                                 ENST00000378980 ENST00000351266
  maxLevel     score
1 1.995891  69.47276
2 1.534150 104.44638
