Last data update: 2014.03.03

avgByBin    R Documentation

Aggregates data by genomic bins

Description

Computes the mean value of binned data. This function assumes that all elements in featureData have identical width. If the elements have disparate widths, the average is not weighted by width. This behaviour may change in future versions of IdeoViz.
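
To make the unweighted-average caveat concrete, the following base R sketch contrasts a plain mean with a width-weighted mean for features of unequal width. It does not call IdeoViz; the toy data frame `feat` and its values are hypothetical and chosen only for illustration.

```r
# Hypothetical toy data: three features in one bin, with unequal widths.
feat <- data.frame(
  start = c(1, 101, 201),
  end   = c(100, 200, 1200),  # the third feature is ten times wider
  value = c(2, 4, 10)
)
feat$width <- feat$end - feat$start + 1  # one-based, closed intervals

# Plain mean: every feature counts equally, regardless of its width.
unweighted <- mean(feat$value)

# Width-weighted mean: wide features contribute proportionally more.
weighted <- weighted.mean(feat$value, w = feat$width)

print(c(unweighted = unweighted, weighted = weighted))
```

When all widths are identical the two values coincide, which is the case the function assumes.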

Usage

avgByBin(xpr, featureData, target_GR, justReturnBins = FALSE, 
    getBinCountOnly = FALSE, FUN = mean, doSampleCor = FALSE, 
    verbose = FALSE)

Arguments

xpr

(data.frame or matrix) Locus-wise values. Rows correspond to genomic intervals (probes, genes, etc.), while columns correspond to individual samples.

featureData

(data.frame or GRanges) Locus coordinates. Row order must match xpr. Column order should be: 1. chrom, 2. locus start, 3. locus end. All elements are assumed to be of identical width. Coordinates must be zero-based or one-based, but not half-open. Coordinate system must match that of target_GR.

target_GR

(GRanges) Target intervals, with coordinate system matching that of featureData.

justReturnBins

(logical) when TRUE, returns the coordinates of the bin to which each row belongs. Does not aggregate data in any way. This output can be used as input for more complex functions with data from each bin.

getBinCountOnly

(logical) when TRUE, does not aggregate data and does not expect xpr. Only returns the number of overlapping subject ranges per bin. Speeds up computation.

FUN

(function) function to aggregate data in bin

doSampleCor

(logical) set to TRUE to compute mean pairwise sample correlation (Pearson correlation) for each bin; when TRUE, this function overrides FUN.

verbose

(logical) print status messages

Details

This function bins data that has not already been binned, and is one step in preparing data for plotOnIdeo(). Within each sample, it computes the average of all probes that overlap the same target range. A probe that overlaps multiple bins is counted in every one of them.
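
The binning logic described above can be sketched in base R as follows. This is an illustration under stated assumptions, not the package's implementation: it handles a single chromosome, uses one-based closed intervals, and all object names (`bins`, `feat`, `xpr`, `pairs`, `flat`, `binned`) and values are hypothetical.

```r
# Hypothetical toy inputs on one chromosome (one-based, closed intervals).
bins <- data.frame(start = c(1, 101, 201), end = c(100, 200, 300))
feat <- data.frame(start = c(10, 90, 150, 400), end = c(20, 110, 160, 410))
xpr  <- data.frame(s1 = c(1, 3, 5, 7), s2 = c(2, 4, 6, 8))

# Enumerate every [bin, feature] pair and keep the overlapping ones.
pairs <- expand.grid(bin = seq_len(nrow(bins)), feature = seq_len(nrow(feat)))
hit <- feat$start[pairs$feature] <= bins$end[pairs$bin] &
  feat$end[pairs$feature] >= bins$start[pairs$bin]
pairs <- pairs[hit, ]
pairs <- pairs[order(pairs$bin, pairs$feature), ]

# "Flatten" the data: one row per overlapping pair, so a feature that
# spans two bins appears twice.
flat <- xpr[pairs$feature, , drop = FALSE]

# Aggregate each sample within each bin; empty bins drop out of the result.
binned <- aggregate(flat, by = list(bin = pairs$bin), FUN = mean)
binned$bin_count <- as.vector(table(pairs$bin))

print(binned)
```

Here the second feature (90 to 110) straddles the first two bins and contributes to both averages, while the third bin is empty and is absent from `binned`.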

Value

(GRanges) Binned data or binning statistics; information is returned for non-empty bins only. By default, this function returns binned data; alternatively, if justReturnBins=TRUE or getBinCountOnly=TRUE, it returns statistics on bin counts. The latter may be useful for plotting the spatial density of the input metric.
The flags and output types are presented in order of evaluation precedence:

  1. If getBinCountOnly=TRUE, returns a list with a single entry: bin_ID: (data.frame) bin information: chrom, start, end, width, strand, index, and count. "index" is the row number of target_GR to which this bin corresponds

  2. If justReturnBins=TRUE and getBinCountOnly=FALSE, returns a list with three entries:

    1. bin_ID: same as bin_ID in output 1 above

    2. xpr: (data.frame) B rows by n columns, where B is the total number of [target_GR, featureData] overlaps (see next entry, binmap_idx) and n is the number of columns in xpr; column order matches xpr. Contains sample-wise data "flattened" so that each [target, subject] pair is presented. More formally, entry [i,j] contains the expression value for the overlap in row i of binmap_idx for sample j (where 1 <= i <= B, 1 <= j <= n)

    3. binmap_idx: (matrix) two-column matrix: 1) target_GR row, 2) row of featureData that overlaps the target_GR row in column 1 (matrix output of GenomicRanges::findOverlaps())

  3. Default: If justReturnBins=FALSE and getBinCountOnly=FALSE, returns a GRanges object. Results are contained in the elementMetadata slot. For a dataset with n samples, the table would have (n+1) columns; the first column is bin_count, and indicates number of units contained in that bin. Columns (2:(n+1)) contain binned values for each sample in column order corresponding to that of xpr.
    For doSampleCor=TRUE, result is in a metadata column with name "mean_pairwise"cor". Bins with a single datapoint per sample get a value of NA.
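
As an illustration of what a "mean pairwise sample correlation" for one bin could look like, here is a minimal base R sketch. It is one plausible reading of the description, not the package's code: the helper `pairwise_cor_mean` and the toy matrix `bin_vals` are hypothetical, and the choice to count each sample pair once is an assumption.

```r
# Hypothetical helper: mean of all pairwise Pearson correlations between
# sample columns. `m` holds the features of one bin (rows) by sample (columns).
pairwise_cor_mean <- function(m) {
  m <- as.matrix(m)
  # With a single datapoint per sample, correlation is undefined.
  if (nrow(m) < 2 || ncol(m) < 2) return(NA_real_)
  cc <- cor(m, method = "pearson")
  # Upper triangle only: each sample pair once, diagonal excluded.
  mean(cc[upper.tri(cc)])
}

# Hypothetical bin with four features and three samples.
bin_vals <- cbind(s1 = c(1, 2, 3, 4), s2 = c(2, 4, 6, 8), s3 = c(4, 3, 2, 1))

full_bin   <- pairwise_cor_mean(bin_vals)
single_row <- pairwise_cor_mean(bin_vals[1, , drop = FALSE])
print(c(full_bin = full_bin, single_row = single_row))
```

In this toy bin, s1 and s2 are perfectly correlated and s3 is perfectly anti-correlated with both, so the three pairwise values are 1, -1 and -1. The single-row case returns NA, consistent with the note above about bins with a single datapoint per sample.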

Author(s)

Shraddha Pai <Shraddha.Pai@camh.ca>, Jingliang Ren

See Also

getIdeo(), getBins()

Examples

library(IdeoViz)

# getIdeo() fetches the ideogram table from UCSC, so it needs network access
ideo_hg19 <- getIdeo("hg19")
data(GSM733664_broadPeaks)
chrom_bins <- getBins(c("chr1","chr2","chrX"), ideo_hg19, stepSize=5*100*1000)

# default binning
mean_peak <- avgByBin(data.frame(value=GSM733664_broadPeaks[,7]),
    GSM733664_broadPeaks[,1:3], chrom_bins)

# custom function
median_peak <- avgByBin(data.frame(value=GSM733664_broadPeaks[,7]),
    GSM733664_broadPeaks[,1:3], chrom_bins, FUN=median)

# mean pairwise sample correlation
data(binned_multiSeries)
bins2 <- getBins(c("chr1"), ideo_hg19, stepSize=5e6)
samplecor <- avgByBin(mcols(binned_multiSeries)[,1:3], binned_multiSeries,
    bins2, doSampleCor=TRUE)

# just get bin count
binstats <- avgByBin(data.frame(value=GSM733664_broadPeaks[,7]),
    GSM733664_broadPeaks[,1:3], chrom_bins, getBinCountOnly=TRUE)

Results


R version 3.3.1 (2016-06-21) -- "Bug in Your Hair"
Copyright (C) 2016 The R Foundation for Statistical Computing
Platform: x86_64-pc-linux-gnu (64-bit)

> library(IdeoViz)
Loading required package: Biobase
Loading required package: BiocGenerics
Loading required package: parallel

Attaching package: 'BiocGenerics'

The following objects are masked from 'package:parallel':

    clusterApply, clusterApplyLB, clusterCall, clusterEvalQ,
    clusterExport, clusterMap, parApply, parCapply, parLapply,
    parLapplyLB, parRapply, parSapply, parSapplyLB

The following objects are masked from 'package:stats':

    IQR, mad, xtabs

The following objects are masked from 'package:base':

    Filter, Find, Map, Position, Reduce, anyDuplicated, append,
    as.data.frame, cbind, colnames, do.call, duplicated, eval, evalq,
    get, grep, grepl, intersect, is.unsorted, lapply, lengths, mapply,
    match, mget, order, paste, pmax, pmax.int, pmin, pmin.int, rank,
    rbind, rownames, sapply, setdiff, sort, table, tapply, union,
    unique, unsplit

Welcome to Bioconductor

    Vignettes contain introductory material; view with
    'browseVignettes()'. To cite Bioconductor, see
    'citation("Biobase")', and for packages 'citation("pkgname")'.

Loading required package: IRanges
Loading required package: S4Vectors
Loading required package: stats4

Attaching package: 'S4Vectors'

The following objects are masked from 'package:base':

    colMeans, colSums, expand.grid, rowMeans, rowSums

Loading required package: GenomicRanges
Loading required package: GenomeInfoDb
Loading required package: RColorBrewer
Loading required package: rtracklayer
> png(filename="/home/ddbj/snapshot/RGM3/R_BC/result/IdeoViz/avgByBin.Rd_%03d_medium.png", width=480, height=480)
> ### Name: avgByBin
> ### Title: Aggregates data by genomic bins
> ### Aliases: avgByBin
> 
> ### ** Examples
> 
> ideo_hg19 <- getIdeo("hg19")
Error in `genome<-`(`*tmp*`, value = "hg19") : 
  Failed to set session genome to 'hg19'
Calls: getIdeo -> genome<- -> genome<-
Execution halted