Last data update: 2014.03.03

R: Visualize the mean-variance relationship in DGE data using...
dglmStdResidR Documentation

Visualize the mean-variance relationship in DGE data using standardized residuals

Description

Appropriate modelling of the mean-variance relationship in DGE data is important for making inferences about differential expression. However, the standard approach to visualizing the mean-variance relationship is not appropriate for general, complicated experimental designs that require generalized linear models (GLMs) for analysis. Here are functions to compute standardized residuals from a Poisson GLM and plot them for bins based on overall expression level of genes as a way to visualize the mean-variance relationship. A rough estimate of the dispersion parameter can also be obtained from the standardized residuals.

Usage

dglmStdResid(y, design, dispersion=0, offset=0, nbins=100, make.plot=TRUE,
          xlab="Mean", ylab="Ave. binned standardized residual", ...)
getDispersions(binned.object)

Arguments

y

numeric matrix of counts, each row represents one genes, each column represents one DGE library.

design

numeric matrix giving the design matrix of the GLM. Assumed to be full column rank.

dispersion

numeric scalar or vector giving the dispersion parameter for each GLM. Can be a scalar giving one value for all genes, or a vector of length equal to the number of genes giving genewise dispersions.

offset

numeric vector or matrix giving the offset that is to be included in teh log-linear model predictor. Can be a vector of length equal to the number of libraries, or a matrix of the same size as y.

nbins

scalar giving the number of bins (formed by using the quantiles of the genewise mean expression levels) for which to compute average means and variances for exploring the mean-variance relationship. Default is 100 bins

make.plot

logical, whether or not to plot the mean standardized residual for binned data (binned on expression level). Provides a visualization of the mean-variance relationship. Default is TRUE.

xlab

character string giving the label for the x-axis. Standard graphical parameter. If left as the default, then the x-axis label will be set to "Mean".

ylab

character string giving the label for the y-axis. Standard graphical parameter. If left as the default, then the y-axis label will be set to "Ave. binned standardized residual".

...

further arguments passed on to plot

binned.object

list object, which is the output of dglmStdResid.

Details

This function is useful for exploring the mean-variance relationship in the data. Raw or pooled variances cannot be used for complex experimental designs, so instead we can fit a Poisson model using the appropriate design matrix to each gene and use the standardized residuals in place of the pooled variance (as in plotMeanVar) to visualize the mean-variance relationship in the data. The function will plot the average standardized residual for observations split into nbins bins by overall expression level. This provides a useful summary of how the variance of the counts change with respect to average expression level (abundance). A line showing the Poisson mean-variance relationship (mean equals variance) is always shown to illustrate how the genewise variances may differ from a Poisson mean-variance relationship. A log-log scale is used for the plot.

The function mglmLS is used to fit the Poisson models to the data. This code is fast for fitting models, but does not compute the value for the leverage, technically required to compute the standardized residuals. Here, we approximate the standardized residuals by replacing the usual denominator of ( 1 - leverage ) by ( 1 - p/n ) , where n is the number of observations per gene (i.e. number of libraries) and p is the number of parameters in the model (i.e. number of columns in the full-rank design matrix.

Value

dglmStdResid produces a mean-variance plot based on standardized residuals from a Poisson model fit for each gene for the DGE data. dglmStdResid returns a list with the following elements:

ave.means

vector of the average expression level within each bin of observations

ave.std.resid

vector of the average standardized Poisson residual within each bin of genes

bin.means

list containing the average (mean) expression level (given by the fitted value from the given Poisson model) for observations divided into bins based on amount of expression

bin.std.resid

list containing the standardized residual from the given Poisson model for observations divided into bins based on amount of expression

means

vector giving the fitted value for each observed count

standardized.residuals

vector giving approximate standardized residual for each observed count

bins

list containing the indices for the observations, assigning them to bins

nbins

scalar giving the number of bins used to split up the observed counts

ngenes

scalar giving the number of genes in the dataset

nlibs

scalar giving the number of libraries in the dataset

getDispersions computes the dispersion from the standardized residuals and returns a list with the following components:

bin.dispersion

vector giving the estimated dispersion value for each bin of observed counts, computed using the average standardized residual for the bin

bin.dispersion.used

vector giving the actual estimated dispersion value to be used. Some computed dispersions using the method in this function can be negative, which is not allowed. We use the dispersion value from the nearest bin of higher expression level with positive dispersion value in place of any negative dispersions.

dispersion

vector giving the estimated dispersion for each observation, using the binned dispersion estimates from above, so that all of the observations in a given bin get the same dispersion value.

Author(s)

Davis McCarthy

See Also

plotMeanVar, plotMDS.DGEList, plotSmear and maPlot provide more ways of visualizing DGE data.

Examples

y <- matrix(rnbinom(1000,mu=10,size=2),ncol=4)
design <- model.matrix(~c(0,0,1,1)+c(0,1,0,1))
binned <- dglmStdResid(y, design, dispersion=0.5)

getDispersions(binned)$bin.dispersion.used # Look at the estimated dispersions for the bins

Results


R version 3.3.1 (2016-06-21) -- "Bug in Your Hair"
Copyright (C) 2016 The R Foundation for Statistical Computing
Platform: x86_64-pc-linux-gnu (64-bit)

R is free software and comes with ABSOLUTELY NO WARRANTY.
You are welcome to redistribute it under certain conditions.
Type 'license()' or 'licence()' for distribution details.

R is a collaborative project with many contributors.
Type 'contributors()' for more information and
'citation()' on how to cite R or R packages in publications.

Type 'demo()' for some demos, 'help()' for on-line help, or
'help.start()' for an HTML browser interface to help.
Type 'q()' to quit R.

> library(edgeR)
Loading required package: limma
> png(filename="/home/ddbj/snapshot/RGM3/R_BC/result/edgeR/dglmStdResid.Rd_%03d_medium.png", width=480, height=480)
> ### Name: dglmStdResid
> ### Title: Visualize the mean-variance relationship in DGE data using
> ###   standardized residuals
> ### Aliases: dglmStdResid getDispersions
> ### Keywords: algebra
> 
> ### ** Examples
> 
> y <- matrix(rnbinom(1000,mu=10,size=2),ncol=4)
> design <- model.matrix(~c(0,0,1,1)+c(0,1,0,1))
> binned <- dglmStdResid(y, design, dispersion=0.5)
Warning messages:
1: In min(whichbin[bin.dispersion > 0 & whichbin > i]) :
  no non-missing arguments to min; returning Inf
2: In min(whichbin[bin.dispersion > 0 & whichbin > i]) :
  no non-missing arguments to min; returning Inf
3: In min(whichbin[bin.dispersion > 0 & whichbin > i]) :
  no non-missing arguments to min; returning Inf
> 
> getDispersions(binned)$bin.dispersion.used # Look at the estimated dispersions for the bins
  [1] 19.98938394  4.71307096  5.49563912  6.71805942  3.95901365  1.04682204
  [7]  1.80670537  1.69478598  1.89507004  1.89507004  0.21216461  0.94434655
 [13]  0.63858948  0.08535751  1.18406655  0.98930931  1.20819231  0.68686045
 [19]  0.67565325  1.46138704  0.99853412  0.84708629  0.83045598  0.78656630
 [25]  0.31943144  0.83152857  0.70377808  0.70435973  0.91166486  0.42157823
 [31]  1.00138447  0.41867679  0.80476463  0.29012345  0.83032177  0.68802918
 [37]  0.87516781  0.96865590  0.22635772  0.81333056  0.62841499  0.99145433
 [43]  0.29392226  0.77038925  1.07763105  0.38975102  0.50736420  0.06660492
 [49]  0.30582054  0.43739748  0.23135342  0.53415871  0.19718633  0.52737767
 [55]  0.17068176  0.07264911  0.47442999  0.51051632  0.27248010  0.40858051
 [61]  0.42702424  0.57233907  0.02707696  0.48042543  0.26078193  0.16871531
 [67]  0.24054954  0.31986116  0.76014445  0.68407710  0.30682413  0.41049085
 [73]  0.52939647  0.27207018  0.38458964  0.22810621  0.13818935  0.15067414
 [79]  0.64545962  0.31527269  0.42389239  0.07207780  0.07258533  0.12262029
 [85]  0.15538191  0.32729914  0.11175629  0.07781707  0.12519266  0.42521322
 [91]  0.02239131  0.07901273  0.24908874  0.01247387  0.13166216  0.09142464
 [97]  0.02804699          NA          NA          NA
Warning messages:
1: In min(whichbin[bin.dispersion > 0 & whichbin > i]) :
  no non-missing arguments to min; returning Inf
2: In min(whichbin[bin.dispersion > 0 & whichbin > i]) :
  no non-missing arguments to min; returning Inf
3: In min(whichbin[bin.dispersion > 0 & whichbin > i]) :
  no non-missing arguments to min; returning Inf
> 
> 
> 
> 
> 
> 
> dev.off()
null device 
          1 
>