R Graphical Manual

Browse All

Last data update: 2014.03.03

R: Visualize the mean-variance relationship in DGE data using...

dglmStdResid

R Documentation

Visualize the mean-variance relationship in DGE data using standardized residuals

Description

Appropriate modelling of the mean-variance relationship in DGE data is important for making inferences about differential expression. However, the standard approach to visualizing the mean-variance relationship is not appropriate for general, complicated experimental designs that require generalized linear models (GLMs) for analysis. Here are functions to compute standardized residuals from a Poisson GLM and plot them for bins based on overall expression level of genes as a way to visualize the mean-variance relationship. A rough estimate of the dispersion parameter can also be obtained from the standardized residuals.

Usage

dglmStdResid(y, design, dispersion=0, offset=0, nbins=100, make.plot=TRUE,
          xlab="Mean", ylab="Ave. binned standardized residual", ...)
getDispersions(binned.object)

Arguments

`y`	numeric matrix of counts, each row represents one genes, each column represents one DGE library.
`design`	numeric matrix giving the design matrix of the GLM. Assumed to be full column rank.
`dispersion`	numeric scalar or vector giving the dispersion parameter for each GLM. Can be a scalar giving one value for all genes, or a vector of length equal to the number of genes giving genewise dispersions.
`offset`	numeric vector or matrix giving the offset that is to be included in teh log-linear model predictor. Can be a vector of length equal to the number of libraries, or a matrix of the same size as `y`.
`nbins`	scalar giving the number of bins (formed by using the quantiles of the genewise mean expression levels) for which to compute average means and variances for exploring the mean-variance relationship. Default is `100` bins
`make.plot`	logical, whether or not to plot the mean standardized residual for binned data (binned on expression level). Provides a visualization of the mean-variance relationship. Default is `TRUE`.
`xlab`	character string giving the label for the x-axis. Standard graphical parameter. If left as the default, then the x-axis label will be set to "Mean".
`ylab`	character string giving the label for the y-axis. Standard graphical parameter. If left as the default, then the y-axis label will be set to "Ave. binned standardized residual".
`...`	further arguments passed on to `plot`
`binned.object`	list object, which is the output of `dglmStdResid`.

Details

This function is useful for exploring the mean-variance relationship in the data. Raw or pooled variances cannot be used for complex experimental designs, so instead we can fit a Poisson model using the appropriate design matrix to each gene and use the standardized residuals in place of the pooled variance (as in plotMeanVar) to visualize the mean-variance relationship in the data. The function will plot the average standardized residual for observations split into nbins bins by overall expression level. This provides a useful summary of how the variance of the counts change with respect to average expression level (abundance). A line showing the Poisson mean-variance relationship (mean equals variance) is always shown to illustrate how the genewise variances may differ from a Poisson mean-variance relationship. A log-log scale is used for the plot.

The function mglmLS is used to fit the Poisson models to the data. This code is fast for fitting models, but does not compute the value for the leverage, technically required to compute the standardized residuals. Here, we approximate the standardized residuals by replacing the usual denominator of ( 1 - leverage ) by ( 1 - p/n ) , where n is the number of observations per gene (i.e. number of libraries) and p is the number of parameters in the model (i.e. number of columns in the full-rank design matrix.

Value

dglmStdResid produces a mean-variance plot based on standardized residuals from a Poisson model fit for each gene for the DGE data. dglmStdResid returns a list with the following elements:

`ave.means`	vector of the average expression level within each bin of observations
`ave.std.resid`	vector of the average standardized Poisson residual within each bin of genes
`bin.means`	list containing the average (mean) expression level (given by the fitted value from the given Poisson model) for observations divided into bins based on amount of expression
`bin.std.resid`	list containing the standardized residual from the given Poisson model for observations divided into bins based on amount of expression
`means`	vector giving the fitted value for each observed count
`standardized.residuals`	vector giving approximate standardized residual for each observed count
`bins`	list containing the indices for the observations, assigning them to bins
`nbins`	scalar giving the number of bins used to split up the observed counts
`ngenes`	scalar giving the number of genes in the dataset
`nlibs`	scalar giving the number of libraries in the dataset

getDispersions computes the dispersion from the standardized residuals and returns a list with the following components:

`bin.dispersion`	vector giving the estimated dispersion value for each bin of observed counts, computed using the average standardized residual for the bin
`bin.dispersion.used`	vector giving the actual estimated dispersion value to be used. Some computed dispersions using the method in this function can be negative, which is not allowed. We use the dispersion value from the nearest bin of higher expression level with positive dispersion value in place of any negative dispersions.
`dispersion`	vector giving the estimated dispersion for each observation, using the binned dispersion estimates from above, so that all of the observations in a given bin get the same dispersion value.

Author(s)

Davis McCarthy

Examples

y <- matrix(rnbinom(1000,mu=10,size=2),ncol=4)
design <- model.matrix(~c(0,0,1,1)+c(0,1,0,1))
binned <- dglmStdResid(y, design, dispersion=0.5)

getDispersions(binned)$bin.dispersion.used # Look at the estimated dispersions for the bins

Results


R version 3.3.1 (2016-06-21) -- "Bug in Your Hair"
Copyright (C) 2016 The R Foundation for Statistical Computing
Platform: x86_64-pc-linux-gnu (64-bit)

R is free software and comes with ABSOLUTELY NO WARRANTY.
You are welcome to redistribute it under certain conditions.
Type 'license()' or 'licence()' for distribution details.

R is a collaborative project with many contributors.
Type 'contributors()' for more information and
'citation()' on how to cite R or R packages in publications.

Type 'demo()' for some demos, 'help()' for on-line help, or
'help.start()' for an HTML browser interface to help.
Type 'q()' to quit R.

> library(edgeR)
Loading required package: limma
> png(filename="/home/ddbj/snapshot/RGM3/R_BC/result/edgeR/dglmStdResid.Rd_%03d_medium.png", width=480, height=480)
> ### Name: dglmStdResid
> ### Title: Visualize the mean-variance relationship in DGE data using
> ###   standardized residuals
> ### Aliases: dglmStdResid getDispersions
> ### Keywords: algebra
> 
> ### ** Examples
> 
> y <- matrix(rnbinom(1000,mu=10,size=2),ncol=4)
> design <- model.matrix(~c(0,0,1,1)+c(0,1,0,1))
> binned <- dglmStdResid(y, design, dispersion=0.5)
Warning messages:
1: In min(whichbin[bin.dispersion > 0 & whichbin > i]) :
  no non-missing arguments to min; returning Inf
2: In min(whichbin[bin.dispersion > 0 & whichbin > i]) :
  no non-missing arguments to min; returning Inf
3: In min(whichbin[bin.dispersion > 0 & whichbin > i]) :
  no non-missing arguments to min; returning Inf
> 
> getDispersions(binned)$bin.dispersion.used # Look at the estimated dispersions for the bins
  [1] 19.98938394  4.71307096  5.49563912  6.71805942  3.95901365  1.04682204
  [7]  1.80670537  1.69478598  1.89507004  1.89507004  0.21216461  0.94434655
 [13]  0.63858948  0.08535751  1.18406655  0.98930931  1.20819231  0.68686045
 [19]  0.67565325  1.46138704  0.99853412  0.84708629  0.83045598  0.78656630
 [25]  0.31943144  0.83152857  0.70377808  0.70435973  0.91166486  0.42157823
 [31]  1.00138447  0.41867679  0.80476463  0.29012345  0.83032177  0.68802918
 [37]  0.87516781  0.96865590  0.22635772  0.81333056  0.62841499  0.99145433
 [43]  0.29392226  0.77038925  1.07763105  0.38975102  0.50736420  0.06660492
 [49]  0.30582054  0.43739748  0.23135342  0.53415871  0.19718633  0.52737767
 [55]  0.17068176  0.07264911  0.47442999  0.51051632  0.27248010  0.40858051
 [61]  0.42702424  0.57233907  0.02707696  0.48042543  0.26078193  0.16871531
 [67]  0.24054954  0.31986116  0.76014445  0.68407710  0.30682413  0.41049085
 [73]  0.52939647  0.27207018  0.38458964  0.22810621  0.13818935  0.15067414
 [79]  0.64545962  0.31527269  0.42389239  0.07207780  0.07258533  0.12262029
 [85]  0.15538191  0.32729914  0.11175629  0.07781707  0.12519266  0.42521322
 [91]  0.02239131  0.07901273  0.24908874  0.01247387  0.13166216  0.09142464
 [97]  0.02804699          NA          NA          NA
Warning messages:
1: In min(whichbin[bin.dispersion > 0 & whichbin > i]) :
  no non-missing arguments to min; returning Inf
2: In min(whichbin[bin.dispersion > 0 & whichbin > i]) :
  no non-missing arguments to min; returning Inf
3: In min(whichbin[bin.dispersion > 0 & whichbin > i]) :
  no non-missing arguments to min; returning Inf
> 
> 
> 
> 
> 
> 
> dev.off()
null device 
          1 
>