Last data update: 2014.03.03

R: Predictive log-fold changes
predFCR Documentation

Predictive log-fold changes

Description

Computes estimated coefficients for a NB glm in such a way that the log-fold-changes are shrunk towards zero.

Usage

## S3 method for class 'DGEList'
predFC(y, design=NULL, prior.count=0.125, offset=NULL, dispersion=NULL, weights=NULL, ...)
## Default S3 method:
predFC(y, design=NULL, prior.count=0.125, offset=NULL, dispersion=0, weights=NULL, ...)

Arguments

y

a matrix of counts or a DGEList object

design

the design matrix for the experiment

prior.count

the average prior count to be added to each observation. Larger values produce more shrinkage.

offset

numeric vector or matrix giving the offset in the log-linear model predictor, as for glmFit. Usually equal to log library sizes.

dispersion

numeric vector of negative binomial dispersions.

weights

optional numeric matrix giving observation weights

...

other arguments are passed to glmFit.

Details

This function computes predictive log-fold changes (pfc) for a NB glm. The pfc are posterior Bayesian estimators of the true log-fold-changes. They are predictive of values that might be replicated in a future experiment.

Specifically the function adds a small prior count to each observation before estimating the glm. The actual prior count that is added is proportion to the library size. This has the effect that any log-fold-change that was zero prior to augmentation remains zero and non-zero log-fold-changes are shrunk towards zero.

The prior counts can be viewed as equivalent to a prior belief that the log-fold changes are small, and the output can be viewed as posterior log-fold-changes from this Bayesian viewpoint. The output coefficients are called predictive log fold-changes because, depending on the prior, they may be a better prediction of the true log fold-changes than the raw estimates.

Log-fold changes for genes with low counts are shrunk more than those for genes with high counts. In particular, infinite log-fold-changes arising from zero counts are avoided. The exact degree to which this is done depends on the negative binomail dispersion.

If design=NULL, then the function returns a matrix of the same size as y containing log2 counts-per-million, with zero values for the counts avoided. This equivalent to choosing design to be the identity matrix with the same number of columns as y.

Value

Numeric matrix of linear model coefficients (if design is given) or logCPM (if design=NULL) on the log2 scale.

Author(s)

Belinda Phipson and Gordon Smyth

References

Phipson, B. (2013). Empirical Bayes modelling of expression profiles and their associations. PhD Thesis. University of Melbourne, Australia. http://repository.unimelb.edu.au/10187/17614

See Also

glmFit, exactTest

Examples

# generate counts for a two group experiment with n=2 in each group and 100 genes
dispersion <- 0.1
y <- matrix(rnbinom(400,size=1/dispersion,mu=4),nrow=100)
y <- DGEList(y,group=c(1,1,2,2))
design <- model.matrix(~group, data=y$samples)

#estimate the predictive log fold changes
predlfc<-predFC(y,design,dispersion=dispersion,prior.count=1)
logfc <- predFC(y,design,dispersion=dispersion,prior.count=0)
logfc.truncated <- pmax(pmin(logfc,100),-100)

#plot predFC's vs logFC's
plot(predlfc[,2],logfc.truncated[,2],xlab="Predictive log fold changes",ylab="Raw log fold changes")
abline(a=0,b=1)

Results


R version 3.3.1 (2016-06-21) -- "Bug in Your Hair"
Copyright (C) 2016 The R Foundation for Statistical Computing
Platform: x86_64-pc-linux-gnu (64-bit)

R is free software and comes with ABSOLUTELY NO WARRANTY.
You are welcome to redistribute it under certain conditions.
Type 'license()' or 'licence()' for distribution details.

R is a collaborative project with many contributors.
Type 'contributors()' for more information and
'citation()' on how to cite R or R packages in publications.

Type 'demo()' for some demos, 'help()' for on-line help, or
'help.start()' for an HTML browser interface to help.
Type 'q()' to quit R.

> library(edgeR)
Loading required package: limma
> png(filename="/home/ddbj/snapshot/RGM3/R_BC/result/edgeR/predFC.Rd_%03d_medium.png", width=480, height=480)
> ### Name: predFC
> ### Title: Predictive log-fold changes
> ### Aliases: predFC predFC.DGEList predFC.default
> 
> ### ** Examples
> 
> # generate counts for a two group experiment with n=2 in each group and 100 genes
> dispersion <- 0.1
> y <- matrix(rnbinom(400,size=1/dispersion,mu=4),nrow=100)
> y <- DGEList(y,group=c(1,1,2,2))
> design <- model.matrix(~group, data=y$samples)
> 
> #estimate the predictive log fold changes
> predlfc<-predFC(y,design,dispersion=dispersion,prior.count=1)
> logfc <- predFC(y,design,dispersion=dispersion,prior.count=0)
> logfc.truncated <- pmax(pmin(logfc,100),-100)
> 
> #plot predFC's vs logFC's
> plot(predlfc[,2],logfc.truncated[,2],xlab="Predictive log fold changes",ylab="Raw log fold changes")
> abline(a=0,b=1)
> 
> 
> 
> 
> 
> dev.off()
null device 
          1 
>