Either a genes-by-samples numeric matrix or a
SeqExpressionSet object containing the read counts.
cIdx
A character, logical, or numeric vector indicating the subset of genes to be used as negative controls in the estimation of the factors of unwanted variation.
k
The number of factors of unwanted variation to be estimated from the data.
residuals
A genes-by-samples matrix of residuals obtained from a first-pass regression of the counts on the covariates of interest, usually the negative binomial deviance residuals obtained from edgeR with the residuals method.
center
If TRUE, the residuals are centered, for each gene, to have mean zero across samples.
round
If TRUE, the normalized measures are rounded to form pseudo-counts.
epsilon
A small constant (usually no larger than one) to be added to the counts prior to the log transformation to avoid problems with log(0).
tolerance
Tolerance in the selection of the number of positive singular values, i.e., a singular value must be larger than tolerance to be considered positive.
isLog
Set to TRUE if the input matrix is already log-transformed. Ignored if x is a SeqExpressionSet.
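The effect of center, epsilon, round, and tolerance can be illustrated in base R without the package. The following is a minimal sketch, not the package's code: the matrices are made-up toy values, and the tolerance value 1e-8 is an assumption chosen for illustration.

```r
# Toy genes-by-samples matrices (hypothetical values, for illustration only).
counts <- matrix(c(0, 3, 10, 25,
                   5, 0, 12, 30,
                   2, 4,  8, 20),
                 nrow = 4,
                 dimnames = list(paste0("gene", 1:4), paste0("s", 1:3)))
resid <- matrix(c(1.0, -0.5, 0.2, 2.0,
                  0.0,  0.5, 0.4, 1.0,
                  2.0,  0.0, 0.0, 0.0),
                nrow = 4, dimnames = dimnames(counts))

# center = TRUE: subtract each gene's (row's) mean, so that the residuals
# of every gene average to zero across samples.
resid_centered <- resid - rowMeans(resid)

# epsilon: added before the log so that zero counts do not produce log(0) = -Inf.
epsilon <- 1
log_counts <- log(counts + epsilon)

# round = TRUE: back-transform to the count scale and round to pseudo-counts.
pseudo_counts <- round(exp(log_counts) - epsilon)

# tolerance: only singular values larger than the tolerance count as positive.
tolerance <- 1e-8  # assumed value, for illustration
d <- svd(resid_centered)$d
n_positive <- sum(d > tolerance)
```

Because centering makes every row sum to zero, the centered toy matrix has rank at most two, so one of its three singular values is numerically zero and falls below the tolerance.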
Details
The RUVr procedure performs factor analysis on residuals, such as deviance
residuals from a first-pass GLM regression of the counts on the
covariates of interest using edgeR. The counts may be either unnormalized or
normalized with a method such as upper-quartile (UQ) normalization.
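The idea of factor analysis on residuals can be sketched in base R. This is an illustration under stated assumptions, not the RUVSeq implementation: the data are simulated, all genes are used as controls, the factors are taken from a singular value decomposition of the centered residuals, and the names E, Y, W, and alpha are chosen here for the sketch.

```r
set.seed(1)
n_genes <- 50
n_samples <- 6
k <- 1

# Simulated data: one hidden unwanted factor affects every gene.
hidden <- rnorm(n_samples)
loadings <- rnorm(n_genes)

# Simulated first-pass residuals (genes x samples): factor signal plus noise.
E <- outer(loadings, hidden) +
  matrix(rnorm(n_genes * n_samples, sd = 0.1), n_genes, n_samples)

# Simulated log-scale expression (genes x samples) carrying the same factor.
Y <- matrix(5, n_genes, n_samples) + outer(loadings, hidden)

# Center the residuals per gene, then transpose to samples x genes.
Ec <- t(E - rowMeans(E))

# The first k left singular vectors serve as the estimated factors W
# (a samples-by-factors matrix).
sv <- svd(Ec)
W <- sv$u[, seq_len(k), drop = FALSE]
colnames(W) <- paste0("W_", seq_len(k))

# Regress the log expression on W and subtract the fitted part.
Yt <- t(Y)                                       # samples x genes
alpha <- solve(crossprod(W), crossprod(W, Yt))   # factors x genes
Y_norm <- t(Yt - W %*% alpha)                    # genes x samples
```

In this simulation the estimated factor W is highly correlated (up to sign) with the hidden factor, and the normalized matrix Y_norm is orthogonal to W by construction.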
Value
For the matrix method, a list with the following two elements:
W
A samples-by-factors matrix with the estimated factors of unwanted variation.
normalizedCounts
The genes-by-samples matrix of normalized expression measures (possibly rounded) obtained by removing the factors of unwanted variation from the original read counts.
For the SeqExpressionSet method, a SeqExpressionSet with:
The normalized counts in the normalizedCounts slot.
The estimated factors of unwanted variation as additional columns of the phenoData slot.
Author(s)
Davide Risso
References
D. Risso, J. Ngai, T. P. Speed, and S. Dudoit.
Normalization of RNA-seq data using factor analysis of control genes or samples.
Nature Biotechnology, 2014. (In press).
D. Risso, J. Ngai, T. P. Speed, and S. Dudoit. The role of spike-in
standards in the normalization of RNA-Seq. In D. Nettleton and S. Datta,
editors, Statistical Analysis of Next Generation Sequence
Data. Springer, 2014. (In press).
See Also
RUVg, RUVs, residuals.
Examples
library(RUVSeq)
library(edgeR)
library(zebrafishRNASeq)
data(zfGenes)
## run on a subset of genes for time reasons
## (real analyses should be performed on all genes)
genes <- rownames(zfGenes)[grep("^ENS", rownames(zfGenes))]
spikes <- rownames(zfGenes)[grep("^ERCC", rownames(zfGenes))]
set.seed(123)
idx <- c(sample(genes, 1000), spikes)
seq <- newSeqExpressionSet(as.matrix(zfGenes[idx,]))
# Residuals from negative binomial GLM regression of UQ-normalized
# counts on covariates of interest, with edgeR
x <- as.factor(rep(c("Ctl", "Trt"), each=3))
design <- model.matrix(~x)
y <- DGEList(counts=counts(seq), group=x)
y <- calcNormFactors(y, method="upperquartile")
y <- estimateGLMCommonDisp(y, design)
y <- estimateGLMTagwiseDisp(y, design)
fit <- glmFit(y, design)
res <- residuals(fit, type="deviance")
# RUVr normalization (after UQ)
seqUQ <- betweenLaneNormalization(seq, which="upper")
controls <- rownames(seq)
seqRUVr <- RUVr(seqUQ, controls, k=1, res)
pData(seqRUVr)
head(normCounts(seqRUVr))
Results
> library(RUVSeq)
> ### Name: RUVr-methods
> ### Title: Remove Unwanted Variation Using Residuals
> ### Aliases: RUVr RUVr-methods RUVr,matrix,ANY,numeric,matrix-method
> ### RUVr,SeqExpressionSet,character,numeric,matrix-method
>
> ### ** Examples
>
> library(edgeR)
> library(zebrafishRNASeq)
> data(zfGenes)
>
> ## run on a subset of genes for time reasons
> ## (real analyses should be performed on all genes)
> genes <- rownames(zfGenes)[grep("^ENS", rownames(zfGenes))]
> spikes <- rownames(zfGenes)[grep("^ERCC", rownames(zfGenes))]
> set.seed(123)
> idx <- c(sample(genes, 1000), spikes)
> seq <- newSeqExpressionSet(as.matrix(zfGenes[idx,]))
>
> # Residuals from negative binomial GLM regression of UQ-normalized
> # counts on covariates of interest, with edgeR
> x <- as.factor(rep(c("Ctl", "Trt"), each=3))
> design <- model.matrix(~x)
> y <- DGEList(counts=counts(seq), group=x)
> y <- calcNormFactors(y, method="upperquartile")
> y <- estimateGLMCommonDisp(y, design)
> y <- estimateGLMTagwiseDisp(y, design)
>
> fit <- glmFit(y, design)
> res <- residuals(fit, type="deviance")
>
> # RUVr normalization (after UQ)
> seqUQ <- betweenLaneNormalization(seq, which="upper")
> controls <- rownames(seq)
> seqRUVr <- RUVr(seqUQ, controls, k=1, res)
>
> pData(seqRUVr)
               W_1
Ctl1  -0.342588731
Ctl3   0.194390997
Ctl5   0.150413769
Trt9   0.004932811
Trt11 -0.644733885
Trt13  0.637585041
> head(normCounts(seqRUVr))
                   Ctl1 Ctl3 Ctl5 Trt9 Trt11 Trt13
ENSDARG00000043686    2    6    2    0     0     0
ENSDARG00000089089    0    0    0    0     0     0
ENSDARG00000060813  355  158  272  220   296   404
ENSDARG00000092245    0    6    2    0    12     4
ENSDARG00000094339    0    0    0    0     0     0
ENSDARG00000007918   99   43   47  109   128   198
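A possible next step, not covered on this page and given here only as an assumption about typical use, is to include the estimated factor as a covariate in the design matrix of a second GLM fit. The sketch below uses base R only; the W_1 values are copied from the pData(seqRUVr) output in the transcript, and the names pheno and design_ruv are hypothetical.

```r
# Phenotype table with the treatment factor and the estimated factor W_1
# (values copied from the pData(seqRUVr) output in the transcript).
pheno <- data.frame(
  x = factor(rep(c("Ctl", "Trt"), each = 3)),
  W_1 = c(-0.342588731, 0.194390997, 0.150413769,
          0.004932811, -0.644733885, 0.637585041),
  row.names = c("Ctl1", "Ctl3", "Ctl5", "Trt9", "Trt11", "Trt13")
)

# Design matrix with the covariate of interest plus the unwanted factor.
design_ruv <- model.matrix(~ x + W_1, data = pheno)
design_ruv
```

With a SeqExpressionSet result, the same columns would come from the phenoData slot, where the estimated factors are stored as additional columns.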