Last data update: 2014.03.03

R: Remove Unwanted Variation Using Replicate/Negative Control...
RUVs-methodsR Documentation

Remove Unwanted Variation Using Replicate/Negative Control Samples

Description

This function implements the RUVs method of Risso et al. (2014).

Usage

RUVs(x, cIdx, k, scIdx, round=TRUE, epsilon=1, tolerance=1e-8, isLog=FALSE)

Arguments

x

Either a genes-by-samples numeric matrix or a SeqExpressionSet object containing the read counts.

cIdx

A character, logical, or numeric vector indicating the subset of genes to be used as negative controls in the estimation of the factors of unwanted variation.

k

The number of factors of unwanted variation to be estimated from the data.

scIdx

A numeric matrix specifying the replicate samples for which to compute the count differences used to estimate the factors of unwanted variation (see details).

round

If TRUE, the normalized measures are rounded to form pseudo-counts.

epsilon

A small constant (usually no larger than one) to be added to the counts prior to the log transformation to avoid problems with log(0).

tolerance

Tolerance in the selection of the number of positive singular values, i.e., a singular value must be larger than tolerance to be considered positive.

isLog

Set to TRUE if the input matrix is already log-transformed. Ignored if x is a SeqExpressionSet.

Details

The RUVs procedure performs factor analysis on a matrix of count differences for replicate/negative control samples, for which the biological covariates of interest are constant.

Each row of scIdx should correspond to a set of replicate samples. The number of columns is the size of the largest set of replicates; rows for smaller sets are padded with -1 values.

For example, if the sets of replicate samples are (1,11,21),(2,3),(4,5),(6,7,8), then scIdx should be

1 11 21
2 3 -1
4 5 -1
6 7 8

Methods

signature(x = "matrix", cIdx = "ANY", k = "numeric", scIdx = "matrix")

It returns a list with

  • A samples-by-factors matrix with the estimated factors of unwanted variation (W).

  • The genes-by-samples matrix of normalized expression measures (possibly rounded) obtained by removing the factors of unwanted variation from the original read counts (normalizedCounts).

signature(x = "SeqExpressionSet", cIdx = "character", k="numeric", scIdx = "matrix")

It returns a SeqExpressionSet with

  • The normalized counts in the normalizedCounts slot.

  • The estimated factors of unwanted variation as additional columns of the phenoData slot.

Author(s)

Davide Risso (building on a previous version by Laurent Jacob).

References

D. Risso, J. Ngai, T. P. Speed, and S. Dudoit. Normalization of RNA-seq data using factor analysis of control genes or samples. Nature Biotechnology, 2014. (In press).

D. Risso, J. Ngai, T. P. Speed, and S. Dudoit. The role of spike-in standards in the normalization of RNA-Seq. In D. Nettleton and S. Datta, editors, Statistical Analysis of Next Generation Sequence Data. Springer, 2014. (In press).

See Also

RUVg, RUVr.

Examples

library(zebrafishRNASeq)
data(zfGenes)

## run on a subset of genesfor time reasons 
## (real analyses should be performed on all genes)
genes <- rownames(zfGenes)[grep("^ENS", rownames(zfGenes))]
spikes <- rownames(zfGenes)[grep("^ERCC", rownames(zfGenes))]
set.seed(123)
idx <- c(sample(genes, 1000), spikes)
seq <- newSeqExpressionSet(as.matrix(zfGenes[idx,]))

# RUVs normalization
controls <- rownames(seq)
differences <- matrix(data=c(1:3, 4:6), byrow=TRUE, nrow=2)
seqRUVs <- RUVs(seq, controls, k=1, differences)

pData(seqRUVs)
head(normCounts(seqRUVs))

Results


R version 3.3.1 (2016-06-21) -- "Bug in Your Hair"
Copyright (C) 2016 The R Foundation for Statistical Computing
Platform: x86_64-pc-linux-gnu (64-bit)

R is free software and comes with ABSOLUTELY NO WARRANTY.
You are welcome to redistribute it under certain conditions.
Type 'license()' or 'licence()' for distribution details.

R is a collaborative project with many contributors.
Type 'contributors()' for more information and
'citation()' on how to cite R or R packages in publications.

Type 'demo()' for some demos, 'help()' for on-line help, or
'help.start()' for an HTML browser interface to help.
Type 'q()' to quit R.

> library(RUVSeq)
Loading required package: Biobase
Loading required package: BiocGenerics
Loading required package: parallel

Attaching package: 'BiocGenerics'

The following objects are masked from 'package:parallel':

    clusterApply, clusterApplyLB, clusterCall, clusterEvalQ,
    clusterExport, clusterMap, parApply, parCapply, parLapply,
    parLapplyLB, parRapply, parSapply, parSapplyLB

The following objects are masked from 'package:stats':

    IQR, mad, xtabs

The following objects are masked from 'package:base':

    Filter, Find, Map, Position, Reduce, anyDuplicated, append,
    as.data.frame, cbind, colnames, do.call, duplicated, eval, evalq,
    get, grep, grepl, intersect, is.unsorted, lapply, lengths, mapply,
    match, mget, order, paste, pmax, pmax.int, pmin, pmin.int, rank,
    rbind, rownames, sapply, setdiff, sort, table, tapply, union,
    unique, unsplit

Welcome to Bioconductor

    Vignettes contain introductory material; view with
    'browseVignettes()'. To cite Bioconductor, see
    'citation("Biobase")', and for packages 'citation("pkgname")'.

Loading required package: EDASeq
Loading required package: ShortRead
Loading required package: BiocParallel
Loading required package: Biostrings
Loading required package: S4Vectors
Loading required package: stats4

Attaching package: 'S4Vectors'

The following objects are masked from 'package:base':

    colMeans, colSums, expand.grid, rowMeans, rowSums

Loading required package: IRanges
Loading required package: XVector
Loading required package: Rsamtools
Loading required package: GenomeInfoDb
Loading required package: GenomicRanges
Loading required package: GenomicAlignments
Loading required package: SummarizedExperiment
Loading required package: edgeR
Loading required package: limma

Attaching package: 'limma'

The following object is masked from 'package:BiocGenerics':

    plotMA

> png(filename="/home/ddbj/snapshot/RGM3/R_BC/result/RUVSeq/RUVs.Rd_%03d_medium.png", width=480, height=480)
> ### Name: RUVs-methods
> ### Title: Remove Unwanted Variation Using Replicate/Negative Control
> ###   Samples
> ### Aliases: RUVs RUVs-methods RUVs,matrix,ANY,numeric,matrix-method
> ###   RUVs,SeqExpressionSet,character,numeric,matrix-method
> 
> ### ** Examples
> 
> library(zebrafishRNASeq)
> data(zfGenes)
> 
> ## run on a subset of genesfor time reasons 
> ## (real analyses should be performed on all genes)
> genes <- rownames(zfGenes)[grep("^ENS", rownames(zfGenes))]
> spikes <- rownames(zfGenes)[grep("^ERCC", rownames(zfGenes))]
> set.seed(123)
> idx <- c(sample(genes, 1000), spikes)
> seq <- newSeqExpressionSet(as.matrix(zfGenes[idx,]))
> 
> # RUVs normalization
> controls <- rownames(seq)
> differences <- matrix(data=c(1:3, 4:6), byrow=TRUE, nrow=2)
> seqRUVs <- RUVs(seq, controls, k=1, differences)
> 
> pData(seqRUVs)
            W_1
Ctl1  0.7424567
Ctl3  1.3081432
Ctl5  1.1198213
Trt9  0.8036292
Trt11 0.1018215
Trt13 1.3916951
> head(normCounts(seqRUVs))
                   Ctl1 Ctl3 Ctl5 Trt9 Trt11 Trt13
ENSDARG00000043686    1    6    1    0     0     0
ENSDARG00000089089    0    0    0    0     0     0
ENSDARG00000060813   82   41   62   49    74   103
ENSDARG00000092245    0   12    4    0    11     4
ENSDARG00000094339    0    0    0    0     0     0
ENSDARG00000007918  158   78   59  100   123   182
> 
> 
> 
> 
> 
> 
> dev.off()
null device 
          1 
>