R Graphical Manual

Browse All

Last data update: 2014.03.03

R: Remove Unwanted Variation Using Replicate/Negative Control...

RUVs-methods

R Documentation

Remove Unwanted Variation Using Replicate/Negative Control Samples

Description

This function implements the RUVs method of Risso et al. (2014).

Usage

RUVs(x, cIdx, k, scIdx, round=TRUE, epsilon=1, tolerance=1e-8, isLog=FALSE)

Arguments

`x`	Either a genes-by-samples numeric matrix or a SeqExpressionSet object containing the read counts.
`cIdx`	A character, logical, or numeric vector indicating the subset of genes to be used as negative controls in the estimation of the factors of unwanted variation.
`k`	The number of factors of unwanted variation to be estimated from the data.
`scIdx`	A numeric matrix specifying the replicate samples for which to compute the count differences used to estimate the factors of unwanted variation (see details).
`round`	If `TRUE`, the normalized measures are rounded to form pseudo-counts.
`epsilon`	A small constant (usually no larger than one) to be added to the counts prior to the log transformation to avoid problems with log(0).
`tolerance`	Tolerance in the selection of the number of positive singular values, i.e., a singular value must be larger than `tolerance` to be considered positive.
`isLog`	Set to `TRUE` if the input matrix is already log-transformed. Ignored if `x` is a SeqExpressionSet.

Details

The RUVs procedure performs factor analysis on a matrix of count differences for replicate/negative control samples, for which the biological covariates of interest are constant.

Each row of scIdx should correspond to a set of replicate samples. The number of columns is the size of the largest set of replicates; rows for smaller sets are padded with -1 values.

For example, if the sets of replicate samples are (1,11,21),(2,3),(4,5),(6,7,8), then scIdx should be

1 11 21
2 3 -1
4 5 -1
6 7 8

Methods

signature(x = "matrix", cIdx = "ANY", k = "numeric", scIdx = "matrix")

It returns a list with

A samples-by-factors matrix with the estimated factors of unwanted variation (W).
The genes-by-samples matrix of normalized expression measures (possibly rounded) obtained by removing the factors of unwanted variation from the original read counts (normalizedCounts).

signature(x = "SeqExpressionSet", cIdx = "character", k="numeric", scIdx = "matrix")

It returns a SeqExpressionSet with

The normalized counts in the normalizedCounts slot.
The estimated factors of unwanted variation as additional columns of the phenoData slot.

Author(s)

Davide Risso (building on a previous version by Laurent Jacob).

References

D. Risso, J. Ngai, T. P. Speed, and S. Dudoit. Normalization of RNA-seq data using factor analysis of control genes or samples. Nature Biotechnology, 2014. (In press).

D. Risso, J. Ngai, T. P. Speed, and S. Dudoit. The role of spike-in standards in the normalization of RNA-Seq. In D. Nettleton and S. Datta, editors, Statistical Analysis of Next Generation Sequence Data. Springer, 2014. (In press).

Examples

library(zebrafishRNASeq)
data(zfGenes)

## run on a subset of genesfor time reasons 
## (real analyses should be performed on all genes)
genes <- rownames(zfGenes)[grep("^ENS", rownames(zfGenes))]
spikes <- rownames(zfGenes)[grep("^ERCC", rownames(zfGenes))]
set.seed(123)
idx <- c(sample(genes, 1000), spikes)
seq <- newSeqExpressionSet(as.matrix(zfGenes[idx,]))

# RUVs normalization
controls <- rownames(seq)
differences <- matrix(data=c(1:3, 4:6), byrow=TRUE, nrow=2)
seqRUVs <- RUVs(seq, controls, k=1, differences)

pData(seqRUVs)
head(normCounts(seqRUVs))

Results


R version 3.3.1 (2016-06-21) -- "Bug in Your Hair"
Copyright (C) 2016 The R Foundation for Statistical Computing
Platform: x86_64-pc-linux-gnu (64-bit)

R is free software and comes with ABSOLUTELY NO WARRANTY.
You are welcome to redistribute it under certain conditions.
Type 'license()' or 'licence()' for distribution details.

R is a collaborative project with many contributors.
Type 'contributors()' for more information and
'citation()' on how to cite R or R packages in publications.

Type 'demo()' for some demos, 'help()' for on-line help, or
'help.start()' for an HTML browser interface to help.
Type 'q()' to quit R.

> library(RUVSeq)
Loading required package: Biobase
Loading required package: BiocGenerics
Loading required package: parallel

Attaching package: 'BiocGenerics'

The following objects are masked from 'package:parallel':

    clusterApply, clusterApplyLB, clusterCall, clusterEvalQ,
    clusterExport, clusterMap, parApply, parCapply, parLapply,
    parLapplyLB, parRapply, parSapply, parSapplyLB

The following objects are masked from 'package:stats':

    IQR, mad, xtabs

The following objects are masked from 'package:base':

    Filter, Find, Map, Position, Reduce, anyDuplicated, append,
    as.data.frame, cbind, colnames, do.call, duplicated, eval, evalq,
    get, grep, grepl, intersect, is.unsorted, lapply, lengths, mapply,
    match, mget, order, paste, pmax, pmax.int, pmin, pmin.int, rank,
    rbind, rownames, sapply, setdiff, sort, table, tapply, union,
    unique, unsplit

Welcome to Bioconductor

    Vignettes contain introductory material; view with
    'browseVignettes()'. To cite Bioconductor, see
    'citation("Biobase")', and for packages 'citation("pkgname")'.

Loading required package: EDASeq
Loading required package: ShortRead
Loading required package: BiocParallel
Loading required package: Biostrings
Loading required package: S4Vectors
Loading required package: stats4

Attaching package: 'S4Vectors'

The following objects are masked from 'package:base':

    colMeans, colSums, expand.grid, rowMeans, rowSums

Loading required package: IRanges
Loading required package: XVector
Loading required package: Rsamtools
Loading required package: GenomeInfoDb
Loading required package: GenomicRanges
Loading required package: GenomicAlignments
Loading required package: SummarizedExperiment
Loading required package: edgeR
Loading required package: limma

Attaching package: 'limma'

The following object is masked from 'package:BiocGenerics':

    plotMA

> png(filename="/home/ddbj/snapshot/RGM3/R_BC/result/RUVSeq/RUVs.Rd_%03d_medium.png", width=480, height=480)
> ### Name: RUVs-methods
> ### Title: Remove Unwanted Variation Using Replicate/Negative Control
> ###   Samples
> ### Aliases: RUVs RUVs-methods RUVs,matrix,ANY,numeric,matrix-method
> ###   RUVs,SeqExpressionSet,character,numeric,matrix-method
> 
> ### ** Examples
> 
> library(zebrafishRNASeq)
> data(zfGenes)
> 
> ## run on a subset of genesfor time reasons 
> ## (real analyses should be performed on all genes)
> genes <- rownames(zfGenes)[grep("^ENS", rownames(zfGenes))]
> spikes <- rownames(zfGenes)[grep("^ERCC", rownames(zfGenes))]
> set.seed(123)
> idx <- c(sample(genes, 1000), spikes)
> seq <- newSeqExpressionSet(as.matrix(zfGenes[idx,]))
> 
> # RUVs normalization
> controls <- rownames(seq)
> differences <- matrix(data=c(1:3, 4:6), byrow=TRUE, nrow=2)
> seqRUVs <- RUVs(seq, controls, k=1, differences)
> 
> pData(seqRUVs)
            W_1
Ctl1  0.7424567
Ctl3  1.3081432
Ctl5  1.1198213
Trt9  0.8036292
Trt11 0.1018215
Trt13 1.3916951
> head(normCounts(seqRUVs))
                   Ctl1 Ctl3 Ctl5 Trt9 Trt11 Trt13
ENSDARG00000043686    1    6    1    0     0     0
ENSDARG00000089089    0    0    0    0     0     0
ENSDARG00000060813   82   41   62   49    74   103
ENSDARG00000092245    0   12    4    0    11     4
ENSDARG00000094339    0    0    0    0     0     0
ENSDARG00000007918  158   78   59  100   123   182
> 
> 
> 
> 
> 
> 
> dev.off()
null device 
          1 
>