Last data update: 2014.03.03

R: Test for significantly large variances
testVarR Documentation

Test for significantly large variances

Description

Test for whether the total variance exceeds that expected under some null hypothesis, for sample variances estimated from normally distributed observations.

Usage

testVar(total, null, df, design=NULL) 

Arguments

total

A numeric vector of total variances for all genes.

null

A numeric scalar or vector of expected variances under the null hypothesis for all genes.

df

An integer scalar specifying the degrees of freedom on which the variances were estimated.

design

A design matrix, used to determine the degrees of freedom if df is missing.

Details

The null hypothesis states that the true variance for each gene is equal to null. Variance estimates should follow a scaled chi-squared distribution on df degrees of freedom, where the scaling factor is equal to null divided by df. This can be used to compute a p-value for total being greater than null. The assumption is that the original observations were normally distributed – using log-CPMs tends to work reasonably well for count data.

The idea is to use this function to identify significantly highly variable genes. For example, the null vector can be set to the values of the trend fitted to the spike-in variances. This will identify genes with variances significantly greater than technical noise. Alternatively, it can be set to the trend fitted to the cellular variances, which will identify those that are significantly more variable than the bulk of genes.

Note that the test statistic is the (scaled) ratio of total over null for each gene. This may not be ideal when null is small, e.g., for high-abundance genes, where a high ratio/low p-value may not represent a large absolute increase in the variance. To avoid detecting irrelevant genes, users can modify null by adding a minimum threshold of, say, 0.5. This tests for variances that are significantly greater than null+0.5 and rules out genes that have high ratios but small absolute increases.

Value

A numeric vector of p-values for all genes.

Author(s)

Aaron Lun

References

Law CW, Chen Y, Shi W and Smyth GK (2014). voom: precision weights unlock linear model analysis tools for RNA-seq read counts Genome Biol. 15(2), R29.

See Also

trendVar, decomposeVar

Examples

set.seed(100)
null <- 100/runif(1000, 50, 2000)
df <- 30
total <- null * rchisq(length(null), df=df)/df

# Direct test:
out <- testVar(total, null, df=df)
hist(out)

# Rejecting the null:
alt <- null * 5 * rchisq(length(null), df=df)/df
out <- testVar(alt, null, df=df)
plot(alt[order(out)]-null)

# Focusing on genes that have high absolute increases in variability:
out <- testVar(alt, null+0.5, df=df)
plot(alt[order(out)]-null)

Results


R version 3.3.1 (2016-06-21) -- "Bug in Your Hair"
Copyright (C) 2016 The R Foundation for Statistical Computing
Platform: x86_64-pc-linux-gnu (64-bit)

R is free software and comes with ABSOLUTELY NO WARRANTY.
You are welcome to redistribute it under certain conditions.
Type 'license()' or 'licence()' for distribution details.

R is a collaborative project with many contributors.
Type 'contributors()' for more information and
'citation()' on how to cite R or R packages in publications.

Type 'demo()' for some demos, 'help()' for on-line help, or
'help.start()' for an HTML browser interface to help.
Type 'q()' to quit R.

> library(scran)
Loading required package: BiocParallel
Loading required package: scater
Loading required package: Biobase
Loading required package: BiocGenerics
Loading required package: parallel

Attaching package: 'BiocGenerics'

The following objects are masked from 'package:parallel':

    clusterApply, clusterApplyLB, clusterCall, clusterEvalQ,
    clusterExport, clusterMap, parApply, parCapply, parLapply,
    parLapplyLB, parRapply, parSapply, parSapplyLB

The following objects are masked from 'package:stats':

    IQR, mad, xtabs

The following objects are masked from 'package:base':

    Filter, Find, Map, Position, Reduce, anyDuplicated, append,
    as.data.frame, cbind, colnames, do.call, duplicated, eval, evalq,
    get, grep, grepl, intersect, is.unsorted, lapply, lengths, mapply,
    match, mget, order, paste, pmax, pmax.int, pmin, pmin.int, rank,
    rbind, rownames, sapply, setdiff, sort, table, tapply, union,
    unique, unsplit

Welcome to Bioconductor

    Vignettes contain introductory material; view with
    'browseVignettes()'. To cite Bioconductor, see
    'citation("Biobase")', and for packages 'citation("pkgname")'.

Loading required package: ggplot2

Attaching package: 'scater'

The following object is masked from 'package:stats':

    filter

> png(filename="/home/ddbj/snapshot/RGM3/R_BC/result/scran/testVar.Rd_%03d_medium.png", width=480, height=480)
> ### Name: testVar
> ### Title: Test for significantly large variances
> ### Aliases: testVar
> ### Keywords: variance
> 
> ### ** Examples
> 
> set.seed(100)
> null <- 100/runif(1000, 50, 2000)
> df <- 30
> total <- null * rchisq(length(null), df=df)/df
> 
> # Direct test:
> out <- testVar(total, null, df=df)
> hist(out)
> 
> # Rejecting the null:
> alt <- null * 5 * rchisq(length(null), df=df)/df
> out <- testVar(alt, null, df=df)
> plot(alt[order(out)]-null)
> 
> # Focusing on genes that have high absolute increases in variability:
> out <- testVar(alt, null+0.5, df=df)
> plot(alt[order(out)]-null)
> 
> 
> 
> 
> 
> dev.off()
null device 
          1 
>