Last data update: 2014.03.03

R: Sample correlations for random pairs of genes
CorrSampleR Documentation

Sample correlations for random pairs of genes

Description

CorrSample calculates the correlations, standard deviations and some auxiliary variables for random pairs of genes. A plot of the resulting object that shows that these correlations dependend systematically on the genes' variability, suggests a lack of normalization.

RandPairs is a helper function for generating random pairs from a list of genes.

Usage

CorrSample(x, np, seed, rp, ndx)

RandPairs(probes, number)

Arguments

x

a gene expression matrix, with samples as columns and genes as rows; missing values are accepted.

np, number

the number of random pairs

seed

an optional seed for the random sampling

rp

an optional matrix with two columns specifying the random pairs, see Details.

ndx

an optional logical matrix of the same dimension as x that allows to eliminate a subset of the expression values from the calculation of the correlations, standard deviations and auxiliary variables.

probes

a vector of genes from which to draw random pairs; can be integer, as a vector of row indices, or character, as a vector of row names.

Details

The sample of random pairs can be specified in a replicable manner either via np and seed, or by using the output from RandPairs for the parameter rp. In case we want to use the same set of random pairs (e.g. when comparing different expression measures on the same data set), the second option will be faster.

Value

An object of class corr.sample; this is just a data frame with an extra class tag to allow for a plotting method.

The data frame has np rows and nine columns:

Correlation

the correlation between the two genes across samples

StdDev

the geometric mean of the standard deviations of the two genes

sd1,sd2

the standard deviations of the genes

m1,m2

the means of the genes

ndx1,ndx2

the indices of the two genes; by default, these will be the corresponding row indices of x, but if rp is specified, they might be gene names.

Author(s)

Alexander Ploner Alexander.Ploner@ki.se

References

Ploner A, Miller LD, Hall P, Bergh J, Pawitan Y. Correlation test to assess low-level processing of high-density oligonucleotide microarray data. BMC Bioinformatics, 2005, 6(1):80 http://www.pubmedcentral.gov/articlerender.fcgi?tool=pubmed&pubmedid=15799785

See Also

plot.corr.sample

Examples

# Get small example data
data(oligodata)
dim(datA.rma)

# Compute the correlations for 500 random pairs, 
# that is ca. 1/1000 of all possible pairs 
# Larger numbers are reasonable for larger data sets
cs1 = CorrSample(datA.rma, 500, seed=210)
cs1[1:5,]

# Clear correlation for pairs of genes with low average variability 
plot(cs1)

# A different way of specifying the same 
set.seed(210)
rp = RandPairs(rownames(datA.rma), 500)
cs2 = CorrSample(datA.rma, rp=rp)
cs2[1:5,]
plot(cs2)

Results


R version 3.3.1 (2016-06-21) -- "Bug in Your Hair"
Copyright (C) 2016 The R Foundation for Statistical Computing
Platform: x86_64-pc-linux-gnu (64-bit)

R is free software and comes with ABSOLUTELY NO WARRANTY.
You are welcome to redistribute it under certain conditions.
Type 'license()' or 'licence()' for distribution details.

R is a collaborative project with many contributors.
Type 'contributors()' for more information and
'citation()' on how to cite R or R packages in publications.

Type 'demo()' for some demos, 'help()' for on-line help, or
'help.start()' for an HTML browser interface to help.
Type 'q()' to quit R.

> library(maCorrPlot)
Loading required package: lattice
> png(filename="/home/ddbj/snapshot/RGM3/R_BC/result/maCorrPlot/CorrSample.Rd_%03d_medium.png", width=480, height=480)
> ### Name: CorrSample
> ### Title: Sample correlations for random pairs of genes
> ### Aliases: CorrSample RandPairs
> ### Keywords: datagen
> 
> ### ** Examples
> 
> # Get small example data
> data(oligodata)
> dim(datA.rma)
[1] 1000   30
> 
> # Compute the correlations for 500 random pairs, 
> # that is ca. 1/1000 of all possible pairs 
> # Larger numbers are reasonable for larger data sets
> cs1 = CorrSample(datA.rma, 500, seed=210)
> cs1[1:5,]
  Correlation     StdDev     Mean       sd1       sd2        m1        m2 ndx1
1  -0.1558784 0.08655279 7.649430 0.1904952 0.4543569  7.010783  8.288077  513
2  -0.3357701 0.24912586 9.249972 0.5554773 0.4484897  7.424855 11.075090  894
3  -0.1475604 0.07512037 6.364299 0.2128482 0.3529294  5.868567  6.860031  299
4   0.1271111 0.08460454 7.558578 0.2529661 0.3344501  7.490713  7.626443  278
5  -0.2654174 0.05902537 7.821525 0.3925727 0.1503553 10.225949  5.417101  464
  ndx2
1  835
2  446
3   82
4   39
5  201
> 
> # Clear correlation for pairs of genes with low average variability 
> plot(cs1)
> 
> # A different way of specifying the same 
> set.seed(210)
> rp = RandPairs(rownames(datA.rma), 500)
> cs2 = CorrSample(datA.rma, rp=rp)
> cs2[1:5,]
  Correlation     StdDev     Mean       sd1       sd2        m1        m2
1  -0.1558784 0.08655279 7.649430 0.1904952 0.4543569  7.010783  8.288077
2  -0.3357701 0.24912586 9.249972 0.5554773 0.4484897  7.424855 11.075090
3  -0.1475604 0.07512037 6.364299 0.2128482 0.3529294  5.868567  6.860031
4   0.1271111 0.08460454 7.558578 0.2529661 0.3344501  7.490713  7.626443
5  -0.2654174 0.05902537 7.821525 0.3925727 0.1503553 10.225949  5.417101
    ndx1   ndx2
1 g11447 g18678
2 g20108  g9801
3  g6570  g1925
4  g5975   g930
5 g10165  g4530
> plot(cs2)
> 
> 
> 
> 
> 
> dev.off()
null device 
          1 
>