Last data update: 2014.03.03

merge    R Documentation

General method to merge different ExpressionSets

Description

General method to merge different ExpressionSets by applying different techniques to remove inter-study bias.

Usage


merge(esets, method='NONE');

Arguments

esets

List of ExpressionSet objects.

method

Merging method aimed at removing inter-study bias. Possible options are: BMC, COMBAT, GENENORM, NONE and XPN. If no method is specified, the default 'NONE' is used and the datasets are combined without any transformation. More information about each method is given in the Details section below.

Details

Currently the following different merging techniques are provided:

'BMC':

Sims et al. [1] successfully applied a technique similar to z-score normalization to merge breast cancer datasets. The data are transformed by batch mean-centering: for each gene, the mean expression within a batch is subtracted from that batch's values, so that every gene has mean zero in every batch.
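As an illustration of batch mean-centering, the following is a minimal sketch on a plain numeric matrix (rows are genes, columns are samples). The helper `batch_mean_center` and the `batch` vector are hypothetical names introduced here; this is not the package's internal implementation.

```r
# Toy expression matrix: rows are genes, columns are samples.
expr <- matrix(c(1, 2, 3, 11, 12, 13,
                 4, 6, 8, 20, 22, 24),
               nrow = 2, byrow = TRUE,
               dimnames = list(c("geneA", "geneB"), paste0("s", 1:6)))

# Hypothetical batch (study) label for each sample.
batch <- c(1, 1, 1, 2, 2, 2)

# Subtract, for every gene, its mean within each batch.
batch_mean_center <- function(x, batch) {
  out <- x
  for (b in unique(batch)) {
    cols <- batch == b
    sub <- x[, cols, drop = FALSE]
    # rowMeans() has one value per gene; it is recycled down each column.
    out[, cols] <- sub - rowMeans(sub)
  }
  out
}

centered <- batch_mean_center(expr, batch)
```

After centering, each gene has mean zero within each batch, while the within-batch spread of its values is unchanged.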

'COMBAT':

The empirical Bayes method [2] (also called EJLR or COMBAT) estimates, for each gene, the parameters of a model for the mean and variance, and then adjusts the genes in each batch to fit the assumed model. The parameters are estimated by pooling information across multiple genes within each batch.
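For orientation, the location and scale model described in [2] can be written as follows. The notation is taken from that paper rather than from this package: i indexes batches, j samples within a batch, g genes, and X is a design matrix of sample covariates.

```latex
% Model for the expression of gene g in sample j of batch i
Y_{ijg} = \alpha_g + X\beta_g + \gamma_{ig} + \delta_{ig}\,\varepsilon_{ijg},
\qquad \varepsilon_{ijg} \sim N(0, \sigma_g^2)

% Batch-adjusted value, where Z_{ijg} is the standardized expression and
% \hat{\gamma}^{*}_{ig}, \hat{\delta}^{*}_{ig} are the empirical Bayes
% estimates of the additive and multiplicative batch effects
Y^{*}_{ijg} = \frac{\hat{\sigma}_g}{\hat{\delta}^{*}_{ig}}
              \left( Z_{ijg} - \hat{\gamma}^{*}_{ig} \right)
              + \hat{\alpha}_g + X\hat{\beta}_g
```

Here the additive term shifts the mean of gene g in batch i and the multiplicative term scales its variance; the empirical Bayes step shrinks both estimates toward values shared across genes in the same batch.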

'GENENORM':

One of the simplest mathematical transformations to make datasets more comparable is z-score normalization. For each gene, in each study separately, every expression value is transformed by subtracting the mean of that gene in that dataset and dividing the result by the gene's standard deviation in that dataset.
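The per-study z-score transformation can be sketched on plain matrices as below. The function `zscore_by_gene` and the toy studies are hypothetical and only illustrate the arithmetic; the package itself operates on ExpressionSet objects.

```r
# Standardize every gene (row) of one study to mean 0 and standard deviation 1.
zscore_by_gene <- function(x) {
  gene_mean <- rowMeans(x)
  gene_sd <- apply(x, 1, sd)
  # Both vectors have one entry per gene and are recycled down each column.
  # Note: a gene that is constant within a study has sd 0 and yields NaN.
  (x - gene_mean) / gene_sd
}

# Two toy studies measuring the same genes on very different scales.
study1 <- matrix(c(1, 2, 3,
                   10, 20, 30),
                 nrow = 2, byrow = TRUE,
                 dimnames = list(c("geneA", "geneB"), c("a1", "a2", "a3")))
study2 <- matrix(c(5, 7, 9, 11,
                   0, 1, 2, 3),
                 nrow = 2, byrow = TRUE,
                 dimnames = list(c("geneA", "geneB"), c("b1", "b2", "b3", "b4")))

# Normalize each study separately, then place the samples side by side.
z1 <- zscore_by_gene(study1)
z2 <- zscore_by_gene(study2)
merged <- cbind(z1, z2)
```

Because each study is standardized on its own, differences in both the location and the scale of a gene between studies are removed before the samples are combined.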

'NONE':

Combines the ExpressionSets without any additional transformation, similar to the 'combine' function.

'XPN':

The basic idea behind the cross-platform normalization [4] approach is to find blocks (clusters) of genes and samples in both studies that have similar expression characteristics. In XPN, a gene measurement can be considered as a scaled and shifted block mean.

Note that after using any of these methods, the resulting merged dataset contains only the genes/probes that are common to all studies.
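The restriction to common genes can be sketched with plain matrices as follows. This roughly corresponds to what the 'NONE' method amounts to on the expression values alone; the toy studies and variable names are hypothetical, and the handling of sample annotation in real ExpressionSet objects is not shown.

```r
# Two toy studies whose gene lists only partly overlap.
study1 <- matrix(1:6, nrow = 3,
                 dimnames = list(c("geneA", "geneB", "geneC"), c("a1", "a2")))
study2 <- matrix(7:12, nrow = 3,
                 dimnames = list(c("geneB", "geneC", "geneD"), c("b1", "b2")))
studies <- list(study1, study2)

# Genes present in every study.
common <- Reduce(intersect, lapply(studies, rownames))

# Keep only the common genes, in the same order, and bind the samples together.
merged <- do.call(cbind, lapply(studies, function(x) x[common, , drop = FALSE]))
```

In this sketch `geneA` and `geneD` are dropped because each is measured in only one of the two studies.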

Value

A (merged) ExpressionSet object.

References

[1] A. Sims, et al., The removal of multiplicative, systematic bias allows integration of breast cancer gene expression datasets - improving meta-analysis and prediction of prognosis, BMC Medical Genomics, vol. 1, no. 1, p. 42, 2008.

[2] W. E. Johnson, C. Li and A. Rabinovic, Adjusting batch effects in microarray expression data using empirical Bayes methods, Biostatistics, vol. 8, no. 1, pp. 118-127, 2007.

[3] M. Benito, et al., Adjustment of systematic microarray data biases, Bioinformatics, vol. 20, no. 1, pp. 105-114, 2004.

[4] A. A. Shabalin, et al., Merging two gene-expression studies via cross-platform normalization, Bioinformatics, vol. 24, no. 9, pp. 1154-1160, 2008.

Examples


# retrieve two datasets:
library(inSilicoDb);
InSilicoLogin("rpackage_tester@insilicodb.com", "5c4d0b231e5cba4a0bc54783b385cc9a");
eset1 = getDataset("GSE18842", "GPL570", norm="FRMA", features="GENE");
eset2 = getDataset("GSE31547", "GPL96",  norm="FRMA", features="GENE");
esets = list(eset1,eset2);

# merge them using different methods:
library(inSilicoMerging);
eset_FRMA = merge(esets);
eset_COMBAT = merge(esets, method="COMBAT");

Results


R version 3.3.1 (2016-06-21) -- "Bug in Your Hair"
Platform: x86_64-pc-linux-gnu (64-bit)

> library(inSilicoMerging)
Loading required package: Biobase
Loading required package: BiocGenerics
Loading required package: parallel

Attaching package: 'BiocGenerics'

The following objects are masked from 'package:parallel':

    clusterApply, clusterApplyLB, clusterCall, clusterEvalQ,
    clusterExport, clusterMap, parApply, parCapply, parLapply,
    parLapplyLB, parRapply, parSapply, parSapplyLB

The following objects are masked from 'package:stats':

    IQR, mad, xtabs

The following objects are masked from 'package:base':

    Filter, Find, Map, Position, Reduce, anyDuplicated, append,
    as.data.frame, cbind, colnames, do.call, duplicated, eval, evalq,
    get, grep, grepl, intersect, is.unsorted, lapply, lengths, mapply,
    match, mget, order, paste, pmax, pmax.int, pmin, pmin.int, rank,
    rbind, rownames, sapply, setdiff, sort, table, tapply, union,
    unique, unsplit

Welcome to Bioconductor

    Vignettes contain introductory material; view with
    'browseVignettes()'. To cite Bioconductor, see
    'citation("Biobase")', and for packages 'citation("pkgname")'.


Attaching package: 'inSilicoMerging'

The following object is masked from 'package:base':

    merge

> ### Name: merge
> ### Title: General method to merge different ExpressionSets
> ### Aliases: merge
> 
> ### ** Examples
> 
> 
> # retrieve two datasets:
> library(inSilicoDb);
Loading required package: rjson
Loading required package: RCurl
Loading required package: bitops
> InSilicoLogin("rpackage_tester@insilicodb.com", "5c4d0b231e5cba4a0bc54783b385cc9a");
  INSILICODB: Welcome RPackage Tester
[1] 5296
> eset1 = getDataset("GSE18842", "GPL570", norm="FRMA", features="GENE");
  INSILICODB: The dataset you requested could not be computed. We are sorry for the inconvenience.
Error: Stopped because of previous errors
Execution halted