Last data update: 2014.03.03

R: Merging data.frames based on common identifiers

mergeData

R Documentation

Merging data.frames based on common identifiers

Description

This utility function merges specific columns from a set of distinct data.frames based on a common set of identifiers. For instance, it can retrieve from multiple data.frames the identifiers and the ranking statistics that will be used to compute correspondence-at-the-top (CAT) curves.

Usage

mergeData(listOfDataFrames, idCol=1, byCol=2)

Arguments

`listOfDataFrames`	list. A list of distinct data.frames to be merged based on common identifiers. Each data.frame must contain at least two columns shared across the list: one for the identifiers (specified by `idCol`) and one for the ranking statistics (specified by `byCol`). Redundant features are not allowed and should be removed beforehand using `filterRedundant`.
`idCol`	character or numeric. Name or index of the column containing the common identifiers (e.g. ENTREZID, SYMBOLS, ...).
`byCol`	character or numeric. Name or index of the column containing the ranking statistics.
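
The requirement that identifiers be non-redundant can be checked before merging. The following is a minimal sketch in base R, assuming toy data; `toyList` and `hasUniqueIds` are hypothetical names introduced here for illustration and are not part of the matchBox package.

```r
# Toy input (assumed data): two data.frames with an identifier and a statistic column.
toyList <- list(
  dataSetA = data.frame(SYMBOL = c("A1CF", "ABCB1", "ABCG8"),
                        t = c(2.8, -2.3, 2.3),
                        stringsAsFactors = FALSE),
  dataSetB = data.frame(SYMBOL = c("ABCB1", "ABCG8", "ABCG8"),
                        t = c(-3.1, 3.1, 1.0),
                        stringsAsFactors = FALSE)
)

# Hypothetical helper: TRUE when no identifier is repeated.
# `[[` accepts either a column name or a column index, like idCol.
hasUniqueIds <- function(df, idCol = 1) {
  anyDuplicated(df[[idCol]]) == 0
}

# One logical value per data.frame in the list
ok <- vapply(toyList, hasUniqueIds, logical(1), idCol = "SYMBOL")
print(ok)
```

Here `dataSetB` contains "ABCG8" twice, so it fails the check and would need to be passed through `filterRedundant` first.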

Details

This function first identifies the common set of features across all the data.frames contained in the listOfDataFrames object. Subsequently, for this common set of features, it returns a single data.frame containing the ranking statistics values of choice collected from each data.frame.
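
The two steps described above can be sketched in base R. This is an illustrative re-implementation under stated assumptions, not the matchBox source code: `mergeSketch` and the toy data are hypothetical, and the output column names (`commonID`, `<list name>.<byCol>`) simply mimic those shown in the `str(mat)` output under Results.

```r
# Hypothetical sketch of the merge logic; not the actual mergeData implementation.
mergeSketch <- function(listOfDataFrames, idCol = 1, byCol = 2) {
  # Step 1: identifiers present in every data.frame
  ids <- lapply(listOfDataFrames, function(df) as.character(df[[idCol]]))
  commonID <- Reduce(intersect, ids)
  # Step 2: for each data.frame, pull the statistic in the order of commonID
  stats <- lapply(listOfDataFrames, function(df) {
    df[[byCol]][match(commonID, as.character(df[[idCol]]))]
  })
  out <- data.frame(commonID = commonID, stats, stringsAsFactors = FALSE)
  names(out)[-1] <- paste(names(listOfDataFrames), byCol, sep = ".")
  out
}

# Toy input (assumed data) with non-redundant identifiers
toy <- list(
  dataSetA = data.frame(SYMBOL = c("A1CF", "ABCB1", "ABCG8"),
                        t = c(2.82, -2.27, 2.26),
                        stringsAsFactors = FALSE),
  dataSetB = data.frame(SYMBOL = c("ABCG8", "ABCB1", "AARS2"),
                        t = c(3.07, -3.07, -2.80),
                        stringsAsFactors = FALSE)
)

merged <- mergeSketch(toy, idCol = "SYMBOL", byCol = "t")
print(merged)
```

Only "ABCB1" and "ABCG8" occur in both toy data.frames, so the result has two rows and one statistic column per input data.frame.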

Value

A data.frame containing the identifiers and the ranking statistics common to all data.frames in listOfDataFrames, to be used for computing the correspondence at the top (see Irizarry et al., Nat Methods, 2005).

Author(s)

Luigi Marchionni marchion@jhu.edu

References

Irizarry, R. A.; Warren, D.; Spencer, F.; Kim, I. F.; Biswal, S.; Frank, B. C.; Gabrielson, E.; Garcia, J. G. N.; Geoghegan, J.; Germino, G.; Griffin, C.; Hilmer, S. C.; Hoffman, E.; Jedlicka, A. E.; Kawasaki, E.; Martinez-Murillo, F.; Morsberger, L.; Lee, H.; Petersen, D.; Quackenbush, J.; Scott, A.; Wilson, M.; Yang, Y.; Ye, S. Q. and Yu, W. Multiple-laboratory comparison of microarray platforms. Nat Methods, 2005, 2, 345-350

Ross, A. E.; Marchionni, L.; Vuica-Ross, M.; Cheadle, C.; Fan, J.; Berman, D. M. and Schaeffer, E. M. Gene Expression Pathways of High Grade Localized Prostate Cancer. Prostate, 2011, 71, 1568-1578

Benassi, B.; Flavin, R.; Marchionni, L.; Zanata, S.; Pan, Y.; Chowdhury, D.; Marani, M.; Strano, S.; Muti, P. and Blandino, G. c-Myc is activated via USP2a-mediated modulation of microRNAs in prostate cancer. Cancer Discovery, 2012, 2, 236-247

Examples

###load the package and the example data
library(matchBox)
data(matchBoxExpression)

###the column name for the identifiers
idCol <- "SYMBOL"

###the column name for the ranking statistics
byCol <- "t"

###use lapply to remove redundancy from all data.frames
###default method is "maxORmin"
newMatchBoxExpression <- lapply(matchBoxExpression, filterRedundant, idCol=idCol, byCol=byCol)

###select t-statistics and merge into a new data.frame using SYMBOL
mat <- mergeData(listOfDataFrames = newMatchBoxExpression, idCol = idCol,
                 byCol = byCol)

###structure of mat
str(mat)

Results


> library(matchBox)
> ###load data
> data(matchBoxExpression)
> 
> ###the column name for the identifiers
> idCol <- "SYMBOL"
> 
> ###the column name for the ranking statistics
> byCol <- "t"
> 
> ###use lapply to remove redundancy from all data.frames
> ###default method is "maxORmin"
> newMatchBoxExpression <- lapply(matchBoxExpression, filterRedundant, idCol=idCol, byCol=byCol)
> 
> ###select t-statistics and merge into a new data.frame using SYMBOL
> mat <- mergeData(listOfDataFrames = newMatchBoxExpression, idCol = idCol,
+ byCol = byCol)
> 
> ###structure of mat
> str(mat)
'data.frame':	506 obs. of  4 variables:
 $ commonID  : chr  "A1CF" "AARS2" "ABCB1" "ABCG8" ...
 $ dataSetA.t: num  2.82 1.91 -2.27 2.26 -1.93 ...
 $ dataSetB.t: num  -3.61 -2.8 -3.07 3.07 -2.81 ...
 $ dataSetC.t: num  -3.454 -0.319 1.324 -1.797 -4.777 ...