mergeData
R Documentation
Merging data.frames based on common identifiers
Description
This utility function merges specific columns
from a set of distinct data.frames based on a common set
of identifiers. For instance, it can be used to retrieve
from multiple data.frames the identifiers and the ranking
statistics that will be used for computing
correspondence-at-the-top (CAT) curves.
Usage
mergeData(listOfDataFrames, idCol=1, byCol=2)
Arguments
listOfDataFrames
list. This object is a list
of distinct data.frames to be merged based on
common identifiers. The data.frames to be merged
must contain at least two common columns,
one for the identifiers (as specified by idCol),
and one for the ranking statistics (as specified by
byCol).
Redundant features are not allowed and should
be removed beforehand using filterRedundant.
idCol
character or numeric. Name or index of the column
containing the common identifiers (e.g. ENTREZID, SYMBOLS, ...).
byCol
character or numeric. Name or index of the column
containing the ranking statistics.
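The requirements above (a column addressable by name or by index, and no redundant identifiers) can be checked on each data.frame before merging. The following is a minimal sketch in base R; countRedundant and the toy data are hypothetical and are not part of matchBox.

```r
# Toy data.frame; the symbols and t-values are made up for illustration.
toy <- data.frame(SYMBOL = c("A1CF", "ABCB1", "A1CF"),
                  t = c(2.8, -2.3, 1.9),
                  stringsAsFactors = FALSE)

# Hypothetical helper (not part of matchBox): number of repeated identifiers.
# df[[idCol]] accepts either a column name or a column index.
countRedundant <- function(df, idCol = 1) {
  sum(duplicated(df[[idCol]]))
}

# The identifier column can be addressed by name or by position
byName <- countRedundant(toy, idCol = "SYMBOL")
byIndex <- countRedundant(toy, idCol = 1)
byName
byIndex
```

A non-zero count suggests that the data.frame still contains redundant features and should first be passed through filterRedundant.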
Details
This function first identifies the common set of features
across all the data.frames contained in the listOfDataFrames
object. Subsequently, for this common set of features,
it returns a single data.frame containing the chosen ranking
statistic collected from each data.frame.
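The two steps described above can be sketched in base R as follows. This is an illustration of the idea under stated assumptions, not the package's actual implementation: the toy data sets and their values are made up, and the output column names simply mimic the commonID and dataSetA.t pattern shown in the example output.

```r
# Toy inputs; data set names, symbols and t-values are made up.
dataSetA <- data.frame(SYMBOL = c("A1CF", "ABCB1", "TP53"),
                       t = c(2.8, -2.3, 1.1),
                       stringsAsFactors = FALSE)
dataSetB <- data.frame(SYMBOL = c("ABCB1", "A1CF", "MYC"),
                       t = c(-3.1, -3.6, 0.4),
                       stringsAsFactors = FALSE)
toyList <- list(dataSetA = dataSetA, dataSetB = dataSetB)

idCol <- "SYMBOL"
byCol <- "t"

# Step 1: identifiers present in every data.frame
commonIDs <- Reduce(intersect,
                    lapply(toyList, function(df) as.character(df[[idCol]])))

# Step 2: for each data.frame, extract the ranking statistic
# in the order given by commonIDs
stats <- lapply(toyList, function(df) {
  df[[byCol]][match(commonIDs, df[[idCol]])]
})

# Assemble a single data.frame with one statistic column per input
merged <- data.frame(commonID = commonIDs, stats, stringsAsFactors = FALSE)
names(merged)[-1] <- paste(names(toyList), byCol, sep = ".")
merged
```

Identifiers found in only one input (TP53 and MYC here) are dropped, and rows are aligned by identifier rather than by position.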
Value
A data.frame containing the identifiers and
the ranking statistics common to all data.frames
in listOfDataFrames to be used for computing
the correspondence at the top (CAT)
(see Irizarry et al., Nat Methods (2005)).
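To illustrate how a merged table of this shape might feed a correspondence-at-the-top computation, the sketch below calculates, for each list size n, the proportion of identifiers shared by the top n features of two rankings. It is a simplified illustration with made-up values; topOverlap is a hypothetical helper, ranking by decreasing statistic is an assumption, and this is not the package's own CAT routine.

```r
# Toy merged table in the shape returned by mergeData; values are made up.
toyMat <- data.frame(commonID = c("g1", "g2", "g3", "g4", "g5"),
                     dataSetA.t = c(5, 4, 3, 2, 1),
                     dataSetB.t = c(4, 5, 1, 3, 2),
                     stringsAsFactors = FALSE)

# Hypothetical helper: proportion of identifiers shared by the top n
# features of two columns, ranking each by decreasing statistic.
topOverlap <- function(mat, colA, colB, n) {
  topA <- mat$commonID[order(mat[[colA]], decreasing = TRUE)][seq_len(n)]
  topB <- mat$commonID[order(mat[[colB]], decreasing = TRUE)][seq_len(n)]
  length(intersect(topA, topB)) / n
}

# Overlap proportion for every list size from 1 to the number of features
catCurve <- sapply(seq_len(nrow(toyMat)), function(n) {
  topOverlap(toyMat, "dataSetA.t", "dataSetB.t", n)
})
catCurve
```

Plotting catCurve against the list size gives a curve of the kind described by Irizarry et al.; the curve always reaches 1 at the full list size because both rankings cover the same common identifiers.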
References
Irizarry, R. A.; Warren, D.; Spencer, F.; Kim, I. F.; Biswal, S.;
Frank, B. C.; Gabrielson, E.; Garcia, J. G. N.; Geoghegan, J.;
Germino, G.; Griffin, C.; Hilmer, S. C.; Hoffman, E.;
Jedlicka, A. E.; Kawasaki, E.; Martinez-Murillo, F.;
Morsberger, L.; Lee, H.; Petersen, D.; Quackenbush, J.;
Scott, A.; Wilson, M.; Yang, Y.; Ye, S. Q.
and Yu, W. Multiple-laboratory comparison of microarray platforms.
Nat Methods, 2005, 2, 345-350
Ross, A. E.; Marchionni, L.; Vuica-Ross, M.; Cheadle, C.;
Fan, J.; Berman, D. M.; and Schaeffer, E. M.
Gene Expression Pathways of High Grade Localized Prostate Cancer.
Prostate, 2011, 71, 1568-1578
Benassi, B.; Flavin, R.; Marchionni, L.; Zanata, S.; Pan, Y.;
Chowdhury, D.; Marani, M.; Strano, S.; Muti, P.; and Blandino, G.
c-Myc is activated via USP2a-mediated modulation of microRNAs
in prostate cancer. Cancer Discovery, 2012, March, 2, 236-247
See Also
See filterRedundant.
Examples
###load data
data(matchBoxExpression)
###the column name for the identifiers
idCol <- "SYMBOL"
###the column name for the ranking statistics
byCol <- "t"
###use lapply to remove redundancy from all data.frames
###default method is "maxORmin"
newMatchBoxExpression <- lapply(matchBoxExpression, filterRedundant, idCol=idCol, byCol=byCol)
###select t-statistics and merge into a new data.frame using SYMBOL
mat <- mergeData(listOfDataFrames = newMatchBoxExpression, idCol = idCol,
byCol = byCol)
###structure of mat
str(mat)
Results
R version 3.3.1 (2016-06-21) -- "Bug in Your Hair"
Platform: x86_64-pc-linux-gnu (64-bit)
> library(matchBox)
> png(filename="/home/ddbj/snapshot/RGM3/R_BC/result/matchBox/mergeData.Rd_%03d_medium.png", width=480, height=480)
> ### Name: mergeData
> ### Title: Merging data.frames based on common identifiers
> ### Aliases: mergeData
> ### Keywords: manip
>
> ### ** Examples
>
> ###load data
> data(matchBoxExpression)
>
> ###the column name for the identifiers
> idCol <- "SYMBOL"
>
> ###the column name for the ranking statistics
> byCol <- "t"
>
> ###use lapply to remove redundancy from all data.frames
> ###default method is "maxORmin"
> newMatchBoxExpression <- lapply(matchBoxExpression, filterRedundant, idCol=idCol, byCol=byCol)
>
> ###select t-statistics and merge into a new data.frame using SYMBOL
> mat <- mergeData(listOfDataFrames = newMatchBoxExpression, idCol = idCol,
+ byCol = byCol)
>
> ###structure of mat
> str(mat)
'data.frame': 506 obs. of 4 variables:
$ commonID : chr "A1CF" "AARS2" "ABCB1" "ABCG8" ...
$ dataSetA.t: num 2.82 1.91 -2.27 2.26 -1.93 ...
$ dataSetB.t: num -3.61 -2.8 -3.07 3.07 -2.81 ...
$ dataSetC.t: num -3.454 -0.319 1.324 -1.797 -4.777 ...
>
> dev.off()
null device
1
>