a list of ExpressionSets, containing the numeric and phenotypic data to be analyzed.
separator
a delimiter to use between dataset names and sample names
corFinder.args
a list of arguments to be passed to the corFinder function.
phenoFinder.args
a list of arguments to be passed to the phenoFinder function. If
NULL, samples with similar phenotypes will not be searched for.
outlierFinder.expr.args
a list of arguments to be passed to outlierFinder when called for expression data
outlierFinder.pheno.args
a list of arguments to be passed to outlierFinder when called for phenotype data
smokingGunFinder.args
a list of arguments to be passed to smokingGunFinder
impute.knn.args
a list of arguments to be passed to impute::impute.knn. Set to
NULL to skip k-nearest-neighbor (knn) imputation.
manual.smokingguns
a character vector of phenoData columns that, if identical, will
be considered evidence of duplication
automatic.smokingguns
If TRUE, automatically look for "smoking guns": phenotype
variables whose values are unique to each patient in dataset 1
and unique to each patient in dataset 2, but that contain exact
matches between datasets 1 and 2.
within.datasets.only
If TRUE, only search within each dataset for doppelgangers.
intermediate.pruning
The default setting, FALSE, produces output with no missing
values, but uses extra memory because all results from the
expression, phenotype, and smoking gun doppelganger searches must
be kept until the end. Setting this to TRUE saves memory for
very large searches, but each similarity measure is then reported
only for pairs that the same measure identified as doppelgangers
(for example, phenotype doppelgangers will have missing values
for the expression and smoking gun similarities).
cache.dir
The name of a directory in which to cache or look up results to save
re-calculating correlations. Set to NULL for no caching.
BPPARAM
Argument passed to BiocParallel::bplapply(). By default, all
cores of a multi-core machine are used.
verbose
Print progress information
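The criterion described under automatic.smokingguns can be illustrated with a short base R sketch. This is not the package's implementation: the toy data frames, the helper name is_smoking_gun, and the choice to ignore missing values are all assumptions made for illustration.

```r
## Toy phenotype tables for two datasets (hypothetical values).
pheno1 <- data.frame(
  patient_id = c("P01", "P02", "P03"),
  grade      = c(2, 3, 3),
  stringsAsFactors = FALSE
)
pheno2 <- data.frame(
  patient_id = c("P03", "P07", "P09"),
  grade      = c(3, 3, 1),
  stringsAsFactors = FALSE
)

## Hypothetical helper: a variable qualifies if its values are unique
## within each dataset and at least one value occurs in both datasets.
## Missing values are dropped first (an assumption of this sketch).
is_smoking_gun <- function(x, y) {
  x <- x[!is.na(x)]
  y <- y[!is.na(y)]
  length(x) > 0 && length(y) > 0 &&
    anyDuplicated(x) == 0 && anyDuplicated(y) == 0 &&
    length(intersect(x, y)) > 0
}

## Test every phenotype variable the two datasets have in common.
shared <- intersect(colnames(pheno1), colnames(pheno2))
guns <- shared[vapply(shared,
                      function(v) is_smoking_gun(pheno1[[v]], pheno2[[v]]),
                      logical(1))]
guns
```

Here patient_id qualifies because it is unique within each table and "P03" appears in both, whereas grade is rejected because it repeats within a dataset.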
Value
Returns an object of S4-class "DoppelGang". See ?DoppelGang-class.
Author(s)
Levi Waldron, Markus Riester, Marcel Ramos
See Also
?BiocParallel::'BiocParallelParam-class'
Examples
example("phenoFinder")
results2 <- doppelgangR(esets2, cache.dir = NULL)
results2
plot(results2)
summary(results2)
## Set phenoFinder.args=NULL to ignore similar phenotypes, and
## turn off ComBat batch correction:
## results2 <- doppelgangR(testesets, corFinder.args=list(use.ComBat=FALSE), phenoFinder.args=NULL, cache.dir=NULL)
## summary(results2)
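A further variation, given as a sketch rather than a tested recipe: restrict the search to within each dataset and run on a single worker instead of all cores. It uses only arguments documented on this page plus BiocParallel::SerialParam(), and it assumes that esets2 has already been created by example("phenoFinder"). The names can.run and results.within are illustrative.

```r
## Run only when the packages are installed and esets2 exists.
can.run <- requireNamespace("doppelgangR", quietly = TRUE) &&
  requireNamespace("BiocParallel", quietly = TRUE) &&
  exists("esets2")

if (can.run) {
  library(doppelgangR)
  results.within <- doppelgangR(
    esets2,
    within.datasets.only = TRUE,            # no between-dataset comparisons
    phenoFinder.args = NULL,                # do not search for similar phenotypes
    cache.dir = NULL,                       # no caching
    BPPARAM = BiocParallel::SerialParam()   # one worker instead of all cores
  )
  print(summary(results.within))
}
```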
Results
R version 3.3.1 (2016-06-21) -- "Bug in Your Hair"
Copyright (C) 2016 The R Foundation for Statistical Computing
Platform: x86_64-pc-linux-gnu (64-bit)
> library(doppelgangR)
Loading required package: Biobase
Loading required package: BiocGenerics
Loading required package: parallel
Attaching package: 'BiocGenerics'
The following objects are masked from 'package:parallel':
clusterApply, clusterApplyLB, clusterCall, clusterEvalQ,
clusterExport, clusterMap, parApply, parCapply, parLapply,
parLapplyLB, parRapply, parSapply, parSapplyLB
The following objects are masked from 'package:stats':
IQR, mad, xtabs
The following objects are masked from 'package:base':
Filter, Find, Map, Position, Reduce, anyDuplicated, append,
as.data.frame, cbind, colnames, do.call, duplicated, eval, evalq,
get, grep, grepl, intersect, is.unsorted, lapply, lengths, mapply,
match, mget, order, paste, pmax, pmax.int, pmin, pmin.int, rank,
rbind, rownames, sapply, setdiff, sort, table, tapply, union,
unique, unsplit
Welcome to Bioconductor
Vignettes contain introductory material; view with
'browseVignettes()'. To cite Bioconductor, see
'citation("Biobase")', and for packages 'citation("pkgname")'.
Loading required package: BiocParallel
> png(filename="/home/ddbj/snapshot/RGM3/R_BC/result/doppelgangR/doppelgangR.Rd_%03d_medium.png", width=480, height=480)
> ### Name: doppelgangR
> ### Title: doppelgangR
> ### Aliases: doppelgangR
>
> ### ** Examples
>
> example("phenoFinder")
phnFnd> library(curatedOvarianData)
Loading required package: affy
phnFnd> data(GSE32063_eset)
phnFnd> data(GSE17260_eset)
phnFnd> esets2 <- list(JapaneseB=GSE32063_eset,
phnFnd+ Yoshihara2010=GSE17260_eset)
phnFnd> ## standardize the sample ids to improve matching based on clinical annotation
phnFnd> esets2 <- lapply(esets2, function(X){
phnFnd+ X$alt_sample_name <- paste(X$sample_type, gsub("[^0-9]", "", X$alt_sample_name), sep="_")
phnFnd+
phnFnd+ ## Removal of columns that cannot possibly match also helps duplicated patients to stand out
phnFnd+ pData(X) <- pData(X)[, !grepl("uncurated_author_metadata", colnames(pData(X)))]
phnFnd+ X <- X[, 1:20] ##speed computations
phnFnd+ return(X) })
phnFnd> ## See first six samples in both rows and columns
phnFnd> phenoFinder(esets2)[1:6, 1:6]
          GSM432220 GSM432221 GSM432222 GSM432223 GSM432224  GSM432225
GSM795125 0.2351904 0.1014047 0.3525417 0.7274151 0.2189890 0.27397077
GSM795126 0.5404524 0.2588727 0.4083015 0.4079720 0.2927870 0.74123368
GSM795127 0.3791279 0.5008562 0.4983502 0.4981226 0.6385506 0.04416984
GSM795128 0.2351904 0.1014047 0.3525417 0.3523760 0.2189890 0.27397077
GSM795129 0.1076309 0.2395470 0.2190910 0.2189890 0.3643260 0.16030839
GSM795130 0.2603947 0.1344290 0.1077761 0.1076793 0.2489234 0.29544860
>
> results2 <- doppelgangR(esets2, cache.dir = NULL)
Working on datasets JapaneseB and JapaneseB
Calculating correlations...
Identifying correlation doppelgangers...
Calculating phenotype similarities...
Identifying phenotype doppelgangers...
Working on datasets Yoshihara2010 and Yoshihara2010
Calculating correlations...
Identifying correlation doppelgangers...
Calculating phenotype similarities...
Identifying phenotype doppelgangers...
Working on datasets JapaneseB and Yoshihara2010
Calculating correlations...
Found 2 batches
Adjusting for 0 covariate(s) or covariate level(s)
Standardizing Data across genes
Fitting L/S model and finding priors
Finding parametric adjustments
Adjusting the Data
Identifying correlation doppelgangers...
Calculating phenotype similarities...
Identifying phenotype doppelgangers...
Finalizing...
> results2
S4 object of class: DoppelGang
Number of potential doppelgangers: 6 : 6 expression, 2 phenotype, 0 smoking gun.
Use summary(object) to obtain a data.frame of potential doppelgangrs.
> plot(results2)
> summary(results2)
sample1 sample2 expr.similarity expr.doppel
1 Yoshihara2010:GSM432230 Yoshihara2010:GSM432231 0.7468418 TRUE
2 JapaneseB:GSM795125 Yoshihara2010:GSM432223 0.9694795 TRUE
3 JapaneseB:GSM795126 Yoshihara2010:GSM432225 0.9626765 TRUE
4 JapaneseB:GSM795127 Yoshihara2010:GSM432226 0.9695461 TRUE
5 JapaneseB:GSM795128 Yoshihara2010:GSM432228 0.9640154 TRUE
6 JapaneseB:GSM795129 Yoshihara2010:GSM432229 0.9695206 TRUE
pheno.similarity pheno.doppel smokinggun.similarity smokinggun.doppel
1 0.2029299 FALSE NA FALSE
2 0.7274151 FALSE NA FALSE
3 0.7412337 FALSE NA FALSE
4 1.0000000 TRUE NA FALSE
5 0.5398956 FALSE NA FALSE
6 1.0000000 TRUE NA FALSE
alt_sample_name unique_patient_ID sample_type histological_type
1 tumor_112:tumor_113 NA:NA tumor:tumor ser:ser
2 tumor_106:tumor_106 NA:NA tumor:tumor ser:ser
3 tumor_108:tumor_108 NA:NA tumor:tumor ser:ser
4 tumor_109:tumor_109 NA:NA tumor:tumor ser:ser
5 tumor_110:tumor_110 NA:NA tumor:tumor ser:ser
6 tumor_111:tumor_111 NA:NA tumor:tumor ser:ser
primarysite arrayedsite summarygrade summarystage tumorstage substage grade
1 ov:ov NA:NA low:low late:late 3:3 a:c 2:1
2 NA:ov NA:NA low:high late:late 3:3 c:c 2:3
3 NA:ov NA:NA high:low late:late 3:3 b:b 3:1
4 NA:ov NA:NA high:high late:late 3:3 c:c 3:3
5 NA:ov NA:NA low:high late:late 3:3 c:c 2:3
6 NA:ov NA:NA low:low late:late 3:3 c:c 2:2
age_at_initial_pathologic_diagnosis pltx tax neo days_to_tumor_recurrence
1 NA:NA y:y y:y NA:NA 1440:1290
2 NA:NA y:y y:y NA:NA NA:450
3 NA:NA y:y y:y NA:NA NA:1110
4 NA:NA y:y y:y NA:NA NA:210
5 NA:NA y:y y:y NA:NA NA:1800
6 NA:NA y:y y:y NA:NA NA:540
recurrence_status days_to_death vital_status os_binary
1 norecurrence:recurrence 1440:2070 living:living NA:NA
2 NA:recurrence 780:780 living:living NA:NA
3 NA:norecurrence 1110:1110 living:living NA:NA
4 NA:recurrence 600:600 deceased:deceased NA:NA
5 NA:norecurrence 2910:1800 living:living NA:NA
6 NA:recurrence 1710:1710 deceased:deceased NA:NA
relapse_binary site_of_tumor_first_recurrence primary_therapy_outcome_success
1 NA:NA NA:NA NA:NA
2 NA:NA NA:NA NA:NA
3 NA:NA NA:NA NA:NA
4 NA:NA NA:NA NA:NA
5 NA:NA NA:NA NA:NA
6 NA:NA NA:NA NA:NA
debulking percent_normal_cells percent_stromal_cells
1 optimal:suboptimal NA:NA NA:NA
2 suboptimal:suboptimal NA:NA NA:NA
3 optimal:optimal NA:NA NA:NA
4 suboptimal:suboptimal NA:NA NA:NA
5 suboptimal:suboptimal NA:NA NA:NA
6 suboptimal:suboptimal NA:NA NA:NA
percent_tumor_cells batch flag flag_notes
1 NA:NA NA:NA NA:NA NA:NA
2 NA:NA NA:NA NA:NA NA:NA
3 NA:NA NA:NA NA:NA NA:NA
4 NA:NA NA:NA NA:NA NA:NA
5 NA:NA NA:NA NA:NA NA:NA
6 NA:NA NA:NA NA:NA NA:NA
> ## Set phenoFinder.args=NULL to ignore similar phenotypes, and
> ## turn off ComBat batch correction:
> ## results2 <- doppelgangR(testesets, corFinder.args=list(use.ComBat=FALSE), phenoFinder.args=NULL, cache.dir=NULL)
> ## summary(results2)
> dev.off()
null device
1
>
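The sample identifiers in the summary() output above join the dataset name and the sample name with ":" (the separator seen in this run). The following base R sketch shows one way to split them again and keep only between-dataset pairs flagged as phenotype doppelgangers. It works on a small data frame copied by hand from three rows of that output, so it does not need the package. The helper dataset_of and the columns it adds are illustrative names, not part of doppelgangR.

```r
## Three rows copied from the summary() output above.
pairs <- data.frame(
  sample1 = c("Yoshihara2010:GSM432230", "JapaneseB:GSM795125",
              "JapaneseB:GSM795127"),
  sample2 = c("Yoshihara2010:GSM432231", "Yoshihara2010:GSM432223",
              "Yoshihara2010:GSM432226"),
  expr.similarity = c(0.7468418, 0.9694795, 0.9695461),
  pheno.doppel = c(FALSE, FALSE, TRUE),
  stringsAsFactors = FALSE
)

sep <- ":"  # separator observed in this run

## Hypothetical helper: the text before the first separator is the dataset name.
dataset_of <- function(id) {
  vapply(strsplit(id, sep, fixed = TRUE), `[`, character(1), 1)
}

pairs$dataset1 <- dataset_of(pairs$sample1)
pairs$dataset2 <- dataset_of(pairs$sample2)
pairs$between  <- pairs$dataset1 != pairs$dataset2

## Between-dataset pairs that were also flagged as phenotype doppelgangers.
between.pheno <- pairs[pairs$between & pairs$pheno.doppel,
                       c("sample1", "sample2", "expr.similarity")]
between.pheno
```

With a real result, the same steps should apply to the data.frame returned by summary(results2), provided the separator argument was left at the value seen here.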