Utility functions
R Documentation
Utility functions to assist with QA/QC and analysis of plethysmography data
Description
After a database has been created, additional data often needs to be added or modified. These functions assist with common tasks that arise when working with Buxco whole body plethysmography data: add.labels.by.sample adds labels based on sample IDs, and adjust.labels modifies labels that were previously added. The get.err.breaks function produces a summary of the samples and timepoints that have the specified value in the 'Break_type_label' column (such as 'ERR' or 'UNK') and indicates whether they are close to the expected value for either an experimental or an acclimation run. Such labels can occur if there was only an experimental run for some samples or if other anomalies occurred. The user can then inspect the new labels within the data.frame, modify them manually if necessary, and use the data.frame as input to the adjust.labels function, which replaces the original labels and moves them to another column for future reference.
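The manual inspection step described above can be sketched in base R. This is only an illustration on a mock data.frame: the Sample_Name and inferred_labs column names are taken from the example on this page, while the rows and the decision to override sample_3 are invented for the sketch.

```r
## Mock stand-in for a get.err.breaks() result. Only the Sample_Name and
## inferred_labs column names come from this page; the rows are invented.
unk.summary <- data.frame(Sample_Name = c("sample_2", "sample_3"),
                          inferred_labs = c("EXP", "UNK"),
                          stringsAsFactors = FALSE)

## Inspect which labels were inferred for each sample
print(table(unk.summary$Sample_Name, unk.summary$inferred_labs))

## Manually override the label for sample_3 (an illustrative decision only)
unk.summary$inferred_labs[unk.summary$Sample_Name == "sample_3"] <- "EXP"
```

The edited data.frame would then be passed to adjust.labels in the same way as in the Examples section.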
A data.frame with a column named 'samples' and optionally a column named 'phase', with values corresponding to the sample names and Phase values (e.g. recorded experimental timepoint) in the database. The other columns will be added to the annotation table, and any sample not included in the data.frame will have its labels set to NULL.
err.breaks.dta
A data.frame produced by get.err.breaks function.
max.exp.time
The maximum time a given experimental run should take in seconds
max.acc.time
The maximum time a given acclimation run should take in seconds
max.exp.count
The maximum number of records expected for the experimental run.
max.acc.count
The maximum number of records expected for the acclimation run.
vary.perc
The fractional decrease, relative to the maximum experimental or acclimation run, that is tolerated while still allowing assignment to that category. Must be a value between 0 and 1.
label.val
A single character string observed in the Break_type_label column of the annotation table (cannot be 'ACC' or 'EXP').
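To make the role of vary.perc concrete, here is a minimal base R sketch of a tolerance check on record counts. It is not the package's implementation: infer.by.count is a hypothetical helper, and it assumes a run qualifies for a category when its count is no larger than the maximum and no more than vary.perc below it. The package may also take run times into account. The counts 150, 110 and 900 come from the example on this page; vary.perc = 0.1 is an illustrative value.

```r
## Hypothetical helper, not part of plethy: classify runs by record count.
## Assumption: a run matches a category when its count lies within
## [max * (1 - vary.perc), max] for that category.
infer.by.count <- function(count, max.exp.count, max.acc.count,
                           vary.perc, label.val = "UNK") {
  stopifnot(vary.perc >= 0, vary.perc <= 1)
  near.max <- function(x, max.val) {
    x <= max.val & x >= max.val * (1 - vary.perc)
  }
  ifelse(near.max(count, max.exp.count), "EXP",
         ifelse(near.max(count, max.acc.count), "ACC", label.val))
}

inferred <- infer.by.count(c(150, 110, 900),
                           max.exp.count = 150, max.acc.count = 900,
                           vary.perc = 0.1)
print(inferred)
```

Under these assumptions a run of 110 records falls below the 135-record floor for an experimental run, so it keeps its original label, which mirrors the sample_3 case in the Examples section.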
Value
add.labels.by.sample and adjust.labels modify tables in the SQLite database pointed to in the BuxcoDB object so nothing is returned.
get.err.breaks returns a data.frame summarizing the samples and timepoints with a given label.val.
Author(s)
Daniel Bottomly
See Also
parse.buxco, BuxcoDB
Examples
library(plethy)
## Set up a test dataset using internal functions.
## sample_1 should be labeled ACC and EXP; samples 2 and 3 should be labeled UNK.
## sample_3 should be too divergent from the expected 150 rows, so
## its inferred labels should remain 'UNK'.
samples <- c(NA, "sample_1", NA, "sample_1", "sample_2", "sample_3")
count <- c(NA, 900, NA, 150, 150, 110)
measure_break <- c(FALSE, FALSE, TRUE, FALSE, FALSE, FALSE)
table_break <- c(TRUE, rep(FALSE, length(samples) - 1))
phase <- rep("D1", length(samples))
err.dta <- data.frame(samples = samples, count = count,
                      measure_break = measure_break, table_break = table_break,
                      phase = phase, stringsAsFactors = FALSE)
sim.bux.lines <- plethy:::generate.sample.buxco(err.dta)
temp.file <- tempfile()
temp.db.file <- tempfile()
write(sim.bux.lines, file=temp.file)
test.bux.db <- parse.buxco(file.name=temp.file, db.name=temp.db.file, chunk.size=10000)
addAnnotation(test.bux.db, query=day.infer.query, index=FALSE)
addAnnotation(test.bux.db, query=break.type.query, index=TRUE)
##quick test of data
test <- proc.sanity(test.bux.db)
head(test$count)
test$time
##get a summary of this
unk.summary <- get.err.breaks(test.bux.db, label.val="UNK")
table(unk.summary$Sample_Name, unk.summary$inferred_labs)
##use the summary to change the Break_type_label column in the annotation table
head(retrieveData(test.bux.db))
adjust.labels(test.bux.db, unk.summary)
head(retrieveData(test.bux.db))
##additional annotations can be added to the database based on sample ID
sample.labels <- data.frame(samples = c("sample_1", "sample_3"),
                            response_type = c("high", "low"),
                            stringsAsFactors = FALSE)
add.labels.by.sample(test.bux.db, sample.labels)
final.dta <- retrieveData(test.bux.db)
head(final.dta)
##should be 'high' for sample_1 and 'low' for sample_3 with NAs for sample_2
table(final.dta$Sample_Name, final.dta$response_type, useNA="ifany")
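The way unlisted samples end up with missing labels can be pictured as a left join. The sketch below uses base R merge on a mock annotation data.frame. It is an analogy only, since the package itself modifies tables in the SQLite database; the annot object is invented for illustration.

```r
## Mock annotation rows; only the Sample_Name column name mirrors this page.
annot <- data.frame(Sample_Name = c("sample_1", "sample_2", "sample_3"),
                    stringsAsFactors = FALSE)

sample.labels <- data.frame(samples = c("sample_1", "sample_3"),
                            response_type = c("high", "low"),
                            stringsAsFactors = FALSE)

## Left join: every annotation row is kept, and samples that are absent
## from sample.labels receive NA in the new column.
merged <- merge(annot, sample.labels,
                by.x = "Sample_Name", by.y = "samples", all.x = TRUE)
print(merged)
```

Here sample_2 is kept but has NA in response_type, matching the NAs reported for sample_2 in the final table of the example.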
Results
R version 3.3.1 (2016-06-21) -- "Bug in Your Hair"
Copyright (C) 2016 The R Foundation for Statistical Computing
Platform: x86_64-pc-linux-gnu (64-bit)
> library(plethy)
Loading required package: BiocGenerics
Loading required package: parallel
Attaching package: 'BiocGenerics'
The following objects are masked from 'package:parallel':
clusterApply, clusterApplyLB, clusterCall, clusterEvalQ,
clusterExport, clusterMap, parApply, parCapply, parLapply,
parLapplyLB, parRapply, parSapply, parSapplyLB
The following objects are masked from 'package:stats':
IQR, mad, xtabs
The following objects are masked from 'package:base':
Filter, Find, Map, Position, Reduce, anyDuplicated, append,
as.data.frame, cbind, colnames, do.call, duplicated, eval, evalq,
get, grep, grepl, intersect, is.unsorted, lapply, lengths, mapply,
match, mget, order, paste, pmax, pmax.int, pmin, pmin.int, rank,
rbind, rownames, sapply, setdiff, sort, table, tapply, union,
unique, unsplit
Loading required package: S4Vectors
Loading required package: stats4
Attaching package: 'S4Vectors'
The following objects are masked from 'package:base':
colMeans, colSums, expand.grid, rowMeans, rowSums
> ### Name: Utility functions
> ### Title: Utility functions to assist with QA/QC and analysis of
> ### plethysmography data
> ### Aliases: add.labels.by.sample get.err.breaks adjust.labels proc.sanity
> ### Keywords: Utilities
>
> ### ** Examples
>
>
> ##set up a test dataset using internal functions
> ##should label sample_1 as ACC and EXP and samples 2 and 3 as UNK
> ##sample_3 should be too divergent from the expected 150 rows, so
> ##the inferred labels should remain 'UNK'
>
> samples=c(NA, "sample_1", NA, "sample_1", "sample_2", "sample_3")
> count = c(NA,900, NA,150, 150, 110)
> measure_break = c(FALSE, FALSE, TRUE, FALSE, FALSE,FALSE)
> table_break = c(TRUE, rep(FALSE, length(samples)-1))
> phase = rep("D1", length(samples))
>
> err.dta <- data.frame(samples=samples, count=count, measure_break=measure_break, table_break=table_break, phase=phase, stringsAsFactors=FALSE)
>
> sim.bux.lines <- plethy:::generate.sample.buxco(err.dta)
>
> temp.file <- tempfile()
> temp.db.file <- tempfile()
> write(sim.bux.lines, file=temp.file)
> test.bux.db <- parse.buxco(file.name=temp.file, db.name=temp.db.file, chunk.size=10000)
Processing /tmp/RtmppH48sN/file102f322b05c5 in chunks of 10000
Starting chunk 1
Reached breakpoint change
Processing breakpoint 1
Starting sample sample_1
Sample written
Reached the end of the file, writing remaining data
Starting sample sample_1
Sample written
Starting sample sample_2
Sample written
Starting sample sample_3
Sample written
> addAnnotation(test.bux.db, query=day.infer.query, index=FALSE)
[1] TRUE
> addAnnotation(test.bux.db, query=break.type.query, index=TRUE)
[1] TRUE
>
> ##quick test of data
>
> test <- proc.sanity(test.bux.db)
Warning message:
In proc.sanity(test.bux.db) : Break_type_labels other than ACC or EXP found
>
> head(test$count)
  Sample_Name Variable_Name Days Break_type_label num_entries
1    sample_1          Comp    0              ACC         900
2    sample_1          Comp    0              EXP         150
3    sample_1          EF50    0              ACC         900
4    sample_1          EF50    0              EXP         150
5    sample_1           MVb    0              ACC         900
6    sample_1           MVb    0              EXP         150
>
> test$time
  Break_type_label min_seconds max_seconds
1              ACC           0        1798
2              EXP           0         298
3              UNK           0         298
>
> ##get a summary of this
>
> unk.summary <- get.err.breaks(test.bux.db, label.val="UNK")
> table(unk.summary$Sample_Name, unk.summary$inferred_labs)
           EXP UNK
  sample_2  17   0
  sample_3   0  17
>
> ##use the summary to change the Break_type_label column in the annotation table
>
> head(retrieveData(test.bux.db))
  Sample_Name              P_Time Break_sec_start Variable_Name Bux_table_Name
1    sample_1 2016-07-07 02:33:11               0             f          WBPth
2    sample_1 2016-07-07 02:53:53            1242             f          WBPth
3    sample_1 2016-07-07 02:54:37            1286             f          WBPth
4    sample_1 2016-07-07 02:48:11             900             f          WBPth
5    sample_1 2016-07-07 02:37:27             256             f          WBPth
6    sample_1 2016-07-07 02:56:45            1414             f          WBPth
  Rec_Exp_date Break_number Days Break_type_label     Value
1           D1            1    0              ACC 0.1151274
2           D1            1    0              ACC 0.6707252
3           D1            1    0              ACC 0.1618515
4           D1            1    0              ACC 0.7811939
5           D1            1    0              ACC 1.2135653
6           D1            1    0              ACC 1.0940928
>
> adjust.labels(test.bux.db, unk.summary)
>
> head(retrieveData(test.bux.db))
  Sample_Name              P_Time Break_sec_start Variable_Name Bux_table_Name
1    sample_1 2016-07-07 02:33:11               0             f          WBPth
2    sample_1 2016-07-07 02:53:53            1242             f          WBPth
3    sample_1 2016-07-07 02:54:37            1286             f          WBPth
4    sample_1 2016-07-07 02:48:11             900             f          WBPth
5    sample_1 2016-07-07 02:37:27             256             f          WBPth
6    sample_1 2016-07-07 02:56:45            1414             f          WBPth
  Rec_Exp_date Break_number Days Break_type_label_orig Break_type_label
1           D1            1    0                   ACC              ACC
2           D1            1    0                   ACC              ACC
3           D1            1    0                   ACC              ACC
4           D1            1    0                   ACC              ACC
5           D1            1    0                   ACC              ACC
6           D1            1    0                   ACC              ACC
      Value
1 0.1151274
2 0.6707252
3 0.1618515
4 0.7811939
5 1.2135653
6 1.0940928
>
> ##additional annotations can be added to the database based on sample ID
>
> sample.labels <- data.frame(samples=c("sample_1","sample_3"), response_type=c("high", "low"),stringsAsFactors=FALSE)
>
> add.labels.by.sample(test.bux.db, sample.labels)
>
> final.dta <- retrieveData(test.bux.db)
>
> head(final.dta)
  Sample_Name              P_Time Break_sec_start Variable_Name Bux_table_Name
1    sample_1 2016-07-07 02:33:11               0             f          WBPth
2    sample_1 2016-07-07 02:53:53            1242             f          WBPth
3    sample_1 2016-07-07 02:54:37            1286             f          WBPth
4    sample_1 2016-07-07 02:48:11             900             f          WBPth
5    sample_1 2016-07-07 02:37:27             256             f          WBPth
6    sample_1 2016-07-07 02:56:45            1414             f          WBPth
  Rec_Exp_date Break_number Days Break_type_label_orig Break_type_label
1           D1            1    0                   ACC              ACC
2           D1            1    0                   ACC              ACC
3           D1            1    0                   ACC              ACC
4           D1            1    0                   ACC              ACC
5           D1            1    0                   ACC              ACC
6           D1            1    0                   ACC              ACC
  response_type     Value
1          high 0.1151274
2          high 0.6707252
3          high 0.1618515
4          high 0.7811939
5          high 1.2135653
6          high 1.0940928
>
> ##should be 'high' for sample_1 and 'low' for sample_3 with NAs for sample_2
>
> table(final.dta$Sample_Name, final.dta$response_type, useNA="ifany")
            high   low  <NA>
  sample_1 17850     0     0
  sample_2     0     0  2550
  sample_3     0  1870     0