Last data update: 2014.03.03

Utility functions    R Documentation

Utility functions to assist with QA/QC and analysis of plethysmography data

Description

After a database has been created, additional data often needs to be added or modified. These functions assist with common tasks that arise when working with Buxco whole body plethysmography data: add.labels.by.sample adds labels based on the sample IDs, and adjust.labels modifies labels that have previously been added. The get.err.breaks function produces a summary of the samples and timepoints that have the specified value in the 'Break_type_label' column (such as 'ERR' or 'UNK') and indicates whether they are close to the value expected for either an experimental or an acclimation run. Such labels can occur if there was only an experimental run for some samples or if other anomalies occurred. The user can then inspect the inferred labels within the data.frame, modify them manually if necessary, and use the data.frame as input to the adjust.labels function, which replaces the original labels and moves them to another column for future reference.
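
The relabelling step can be pictured on a plain data.frame. The following is a minimal base R sketch under stated assumptions, not the plethy implementation: the helper relabel.sketch and the toy tables are hypothetical, matching is done on sample name only for brevity, and only the column names 'Break_type_label', 'Break_type_label_orig' and 'inferred_labs' are taken from the example output on this page.

```r
## Hypothetical illustration only; adjust.labels itself works on the SQLite database.
annot <- data.frame(Sample_Name = c("sample_1", "sample_2", "sample_3"),
                    Break_type_label = c("ACC", "UNK", "UNK"),
                    stringsAsFactors = FALSE)

## Stand-in for a get.err.breaks summary (one row per sample for brevity)
summary.dta <- data.frame(Sample_Name = c("sample_2", "sample_3"),
                          inferred_labs = c("EXP", "UNK"),
                          stringsAsFactors = FALSE)

relabel.sketch <- function(annot, summary.dta) {
    ## keep the original labels in a separate column for future reference
    annot$Break_type_label_orig <- annot$Break_type_label
    ## position of each annotation row in the summary (NA if the sample is absent)
    idx <- match(annot$Sample_Name, summary.dta$Sample_Name)
    has.new <- !is.na(idx)
    ## overwrite only the rows that have an inferred label
    annot$Break_type_label[has.new] <- summary.dta$inferred_labs[idx[has.new]]
    annot
}

relabel.sketch(annot, summary.dta)
```

In this toy case sample_1 keeps 'ACC', sample_2 becomes 'EXP' and sample_3 stays 'UNK', while the untouched labels remain available in 'Break_type_label_orig'.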

Usage

add.labels.by.sample(bux.db, sample.labels)
get.err.breaks(bux.db, max.exp.count=150, max.acc.count=900, vary.perc=.1, label.val="ERR")
adjust.labels(bux.db, err.breaks.dta)
proc.sanity(bux.db, max.exp.time=300, max.acc.time=1800, max.exp.count=150, max.acc.count=900)

Arguments

bux.db

An object of class BuxcoDB

sample.labels

A data.frame with a column named 'samples' and optionally a column named 'phase', with values corresponding to the sample names and Phase values (e.g. recorded experimental timepoint) in the database. The other columns will be added to the annotation table, and any sample not included in the data.frame will have its labels set to NULL.

err.breaks.dta

A data.frame produced by the get.err.breaks function.

max.exp.time

The maximum time a given experimental run should take, in seconds.

max.acc.time

The maximum time a given acclimation run should take, in seconds.

max.exp.count

The maximum number of records expected for the experimental run.

max.acc.count

The maximum number of records expected for the acclimation run.

vary.perc

The fractional decrease, relative to the maximum expected for an experimental or acclimation run, that is tolerated while still allowing assignment to that category. Must be a value between 0 and 1.

label.val

A single character string observed in the 'Break_type_label' column of the annotation table (cannot be 'ACC' or 'EXP').
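
The tolerance described for vary.perc can be sketched in base R. This is an assumption-laden illustration rather than the code inside get.err.breaks: the helper infer.label.sketch is hypothetical, and the exact handling of boundary values in the package may differ.

```r
## Hypothetical helper: a sketch of the tolerance rule described for vary.perc,
## not the code used inside get.err.breaks.
infer.label.sketch <- function(count, max.exp.count = 150, max.acc.count = 900,
                               vary.perc = 0.1, label.val = "UNK") {
    stopifnot(vary.perc >= 0, vary.perc <= 1)
    ## smallest counts still accepted for each category
    exp.min <- max.exp.count * (1 - vary.perc)
    acc.min <- max.acc.count * (1 - vary.perc)
    ## assign EXP or ACC when the count falls in the tolerated window,
    ## otherwise keep the original label
    ifelse(count >= exp.min & count <= max.exp.count, "EXP",
           ifelse(count >= acc.min & count <= max.acc.count, "ACC", label.val))
}

infer.label.sketch(c(900, 150, 110))
```

With the default values, a count of 150 falls inside the experimental window and a count of 110 falls outside both windows, which is consistent with the Examples, where sample_2 is inferred as 'EXP' and sample_3 remains 'UNK'.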

Value

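add.labels.by.sample and adjust.labels modify tables in the SQLite database pointed to in the BuxcoDB object, so nothing is returned. get.err.breaks returns a data.frame summarizing the samples and timepoints with a given label.val. In the Examples, the object returned by proc.sanity contains 'count' and 'time' elements that summarize the number of records and the time range for each label.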
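
In the Examples, the 'time' element returned by proc.sanity lists the minimum and maximum seconds per label. A summary of that shape could be compared against the expected run lengths as in the base R sketch below. The toy table copies the values printed in the Results, while the 'limit' and 'too.long' columns are hypothetical additions and not part of the package output.

```r
## Toy copy of the 'time' element printed in the Results section of this page
time.dta <- data.frame(Break_type_label = c("ACC", "EXP", "UNK"),
                       min_seconds = c(0, 0, 0),
                       max_seconds = c(1798, 298, 298),
                       stringsAsFactors = FALSE)

## Hypothetical check: compare each label against its expected maximum duration.
## Labels without a known limit (here 'UNK') get NA rather than a guess.
limits <- c(ACC = 1800, EXP = 300)   # defaults of max.acc.time and max.exp.time
time.dta$limit <- unname(limits[time.dta$Break_type_label])
time.dta$too.long <- time.dta$max_seconds > time.dta$limit

time.dta
```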

Author(s)

Daniel Bottomly

See Also

parse.buxco, BuxcoDB

Examples

	
library(plethy)

##set up a test dataset using internal functions
##should label sample_1 as ACC and EXP and samples 2 and 3 as UNK
##sample_3 should be too divergent from the expected 150 rows, so 
##the inferred labels should remain 'UNK'
	
samples=c(NA, "sample_1", NA, "sample_1", "sample_2", "sample_3")
count = c(NA,900, NA,150, 150, 110)
measure_break = c(FALSE, FALSE, TRUE, FALSE, FALSE,FALSE)
table_break = c(TRUE, rep(FALSE, length(samples)-1))
phase = rep("D1", length(samples))
    
err.dta <- data.frame(samples=samples, count=count, measure_break=measure_break, table_break=table_break, phase=phase, stringsAsFactors=FALSE)
    
sim.bux.lines <- plethy:::generate.sample.buxco(err.dta)
    
temp.file <- tempfile()
temp.db.file <- tempfile()
write(sim.bux.lines, file=temp.file)
test.bux.db <- parse.buxco(file.name=temp.file, db.name=temp.db.file, chunk.size=10000)
addAnnotation(test.bux.db, query=day.infer.query, index=FALSE)
addAnnotation(test.bux.db, query=break.type.query, index=TRUE)

##quick test of data

test <- proc.sanity(test.bux.db)

head(test$count)

test$time

##get a summary of this
	
unk.summary <- get.err.breaks(test.bux.db, label.val="UNK")
table(unk.summary$Sample_Name, unk.summary$inferred_labs)
	
##use the summary to change the Break_type_label column in the annotation table
	
head(retrieveData(test.bux.db))
	
adjust.labels(test.bux.db, unk.summary)
	
head(retrieveData(test.bux.db))
	
##additional annotations can be added to the database based on sample ID
	
sample.labels <- data.frame(samples=c("sample_1","sample_3"), response_type=c("high", "low"),stringsAsFactors=FALSE)
	
add.labels.by.sample(test.bux.db, sample.labels)
	
final.dta <- retrieveData(test.bux.db)
	
head(final.dta)
	
##should be 'high' for sample_1 and 'low' for sample_3 with NAs for sample_2

table(final.dta$Sample_Name, final.dta$response_type, useNA="ifany")

Results


R version 3.3.1 (2016-06-21) -- "Bug in Your Hair"
Copyright (C) 2016 The R Foundation for Statistical Computing
Platform: x86_64-pc-linux-gnu (64-bit)

> library(plethy)
Loading required package: BiocGenerics
Loading required package: parallel

Attaching package: 'BiocGenerics'

The following objects are masked from 'package:parallel':

    clusterApply, clusterApplyLB, clusterCall, clusterEvalQ,
    clusterExport, clusterMap, parApply, parCapply, parLapply,
    parLapplyLB, parRapply, parSapply, parSapplyLB

The following objects are masked from 'package:stats':

    IQR, mad, xtabs

The following objects are masked from 'package:base':

    Filter, Find, Map, Position, Reduce, anyDuplicated, append,
    as.data.frame, cbind, colnames, do.call, duplicated, eval, evalq,
    get, grep, grepl, intersect, is.unsorted, lapply, lengths, mapply,
    match, mget, order, paste, pmax, pmax.int, pmin, pmin.int, rank,
    rbind, rownames, sapply, setdiff, sort, table, tapply, union,
    unique, unsplit

Loading required package: S4Vectors
Loading required package: stats4

Attaching package: 'S4Vectors'

The following objects are masked from 'package:base':

    colMeans, colSums, expand.grid, rowMeans, rowSums

> ### Name: Utility functions
> ### Title: Utility functions to assist with QA/QC and analysis of
> ###   plethysmography data
> ### Aliases: add.labels.by.sample get.err.breaks adjust.labels proc.sanity
> ### Keywords: Utilities
> 
> ### ** Examples
> 
> 	
> ##set up a test dataset using internal functions
> ##should label sample_1 as ACC and EXP and samples 2 and 3 as UNK
> ##sample_3 should be too divergent from the expected 150 rows, so 
> ##the inferred labels should remain 'UNK'
> 	
> samples=c(NA, "sample_1", NA, "sample_1", "sample_2", "sample_3")
> count = c(NA,900, NA,150, 150, 110)
> measure_break = c(FALSE, FALSE, TRUE, FALSE, FALSE,FALSE)
> table_break = c(TRUE, rep(FALSE, length(samples)-1))
> phase = rep("D1", length(samples))
>     
> err.dta <- data.frame(samples=samples, count=count, measure_break=measure_break, table_break=table_break, phase=phase, stringsAsFactors=FALSE)
>     
> sim.bux.lines <- plethy:::generate.sample.buxco(err.dta)
>     
> temp.file <- tempfile()
> temp.db.file <- tempfile()
> write(sim.bux.lines, file=temp.file)
> test.bux.db <- parse.buxco(file.name=temp.file, db.name=temp.db.file, chunk.size=10000)
Processing /tmp/RtmppH48sN/file102f322b05c5 in chunks of 10000
Starting chunk 1
Reached breakpoint change
Processing breakpoint 1
Starting sample sample_1
Sample written
Reached the end of the file, writing remaining data
Starting sample sample_1
Sample written
Starting sample sample_2
Sample written
Starting sample sample_3
Sample written
> addAnnotation(test.bux.db, query=day.infer.query, index=FALSE)
[1] TRUE
> addAnnotation(test.bux.db, query=break.type.query, index=TRUE)
[1] TRUE
> 
> ##quick test of data
> 
> test <- proc.sanity(test.bux.db)
Warning message:
In proc.sanity(test.bux.db) : Break_type_labels other than ACC or EXP found
> 
> head(test$count)
  Sample_Name Variable_Name Days Break_type_label num_entries
1    sample_1          Comp    0              ACC         900
2    sample_1          Comp    0              EXP         150
3    sample_1          EF50    0              ACC         900
4    sample_1          EF50    0              EXP         150
5    sample_1           MVb    0              ACC         900
6    sample_1           MVb    0              EXP         150
> 
> test$time
  Break_type_label min_seconds max_seconds
1              ACC           0        1798
2              EXP           0         298
3              UNK           0         298
> 
> ##get a summary of this
> 	
> unk.summary <- get.err.breaks(test.bux.db, label.val="UNK")
> table(unk.summary$Sample_Name, unk.summary$inferred_labs)
          
           EXP UNK
  sample_2  17   0
  sample_3   0  17
> 	
> ##use the summary to change the Break_type_label column in the annotation table
> 	
> head(retrieveData(test.bux.db))
  Sample_Name              P_Time Break_sec_start Variable_Name Bux_table_Name
1    sample_1 2016-07-07 02:33:11               0             f          WBPth
2    sample_1 2016-07-07 02:53:53            1242             f          WBPth
3    sample_1 2016-07-07 02:54:37            1286             f          WBPth
4    sample_1 2016-07-07 02:48:11             900             f          WBPth
5    sample_1 2016-07-07 02:37:27             256             f          WBPth
6    sample_1 2016-07-07 02:56:45            1414             f          WBPth
  Rec_Exp_date Break_number Days Break_type_label     Value
1           D1            1    0              ACC 0.1151274
2           D1            1    0              ACC 0.6707252
3           D1            1    0              ACC 0.1618515
4           D1            1    0              ACC 0.7811939
5           D1            1    0              ACC 1.2135653
6           D1            1    0              ACC 1.0940928
> 	
> adjust.labels(test.bux.db, unk.summary)
> 	
> head(retrieveData(test.bux.db))
  Sample_Name              P_Time Break_sec_start Variable_Name Bux_table_Name
1    sample_1 2016-07-07 02:33:11               0             f          WBPth
2    sample_1 2016-07-07 02:53:53            1242             f          WBPth
3    sample_1 2016-07-07 02:54:37            1286             f          WBPth
4    sample_1 2016-07-07 02:48:11             900             f          WBPth
5    sample_1 2016-07-07 02:37:27             256             f          WBPth
6    sample_1 2016-07-07 02:56:45            1414             f          WBPth
  Rec_Exp_date Break_number Days Break_type_label_orig Break_type_label
1           D1            1    0                   ACC              ACC
2           D1            1    0                   ACC              ACC
3           D1            1    0                   ACC              ACC
4           D1            1    0                   ACC              ACC
5           D1            1    0                   ACC              ACC
6           D1            1    0                   ACC              ACC
      Value
1 0.1151274
2 0.6707252
3 0.1618515
4 0.7811939
5 1.2135653
6 1.0940928
> 	
> ##additional annotations can be added to the database based on sample ID
> 	
> sample.labels <- data.frame(samples=c("sample_1","sample_3"), response_type=c("high", "low"),stringsAsFactors=FALSE)
> 	
> add.labels.by.sample(test.bux.db, sample.labels)
> 	
> final.dta <- retrieveData(test.bux.db)
> 	
> head(final.dta)
  Sample_Name              P_Time Break_sec_start Variable_Name Bux_table_Name
1    sample_1 2016-07-07 02:33:11               0             f          WBPth
2    sample_1 2016-07-07 02:53:53            1242             f          WBPth
3    sample_1 2016-07-07 02:54:37            1286             f          WBPth
4    sample_1 2016-07-07 02:48:11             900             f          WBPth
5    sample_1 2016-07-07 02:37:27             256             f          WBPth
6    sample_1 2016-07-07 02:56:45            1414             f          WBPth
  Rec_Exp_date Break_number Days Break_type_label_orig Break_type_label
1           D1            1    0                   ACC              ACC
2           D1            1    0                   ACC              ACC
3           D1            1    0                   ACC              ACC
4           D1            1    0                   ACC              ACC
5           D1            1    0                   ACC              ACC
6           D1            1    0                   ACC              ACC
  response_type     Value
1          high 0.1151274
2          high 0.6707252
3          high 0.1618515
4          high 0.7811939
5          high 1.2135653
6          high 1.0940928
> 	
> ##should be 'high' for sample_1 and 'low' for sample_3 with NAs for sample_2
> 
> table(final.dta$Sample_Name, final.dta$response_type, useNA="ifany")
          
            high   low  <NA>
  sample_1 17850     0     0
  sample_2     0     0  2550
  sample_3     0  1870     0