Last data update: 2014.03.03

R: Reading and writing sample data from / to a tally file
getSampleData    R Documentation

Reading and writing sample data from / to a tally file

Description

These functions allow reading and writing of sample data from / to the HDF5-based tally files. The sample data is stored as group attributes.

Usage

getSampleData( filename, group )
setSampleData( filename, group, sampleData, largeAttributes = FALSE, stringSize = 64 )

Arguments

filename

The name of a tally file

group

The name of a group within that tally file, e.g. /ExampleStudy/22

sampleData

A data.frame with k rows (one for each sample) and columns Type, Column and (SampleGroup or Patient). Additional columns will be added as well but are not required.

largeAttributes

HDF5 limits the size of attributes to 64KB. If you have many samples, setting this flag will write the attributes to a separate dataset instead. getSampleData is aware of this and automatically chooses the dataset-stored attributes if they are present.

stringSize

Maximum length for string attributes (number of characters). The default of 64 characters should be fine for most cases. This has to be specified since variable-length strings are not supported as of now.
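
Since strings are stored with a fixed maximum length, it can help to look at the longest string in the sample data before writing. The following is a minimal base-R sketch under stated assumptions: the helper maxStringLength is hypothetical (it is not part of h5vc) and the toy data.frame is made up for illustration.

```r
# Hypothetical helper (not part of h5vc): longest string, in characters,
# across all character or factor columns of a data.frame.
maxStringLength <- function(df) {
  isText <- vapply(df, function(x) is.character(x) || is.factor(x), logical(1))
  if (!any(isText)) return(0L)
  max(vapply(df[isText],
             function(x) max(nchar(as.character(x)), 0L),
             integer(1)))
}

# Toy sample data, made up for this illustration
toy <- data.frame(
  Sample  = c("PT5ControlDNA", "PT5PrimaryDNA"),
  Column  = c(1, 2),
  Type    = c("Control", "Case"),
  Patient = c("Patient5", "Patient5"),
  stringsAsFactors = FALSE
)

needed <- maxStringLength(toy)
# TRUE means the default stringSize = 64 is large enough for this data
needed <= 64
```

If the longest string were to exceed 64 characters, a larger value could be passed as stringSize in the call to setSampleData.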

Details

The returned data.frame contains the sample ids and the sample columns in the sample dimension of the dataset. The type of sample must be one of c("Case","Control") to be used with the provided SNV calling function. Additional relevant per-sample information may be stored here.

Note that the following columns are required in the sample data where the rows represent samples in the cohort:

Sample: the sample id of the corresponding sample

Column: the index of the corresponding sample within the sample dimension of the dataset. Be aware that getSampleData adds 1 to this value and setSampleData subtracts 1 from it, since internally the tally files store the dimension 0-based whereas within R we count 1-based.

Patient: the patient id of the corresponding sample

Type: the type of sample
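
The requirements listed above can be checked before calling setSampleData. The following is a minimal base-R sketch, not the package's own validation: the helper checkSampleData is hypothetical, the toy data is made up, and accepting SampleGroup in place of Patient is an assumption taken from the description of the sampleData argument. The last lines only illustrate the 0-based / 1-based shift described for Column, assuming the stored value is the R value minus 1.

```r
# Hypothetical helper (not part of h5vc): check the columns named in this
# help page and the Type values expected by the SNV calling function.
checkSampleData <- function(df) {
  required <- c("Sample", "Column", "Type")
  missingCols <- setdiff(required, colnames(df))
  if (length(missingCols) > 0) {
    stop("missing columns: ", paste(missingCols, collapse = ", "))
  }
  # Assumption: either Patient or SampleGroup identifies the grouping
  if (!any(c("Patient", "SampleGroup") %in% colnames(df))) {
    stop("need a Patient or SampleGroup column")
  }
  if (!all(df$Type %in% c("Case", "Control"))) {
    stop("Type must be one of 'Case' or 'Control'")
  }
  invisible(TRUE)
}

# Toy sample data, made up for this illustration
toy <- data.frame(
  Sample  = c("PT5ControlDNA", "PT5PrimaryDNA"),
  Column  = c(1, 2),
  Type    = c("Control", "Case"),
  Patient = c("Patient5", "Patient5"),
  stringsAsFactors = FALSE
)
checkSampleData(toy)

# Illustration of the index shift only (no file access):
storedColumn <- toy$Column - 1    # 0-based, as kept in the tally file
restored     <- storedColumn + 1  # 1-based, as seen again within R
```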

Value

sampledata

A data.frame with k rows (one for each sample) and columns Type, Column and (SampleGroup or Patient).

Author(s)

Paul Pyl

Examples

  # loading library and example data
  library(h5vc)
  tallyFile <- system.file( "extdata", "example.tally.hfs5", package = "h5vcData" )
  sampleData <- getSampleData( tallyFile, "/ExampleStudy/16" )
  sampleData
  # modify  the sample data
  sampleData$AnotherColumn <- paste( sampleData$Patient, "Modified" )
  # write to tallyFile
  setSampleData( tallyFile, "/ExampleStudy/16", sampleData )
  # re-load and check if it worked
  sampleData <- getSampleData( tallyFile, "/ExampleStudy/16" )
  sampleData

Results


> library(h5vc)
Loading required package: grid
Loading required package: gridExtra
Loading required package: ggplot2
>   # loading library and example data
>   library(h5vc)
>   tallyFile <- system.file( "extdata", "example.tally.hfs5", package = "h5vcData" )
>   sampleData <- getSampleData( tallyFile, "/ExampleStudy/16" )
>   sampleData
                     SampleFiles           Sample Column    Type  Patient
1     ../Input/PT8PrimaryDNA.bam    PT8PrimaryDNA      6    Case Patient8
2     ../Input/PT5PrimaryDNA.bam    PT5PrimaryDNA      2    Case Patient5
3     ../Input/PT5RelapseDNA.bam    PT5RelapseDNA      3    Case Patient5
4 ../Input/PT8PreLeukemiaDNA.bam PT8EarlyStageDNA      5    Case Patient8
5     ../Input/PT5ControlDNA.bam    PT5ControlDNA      1 Control Patient5
6     ../Input/PT8ControlDNA.bam    PT8ControlDNA      4 Control Patient8
>   # modify  the sample data
>   sampleData$AnotherColumn <- paste( sampleData$Patient, "Modified" )
>   # write to tallyFile
>   setSampleData( tallyFile, "/ExampleStudy/16", sampleData )
>   # re-load and check if it worked
>   sampleData <- getSampleData( tallyFile, "/ExampleStudy/16" )
>   sampleData
      AnotherColumn Column  Patient           Sample
1 Patient8 Modified      6 Patient8    PT8PrimaryDNA
2 Patient5 Modified      2 Patient5    PT5PrimaryDNA
3 Patient5 Modified      3 Patient5    PT5RelapseDNA
4 Patient8 Modified      5 Patient8 PT8EarlyStageDNA
5 Patient5 Modified      1 Patient5    PT5ControlDNA
6 Patient8 Modified      4 Patient8    PT8ControlDNA
                     SampleFiles    Type
1     ../Input/PT8PrimaryDNA.bam    Case
2     ../Input/PT5PrimaryDNA.bam    Case
3     ../Input/PT5RelapseDNA.bam    Case
4 ../Input/PT8PreLeukemiaDNA.bam    Case
5     ../Input/PT5ControlDNA.bam Control
6     ../Input/PT8ControlDNA.bam Control