Last data update: 2014.03.03

R: Class to contain Amplicon Variant Analyzer Output
AVASet-class    R Documentation

Class to contain Amplicon Variant Analyzer Output

Description

Container to store data imported from a project of Roche's Amplicon Variant Analyzer Software. It stores all information into an extended version of the Biobase ExpressionSet.

Objects from the Class

Objects can be created by calls of the form AVASet(dirname, avaBin). dirname is a character giving the project directory and avaBin is a character giving the path to the AVA software installation (i.e. the directory containing the doAmplicon binary). The constructor will start the AVA software command line and import all necessary data.

If the AVA software is not installed on the same machine that runs R, all data must be exported manually using the AVA Command Line Interface (AVA-CLI). After having exported all text files, the constructor AVASet(dirname, avaBin, file_sample, file_amp, file_reference, file_variant, file_variantHits) can be used to import them. See the example below.

Finally, old project folders generated by AVA software < 2.6 can be imported using AVASet(dirname), where dirname is the path to the project folder (i.e. a directory that contains the files and subdirectories "Amplicons/ProjectDef/ampliconsProject.txt", "Amplicons/Results/Variants/currentVariantDefs.txt", "Amplicons/Results/Variants", "Amplicons/Results/Align").
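
The folder layout listed above can be checked before calling the constructor. The following is a minimal sketch in base R. The helper name isLegacyAvaProject is hypothetical and not part of R453Plus1Toolbox, and it only tests that the four paths exist, not that their contents are valid.

```r
# Hypothetical helper (not part of R453Plus1Toolbox): test whether a directory
# contains the files and subdirectories expected of a project folder created
# by AVA software < 2.6, as listed in the paragraph above.
isLegacyAvaProject <- function(dirname) {
    required <- c(
        file.path("Amplicons", "ProjectDef", "ampliconsProject.txt"),
        file.path("Amplicons", "Results", "Variants", "currentVariantDefs.txt"),
        file.path("Amplicons", "Results", "Variants"),
        file.path("Amplicons", "Results", "Align")
    )
    # file.exists() returns TRUE for existing files and directories alike
    all(file.exists(file.path(dirname, required)))
}
```

A call such as isLegacyAvaProject(projectDir) could then guard AVASet(dirname=projectDir), so that a wrong path is caught early rather than during import.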

Slots

assayData:

Object of class AssayData. Contains the number of reads and the total read depth for every variant and each sample in forward and reverse direction. Its column number equals nrow(phenoData).

featureData:

Object of class AnnotatedDataFrame. Contains information about the type, position and reference of each variant.

phenoData:

Object of class AnnotatedDataFrame. Contains the sample-IDs and name, annotation and group of the read data for all samples. If available, the lane, pico titer plate(s) (PTP) or MID(s) of each sample are shown as well.

assayDataAmp:

Object of class AssayData. Contains the number of reads for every amplicon and each sample in forward/reverse direction. Its row number equals nrow(featureDataAmp) and its column number equals nrow(phenoData).

featureDataAmp:

Object of class AnnotatedDataFrame. Contains the primer sequences, reference sequences and the coordinates of the target regions for every amplicon.

referenceSequences:

Object of class AlignedRead. If additional alignment information was computed via alignShortReads, this slot stores the chromosome, position and strand of each reference sequence.

variantFilterPerc:

Object of class numeric. Contains the threshold used to display only those variants whose coverage (in percent) in forward and reverse direction is higher than this filter value in at least one sample. See setVariantFilter for details about setting this value.

variantFilter:

Object of class character. Contains a vector of variant names whose coverage (in percent) in forward and reverse direction in at least one sample is higher than the filter value in variantFilterPerc.

dirs:

Object of class character. Based on a directory given at instantiation of the object, it contains a vector of several directories containing all relevant AVA-project files.

experimentData:

Object of class MIAME. Contains details of the experiment.

annotation:

Object of class character. Label associated with the annotation package used in the experiment.

protocolData:

Object of class AnnotatedDataFrame. Contains additional information about the samples.

.__classVersion__:

Object of class Versions. Remembers the R and R453Plus1Toolbox version numbers used to create the AVASet instance.

Extends

Class eSet, directly. Class VersionedBiobase, by class "eSet", distance 2. Class Versioned, by class "eSet", distance 3.

Methods

object[i,j]:

Allows subsetting an AVASet object by features (i) and samples (j).

assayDataAmp(object), assayDataAmp(object)<-value:

Similar to assayData of the Biobase ExpressionSet, this function returns/replaces the amplicon assay data.

fDataAmp(object):

Similar to fData of the Biobase ExpressionSet, this function returns the amplicon feature data as a data frame.

featureDataAmp(object), featureDataAmp(object)<-value:

Similar to featureData of the Biobase ExpressionSet, this function returns/replaces the amplicon feature data and feature metadata.

referenceSequences(object), referenceSequences(object)<-value:

Returns/replaces the reference sequence slot.

alignShortReads(object, bsGenome):

Retrieve the chromosomal positions of the amplicon sequences.

setVariantFilter(object):

Sets the filter to display only those variants whose coverage (in percent) in forward and reverse direction is higher than the given value in at least one sample.

getVariantPercentages(object):

Computes the coverage for every variant over all reads (forward and/or reverse) and for each sample.

annotateVariants(object):

Annotates given genomic variants. See annotateVariants for details.

htmlReport(object):

Exports all (filtered) variant data into an HTML report. See htmlReport for details.
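
The filter rule described for setVariantFilter and getVariantPercentages can be illustrated with plain matrices. The sketch below is base R only and is not the package's implementation. The counts are copied from the example output on this page (variants C1438, C369 and C30). The helper names countMatrix and passingVariants are hypothetical. The sketch assumes that coverage is 100 * variant reads / total reads per direction and that "higher than" is a strict comparison. Whether the package itself expects the threshold as a percentage or as a fraction should be checked in the setVariantFilter help page.

```r
# Base-R sketch of the filter rule; counts are taken from the example output.
samples  <- paste0("Sample_", 1:6)
variants <- c("C1438", "C369", "C30")

# Hypothetical helper: build a variants x samples count matrix row by row.
countMatrix <- function(...) {
    matrix(c(...), nrow = length(variants), byrow = TRUE,
           dimnames = list(variants, samples))
}

variantForwCount <- countMatrix(0, 0, 0, 0, 0, 0,
                                0, 0, 0, 1, 0, 0,
                                0, 5, 0, 0, 0, 0)
totalForwCount   <- countMatrix(119, 1516, 137, 1729, 1288, 140,
                                267, 1152, 195, 1518, 1016, 190,
                                119, 1516, 137, 1729, 1288, 140)
variantRevCount  <- countMatrix(0, 0, 0,  0, 0, 0,
                                0, 0, 0, 11, 0, 0,
                                0, 6, 0,  0, 0, 0)
totalRevCount    <- countMatrix(162, 2020, 188, 2270, 1488, 159,
                                192, 1586, 192, 1934, 1198, 137,
                                162, 2020, 188, 2270, 1488, 159)

# Assumption: coverage in percent, computed separately for each direction.
percForw <- 100 * variantForwCount / totalForwCount
percRev  <- 100 * variantRevCount / totalRevCount

# Hypothetical helper: names of the variants whose forward AND reverse
# coverage both exceed the threshold in at least one sample.
passingVariants <- function(percForw, percRev, threshold) {
    keep <- apply(percForw > threshold & percRev > threshold, 1, any)
    names(keep)[keep]
}

passingVariants(percForw, percRev, threshold = 0.2)
```

With a threshold of 0.2 percent only C30 remains: in Sample_2 it has 5 of 1516 forward reads and 6 of 2020 reverse reads, both above the threshold, whereas C369 in Sample_4 has only 1 of 1518 forward reads.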

Author(s)

Christoph Bartenhagen

See Also

MapperSet-class, annotateVariants, alignShortReads, htmlReport, setVariantFilter, getVariantPercentages

Examples


    # sum up class structure
    showClass("AVASet")

    # load an AVA dataset containing 6 samples, 4 amplicons and 259 variants
    data(avaSetExample)
    avaSetExample

    # show contents of assay, feature and pheno data
    head(assayData(avaSetExample)$variantForwCount)
    head(assayData(avaSetExample)$totalForwCount)
    head(assayData(avaSetExample)$variantRevCount)
    head(assayData(avaSetExample)$totalRevCount)
    head(fData(avaSetExample))
    pData(avaSetExample)
    assayDataAmp(avaSetExample)
    fDataAmp(avaSetExample)
    referenceSequences(avaSetExample)

    # Use these commands to export a project from within the AVA-CLI (doAmplicon):
    # > list sample -outputFile sample.csv
    # > list amplicon -outputFile amp.csv
    # > list reference -outputFile reference.csv
    # > list variant -outputFile variant.csv
    # > report variantHits -outputFile variantHits.csv

    # Load an AVA dataset containing 6 samples, 4 amplicons and 222 variants
    # by specifying five files, that were exported with the AVA-CLI:
    projectDir = system.file("extdata", "AVASet_doAmplicon", package="R453Plus1Toolbox")
    avaSetExample = AVASet(dirname=projectDir, file_sample="sample.csv", file_amp="amp.csv", file_reference="reference.csv", file_variant="variant.csv", file_variantHits="variantHits.csv")
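
The correspondence between the AVA-CLI export commands and the file arguments of AVASet() in the example above can be written down explicitly. The following base-R sketch only assembles the command strings shown in the comments above and checks that the exported files are present. The helper name missingExports is hypothetical, and the sketch assumes that the file names are interpreted relative to dirname, as the example call suggests.

```r
# File names used in the example above, keyed by the AVASet() argument
# that receives them.
exportFiles <- c(file_sample      = "sample.csv",
                 file_amp         = "amp.csv",
                 file_reference   = "reference.csv",
                 file_variant     = "variant.csv",
                 file_variantHits = "variantHits.csv")

# AVA-CLI commands from the example comments, keyed the same way.
cliCommands <- c(file_sample      = "list sample",
                 file_amp         = "list amplicon",
                 file_reference   = "list reference",
                 file_variant     = "list variant",
                 file_variantHits = "report variantHits")

# Commands to type at the AVA-CLI prompt, one per exported file.
exportCommands <- paste(cliCommands, "-outputFile",
                        exportFiles[names(cliCommands)])
writeLines(exportCommands)

# Hypothetical helper: which of the expected files are missing from a
# directory (assumes the file names are relative to dirname).
missingExports <- function(dirname, files = exportFiles) {
    files[!file.exists(file.path(dirname, files))]
}
```

Calling missingExports(projectDir) before the constructor returns an empty vector when all five files are in place; otherwise its names show which AVASet() arguments have no file yet.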

Results


> library(R453Plus1Toolbox)
Loading required package: VariantAnnotation
Loading required package: BiocGenerics
Loading required package: parallel

Attaching package: 'BiocGenerics'

The following objects are masked from 'package:parallel':

    clusterApply, clusterApplyLB, clusterCall, clusterEvalQ,
    clusterExport, clusterMap, parApply, parCapply, parLapply,
    parLapplyLB, parRapply, parSapply, parSapplyLB

The following objects are masked from 'package:stats':

    IQR, mad, xtabs

The following objects are masked from 'package:base':

    Filter, Find, Map, Position, Reduce, anyDuplicated, append,
    as.data.frame, cbind, colnames, do.call, duplicated, eval, evalq,
    get, grep, grepl, intersect, is.unsorted, lapply, lengths, mapply,
    match, mget, order, paste, pmax, pmax.int, pmin, pmin.int, rank,
    rbind, rownames, sapply, setdiff, sort, table, tapply, union,
    unique, unsplit

Loading required package: GenomeInfoDb
Loading required package: stats4
Loading required package: S4Vectors

Attaching package: 'S4Vectors'

The following objects are masked from 'package:base':

    colMeans, colSums, expand.grid, rowMeans, rowSums

Loading required package: IRanges
Loading required package: GenomicRanges
Loading required package: SummarizedExperiment
Loading required package: Biobase
Welcome to Bioconductor

    Vignettes contain introductory material; view with
    'browseVignettes()'. To cite Bioconductor, see
    'citation("Biobase")', and for packages 'citation("pkgname")'.

Loading required package: Rsamtools
Loading required package: Biostrings
Loading required package: XVector

Attaching package: 'VariantAnnotation'

The following object is masked from 'package:base':

    tabulate

> png(filename="/home/ddbj/snapshot/RGM3/R_BC/result/R453Plus1Toolbox/AVASet-class.Rd_%03d_medium.png", width=480, height=480)
> ### Name: AVASet-class
> ### Title: Class to contain Amplicon Variant Analyzer Output
> ### Aliases: AVASet-class [,AVASet,ANY,ANY-method
> ###   annotateVariants,AVASet-method assayDataAmp,AVASet-method
> ###   assayDataAmp<- assayDataAmp<-,AVASet,AssayData-method
> ###   fDataAmp,AVASet-method featureDataAmp,AVASet-method featureDataAmp<-
> ###   featureDataAmp<-,AVASet,AnnotatedDataFrame-method
> ###   htmlReport,AVASet-method
> ###   alignShortReads,AVASet,DNAStringSet,character-method
> ###   referenceSequences,AVASet-method referenceSequences<-
> ###   referenceSequences<-,AVASet,AlignedRead-method
> ###   setVariantFilter,AVASet-method getVariantPercentages,AVASet-method
> ### Keywords: classes
> 
> ### ** Examples
> 
> 
>     # sum up class structure
>     showClass("AVASet")
Class "AVASet" [package "R453Plus1Toolbox"]

Slots:
                                                               
Name:        assayDataAmp     featureDataAmp referenceSequences
Class:          AssayData AnnotatedDataFrame        AlignedRead
                                                               
Name:   variantFilterPerc      variantFilter               dirs
Class:            numeric          character          character
                                                               
Name:           assayData          phenoData        featureData
Class:          AssayData AnnotatedDataFrame AnnotatedDataFrame
                                                               
Name:      experimentData         annotation       protocolData
Class:              MIAxE          character AnnotatedDataFrame
                         
Name:   .__classVersion__
Class:           Versions

Extends: 
Class "eSet", directly
Class "VersionedBiobase", by class "eSet", distance 2
Class "Versioned", by class "eSet", distance 3
> 
>     # load an AVA dataset containing 6 samples, 4 amplicons and 259 variants
>     data(avaSetExample)
>     avaSetExample
Variants: 
AVASet (storageMode: list)
assayData: 259 features, 6 samples 
  element names: variantForwCount, totalForwCount, variantRevCount, totalRevCount 
protocolData: none
phenoData
  sampleNames: Sample_1 Sample_2 ... Sample_6 (6 total)
  varLabels: SampleID MID1 ... Annotation (7 total)
  varMetadata: labelDescription
featureData
  featureNames: C1438 C369 ... C763 (259 total)
  fvarLabels: name canonicalPattern ... referenceBases (7 total)
  fvarMetadata: labelDescription
experimentData: use 'experimentData(object)'
Annotation:  

Amplicons: 
assayDataAmp:4 features,  6 samples
  element names:forwCountrevCount
featureDataAmp: 
An object of class 'AnnotatedDataFrame'
  rowNames: TET2_E11.04 TET2_E06 TET2_E11.03 TET2_E04
  varLabels: ampID primer1 ... targetStart (6 total)
  varMetadata: labelDescription

Reference sequences: 
class: AlignedRead
length: 4 reads; width: 339..346 cycles
chromosome: NA NA NA NA 
position: 1 1 1 1 
strand: NA NA NA NA 
alignQuality: NumericQuality 
alignData varLabels: name refSeqID gene 
> 
>     # show contents of assay, feature and pheno data
>     head(assayData(avaSetExample)$variantForwCount)
      Sample_1 Sample_2 Sample_3 Sample_4 Sample_5 Sample_6
C1438        0        0        0        0        0        0
C369         0        0        0        1        0        0
C595         0        0        0        0        0        0
C397         0        0        0        0        0        0
C30          0        5        0        0        0        0
C1699        0        0        0        0        0        0
>     head(assayData(avaSetExample)$totalForwCount)
      Sample_1 Sample_2 Sample_3 Sample_4 Sample_5 Sample_6
C1438      119     1516      137     1729     1288      140
C369       267     1152      195     1518     1016      190
C595       258     1805      230     1885     1775      221
C397       258     1805      230     1885     1775      221
C30        119     1516      137     1729     1288      140
C1699      119     1516      137     1729     1288      140
>     head(assayData(avaSetExample)$variantRevCount)
      Sample_1 Sample_2 Sample_3 Sample_4 Sample_5 Sample_6
C1438        0        0        0        0        0        0
C369         0        0        0       11        0        0
C595         0        0        0        0        0        0
C397         0        0        0        0        0        0
C30          0        6        0        0        0        0
C1699        0        0        0        0        0        0
>     head(assayData(avaSetExample)$totalRevCount)
      Sample_1 Sample_2 Sample_3 Sample_4 Sample_5 Sample_6
C1438      162     2020      188     2270     1488      159
C369       192     1586      192     1934     1198      137
C595       172     2169      239     2160     2127      160
C397       172     2169      239     2160     2127      160
C30        162     2020      188     2270     1488      159
C1699      162     2020      188     2270     1488      159
>     head(fData(avaSetExample))
         name canonicalPattern referenceSeqID start end variantBase
C1438 303:T/C         s(303,C)            I37   303 303           C
C369  309:T/C         s(309,C)            I36   309 309           C
C595  108:T/C         s(108,C)            I40   108 108           C
C397  246:A/G         s(246,G)            I40   246 246           G
C30   225:A/G         s(225,G)            I37   225 225           G
C1699  28:T/C          s(28,C)            I37    28  28           C
      referenceBases
C1438              T
C369               T
C595               T
C397               A
C30                A
C1699              T
>     pData(avaSetExample)
         SampleID MID1 MID2 PTP_AccNum Lane ReadGroup
Sample_1    I9646 Mid3 Mid3    GGSFDBH   07 ReadGrp_7
Sample_2     I116 Mid1 Mid1    GA0582C   01 ReadGrp_1
Sample_3    I9644 Mid1 Mid1    GGSFDBH   07 ReadGrp_7
Sample_4     I118 Mid3 Mid3    GA0582C   01 ReadGrp_1
Sample_5     I117 Mid2 Mid2    GA0582C   01 ReadGrp_1
Sample_6    I9645 Mid2 Mid2    GGSFDBH   07 ReadGrp_7
                                Annotation
Sample_1 Run #006 - PTP 731232 - 05MAY2010
Sample_2                                 -
Sample_3 Run #006 - PTP 731232 - 05MAY2010
Sample_4                                 -
Sample_5                                 -
Sample_6 Run #006 - PTP 731232 - 05MAY2010
>     assayDataAmp(avaSetExample)
$forwCount
            Sample_1 Sample_2 Sample_3 Sample_4 Sample_5 Sample_6
TET2_E11.04      119     1516      137     1729     1288      140
TET2_E06         248      400      224      478      339      204
TET2_E11.03      267     1152      195     1518     1016      190
TET2_E04         258     1805      230     1885     1775      221

$revCount
            Sample_1 Sample_2 Sample_3 Sample_4 Sample_5 Sample_6
TET2_E11.04      162     2020      188     2270     1488      159
TET2_E06         236     2094      255     2171     1624      181
TET2_E11.03      192     1586      192     1934     1198      137
TET2_E04         172     2169      239     2160     2127      160

>     fDataAmp(avaSetExample)
            ampID                 primer1                 primer2
TET2_E11.04   I90 CATTCACCTTCTCACATAATCCA   GAATTGACCCATGAGTTGGAG
TET2_E06      I81    TGCAAGTGACCCTTGTTTTG    AACCAAAGATTGGGCTTTCC
TET2_E11.03   I89    GCTCAGTCTACCACCCATCC    AGATGCAGGGCATGAAGAGA
TET2_E04      I79    GGGGTTAAGCTTTGTGGATG TTGTGACTCTCTGGTGAATAGCA
            referenceSeqID targetEnd targetStart
TET2_E11.04            I37       325          24
TET2_E06               I42       321          21
TET2_E11.03            I36       319          21
TET2_E04               I40       322          21
>     referenceSequences(avaSetExample)
class: AlignedRead
length: 4 reads; width: 339..346 cycles
chromosome: NA NA NA NA 
position: 1 1 1 1 
strand: NA NA NA NA 
alignQuality: NumericQuality 
alignData varLabels: name refSeqID gene 
> 
>     # Use these commands to export a project from within the AVA-CLI (doAmplicon):
>     # > list sample -outputFile sample.csv
>     # > list amplicon -outputFile amp.csv
>     # > list reference -outputFile reference.csv
>     # > list variant -outputFile variant.csv
>     # > report variantHits -outputFile variantHits.csv
> 
>     # Load an AVA dataset containing 6 samples, 4 amplicons and 222 variants
>     # by specifying five files, that were exported with the AVA-CLI:
>     projectDir = system.file("extdata", "AVASet_doAmplicon", package="R453Plus1Toolbox")
>     avaSetExample = AVASet(dirname=projectDir, file_sample="sample.csv", file_amp="amp.csv", file_reference="reference.csv", file_variant="variant.csv", file_variantHits="variantHits.csv")
Reading sample data ... done
Reading reference sequences ... done
Reading variant data ... done
Reading amplicon data ... done
There were 24 warnings (use warnings() to see them)
> 
> dev.off()
null device 
          1 
>