
AmpliconDuo-package

R Documentation

Statistical Analysis Of Amplicon Data Of The Same Sample To Identify Artefacts

Description

Increasingly powerful techniques for high-throughput sequencing open the
possibility to comprehensively characterize microbial communities,
including rare species. However, the substantial error rates of the
experimental process that generates these sequences remain an unresolved
issue. To overcome this limitation we propose an approach in which
each sample is split and the same amplification and sequencing protocol
is applied to both halves. Comparing the results of the two halves should make it possible to detect likely PCR and sequencing artifacts,
and to recognize true rare species.

The AmpliconDuo package is intended to help interpret the amplicon frequency distributions obtained across split samples,
and to filter out false-positive amplicons.
From here on, the lower-case term ampliconduo refers to the two amplicon data sets of one split sample.

Details

Package:  AmpliconDuo
Type:     Package
Version:  1.1
Date:     2016-01-14
License:  GPL-2

The core of this package is the ampliconduo function. For each pair of split samples it generates an ampliconduo data frame and statistically analyses the data with Fisher's exact test.
Ampliconduo data frames, or lists of them, are the required input for all other functions of this package.
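
The per-amplicon test described above can be illustrated with base R alone. The following is a minimal sketch, not the package's implementation: the read counts are invented, the 2x2 table layout (reads of one amplicon versus all other reads, in half A versus half B) and the "fdr" correction are assumptions, and the column names freqA, freqB, OR, p and q are chosen for illustration only.

```r
# Toy read counts for five amplicons in the two halves (A, B) of one split
# sample. The numbers are invented for illustration.
freqA <- c(120, 35, 0, 8, 400)
freqB <- c(110, 2, 6, 9, 390)

totalA <- sum(freqA)
totalB <- sum(freqB)

# Assumed 2x2 table per amplicon: reads of this amplicon vs. all other reads
# (rows), in half A vs. half B (columns).
tests <- lapply(seq_along(freqA), function(i) {
  tab <- matrix(c(freqA[i], totalA - freqA[i],
                  freqB[i], totalB - freqB[i]), nrow = 2)
  fisher.test(tab)
})

# Collect p-values and odds ratio estimates, then correct for multiple testing.
p  <- vapply(tests, function(ft) ft$p.value, numeric(1))
or <- vapply(tests, function(ft) unname(ft$estimate), numeric(1))
q  <- p.adjust(p, method = "fdr")  # assumed correction method

duo <- data.frame(freqA, freqB, OR = or, p, q)
print(duo)
```

Amplicons with a small q-value in such a table are those whose read counts differ between the two halves more than expected by chance.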

plotAmpliconduo
plots, for one ampliconduo, the amplicon frequencies (number of reads per amplicon) of sample A against those of sample B, highlighting amplicons that deviate significantly between the two samples.

plotAmpliconduo.set
does the same as plotAmpliconduo but accepts a list of ampliconduo data frames and arranges the plots in a 2-dimensional array.

plotORdensity
generates a histogram of the amplicon frequency odds ratio density for an ampliconduo data frame. For multiple data frames, it
organizes the plots in a 2-dimensional array.

discordance.delta
calculates delta (Δ) and delta prime (Δ'), the fractions of amplicon frequencies (reads) and of amplicons, respectively, whose false discovery rate lies below a given threshold θ. Both serve as measures of discordance between two amplicon data sets A and B.

filter.ampliconduo
applies filter criteria to an ampliconduo data frame to decide which amplicons are rejected.

filter.ampliconduo.set
same as filter.ampliconduo, but for a list of ampliconduo data frames.

accepted.amplicons
returns the indices of those amplicons that have passed the filter criteria.
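
As a rough illustration of the two discordance measures named under discordance.delta, the sketch below computes them by hand in base R. It assumes that Δ is the share of reads (summed over both halves) belonging to amplicons with a q-value below θ, and that Δ' is the share of such amplicons; the exact definition used by the package may differ. All values, and the variable names deviating, delta and delta.prime, are hypothetical.

```r
# Hypothetical per-amplicon values: read counts in both halves and q-values.
freqA <- c(120, 35, 0, 8, 400)
freqB <- c(110, 2, 6, 9, 390)
q     <- c(0.80, 0.001, 0.04, 0.90, 0.75)
theta <- 0.05  # false discovery rate threshold

# Amplicons whose frequencies deviate between the two halves.
deviating <- q < theta

# Delta: fraction of reads that belong to deviating amplicons (assumed to be
# counted over both halves together).
delta <- sum(freqA[deviating] + freqB[deviating]) / sum(freqA + freqB)

# Delta prime: fraction of amplicons that deviate.
delta.prime <- mean(deviating)

print(c(delta = delta, delta.prime = delta.prime))
```

With these toy numbers two of five amplicons deviate, but they carry only a small share of the reads, so Δ' is much larger than Δ.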

References

Lange A, Jost S, Heider D, Bock C, Budeus B, Schilling E, Strittmatter A, Boenigk J, Hoffmann D: AmpliconDuo: A Split-Sample Filtering Protocol for High-Throughput Amplicon Sequencing of Microbial Communities.
PLoS One. 2015 Nov 2;10(11).

Examples

## load test amplicon frequency data ampliconfreqs and vector with sample names site.f
data(ampliconfreqs)
data(site.f)
## generating ampliconduo data frames
## depending on the size of the data sets, this may take some time
ampliconduoset <- ampliconduo(ampliconfreqs[,1:4], sample.names = site.f[1:2])
## plot amplicon read numbers of sample A vs. amplicon read numbers of sample B,
## indicating amplicons with significant deviations in their occurrence across samples
plotAmpliconduo.set(ampliconduoset, nrow = 3)
## calculate discordance between the two data sets of an ampliconduo
discordance <- discordance.delta(ampliconduoset)
## plot the odds ratio density of ampliconduo data
plotORdensity(ampliconduoset)
## apply filter criteria to remove/mark spurious amplicons
ampliconduoset.f <- filter.ampliconduo.set(ampliconduoset, min.freq = 1, q = 0.05)
## return indices of accepted amplicons; the indices correspond to those of the ampliconfreqs data
## that was used as input for the ampliconduo function
accep.reads <- accepted.amplicons(ampliconduoset.f)
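
To make the last two steps more concrete, the following base R sketch mimics a filter on toy data and shows how returned indices could be used to subset the original input. It does not call the package. The rejection rule (fewer than min.freq reads in either half, or a q-value below the threshold) is an assumption about how the min.freq and q arguments behave, and the objects toy.input, q.threshold, rejected and accepted are hypothetical.

```r
# Hypothetical input table: one row per amplicon, one column per half.
toy.input <- data.frame(
  A = c(120, 35, 0, 8, 400),
  B = c(110, 2, 6, 9, 390)
)
# Hypothetical q-values from the per-amplicon tests.
q <- c(0.80, 0.001, 0.04, 0.90, 0.75)

min.freq    <- 1     # minimal number of reads required in each half (assumed rule)
q.threshold <- 0.05  # amplicons with a smaller q-value are rejected (assumed rule)

rejected <- toy.input$A < min.freq |
            toy.input$B < min.freq |
            q < q.threshold

# Indices of amplicons that passed the filter, relative to the input table.
accepted <- which(!rejected)

# Use the indices to keep only accepted amplicons of the original input.
filtered.input <- toy.input[accepted, ]
print(filtered.input)
```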

Results

> library(AmpliconDuo)
Loading required package: ggplot2
Loading required package: xtable
> png(filename="/home/ddbj/snapshot/RGM3/R_CC/result/AmpliconDuo/AmpliconDuo-package.Rd_%03d_medium.png", width=480, height=480)
> ### Name: AmpliconDuo-package
> ### Title: Statistical Analysis Of Amplicon Data Of The Same Sample To
> ### Identify Artefacts
> ### Aliases: AmpliconDuo-package AmpliconDuo
> ### Keywords: package
>
> ### ** Examples
>
>
> ## load test amplicon frequency data ampliconfreqs and vector with sample names site.f
> data(ampliconfreqs)
> data(site.f)
>
> ## generating ampliconduo data frames
> ## depending on the size of the data sets, this may take some time
> ampliconduoset <- ampliconduo(ampliconfreqs[,1:4], sample.names = site.f[1:2])
..>
> ## plot amplicon read numbers of sample A vs. amplicon read numbers of sample B,
> ## indicating amplicons with significant deviations in their occurrence across samples
> plotAmpliconduo.set(ampliconduoset, nrow = 3)
>
> ## calculate discordance between the two data sets of an ampliconduo
> discordance <- discordance.delta(ampliconduoset)
>
> ## plot the odds ratio density of ampliconduo data
> plotORdensity(ampliconduoset)
>
> ## apply filter criteria to remove/mark spurious amplicons
> ampliconduoset.f <- filter.ampliconduo.set(ampliconduoset, min.freq = 1, q = 0.05)
>
> ## return indices of accepted amplicons; the indices correspond to those of the ampliconfreqs data
> ## that was used as input for the ampliconduo function
> accep.reads <- accepted.amplicons(ampliconduoset.f)
> dev.off()
null device
1
>