If for a given fragment the number of observed unique barcodes is equal to
the total barcode complexity (all combinations of barcodes are associated
with a given fragment), then the readsamples function assignes infinite EUC.
This can be corrected by the function correct_oversaturation(). By comparing
observed read counts with EUCs for other fragments it calculates the
correction factor.
Then, for the oversaturated fragments it multiplies the observed read counts
by the correction factor to estimate EUC. The assumption behind this
correction is that fragments have similar rate of PCR duplicates production.
Usage
correct_oversaturation(euc_GR, read_counts_file)
Arguments
euc_GR
GRanges produced by readsamples() function
read_counts_file
path to a file with observed read counts.
Value
euc_GR GRanges analogous to the readsamples() function output,
but with finite EUCs where infinity was present.
R version 3.3.1 (2016-06-21) -- "Bug in Your Hair"
Copyright (C) 2016 The R Foundation for Statistical Computing
Platform: x86_64-pc-linux-gnu (64-bit)
R is free software and comes with ABSOLUTELY NO WARRANTY.
You are welcome to redistribute it under certain conditions.
Type 'license()' or 'licence()' for distribution details.
R is a collaborative project with many contributors.
Type 'contributors()' for more information and
'citation()' on how to cite R or R packages in publications.
Type 'demo()' for some demos, 'help()' for on-line help, or
'help.start()' for an HTML browser interface to help.
Type 'q()' to quit R.
> library(RNAprobR)
Loading required package: GenomicFeatures
Loading required package: BiocGenerics
Loading required package: parallel
Attaching package: 'BiocGenerics'
The following objects are masked from 'package:parallel':
clusterApply, clusterApplyLB, clusterCall, clusterEvalQ,
clusterExport, clusterMap, parApply, parCapply, parLapply,
parLapplyLB, parRapply, parSapply, parSapplyLB
The following objects are masked from 'package:stats':
IQR, mad, xtabs
The following objects are masked from 'package:base':
Filter, Find, Map, Position, Reduce, anyDuplicated, append,
as.data.frame, cbind, colnames, do.call, duplicated, eval, evalq,
get, grep, grepl, intersect, is.unsorted, lapply, lengths, mapply,
match, mget, order, paste, pmax, pmax.int, pmin, pmin.int, rank,
rbind, rownames, sapply, setdiff, sort, table, tapply, union,
unique, unsplit
Loading required package: S4Vectors
Loading required package: stats4
Attaching package: 'S4Vectors'
The following objects are masked from 'package:base':
colMeans, colSums, expand.grid, rowMeans, rowSums
Loading required package: IRanges
Loading required package: GenomeInfoDb
Loading required package: GenomicRanges
Loading required package: AnnotationDbi
Loading required package: Biobase
Welcome to Bioconductor
Vignettes contain introductory material; view with
'browseVignettes()'. To cite Bioconductor, see
'citation("Biobase")', and for packages 'citation("pkgname")'.
Loading required package: plyr
Attaching package: 'plyr'
The following object is masked from 'package:IRanges':
desc
The following object is masked from 'package:S4Vectors':
rename
> png(filename="/home/ddbj/snapshot/RGM3/R_BC/result/RNAprobR/correct_oversaturation.Rd_%03d_medium.png", width=480, height=480)
> ### Name: correct_oversaturation
> ### Title: Correcting EUC of oversaturated fragments.
> ### Aliases: correct_oversaturation
>
> ### ** Examples
>
> write(c("DummyRNA\t1\t2\t1000", "DummyRNA\t3\t4\t1024"),
+ file="dummy_unique_barcode")
> write(c("DummyRNA\t1\t2\t5000", "DummyRNA\t3\t4\t10000"),
+ file="dummy_read_counts")
> my_EUCs <- readsamples(samples = "dummy_unique_barcode", euc = "Fu", m=1024)
Reporting estimated unique counts according to Fu et al.
Barcodes oversaturated. Inf returned. Running correct_oversaturation()strongly recommended.
> correct_oversaturation(euc_GR = my_EUCs,
+ read_counts_file = "dummy_read_counts")
Reporting unique barcodes count, no EUC calculation
GRanges object with 2 ranges and 1 metadata column:
seqnames ranges strand | EUC
<Rle> <IRanges> <Rle> | <numeric>
[1] DummyRNA [1, 2] + | 3842
[2] DummyRNA [3, 4] + | 7684
-------
seqinfo: 1 sequence from an unspecified genome; no seqlengths
>
>
>
>
>
> dev.off()
null device
1
>