Last data update: 2014.03.03

R: Correcting EUC of oversaturated fragments.
correct_oversaturationR Documentation

Correcting EUC of oversaturated fragments.

Description

If for a given fragment the number of observed unique barcodes is equal to the total barcode complexity (all combinations of barcodes are associated with a given fragment), then the readsamples function assignes infinite EUC. This can be corrected by the function correct_oversaturation(). By comparing observed read counts with EUCs for other fragments it calculates the correction factor. Then, for the oversaturated fragments it multiplies the observed read counts by the correction factor to estimate EUC. The assumption behind this correction is that fragments have similar rate of PCR duplicates production.

Usage

correct_oversaturation(euc_GR, read_counts_file)

Arguments

euc_GR

GRanges produced by readsamples() function

read_counts_file

path to a file with observed read counts.

Value

euc_GR GRanges analogous to the readsamples() function output, but with finite EUCs where infinity was present.

Examples

write(c("DummyRNA\t1\t2\t1000", "DummyRNA\t3\t4\t1024"),
     file="dummy_unique_barcode")
write(c("DummyRNA\t1\t2\t5000", "DummyRNA\t3\t4\t10000"),
      file="dummy_read_counts")
my_EUCs <- readsamples(samples = "dummy_unique_barcode", euc = "Fu", m=1024)
correct_oversaturation(euc_GR = my_EUCs,
                       read_counts_file = "dummy_read_counts")

Results


R version 3.3.1 (2016-06-21) -- "Bug in Your Hair"
Copyright (C) 2016 The R Foundation for Statistical Computing
Platform: x86_64-pc-linux-gnu (64-bit)

R is free software and comes with ABSOLUTELY NO WARRANTY.
You are welcome to redistribute it under certain conditions.
Type 'license()' or 'licence()' for distribution details.

R is a collaborative project with many contributors.
Type 'contributors()' for more information and
'citation()' on how to cite R or R packages in publications.

Type 'demo()' for some demos, 'help()' for on-line help, or
'help.start()' for an HTML browser interface to help.
Type 'q()' to quit R.

> library(RNAprobR)
Loading required package: GenomicFeatures
Loading required package: BiocGenerics
Loading required package: parallel

Attaching package: 'BiocGenerics'

The following objects are masked from 'package:parallel':

    clusterApply, clusterApplyLB, clusterCall, clusterEvalQ,
    clusterExport, clusterMap, parApply, parCapply, parLapply,
    parLapplyLB, parRapply, parSapply, parSapplyLB

The following objects are masked from 'package:stats':

    IQR, mad, xtabs

The following objects are masked from 'package:base':

    Filter, Find, Map, Position, Reduce, anyDuplicated, append,
    as.data.frame, cbind, colnames, do.call, duplicated, eval, evalq,
    get, grep, grepl, intersect, is.unsorted, lapply, lengths, mapply,
    match, mget, order, paste, pmax, pmax.int, pmin, pmin.int, rank,
    rbind, rownames, sapply, setdiff, sort, table, tapply, union,
    unique, unsplit

Loading required package: S4Vectors
Loading required package: stats4

Attaching package: 'S4Vectors'

The following objects are masked from 'package:base':

    colMeans, colSums, expand.grid, rowMeans, rowSums

Loading required package: IRanges
Loading required package: GenomeInfoDb
Loading required package: GenomicRanges
Loading required package: AnnotationDbi
Loading required package: Biobase
Welcome to Bioconductor

    Vignettes contain introductory material; view with
    'browseVignettes()'. To cite Bioconductor, see
    'citation("Biobase")', and for packages 'citation("pkgname")'.

Loading required package: plyr

Attaching package: 'plyr'

The following object is masked from 'package:IRanges':

    desc

The following object is masked from 'package:S4Vectors':

    rename

> png(filename="/home/ddbj/snapshot/RGM3/R_BC/result/RNAprobR/correct_oversaturation.Rd_%03d_medium.png", width=480, height=480)
> ### Name: correct_oversaturation
> ### Title: Correcting EUC of oversaturated fragments.
> ### Aliases: correct_oversaturation
> 
> ### ** Examples
> 
> write(c("DummyRNA\t1\t2\t1000", "DummyRNA\t3\t4\t1024"),
+      file="dummy_unique_barcode")
> write(c("DummyRNA\t1\t2\t5000", "DummyRNA\t3\t4\t10000"),
+       file="dummy_read_counts")
> my_EUCs <- readsamples(samples = "dummy_unique_barcode", euc = "Fu", m=1024)
Reporting estimated unique counts according to Fu et al.
Barcodes oversaturated. Inf returned.  Running correct_oversaturation()strongly recommended.
> correct_oversaturation(euc_GR = my_EUCs,
+                        read_counts_file = "dummy_read_counts")
Reporting unique barcodes count, no EUC calculation
GRanges object with 2 ranges and 1 metadata column:
      seqnames    ranges strand |       EUC
         <Rle> <IRanges>  <Rle> | <numeric>
  [1] DummyRNA    [1, 2]      + |      3842
  [2] DummyRNA    [3, 4]      + |      7684
  -------
  seqinfo: 1 sequence from an unspecified genome; no seqlengths
> 
> 
> 
> 
> 
> dev.off()
null device 
          1 
>