Last data update: 2014.03.03

R: Quality control procedure for depth of coverage
qc    R Documentation

Quality control procedure for depth of coverage

Description

Applies a quality control procedure to the depth of coverage matrix both sample-wise and exon-wise before normalization.

Usage

qc(Y, sampname, chr, ref, mapp, gc, cov_thresh, length_thresh, mapp_thresh,
   gc_thresh)

Arguments

Y

Original read depth matrix returned from getcoverage

sampname

Vector of sample names returned from getbambed

chr

Chromosome returned from getbambed

ref

IRanges object specifying exonic positions returned from getbambed

mapp

Vector of mappability for each exon returned from getmapp

gc

Vector of GC content for each exon returned from getgc

cov_thresh

Vector of length two giving the lower and upper bounds on exonic median coverage for QC, in that order. c(20, 4000) recommended.

length_thresh

Vector of length two giving the lower and upper bounds on exonic length (in bp) for QC, in that order. c(20, 2000) recommended.

mapp_thresh

Scalar variable specifying exonic mappability threshold for QC. 0.9 recommended.

gc_thresh

Vector of length two giving the lower and upper bounds on exonic GC content for QC, in that order. c(20, 80) recommended.

Details

It is suggested that analysis by CODEX be carried out in a batch-wise fashion if multiple batches exist. CODEX further filters out exons that: have extreme coverage (median read depth across all samples less than 20 or greater than 4000); have extreme length (shorter than 20 bp or longer than 2000 bp); are extremely hard to map (mappability less than 0.9); or have extreme GC content (less than 20 or greater than 80). These thresholds are the recommended values; users can set their own to suit different sequencing protocols.
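
The exon-wise filters described above can be sketched in base R. This is only an illustration under stated assumptions, not the package's implementation: the toy inputs (Y_toy, exon_len_toy, mapp_toy, gc_toy) are invented, rows are assumed to be exons and columns samples, exon lengths are supplied directly instead of being derived from an IRanges object, and the bounds are treated as inclusive.

```r
# Toy data only: 6 exons (rows) by 3 samples (columns).
Y_toy <- matrix(c(   6,   30,  110, 5000,   50,   65,
                     5,   25,  100, 4500,   45,   60,
                     8,   35,  120, 6000,   55,   70),
                nrow = 6)
exon_len_toy <- c(100, 10, 150, 200, 300, 120)  # exon lengths in bp
mapp_toy     <- c(1, 1, 0.5, 1, 1, 1)           # mappability per exon
gc_toy       <- c(50, 50, 50, 50, 90, 45)       # GC content per exon

# Recommended thresholds from the Arguments section.
cov_thresh    <- c(20, 4000)
length_thresh <- c(20, 2000)
mapp_thresh   <- 0.9
gc_thresh     <- c(20, 80)

# Median read depth of each exon across all samples.
med_cov <- apply(Y_toy, 1, median)

# One logical vector per filter; TRUE means the exon passes.
pass_cov  <- med_cov >= cov_thresh[1] & med_cov <= cov_thresh[2]
pass_len  <- exon_len_toy >= length_thresh[1] & exon_len_toy <= length_thresh[2]
pass_mapp <- mapp_toy >= mapp_thresh
pass_gc   <- gc_toy >= gc_thresh[1] & gc_toy <= gc_thresh[2]

# An exon is kept only if it passes every filter.
keep <- pass_cov & pass_len & pass_mapp & pass_gc
Y_toy_qc <- Y_toy[keep, , drop = FALSE]
```

In this toy setup each of exons 1 to 5 fails at least one filter, so only exon 6 is kept. Combining the per-filter vectors with & is equivalent to excluding the union of the failing exons.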

Value

Y_qc

Updated Y after QC

sampname_qc

Updated sampname after QC

gc_qc

Updated gc after QC

mapp_qc

Updated mapp after QC

ref_qc

Updated ref after QC

qcmat

Matrix specifying results of exon-wise QC procedures
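
Because the returned components are filtered together, a quick size check can catch mismatches before normalization. The helper below, check_qc_sizes, is hypothetical and not part of CODEX; it assumes that rows of Y_qc are exons and columns are samples, and it is demonstrated on a toy list in place of a real qc() result.

```r
# Hypothetical helper (not part of CODEX): checks that the components
# documented under "Value" agree in size. Assumes rows of Y_qc are exons
# and columns are samples.
check_qc_sizes <- function(qcObj) {
  n_exon <- nrow(qcObj$Y_qc)
  stopifnot(
    ncol(qcObj$Y_qc) == length(qcObj$sampname_qc),
    length(qcObj$gc_qc) == n_exon,
    length(qcObj$mapp_qc) == n_exon,
    length(qcObj$ref_qc) == n_exon
  )
  invisible(TRUE)
}

# Toy stand-in for a qc() result; ref_qc is a plain integer vector here,
# whereas qc() returns an IRanges object.
toy_qcObj <- list(
  Y_qc        = matrix(1:6, nrow = 3, ncol = 2),
  sampname_qc = c("s1", "s2"),
  gc_qc       = c(40, 50, 60),
  mapp_qc     = c(1, 0.95, 1),
  ref_qc      = 1:3
)
check_result <- check_qc_sizes(toy_qcObj)
```

The function stops with an error if any component disagrees in size and returns TRUE invisibly otherwise.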

Author(s)

Yuchao Jiang yuchaoj@wharton.upenn.edu

See Also

getbambed, getgc, getmapp

Examples

library(CODEX)  # provides qc() and the demo objects used below

# Inputs taken from the demo objects
Y <- coverageObjDemo$Y
sampname <- bambedObjDemo$sampname
chr <- bambedObjDemo$chr
ref <- bambedObjDemo$ref
gc <- gcDemo
mapp <- mappDemo

# Recommended QC thresholds
cov_thresh <- c(20, 4000)
length_thresh <- c(20, 2000)
mapp_thresh <- 0.9
gc_thresh <- c(20, 80)

# Run QC and extract the filtered components
qcObj <- qc(Y, sampname, chr, ref, mapp, gc, cov_thresh, length_thresh,
    mapp_thresh, gc_thresh)
Y_qc <- qcObj$Y_qc
sampname_qc <- qcObj$sampname_qc
gc_qc <- qcObj$gc_qc
mapp_qc <- qcObj$mapp_qc
ref_qc <- qcObj$ref_qc
qcmat <- qcObj$qcmat

Results


> library(CODEX)
> ### Name: qc
> ### Title: Quality control procedure for depth of coverage
> ### Aliases: qc
> ### Keywords: package
> 
> ### ** Examples
> 
> Y <- coverageObjDemo$Y
> sampname <- bambedObjDemo$sampname
> chr <- bambedObjDemo$chr
> ref <- bambedObjDemo$ref
> gc <- gcDemo
> mapp <- mappDemo
> cov_thresh <- c(20, 4000)
> length_thresh <- c(20, 2000)
> mapp_thresh <- 0.9
> gc_thresh <- c(20, 80)
> qcObj <- qc(Y, sampname, chr, ref, mapp, gc, cov_thresh, length_thresh, 
+     mapp_thresh, gc_thresh)
Excluded 21 exons due to extreme coverage.
Excluded 0 exons due to extreme exonic length.
Excluded 3 exons due to extreme mappability.
Excluded 0 exons due to extreme GC content.
After taking union, excluded 23 out of 100 exons in QC.
> Y_qc <- qcObj$Y_qc
> sampname_qc <- qcObj$sampname_qc
> gc_qc <- qcObj$gc_qc
> mapp_qc <- qcObj$mapp_qc
> ref_qc <- qcObj$ref_qc
> qcmat <- qcObj$qcmat
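
The per-filter counts in the messages above sum to 24 (21 + 0 + 3 + 0), while the union is 23, so exactly one exon must have failed both the coverage and the mappability filter. The sketch below reproduces that arithmetic with invented failure positions; the indices are hypothetical and say nothing about which exons in the demo data were actually excluded.

```r
# Hypothetical failure positions chosen only to reproduce the counts
# reported in the transcript: 21 coverage failures, 3 mappability failures.
n_exon <- 100
fail_cov  <- rep(FALSE, n_exon)
fail_mapp <- rep(FALSE, n_exon)
fail_cov[1:21]   <- TRUE   # 21 exons fail the coverage filter
fail_mapp[21:23] <- TRUE   # 3 exons fail the mappability filter

n_both  <- sum(fail_cov & fail_mapp)   # exons failing both filters
n_union <- sum(fail_cov | fail_mapp)   # exons excluded after taking the union

# Inclusion-exclusion: |A or B| = |A| + |B| - |A and B|
n_check <- sum(fail_cov) + sum(fail_mapp) - n_both
```

Here n_both is 1 and n_union equals n_check at 23, matching "excluded 23 out of 100 exons".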