Last data update: 2014.03.03

R: Normalization of read depth from whole exome sequencing under...
normalize2R Documentation

Normalization of read depth from whole exome sequencing under the case-control setting

Description

Fits a Poisson log-linear model that normalizes the read depth data from whole exome sequencing. Includes terms that specifically remove biases due to GC content, exon capture and amplification efficiency, and latent systemic artifacts. If the WES is designed under case-control setting, CODEX estimates the exon-wise Poisson latent factor using only the read depths in the control cohort, and then computes the sample-wise latent factor terms for the case samples by regression.

Usage

normalize2(Y_qc, gc_qc, K, normal_index)

Arguments

Y_qc

Read depth matrix after quality control procedure returned from qc

gc_qc

Vector of GC content for each exon after quality control procedure returned from qc

K

Number of latent Poisson factors. Can be an integer if optimal solution has been chosen or a vector of integers so that AIC, BIC, and RSS are computed for choice of optimal k.

normal_index

Indices of control samples.

Value

Yhat

Normalized read depth matrix

AIC

AIC for model selection

BIC

BIC for model selection

RSS

RSS for model selection

K

Number of latent Poisson factors

Author(s)

Yuchao Jiang yuchaoj@wharton.upenn.edu

See Also

qc, choiceofK

Examples

Y_qc <- qcObjDemo$Y_qc
gc_qc <- qcObjDemo$gc_qc
normObj <- normalize2(Y_qc, gc_qc, K = 1:5, normal_index = seq(1, 45, 2))
Yhat <- normObj$Yhat
AIC <- normObj$AIC
BIC <- normObj$BIC
RSS <- normObj$RSS
K <- normObj$K

Results


R version 3.3.1 (2016-06-21) -- "Bug in Your Hair"
Copyright (C) 2016 The R Foundation for Statistical Computing
Platform: x86_64-pc-linux-gnu (64-bit)

R is free software and comes with ABSOLUTELY NO WARRANTY.
You are welcome to redistribute it under certain conditions.
Type 'license()' or 'licence()' for distribution details.

R is a collaborative project with many contributors.
Type 'contributors()' for more information and
'citation()' on how to cite R or R packages in publications.

Type 'demo()' for some demos, 'help()' for on-line help, or
'help.start()' for an HTML browser interface to help.
Type 'q()' to quit R.

> library(CODEX)
Loading required package: Rsamtools
Loading required package: GenomeInfoDb
Loading required package: stats4
Loading required package: BiocGenerics
Loading required package: parallel

Attaching package: 'BiocGenerics'

The following objects are masked from 'package:parallel':

    clusterApply, clusterApplyLB, clusterCall, clusterEvalQ,
    clusterExport, clusterMap, parApply, parCapply, parLapply,
    parLapplyLB, parRapply, parSapply, parSapplyLB

The following objects are masked from 'package:stats':

    IQR, mad, xtabs

The following objects are masked from 'package:base':

    Filter, Find, Map, Position, Reduce, anyDuplicated, append,
    as.data.frame, cbind, colnames, do.call, duplicated, eval, evalq,
    get, grep, grepl, intersect, is.unsorted, lapply, lengths, mapply,
    match, mget, order, paste, pmax, pmax.int, pmin, pmin.int, rank,
    rbind, rownames, sapply, setdiff, sort, table, tapply, union,
    unique, unsplit

Loading required package: S4Vectors

Attaching package: 'S4Vectors'

The following objects are masked from 'package:base':

    colMeans, colSums, expand.grid, rowMeans, rowSums

Loading required package: IRanges
Loading required package: GenomicRanges
Loading required package: Biostrings
Loading required package: XVector
Loading required package: BSgenome.Hsapiens.UCSC.hg19
Loading required package: BSgenome
Loading required package: rtracklayer

Attaching package: 'CODEX'

The following object is masked from 'package:BiocGenerics':

    normalize

> png(filename="/home/ddbj/snapshot/RGM3/R_BC/result/CODEX/normalize2.Rd_%03d_medium.png", width=480, height=480)
> ### Name: normalize2
> ### Title: Normalization of read depth from whole exome sequencing under
> ###   the case-control setting
> ### Aliases: normalize2
> ### Keywords: package
> 
> ### ** Examples
> 
> Y_qc <- qcObjDemo$Y_qc
> gc_qc <- qcObjDemo$gc_qc
> normObj <- normalize2(Y_qc, gc_qc, K = 1:5, normal_index = seq(1, 45, 2))
k = 1
Iteration 1	beta diff =1.18	f(GC) diff =0.000187
			hhat diff =0.0012
			hhat diff =2.53e-05
Iteration 2	beta diff =0.00618	f(GC) diff =1.38e-06
			hhat diff =0.000475
			hhat diff =2.33e-05
Iteration 3	beta diff =0.000672	f(GC) diff =4.84e-08
			hhat diff =0.000189
			hhat diff =1.02e-05
Iteration 4	beta diff =0.000179	f(GC) diff =9.39e-09
			hhat diff =7.54e-05
Iteration 5	beta diff =8.15e-05	f(GC) diff =6.59e-09
			hhat diff =2.79e-05
Stop at Iteration 5.
AIC1 = 4032024.269
BIC1 = 4031265.058
RSS1 = 728.648

k = 2
Iteration 1	beta diff =1.18	f(GC) diff =0.000187
			hhat diff =0.0436
			hhat diff =0.0435
			hhat diff =2.41e-06
Iteration 2	beta diff =0.00624	f(GC) diff =4.93e-07
			hhat diff =0.0435
			hhat diff =2.7e-05
Iteration 3	beta diff =0.000903	f(GC) diff =5.01e-08
			hhat diff =0.0434
			hhat diff =8.99e-06
Iteration 4	beta diff =0.000213	f(GC) diff =1.96e-08
			hhat diff =0.0433
			hhat diff =2.88e-06
Iteration 5	beta diff =6.11e-05	f(GC) diff =1.3e-08
			hhat diff =0.0433
			hhat diff =8.29e-07
Stop at Iteration 5.
AIC2 = 4035981.863
BIC2 = 4034463.441
RSS2 = 111.379

k = 3
Iteration 1	beta diff =1.18	f(GC) diff =0.000187
			hhat diff =0.0302
			hhat diff =0.000191
			hhat diff =5.99e-05
Iteration 2	beta diff =0.00771	f(GC) diff =3.78e-07
			hhat diff =0.0301
			hhat diff =3.98e-05
Iteration 3	beta diff =0.00041	f(GC) diff =6.61e-08
			hhat diff =0.03
			hhat diff =2.03e-05
Iteration 4	beta diff =0.000188	f(GC) diff =1.84e-08
			hhat diff =0.03
			hhat diff =2.12e-05
Iteration 5	beta diff =3.98e-05	f(GC) diff =1.1e-08
			hhat diff =0.0301
			hhat diff =2.44e-05
Stop at Iteration 5.
AIC3 = 4035819.769
BIC3 = 4033542.136
RSS3 = 106.405

k = 4
Iteration 1	beta diff =1.18	f(GC) diff =0.000187
			hhat diff =0.0463
			hhat diff =0.000227
			hhat diff =0.0651
			hhat diff =3.18e-05
Iteration 2	beta diff =0.00725	f(GC) diff =3.15e-07
			hhat diff =0.0578
			hhat diff =0.0219
			hhat diff =3.21e-05
Iteration 3	beta diff =0.000519	f(GC) diff =4.3e-08
			hhat diff =0.0618
			hhat diff =0.0219
			hhat diff =2.41e-05
Iteration 4	beta diff =0.000218	f(GC) diff =1.61e-08
			hhat diff =0.063
			hhat diff =0.0219
			hhat diff =1.64e-05
Iteration 5	beta diff =0.000103	f(GC) diff =9.12e-09
			hhat diff =0.0413
			hhat diff =0.000147
			hhat diff =1.47e-05
Stop at Iteration 5.
AIC4 = 4035736.029
BIC4 = 4032699.185
RSS4 = 100.512

k = 5
Iteration 1	beta diff =1.18	f(GC) diff =0.000187
			hhat diff =0.0511
			hhat diff =0.0178
			hhat diff =0.0347
			hhat diff =7.38e-05
Iteration 2	beta diff =0.00731	f(GC) diff =2.93e-07
			hhat diff =0.00615
			hhat diff =0.0176
			hhat diff =0.000144
			hhat diff =0.0347
			hhat diff =3.59e-05
Iteration 3	beta diff =0.000528	f(GC) diff =4.95e-08
			hhat diff =0.00385
			hhat diff =0.000532
			hhat diff =0.0175
			hhat diff =6.15e-05
Iteration 4	beta diff =0.000244	f(GC) diff =2.39e-08
			hhat diff =0.0171
			hhat diff =0.00055
			hhat diff =0.0175
			hhat diff =5.41e-05
Iteration 5	beta diff =0.000122	f(GC) diff =1.2e-08
			hhat diff =0.0179
			hhat diff =0.000576
			hhat diff =0.0175
			hhat diff =4.87e-05
Iteration 6	beta diff =5.34e-05	f(GC) diff =7.4e-09
			hhat diff =0.0337
			hhat diff =0.0351
			hhat diff =0.000118
			hhat diff =4.13e-05
Stop at Iteration 6.
AIC5 = 4035662.714
BIC5 = 4031866.659
RSS5 = 94.4

> Yhat <- normObj$Yhat
> AIC <- normObj$AIC
> BIC <- normObj$BIC
> RSS <- normObj$RSS
> K <- normObj$K
> 
> 
> 
> 
> 
> dev.off()
null device 
          1 
>