Last data update: 2014.03.03

R: Normalization of read depth from whole exome sequencing
normalizeR Documentation

Normalization of read depth from whole exome sequencing

Description

Fits a Poisson log-linear model that normalizes the read depth data from whole exome sequencing. Includes terms that specifically remove biases due to GC content, exon capture and amplification efficiency, and latent systemic artifacts.

Usage

normalize(Y_qc, gc_qc, K)

Arguments

Y_qc

Read depth matrix after quality control procedure returned from qc

gc_qc

Vector of GC content for each exon after quality control procedure returned from qc

K

Number of latent Poisson factors. Can be an integer if optimal solution has been chosen or a vector of integers so that AIC, BIC, and RSS are computed for choice of optimal k.

Value

Yhat

Normalized read depth matrix

AIC

AIC for model selection

BIC

BIC for model selection

RSS

RSS for model selection

K

Number of latent Poisson factors

Author(s)

Yuchao Jiang yuchaoj@wharton.upenn.edu

See Also

qc, choiceofK

Examples

Y_qc <- qcObjDemo$Y_qc
gc_qc <- qcObjDemo$gc_qc
normObj <- normalize(Y_qc, gc_qc, K = 1:5)
Yhat <- normObj$Yhat
AIC <- normObj$AIC
BIC <- normObj$BIC
RSS <- normObj$RSS
K <- normObj$K

Results


R version 3.3.1 (2016-06-21) -- "Bug in Your Hair"
Copyright (C) 2016 The R Foundation for Statistical Computing
Platform: x86_64-pc-linux-gnu (64-bit)

R is free software and comes with ABSOLUTELY NO WARRANTY.
You are welcome to redistribute it under certain conditions.
Type 'license()' or 'licence()' for distribution details.

R is a collaborative project with many contributors.
Type 'contributors()' for more information and
'citation()' on how to cite R or R packages in publications.

Type 'demo()' for some demos, 'help()' for on-line help, or
'help.start()' for an HTML browser interface to help.
Type 'q()' to quit R.

> library(CODEX)
Loading required package: Rsamtools
Loading required package: GenomeInfoDb
Loading required package: stats4
Loading required package: BiocGenerics
Loading required package: parallel

Attaching package: 'BiocGenerics'

The following objects are masked from 'package:parallel':

    clusterApply, clusterApplyLB, clusterCall, clusterEvalQ,
    clusterExport, clusterMap, parApply, parCapply, parLapply,
    parLapplyLB, parRapply, parSapply, parSapplyLB

The following objects are masked from 'package:stats':

    IQR, mad, xtabs

The following objects are masked from 'package:base':

    Filter, Find, Map, Position, Reduce, anyDuplicated, append,
    as.data.frame, cbind, colnames, do.call, duplicated, eval, evalq,
    get, grep, grepl, intersect, is.unsorted, lapply, lengths, mapply,
    match, mget, order, paste, pmax, pmax.int, pmin, pmin.int, rank,
    rbind, rownames, sapply, setdiff, sort, table, tapply, union,
    unique, unsplit

Loading required package: S4Vectors

Attaching package: 'S4Vectors'

The following objects are masked from 'package:base':

    colMeans, colSums, expand.grid, rowMeans, rowSums

Loading required package: IRanges
Loading required package: GenomicRanges
Loading required package: Biostrings
Loading required package: XVector
Loading required package: BSgenome.Hsapiens.UCSC.hg19
Loading required package: BSgenome
Loading required package: rtracklayer

Attaching package: 'CODEX'

The following object is masked from 'package:BiocGenerics':

    normalize

> png(filename="/home/ddbj/snapshot/RGM3/R_BC/result/CODEX/normalize.Rd_%03d_medium.png", width=480, height=480)
> ### Name: normalize
> ### Title: Normalization of read depth from whole exome sequencing
> ### Aliases: normalize
> ### Keywords: package
> 
> ### ** Examples
> 
> Y_qc <- qcObjDemo$Y_qc
> gc_qc <- qcObjDemo$gc_qc
> normObj <- normalize(Y_qc, gc_qc, K = 1:5)
k = 1
Iteration 1	beta diff =1.2	f(GC) diff =0.000187
			hhat diff =0.0016
			hhat diff =4.53e-05
Iteration 2	beta diff =0.00789	f(GC) diff =1.26e-06
			hhat diff =0.000764
			hhat diff =2.63e-05
Iteration 3	beta diff =0.000691	f(GC) diff =5.49e-08
			hhat diff =0.000392
			hhat diff =1.42e-05
Iteration 4	beta diff =0.000255	f(GC) diff =1.29e-08
			hhat diff =0.000203
			hhat diff =6.93e-06
Iteration 5	beta diff =0.000121	f(GC) diff =6.08e-09
			hhat diff =0.000105
			hhat diff =3.12e-06
Iteration 6	beta diff =6.07e-05	f(GC) diff =3.08e-09
			hhat diff =5.76e-05
Stop at Iteration 6.
AIC1 = 4032223.415
BIC1 = 4031464.204
RSS1 = 704.063

k = 2
Iteration 1	beta diff =1.2	f(GC) diff =0.000187
			hhat diff =0.0438
			hhat diff =3.92e-05
Iteration 2	beta diff =0.00797	f(GC) diff =4.19e-07
			hhat diff =0.0436
			hhat diff =3.74e-05
Iteration 3	beta diff =0.000816	f(GC) diff =7.88e-08
			hhat diff =0.0435
			hhat diff =1.8e-05
Iteration 4	beta diff =0.000265	f(GC) diff =1.04e-08
			hhat diff =0.0434
			hhat diff =6.81e-06
Stop at Iteration 4.
AIC2 = 4036066.355
BIC2 = 4034547.933
RSS2 = 112.92

k = 3
Iteration 1	beta diff =1.2	f(GC) diff =0.000187
			hhat diff =0.03
			hhat diff =0.00031
			hhat diff =0.000203
			hhat diff =0.000208
			hhat diff =0.00021
			hhat diff =0.0578
			hhat diff =0.000123
			hhat diff =7.6e-05
Iteration 2	beta diff =0.0079	f(GC) diff =4.06e-07
			hhat diff =0.0298
			hhat diff =0.0578
			hhat diff =0.029
			hhat diff =1.11e-05
Iteration 3	beta diff =0.000404	f(GC) diff =1.19e-07
			hhat diff =0.0859
			hhat diff =0.0291
			hhat diff =1.73e-05
Iteration 4	beta diff =0.000214	f(GC) diff =1.76e-08
			hhat diff =0.0574
			hhat diff =6.7e-05
Iteration 5	beta diff =0.000143	f(GC) diff =7.68e-09
			hhat diff =0.0575
			hhat diff =4.86e-05
Iteration 6	beta diff =5.47e-05	f(GC) diff =4.5e-09
			hhat diff =0.0575
			hhat diff =4.04e-05
Stop at Iteration 6.
AIC3 = 4036269.58
BIC3 = 4033991.947
RSS3 = 91.057

k = 4
Iteration 1	beta diff =1.2	f(GC) diff =0.000187
			hhat diff =0.0432
			hhat diff =0.000449
			hhat diff =0.000257
			hhat diff =0.000161
			hhat diff =8.92e-05
Iteration 2	beta diff =0.0077	f(GC) diff =2.84e-07
			hhat diff =0.0431
			hhat diff =0.0221
			hhat diff =8.41e-05
Iteration 3	beta diff =0.000558	f(GC) diff =1.11e-07
			hhat diff =0.0433
			hhat diff =0.0219
			hhat diff =2.49e-05
Iteration 4	beta diff =0.000239	f(GC) diff =2.85e-08
			hhat diff =0.0219
			hhat diff =0.000173
			hhat diff =7.53e-06
Iteration 5	beta diff =0.000104	f(GC) diff =1.46e-08
			hhat diff =0.0105
			hhat diff =0.000117
			hhat diff =0.0435
			hhat diff =1.47e-06
Iteration 6	beta diff =3.61e-05	f(GC) diff =1.4e-08
			hhat diff =0.0217
			hhat diff =0.0652
			hhat diff =8.01e-06
Stop at Iteration 6.
AIC4 = 4036188.618
BIC4 = 4033151.774
RSS4 = 83.055

k = 5
Iteration 1	beta diff =1.2	f(GC) diff =0.000187
			hhat diff =0.0411
			hhat diff =0.000288
			hhat diff =0.0347
			hhat diff =0.000103
			hhat diff =0.0348
			hhat diff =4.87e-05
Iteration 2	beta diff =0.0082	f(GC) diff =2.43e-07
			hhat diff =0.0377
			hhat diff =0.0349
			hhat diff =0.0175
			hhat diff =0.0174
			hhat diff =0.0348
			hhat diff =0.0174
			hhat diff =1.41e-05
Iteration 3	beta diff =0.000549	f(GC) diff =7.83e-08
			hhat diff =0.01
			hhat diff =0.000353
			hhat diff =0.0175
			hhat diff =6.89e-05
Iteration 4	beta diff =0.000211	f(GC) diff =2.38e-08
			hhat diff =0.0185
			hhat diff =0.0175
			hhat diff =0.000124
			hhat diff =7.54e-05
Iteration 5	beta diff =9.23e-05	f(GC) diff =2.2e-08
			hhat diff =0.016
			hhat diff =0.000337
			hhat diff =0.0694
			hhat diff =6.45e-05
Stop at Iteration 5.
AIC5 = 4036029.081
BIC5 = 4032233.026
RSS5 = 82.124

> Yhat <- normObj$Yhat
> AIC <- normObj$AIC
> BIC <- normObj$BIC
> RSS <- normObj$RSS
> K <- normObj$K
> 
> 
> 
> 
> 
> dev.off()
null device 
          1 
>