Last data update: 2014.03.03
R: Normalization of read depth from whole exome sequencing
Normalization of read depth from whole exome sequencing
Description
Fits a Poisson log-linear model that normalizes the read depth data from whole
exome sequencing. Includes terms that specifically remove biases due to GC
content, exon capture and amplification efficiency, and latent systemic
artifacts.
Usage
normalize(Y_qc, gc_qc, K)
Arguments
Y_qc
Read depth matrix after quality control procedure returned from
qc
gc_qc
Vector of GC content for each exon after quality control procedure returned
from qc
K
Number of latent Poisson factors. Can be an integer if optimal solution has
been chosen or a vector of integers so that AIC, BIC, and RSS are computed
for choice of optimal k.
Value
Yhat
Normalized read depth matrix
AIC
AIC for model selection
BIC
BIC for model selection
RSS
RSS for model selection
K
Number of latent Poisson factors
Author(s)
Yuchao Jiang yuchaoj@wharton.upenn.edu
See Also
qc
,
choiceofK
Examples
Y_qc <- qcObjDemo$Y_qc
gc_qc <- qcObjDemo$gc_qc
normObj <- normalize(Y_qc, gc_qc, K = 1:5)
Yhat <- normObj$Yhat
AIC <- normObj$AIC
BIC <- normObj$BIC
RSS <- normObj$RSS
K <- normObj$K
Results
R version 3.3.1 (2016-06-21) -- "Bug in Your Hair"
Copyright (C) 2016 The R Foundation for Statistical Computing
Platform: x86_64-pc-linux-gnu (64-bit)
R is free software and comes with ABSOLUTELY NO WARRANTY.
You are welcome to redistribute it under certain conditions.
Type 'license()' or 'licence()' for distribution details.
R is a collaborative project with many contributors.
Type 'contributors()' for more information and
'citation()' on how to cite R or R packages in publications.
Type 'demo()' for some demos, 'help()' for on-line help, or
'help.start()' for an HTML browser interface to help.
Type 'q()' to quit R.
> library(CODEX)
Loading required package: Rsamtools
Loading required package: GenomeInfoDb
Loading required package: stats4
Loading required package: BiocGenerics
Loading required package: parallel
Attaching package: 'BiocGenerics'
The following objects are masked from 'package:parallel':
clusterApply, clusterApplyLB, clusterCall, clusterEvalQ,
clusterExport, clusterMap, parApply, parCapply, parLapply,
parLapplyLB, parRapply, parSapply, parSapplyLB
The following objects are masked from 'package:stats':
IQR, mad, xtabs
The following objects are masked from 'package:base':
Filter, Find, Map, Position, Reduce, anyDuplicated, append,
as.data.frame, cbind, colnames, do.call, duplicated, eval, evalq,
get, grep, grepl, intersect, is.unsorted, lapply, lengths, mapply,
match, mget, order, paste, pmax, pmax.int, pmin, pmin.int, rank,
rbind, rownames, sapply, setdiff, sort, table, tapply, union,
unique, unsplit
Loading required package: S4Vectors
Attaching package: 'S4Vectors'
The following objects are masked from 'package:base':
colMeans, colSums, expand.grid, rowMeans, rowSums
Loading required package: IRanges
Loading required package: GenomicRanges
Loading required package: Biostrings
Loading required package: XVector
Loading required package: BSgenome.Hsapiens.UCSC.hg19
Loading required package: BSgenome
Loading required package: rtracklayer
Attaching package: 'CODEX'
The following object is masked from 'package:BiocGenerics':
normalize
> png(filename="/home/ddbj/snapshot/RGM3/R_BC/result/CODEX/normalize.Rd_%03d_medium.png", width=480, height=480)
> ### Name: normalize
> ### Title: Normalization of read depth from whole exome sequencing
> ### Aliases: normalize
> ### Keywords: package
>
> ### ** Examples
>
> Y_qc <- qcObjDemo$Y_qc
> gc_qc <- qcObjDemo$gc_qc
> normObj <- normalize(Y_qc, gc_qc, K = 1:5)
k = 1
Iteration 1 beta diff =1.2 f(GC) diff =0.000187
hhat diff =0.0016
hhat diff =4.53e-05
Iteration 2 beta diff =0.00789 f(GC) diff =1.26e-06
hhat diff =0.000764
hhat diff =2.63e-05
Iteration 3 beta diff =0.000691 f(GC) diff =5.49e-08
hhat diff =0.000392
hhat diff =1.42e-05
Iteration 4 beta diff =0.000255 f(GC) diff =1.29e-08
hhat diff =0.000203
hhat diff =6.93e-06
Iteration 5 beta diff =0.000121 f(GC) diff =6.08e-09
hhat diff =0.000105
hhat diff =3.12e-06
Iteration 6 beta diff =6.07e-05 f(GC) diff =3.08e-09
hhat diff =5.76e-05
Stop at Iteration 6.
AIC1 = 4032223.415
BIC1 = 4031464.204
RSS1 = 704.063
k = 2
Iteration 1 beta diff =1.2 f(GC) diff =0.000187
hhat diff =0.0438
hhat diff =3.92e-05
Iteration 2 beta diff =0.00797 f(GC) diff =4.19e-07
hhat diff =0.0436
hhat diff =3.74e-05
Iteration 3 beta diff =0.000816 f(GC) diff =7.88e-08
hhat diff =0.0435
hhat diff =1.8e-05
Iteration 4 beta diff =0.000265 f(GC) diff =1.04e-08
hhat diff =0.0434
hhat diff =6.81e-06
Stop at Iteration 4.
AIC2 = 4036066.355
BIC2 = 4034547.933
RSS2 = 112.92
k = 3
Iteration 1 beta diff =1.2 f(GC) diff =0.000187
hhat diff =0.03
hhat diff =0.00031
hhat diff =0.000203
hhat diff =0.000208
hhat diff =0.00021
hhat diff =0.0578
hhat diff =0.000123
hhat diff =7.6e-05
Iteration 2 beta diff =0.0079 f(GC) diff =4.06e-07
hhat diff =0.0298
hhat diff =0.0578
hhat diff =0.029
hhat diff =1.11e-05
Iteration 3 beta diff =0.000404 f(GC) diff =1.19e-07
hhat diff =0.0859
hhat diff =0.0291
hhat diff =1.73e-05
Iteration 4 beta diff =0.000214 f(GC) diff =1.76e-08
hhat diff =0.0574
hhat diff =6.7e-05
Iteration 5 beta diff =0.000143 f(GC) diff =7.68e-09
hhat diff =0.0575
hhat diff =4.86e-05
Iteration 6 beta diff =5.47e-05 f(GC) diff =4.5e-09
hhat diff =0.0575
hhat diff =4.04e-05
Stop at Iteration 6.
AIC3 = 4036269.58
BIC3 = 4033991.947
RSS3 = 91.057
k = 4
Iteration 1 beta diff =1.2 f(GC) diff =0.000187
hhat diff =0.0432
hhat diff =0.000449
hhat diff =0.000257
hhat diff =0.000161
hhat diff =8.92e-05
Iteration 2 beta diff =0.0077 f(GC) diff =2.84e-07
hhat diff =0.0431
hhat diff =0.0221
hhat diff =8.41e-05
Iteration 3 beta diff =0.000558 f(GC) diff =1.11e-07
hhat diff =0.0433
hhat diff =0.0219
hhat diff =2.49e-05
Iteration 4 beta diff =0.000239 f(GC) diff =2.85e-08
hhat diff =0.0219
hhat diff =0.000173
hhat diff =7.53e-06
Iteration 5 beta diff =0.000104 f(GC) diff =1.46e-08
hhat diff =0.0105
hhat diff =0.000117
hhat diff =0.0435
hhat diff =1.47e-06
Iteration 6 beta diff =3.61e-05 f(GC) diff =1.4e-08
hhat diff =0.0217
hhat diff =0.0652
hhat diff =8.01e-06
Stop at Iteration 6.
AIC4 = 4036188.618
BIC4 = 4033151.774
RSS4 = 83.055
k = 5
Iteration 1 beta diff =1.2 f(GC) diff =0.000187
hhat diff =0.0411
hhat diff =0.000288
hhat diff =0.0347
hhat diff =0.000103
hhat diff =0.0348
hhat diff =4.87e-05
Iteration 2 beta diff =0.0082 f(GC) diff =2.43e-07
hhat diff =0.0377
hhat diff =0.0349
hhat diff =0.0175
hhat diff =0.0174
hhat diff =0.0348
hhat diff =0.0174
hhat diff =1.41e-05
Iteration 3 beta diff =0.000549 f(GC) diff =7.83e-08
hhat diff =0.01
hhat diff =0.000353
hhat diff =0.0175
hhat diff =6.89e-05
Iteration 4 beta diff =0.000211 f(GC) diff =2.38e-08
hhat diff =0.0185
hhat diff =0.0175
hhat diff =0.000124
hhat diff =7.54e-05
Iteration 5 beta diff =9.23e-05 f(GC) diff =2.2e-08
hhat diff =0.016
hhat diff =0.000337
hhat diff =0.0694
hhat diff =6.45e-05
Stop at Iteration 5.
AIC5 = 4036029.081
BIC5 = 4032233.026
RSS5 = 82.124
> Yhat <- normObj$Yhat
> AIC <- normObj$AIC
> BIC <- normObj$BIC
> RSS <- normObj$RSS
> K <- normObj$K
>
>
>
>
>
> dev.off()
null device
1
>