Last data update: 2014.03.03

R: Bias-free copy number estimation and robust CNA detection in...
HMMcopy-packageR Documentation

Bias-free copy number estimation and robust CNA detection in tumour samples from WGS HTS data

Description

HMMcopy is a package for making bias-free copy number estimation by correcting for GC-content and mappability bias in HTS readcounts. It also contains an implementation of the Hidden Markov Model to robustly segment a copy number profile into non-overlapping segments predicted to be of the same copy number state, and attributes a biological copy number aberration events to the segments.

Details

HMMcopy takes as input WIG format files generated by fast C++ tools distributed as part of the HMMcopy Suite, namely readcount, GC-content and mappability values for non-overlapping fixed width “bins” across the reference genome of interest. It then uses a filtering and LOESS model to correct the GC-content and mappability biases observed in the readcounts (Benjamini and Speed, 2012), and uses the corrected readcounts as a proxy of copy number. The resultant copy number profile is then segmented with a six state Hidden Markov Model, with a handful of quick visualization functions for quick viewing.

Package: HMMcopy
Type: Package
Version: 0.1.0
Date: 2011-09-06
License: GPL-3

example("HMMcopy-package") for quick tour of functionality and visualization

vignette("HMMcopy") for detailed example

Author(s)

Daniel Lai, Gavin Ha, Sohrab Shah

Maintainer: Daniel Lai <jujubix@cs.ubc.ca> and Gavin Ha <gha@bccrc.ca>

References

Yuval Benjamini and Terence P Speed. Summarizing and correcting the gc content bias in high-throughput sequencing. Nucleic Acids Res, 40(10):e72, May 2012.

Examples


# Read WIG file input
rfile <- system.file("extdata", "tumour.wig", package = "HMMcopy")
gfile <- system.file("extdata", "gc.wig", package = "HMMcopy")
mfile <- system.file("extdata", "map.wig", package = "HMMcopy")
uncorrected_reads <- wigsToRangedData(rfile, gfile, mfile)

# Correct reads into copy number
corrected_copy <- correctReadcount(uncorrected_reads)

# Segment copy number profile
segmented_copy <- HMMsegment(corrected_copy)

# Visualize one at a time
par(ask = TRUE)
plotBias(corrected_copy)
plotCorrection(corrected_copy)
plotSegments(corrected_copy, segmented_copy)

Results


R version 3.3.1 (2016-06-21) -- "Bug in Your Hair"
Copyright (C) 2016 The R Foundation for Statistical Computing
Platform: x86_64-pc-linux-gnu (64-bit)

R is free software and comes with ABSOLUTELY NO WARRANTY.
You are welcome to redistribute it under certain conditions.
Type 'license()' or 'licence()' for distribution details.

R is a collaborative project with many contributors.
Type 'contributors()' for more information and
'citation()' on how to cite R or R packages in publications.

Type 'demo()' for some demos, 'help()' for on-line help, or
'help.start()' for an HTML browser interface to help.
Type 'q()' to quit R.

> library(HMMcopy)
Loading required package: IRanges
Loading required package: BiocGenerics
Loading required package: parallel

Attaching package: 'BiocGenerics'

The following objects are masked from 'package:parallel':

    clusterApply, clusterApplyLB, clusterCall, clusterEvalQ,
    clusterExport, clusterMap, parApply, parCapply, parLapply,
    parLapplyLB, parRapply, parSapply, parSapplyLB

The following objects are masked from 'package:stats':

    IQR, mad, xtabs

The following objects are masked from 'package:base':

    Filter, Find, Map, Position, Reduce, anyDuplicated, append,
    as.data.frame, cbind, colnames, do.call, duplicated, eval, evalq,
    get, grep, grepl, intersect, is.unsorted, lapply, lengths, mapply,
    match, mget, order, paste, pmax, pmax.int, pmin, pmin.int, rank,
    rbind, rownames, sapply, setdiff, sort, table, tapply, union,
    unique, unsplit

Loading required package: S4Vectors
Loading required package: stats4

Attaching package: 'S4Vectors'

The following objects are masked from 'package:base':

    colMeans, colSums, expand.grid, rowMeans, rowSums

Loading required package: geneplotter
Loading required package: Biobase
Welcome to Bioconductor

    Vignettes contain introductory material; view with
    'browseVignettes()'. To cite Bioconductor, see
    'citation("Biobase")', and for packages 'citation("pkgname")'.

Loading required package: lattice
Loading required package: annotate
Loading required package: AnnotationDbi
Loading required package: XML
> png(filename="/home/ddbj/snapshot/RGM3/R_BC/result/HMMcopy/HMMcopy-package.Rd_%03d_medium.png", width=480, height=480)
> ### Name: HMMcopy-package
> ### Title: Bias-free copy number estimation and robust CNA detection in
> ###   tumour samples from WGS HTS data
> ### Aliases: HMMcopy-package HMMcopy
> ### Keywords: IO manip
> 
> ### ** Examples
> 
> 
> # Read WIG file input
> rfile <- system.file("extdata", "tumour.wig", package = "HMMcopy")
> gfile <- system.file("extdata", "gc.wig", package = "HMMcopy")
> mfile <- system.file("extdata", "map.wig", package = "HMMcopy")
> uncorrected_reads <- wigsToRangedData(rfile, gfile, mfile)
> 
> # Correct reads into copy number
> corrected_copy <- correctReadcount(uncorrected_reads)
Applying filter on data...
Correcting for GC bias...
Correcting for mappability bias...
> 
> # Segment copy number profile
> segmented_copy <- HMMsegment(corrected_copy)
Initialization
EM iteration: 1 Log likelihood: -Inf
Expectation
Maximization
EM iteration: 2 Log likelihood: 3723.38977598142
Expectation
Maximization
EM iteration: 3 Log likelihood: 8651.17813251325
Expectation
Maximization
EM iteration: 4 Log likelihood: 9776.42997349569
Expectation
Maximization
EM iteration: 5 Log likelihood: 10073.160259107
Expectation
Maximization
EM iteration: 6 Log likelihood: 10155.0644113832
Expectation
Maximization
EM iteration: 7 Log likelihood: 10175.7127889998
Expectation
Maximization
EM iteration: 8 Log likelihood: 10181.1575769057
Expectation
Maximization
EM iteration: 9 Log likelihood: 10183.4033459703
Expectation
Maximization
EM iteration: 10 Log likelihood: 10184.3672316217
Expectation
Maximization
Re-calculating latest responsibilties for output
Optimal parameters found, segmenting and classifying
> 
> # Visualize one at a time
> par(ask = TRUE)
> plotBias(corrected_copy)
> plotCorrection(corrected_copy)
> plotSegments(corrected_copy, segmented_copy)
> 
> 
> 
> 
> 
> 
> dev.off()
null device 
          1 
>