filterData

R Documentation

Filter list object based on read depth and missing data

Description

Filters every vector in a list by the specified chromosome(s) of interest, minimum and maximum read depth, missing data, and mappability score threshold.
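
As a rough illustration of this kind of filtering, the base R sketch below builds one logical mask and applies it to every vector in a list so that the components stay aligned. It is not TitanCNA's implementation: the helper `sketchFilter`, the toy data, and the choice to treat `NA` depths as missing data are assumptions made for illustration. Only the component names `chr` and `tumDepth` come from this page.

```r
# Hypothetical helper: a sketch of mask-based filtering, not the TitanCNA source.
sketchFilter <- function(data, chrs = NULL, minDepth = 10, maxDepth = 200) {
  # Assumption: a position counts as "missing" when its tumour depth is NA.
  keep <- !is.na(data$tumDepth) &
    data$tumDepth >= minDepth & data$tumDepth <= maxDepth
  if (!is.null(chrs)) {
    keep <- keep & data$chr %in% chrs
  }
  # Apply the same mask to every component so positions stay aligned.
  lapply(data, function(x) x[keep])
}

# Toy input (made up for this sketch).
toy <- list(chr = c("1", "1", "2", "X"),
            posn = c(100L, 200L, 300L, 400L),
            tumDepth = c(5L, 50L, NA, 250L))
filtered <- sketchFilter(toy, chrs = c("1", "2"))
str(filtered)
```

Only the second toy position survives: the first is below `minDepth`, the third has a missing depth, and the fourth exceeds `maxDepth` and lies on a chromosome that was not requested.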

Usage

filterData(data, chrs = NULL, minDepth = 10, maxDepth = 200,
    positionList = NULL, map = NULL, mapThres = 0.9,
    centromeres = NULL, centromere.flankLength = 0)

Arguments

`data`	`list` object containing an arbitrary number of components. It should include ‘chr’ and ‘tumDepth’. All vector elements must have the same number of rows, where each row holds the information for one chromosomal position.
`chrs`	`character` or vector of `character` specifying the chromosomes to keep. Chromosomes not included in this vector are filtered out. The chromosome naming style must match the `genomeStyle` used when running `loadAlleleCounts`.
`minDepth`	`Numeric integer` specifying the minimum tumour read depth to include. Positions with tumour read depth >= `minDepth` are kept.
`maxDepth`	`Numeric integer` specifying the maximum tumour read depth to include. Positions with tumour read depth <= `maxDepth` are kept.
`positionList`	`data.frame` with two columns: ‘chr’ and ‘posn’. `positionList` lists the chromosomal positions to use in the analysis. All positions not overlapping this list are excluded. Use `NULL` to keep all current positions in `data`.
`map`	`Numeric array` containing the mappability score for each position in `data`. Optional; used to filter positions by mappability score.
`mapThres`	`Numeric float` specifying the mappability score threshold. Only applies if `map` is specified. Positions with `map` scores >= `mapThres` are kept.
`centromeres`	`data.frame` listing centromere regions. It should contain 3 columns: chr, start, and end. If this argument is supplied, data at and flanking the centromeres are removed.
`centromere.flankLength`	Integer giving the length (in base pairs) to the left and to the right of each centromere from which data are also removed.
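
To make the `map`, `centromeres`, and `centromere.flankLength` arguments more concrete, here is a minimal base R sketch of a mappability threshold combined with a centromere-plus-flank exclusion. The helper `nearCentromere`, the coordinates, and the inclusive treatment of region boundaries are assumptions for illustration and may differ from what `filterData` does internally.

```r
# Hypothetical sketch: flag positions inside a centromere or its flanks.
# Assumption: region boundaries are inclusive.
nearCentromere <- function(chr, posn, centromeres, flankLength = 0) {
  hit <- rep(FALSE, length(posn))
  for (i in seq_len(nrow(centromeres))) {
    lo <- centromeres$start[i] - flankLength
    hi <- centromeres$end[i] + flankLength
    hit <- hit | (chr == centromeres$chr[i] & posn >= lo & posn <= hi)
  }
  hit
}

# Toy values (made up for this sketch).
chr  <- c("2", "2", "2", "3")
posn <- c(1000, 5200, 9000, 5200)
map  <- c(0.95, 1.00, 0.50, 0.99)
centromeres <- data.frame(chr = "2", start = 5000, end = 6000,
                          stringsAsFactors = FALSE)

# Keep positions that pass the mappability threshold and avoid the centromere.
keep <- map >= 0.9 & !nearCentromere(chr, posn, centromeres, flankLength = 500)
keep
```

In this toy case the second position is dropped because it falls inside the centromere on chromosome 2, and the third is dropped for low mappability. The fourth shares the same coordinate but sits on a different chromosome, so it is kept.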

Details

All vectors in the input `data` list, and `map` if it is supplied, must have the same number of rows.
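
One way to check this requirement before filtering is sketched below in base R. The helper `sameLength` is hypothetical and not part of TitanCNA. It assumes every component of `data` is a plain vector.

```r
# Hypothetical helper: TRUE when every vector (and the optional map) has equal length.
sameLength <- function(data, map = NULL) {
  n <- lengths(data)
  if (!is.null(map)) {
    n <- c(n, map = length(map))
  }
  length(unique(n)) == 1
}

# Toy input (made up for this sketch).
toy <- list(chr = c("1", "1", "2"),
            posn = c(10, 20, 30),
            tumDepth = c(12, 40, 90))
okAlone   <- sameLength(toy)                       # components agree
okWithMap <- sameLength(toy, map = c(0.9, 1.0))    # map is one element short
```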

Value

The same `data` list, with each component filtered.

Author(s)

Gavin Ha <gavinha@gmail.com>

References

Ha, G., Roth, A., Khattra, J., Ho, J., Yap, D., Prentice, L. M., Melnyk, N., McPherson, A., Bashashati, A., Laks, E., Biele, J., Ding, J., Le, A., Rosner, J., Shumansky, K., Marra, M. A., Huntsman, D. G., McAlpine, J. N., Aparicio, S. A. J. R., and Shah, S. P. (2014). TITAN: Inference of copy number architectures in clonal cell populations from tumour whole genome sequence data. Genome Research, 24: 1881-1893. (PMID: 25060187)

Examples

infile <- system.file("extdata", "test_alleleCounts_chr2.txt", 
                      package = "TitanCNA")
tumWig <- system.file("extdata", "test_tum_chr2.wig", package = "TitanCNA")
normWig <- system.file("extdata", "test_norm_chr2.wig", package = "TitanCNA")
gc <- system.file("extdata", "gc_chr2.wig", package = "TitanCNA")
map <- system.file("extdata", "map_chr2.wig", package = "TitanCNA")

#### LOAD DATA ####
data <-  loadAlleleCounts(infile, genomeStyle = "NCBI")

#### GC AND MAPPABILITY CORRECTION ####
cnData <- correctReadDepth(tumWig, normWig, gc, map)


#### READ COPY NUMBER FROM HMMCOPY FILE ####
logR <- getPositionOverlap(data$chr, data$posn, cnData)
data$logR <- log(2^logR) #use natural logs

#### FILTER DATA FOR DEPTH, MAPPABILITY, NA, etc ####
filteredData <- filterData(data, as.character(1:24), minDepth = 10,
                           maxDepth = 200, map = NULL, mapThres = 0.9)
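
The `log(2^logR)` step in the example changes the base of the log ratio. Assuming the values returned by `getPositionOverlap` are base-2 log ratios, as the code and its comment suggest, the conversion is the same as multiplying by `log(2)`. The short check below uses made-up values to show the arithmetic:

```r
# Made-up log2 ratios, for illustration only.
log2Ratio <- c(-1, 0, 0.5, 1)

viaPower <- log(2^log2Ratio)      # form used in the example above
viaScale <- log2Ratio * log(2)    # equivalent change of base

sameResult <- isTRUE(all.equal(viaPower, viaScale))
sameResult
```

The scaled form avoids computing `2^x` for very large or very small ratios, although for typical copy number log ratios either form should behave the same.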

Results


> library(TitanCNA)
Loading required package: foreach
Loading required package: IRanges
Loading required package: BiocGenerics
Loading required package: parallel

Attaching package: 'BiocGenerics'

The following objects are masked from 'package:parallel':

    clusterApply, clusterApplyLB, clusterCall, clusterEvalQ,
    clusterExport, clusterMap, parApply, parCapply, parLapply,
    parLapplyLB, parRapply, parSapply, parSapplyLB

The following objects are masked from 'package:stats':

    IQR, mad, xtabs

The following objects are masked from 'package:base':

    Filter, Find, Map, Position, Reduce, anyDuplicated, append,
    as.data.frame, cbind, colnames, do.call, duplicated, eval, evalq,
    get, grep, grepl, intersect, is.unsorted, lapply, lengths, mapply,
    match, mget, order, paste, pmax, pmax.int, pmin, pmin.int, rank,
    rbind, rownames, sapply, setdiff, sort, table, tapply, union,
    unique, unsplit

Loading required package: S4Vectors
Loading required package: stats4

Attaching package: 'S4Vectors'

The following objects are masked from 'package:base':

    colMeans, colSums, expand.grid, rowMeans, rowSums

Loading required package: GenomicRanges
Loading required package: GenomeInfoDb
Loading required package: Rsamtools
Loading required package: Biostrings
Loading required package: XVector
> infile <- system.file("extdata", "test_alleleCounts_chr2.txt", 
+                       package = "TitanCNA")
> tumWig <- system.file("extdata", "test_tum_chr2.wig", package = "TitanCNA")
> normWig <- system.file("extdata", "test_norm_chr2.wig", package = "TitanCNA")
> gc <- system.file("extdata", "gc_chr2.wig", package = "TitanCNA")
> map <- system.file("extdata", "map_chr2.wig", package = "TitanCNA")
> 
> #### LOAD DATA ####
> data <-  loadAlleleCounts(infile, genomeStyle = "NCBI")
titan: Loading data /home/ddbj/local/lib64/R/library/TitanCNA/extdata/test_alleleCounts_chr2.txt
> 
> #### GC AND MAPPABILITY CORRECTION ####
> cnData <- correctReadDepth(tumWig, normWig, gc, map)
Reading GC and mappability files
Slurping: /home/ddbj/local/lib64/R/library/TitanCNA/extdata/gc_chr2.wig
Parsing: fixedStep chrom=2 start=1 step=1000 span=1000
Sorting by decreasing chromosome size
Slurping: /home/ddbj/local/lib64/R/library/TitanCNA/extdata/map_chr2.wig
Parsing: fixedStep chrom=2 start=1 step=1000 span=1000
Sorting by decreasing chromosome size
Loading tumour file:/home/ddbj/local/lib64/R/library/TitanCNA/extdata/test_tum_chr2.wig
Slurping: /home/ddbj/local/lib64/R/library/TitanCNA/extdata/test_tum_chr2.wig
Parsing: fixedStep chrom=2 start=1 step=1000 span=1000
Sorting by decreasing chromosome size
Loading normal file:/home/ddbj/local/lib64/R/library/TitanCNA/extdata/test_norm_chr2.wig
Slurping: /home/ddbj/local/lib64/R/library/TitanCNA/extdata/test_norm_chr2.wig
Parsing: fixedStep chrom=2 start=1 step=1000 span=1000
Sorting by decreasing chromosome size
Correcting Tumour
Applying filter on data...
Correcting for GC bias...
Correcting for mappability bias...
Correcting Normal
Applying filter on data...
Correcting for GC bias...
Correcting for mappability bias...
Normalizing Tumour by Normal
> 
> 
> #### READ COPY NUMBER FROM HMMCOPY FILE ####
> logR <- getPositionOverlap(data$chr, data$posn, cnData)
> data$logR <- log(2^logR) #use natural logs
> 
> #### FILTER DATA FOR DEPTH, MAPPABILITY, NA, etc ####
> filteredData <- filterData(data, as.character(1:24), minDepth = 10,
+                            maxDepth = 200, map = NULL, mapThres = 0.9)
> 