Last data update: 2014.03.03

R: BigWig Import and Export
BigWigFile-classR Documentation

BigWig Import and Export

Description

These functions support the import and export of the UCSC BigWig format, a compressed, binary form of WIG/BEDGraph with a spatial index and precomputed summaries. These functions do not work on Windows.

Usage

## S4 method for signature 'BigWigFile,ANY,ANY'
import(con, format, text,
                   selection = BigWigSelection(which, ...),
                   which = con, asRle = FALSE, 
                   as = c("GRanges", "RleList", "NumericList"), ...)
import.bw(con, ...)

## S4 method for signature 'ANY,BigWigFile,ANY'
export(object, con, format, ...)
## S4 method for signature 'GenomicRanges,BigWigFile,ANY'
export(object, con, format,
                   dataFormat = c("auto", "variableStep", "fixedStep",
                     "bedGraph"), compress = TRUE, fixedSummaries = FALSE)
export.bw(object, con, ...)

Arguments

con

A path, URL or BigWigFile object. Connections are not supported. For the functions ending in .bw, the file format is indicated by the function name. For the export and import methods, the format must be indicated another way. If con is a path, or URL, either the file extension or the format argument needs to be “bigWig” or “bw”.

object

The object to export, should be an RleList, IntegerList, NumericList, GRanges or something coercible to a GRanges.

format

If not missing, should be “bigWig” or “bw” (case insensitive).

text

Not supported.

as

Specifies the class of the return object. Default is GRanges, which has one range per range in the file, and a score column holding the value for each range. For NumericList, one numeric vector is returned for each range in the selection argument. For RleList, there is one Rle per sequence, and that Rle spans the entire sequence.

asRle

Deprecated. Use as instead. If TRUE, the BigWig file is assumed to contain contiguous ranges that define a run-length encoding of a vector (like coverage), and a RleList is returned.

selection

A BigWigSelection object indicating the ranges to load.

which

A range data structure coercible to RangesList, like a GRanges, or a BigWigFile. Only the intervals in the file overlapping the given ranges are returned. By default, the value is the BigWigFile itself. Its Seqinfo object is extracted and coerced to a RangesList that represents the entirety of the file.

dataFormat

Probably best left to “auto”. Exists only for historical reasons.

compress

If TRUE, compress the data. No reason to change this.

fixedSummaries

If TRUE, compute summaries at fixed resolutions corresponding to the default zoom levels in the Ensembl genome browser (with some extrapolation): 30X, 65X, 130X, 260X, 450X, 648X, 950X, 1296X, 4800X, 19200X. Otherwise, the resolutions are dynamically determined by an algorithm that computes an initial summary size by initializing to 10X the size of the smallest feature and doubling the size as needed until the size of the summary is less than half that of the data (or there are no further gains). It then computes up to 10 more levels of summary, quadrupling the size each time, until the summaries start to exceed the sequence size.

...

Arguments to pass down to methods to other methods. For import, the flow eventually reaches the BigWigFile method on import.

Value

A GRanges (default), RleList or NumericList. GRanges return ranges with non-zero score values in a score metadata column. The length of the NumericList is the same length as the selection argument (one list element per range). The return order in the NumericList matches the order of the BigWigSelection object.

BigWigFile objects

A BigWigFile object, an extension of RTLFile is a reference to a BigWig file. To cast a path, URL or connection to a BigWigFile, pass it to the BigWigFile constructor.

BigWig files are more complex than most track files, and there are a number of methods on BigWigFile for accessing the additional information:

seqinfo(x): Gets the Seqinfo object indicating the lengths of the sequences for the intervals in the file. No circularity or genome information is available.

summary(ranges = as(seqinfo(object), "GenomicRanges"), size = 1L, type = c("mean", "min", "max", "coverage", "sd"), defaultValue = NA_real_), ...: Aggregates the intervals in the file that fall into ranges, which should be something coercible to GRanges. The aggregation essentially compresses each sequence to a length of size. The algorithm is specified by type; available algorithms include the mean, min, max, coverage (percent sequence covered by at least one feature), and standard deviation. When a window contains no features, defaultValue is assumed. The result is an RleList, with an element for each element in ranges. The driving use case for this is visualization of coverage when the screen space is small compared to the viewed portion of the sequence. The operation is very fast, as it leverages cached multi-level summaries present in every BigWig file.

If a summary statistic is not available / cannot be computed for a given range a warning is thrown and the defaultValue NA_real_ is returned.

BigWigFileList objects

A BigWigFileList() provides a convenient way of managing a list of BigWigFile instances.

Author(s)

Michael Lawrence

See Also

wigToBigWig for converting a WIG file to BigWig.

Examples

if (.Platform$OS.type != "windows") {
  test_path <- system.file("tests", package = "rtracklayer")
  test_bw <- file.path(test_path, "test.bw")

  ## GRanges
  ## Returns ranges with non-zero scores.
  gr <- import(test_bw)
  gr 

  which <- GRanges(c("chr2", "chr2"), IRanges(c(1, 300), c(400, 1000)))
  import(test_bw, which = which)

  ## RleList
  ## Scores returned as an RleList is equivalent to the coverage.
  ## Best option when 'which' or 'selection' contain many small ranges.
  mini <- narrow(unlist(tile(which, 50)), 2)
  rle <- import(test_bw, which = mini, as = "RleList")
  rle 

  ## NumericList
  ## The 'which' is stored as metadata:
  track <- import(test_bw, which = which, as = "NumericList")
  metadata(track)

## Not run: 
  test_bw_out <- file.path(tempdir(), "test_out.bw")
  export(test, test_bw_out)

## End(Not run)

  bwf <- BigWigFile(test_bw)
  track <- import(bwf)

  seqinfo(bwf)  

  summary(bwf) # for each sequence, average all values into one
  summary(bwf, range(head(track))) # just average the first few features
  summary(bwf, size = seqlengths(bwf) / 10) # 10X reduction
  summary(bwf, type = "min") # min instead of mean
}

Results


R version 3.3.1 (2016-06-21) -- "Bug in Your Hair"
Copyright (C) 2016 The R Foundation for Statistical Computing
Platform: x86_64-pc-linux-gnu (64-bit)

R is free software and comes with ABSOLUTELY NO WARRANTY.
You are welcome to redistribute it under certain conditions.
Type 'license()' or 'licence()' for distribution details.

R is a collaborative project with many contributors.
Type 'contributors()' for more information and
'citation()' on how to cite R or R packages in publications.

Type 'demo()' for some demos, 'help()' for on-line help, or
'help.start()' for an HTML browser interface to help.
Type 'q()' to quit R.

> library(rtracklayer)
Loading required package: GenomicRanges
Loading required package: BiocGenerics
Loading required package: parallel

Attaching package: 'BiocGenerics'

The following objects are masked from 'package:parallel':

    clusterApply, clusterApplyLB, clusterCall, clusterEvalQ,
    clusterExport, clusterMap, parApply, parCapply, parLapply,
    parLapplyLB, parRapply, parSapply, parSapplyLB

The following objects are masked from 'package:stats':

    IQR, mad, xtabs

The following objects are masked from 'package:base':

    Filter, Find, Map, Position, Reduce, anyDuplicated, append,
    as.data.frame, cbind, colnames, do.call, duplicated, eval, evalq,
    get, grep, grepl, intersect, is.unsorted, lapply, lengths, mapply,
    match, mget, order, paste, pmax, pmax.int, pmin, pmin.int, rank,
    rbind, rownames, sapply, setdiff, sort, table, tapply, union,
    unique, unsplit

Loading required package: S4Vectors
Loading required package: stats4

Attaching package: 'S4Vectors'

The following objects are masked from 'package:base':

    colMeans, colSums, expand.grid, rowMeans, rowSums

Loading required package: IRanges
Loading required package: GenomeInfoDb
> png(filename="/home/ddbj/snapshot/RGM3/R_BC/result/rtracklayer/BigWigFile.Rd_%03d_medium.png", width=480, height=480)
> ### Name: BigWigFile-class
> ### Title: BigWig Import and Export
> ### Aliases: class:BigWigFile BigWigFile-class class:BWFile BWFile-class
> ###   class:BigWigFileList BigWigFileList-class BigWigFile BWFile
> ###   BigWigFileList seqinfo,BigWigFile-method import.bw
> ###   import.bw,ANY-method import,BigWigFile,ANY,ANY-method export.bw
> ###   export.bw,ANY-method export,ANY,BigWigFile,ANY-method
> ###   export,GenomicRanges,BigWigFile,ANY-method
> ###   export,List,BigWigFile,ANY-method summary,BigWigFile-method
> ###   path,BigWigFileList-method
> ### Keywords: methods classes
> 
> ### ** Examples
> 
> if (.Platform$OS.type != "windows") {
+   test_path <- system.file("tests", package = "rtracklayer")
+   test_bw <- file.path(test_path, "test.bw")
+ 
+   ## GRanges
+   ## Returns ranges with non-zero scores.
+   gr <- import(test_bw)
+   gr 
+ 
+   which <- GRanges(c("chr2", "chr2"), IRanges(c(1, 300), c(400, 1000)))
+   import(test_bw, which = which)
+ 
+   ## RleList
+   ## Scores returned as an RleList is equivalent to the coverage.
+   ## Best option when 'which' or 'selection' contain many small ranges.
+   mini <- narrow(unlist(tile(which, 50)), 2)
+   rle <- import(test_bw, which = mini, as = "RleList")
+   rle 
+ 
+   ## NumericList
+   ## The 'which' is stored as metadata:
+   track <- import(test_bw, which = which, as = "NumericList")
+   metadata(track)
+ 
+ ## Not run: 
+ ##D   test_bw_out <- file.path(tempdir(), "test_out.bw")
+ ##D   export(test, test_bw_out)
+ ## End(Not run)
+ 
+   bwf <- BigWigFile(test_bw)
+   track <- import(bwf)
+ 
+   seqinfo(bwf)  
+ 
+   summary(bwf) # for each sequence, average all values into one
+   summary(bwf, range(head(track))) # just average the first few features
+   summary(bwf, size = seqlengths(bwf) / 10) # 10X reduction
+   summary(bwf, type = "min") # min instead of mean
+ }
GRangesList object of length 2:
[[1]] 
GRanges object with 1 range and 1 metadata column:
      seqnames         ranges strand |     score
         <Rle>      <IRanges>  <Rle> | <numeric>
  [1]     chr2 [1, 243199373]      * |        -1

[[2]] 
GRanges object with 1 range and 1 metadata column:
      seqnames        ranges strand | score
  [1]    chr19 [1, 59128983]      * |  0.25

-------
seqinfo: 2 sequences from an unspecified genome; no seqlengths
> 
> 
> 
> 
> 
> dev.off()
null device 
          1 
>