Last data update: 2014.03.03

R: GPos objects
GPos-classR Documentation

GPos objects

Description

The GPos class is a container for storing a set of genomic positions, that is, genomic ranges of width 1. Even though a GRanges object can be used for that, using a GPos object can be much more memory-efficient, especially when the object contains long runs of adjacent positions.

Usage

GPos(pos_runs)  # constructor function

Arguments

pos_runs

A GRanges object (or any other GenomicRanges derivative) where each range is interpreted as a run of adjacent genomic positions. If pos_runs is not a GenomicRanges object, GPos() first tries to coerce it to one with as(pos_runs, "GenomicRanges", strict=FALSE).

Value

A GPos object.

Accessors

Getters

GPos objects support the same set of getters as GRanges objects (i.e. seqnames(), start(), end(), ranges(), strand(), mcols(), seqinfo(), etc...), plus the pos() getter which is equivalent to start() or end(). See ?GRanges for the list of getters supported by GRanges objects.

Note that a GPos object cannot hold names i.e. names() always returns NULL on it.

Setters

Like GRanges objects, GPos objects support the following setters:

  • The mcols() and metadata() setters.

  • The family of setters that operate on the seqinfo component of an object: seqlevels(), seqlevelsStyle(), seqlengths(), isCircular(), genome(), and seqinfo(). These setters are defined and documented in the GenomeInfoDb package.

However, there is no seqnames(), pos(), or strand() setter for GPos objects at the moment (although they might be added in the future).

Coercion

From GenomicRanges to GPos: A GenomicRanges object x in which all the ranges have a width of 1 can be coerced to a GPos object with as(x, "GPos"). The names on x are not propagated (a warning is issued if x has names on it).

From GPos to GRanges: A GPos object x can be coerced to a GRanges object with as(x, "GRanges"). However be aware that the resulting object can use thousands times (or more) memory than x! See "MEMORY USAGE" in the Examples section below.

From GPos to ordinary R objects: Like with a GRanges object, as.character(), as.factor(), and as.data.frame() work with a GPos object x. Note that as.data.frame(x) returns a data frame with a pos column (containing pos(x)) instead of the start, end, and width columns that one gets when x is a GRanges object.

Subsetting

A GPos object can be subsetted exactly like a GRanges object.

Combining

GPos objects can be combined (a.k.a. appended) with c() or append().

Splitting and Relisting

Like with a GRanges object, split() and relist() work with a GPos object x. Note that they return a GenomicRangesList object instead of a GRangesList object.

Note

Like for any Vector derivative, the length of a GPos object cannot exceed .Machine$integer.max (i.e. 2^31 on most platforms). GPos() will return an error if pos_runs contains too many genomic positions.

Author(s)

Herv<c3><83><c2><a9> Pag<c3><83><c2><a8>s; based on ideas borrowed from Georg Stricker georg.stricker@in.tum.de and Julien Gagneur gagneur@in.tum.de

See Also

  • GRanges objects.

  • The seqinfo accessor and family in the GenomeInfoDb package for accessing/modifying the seqinfo component of an object.

  • GenomicRanges-comparison for comparing and ordering genomic positions.

  • findOverlaps-methods for finding overlapping genomic ranges and/or positions.

  • nearest-methods for finding the nearest genomic range/position neighbor.

  • The snpsBySeqname, snpsByOverlaps, and snpsById methods for SNPlocs objects defined in the BSgenome package for extractors that return a GPos object.

  • SummarizedExperiment objects in the SummarizedExperiment package.

Examples

## ---------------------------------------------------------------------
## BASIC EXAMPLES
## ---------------------------------------------------------------------

## Example 1:
gpos1 <- GPos(c("chr1:44-53", "chr1:5-10", "chr2:2-5"))
gpos1

length(gpos1)
seqnames(gpos1)
pos(gpos1)  # same as 'start(gpos1)' and 'end(gpos1)'
strand(gpos1)
as.character(gpos1)
as.data.frame(gpos1)
as(gpos1, "GRanges")
as.data.frame(as(gpos1, "GRanges"))
gpos1[9:17]

## Example 2:
pos_runs <- GRanges("chrI", IRanges(c(1, 6, 12, 17), c(5, 10, 16, 20)),
                    strand=c("+", "-", "-", "+"))
gpos2 <- GPos(pos_runs)
gpos2

strand(gpos2)

## Example 3:
gpos3A <- gpos3B <- GPos(c("chrI:1-1000", "chrI:1005-2000"))
npos <- length(gpos3A)

mcols(gpos3A)$sample <- Rle("sA")
sA_counts <- sample(10, npos, replace=TRUE)
mcols(gpos3A)$counts <- sA_counts

mcols(gpos3B)$sample <- Rle("sB")
sB_counts <- sample(10, npos, replace=TRUE)
mcols(gpos3B)$counts <- sB_counts

gpos3 <- c(gpos3A, gpos3B)
gpos3

## Example 4:
library(BSgenome.Scerevisiae.UCSC.sacCer2)
genome <- BSgenome.Scerevisiae.UCSC.sacCer2
gpos4 <- GPos(seqinfo(genome))
gpos4  # all the positions along the genome are represented
mcols(gpos4)$dna <- do.call("c", unname(as.list(genome)))
gpos4

## Note however that, like for any Vector derivative, the length of a
## GPos object cannot exceed '.Machine$integer.max' (i.e. 2^31 on most
## platforms) so the above only works with a "small" genome.
## For example it doesn't work with the Human genome:
library(TxDb.Hsapiens.UCSC.hg19.knownGene)
## Not run: 
  GPos(seqinfo(TxDb.Hsapiens.UCSC.hg19.knownGene))  # error!

## End(Not run)

## You can use isSmallGenome() to check upfront whether the genome is
## "small" or not.
isSmallGenome(genome)
isSmallGenome(TxDb.Hsapiens.UCSC.hg19.knownGene)

## ---------------------------------------------------------------------
## MEMORY USAGE
## ---------------------------------------------------------------------

## Coercion to GRanges works...
gr4 <- as(gpos4, "GRanges")
gr4
## ... but is generally not a good idea:
object.size(gpos4)
object.size(gr4)  # 6951 times bigger than the GPos object!

## Shuffling the order of the positions impacts memory usage:
gpos4r <- rev(gpos4)
object.size(gpos4r)  # significantly
gpos4s <- sample(gpos4)
object.size(gpos4s)  # even worse!

## AN IMPORTANT NOTE: In the worst situations, GPos still performs as
## good as a GRanges object.
object.size(as(gpos4r, "GRanges"))  # same size as 'gpos4r'
object.size(as(gpos4s, "GRanges"))  # same size as 'gpos4s'

## Best case scenario is when the object is strictly sorted (i.e.
## positions are in strict ascending order).
## This can be checked with:
is.unsorted(gpos4, strict=TRUE)  # 'gpos4' is strictly sorted

## ---------------------------------------------------------------------
## USING MEMORY-EFFICIENT METADATA COLUMNS
## ---------------------------------------------------------------------
## In order to keep memory usage as low as possible, it is recommended
## to use a memory-efficient representation of the metadata columns that
## we want to set on the object. Rle's are particularly well suited for
## this, especially if the metadata columns contain long runs of
## identical values. This is the case for example if we want to use a
## GPos object to represent the coverage of sequencing reads along a
## genome.

## Example 5:
library(pasillaBamSubset)
library(Rsamtools)  # for the BamFile() constructor function
bamfile1 <- BamFile(untreated1_chr4())
bamfile2 <- BamFile(untreated3_chr4())
gpos5 <- GPos(seqinfo(bamfile1))
library(GenomicAlignments)  # for "coverage" method for BamFile objects
cov1 <- unlist(coverage(bamfile1), use.names=FALSE)
cov2 <- unlist(coverage(bamfile2), use.names=FALSE)
mcols(gpos5) <- DataFrame(cov1, cov2)
gpos5

object.size(gpos5)  # lightweight

## Keep only the positions where coverage is at least 10 in one of the
## 2 samples:
gpos5[mcols(gpos5)$cov1 >= 10 | mcols(gpos5)$cov2 >= 10]

## ---------------------------------------------------------------------
## USING A GPos OBJECT IN A SummarizedExperiment OBJECT
## ---------------------------------------------------------------------
## Because the GPos class extends the GenomicRanges virtual class, a GPos
## object can be used as the rowRanges component of a SummarizedExperiment
## object.

## As a 1st example, we show how the counts for samples sA and sB in
## 'gpos3' can be stored in a SummarizedExperiment object where the rows
## correspond to unique genomic positions and the columns to samples:
library(SummarizedExperiment)
counts <- cbind(sA=sA_counts, sB=sB_counts)
mcols(gpos3A) <- NULL
rse3 <- SummarizedExperiment(list(counts=counts), rowRanges=gpos3A)
rse3
rowRanges(rse3)
head(assay(rse3))

## Finally we show how the coverage data from Example 5 can be easily
## stored in a lightweight SummarizedExperiment object:
cov <- mcols(gpos5)
mcols(gpos5) <- NULL
rse5 <- SummarizedExperiment(list(cov=cov), rowRanges=gpos5)
rse5
rowRanges(rse5)
assay(rse5)

## Keep only the positions where coverage is at least 10 in one of the
## 2 samples:
rse5[assay(rse5)$cov1 >= 10 | assay(rse5)$cov2 >= 10]

Results


R version 3.3.1 (2016-06-21) -- "Bug in Your Hair"
Copyright (C) 2016 The R Foundation for Statistical Computing
Platform: x86_64-pc-linux-gnu (64-bit)

R is free software and comes with ABSOLUTELY NO WARRANTY.
You are welcome to redistribute it under certain conditions.
Type 'license()' or 'licence()' for distribution details.

R is a collaborative project with many contributors.
Type 'contributors()' for more information and
'citation()' on how to cite R or R packages in publications.

Type 'demo()' for some demos, 'help()' for on-line help, or
'help.start()' for an HTML browser interface to help.
Type 'q()' to quit R.

> library(GenomicRanges)
Loading required package: BiocGenerics
Loading required package: parallel

Attaching package: 'BiocGenerics'

The following objects are masked from 'package:parallel':

    clusterApply, clusterApplyLB, clusterCall, clusterEvalQ,
    clusterExport, clusterMap, parApply, parCapply, parLapply,
    parLapplyLB, parRapply, parSapply, parSapplyLB

The following objects are masked from 'package:stats':

    IQR, mad, xtabs

The following objects are masked from 'package:base':

    Filter, Find, Map, Position, Reduce, anyDuplicated, append,
    as.data.frame, cbind, colnames, do.call, duplicated, eval, evalq,
    get, grep, grepl, intersect, is.unsorted, lapply, lengths, mapply,
    match, mget, order, paste, pmax, pmax.int, pmin, pmin.int, rank,
    rbind, rownames, sapply, setdiff, sort, table, tapply, union,
    unique, unsplit

Loading required package: S4Vectors
Loading required package: stats4

Attaching package: 'S4Vectors'

The following objects are masked from 'package:base':

    colMeans, colSums, expand.grid, rowMeans, rowSums

Loading required package: IRanges
Loading required package: GenomeInfoDb
> png(filename="/home/ddbj/snapshot/RGM3/R_BC/result/GenomicRanges/GPos-class.Rd_%03d_medium.png", width=480, height=480)
> ### Name: GPos-class
> ### Title: GPos objects
> ### Aliases: class:GPos GPos-class GPos length,GPos-method
> ###   names,GPos-method names<-,GPos-method seqnames,GPos-method pos
> ###   pos,GPos-method start,GPos-method end,GPos-method width,GPos-method
> ###   ranges,GPos-method strand,GPos-method seqinfo,GPos-method
> ###   seqinfo<-,GPos-method coerce,GenomicRanges,GPos-method
> ###   as.data.frame,GPos-method extractROWS,GPos-method show,GPos-method
> ###   c,GPos-method
> ### Keywords: methods classes
> 
> ### ** Examples
> 
> ## ---------------------------------------------------------------------
> ## BASIC EXAMPLES
> ## ---------------------------------------------------------------------
> 
> ## Example 1:
> gpos1 <- GPos(c("chr1:44-53", "chr1:5-10", "chr2:2-5"))
> gpos1
GPos object with 20 positions and 0 metadata columns:
       seqnames       pos strand
          <Rle> <integer>  <Rle>
   [1]     chr1        44      *
   [2]     chr1        45      *
   [3]     chr1        46      *
   [4]     chr1        47      *
   [5]     chr1        48      *
   ...      ...       ...    ...
  [16]     chr1        10      *
  [17]     chr2         2      *
  [18]     chr2         3      *
  [19]     chr2         4      *
  [20]     chr2         5      *
  -------
  seqinfo: 2 sequences from an unspecified genome; no seqlengths
> 
> length(gpos1)
[1] 20
> seqnames(gpos1)
factor-Rle of length 20 with 2 runs
  Lengths:   16    4
  Values : chr1 chr2
Levels(2): chr1 chr2
> pos(gpos1)  # same as 'start(gpos1)' and 'end(gpos1)'
 [1] 44 45 46 47 48 49 50 51 52 53  5  6  7  8  9 10  2  3  4  5
> strand(gpos1)
factor-Rle of length 20 with 1 run
  Lengths: 20
  Values :  *
Levels(3): + - *
> as.character(gpos1)
 [1] "chr1:44-44" "chr1:45-45" "chr1:46-46" "chr1:47-47" "chr1:48-48"
 [6] "chr1:49-49" "chr1:50-50" "chr1:51-51" "chr1:52-52" "chr1:53-53"
[11] "chr1:5-5"   "chr1:6-6"   "chr1:7-7"   "chr1:8-8"   "chr1:9-9"  
[16] "chr1:10-10" "chr2:2-2"   "chr2:3-3"   "chr2:4-4"   "chr2:5-5"  
> as.data.frame(gpos1)
   seqnames pos strand
1      chr1  44      *
2      chr1  45      *
3      chr1  46      *
4      chr1  47      *
5      chr1  48      *
6      chr1  49      *
7      chr1  50      *
8      chr1  51      *
9      chr1  52      *
10     chr1  53      *
11     chr1   5      *
12     chr1   6      *
13     chr1   7      *
14     chr1   8      *
15     chr1   9      *
16     chr1  10      *
17     chr2   2      *
18     chr2   3      *
19     chr2   4      *
20     chr2   5      *
> as(gpos1, "GRanges")
GRanges object with 20 ranges and 0 metadata columns:
       seqnames    ranges strand
          <Rle> <IRanges>  <Rle>
   [1]     chr1  [44, 44]      *
   [2]     chr1  [45, 45]      *
   [3]     chr1  [46, 46]      *
   [4]     chr1  [47, 47]      *
   [5]     chr1  [48, 48]      *
   ...      ...       ...    ...
  [16]     chr1  [10, 10]      *
  [17]     chr2  [ 2,  2]      *
  [18]     chr2  [ 3,  3]      *
  [19]     chr2  [ 4,  4]      *
  [20]     chr2  [ 5,  5]      *
  -------
  seqinfo: 2 sequences from an unspecified genome; no seqlengths
> as.data.frame(as(gpos1, "GRanges"))
   seqnames start end width strand
1      chr1    44  44     1      *
2      chr1    45  45     1      *
3      chr1    46  46     1      *
4      chr1    47  47     1      *
5      chr1    48  48     1      *
6      chr1    49  49     1      *
7      chr1    50  50     1      *
8      chr1    51  51     1      *
9      chr1    52  52     1      *
10     chr1    53  53     1      *
11     chr1     5   5     1      *
12     chr1     6   6     1      *
13     chr1     7   7     1      *
14     chr1     8   8     1      *
15     chr1     9   9     1      *
16     chr1    10  10     1      *
17     chr2     2   2     1      *
18     chr2     3   3     1      *
19     chr2     4   4     1      *
20     chr2     5   5     1      *
> gpos1[9:17]
GPos object with 9 positions and 0 metadata columns:
      seqnames       pos strand
         <Rle> <integer>  <Rle>
  [1]     chr1        52      *
  [2]     chr1        53      *
  [3]     chr1         5      *
  [4]     chr1         6      *
  [5]     chr1         7      *
  [6]     chr1         8      *
  [7]     chr1         9      *
  [8]     chr1        10      *
  [9]     chr2         2      *
  -------
  seqinfo: 2 sequences from an unspecified genome; no seqlengths
> 
> ## Example 2:
> pos_runs <- GRanges("chrI", IRanges(c(1, 6, 12, 17), c(5, 10, 16, 20)),
+                     strand=c("+", "-", "-", "+"))
> gpos2 <- GPos(pos_runs)
> gpos2
GPos object with 19 positions and 0 metadata columns:
       seqnames       pos strand
          <Rle> <integer>  <Rle>
   [1]     chrI         1      +
   [2]     chrI         2      +
   [3]     chrI         3      +
   [4]     chrI         4      +
   [5]     chrI         5      +
   ...      ...       ...    ...
  [15]     chrI        16      -
  [16]     chrI        17      +
  [17]     chrI        18      +
  [18]     chrI        19      +
  [19]     chrI        20      +
  -------
  seqinfo: 1 sequence from an unspecified genome; no seqlengths
> 
> strand(gpos2)
factor-Rle of length 19 with 3 runs
  Lengths:  5 10  4
  Values :  +  -  +
Levels(3): + - *
> 
> ## Example 3:
> gpos3A <- gpos3B <- GPos(c("chrI:1-1000", "chrI:1005-2000"))
> npos <- length(gpos3A)
> 
> mcols(gpos3A)$sample <- Rle("sA")
> sA_counts <- sample(10, npos, replace=TRUE)
> mcols(gpos3A)$counts <- sA_counts
> 
> mcols(gpos3B)$sample <- Rle("sB")
> sB_counts <- sample(10, npos, replace=TRUE)
> mcols(gpos3B)$counts <- sB_counts
> 
> gpos3 <- c(gpos3A, gpos3B)
> gpos3
GPos object with 3992 positions and 2 metadata columns:
         seqnames       pos strand | sample    counts
            <Rle> <integer>  <Rle> |  <Rle> <integer>
     [1]     chrI         1      * |     sA         2
     [2]     chrI         2      * |     sA         1
     [3]     chrI         3      * |     sA         5
     [4]     chrI         4      * |     sA         7
     [5]     chrI         5      * |     sA         7
     ...      ...       ...    ... .    ...       ...
  [3988]     chrI      1996      * |     sB         1
  [3989]     chrI      1997      * |     sB         2
  [3990]     chrI      1998      * |     sB         3
  [3991]     chrI      1999      * |     sB         5
  [3992]     chrI      2000      * |     sB        10
  -------
  seqinfo: 1 sequence from an unspecified genome; no seqlengths
> 
> ## Example 4:
> library(BSgenome.Scerevisiae.UCSC.sacCer2)
Loading required package: BSgenome
Loading required package: Biostrings
Loading required package: XVector
Loading required package: rtracklayer
> genome <- BSgenome.Scerevisiae.UCSC.sacCer2
> gpos4 <- GPos(seqinfo(genome))
> gpos4  # all the positions along the genome are represented
GPos object with 12162995 positions and 0 metadata columns:
             seqnames       pos strand
                <Rle> <integer>  <Rle>
         [1]     chrI         1      *
         [2]     chrI         2      *
         [3]     chrI         3      *
         [4]     chrI         4      *
         [5]     chrI         5      *
         ...      ...       ...    ...
  [12162991]  2micron      6314      *
  [12162992]  2micron      6315      *
  [12162993]  2micron      6316      *
  [12162994]  2micron      6317      *
  [12162995]  2micron      6318      *
  -------
  seqinfo: 18 sequences (2 circular) from sacCer2 genome
> mcols(gpos4)$dna <- do.call("c", unname(as.list(genome)))
> gpos4
GPos object with 12162995 positions and 1 metadata column:
             seqnames       pos strand |         dna
                <Rle> <integer>  <Rle> | <DNAString>
         [1]     chrI         1      * |           C
         [2]     chrI         2      * |           C
         [3]     chrI         3      * |           A
         [4]     chrI         4      * |           C
         [5]     chrI         5      * |           A
         ...      ...       ...    ... .         ...
  [12162991]  2micron      6314      * |           A
  [12162992]  2micron      6315      * |           A
  [12162993]  2micron      6316      * |           C
  [12162994]  2micron      6317      * |           G
  [12162995]  2micron      6318      * |           A
  -------
  seqinfo: 18 sequences (2 circular) from sacCer2 genome
> 
> ## Note however that, like for any Vector derivative, the length of a
> ## GPos object cannot exceed '.Machine$integer.max' (i.e. 2^31 on most
> ## platforms) so the above only works with a "small" genome.
> ## For example it doesn't work with the Human genome:
> library(TxDb.Hsapiens.UCSC.hg19.knownGene)
Loading required package: GenomicFeatures
Loading required package: AnnotationDbi
Loading required package: Biobase
Welcome to Bioconductor

    Vignettes contain introductory material; view with
    'browseVignettes()'. To cite Bioconductor, see
    'citation("Biobase")', and for packages 'citation("pkgname")'.

> ## Not run: 
> ##D   GPos(seqinfo(TxDb.Hsapiens.UCSC.hg19.knownGene))  # error!
> ## End(Not run)
> 
> ## You can use isSmallGenome() to check upfront whether the genome is
> ## "small" or not.
> isSmallGenome(genome)
[1] TRUE
> isSmallGenome(TxDb.Hsapiens.UCSC.hg19.knownGene)
[1] FALSE
> 
> ## ---------------------------------------------------------------------
> ## MEMORY USAGE
> ## ---------------------------------------------------------------------
> 
> ## Coercion to GRanges works...
> gr4 <- as(gpos4, "GRanges")
> gr4
GRanges object with 12162995 ranges and 1 metadata column:
             seqnames       ranges strand |         dna
                <Rle>    <IRanges>  <Rle> | <DNAString>
         [1]     chrI       [1, 1]      * |           C
         [2]     chrI       [2, 2]      * |           C
         [3]     chrI       [3, 3]      * |           A
         [4]     chrI       [4, 4]      * |           C
         [5]     chrI       [5, 5]      * |           A
         ...      ...          ...    ... .         ...
  [12162991]  2micron [6314, 6314]      * |           A
  [12162992]  2micron [6315, 6315]      * |           A
  [12162993]  2micron [6316, 6316]      * |           C
  [12162994]  2micron [6317, 6317]      * |           G
  [12162995]  2micron [6318, 6318]      * |           A
  -------
  seqinfo: 18 sequences (2 circular) from sacCer2 genome
> ## ... but is generally not a good idea:
> object.size(gpos4)
12179168 bytes
> object.size(gr4)  # 6951 times bigger than the GPos object!
109480376 bytes
> 
> ## Shuffling the order of the positions impacts memory usage:
> gpos4r <- rev(gpos4)
> object.size(gpos4r)  # significantly
109482880 bytes
> gpos4s <- sample(gpos4)
> object.size(gpos4s)  # even worse!
199543664 bytes
> 
> ## AN IMPORTANT NOTE: In the worst situations, GPos still performs as
> ## good as a GRanges object.
> object.size(as(gpos4r, "GRanges"))  # same size as 'gpos4r'
109480376 bytes
> object.size(as(gpos4s, "GRanges"))  # same size as 'gpos4s'
199541160 bytes
> 
> ## Best case scenario is when the object is strictly sorted (i.e.
> ## positions are in strict ascending order).
> ## This can be checked with:
> is.unsorted(gpos4, strict=TRUE)  # 'gpos4' is strictly sorted
[1] FALSE
> 
> ## ---------------------------------------------------------------------
> ## USING MEMORY-EFFICIENT METADATA COLUMNS
> ## ---------------------------------------------------------------------
> ## In order to keep memory usage as low as possible, it is recommended
> ## to use a memory-efficient representation of the metadata columns that
> ## we want to set on the object. Rle's are particularly well suited for
> ## this, especially if the metadata columns contain long runs of
> ## identical values. This is the case for example if we want to use a
> ## GPos object to represent the coverage of sequencing reads along a
> ## genome.
> 
> ## Example 5:
> library(pasillaBamSubset)
> library(Rsamtools)  # for the BamFile() constructor function
> bamfile1 <- BamFile(untreated1_chr4())
> bamfile2 <- BamFile(untreated3_chr4())
> gpos5 <- GPos(seqinfo(bamfile1))
> library(GenomicAlignments)  # for "coverage" method for BamFile objects
Loading required package: SummarizedExperiment
> cov1 <- unlist(coverage(bamfile1), use.names=FALSE)
> cov2 <- unlist(coverage(bamfile2), use.names=FALSE)
> mcols(gpos5) <- DataFrame(cov1, cov2)
> gpos5
GPos object with 120748101 positions and 2 metadata columns:
              seqnames       pos strand |  cov1  cov2
                 <Rle> <integer>  <Rle> | <Rle> <Rle>
          [1]    chr2L         1      * |     0     0
          [2]    chr2L         2      * |     0     0
          [3]    chr2L         3      * |     0     0
          [4]    chr2L         4      * |     0     0
          [5]    chr2L         5      * |     0     0
          ...      ...       ...    ... .   ...   ...
  [120748097]  chrYHet    347034      * |     0     0
  [120748098]  chrYHet    347035      * |     0     0
  [120748099]  chrYHet    347036      * |     0     0
  [120748100]  chrYHet    347037      * |     0     0
  [120748101]  chrYHet    347038      * |     0     0
  -------
  seqinfo: 8 sequences from an unspecified genome
> 
> object.size(gpos5)  # lightweight
1828448 bytes
> 
> ## Keep only the positions where coverage is at least 10 in one of the
> ## 2 samples:
> gpos5[mcols(gpos5)$cov1 >= 10 | mcols(gpos5)$cov2 >= 10]
GPos object with 214773 positions and 2 metadata columns:
           seqnames       pos strand |  cov1  cov2
              <Rle> <integer>  <Rle> | <Rle> <Rle>
       [1]     chr4      4936      * |    11     6
       [2]     chr4      4937      * |    12     6
       [3]     chr4      4938      * |    12     6
       [4]     chr4      4939      * |    13     6
       [5]     chr4      4940      * |    13     6
       ...      ...       ...    ... .   ...   ...
  [214769]     chr4   1348339      * |    10    14
  [214770]     chr4   1348340      * |     6    12
  [214771]     chr4   1348341      * |     6    10
  [214772]     chr4   1348342      * |     6    10
  [214773]     chr4   1348343      * |     0    10
  -------
  seqinfo: 8 sequences from an unspecified genome
> 
> ## ---------------------------------------------------------------------
> ## USING A GPos OBJECT IN A SummarizedExperiment OBJECT
> ## ---------------------------------------------------------------------
> ## Because the GPos class extends the GenomicRanges virtual class, a GPos
> ## object can be used as the rowRanges component of a SummarizedExperiment
> ## object.
> 
> ## As a 1st example, we show how the counts for samples sA and sB in
> ## 'gpos3' can be stored in a SummarizedExperiment object where the rows
> ## correspond to unique genomic positions and the columns to samples:
> library(SummarizedExperiment)
> counts <- cbind(sA=sA_counts, sB=sB_counts)
> mcols(gpos3A) <- NULL
> rse3 <- SummarizedExperiment(list(counts=counts), rowRanges=gpos3A)
> rse3
class: RangedSummarizedExperiment 
dim: 1996 2 
metadata(0):
assays(1): counts
rownames: NULL
rowData names(0):
colnames(2): sA sB
colData names(0):
> rowRanges(rse3)
GPos object with 1996 positions and 0 metadata columns:
         seqnames       pos strand
            <Rle> <integer>  <Rle>
     [1]     chrI         1      *
     [2]     chrI         2      *
     [3]     chrI         3      *
     [4]     chrI         4      *
     [5]     chrI         5      *
     ...      ...       ...    ...
  [1992]     chrI      1996      *
  [1993]     chrI      1997      *
  [1994]     chrI      1998      *
  [1995]     chrI      1999      *
  [1996]     chrI      2000      *
  -------
  seqinfo: 1 sequence from an unspecified genome; no seqlengths
> head(assay(rse3))
     sA sB
[1,]  2  8
[2,]  1  6
[3,]  5  7
[4,]  7  3
[5,]  7  8
[6,]  2 10
> 
> ## Finally we show how the coverage data from Example 5 can be easily
> ## stored in a lightweight SummarizedExperiment object:
> cov <- mcols(gpos5)
> mcols(gpos5) <- NULL
> rse5 <- SummarizedExperiment(list(cov=cov), rowRanges=gpos5)
> rse5
class: RangedSummarizedExperiment 
dim: 120748101 2 
metadata(0):
assays(1): cov
rownames: NULL
rowData names(0):
colnames(2): cov1 cov2
colData names(0):
> rowRanges(rse5)
GPos object with 120748101 positions and 0 metadata columns:
              seqnames       pos strand
                 <Rle> <integer>  <Rle>
          [1]    chr2L         1      *
          [2]    chr2L         2      *
          [3]    chr2L         3      *
          [4]    chr2L         4      *
          [5]    chr2L         5      *
          ...      ...       ...    ...
  [120748097]  chrYHet    347034      *
  [120748098]  chrYHet    347035      *
  [120748099]  chrYHet    347036      *
  [120748100]  chrYHet    347037      *
  [120748101]  chrYHet    347038      *
  -------
  seqinfo: 8 sequences from an unspecified genome
> assay(rse5)
DataFrame with 120748101 rows and 2 columns
           cov1  cov2
          <Rle> <Rle>
1             0     0
2             0     0
3             0     0
4             0     0
5             0     0
...         ...   ...
120748097     0     0
120748098     0     0
120748099     0     0
120748100     0     0
120748101     0     0
> 
> ## Keep only the positions where coverage is at least 10 in one of the
> ## 2 samples:
> rse5[assay(rse5)$cov1 >= 10 | assay(rse5)$cov2 >= 10]
class: RangedSummarizedExperiment 
dim: 214773 2 
metadata(0):
assays(1): cov
rownames: NULL
rowData names(0):
colnames(2): cov1 cov2
colData names(0):
> 
> 
> 
> 
> 
> dev.off()
null device 
          1 
>