The GPos class is a container for storing a set of genomic positions,
that is, genomic ranges of width 1. Even though a GRanges object can
be used for that, using a GPos object can be much more memory-efficient,
especially when the object contains long runs of adjacent positions.
Usage
GPos(pos_runs) # constructor function
Arguments
pos_runs
A GRanges object (or any other GenomicRanges derivative)
where each range is interpreted as a run of adjacent genomic positions.
If pos_runs is not a GenomicRanges object,
GPos() first tries to coerce it to one with
as(pos_runs, "GenomicRanges", strict=FALSE).
Value
A GPos object.
Accessors
Getters
GPos objects support the same set of getters as GRanges
objects (i.e. seqnames(), start(), end(),
ranges(), strand(), mcols(), seqinfo(),
etc...), plus the pos() getter which is equivalent to
start() or end(). See ?GRanges for the list of
getters supported by GRanges objects.
Note that a GPos object cannot hold names i.e. names()
always returns NULL on it.
Setters
Like GRanges objects, GPos objects support the following
setters:
The mcols() and metadata() setters.
The family of setters that operate on the seqinfo component of
an object:
seqlevels(),
seqlevelsStyle(),
seqlengths(),
isCircular(),
genome(),
and seqinfo().
These setters are defined and documented in the GenomeInfoDb
package.
However, there is no seqnames(), pos(), or strand()
setter for GPos objects at the moment (although they might be
added in the future).
Coercion
From GenomicRanges to GPos:
A GenomicRanges object x in which all the ranges have a width
of 1 can be coerced to a GPos object with as(x, "GPos").
The names on x are not propagated (a warning is issued if x
has names on it).
From GPos to GRanges:
A GPos object x can be coerced to a GRanges object
with as(x, "GRanges"). However be aware that the resulting object
can use thousands times (or more) memory than x!
See "MEMORY USAGE" in the Examples section below.
From GPos to ordinary R objects:
Like with a GRanges object, as.character(), as.factor(),
and as.data.frame() work with a GPos object x.
Note that as.data.frame(x) returns a data frame with a pos
column (containing pos(x)) instead of the start, end,
and width columns that one gets when x is a GRanges
object.
Subsetting
A GPos object can be subsetted exactly like a GRanges object.
Combining
GPos objects can be combined (a.k.a. appended) with c() or
append().
Splitting and Relisting
Like with a GRanges object, split() and relist() work
with a GPos object x. Note that they return a
GenomicRangesList object instead of a GRangesList object.
Note
Like for any Vector derivative, the length of a
GPos object cannot exceed .Machine$integer.max (i.e. 2^31 on
most platforms). GPos() will return an error if pos_runs
contains too many genomic positions.
The seqinfo accessor and family in
the GenomeInfoDb package for accessing/modifying the seqinfo
component of an object.
GenomicRanges-comparison for comparing and ordering genomic
positions.
findOverlaps-methods for finding overlapping
genomic ranges and/or positions.
nearest-methods for finding the nearest
genomic range/position neighbor.
The snpsBySeqname,
snpsByOverlaps, and
snpsById methods for
SNPlocs objects defined in the BSgenome
package for extractors that return a GPos object.
SummarizedExperiment objects in the
SummarizedExperiment package.
Examples
## ---------------------------------------------------------------------
## BASIC EXAMPLES
## ---------------------------------------------------------------------
## Example 1:
gpos1 <- GPos(c("chr1:44-53", "chr1:5-10", "chr2:2-5"))
gpos1
length(gpos1)
seqnames(gpos1)
pos(gpos1) # same as 'start(gpos1)' and 'end(gpos1)'
strand(gpos1)
as.character(gpos1)
as.data.frame(gpos1)
as(gpos1, "GRanges")
as.data.frame(as(gpos1, "GRanges"))
gpos1[9:17]
## Example 2:
pos_runs <- GRanges("chrI", IRanges(c(1, 6, 12, 17), c(5, 10, 16, 20)),
strand=c("+", "-", "-", "+"))
gpos2 <- GPos(pos_runs)
gpos2
strand(gpos2)
## Example 3:
gpos3A <- gpos3B <- GPos(c("chrI:1-1000", "chrI:1005-2000"))
npos <- length(gpos3A)
mcols(gpos3A)$sample <- Rle("sA")
sA_counts <- sample(10, npos, replace=TRUE)
mcols(gpos3A)$counts <- sA_counts
mcols(gpos3B)$sample <- Rle("sB")
sB_counts <- sample(10, npos, replace=TRUE)
mcols(gpos3B)$counts <- sB_counts
gpos3 <- c(gpos3A, gpos3B)
gpos3
## Example 4:
library(BSgenome.Scerevisiae.UCSC.sacCer2)
genome <- BSgenome.Scerevisiae.UCSC.sacCer2
gpos4 <- GPos(seqinfo(genome))
gpos4 # all the positions along the genome are represented
mcols(gpos4)$dna <- do.call("c", unname(as.list(genome)))
gpos4
## Note however that, like for any Vector derivative, the length of a
## GPos object cannot exceed '.Machine$integer.max' (i.e. 2^31 on most
## platforms) so the above only works with a "small" genome.
## For example it doesn't work with the Human genome:
library(TxDb.Hsapiens.UCSC.hg19.knownGene)
## Not run:
GPos(seqinfo(TxDb.Hsapiens.UCSC.hg19.knownGene)) # error!
## End(Not run)
## You can use isSmallGenome() to check upfront whether the genome is
## "small" or not.
isSmallGenome(genome)
isSmallGenome(TxDb.Hsapiens.UCSC.hg19.knownGene)
## ---------------------------------------------------------------------
## MEMORY USAGE
## ---------------------------------------------------------------------
## Coercion to GRanges works...
gr4 <- as(gpos4, "GRanges")
gr4
## ... but is generally not a good idea:
object.size(gpos4)
object.size(gr4) # 6951 times bigger than the GPos object!
## Shuffling the order of the positions impacts memory usage:
gpos4r <- rev(gpos4)
object.size(gpos4r) # significantly
gpos4s <- sample(gpos4)
object.size(gpos4s) # even worse!
## AN IMPORTANT NOTE: In the worst situations, GPos still performs as
## good as a GRanges object.
object.size(as(gpos4r, "GRanges")) # same size as 'gpos4r'
object.size(as(gpos4s, "GRanges")) # same size as 'gpos4s'
## Best case scenario is when the object is strictly sorted (i.e.
## positions are in strict ascending order).
## This can be checked with:
is.unsorted(gpos4, strict=TRUE) # 'gpos4' is strictly sorted
## ---------------------------------------------------------------------
## USING MEMORY-EFFICIENT METADATA COLUMNS
## ---------------------------------------------------------------------
## In order to keep memory usage as low as possible, it is recommended
## to use a memory-efficient representation of the metadata columns that
## we want to set on the object. Rle's are particularly well suited for
## this, especially if the metadata columns contain long runs of
## identical values. This is the case for example if we want to use a
## GPos object to represent the coverage of sequencing reads along a
## genome.
## Example 5:
library(pasillaBamSubset)
library(Rsamtools) # for the BamFile() constructor function
bamfile1 <- BamFile(untreated1_chr4())
bamfile2 <- BamFile(untreated3_chr4())
gpos5 <- GPos(seqinfo(bamfile1))
library(GenomicAlignments) # for "coverage" method for BamFile objects
cov1 <- unlist(coverage(bamfile1), use.names=FALSE)
cov2 <- unlist(coverage(bamfile2), use.names=FALSE)
mcols(gpos5) <- DataFrame(cov1, cov2)
gpos5
object.size(gpos5) # lightweight
## Keep only the positions where coverage is at least 10 in one of the
## 2 samples:
gpos5[mcols(gpos5)$cov1 >= 10 | mcols(gpos5)$cov2 >= 10]
## ---------------------------------------------------------------------
## USING A GPos OBJECT IN A SummarizedExperiment OBJECT
## ---------------------------------------------------------------------
## Because the GPos class extends the GenomicRanges virtual class, a GPos
## object can be used as the rowRanges component of a SummarizedExperiment
## object.
## As a 1st example, we show how the counts for samples sA and sB in
## 'gpos3' can be stored in a SummarizedExperiment object where the rows
## correspond to unique genomic positions and the columns to samples:
library(SummarizedExperiment)
counts <- cbind(sA=sA_counts, sB=sB_counts)
mcols(gpos3A) <- NULL
rse3 <- SummarizedExperiment(list(counts=counts), rowRanges=gpos3A)
rse3
rowRanges(rse3)
head(assay(rse3))
## Finally we show how the coverage data from Example 5 can be easily
## stored in a lightweight SummarizedExperiment object:
cov <- mcols(gpos5)
mcols(gpos5) <- NULL
rse5 <- SummarizedExperiment(list(cov=cov), rowRanges=gpos5)
rse5
rowRanges(rse5)
assay(rse5)
## Keep only the positions where coverage is at least 10 in one of the
## 2 samples:
rse5[assay(rse5)$cov1 >= 10 | assay(rse5)$cov2 >= 10]
Results
R version 3.3.1 (2016-06-21) -- "Bug in Your Hair"
Copyright (C) 2016 The R Foundation for Statistical Computing
Platform: x86_64-pc-linux-gnu (64-bit)
R is free software and comes with ABSOLUTELY NO WARRANTY.
You are welcome to redistribute it under certain conditions.
Type 'license()' or 'licence()' for distribution details.
R is a collaborative project with many contributors.
Type 'contributors()' for more information and
'citation()' on how to cite R or R packages in publications.
Type 'demo()' for some demos, 'help()' for on-line help, or
'help.start()' for an HTML browser interface to help.
Type 'q()' to quit R.
> library(GenomicRanges)
Loading required package: BiocGenerics
Loading required package: parallel
Attaching package: 'BiocGenerics'
The following objects are masked from 'package:parallel':
clusterApply, clusterApplyLB, clusterCall, clusterEvalQ,
clusterExport, clusterMap, parApply, parCapply, parLapply,
parLapplyLB, parRapply, parSapply, parSapplyLB
The following objects are masked from 'package:stats':
IQR, mad, xtabs
The following objects are masked from 'package:base':
Filter, Find, Map, Position, Reduce, anyDuplicated, append,
as.data.frame, cbind, colnames, do.call, duplicated, eval, evalq,
get, grep, grepl, intersect, is.unsorted, lapply, lengths, mapply,
match, mget, order, paste, pmax, pmax.int, pmin, pmin.int, rank,
rbind, rownames, sapply, setdiff, sort, table, tapply, union,
unique, unsplit
Loading required package: S4Vectors
Loading required package: stats4
Attaching package: 'S4Vectors'
The following objects are masked from 'package:base':
colMeans, colSums, expand.grid, rowMeans, rowSums
Loading required package: IRanges
Loading required package: GenomeInfoDb
> png(filename="/home/ddbj/snapshot/RGM3/R_BC/result/GenomicRanges/GPos-class.Rd_%03d_medium.png", width=480, height=480)
> ### Name: GPos-class
> ### Title: GPos objects
> ### Aliases: class:GPos GPos-class GPos length,GPos-method
> ### names,GPos-method names<-,GPos-method seqnames,GPos-method pos
> ### pos,GPos-method start,GPos-method end,GPos-method width,GPos-method
> ### ranges,GPos-method strand,GPos-method seqinfo,GPos-method
> ### seqinfo<-,GPos-method coerce,GenomicRanges,GPos-method
> ### as.data.frame,GPos-method extractROWS,GPos-method show,GPos-method
> ### c,GPos-method
> ### Keywords: methods classes
>
> ### ** Examples
>
> ## ---------------------------------------------------------------------
> ## BASIC EXAMPLES
> ## ---------------------------------------------------------------------
>
> ## Example 1:
> gpos1 <- GPos(c("chr1:44-53", "chr1:5-10", "chr2:2-5"))
> gpos1
GPos object with 20 positions and 0 metadata columns:
seqnames pos strand
<Rle> <integer> <Rle>
[1] chr1 44 *
[2] chr1 45 *
[3] chr1 46 *
[4] chr1 47 *
[5] chr1 48 *
... ... ... ...
[16] chr1 10 *
[17] chr2 2 *
[18] chr2 3 *
[19] chr2 4 *
[20] chr2 5 *
-------
seqinfo: 2 sequences from an unspecified genome; no seqlengths
>
> length(gpos1)
[1] 20
> seqnames(gpos1)
factor-Rle of length 20 with 2 runs
Lengths: 16 4
Values : chr1 chr2
Levels(2): chr1 chr2
> pos(gpos1) # same as 'start(gpos1)' and 'end(gpos1)'
[1] 44 45 46 47 48 49 50 51 52 53 5 6 7 8 9 10 2 3 4 5
> strand(gpos1)
factor-Rle of length 20 with 1 run
Lengths: 20
Values : *
Levels(3): + - *
> as.character(gpos1)
[1] "chr1:44-44" "chr1:45-45" "chr1:46-46" "chr1:47-47" "chr1:48-48"
[6] "chr1:49-49" "chr1:50-50" "chr1:51-51" "chr1:52-52" "chr1:53-53"
[11] "chr1:5-5" "chr1:6-6" "chr1:7-7" "chr1:8-8" "chr1:9-9"
[16] "chr1:10-10" "chr2:2-2" "chr2:3-3" "chr2:4-4" "chr2:5-5"
> as.data.frame(gpos1)
seqnames pos strand
1 chr1 44 *
2 chr1 45 *
3 chr1 46 *
4 chr1 47 *
5 chr1 48 *
6 chr1 49 *
7 chr1 50 *
8 chr1 51 *
9 chr1 52 *
10 chr1 53 *
11 chr1 5 *
12 chr1 6 *
13 chr1 7 *
14 chr1 8 *
15 chr1 9 *
16 chr1 10 *
17 chr2 2 *
18 chr2 3 *
19 chr2 4 *
20 chr2 5 *
> as(gpos1, "GRanges")
GRanges object with 20 ranges and 0 metadata columns:
seqnames ranges strand
<Rle> <IRanges> <Rle>
[1] chr1 [44, 44] *
[2] chr1 [45, 45] *
[3] chr1 [46, 46] *
[4] chr1 [47, 47] *
[5] chr1 [48, 48] *
... ... ... ...
[16] chr1 [10, 10] *
[17] chr2 [ 2, 2] *
[18] chr2 [ 3, 3] *
[19] chr2 [ 4, 4] *
[20] chr2 [ 5, 5] *
-------
seqinfo: 2 sequences from an unspecified genome; no seqlengths
> as.data.frame(as(gpos1, "GRanges"))
seqnames start end width strand
1 chr1 44 44 1 *
2 chr1 45 45 1 *
3 chr1 46 46 1 *
4 chr1 47 47 1 *
5 chr1 48 48 1 *
6 chr1 49 49 1 *
7 chr1 50 50 1 *
8 chr1 51 51 1 *
9 chr1 52 52 1 *
10 chr1 53 53 1 *
11 chr1 5 5 1 *
12 chr1 6 6 1 *
13 chr1 7 7 1 *
14 chr1 8 8 1 *
15 chr1 9 9 1 *
16 chr1 10 10 1 *
17 chr2 2 2 1 *
18 chr2 3 3 1 *
19 chr2 4 4 1 *
20 chr2 5 5 1 *
> gpos1[9:17]
GPos object with 9 positions and 0 metadata columns:
seqnames pos strand
<Rle> <integer> <Rle>
[1] chr1 52 *
[2] chr1 53 *
[3] chr1 5 *
[4] chr1 6 *
[5] chr1 7 *
[6] chr1 8 *
[7] chr1 9 *
[8] chr1 10 *
[9] chr2 2 *
-------
seqinfo: 2 sequences from an unspecified genome; no seqlengths
>
> ## Example 2:
> pos_runs <- GRanges("chrI", IRanges(c(1, 6, 12, 17), c(5, 10, 16, 20)),
+ strand=c("+", "-", "-", "+"))
> gpos2 <- GPos(pos_runs)
> gpos2
GPos object with 19 positions and 0 metadata columns:
seqnames pos strand
<Rle> <integer> <Rle>
[1] chrI 1 +
[2] chrI 2 +
[3] chrI 3 +
[4] chrI 4 +
[5] chrI 5 +
... ... ... ...
[15] chrI 16 -
[16] chrI 17 +
[17] chrI 18 +
[18] chrI 19 +
[19] chrI 20 +
-------
seqinfo: 1 sequence from an unspecified genome; no seqlengths
>
> strand(gpos2)
factor-Rle of length 19 with 3 runs
Lengths: 5 10 4
Values : + - +
Levels(3): + - *
>
> ## Example 3:
> gpos3A <- gpos3B <- GPos(c("chrI:1-1000", "chrI:1005-2000"))
> npos <- length(gpos3A)
>
> mcols(gpos3A)$sample <- Rle("sA")
> sA_counts <- sample(10, npos, replace=TRUE)
> mcols(gpos3A)$counts <- sA_counts
>
> mcols(gpos3B)$sample <- Rle("sB")
> sB_counts <- sample(10, npos, replace=TRUE)
> mcols(gpos3B)$counts <- sB_counts
>
> gpos3 <- c(gpos3A, gpos3B)
> gpos3
GPos object with 3992 positions and 2 metadata columns:
seqnames pos strand | sample counts
<Rle> <integer> <Rle> | <Rle> <integer>
[1] chrI 1 * | sA 2
[2] chrI 2 * | sA 1
[3] chrI 3 * | sA 5
[4] chrI 4 * | sA 7
[5] chrI 5 * | sA 7
... ... ... ... . ... ...
[3988] chrI 1996 * | sB 1
[3989] chrI 1997 * | sB 2
[3990] chrI 1998 * | sB 3
[3991] chrI 1999 * | sB 5
[3992] chrI 2000 * | sB 10
-------
seqinfo: 1 sequence from an unspecified genome; no seqlengths
>
> ## Example 4:
> library(BSgenome.Scerevisiae.UCSC.sacCer2)
Loading required package: BSgenome
Loading required package: Biostrings
Loading required package: XVector
Loading required package: rtracklayer
> genome <- BSgenome.Scerevisiae.UCSC.sacCer2
> gpos4 <- GPos(seqinfo(genome))
> gpos4 # all the positions along the genome are represented
GPos object with 12162995 positions and 0 metadata columns:
seqnames pos strand
<Rle> <integer> <Rle>
[1] chrI 1 *
[2] chrI 2 *
[3] chrI 3 *
[4] chrI 4 *
[5] chrI 5 *
... ... ... ...
[12162991] 2micron 6314 *
[12162992] 2micron 6315 *
[12162993] 2micron 6316 *
[12162994] 2micron 6317 *
[12162995] 2micron 6318 *
-------
seqinfo: 18 sequences (2 circular) from sacCer2 genome
> mcols(gpos4)$dna <- do.call("c", unname(as.list(genome)))
> gpos4
GPos object with 12162995 positions and 1 metadata column:
seqnames pos strand | dna
<Rle> <integer> <Rle> | <DNAString>
[1] chrI 1 * | C
[2] chrI 2 * | C
[3] chrI 3 * | A
[4] chrI 4 * | C
[5] chrI 5 * | A
... ... ... ... . ...
[12162991] 2micron 6314 * | A
[12162992] 2micron 6315 * | A
[12162993] 2micron 6316 * | C
[12162994] 2micron 6317 * | G
[12162995] 2micron 6318 * | A
-------
seqinfo: 18 sequences (2 circular) from sacCer2 genome
>
> ## Note however that, like for any Vector derivative, the length of a
> ## GPos object cannot exceed '.Machine$integer.max' (i.e. 2^31 on most
> ## platforms) so the above only works with a "small" genome.
> ## For example it doesn't work with the Human genome:
> library(TxDb.Hsapiens.UCSC.hg19.knownGene)
Loading required package: GenomicFeatures
Loading required package: AnnotationDbi
Loading required package: Biobase
Welcome to Bioconductor
Vignettes contain introductory material; view with
'browseVignettes()'. To cite Bioconductor, see
'citation("Biobase")', and for packages 'citation("pkgname")'.
> ## Not run:
> ##D GPos(seqinfo(TxDb.Hsapiens.UCSC.hg19.knownGene)) # error!
> ## End(Not run)
>
> ## You can use isSmallGenome() to check upfront whether the genome is
> ## "small" or not.
> isSmallGenome(genome)
[1] TRUE
> isSmallGenome(TxDb.Hsapiens.UCSC.hg19.knownGene)
[1] FALSE
>
> ## ---------------------------------------------------------------------
> ## MEMORY USAGE
> ## ---------------------------------------------------------------------
>
> ## Coercion to GRanges works...
> gr4 <- as(gpos4, "GRanges")
> gr4
GRanges object with 12162995 ranges and 1 metadata column:
seqnames ranges strand | dna
<Rle> <IRanges> <Rle> | <DNAString>
[1] chrI [1, 1] * | C
[2] chrI [2, 2] * | C
[3] chrI [3, 3] * | A
[4] chrI [4, 4] * | C
[5] chrI [5, 5] * | A
... ... ... ... . ...
[12162991] 2micron [6314, 6314] * | A
[12162992] 2micron [6315, 6315] * | A
[12162993] 2micron [6316, 6316] * | C
[12162994] 2micron [6317, 6317] * | G
[12162995] 2micron [6318, 6318] * | A
-------
seqinfo: 18 sequences (2 circular) from sacCer2 genome
> ## ... but is generally not a good idea:
> object.size(gpos4)
12179168 bytes
> object.size(gr4) # 6951 times bigger than the GPos object!
109480376 bytes
>
> ## Shuffling the order of the positions impacts memory usage:
> gpos4r <- rev(gpos4)
> object.size(gpos4r) # significantly
109482880 bytes
> gpos4s <- sample(gpos4)
> object.size(gpos4s) # even worse!
199543664 bytes
>
> ## AN IMPORTANT NOTE: In the worst situations, GPos still performs as
> ## good as a GRanges object.
> object.size(as(gpos4r, "GRanges")) # same size as 'gpos4r'
109480376 bytes
> object.size(as(gpos4s, "GRanges")) # same size as 'gpos4s'
199541160 bytes
>
> ## Best case scenario is when the object is strictly sorted (i.e.
> ## positions are in strict ascending order).
> ## This can be checked with:
> is.unsorted(gpos4, strict=TRUE) # 'gpos4' is strictly sorted
[1] FALSE
>
> ## ---------------------------------------------------------------------
> ## USING MEMORY-EFFICIENT METADATA COLUMNS
> ## ---------------------------------------------------------------------
> ## In order to keep memory usage as low as possible, it is recommended
> ## to use a memory-efficient representation of the metadata columns that
> ## we want to set on the object. Rle's are particularly well suited for
> ## this, especially if the metadata columns contain long runs of
> ## identical values. This is the case for example if we want to use a
> ## GPos object to represent the coverage of sequencing reads along a
> ## genome.
>
> ## Example 5:
> library(pasillaBamSubset)
> library(Rsamtools) # for the BamFile() constructor function
> bamfile1 <- BamFile(untreated1_chr4())
> bamfile2 <- BamFile(untreated3_chr4())
> gpos5 <- GPos(seqinfo(bamfile1))
> library(GenomicAlignments) # for "coverage" method for BamFile objects
Loading required package: SummarizedExperiment
> cov1 <- unlist(coverage(bamfile1), use.names=FALSE)
> cov2 <- unlist(coverage(bamfile2), use.names=FALSE)
> mcols(gpos5) <- DataFrame(cov1, cov2)
> gpos5
GPos object with 120748101 positions and 2 metadata columns:
seqnames pos strand | cov1 cov2
<Rle> <integer> <Rle> | <Rle> <Rle>
[1] chr2L 1 * | 0 0
[2] chr2L 2 * | 0 0
[3] chr2L 3 * | 0 0
[4] chr2L 4 * | 0 0
[5] chr2L 5 * | 0 0
... ... ... ... . ... ...
[120748097] chrYHet 347034 * | 0 0
[120748098] chrYHet 347035 * | 0 0
[120748099] chrYHet 347036 * | 0 0
[120748100] chrYHet 347037 * | 0 0
[120748101] chrYHet 347038 * | 0 0
-------
seqinfo: 8 sequences from an unspecified genome
>
> object.size(gpos5) # lightweight
1828448 bytes
>
> ## Keep only the positions where coverage is at least 10 in one of the
> ## 2 samples:
> gpos5[mcols(gpos5)$cov1 >= 10 | mcols(gpos5)$cov2 >= 10]
GPos object with 214773 positions and 2 metadata columns:
seqnames pos strand | cov1 cov2
<Rle> <integer> <Rle> | <Rle> <Rle>
[1] chr4 4936 * | 11 6
[2] chr4 4937 * | 12 6
[3] chr4 4938 * | 12 6
[4] chr4 4939 * | 13 6
[5] chr4 4940 * | 13 6
... ... ... ... . ... ...
[214769] chr4 1348339 * | 10 14
[214770] chr4 1348340 * | 6 12
[214771] chr4 1348341 * | 6 10
[214772] chr4 1348342 * | 6 10
[214773] chr4 1348343 * | 0 10
-------
seqinfo: 8 sequences from an unspecified genome
>
> ## ---------------------------------------------------------------------
> ## USING A GPos OBJECT IN A SummarizedExperiment OBJECT
> ## ---------------------------------------------------------------------
> ## Because the GPos class extends the GenomicRanges virtual class, a GPos
> ## object can be used as the rowRanges component of a SummarizedExperiment
> ## object.
>
> ## As a 1st example, we show how the counts for samples sA and sB in
> ## 'gpos3' can be stored in a SummarizedExperiment object where the rows
> ## correspond to unique genomic positions and the columns to samples:
> library(SummarizedExperiment)
> counts <- cbind(sA=sA_counts, sB=sB_counts)
> mcols(gpos3A) <- NULL
> rse3 <- SummarizedExperiment(list(counts=counts), rowRanges=gpos3A)
> rse3
class: RangedSummarizedExperiment
dim: 1996 2
metadata(0):
assays(1): counts
rownames: NULL
rowData names(0):
colnames(2): sA sB
colData names(0):
> rowRanges(rse3)
GPos object with 1996 positions and 0 metadata columns:
seqnames pos strand
<Rle> <integer> <Rle>
[1] chrI 1 *
[2] chrI 2 *
[3] chrI 3 *
[4] chrI 4 *
[5] chrI 5 *
... ... ... ...
[1992] chrI 1996 *
[1993] chrI 1997 *
[1994] chrI 1998 *
[1995] chrI 1999 *
[1996] chrI 2000 *
-------
seqinfo: 1 sequence from an unspecified genome; no seqlengths
> head(assay(rse3))
sA sB
[1,] 2 8
[2,] 1 6
[3,] 5 7
[4,] 7 3
[5,] 7 8
[6,] 2 10
>
> ## Finally we show how the coverage data from Example 5 can be easily
> ## stored in a lightweight SummarizedExperiment object:
> cov <- mcols(gpos5)
> mcols(gpos5) <- NULL
> rse5 <- SummarizedExperiment(list(cov=cov), rowRanges=gpos5)
> rse5
class: RangedSummarizedExperiment
dim: 120748101 2
metadata(0):
assays(1): cov
rownames: NULL
rowData names(0):
colnames(2): cov1 cov2
colData names(0):
> rowRanges(rse5)
GPos object with 120748101 positions and 0 metadata columns:
seqnames pos strand
<Rle> <integer> <Rle>
[1] chr2L 1 *
[2] chr2L 2 *
[3] chr2L 3 *
[4] chr2L 4 *
[5] chr2L 5 *
... ... ... ...
[120748097] chrYHet 347034 *
[120748098] chrYHet 347035 *
[120748099] chrYHet 347036 *
[120748100] chrYHet 347037 *
[120748101] chrYHet 347038 *
-------
seqinfo: 8 sequences from an unspecified genome
> assay(rse5)
DataFrame with 120748101 rows and 2 columns
cov1 cov2
<Rle> <Rle>
1 0 0
2 0 0
3 0 0
4 0 0
5 0 0
... ... ...
120748097 0 0
120748098 0 0
120748099 0 0
120748100 0 0
120748101 0 0
>
> ## Keep only the positions where coverage is at least 10 in one of the
> ## 2 samples:
> rse5[assay(rse5)$cov1 >= 10 | assay(rse5)$cov2 >= 10]
class: RangedSummarizedExperiment
dim: 214773 2
metadata(0):
assays(1): cov
rownames: NULL
rowData names(0):
colnames(2): cov1 cov2
colData names(0):
>
>
>
>
>
> dev.off()
null device
1
>