A class for representing reads from next-generation sequencing
experiments that have been aligned to genomic intervals.
Objects from the Class
Objects can be created either by:
calls of the form
new("AlignedGenomeIntervals", .Data, closed, ...).
using the auxiliary function AlignedGenomeIntervals and
supplying separate vectors of same length which hold the
required information: AlignedGenomeIntervals(start, end, chromosome, strand, reads,
matches, sequence)
If arguments reads or matches are not specified, they
are assumed to be '1' for all intervals.
or, probably the most common way, by coercing from objects of
class AlignedRead.
Slots
.Data:
two-column integer matrix, holding the
start and end coordinates of the intervals on the chromosomes
sequence:
character; sequence of the read aligned to
the interval
reads:
integer; total number of reads that were
aligned to this interval
matches:
integer; the total number of genomic
intervals that reads which were aligned to this interval were
aligned to. A value of '1' thus means that this read sequence
matches uniquely to this one genome interval only
organism:
string; an identifier for the genome of
which organism the intervals are related to. Functions making use
of this slot require a specific annotation package
org.<organism>.eg.db. For example if organism is
'Hs', the annotation package 'org.Hs.eg.db' is utilised by these
functions. The annotation packages can be obtained from the
Bioconductor repositories.
annotation:
data.frame; see class
genome_intervals for details
closed:
matrix; see class
genome_intervals for details
type:
character; see class
genome_intervals for details
score:
numeric; optional score for each aligned genome
interval
id:
character; optional identifier for each aligned
genome interval
chrlengths:
integer; optional named integer vector of
chromosome lengths for the respective genome; if present it is
used in place of the chromosome lengths retrieved from the
annotation package (see slot organism)
Extends
Class Genome_intervals-class, directly.
Class Intervals_full, by class
"Genome_intervals", distance 2.
Methods
coerce
Coercion method from objects of class
AlignedRead, which is defined in package ShortRead,
to objects of class AlignedGenomeIntervals
coerce
Coercion method from objects of class
AlignedGenomeIntervals to objects of class
RangedData, which is defined in package IRanges
coverage
signature("AlignedGenomeIntervals"): computes
the read coverage over all chromosomes. If the organism of
the object is set correctly, the chromosome lengths are retrieved
from the appropriate annotation package, otherwise the maximum
interval end is taken to be the absolute length of that chromosome
(strand).
The result of this method is a list and the individual list
elements are of class Rle, a class for encoding long
repetitive vectors that is defined in package IRanges.
The additional argument byStrand governs whether
the coverage is computed separately for each strand. If
byStrand=FALSE (default) only one result is returned per
chromosome. If byStrand=TRUE, there result is
two separate Rle objects per chromosome with the strand
appended to the chromosome name.
By now, the coverage method for
AlignedGenomeIntervals makes use of the method for
RangedData objects from package IRanges (thanks to a
suggestion from P. Aboyoun).
detail
signature("AlignedGenomeIntervals"): a more
detailed output of all the intervals than provided by show;
only advisable for objects containing few intervals
extend
signature("AlignedGenomeIntervals") with
additional arguments fiveprime=0L and
threeprime=0L. These must be integer numbers and greater
than or equal to 0. They specify how much is subtracted from the
left border of the interval and added to the right side. Which end
is 5' and which one is 3' are determined from the strand
information of the object.
Lastly, if the object has an organism annotation, it is
checked that the right ends of the intervals do not exceed the
respective chromosome lengths.
export
export the aligned intervals as tab-delimited text
files which can be uploaded to the UCSC genome
browser as ‘custom tracks’.
Currently, there are methods for exporting the data
into ‘bed’ format and ‘bedGraph’ format,
either writing the intervals from both strands into one file or
into two separate files (formats ‘bedStrand’ and
‘bedGraphStrand’, respectively).
Details about these track formats can be found
at the UCSC genome browser web pages.
The additional argument writeHeader can be set to
FALSE to suppress writing of the track definition header
line to the file.
For Genome_intervals objects, only ‘bed’ format is
supported at the moment and does not need to be specified.
hist
signature("AlignedGenomeIntervals"): creates
a histogram of the lengths of the reads aligned to the intervals
organism
Get or set the organism that the genome intervals in
the object correspond to. Should be a predefined code, such as
'Mm' for mouse and 'Hs' for human. The reason for this code, that,
if the organism is set, a corresponding annotation package that is
called org.<organism>.eg.db is used, for example for
obtaining the chromosome lengths to be used in methods such as
coverage. These annotation packages can be obtained from
the Bioconductor repository.
plot
visualisation method; a second argument of class
Genome_intervals_stranded can be provided for additional
annotation to the plot. Please see below and in the vignette for
examples. Refer to the documentation of plotAligned
for more details on the plotting function.
reduce
collapse/reduce aligned genome intervals by combining
intervals which are completely included in each other, combining
overlapping intervals AND combining immediately adjacent
intervals (if method="standard").
Intervals are only combined if they are on the same
chromosome, the same strand AND have the same match specificity
of the aligned reads.
If you only want to combine intervals that have exactly the same
start and stop position (but may have reads of slightly different
sequence aligned to them), then use the argument
method="exact".
If you only want to combine intervals that have exactly the same
5' or 3' end (but may differ in the other end and in the aligned
sequence), then use the argument
method="same5" (same 5' end) or
method="same3" (same 3' end).
Finally, it's possible to only collapse/reduce aligned genome
intervals that overlap each other by at least a certain fraction
using the argument min.frac. min.frac is a number
between 0.0 and 1.0. For example, if you call reduce with
argument min.frac=0.4, only intervals that overlap
each other by at least 40 percent are collapsed/merged.
sample
draw a random sample of n (Argument
size) of the aligned reads (without or with replacement)
and returns the AlignedGenomeIntervals object defined by
these aligned reads.
score
access or set a custom score for the object
sort
sorts the intervals by chromosome name, start and end
coordinate in increasing order (unless decreasing=TRUE is
specified) and returns the sorted object
subset
take a subset of reads, matrix-like subsetting via
'[' can also be used
############# toy example:
A <- new("AlignedGenomeIntervals",
.Data=cbind(c(1,3,4,5,8,10), c(5,5,6,8,9,11)),
annotation=data.frame(
seq_name=factor(rep(c("chr1","chr2","chr3"), each=2)),
strand=factor(c("-","-","+","+","+","+") ,levels=c("-","+")),
inter_base=rep(FALSE, 6)),
reads=rep(3L, 6), matches=rep(1L,6),
sequence=c("ACATT","ACA","CGT","GTAA","AG","CT"))
show(A)
detail(A)
## alternative initiation of this object:
A <- AlignedGenomeIntervals(
start=c(1,3,4,5,8,10), end=c(5,5,6,8,9,11),
chromosome=rep(c("chr2","chrX","chr1"), each=2),
strand=c("-","-","+","+","+","+"),
sequence=c("ACATT","ACA","CGT","GGAA","AG","CT"),
reads=c(1L, 5L, 2L, 7L, 3L, 3L))
detail(A)
## custom identifiers can be assigned to the intervals
id(A) <- paste("gi", 1:6, sep="")
## subsetting and combining
detail(A[c(1:4)])
detail(c(A[1], A[4]))
## sorting: always useful
A <- sort(A)
detail(A)
## the 'reduce' method provides a cleaned-up, compact set
detail(reduce(A))
## with arguments specifying additional conditions for merging
detail(reduce(A, min.frac=0.8))
## 'sample' to draw a sample subset of reads and their intervals
detail(sample(A, 10))
## biological example
exDir <- system.file("extdata", package="girafe")
exA <- readAligned(dirPath=exDir, type="Bowtie",
pattern="aravinSRNA_23_no_adapter_excerpt_mm9_unmasked.bwtmap")
exAI <- as(exA, "AlignedGenomeIntervals")
organism(exAI) <- "Mm"
show(exAI)
## which chromosomes are the intervals on?
table(chromosome(exAI))
## subset
exAI[is.element(chromosome(exAI), c("chr1","chr2"))]
## compute coverage per chromosome:
coverage(exAI[is.element(chromosome(exAI), c("chr1","chr2"))])
### plotting:
load(file.path(exDir, "mgi_gi.RData"))
if (interactive())
plot(exAI, mgi.gi, chr="chrX", start=50400000, end=50410000)
### overlap with annotated genome elements:
exOv <- interval_overlap(exAI, mgi.gi)
## how many elements do read match positions generally overlap:
table(listLen(exOv))
## what are the 13 elements overlapped by a single match position:
mgi.gi[exOv[[which.max(listLen(exOv))]]]
## what kinds of elements are overlapped
(tabOv <- table(as.character(mgi.gi$type)[unlist(exOv)]))
### display those classes:
my.cols <- rainbow(length(tabOv))
if (interactive())
pie(tabOv, col=my.cols, radius=0.85)
Results
R version 3.3.1 (2016-06-21) -- "Bug in Your Hair"
Copyright (C) 2016 The R Foundation for Statistical Computing
Platform: x86_64-pc-linux-gnu (64-bit)
R is free software and comes with ABSOLUTELY NO WARRANTY.
You are welcome to redistribute it under certain conditions.
Type 'license()' or 'licence()' for distribution details.
R is a collaborative project with many contributors.
Type 'contributors()' for more information and
'citation()' on how to cite R or R packages in publications.
Type 'demo()' for some demos, 'help()' for on-line help, or
'help.start()' for an HTML browser interface to help.
Type 'q()' to quit R.
> library(girafe)
Loading required package: BiocGenerics
Loading required package: parallel
Attaching package: 'BiocGenerics'
The following objects are masked from 'package:parallel':
clusterApply, clusterApplyLB, clusterCall, clusterEvalQ,
clusterExport, clusterMap, parApply, parCapply, parLapply,
parLapplyLB, parRapply, parSapply, parSapplyLB
The following objects are masked from 'package:stats':
IQR, mad, xtabs
The following objects are masked from 'package:base':
Filter, Find, Map, Position, Reduce, anyDuplicated, append,
as.data.frame, cbind, colnames, do.call, duplicated, eval, evalq,
get, grep, grepl, intersect, is.unsorted, lapply, lengths, mapply,
match, mget, order, paste, pmax, pmax.int, pmin, pmin.int, rank,
rbind, rownames, sapply, setdiff, sort, table, tapply, union,
unique, unsplit
Loading required package: S4Vectors
Loading required package: stats4
Attaching package: 'S4Vectors'
The following objects are masked from 'package:base':
colMeans, colSums, expand.grid, rowMeans, rowSums
Loading required package: Rsamtools
Loading required package: GenomeInfoDb
Loading required package: IRanges
Loading required package: GenomicRanges
Loading required package: Biostrings
Loading required package: XVector
Loading required package: intervals
Attaching package: 'intervals'
The following object is masked from 'package:Biostrings':
type
The following object is masked from 'package:GenomicRanges':
reduce
The following object is masked from 'package:IRanges':
reduce
The following object is masked from 'package:S4Vectors':
expand
Loading required package: ShortRead
Loading required package: BiocParallel
Loading required package: GenomicAlignments
Loading required package: SummarizedExperiment
Loading required package: Biobase
Welcome to Bioconductor
Vignettes contain introductory material; view with
'browseVignettes()'. To cite Bioconductor, see
'citation("Biobase")', and for packages 'citation("pkgname")'.
Loading required package: genomeIntervals
Loading required package: grid
No methods found in "IRanges" for requests: sort
> png(filename="/home/ddbj/snapshot/RGM3/R_BC/result/girafe/AlignedGenomeIntervals-class.Rd_%03d_medium.png", width=480, height=480)
> ### Name: AlignedGenomeIntervals-class
> ### Title: Class 'AlignedGenomeIntervals'
> ### Aliases: AlignedGenomeIntervals-class AlignedGenomeIntervals
> ### [,AlignedGenomeIntervals,ANY,ANY-method c.AlignedGenomeIntervals
> ### c,AlignedGenomeIntervals-method chrlengths
> ### chrlengths,AlignedGenomeIntervals-method chrlengths<-
> ### chrlengths<-,AlignedGenomeIntervals,numeric-method
> ### clusters,AlignedGenomeIntervals-method
> ### clusters,Genome_intervals-method
> ### coerce,AlignedRead,AlignedGenomeIntervals-method
> ### coerce,AlignedGenomeIntervals,RangedData-method
> ### coverage,AlignedGenomeIntervals-method
> ### detail,AlignedGenomeIntervals-method export
> ### export,AlignedGenomeIntervals,character,character-method
> ### export,Genome_intervals,character,ANY-method extend
> ### extend,AlignedGenomeIntervals-method
> ### extend,Genome_intervals_stranded-method
> ### extend,Genome_intervals-method id,AlignedGenomeIntervals-method id<-
> ### id<-,AlignedGenomeIntervals,character-method
> ### hist,AlignedGenomeIntervals-method
> ### interval_included,AlignedGenomeIntervals,Genome_intervals_stranded-method
> ### interval_included,Genome_intervals_stranded,AlignedGenomeIntervals-method
> ### interval_included,AlignedGenomeIntervals,AlignedGenomeIntervals-method
> ### interval_overlap,AlignedGenomeIntervals,Genome_intervals-method
> ### interval_overlap,Genome_intervals,AlignedGenomeIntervals-method
> ### interval_overlap,AlignedGenomeIntervals,Genome_intervals_stranded-method
> ### interval_overlap,Genome_intervals_stranded,AlignedGenomeIntervals-method
> ### interval_overlap,AlignedGenomeIntervals,AlignedGenomeIntervals-method
> ### matches matches,AlignedGenomeIntervals-method matches<-
> ### matches<-,AlignedGenomeIntervals,integer-method
> ### nchar,AlignedGenomeIntervals-method organism
> ### organism,AlignedGenomeIntervals-method organism<-
> ### organism<-,AlignedGenomeIntervals-method
> ### organism<-,AlignedGenomeIntervals,character-method
> ### plot,AlignedGenomeIntervals-method
> ### plot,AlignedGenomeIntervals,ANY-method
> ### plot,AlignedGenomeIntervals,missing-method
> ### plot,AlignedGenomeIntervals,Genome_intervals_stranded-method reads
> ### reads,AlignedGenomeIntervals-method reads<-
> ### reads<-,AlignedGenomeIntervals,character-method
> ### reduce,AlignedGenomeIntervals-method reduce,Genome_intervals-method
> ### sample,AlignedGenomeIntervals-method
> ### score,AlignedGenomeIntervals-method
> ### score<-,AlignedGenomeIntervals,numeric-method
> ### score<-,AlignedGenomeIntervals-method score<- seqnames
> ### seqnames,AlignedGenomeIntervals-method
> ### show,AlignedGenomeIntervals-method sort,AlignedGenomeIntervals-method
> ### strand,AlignedGenomeIntervals-method
> ### strand<-,AlignedGenomeIntervals,vector-method
> ### strand<-,AlignedGenomeIntervals,factor-method subset
> ### subset,AlignedGenomeIntervals-method summary
> ### summary,AlignedGenomeIntervals-method
> ### width,AlignedGenomeIntervals-method
> ### chromosome,AlignedGenomeIntervals-method
> ### chromosome,Genome_intervals-method
> ### Keywords: classes
>
> ### ** Examples
>
> ############# toy example:
> A <- new("AlignedGenomeIntervals",
+ .Data=cbind(c(1,3,4,5,8,10), c(5,5,6,8,9,11)),
+ annotation=data.frame(
+ seq_name=factor(rep(c("chr1","chr2","chr3"), each=2)),
+ strand=factor(c("-","-","+","+","+","+") ,levels=c("-","+")),
+ inter_base=rep(FALSE, 6)),
+ reads=rep(3L, 6), matches=rep(1L,6),
+ sequence=c("ACATT","ACA","CGT","GTAA","AG","CT"))
>
> show(A)
6 genome intervals with 18 aligned reads on 3 chromosome(s).
> detail(A)
start end seq_name strand reads matches sequence
1 1 5 chr1 - 3 1 ACATT
2 3 5 chr1 - 3 1 ACA
3 4 6 chr2 + 3 1 CGT
4 5 8 chr2 + 3 1 GTAA
5 8 9 chr3 + 3 1 AG
6 10 11 chr3 + 3 1 CT
>
> ## alternative initiation of this object:
> A <- AlignedGenomeIntervals(
+ start=c(1,3,4,5,8,10), end=c(5,5,6,8,9,11),
+ chromosome=rep(c("chr2","chrX","chr1"), each=2),
+ strand=c("-","-","+","+","+","+"),
+ sequence=c("ACATT","ACA","CGT","GGAA","AG","CT"),
+ reads=c(1L, 5L, 2L, 7L, 3L, 3L))
> detail(A)
start end seq_name strand reads matches sequence
1 1 5 chr2 - 1 1 ACATT
2 3 5 chr2 - 5 1 ACA
3 4 6 chrX + 2 1 CGT
4 5 8 chrX + 7 1 GGAA
5 8 9 chr1 + 3 1 AG
6 10 11 chr1 + 3 1 CT
>
> ## custom identifiers can be assigned to the intervals
> id(A) <- paste("gi", 1:6, sep="")
>
> ## subsetting and combining
> detail(A[c(1:4)])
start end seq_name strand reads matches sequence id
1 1 5 chr2 - 1 1 ACATT gi1
2 3 5 chr2 - 5 1 ACA gi2
3 4 6 chrX + 2 1 CGT gi3
4 5 8 chrX + 7 1 GGAA gi4
> detail(c(A[1], A[4]))
[[1]]
1 genome intervals with 7 aligned reads on 3 chromosome(s).
>
> ## sorting: always useful
> A <- sort(A)
> detail(A)
start end seq_name strand reads matches sequence id
1 8 9 chr1 + 3 1 AG gi5
2 10 11 chr1 + 3 1 CT gi6
3 1 5 chr2 - 1 1 ACATT gi1
4 3 5 chr2 - 5 1 ACA gi2
5 4 6 chrX + 2 1 CGT gi3
6 5 8 chrX + 7 1 GGAA gi4
>
> ## the 'reduce' method provides a cleaned-up, compact set
> detail(reduce(A))
start end seq_name strand reads matches sequence id
1 8 11 chr1 + 6 1 AGCT gi5,gi6
2 1 5 chr2 - 6 1 ACATT gi1,gi2
3 4 8 chrX + 9 1 CGGAA gi3,gi4
> ## with arguments specifying additional conditions for merging
> detail(reduce(A, min.frac=0.8))
start end seq_name strand reads matches sequence id
1 8 9 chr1 + 3 1 AG gi5
2 10 11 chr1 + 3 1 CT gi6
3 1 5 chr2 - 6 1 ACATT gi1,gi2
4 4 6 chrX + 2 1 CGT gi3
5 5 8 chrX + 7 1 GGAA gi4
>
> ## 'sample' to draw a sample subset of reads and their intervals
> detail(sample(A, 10))
start end seq_name strand reads matches sequence id
1 8 9 chr1 + 2 1 AG gi5
2 10 11 chr1 + 2 1 CT gi6
3 3 5 chr2 - 4 1 ACA gi2
4 5 8 chrX + 2 1 GGAA gi4
>
> ## biological example
> exDir <- system.file("extdata", package="girafe")
> exA <- readAligned(dirPath=exDir, type="Bowtie",
+ pattern="aravinSRNA_23_no_adapter_excerpt_mm9_unmasked.bwtmap")
> exAI <- as(exA, "AlignedGenomeIntervals")
> organism(exAI) <- "Mm"
Loading required package: org.Mm.eg.db
Loading required package: AnnotationDbi
> show(exAI)
1,286 genome intervals with 1,689 aligned reads on 22 chromosome(s).
Organism: Mm
> ## which chromosomes are the intervals on?
> table(chromosome(exAI))
chr1 chr10 chr11 chr12 chr13 chr14 chr15 chr16 chr17 chr18 chr19 chr2 chr3
112 50 60 53 65 74 59 48 47 43 19 87 62
chr4 chr5 chr6 chr7 chr8 chr9 chrMT chrX chrY
57 52 82 51 57 69 5 132 2
>
> ## subset
> exAI[is.element(chromosome(exAI), c("chr1","chr2"))]
199 genome intervals with 245 aligned reads on 22 chromosome(s).
Organism: Mm
>
> ## compute coverage per chromosome:
> coverage(exAI[is.element(chromosome(exAI), c("chr1","chr2"))])
RleList of length 2
$chr1
integer-Rle of length 195471971 with 197 runs
Lengths: 5632955 23 481869 23 ... 4394657 23 12859
Values : 0 1 0 1 ... 0 1 0
$chr2
integer-Rle of length 182113224 with 143 runs
Lengths: 3772451 23 982533 23 ... 473 23 8020148
Values : 0 2 0 1 ... 0 1 0
>
> ### plotting:
> load(file.path(exDir, "mgi_gi.RData"))
> # if (interactive())
> plot(exAI, mgi.gi, chr="chrX", start=50400000, end=50410000)
>
> ### overlap with annotated genome elements:
> exOv <- interval_overlap(exAI, mgi.gi)
> ## how many elements do read match positions generally overlap:
> table(listLen(exOv))
0 1 2 12
815 340 130 1
> ## what are the 13 elements overlapped by a single match position:
> mgi.gi[exOv[[which.max(listLen(exOv))]]]
Object of class Genome_intervals_stranded
12 base intervals and 0 inter-base intervals(*):
chr18 + [37089939, 37347311]
chr18 + [37098972, 37347311]
chr18 + [37105861, 37347311]
chr18 + [37112343, 37347311]
... 4 other intervals...
chr18 + [37157534, 37347311]
chr18 + [37164974, 37347311]
chr18 + [37170512, 37347311]
chr18 + [37179884, 37347311]
annotation:
seq_name inter_base source type
30726 chr18 FALSE MGI gene
30727 chr18 FALSE MGI gene
30728 chr18 FALSE MGI gene
30729 chr18 FALSE MGI gene
gffAttributes ID strand
30726 ACC="MGI:2150982"; Name="protocadherin alpha 1" Pcdha1 +
30727 ACC="MGI:2681880"; Name="protocadherin alpha 2" Pcdha2 +
30728 ACC="MGI:2447313"; Name="protocadherin alpha 3" Pcdha3 +
30729 ACC="MGI:1298406"; Name="protocadherin alpha 4" Pcdha4 +
... 4 other intervals...
seq_name inter_base source type
30734 chr18 FALSE MGI gene
30735 chr18 FALSE MGI gene
30736 chr18 FALSE MGI gene
30738 chr18 FALSE MGI gene
gffAttributes ID strand
30734 ACC="MGI:2447322"; Name="protocadherin alpha 9" Pcdha9 +
30735 ACC="MGI:1298408"; Name="protocadherin alpha 10" Pcdha10 +
30736 ACC="MGI:1298372"; Name="protocadherin alpha 11" Pcdha11 +
30738 ACC="MGI:1298370"; Name="protocadherin alpha 12" Pcdha12 +
> ## what kinds of elements are overlapped
> (tabOv <- table(as.character(mgi.gi$type)[unlist(exOv)]))
gene lincRNA miRNA ncRNA pseudogene rRNA snoRNA
238 15 297 19 28 10 1
tRNA
4
> ### display those classes:
> my.cols <- rainbow(length(tabOv))
> # if (interactive())
> pie(tabOv, col=my.cols, radius=0.85)
>
>
>
>
>
> dev.off()
null device
1
>