A Seqinfo object is a table-like object that contains basic information
about a set of genomic sequences. The table has 1 row per sequence and
1 column per sequence attribute. Currently the only attributes are the
length, circularity flag, and genome provenance (e.g. hg19) of the
sequence, but more attributes might be added in the future as the need
arises.
Details
Typically, Seqinfo objects are not used directly but are part of
higher-level objects. Those higher-level objects will generally
provide a seqinfo accessor for getting/setting their
Seqinfo component.
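As an illustration of the accessor pattern described above, the following sketch attaches a Seqinfo object to a GRanges object and retrieves it again with seqinfo(). It assumes the GenomicRanges package is installed; GRanges is not discussed on this page and is used here only as one example of a higher-level object, and the sequence names and lengths are made up.

```r
library(GenomeInfoDb)
library(GenomicRanges)  # assumption: provides GRanges, a higher-level object

## Toy sequence information (made-up names and lengths).
si <- Seqinfo(seqnames=c("chr1", "chr2"),
              seqlengths=c(1000, 2000),
              genome="toy")

## Build a GRanges object that carries 'si' as its Seqinfo component.
gr <- GRanges(seqnames="chr1", ranges=IRanges(start=10, end=20), seqinfo=si)

## The seqinfo() accessor returns the Seqinfo component.
gr_si <- seqinfo(gr)
gr_si

## The individual accessors also work directly on the higher-level object.
seqlengths(gr)
genome(gr)
```

Passing the Seqinfo object to the constructor keeps the sequence levels of the ranges and of the sequence information in sync from the start.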
Constructor
Seqinfo(seqnames, seqlengths=NA, isCircular=NA, genome=NA):
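Creates a Seqinfo object. All the arguments except genome must have
the same length; genome can be of length 1, whatever the lengths of the
other arguments are. Calling the constructor with only the genome
argument, e.g. Seqinfo(genome="hg38"), makes a Seqinfo object for a
supported UCSC genome and requires internet access.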
Accessor methods
In the code snippets below, x is a Seqinfo object.
length(x):
Return the number of sequences in x.
seqnames(x), seqnames(x) <- value:
Get/set the names of the sequences in x.
Those names must be non-NA, non-empty and unique.
They are also called the sequence levels or the keys
of the Seqinfo object.
Note that, in general, the end-user should not try to alter the
sequence levels with seqnames(x) <- value. The recommended way
to do this is with seqlevels(x) <- value as described below.
names(x), names(x) <- value:
Same as seqnames(x) and seqnames(x) <- value.
seqlevels(x):
Same as seqnames(x).
seqlevels(x) <- value:
Can be used to rename, drop, add and/or reorder the sequence levels.
value must be either a named or unnamed character vector.
When value has names, the names only serve the purpose of
mapping the new sequence levels to the old ones.
Otherwise (i.e. when value is unnamed) this mapping is
implicitly inferred from the following rules:
(1) If the numbers of new and old levels are the same, and if the
positional mapping between the new and old levels shows that
some or all of the levels are being renamed, and if the levels
that are being renamed are renamed with levels that didn't exist
before (i.e. are not present in the old levels), then
seqlevels(x) <- value will just rename the sequence levels.
Note that in that case the result is the same as with
seqnames(x) <- value but it's still recommended to use
seqlevels(x) <- value as it is safer.
(2) Otherwise (i.e. if the conditions for (1) are not satisfied)
seqlevels(x) <- value will consider that the sequence
levels are not being renamed and will just perform
x <- x[value].
See below for some examples.
seqlengths(x), seqlengths(x) <- value:
Get/set the length for each sequence in x.
isCircular(x), isCircular(x) <- value:
Get/set the circularity flag for each sequence in x.
genome(x), genome(x) <- value:
Get/set the genome identifier or assembly name for each sequence
in x.
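The replacement rules for seqlevels(x) <- value described above can be sketched on a toy object as follows. The sequence names and lengths are made up for illustration, and the comments state what rules (1) and (2) predict for each case.

```r
library(GenomeInfoDb)

## Toy object with made-up names and lengths.
x <- Seqinfo(seqnames=c("chr1", "chr2", "chr3"),
             seqlengths=c(100, 200, 300))

## Rule (1): same number of levels, and the only changed level ("chrX")
## is not among the old levels, so chr3 is renamed and keeps its length.
x1 <- x
seqlevels(x1) <- c("chr1", "chr2", "chrX")
seqlengths(x1)

## Rule (2): same number of levels, but the changed positions reuse old
## levels, so this is treated as x[value], i.e. a reordering.
x2 <- x
seqlevels(x2) <- c("chr3", "chr2", "chr1")
seqlengths(x2)

## Rule (2) again: the number of levels differs, so this is x[value];
## chr2 and chr3 are dropped and chrX is added as a new row of NAs.
x3 <- x
seqlevels(x3) <- c("chr1", "chrX")
seqlengths(x3)

## Named 'value': the names give the old levels explicitly, so chr3 is
## renamed to chrX even though the number of levels changes.
x4 <- x
seqlevels(x4) <- c(chr3="chrX", chr1="chr1")
seqlengths(x4)
```

Naming value is the least ambiguous form when renaming and dropping levels in the same call, since it does not rely on the positional inference.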
Subsetting
In the code snippets below, x is a Seqinfo object.
x[i]:
A Seqinfo object can be subsetted only by name, i.e. i
must be a character vector.
This is a convenient way to drop/add/reorder the rows (aka the
sequence levels) of a Seqinfo object. Names in i that are not
sequence levels of x are added as new rows filled with NA values.
See below for some examples.
Coercion
In the code snippets below, x is a Seqinfo object.
as.data.frame(x):
Turns x into a data frame.
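A short sketch of the coercion, using a made-up toy object. It assumes that the resulting data frame has one row per sequence and a column named isCircular, mirroring the accessor of the same name.

```r
library(GenomeInfoDb)

## Toy object with made-up names and lengths.
x <- Seqinfo(seqnames=c("chr1", "chr2", "chrM"),
             seqlengths=c(100, 200, 15),
             isCircular=c(FALSE, FALSE, TRUE),
             genome="toy")

df <- as.data.frame(x)
df

## One row per sequence.
nrow(df) == length(x)

## Ordinary data frame operations then apply, e.g. keep circular sequences
## (%in% is used so that NA flags are treated as "not circular").
df[df$isCircular %in% TRUE, ]
```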
Combining Seqinfo objects
There is no c or rbind method for Seqinfo objects.
Both would be expected to just append the rows in y to the rows
in x resulting in an object of length length(x) + length(y).
But that would tend to break the constraint that the seqnames of a Seqinfo
object must be unique keys.
So instead, a merge method is provided.
In the code snippet below, x and y are Seqinfo objects.
merge(x, y):
Merge x and y into a single Seqinfo object where the
keys (aka the seqnames) are union(seqnames(x), seqnames(y)).
If a row in y has the same key as a row in x, and if
the 2 rows contain compatible information (NA values are compatible
with anything), then they are merged into a single row in the result.
If they cannot be merged (because they contain different seqlengths,
and/or circularity flags, and/or genome identifiers), then an error
is raised.
In addition to checking for incompatible sequence information,
merge(x, y) also compares seqnames(x) with
seqnames(y) and issues a warning if each of them has names not
in the other. The purpose of these checks is to try to detect situations
where the user might be combining or comparing objects based on
different reference genomes.
intersect(x, y):
Finds the intersection between two Seqinfo objects by merging them
and subsetting for the intersection of their sequence names. This makes
it easy to avoid warnings about the objects not being subsets of each
other during overlap operations.
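A minimal sketch of intersect() on two toy objects with made-up names and lengths. The call is wrapped in suppressWarnings() purely as a precaution, since this page does not say whether the underlying merge step can still warn.

```r
library(GenomeInfoDb)

## Two toy objects that share chr3 and chrM.
x <- Seqinfo(seqnames=c("chr1", "chr2", "chr3", "chrM"),
             seqlengths=c(100, 200, NA, 15),
             genome="toy")
y <- Seqinfo(seqnames=c("chr3", "chr4", "chrM"),
             seqlengths=c(300, NA, 15),
             genome="toy")

## Keep only the sequences present in both objects.
z <- suppressWarnings(intersect(x, y))
z

## NA values are compatible with anything, so the NA length of chr3 in 'x'
## is filled in from 'y'.
seqlengths(z)
```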
Author(s)
H. Pagès
See Also
seqinfo
The fetchExtendedChromInfoFromUCSC utility
function that is used behind the scenes to make a Seqinfo
object for a supported genome (see examples below).
Examples
## ---------------------------------------------------------------------
## A. MAKING A Seqinfo OBJECT FOR A SUPPORTED GENOME
## ---------------------------------------------------------------------
if (interactive()) {
## This uses fetchExtendedChromInfoFromUCSC() behind the scenes and
## thus requires internet access. See ?fetchExtendedChromInfoFromUCSC
## for the list of UCSC genomes that are currently supported.
Seqinfo(genome="hg38")
Seqinfo(genome="bosTau8")
Seqinfo(genome="canFam3")
Seqinfo(genome="musFur1")
Seqinfo(genome="mm10")
Seqinfo(genome="rn6")
Seqinfo(genome="galGal4")
Seqinfo(genome="dm6")
Seqinfo(genome="sacCer3")
}
## ---------------------------------------------------------------------
## B. BASIC MANIPULATION OF A Seqinfo OBJECT
## ---------------------------------------------------------------------
## Note that all the arguments (except 'genome') must have the
## same length. 'genome' can be of length 1, whatever the lengths
## of the other arguments are.
x <- Seqinfo(seqnames=c("chr1", "chr2", "chr3", "chrM"),
seqlengths=c(100, 200, NA, 15),
isCircular=c(NA, FALSE, FALSE, TRUE),
genome="toy")
x
## Accessors:
length(x)
seqnames(x)
names(x)
seqlevels(x)
seqlengths(x)
isCircular(x)
genome(x)
## Get a compact summary:
summary(x)
## Subset by names:
x[c("chrY", "chr3", "chr1")]
## Rename, drop, add and/or reorder the sequence levels:
xx <- x
seqlevels(xx) <- sub("chr", "ch", seqlevels(xx)) # rename
xx
seqlevels(xx) <- rev(seqlevels(xx)) # reorder
xx
seqlevels(xx) <- c("ch1", "ch2", "chY") # drop/add/reorder
xx
seqlevels(xx) <- c(chY="Y", ch1="1", "22") # rename/reorder/drop/add
xx
## ---------------------------------------------------------------------
## C. MERGING 2 Seqinfo OBJECTS
## ---------------------------------------------------------------------
y <- Seqinfo(seqnames=c("chr3", "chr4", "chrM"),
seqlengths=c(300, NA, 15))
y
## This issues a warning:
merge(x, y) # rows for chr3 and chrM are merged
## To get rid of the above warning, either use suppressWarnings() or
## set the genome on 'y':
suppressWarnings(merge(x, y))
genome(y) <- genome(x)
merge(x, y)
## Note that, strictly speaking, merging 2 Seqinfo objects is not
## a commutative operation, i.e., in general 'z1 <- merge(x, y)'
## is not identical to 'z2 <- merge(y, x)'. However 'z1' and 'z2'
## are guaranteed to contain the same information (i.e. the same
## rows, but typically not in the same order):
merge(y, x)
## This contradicts what 'x' says about circularity of chr3 and chrM:
isCircular(y)[c("chr3", "chrM")] <- c(TRUE, FALSE)
y
if (interactive()) {
merge(x, y) # raises an error
}
## Sanity checks:
stopifnot(identical(x, merge(x, Seqinfo())))
stopifnot(identical(x, merge(Seqinfo(), x)))
stopifnot(identical(x, merge(x, x)))
Results
R version 3.3.1 (2016-06-21) -- "Bug in Your Hair"
Platform: x86_64-pc-linux-gnu (64-bit)
> library(GenomeInfoDb)
Loading required package: stats4
Loading required package: BiocGenerics
Loading required package: parallel
Attaching package: 'BiocGenerics'
The following objects are masked from 'package:parallel':
clusterApply, clusterApplyLB, clusterCall, clusterEvalQ,
clusterExport, clusterMap, parApply, parCapply, parLapply,
parLapplyLB, parRapply, parSapply, parSapplyLB
The following objects are masked from 'package:stats':
IQR, mad, xtabs
The following objects are masked from 'package:base':
Filter, Find, Map, Position, Reduce, anyDuplicated, append,
as.data.frame, cbind, colnames, do.call, duplicated, eval, evalq,
get, grep, grepl, intersect, is.unsorted, lapply, lengths, mapply,
match, mget, order, paste, pmax, pmax.int, pmin, pmin.int, rank,
rbind, rownames, sapply, setdiff, sort, table, tapply, union,
unique, unsplit
Loading required package: S4Vectors
Attaching package: 'S4Vectors'
The following objects are masked from 'package:base':
colMeans, colSums, expand.grid, rowMeans, rowSums
Loading required package: IRanges
> ### Name: Seqinfo-class
> ### Title: Seqinfo objects
> ### Aliases: class:Seqinfo Seqinfo-class Seqinfo Seqinfo
> ### length,Seqinfo-method seqnames,Seqinfo-method
> ### seqnames<-,Seqinfo-method names,Seqinfo-method names<-,Seqinfo-method
> ### seqlevels,Seqinfo-method seqlevels<-,Seqinfo-method
> ### seqlengths,Seqinfo-method seqlengths<-,Seqinfo-method
> ### isCircular,Seqinfo-method isCircular<-,Seqinfo-method
> ### genome,Seqinfo-method genome<-,Seqinfo-method [,Seqinfo-method
> ### as.data.frame,Seqinfo-method coerce,data.frame,Seqinfo-method
> ### coerce,DataFrame,Seqinfo-method summary.Seqinfo
> ### summary,Seqinfo-method show,Seqinfo-method
> ### merge,Seqinfo,missing-method merge,missing,Seqinfo-method
> ### merge,Seqinfo,NULL-method merge,NULL,Seqinfo-method
> ### merge,Seqinfo,Seqinfo-method intersect,Seqinfo,Seqinfo-method
> ### Keywords: methods classes
>
> ### ** Examples
>
> ## ---------------------------------------------------------------------
> ## A. MAKING A Seqinfo OBJECT FOR A SUPPORTED GENOME
> ## ---------------------------------------------------------------------
>
> #if (interactive()) {
> ## This uses fetchExtendedChromInfoFromUCSC() behind the scenes and
> ## thus requires internet access. See ?fetchExtendedChromInfoFromUCSC
> ## for the list of UCSC genomes that are currently supported.
> Seqinfo(genome="hg38")
Seqinfo object with 455 sequences (1 circular) from hg38 genome:
seqnames seqlengths isCircular genome
chr1 248956422 FALSE hg38
chr2 242193529 FALSE hg38
chr3 198295559 FALSE hg38
chr4 190214555 FALSE hg38
chr5 181538259 FALSE hg38
... ... ... ...
chrUn_KI270753v1 62944 FALSE hg38
chrUn_KI270754v1 40191 FALSE hg38
chrUn_KI270755v1 36723 FALSE hg38
chrUn_KI270756v1 79590 FALSE hg38
chrUn_KI270757v1 71251 FALSE hg38
> Seqinfo(genome="bosTau8")
Seqinfo object with 3179 sequences (1 circular) from bosTau8 genome:
seqnames seqlengths isCircular genome
chr1 158337067 FALSE bosTau8
chr2 137060424 FALSE bosTau8
chr3 121430405 FALSE bosTau8
chr4 120829699 FALSE bosTau8
chr5 121191424 FALSE bosTau8
... ... ... ...
chrUn_GJ060418v1 956 FALSE bosTau8
chrUn_GJ060419v1 1016 FALSE bosTau8
chrUn_GJ060420v1 934 FALSE bosTau8
chrUn_GJ060421v1 1015 FALSE bosTau8
chrUn_GJ060422v1 739 FALSE bosTau8
> Seqinfo(genome="canFam3")
Seqinfo object with 3268 sequences (1 circular) from canFam3 genome:
seqnames seqlengths isCircular genome
chr1 122678785 FALSE canFam3
chr2 85426708 FALSE canFam3
chr3 91889043 FALSE canFam3
chr4 88276631 FALSE canFam3
chr5 88915250 FALSE canFam3
... ... ... ...
chrUn_JH374189 8038 FALSE canFam3
chrUn_JH374190 5797 FALSE canFam3
chrUn_JH374191 6845 FALSE canFam3
chrUn_JH374192 7721 FALSE canFam3
chrUn_JH374193 5700 FALSE canFam3
> Seqinfo(genome="musFur1")
Seqinfo object with 7741 sequences from musFur1 genome:
seqnames seqlengths isCircular genome
AEYP01107703 64955 FALSE musFur1
AEYP01108159 42269 FALSE musFur1
AEYP01108459 26290 FALSE musFur1
AEYP01108526 24660 FALSE musFur1
AEYP01108555 23825 FALSE musFur1
... ... ... ...
GL898764 2765 FALSE musFur1
GL898765 3999 FALSE musFur1
GL898766 3993 FALSE musFur1
GL898767 2677 FALSE musFur1
GL898768 2649 FALSE musFur1
> Seqinfo(genome="mm10")
Seqinfo object with 66 sequences (1 circular) from mm10 genome:
seqnames seqlengths isCircular genome
chr1 195471971 FALSE mm10
chr2 182113224 FALSE mm10
chr3 160039680 FALSE mm10
chr4 156508116 FALSE mm10
chr5 151834684 FALSE mm10
... ... ... ...
chrUn_GL456392 23629 FALSE mm10
chrUn_GL456393 55711 FALSE mm10
chrUn_GL456394 24323 FALSE mm10
chrUn_GL456396 21240 FALSE mm10
chrUn_JH584304 114452 FALSE mm10
> Seqinfo(genome="rn6")
Seqinfo object with 953 sequences (1 circular) from rn6 genome:
seqnames seqlengths isCircular genome
chr1 282763074 FALSE rn6
chr2 266435125 FALSE rn6
chr3 177699992 FALSE rn6
chr4 184226339 FALSE rn6
chr5 173707219 FALSE rn6
... ... ... ...
chrUn_KL568514v1 11876 FALSE rn6
chrUn_KL568515v1 2232 FALSE rn6
chrUn_KL568516v1 5687 FALSE rn6
chrUn_KL568517v1 13491 FALSE rn6
chrUn_KL568518v1 5301 FALSE rn6
> Seqinfo(genome="galGal4")
Seqinfo object with 15932 sequences (1 circular) from galGal4 genome:
seqnames seqlengths isCircular genome
chr1 195276750 FALSE galGal4
chr2 148809762 FALSE galGal4
chr3 110447801 FALSE galGal4
chr4 90216835 FALSE galGal4
chr5 59580361 FALSE galGal4
... ... ... ...
chrUn_JH376409 7124 FALSE galGal4
chrUn_JH376410 91309 FALSE galGal4
chrUn_JH376411 51880 FALSE galGal4
chrUn_JH376412 256162 FALSE galGal4
chrUn_JH376413 7987 FALSE galGal4
> Seqinfo(genome="dm6")
Seqinfo object with 1870 sequences (1 circular) from dm6 genome:
seqnames seqlengths isCircular genome
chr2L 23513712 FALSE dm6
chr2R 25286936 FALSE dm6
chr3L 28110227 FALSE dm6
chr3R 32079331 FALSE dm6
chr4 1348131 FALSE dm6
... ... ... ...
chrUn_DS485998v1 1003 FALSE dm6
chrUn_DS486002v1 1001 FALSE dm6
chrUn_DS486004v1 1001 FALSE dm6
chrUn_DS486005v1 1001 FALSE dm6
chrUn_DS486008v1 1001 FALSE dm6
> Seqinfo(genome="sacCer3")
Seqinfo object with 17 sequences (1 circular) from sacCer3 genome:
seqnames seqlengths isCircular genome
chrI 230218 FALSE sacCer3
chrII 813184 FALSE sacCer3
chrIII 316620 FALSE sacCer3
chrIV 1531933 FALSE sacCer3
chrV 576874 FALSE sacCer3
... ... ... ...
chrXIII 924431 FALSE sacCer3
chrXIV 784333 FALSE sacCer3
chrXV 1091291 FALSE sacCer3
chrXVI 948066 FALSE sacCer3
chrM 85779 TRUE sacCer3
> #}
>
> ## ---------------------------------------------------------------------
> ## B. BASIC MANIPULATION OF A Seqinfo OBJECT
> ## ---------------------------------------------------------------------
>
> ## Note that all the arguments (except 'genome') must have the
> ## same length. 'genome' can be of length 1, whatever the lengths
> ## of the other arguments are.
> x <- Seqinfo(seqnames=c("chr1", "chr2", "chr3", "chrM"),
+ seqlengths=c(100, 200, NA, 15),
+ isCircular=c(NA, FALSE, FALSE, TRUE),
+ genome="toy")
> x
Seqinfo object with 4 sequences (1 circular) from toy genome:
seqnames seqlengths isCircular genome
chr1 100 NA toy
chr2 200 FALSE toy
chr3 NA FALSE toy
chrM 15 TRUE toy
>
> ## Accessors:
> length(x)
[1] 4
> seqnames(x)
[1] "chr1" "chr2" "chr3" "chrM"
> names(x)
[1] "chr1" "chr2" "chr3" "chrM"
> seqlevels(x)
[1] "chr1" "chr2" "chr3" "chrM"
> seqlengths(x)
chr1 chr2 chr3 chrM
100 200 NA 15
> isCircular(x)
chr1 chr2 chr3 chrM
NA FALSE FALSE TRUE
> genome(x)
chr1 chr2 chr3 chrM
"toy" "toy" "toy" "toy"
>
> ## Get a compact summary:
> summary(x)
[1] "4 sequences (1 circular) from toy genome"
>
> ## Subset by names:
> x[c("chrY", "chr3", "chr1")]
Seqinfo object with 3 sequences from 2 genomes (NA, toy):
seqnames seqlengths isCircular genome
chrY NA NA <NA>
chr3 NA FALSE toy
chr1 100 NA toy
>
> ## Rename, drop, add and/or reorder the sequence levels:
> xx <- x
> seqlevels(xx) <- sub("chr", "ch", seqlevels(xx)) # rename
> xx
Seqinfo object with 4 sequences (1 circular) from toy genome:
seqnames seqlengths isCircular genome
ch1 100 NA toy
ch2 200 FALSE toy
ch3 NA FALSE toy
chM 15 TRUE toy
> seqlevels(xx) <- rev(seqlevels(xx)) # reorder
> xx
Seqinfo object with 4 sequences (1 circular) from toy genome:
seqnames seqlengths isCircular genome
chM 15 TRUE toy
ch3 NA FALSE toy
ch2 200 FALSE toy
ch1 100 NA toy
> seqlevels(xx) <- c("ch1", "ch2", "chY") # drop/add/reorder
> xx
Seqinfo object with 3 sequences from 2 genomes (toy, NA):
seqnames seqlengths isCircular genome
ch1 100 NA toy
ch2 200 FALSE toy
chY NA NA <NA>
> seqlevels(xx) <- c(chY="Y", ch1="1", "22") # rename/reorder/drop/add
> xx
Seqinfo object with 3 sequences from 2 genomes (NA, toy):
seqnames seqlengths isCircular genome
Y NA NA <NA>
1 100 NA toy
22 NA NA <NA>
>
> ## ---------------------------------------------------------------------
> ## C. MERGING 2 Seqinfo OBJECTS
> ## ---------------------------------------------------------------------
>
> y <- Seqinfo(seqnames=c("chr3", "chr4", "chrM"),
+ seqlengths=c(300, NA, 15))
> y
Seqinfo object with 3 sequences from an unspecified genome:
seqnames seqlengths isCircular genome
chr3 300 NA <NA>
chr4 NA NA <NA>
chrM 15 NA <NA>
>
> ## This issues a warning:
> merge(x, y) # rows for chr3 and chrM are merged
Seqinfo object with 5 sequences (1 circular) from 2 genomes (toy, NA):
seqnames seqlengths isCircular genome
chr1 100 NA toy
chr2 200 FALSE toy
chr3 300 FALSE toy
chrM 15 TRUE toy
chr4 NA NA <NA>
Warning message:
In .Seqinfo.mergexy(x, y) :
Each of the 2 combined objects has sequence levels not in the other:
- in 'x': chr1, chr2
- in 'y': chr4
Make sure to always combine/compare objects based on the same reference
genome (use suppressWarnings() to suppress this warning).
>
> ## To get rid of the above warning, either use suppressWarnings() or
> ## set the genome on 'y':
> suppressWarnings(merge(x, y))
Seqinfo object with 5 sequences (1 circular) from 2 genomes (toy, NA):
seqnames seqlengths isCircular genome
chr1 100 NA toy
chr2 200 FALSE toy
chr3 300 FALSE toy
chrM 15 TRUE toy
chr4 NA NA <NA>
> genome(y) <- genome(x)
> merge(x, y)
Seqinfo object with 5 sequences (1 circular) from toy genome:
seqnames seqlengths isCircular genome
chr1 100 NA toy
chr2 200 FALSE toy
chr3 300 FALSE toy
chrM 15 TRUE toy
chr4 NA NA toy
>
> ## Note that, strictly speaking, merging 2 Seqinfo objects is not
> ## a commutative operation, i.e., in general 'z1 <- merge(x, y)'
> ## is not identical to 'z2 <- merge(y, x)'. However 'z1' and 'z2'
> ## are guaranteed to contain the same information (i.e. the same
> ## rows, but typically not in the same order):
> merge(y, x)
Seqinfo object with 5 sequences (1 circular) from toy genome:
seqnames seqlengths isCircular genome
chr3 300 FALSE toy
chr4 NA NA toy
chrM 15 TRUE toy
chr1 100 NA toy
chr2 200 FALSE toy
>
> ## This contradicts what 'x' says about circularity of chr3 and chrM:
> isCircular(y)[c("chr3", "chrM")] <- c(TRUE, FALSE)
> y
Seqinfo object with 3 sequences (1 circular) from toy genome:
seqnames seqlengths isCircular genome
chr3 300 TRUE toy
chr4 NA NA toy
chrM 15 FALSE toy
> #if (interactive()) {
> merge(x, y) # raises an error
Error in mergeNamedAtomicVectors(isCircular(x), isCircular(y), what = c("sequence", :
sequences chr3, chrM have incompatible circularity flags:
- in 'x': FALSE, TRUE
- in 'y': TRUE, FALSE
Calls: merge ... .Seqinfo.merge -> .Seqinfo.mergexy -> mergeNamedAtomicVectors
Execution halted