Last data update: 2014.03.03

Seqinfo-class {GenomeInfoDb}    R Documentation

Seqinfo objects

Description

A Seqinfo object is a table-like object that contains basic information about a set of genomic sequences. The table has 1 row per sequence and 1 column per sequence attribute. Currently the only attributes are the length, circularity flag, and genome provenance (e.g. hg19) of the sequence, but more attributes might be added in the future as the need arises.

Details

Seqinfo objects are typically not used directly but as part of higher-level objects. Those higher-level objects generally provide a seqinfo accessor for getting and setting their Seqinfo component.

Constructor

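Seqinfo(seqnames, seqlengths=NA, isCircular=NA, genome=NA): Creates a Seqinfo object. When seqnames is omitted and genome is the name of a supported UCSC genome (e.g. "hg38"), the sequence information is obtained with fetchExtendedChromInfoFromUCSC(), which requires internet access.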
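The checks a constructor like this has to make can be sketched in base R. The helper make_seqinfo_table() below is hypothetical and is not part of GenomeInfoDb: it models the table as a plain data frame with one row per sequence, enforces the constraints documented for seqnames(x) (non-NA, non-empty and unique), and lets genome have length 1 as noted in the examples. Recycling a single NA for seqlengths and isCircular mirrors the NA defaults and is an assumption about how the real constructor behaves.

```r
## Hypothetical helper, NOT part of GenomeInfoDb: builds a plain data frame
## with one row per sequence and checks the constraints documented for
## Seqinfo(). Recycling a single NA is an assumption of this sketch.
make_seqinfo_table <- function(seqnames, seqlengths = NA, isCircular = NA,
                               genome = NA) {
  ## The seqnames are the keys: non-NA, non-empty and unique.
  stopifnot(is.character(seqnames), !anyNA(seqnames),
            all(nzchar(seqnames)), !anyDuplicated(seqnames))
  n <- length(seqnames)
  ## A single NA (the default) is recycled; otherwise one value per sequence.
  per_sequence <- function(v, arg) {
    if (length(v) == 1L && is.na(v)) return(rep(v, n))
    if (length(v) != n) stop("'", arg, "' must have length ", n)
    unname(v)
  }
  ## 'genome' may have length 1 (any value), whatever the other lengths are.
  if (length(genome) == 1L) genome <- rep(genome, n)
  if (length(genome) != n) stop("'genome' must have length 1 or ", n)
  data.frame(seqlengths = as.integer(per_sequence(seqlengths, "seqlengths")),
             isCircular = as.logical(per_sequence(isCircular, "isCircular")),
             genome = as.character(unname(genome)),
             row.names = seqnames, stringsAsFactors = FALSE)
}

## The toy object from the examples, as a data frame:
t1 <- make_seqinfo_table(c("chr1", "chr2", "chr3", "chrM"),
                         seqlengths = c(100, 200, NA, 15),
                         isCircular = c(NA, FALSE, FALSE, TRUE),
                         genome = "toy")
t1
```

Calling it with duplicated seqnames, or with a seqlengths vector whose length differs from length(seqnames), stops with an error.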

Accessor methods

In the code snippets below, x is a Seqinfo object.

length(x): Return the number of sequences in x.

seqnames(x), seqnames(x) <- value: Get/set the names of the sequences in x. Those names must be non-NA, non-empty and unique. They are also called the sequence levels or the keys of the Seqinfo object.

Note that, in general, the end-user should not try to alter the sequence levels with seqnames(x) <- value. The recommended way to do this is with seqlevels(x) <- value as described below.

names(x), names(x) <- value: Same as seqnames(x) and seqnames(x) <- value.

seqlevels(x): Same as seqnames(x).

seqlevels(x) <- value: Can be used to rename, drop, add and/or reorder the sequence levels. value must be either a named or unnamed character vector. When value has names, the names only serve the purpose of mapping the new sequence levels to the old ones. Otherwise (i.e. when value is unnamed) this mapping is implicitly inferred from the following rules:

(1) If the new and old levels are equal in number, the positional mapping between them shows that some or all of the levels are being renamed, and every renamed level is replaced by a name that does not already exist among the old levels, then seqlevels(x) <- value simply renames the sequence levels. In that case the result is the same as with seqnames(x) <- value, but seqlevels(x) <- value is still recommended because it is safer.

(2) Otherwise (i.e. if the conditions for (1) are not satisfied), seqlevels(x) <- value treats the sequence levels as not being renamed and simply performs x <- x[value].

See below for some examples.
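The inference rules (1) and (2) can be sketched as a small base R function. infer_seqlevels_mapping() is a hypothetical helper, not the GenomeInfoDb implementation: for each new level it returns the old level whose row it takes, or NA when the level is being added. Treating a name that is not an old level as NA is an assumption of this sketch.

```r
## Hypothetical sketch of the mapping rules for 'seqlevels(x) <- value';
## NOT the GenomeInfoDb implementation. Returns, for each new level, the old
## level whose row it takes, or NA for a level that is being added.
infer_seqlevels_mapping <- function(old, new) {
  stopifnot(is.character(old), is.character(new))
  if (!is.null(names(new))) {
    ## Named 'value': the names point at the old levels. Empty names (and,
    ## as an assumption, names that are not old levels) mean "new level".
    from <- names(new)
    from[from == "" | !(from %in% old)] <- NA_character_
    return(setNames(from, unname(new)))
  }
  if (length(new) == length(old)) {
    renamed <- new != old
    ## Rule (1): positional renaming to names that did not exist before.
    if (any(renamed) && !any(new[renamed] %in% old))
      return(setNames(old, new))
  }
  ## Rule (2): plain subsetting, i.e. x <- x[value].
  setNames(ifelse(new %in% old, new, NA_character_), new)
}

old <- c("chr1", "chr2", "chr3", "chrM")
infer_seqlevels_mapping(old, sub("chr", "ch", old))  # rule (1): rename
infer_seqlevels_mapping(old, rev(old))               # rule (2): reorder
infer_seqlevels_mapping(old, c(chr1 = "1", "22"))    # named: rename/drop/add
```

The first call renames positionally because none of the new names exist yet; the second only reorders because every new level is already an old level; the third maps 1 to the old chr1 row, drops the other levels, and adds 22 with no source row.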

seqlengths(x), seqlengths(x) <- value: Get/set the length for each sequence in x.

isCircular(x), isCircular(x) <- value: Get/set the circularity flag for each sequence in x.

genome(x), genome(x) <- value: Get/set the genome identifier or assembly name for each sequence in x.

Subsetting

In the code snippets below, x is a Seqinfo object.

x[i]: A Seqinfo object can only be subset by name, i.e. i must be a character vector. This is a convenient way to drop, add, or reorder the rows (aka the sequence levels) of a Seqinfo object. Names in i that are not sequence levels of x produce rows whose attributes are all NA.

See below for some examples.
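To illustrate what subsetting by name does, the sketch below models the toy object x from the examples as a data frame keyed by its row names. Both tbl and subset_by_name() are hypothetical and are not GenomeInfoDb code; rejecting duplicated names is an assumption made here to keep the keys unique.

```r
## Hypothetical model, NOT GenomeInfoDb: the toy Seqinfo object 'x' from the
## examples as a data frame whose row names are the seqnames (the keys).
tbl <- data.frame(seqlengths = c(100L, 200L, NA, 15L),
                  isCircular = c(NA, FALSE, FALSE, TRUE),
                  genome = "toy",
                  row.names = c("chr1", "chr2", "chr3", "chrM"),
                  stringsAsFactors = FALSE)

## Subset by name only. Unknown names become all-NA rows. Rejecting duplicated
## names is an assumption of this sketch (the keys must stay unique).
subset_by_name <- function(tbl, i) {
  stopifnot(is.character(i), !anyNA(i), !anyDuplicated(i))
  out <- tbl[match(i, rownames(tbl)), , drop = FALSE]  # NA index -> NA row
  rownames(out) <- i
  out
}

subset_by_name(tbl, c("chrY", "chr3", "chr1"))
```

The call reproduces the shape of x[c("chrY", "chr3", "chr1")] in the examples: chrY is not a sequence level of x, so its row is all NA, while chr3 and chr1 keep their values in the requested order.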

Coercion

In the code snippets below, x is a Seqinfo object.

as.data.frame(x): Turns x into a data frame.

Combining Seqinfo objects

There are no c or rbind methods for Seqinfo objects. Both would be expected to simply append the rows of y to the rows of x, resulting in an object of length length(x) + length(y). But whenever x and y share sequence names, that would break the constraint that the seqnames of a Seqinfo object must be unique keys.

So instead, a merge method is provided.
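The uniqueness argument can be made concrete with the seqnames of the x and y objects from the examples: appending repeats the shared keys, while the union of the keys, which merge() uses, keeps each one once.

```r
## Seqnames of the 'x' and 'y' objects used in the examples.
keys_x <- c("chr1", "chr2", "chr3", "chrM")
keys_y <- c("chr3", "chr4", "chrM")

## What appending the rows (as c() or rbind() would) does to the keys:
appended <- c(keys_x, keys_y)
anyDuplicated(appended)  # 5: the second "chr3" is the first repeated key

## The keys merge() uses instead: each sequence appears exactly once.
union(keys_x, keys_y)    # "chr1" "chr2" "chr3" "chrM" "chr4"
```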

In the code snippet below, x and y are Seqinfo objects.

merge(x, y): Merge x and y into a single Seqinfo object whose keys (aka the seqnames) are union(seqnames(x), seqnames(y)). If a row in y has the same key as a row in x, and the 2 rows contain compatible information (NA values are compatible with anything), they are merged into a single row in the result. If they cannot be merged (because they contain different seqlengths, circularity flags, and/or genome identifiers), an error is raised. In addition to checking for incompatible sequence information, merge(x, y) compares seqnames(x) with seqnames(y) and issues a warning if each of them contains names not in the other. The purpose of these checks is to detect situations where the user might be combining or comparing objects based on different reference genomes.
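The merge rules can be sketched in base R for objects modelled as lists of three named vectors (seqlengths, isCircular, genome) keyed by seqnames. merge_named() and merge_seqinfo_like() are hypothetical helpers, not GenomeInfoDb's internals, and their warning and error messages are paraphrases. The data are the x and y objects from the examples.

```r
## Hypothetical sketch, NOT the GenomeInfoDb implementation. Merges one
## attribute given as named vectors keyed by seqnames.
merge_named <- function(x, y, what) {
  keys <- union(names(x), names(y))
  a <- unname(x[keys])  # NA where the key is absent from 'x'
  b <- unname(y[keys])  # NA where the key is absent from 'y'
  ## NA is compatible with anything; two different non-NA values are not.
  clash <- !is.na(a) & !is.na(b) & a != b
  if (any(clash))
    stop("sequences ", paste(keys[clash], collapse = ", "),
         " have incompatible ", what)
  setNames(ifelse(is.na(a), b, a), keys)
}

## A Seqinfo-like object is a list of 3 named vectors with the same names.
merge_seqinfo_like <- function(x, y) {
  kx <- names(x$seqlengths)
  ky <- names(y$seqlengths)
  ## Warn only when EACH side has sequence levels the other lacks.
  if (length(setdiff(kx, ky)) > 0 && length(setdiff(ky, kx)) > 0)
    warning("each of the 2 combined objects has sequence levels not in the other")
  list(seqlengths = merge_named(x$seqlengths, y$seqlengths, "lengths"),
       isCircular = merge_named(x$isCircular, y$isCircular, "circularity flags"),
       genome     = merge_named(x$genome, y$genome, "genomes"))
}

x <- list(seqlengths = c(chr1 = 100, chr2 = 200, chr3 = NA, chrM = 15),
          isCircular = c(chr1 = NA, chr2 = FALSE, chr3 = FALSE, chrM = TRUE),
          genome = c(chr1 = "toy", chr2 = "toy", chr3 = "toy", chrM = "toy"))
y <- list(seqlengths = c(chr3 = 300, chr4 = NA, chrM = 15),
          isCircular = c(chr3 = NA, chr4 = NA, chrM = NA),
          genome = c(chr3 = NA_character_, chr4 = NA_character_,
                     chrM = NA_character_))
z <- merge_seqinfo_like(x, y)  # warns: chr1, chr2 only in 'x'; chr4 only in 'y'
z$seqlengths
```

Setting the circularity flags of chr3 and chrM in y to c(TRUE, FALSE), as the examples do, makes merge_named() stop with a message naming chr3 and chrM, which matches the error shown in the Results.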

intersect(x, y): Finds the intersection between two Seqinfo objects by merging them and subsetting the result to the intersection of their sequence names. This makes it easy to avoid warnings about the objects not being subsets of each other during overlap operations.
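A minimal sketch of intersect(x, y), reduced to the seqlengths attribute and written as a hypothetical base R helper (not the GenomeInfoDb method): only the shared seqnames are kept, so the "not in the other" warning that merge() can issue has nothing to report, while incompatible values still raise an error.

```r
## Hypothetical sketch, NOT the GenomeInfoDb implementation, reduced to a
## single attribute: 'x' and 'y' are named seqlengths vectors.
intersect_seqlengths <- function(x, y) {
  keys <- intersect(names(x), names(y))  # shared seqnames, in the order of 'x'
  a <- unname(x[keys])
  b <- unname(y[keys])
  ## Same compatibility rule as merge(): NA is compatible with anything.
  if (any(!is.na(a) & !is.na(b) & a != b))
    stop("incompatible sequence lengths")
  setNames(ifelse(is.na(a), b, a), keys)
}

intersect_seqlengths(c(chr1 = 100, chr2 = 200, chr3 = NA, chrM = 15),
                     c(chr3 = 300, chr4 = NA, chrM = 15))
## named vector: chr3 = 300, chrM = 15
```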

Author(s)

H. Pages

See Also

  • seqinfo

  • The fetchExtendedChromInfoFromUCSC utility function, which is used behind the scenes to make a Seqinfo object for a supported genome (see examples below).

Examples

## ---------------------------------------------------------------------
## A. MAKING A Seqinfo OBJECT FOR A SUPPORTED GENOME
## ---------------------------------------------------------------------

if (interactive()) {
  ## This uses fetchExtendedChromInfoFromUCSC() behind the scenes and
  ## thus requires internet access. See ?fetchExtendedChromInfoFromUCSC
  ## for the list of UCSC genomes that are currently supported.
  Seqinfo(genome="hg38")
  Seqinfo(genome="bosTau8")
  Seqinfo(genome="canFam3")
  Seqinfo(genome="musFur1")
  Seqinfo(genome="mm10")
  Seqinfo(genome="rn6")
  Seqinfo(genome="galGal4")
  Seqinfo(genome="dm6")
  Seqinfo(genome="sacCer3")
}

## ---------------------------------------------------------------------
## B. BASIC MANIPULATION OF A Seqinfo OBJECT
## ---------------------------------------------------------------------

## Note that all the arguments (except 'genome') must have the
## same length. 'genome' can be of length 1, whatever the lengths
## of the other arguments are.
x <- Seqinfo(seqnames=c("chr1", "chr2", "chr3", "chrM"),
             seqlengths=c(100, 200, NA, 15),
             isCircular=c(NA, FALSE, FALSE, TRUE),
             genome="toy")
x

## Accessors:
length(x)
seqnames(x)
names(x)
seqlevels(x)
seqlengths(x)
isCircular(x)
genome(x)

## Get a compact summary:
summary(x)

## Subset by names:
x[c("chrY", "chr3", "chr1")]

## Rename, drop, add and/or reorder the sequence levels:
xx <- x
seqlevels(xx) <- sub("chr", "ch", seqlevels(xx))  # rename
xx
seqlevels(xx) <- rev(seqlevels(xx))  # reorder
xx
seqlevels(xx) <- c("ch1", "ch2", "chY")  # drop/add/reorder
xx
seqlevels(xx) <- c(chY="Y", ch1="1", "22")  # rename/reorder/drop/add
xx

## ---------------------------------------------------------------------
## C. MERGING 2 Seqinfo OBJECTS
## ---------------------------------------------------------------------

y <- Seqinfo(seqnames=c("chr3", "chr4", "chrM"),
             seqlengths=c(300, NA, 15))
y

## This issues a warning:
merge(x, y)  # rows for chr3 and chrM are merged

## To get rid of the above warning, either use suppressWarnings() or
## set the genome on 'y':
suppressWarnings(merge(x, y))
genome(y) <- genome(x)
merge(x, y)

## Note that, strictly speaking, merging 2 Seqinfo objects is not
## a commutative operation, i.e., in general 'z1 <- merge(x, y)'
## is not identical to 'z2 <- merge(y, x)'. However 'z1' and 'z2'
## are guaranteed to contain the same information (i.e. the same
## rows, but typically not in the same order):
merge(y, x)

## This contradicts what 'x' says about circularity of chr3 and chrM:
isCircular(y)[c("chr3", "chrM")] <- c(TRUE, FALSE)
y
if (interactive()) {
  merge(x, y)  # raises an error
}

## Sanity checks:
stopifnot(identical(x, merge(x, Seqinfo())))
stopifnot(identical(x, merge(Seqinfo(), x)))
stopifnot(identical(x, merge(x, x)))

Results


R version 3.3.1 (2016-06-21) -- "Bug in Your Hair"
Copyright (C) 2016 The R Foundation for Statistical Computing
Platform: x86_64-pc-linux-gnu (64-bit)

> library(GenomeInfoDb)
Loading required package: stats4
Loading required package: BiocGenerics
Loading required package: parallel

Attaching package: 'BiocGenerics'

The following objects are masked from 'package:parallel':

    clusterApply, clusterApplyLB, clusterCall, clusterEvalQ,
    clusterExport, clusterMap, parApply, parCapply, parLapply,
    parLapplyLB, parRapply, parSapply, parSapplyLB

The following objects are masked from 'package:stats':

    IQR, mad, xtabs

The following objects are masked from 'package:base':

    Filter, Find, Map, Position, Reduce, anyDuplicated, append,
    as.data.frame, cbind, colnames, do.call, duplicated, eval, evalq,
    get, grep, grepl, intersect, is.unsorted, lapply, lengths, mapply,
    match, mget, order, paste, pmax, pmax.int, pmin, pmin.int, rank,
    rbind, rownames, sapply, setdiff, sort, table, tapply, union,
    unique, unsplit

Loading required package: S4Vectors

Attaching package: 'S4Vectors'

The following objects are masked from 'package:base':

    colMeans, colSums, expand.grid, rowMeans, rowSums

Loading required package: IRanges
> ### Name: Seqinfo-class
> ### Title: Seqinfo objects
> ### Aliases: class:Seqinfo Seqinfo-class Seqinfo Seqinfo
> ###   length,Seqinfo-method seqnames,Seqinfo-method
> ###   seqnames<-,Seqinfo-method names,Seqinfo-method names<-,Seqinfo-method
> ###   seqlevels,Seqinfo-method seqlevels<-,Seqinfo-method
> ###   seqlengths,Seqinfo-method seqlengths<-,Seqinfo-method
> ###   isCircular,Seqinfo-method isCircular<-,Seqinfo-method
> ###   genome,Seqinfo-method genome<-,Seqinfo-method [,Seqinfo-method
> ###   as.data.frame,Seqinfo-method coerce,data.frame,Seqinfo-method
> ###   coerce,DataFrame,Seqinfo-method summary.Seqinfo
> ###   summary,Seqinfo-method show,Seqinfo-method
> ###   merge,Seqinfo,missing-method merge,missing,Seqinfo-method
> ###   merge,Seqinfo,NULL-method merge,NULL,Seqinfo-method
> ###   merge,Seqinfo,Seqinfo-method intersect,Seqinfo,Seqinfo-method
> ### Keywords: methods classes
> 
> ### ** Examples
> 
> ## ---------------------------------------------------------------------
> ## A. MAKING A Seqinfo OBJECT FOR A SUPPORTED GENOME
> ## ---------------------------------------------------------------------
> 
> #if (interactive()) {
>   ## This uses fetchExtendedChromInfoFromUCSC() behind the scene and
>   ## thus requires internet access. See ?fetchExtendedChromInfoFromUCSC
>   ## for the list of UCSC genomes that are currently supported.
>   Seqinfo(genome="hg38")
Seqinfo object with 455 sequences (1 circular) from hg38 genome:
  seqnames         seqlengths isCircular genome
  chr1              248956422      FALSE   hg38
  chr2              242193529      FALSE   hg38
  chr3              198295559      FALSE   hg38
  chr4              190214555      FALSE   hg38
  chr5              181538259      FALSE   hg38
  ...                     ...        ...    ...
  chrUn_KI270753v1      62944      FALSE   hg38
  chrUn_KI270754v1      40191      FALSE   hg38
  chrUn_KI270755v1      36723      FALSE   hg38
  chrUn_KI270756v1      79590      FALSE   hg38
  chrUn_KI270757v1      71251      FALSE   hg38
>   Seqinfo(genome="bosTau8")
Seqinfo object with 3179 sequences (1 circular) from bosTau8 genome:
  seqnames         seqlengths isCircular  genome
  chr1              158337067      FALSE bosTau8
  chr2              137060424      FALSE bosTau8
  chr3              121430405      FALSE bosTau8
  chr4              120829699      FALSE bosTau8
  chr5              121191424      FALSE bosTau8
  ...                     ...        ...     ...
  chrUn_GJ060418v1        956      FALSE bosTau8
  chrUn_GJ060419v1       1016      FALSE bosTau8
  chrUn_GJ060420v1        934      FALSE bosTau8
  chrUn_GJ060421v1       1015      FALSE bosTau8
  chrUn_GJ060422v1        739      FALSE bosTau8
>   Seqinfo(genome="canFam3")
Seqinfo object with 3268 sequences (1 circular) from canFam3 genome:
  seqnames       seqlengths isCircular  genome
  chr1            122678785      FALSE canFam3
  chr2             85426708      FALSE canFam3
  chr3             91889043      FALSE canFam3
  chr4             88276631      FALSE canFam3
  chr5             88915250      FALSE canFam3
  ...                   ...        ...     ...
  chrUn_JH374189       8038      FALSE canFam3
  chrUn_JH374190       5797      FALSE canFam3
  chrUn_JH374191       6845      FALSE canFam3
  chrUn_JH374192       7721      FALSE canFam3
  chrUn_JH374193       5700      FALSE canFam3
>   Seqinfo(genome="musFur1")
Seqinfo object with 7741 sequences from musFur1 genome:
  seqnames     seqlengths isCircular  genome
  AEYP01107703      64955      FALSE musFur1
  AEYP01108159      42269      FALSE musFur1
  AEYP01108459      26290      FALSE musFur1
  AEYP01108526      24660      FALSE musFur1
  AEYP01108555      23825      FALSE musFur1
  ...                 ...        ...     ...
  GL898764           2765      FALSE musFur1
  GL898765           3999      FALSE musFur1
  GL898766           3993      FALSE musFur1
  GL898767           2677      FALSE musFur1
  GL898768           2649      FALSE musFur1
>   Seqinfo(genome="mm10")
Seqinfo object with 66 sequences (1 circular) from mm10 genome:
  seqnames       seqlengths isCircular genome
  chr1            195471971      FALSE   mm10
  chr2            182113224      FALSE   mm10
  chr3            160039680      FALSE   mm10
  chr4            156508116      FALSE   mm10
  chr5            151834684      FALSE   mm10
  ...                   ...        ...    ...
  chrUn_GL456392      23629      FALSE   mm10
  chrUn_GL456393      55711      FALSE   mm10
  chrUn_GL456394      24323      FALSE   mm10
  chrUn_GL456396      21240      FALSE   mm10
  chrUn_JH584304     114452      FALSE   mm10
>   Seqinfo(genome="rn6")
Seqinfo object with 953 sequences (1 circular) from rn6 genome:
  seqnames         seqlengths isCircular genome
  chr1              282763074      FALSE    rn6
  chr2              266435125      FALSE    rn6
  chr3              177699992      FALSE    rn6
  chr4              184226339      FALSE    rn6
  chr5              173707219      FALSE    rn6
  ...                     ...        ...    ...
  chrUn_KL568514v1      11876      FALSE    rn6
  chrUn_KL568515v1       2232      FALSE    rn6
  chrUn_KL568516v1       5687      FALSE    rn6
  chrUn_KL568517v1      13491      FALSE    rn6
  chrUn_KL568518v1       5301      FALSE    rn6
>   Seqinfo(genome="galGal4")
Seqinfo object with 15932 sequences (1 circular) from galGal4 genome:
  seqnames       seqlengths isCircular  genome
  chr1            195276750      FALSE galGal4
  chr2            148809762      FALSE galGal4
  chr3            110447801      FALSE galGal4
  chr4             90216835      FALSE galGal4
  chr5             59580361      FALSE galGal4
  ...                   ...        ...     ...
  chrUn_JH376409       7124      FALSE galGal4
  chrUn_JH376410      91309      FALSE galGal4
  chrUn_JH376411      51880      FALSE galGal4
  chrUn_JH376412     256162      FALSE galGal4
  chrUn_JH376413       7987      FALSE galGal4
>   Seqinfo(genome="dm6")
Seqinfo object with 1870 sequences (1 circular) from dm6 genome:
  seqnames         seqlengths isCircular genome
  chr2L              23513712      FALSE    dm6
  chr2R              25286936      FALSE    dm6
  chr3L              28110227      FALSE    dm6
  chr3R              32079331      FALSE    dm6
  chr4                1348131      FALSE    dm6
  ...                     ...        ...    ...
  chrUn_DS485998v1       1003      FALSE    dm6
  chrUn_DS486002v1       1001      FALSE    dm6
  chrUn_DS486004v1       1001      FALSE    dm6
  chrUn_DS486005v1       1001      FALSE    dm6
  chrUn_DS486008v1       1001      FALSE    dm6
>   Seqinfo(genome="sacCer3")
Seqinfo object with 17 sequences (1 circular) from sacCer3 genome:
  seqnames seqlengths isCircular  genome
  chrI         230218      FALSE sacCer3
  chrII        813184      FALSE sacCer3
  chrIII       316620      FALSE sacCer3
  chrIV       1531933      FALSE sacCer3
  chrV         576874      FALSE sacCer3
  ...             ...        ...     ...
  chrXIII      924431      FALSE sacCer3
  chrXIV       784333      FALSE sacCer3
  chrXV       1091291      FALSE sacCer3
  chrXVI       948066      FALSE sacCer3
  chrM          85779       TRUE sacCer3
> #}
> 
> ## ---------------------------------------------------------------------
> ## B. BASIC MANIPULATION OF A Seqinfo OBJECT
> ## ---------------------------------------------------------------------
> 
> ## Note that all the arguments (except 'genome') must have the
> ## same length. 'genome' can be of length 1, whatever the lengths
> ## of the other arguments are.
> x <- Seqinfo(seqnames=c("chr1", "chr2", "chr3", "chrM"),
+              seqlengths=c(100, 200, NA, 15),
+              isCircular=c(NA, FALSE, FALSE, TRUE),
+              genome="toy")
> x
Seqinfo object with 4 sequences (1 circular) from toy genome:
  seqnames seqlengths isCircular genome
  chr1            100         NA    toy
  chr2            200      FALSE    toy
  chr3             NA      FALSE    toy
  chrM             15       TRUE    toy
> 
> ## Accessors:
> length(x)
[1] 4
> seqnames(x)
[1] "chr1" "chr2" "chr3" "chrM"
> names(x)
[1] "chr1" "chr2" "chr3" "chrM"
> seqlevels(x)
[1] "chr1" "chr2" "chr3" "chrM"
> seqlengths(x)
chr1 chr2 chr3 chrM 
 100  200   NA   15 
> isCircular(x)
 chr1  chr2  chr3  chrM 
   NA FALSE FALSE  TRUE 
> genome(x)
 chr1  chr2  chr3  chrM 
"toy" "toy" "toy" "toy" 
> 
> ## Get a compact summary:
> summary(x)
[1] "4 sequences (1 circular) from toy genome"
> 
> ## Subset by names:
> x[c("chrY", "chr3", "chr1")]
Seqinfo object with 3 sequences from 2 genomes (NA, toy):
  seqnames seqlengths isCircular genome
  chrY             NA         NA   <NA>
  chr3             NA      FALSE    toy
  chr1            100         NA    toy
> 
> ## Rename, drop, add and/or reorder the sequence levels:
> xx <- x
> seqlevels(xx) <- sub("chr", "ch", seqlevels(xx))  # rename
> xx
Seqinfo object with 4 sequences (1 circular) from toy genome:
  seqnames seqlengths isCircular genome
  ch1             100         NA    toy
  ch2             200      FALSE    toy
  ch3              NA      FALSE    toy
  chM              15       TRUE    toy
> seqlevels(xx) <- rev(seqlevels(xx))  # reorder
> xx
Seqinfo object with 4 sequences (1 circular) from toy genome:
  seqnames seqlengths isCircular genome
  chM              15       TRUE    toy
  ch3              NA      FALSE    toy
  ch2             200      FALSE    toy
  ch1             100         NA    toy
> seqlevels(xx) <- c("ch1", "ch2", "chY")  # drop/add/reorder
> xx
Seqinfo object with 3 sequences from 2 genomes (toy, NA):
  seqnames seqlengths isCircular genome
  ch1             100         NA    toy
  ch2             200      FALSE    toy
  chY              NA         NA   <NA>
> seqlevels(xx) <- c(chY="Y", ch1="1", "22")  # rename/reorder/drop/add
> xx
Seqinfo object with 3 sequences from 2 genomes (NA, toy):
  seqnames seqlengths isCircular genome
  Y                NA         NA   <NA>
  1               100         NA    toy
  22               NA         NA   <NA>
> 
> ## ---------------------------------------------------------------------
> ## C. MERGING 2 Seqinfo OBJECTS
> ## ---------------------------------------------------------------------
> 
> y <- Seqinfo(seqnames=c("chr3", "chr4", "chrM"),
+              seqlengths=c(300, NA, 15))
> y
Seqinfo object with 3 sequences from an unspecified genome:
  seqnames seqlengths isCircular genome
  chr3            300         NA   <NA>
  chr4             NA         NA   <NA>
  chrM             15         NA   <NA>
> 
> ## This issues a warning:
> merge(x, y)  # rows for chr3 and chrM are merged
Seqinfo object with 5 sequences (1 circular) from 2 genomes (toy, NA):
  seqnames seqlengths isCircular genome
  chr1            100         NA    toy
  chr2            200      FALSE    toy
  chr3            300      FALSE    toy
  chrM             15       TRUE    toy
  chr4             NA         NA   <NA>
Warning message:
In .Seqinfo.mergexy(x, y) :
  Each of the 2 combined objects has sequence levels not in the other:
  - in 'x': chr1, chr2
  - in 'y': chr4
  Make sure to always combine/compare objects based on the same reference
  genome (use suppressWarnings() to suppress this warning).
> 
> ## To get rid of the above warning, either use suppressWarnings() or
> ## set the genome on 'y':
> suppressWarnings(merge(x, y))
Seqinfo object with 5 sequences (1 circular) from 2 genomes (toy, NA):
  seqnames seqlengths isCircular genome
  chr1            100         NA    toy
  chr2            200      FALSE    toy
  chr3            300      FALSE    toy
  chrM             15       TRUE    toy
  chr4             NA         NA   <NA>
> genome(y) <- genome(x)
> merge(x, y)
Seqinfo object with 5 sequences (1 circular) from toy genome:
  seqnames seqlengths isCircular genome
  chr1            100         NA    toy
  chr2            200      FALSE    toy
  chr3            300      FALSE    toy
  chrM             15       TRUE    toy
  chr4             NA         NA    toy
> 
> ## Note that, strictly speaking, merging 2 Seqinfo objects is not
> ## a commutative operation, i.e., in general 'z1 <- merge(x, y)'
> ## is not identical to 'z2 <- merge(y, x)'. However 'z1' and 'z2'
> ## are guaranteed to contain the same information (i.e. the same
> ## rows, but typically not in the same order):
> merge(y, x)
Seqinfo object with 5 sequences (1 circular) from toy genome:
  seqnames seqlengths isCircular genome
  chr3            300      FALSE    toy
  chr4             NA         NA    toy
  chrM             15       TRUE    toy
  chr1            100         NA    toy
  chr2            200      FALSE    toy
> 
> ## This contradicts what 'x' says about circularity of chr3 and chrM:
> isCircular(y)[c("chr3", "chrM")] <- c(TRUE, FALSE)
> y
Seqinfo object with 3 sequences (1 circular) from toy genome:
  seqnames seqlengths isCircular genome
  chr3            300       TRUE    toy
  chr4             NA         NA    toy
  chrM             15      FALSE    toy
> #if (interactive()) {
>   merge(x, y)  # raises an error
Error in mergeNamedAtomicVectors(isCircular(x), isCircular(y), what = c("sequence",  : 
  sequences chr3, chrM have incompatible circularity flags:
  - in 'x': FALSE, TRUE
  - in 'y': TRUE, FALSE
Calls: merge ... .Seqinfo.merge -> .Seqinfo.mergexy -> mergeNamedAtomicVectors
Execution halted