Last data update: 2014.03.03

R: Conveniently rename the seqlevels of an object according to a...
seqlevelsStyleR Documentation

Conveniently rename the seqlevels of an object according to a given style

Description

The seqlevelsStyle getter and setter can be used to get the current seqlevels style of an object and to rename its seqlevels according to a given style.

Usage

seqlevelsStyle(x)
seqlevelsStyle(x) <- value

## Related low-level utilities:
genomeStyles(species)
extractSeqlevels(species, style)
extractSeqlevelsByGroup(species, style, group)
mapSeqlevels(seqnames, style, best.only=TRUE, drop=TRUE)
seqlevelsInGroup(seqnames, group, species, style)

Arguments

x

The object from/on which to get/set the seqlevels style. x must have a seqlevels method or be a character vector.

value

A single character string that sets the seqlevels style for x.

species

The genus and species of the organism in question separated by a single space. Don't forget to capitalize the genus.

style

a character vector with a single element to specify the style.

group

Group can be 'auto' for autosomes, 'sex' for sex chromosomes/allosomes, 'circular' for circular chromosomes. The default is 'all' which returns all the chromosomes.

best.only

if TRUE (the default), then only the "best" sequence renaming maps (i.e. the rows with less NAs) are returned.

drop

if TRUE (the default), then a vector is returned instead of a matrix when the matrix has only 1 row.

seqnames

a character vector containing the labels attached to the chromosomes in a given genome for a given style. For example : For Homo sapiens, NCBI style - they are "1","2","3",...,"X","Y","MT"

Details

seqlevelsStyle(x), seqlevelsStyle(x) <- value: Get the current seqlevels style of an object, or rename its seqlevels according to the supplied style.

genomeStyles: Different organizations have different naming conventions for how they name the biologically defined sequence elements (usually chromosomes) for each organism they support. The Seqnames package contains a database that defines these different conventions.

genomeStyles() returns the list of all supported seqname mappings, one per supported organism. Each mapping is represented as a data frame with 1 column per seqname style and 1 row per chromosome name (not all chromosomes of a given organism necessarily belong to the mapping).

genomeStyles(species) returns a data.frame only for the given organism with all its supported seqname mappings.

extractSeqlevels: Returns a character vector of the seqnames for a single style and species.

extractSeqlevelsByGroup: Returns a character vector of the seqnames for a single style and species by group. Group can be 'auto' for autosomes, 'sex' for sex chromosomes/ allosomes, 'circular' for circular chromosomes. The default is 'all' which returns all the chromosomes.

mapSeqlevels: Returns a matrix with 1 column per supplied sequence name and 1 row per sequence renaming map compatible with the specified style. If best.only is TRUE (the default), only the "best" renaming maps (i.e. the rows with less NAs) are returned.

seqlevelsInGroup: It takes a character vector along with a group and optional style and species.If group is not specified , it returns "all" or standard/top level seqnames. Returns a character vector of seqnames after subsetting for the group specified by the user. See examples for more details.

Value

For seqlevelsStyle: A single string containing the style of the seqlevels in x, or a character vector containing the styles of the seqlevels in x if the current style cannot be determined unambiguously. Note that this information is not stored in x but inferred by looking up a seqlevels style database stored inside the GenomeInfoDb package.

For extractSeqlevels , extractSeqlevelsByGroup and seqlevelsInGroup: A character vector of seqlevels for given supported species and group.

For mapSeqlevels: A matrix with 1 column per supplied sequence name and 1 row per sequence renaming map compatible with the specified style.

For genomeStyle: If species is specified returns a data.frame containg the seqlevels style and its mapping for a given organism. If species is not specified, a list is returned with one list per species containing the seqlevels style with the corresponding mappings.

Author(s)

Sonali Arora, Martin Morgan, Marc Carlson, H. Pages

Examples

## ---------------------------------------------------------------------
## seqlevelsStyle() getter and setter
## ---------------------------------------------------------------------

## character vectors:
x <- paste0("chr", 1:5)
seqlevelsStyle(x)
seqlevelsStyle(x) <- "NCBI"
x

## GRanges:
library(GenomicRanges)
gr <- GRanges(rep(c("chr2", "chr3", "chrM"), 2), IRanges(1:6, 10))

seqlevelsStyle(gr)
seqlevelsStyle(gr) <- "NCBI"
gr

seqlevelsStyle(gr)
seqlevelsStyle(gr) <- "dbSNP"
gr

seqlevelsStyle(gr)
seqlevelsStyle(gr) <- "UCSC"
gr

## ---------------------------------------------------------------------
## Related low-level utilities
## ---------------------------------------------------------------------

## Genome styles:
names(genomeStyles())
genomeStyles("Homo_sapiens")
"UCSC" %in% names(genomeStyles("Homo_sapiens"))

## Extract seqlevels based on species, style and group:
## The 'group' argument can be 'sex', 'auto', 'circular' or 'all'.

## All:
extractSeqlevels(species="Drosophila_melanogaster", style="Ensembl")

## Sex chromosomes:
extractSeqlevelsByGroup(species="Homo_sapiens", style="UCSC", group="sex")

## Autosomes:
extractSeqlevelsByGroup(species="Homo_sapiens", style="UCSC", group="auto")


## Identify which seqnames belong to a particular 'group':
newchr <- paste0("chr",c(1:22,"X","Y","M","1_gl000192_random","4_ctg9"))
seqlevelsInGroup(newchr, group="sex")

newchr <- as.character(c(1:22,"X","Y","MT"))
seqlevelsInGroup(newchr, group="all","Homo_sapiens","NCBI")

## Identify which seqnames belong to a species and style:
seqnames <- c("chr1","chr9", "chr2", "chr3", "chr10")
all(seqnames %in% extractSeqlevels("Homo_sapiens", "UCSC"))

## Find mapped seqlevelsStyles for exsiting seqnames:
mapSeqlevels(c("chrII", "chrIII", "chrM"), "NCBI")
mapSeqlevels(c("chrII", "chrIII", "chrM"), "Ensembl")

Results


R version 3.3.1 (2016-06-21) -- "Bug in Your Hair"
Copyright (C) 2016 The R Foundation for Statistical Computing
Platform: x86_64-pc-linux-gnu (64-bit)

R is free software and comes with ABSOLUTELY NO WARRANTY.
You are welcome to redistribute it under certain conditions.
Type 'license()' or 'licence()' for distribution details.

R is a collaborative project with many contributors.
Type 'contributors()' for more information and
'citation()' on how to cite R or R packages in publications.

Type 'demo()' for some demos, 'help()' for on-line help, or
'help.start()' for an HTML browser interface to help.
Type 'q()' to quit R.

> library(GenomeInfoDb)
Loading required package: stats4
Loading required package: BiocGenerics
Loading required package: parallel

Attaching package: 'BiocGenerics'

The following objects are masked from 'package:parallel':

    clusterApply, clusterApplyLB, clusterCall, clusterEvalQ,
    clusterExport, clusterMap, parApply, parCapply, parLapply,
    parLapplyLB, parRapply, parSapply, parSapplyLB

The following objects are masked from 'package:stats':

    IQR, mad, xtabs

The following objects are masked from 'package:base':

    Filter, Find, Map, Position, Reduce, anyDuplicated, append,
    as.data.frame, cbind, colnames, do.call, duplicated, eval, evalq,
    get, grep, grepl, intersect, is.unsorted, lapply, lengths, mapply,
    match, mget, order, paste, pmax, pmax.int, pmin, pmin.int, rank,
    rbind, rownames, sapply, setdiff, sort, table, tapply, union,
    unique, unsplit

Loading required package: S4Vectors

Attaching package: 'S4Vectors'

The following objects are masked from 'package:base':

    colMeans, colSums, expand.grid, rowMeans, rowSums

Loading required package: IRanges
> png(filename="/home/ddbj/snapshot/RGM3/R_BC/result/GenomeInfoDb/seqlevelsStyle.Rd_%03d_medium.png", width=480, height=480)
> ### Name: seqlevelsStyle
> ### Title: Conveniently rename the seqlevels of an object according to a
> ###   given style
> ### Aliases: seqlevelsStyle seqlevelsStyle<-
> ###   seqlevelsStyle,character-method seqlevelsStyle,ANY-method
> ###   seqlevelsStyle<-,ANY-method seqlevelsStyle<-,character-method
> ###   genomeStyles extractSeqlevels extractSeqlevelsByGroup mapSeqlevels
> ###   seqlevelsInGroup
> 
> ### ** Examples
> 
> ## ---------------------------------------------------------------------
> ## seqlevelsStyle() getter and setter
> ## ---------------------------------------------------------------------
> 
> ## character vectors:
> x <- paste0("chr", 1:5)
> seqlevelsStyle(x)
[1] "UCSC"
> seqlevelsStyle(x) <- "NCBI"
> x
[1] "1" "2" "3" "4" "5"
> 
> ## GRanges:
> library(GenomicRanges)
> gr <- GRanges(rep(c("chr2", "chr3", "chrM"), 2), IRanges(1:6, 10))
> 
> seqlevelsStyle(gr)
[1] "UCSC"
> seqlevelsStyle(gr) <- "NCBI"
> gr
GRanges object with 6 ranges and 0 metadata columns:
      seqnames    ranges strand
         <Rle> <IRanges>  <Rle>
  [1]        2   [1, 10]      *
  [2]        3   [2, 10]      *
  [3]       MT   [3, 10]      *
  [4]        2   [4, 10]      *
  [5]        3   [5, 10]      *
  [6]       MT   [6, 10]      *
  -------
  seqinfo: 3 sequences from an unspecified genome; no seqlengths
> 
> seqlevelsStyle(gr)
[1] "NCBI"    "Ensembl"
> seqlevelsStyle(gr) <- "dbSNP"
> gr
GRanges object with 6 ranges and 0 metadata columns:
      seqnames    ranges strand
         <Rle> <IRanges>  <Rle>
  [1]      ch2   [1, 10]      *
  [2]      ch3   [2, 10]      *
  [3]     chMT   [3, 10]      *
  [4]      ch2   [4, 10]      *
  [5]      ch3   [5, 10]      *
  [6]     chMT   [6, 10]      *
  -------
  seqinfo: 3 sequences from an unspecified genome; no seqlengths
> 
> seqlevelsStyle(gr)
[1] "dbSNP"
> seqlevelsStyle(gr) <- "UCSC"
> gr
GRanges object with 6 ranges and 0 metadata columns:
      seqnames    ranges strand
         <Rle> <IRanges>  <Rle>
  [1]     chr2   [1, 10]      *
  [2]     chr3   [2, 10]      *
  [3]     chrM   [3, 10]      *
  [4]     chr2   [4, 10]      *
  [5]     chr3   [5, 10]      *
  [6]     chrM   [6, 10]      *
  -------
  seqinfo: 3 sequences from an unspecified genome; no seqlengths
> 
> ## ---------------------------------------------------------------------
> ## Related low-level utilities
> ## ---------------------------------------------------------------------
> 
> ## Genome styles:
> names(genomeStyles())
 [1] "Arabidopsis_thaliana"     "Caenorhabditis_elegans"  
 [3] "Canis_familiaris"         "Cyanidioschyzon_merolae" 
 [5] "Drosophila_melanogaster"  "Homo_sapiens"            
 [7] "Mus_musculus"             "Oryza_sativa"            
 [9] "Populus_trichocarpa"      "Rattus_norvegicus"       
[11] "Saccharomyces_cerevisiae" "Zea_mays"                
> genomeStyles("Homo_sapiens")
   circular  auto   sex NCBI  UCSC dbSNP Ensembl
1     FALSE  TRUE FALSE    1  chr1   ch1       1
2     FALSE  TRUE FALSE    2  chr2   ch2       2
3     FALSE  TRUE FALSE    3  chr3   ch3       3
4     FALSE  TRUE FALSE    4  chr4   ch4       4
5     FALSE  TRUE FALSE    5  chr5   ch5       5
6     FALSE  TRUE FALSE    6  chr6   ch6       6
7     FALSE  TRUE FALSE    7  chr7   ch7       7
8     FALSE  TRUE FALSE    8  chr8   ch8       8
9     FALSE  TRUE FALSE    9  chr9   ch9       9
10    FALSE  TRUE FALSE   10 chr10  ch10      10
11    FALSE  TRUE FALSE   11 chr11  ch11      11
12    FALSE  TRUE FALSE   12 chr12  ch12      12
13    FALSE  TRUE FALSE   13 chr13  ch13      13
14    FALSE  TRUE FALSE   14 chr14  ch14      14
15    FALSE  TRUE FALSE   15 chr15  ch15      15
16    FALSE  TRUE FALSE   16 chr16  ch16      16
17    FALSE  TRUE FALSE   17 chr17  ch17      17
18    FALSE  TRUE FALSE   18 chr18  ch18      18
19    FALSE  TRUE FALSE   19 chr19  ch19      19
20    FALSE  TRUE FALSE   20 chr20  ch20      20
21    FALSE  TRUE FALSE   21 chr21  ch21      21
22    FALSE  TRUE FALSE   22 chr22  ch22      22
23    FALSE FALSE  TRUE    X  chrX   chX       X
24    FALSE FALSE  TRUE    Y  chrY   chY       Y
25     TRUE FALSE FALSE   MT  chrM  chMT      MT
> "UCSC" %in% names(genomeStyles("Homo_sapiens"))
[1] TRUE
> 
> ## Extract seqlevels based on species, style and group:
> ## The 'group' argument can be 'sex', 'auto', 'circular' or 'all'.
> 
> ## All:
> extractSeqlevels(species="Drosophila_melanogaster", style="Ensembl")
 [1] "2L"                        "2R"                       
 [3] "3L"                        "3R"                       
 [5] "4"                         "X"                        
 [7] "dmel_mitochondrion_genome" "2LHet"                    
 [9] "2RHet"                     "3LHet"                    
[11] "3RHet"                     "XHet"                     
[13] "YHet"                      "U"                        
[15] "Uextra"                   
> 
> ## Sex chromosomes:
> extractSeqlevelsByGroup(species="Homo_sapiens", style="UCSC", group="sex")
[1] "chrX" "chrY"
> 
> ## Autosomes:
> extractSeqlevelsByGroup(species="Homo_sapiens", style="UCSC", group="auto")
 [1] "chr1"  "chr2"  "chr3"  "chr4"  "chr5"  "chr6"  "chr7"  "chr8"  "chr9" 
[10] "chr10" "chr11" "chr12" "chr13" "chr14" "chr15" "chr16" "chr17" "chr18"
[19] "chr19" "chr20" "chr21" "chr22"
> 
> 
> ## Identify which seqnames belong to a particular 'group':
> newchr <- paste0("chr",c(1:22,"X","Y","M","1_gl000192_random","4_ctg9"))
> seqlevelsInGroup(newchr, group="sex")
[1] "chrX" "chrY"
> 
> newchr <- as.character(c(1:22,"X","Y","MT"))
> seqlevelsInGroup(newchr, group="all","Homo_sapiens","NCBI")
 [1] "1"  "2"  "3"  "4"  "5"  "6"  "7"  "8"  "9"  "10" "11" "12" "13" "14" "15"
[16] "16" "17" "18" "19" "20" "21" "22" "X"  "Y"  "MT"
> 
> ## Identify which seqnames belong to a species and style:
> seqnames <- c("chr1","chr9", "chr2", "chr3", "chr10")
> all(seqnames %in% extractSeqlevels("Homo_sapiens", "UCSC"))
[1] TRUE
> 
> ## Find mapped seqlevelsStyles for exsiting seqnames:
> mapSeqlevels(c("chrII", "chrIII", "chrM"), "NCBI")
 chrII chrIII   chrM 
  "II"  "III"   "MT" 
> mapSeqlevels(c("chrII", "chrIII", "chrM"), "Ensembl")
     chrII chrIII chrM   
[1,] "II"  "III"  "MtDNA"
[2,] "II"  "III"  "Mito" 
> 
> 
> 
> 
> 
> 
> dev.off()
null device 
          1 
>