R: Conveniently rename the seqlevels of an object according to a...
seqlevelsStyle
R Documentation
Conveniently rename the seqlevels of an object according to a given style
Description
The seqlevelsStyle getter and setter can be used to get the current
seqlevels style of an object and to rename its seqlevels according to a given
style.
The object from/on which to get/set the seqlevels style. x must
have a seqlevels method or be a character vector.
value
A single character string that sets the seqlevels style for x.
species
The genus and species of the organism in question
separated by a single space. Don't forget to capitalize the genus.
style
a character vector with a single element to specify the style.
group
Group can be 'auto' for autosomes, 'sex' for
sex chromosomes/allosomes, 'circular' for circular chromosomes. The
default is 'all' which returns all the chromosomes.
best.only
if TRUE (the default), then only the "best"
sequence renaming maps (i.e. the rows with less NAs) are returned.
drop
if TRUE (the default), then a vector is returned
instead of a matrix when the matrix has only 1 row.
seqnames
a character vector containing the labels attached to the
chromosomes in a given genome for a given style. For example : For
Homo sapiens, NCBI style - they are
"1","2","3",...,"X","Y","MT"
Details
seqlevelsStyle(x), seqlevelsStyle(x) <- value:
Get the current seqlevels style of an object, or rename its seqlevels
according to the supplied style.
genomeStyles:
Different organizations have different naming conventions for how they
name the biologically defined sequence elements (usually chromosomes)
for each organism they support. The Seqnames package contains a
database that defines these different conventions.
genomeStyles() returns the list of all supported seqname mappings,
one per supported organism. Each mapping is represented as a data frame
with 1 column per seqname style and 1 row per chromosome name
(not all chromosomes of a given organism necessarily belong to the mapping).
genomeStyles(species) returns a data.frame only for the given organism
with all its supported seqname mappings.
extractSeqlevels:
Returns a character vector of the seqnames for a single style and species.
extractSeqlevelsByGroup:
Returns a character vector of the seqnames for a single style and species
by group. Group can be 'auto' for autosomes, 'sex' for sex chromosomes/
allosomes, 'circular' for circular chromosomes. The default is 'all' which
returns all the chromosomes.
mapSeqlevels:
Returns a matrix with 1 column per supplied sequence name and 1 row
per sequence renaming map compatible with the specified style.
If best.only is TRUE (the default), only the "best"
renaming maps (i.e. the rows with less NAs) are returned.
seqlevelsInGroup:
It takes a character vector along with a group and optional style and
species.If group is not specified , it returns "all" or standard/top level
seqnames.
Returns a character vector of seqnames after subsetting for the group
specified by the user. See examples for more details.
Value
For seqlevelsStyle: A single string containing the style of the
seqlevels in x, or a character vector containing the styles of the
seqlevels in x if the current style cannot be determined
unambiguously. Note that this information is not stored in x but
inferred by looking up a seqlevels style database stored inside the
GenomeInfoDb package.
For extractSeqlevels , extractSeqlevelsByGroup and
seqlevelsInGroup: A character vector of seqlevels
for given supported species and group.
For mapSeqlevels: A matrix with 1 column per supplied sequence
name and 1 row per sequence renaming map compatible with the specified style.
For genomeStyle: If species is specified returns a data.frame
containg the seqlevels style and its mapping for a given organism. If species
is not specified, a list is returned with one list per species containing
the seqlevels style with the corresponding mappings.
Author(s)
Sonali Arora, Martin Morgan, Marc Carlson, H. Pages
Examples
## ---------------------------------------------------------------------
## seqlevelsStyle() getter and setter
## ---------------------------------------------------------------------
## character vectors:
x <- paste0("chr", 1:5)
seqlevelsStyle(x)
seqlevelsStyle(x) <- "NCBI"
x
## GRanges:
library(GenomicRanges)
gr <- GRanges(rep(c("chr2", "chr3", "chrM"), 2), IRanges(1:6, 10))
seqlevelsStyle(gr)
seqlevelsStyle(gr) <- "NCBI"
gr
seqlevelsStyle(gr)
seqlevelsStyle(gr) <- "dbSNP"
gr
seqlevelsStyle(gr)
seqlevelsStyle(gr) <- "UCSC"
gr
## ---------------------------------------------------------------------
## Related low-level utilities
## ---------------------------------------------------------------------
## Genome styles:
names(genomeStyles())
genomeStyles("Homo_sapiens")
"UCSC" %in% names(genomeStyles("Homo_sapiens"))
## Extract seqlevels based on species, style and group:
## The 'group' argument can be 'sex', 'auto', 'circular' or 'all'.
## All:
extractSeqlevels(species="Drosophila_melanogaster", style="Ensembl")
## Sex chromosomes:
extractSeqlevelsByGroup(species="Homo_sapiens", style="UCSC", group="sex")
## Autosomes:
extractSeqlevelsByGroup(species="Homo_sapiens", style="UCSC", group="auto")
## Identify which seqnames belong to a particular 'group':
newchr <- paste0("chr",c(1:22,"X","Y","M","1_gl000192_random","4_ctg9"))
seqlevelsInGroup(newchr, group="sex")
newchr <- as.character(c(1:22,"X","Y","MT"))
seqlevelsInGroup(newchr, group="all","Homo_sapiens","NCBI")
## Identify which seqnames belong to a species and style:
seqnames <- c("chr1","chr9", "chr2", "chr3", "chr10")
all(seqnames %in% extractSeqlevels("Homo_sapiens", "UCSC"))
## Find mapped seqlevelsStyles for exsiting seqnames:
mapSeqlevels(c("chrII", "chrIII", "chrM"), "NCBI")
mapSeqlevels(c("chrII", "chrIII", "chrM"), "Ensembl")
Results
R version 3.3.1 (2016-06-21) -- "Bug in Your Hair"
Copyright (C) 2016 The R Foundation for Statistical Computing
Platform: x86_64-pc-linux-gnu (64-bit)
R is free software and comes with ABSOLUTELY NO WARRANTY.
You are welcome to redistribute it under certain conditions.
Type 'license()' or 'licence()' for distribution details.
R is a collaborative project with many contributors.
Type 'contributors()' for more information and
'citation()' on how to cite R or R packages in publications.
Type 'demo()' for some demos, 'help()' for on-line help, or
'help.start()' for an HTML browser interface to help.
Type 'q()' to quit R.
> library(GenomeInfoDb)
Loading required package: stats4
Loading required package: BiocGenerics
Loading required package: parallel
Attaching package: 'BiocGenerics'
The following objects are masked from 'package:parallel':
clusterApply, clusterApplyLB, clusterCall, clusterEvalQ,
clusterExport, clusterMap, parApply, parCapply, parLapply,
parLapplyLB, parRapply, parSapply, parSapplyLB
The following objects are masked from 'package:stats':
IQR, mad, xtabs
The following objects are masked from 'package:base':
Filter, Find, Map, Position, Reduce, anyDuplicated, append,
as.data.frame, cbind, colnames, do.call, duplicated, eval, evalq,
get, grep, grepl, intersect, is.unsorted, lapply, lengths, mapply,
match, mget, order, paste, pmax, pmax.int, pmin, pmin.int, rank,
rbind, rownames, sapply, setdiff, sort, table, tapply, union,
unique, unsplit
Loading required package: S4Vectors
Attaching package: 'S4Vectors'
The following objects are masked from 'package:base':
colMeans, colSums, expand.grid, rowMeans, rowSums
Loading required package: IRanges
> png(filename="/home/ddbj/snapshot/RGM3/R_BC/result/GenomeInfoDb/seqlevelsStyle.Rd_%03d_medium.png", width=480, height=480)
> ### Name: seqlevelsStyle
> ### Title: Conveniently rename the seqlevels of an object according to a
> ### given style
> ### Aliases: seqlevelsStyle seqlevelsStyle<-
> ### seqlevelsStyle,character-method seqlevelsStyle,ANY-method
> ### seqlevelsStyle<-,ANY-method seqlevelsStyle<-,character-method
> ### genomeStyles extractSeqlevels extractSeqlevelsByGroup mapSeqlevels
> ### seqlevelsInGroup
>
> ### ** Examples
>
> ## ---------------------------------------------------------------------
> ## seqlevelsStyle() getter and setter
> ## ---------------------------------------------------------------------
>
> ## character vectors:
> x <- paste0("chr", 1:5)
> seqlevelsStyle(x)
[1] "UCSC"
> seqlevelsStyle(x) <- "NCBI"
> x
[1] "1" "2" "3" "4" "5"
>
> ## GRanges:
> library(GenomicRanges)
> gr <- GRanges(rep(c("chr2", "chr3", "chrM"), 2), IRanges(1:6, 10))
>
> seqlevelsStyle(gr)
[1] "UCSC"
> seqlevelsStyle(gr) <- "NCBI"
> gr
GRanges object with 6 ranges and 0 metadata columns:
seqnames ranges strand
<Rle> <IRanges> <Rle>
[1] 2 [1, 10] *
[2] 3 [2, 10] *
[3] MT [3, 10] *
[4] 2 [4, 10] *
[5] 3 [5, 10] *
[6] MT [6, 10] *
-------
seqinfo: 3 sequences from an unspecified genome; no seqlengths
>
> seqlevelsStyle(gr)
[1] "NCBI" "Ensembl"
> seqlevelsStyle(gr) <- "dbSNP"
> gr
GRanges object with 6 ranges and 0 metadata columns:
seqnames ranges strand
<Rle> <IRanges> <Rle>
[1] ch2 [1, 10] *
[2] ch3 [2, 10] *
[3] chMT [3, 10] *
[4] ch2 [4, 10] *
[5] ch3 [5, 10] *
[6] chMT [6, 10] *
-------
seqinfo: 3 sequences from an unspecified genome; no seqlengths
>
> seqlevelsStyle(gr)
[1] "dbSNP"
> seqlevelsStyle(gr) <- "UCSC"
> gr
GRanges object with 6 ranges and 0 metadata columns:
seqnames ranges strand
<Rle> <IRanges> <Rle>
[1] chr2 [1, 10] *
[2] chr3 [2, 10] *
[3] chrM [3, 10] *
[4] chr2 [4, 10] *
[5] chr3 [5, 10] *
[6] chrM [6, 10] *
-------
seqinfo: 3 sequences from an unspecified genome; no seqlengths
>
> ## ---------------------------------------------------------------------
> ## Related low-level utilities
> ## ---------------------------------------------------------------------
>
> ## Genome styles:
> names(genomeStyles())
[1] "Arabidopsis_thaliana" "Caenorhabditis_elegans"
[3] "Canis_familiaris" "Cyanidioschyzon_merolae"
[5] "Drosophila_melanogaster" "Homo_sapiens"
[7] "Mus_musculus" "Oryza_sativa"
[9] "Populus_trichocarpa" "Rattus_norvegicus"
[11] "Saccharomyces_cerevisiae" "Zea_mays"
> genomeStyles("Homo_sapiens")
circular auto sex NCBI UCSC dbSNP Ensembl
1 FALSE TRUE FALSE 1 chr1 ch1 1
2 FALSE TRUE FALSE 2 chr2 ch2 2
3 FALSE TRUE FALSE 3 chr3 ch3 3
4 FALSE TRUE FALSE 4 chr4 ch4 4
5 FALSE TRUE FALSE 5 chr5 ch5 5
6 FALSE TRUE FALSE 6 chr6 ch6 6
7 FALSE TRUE FALSE 7 chr7 ch7 7
8 FALSE TRUE FALSE 8 chr8 ch8 8
9 FALSE TRUE FALSE 9 chr9 ch9 9
10 FALSE TRUE FALSE 10 chr10 ch10 10
11 FALSE TRUE FALSE 11 chr11 ch11 11
12 FALSE TRUE FALSE 12 chr12 ch12 12
13 FALSE TRUE FALSE 13 chr13 ch13 13
14 FALSE TRUE FALSE 14 chr14 ch14 14
15 FALSE TRUE FALSE 15 chr15 ch15 15
16 FALSE TRUE FALSE 16 chr16 ch16 16
17 FALSE TRUE FALSE 17 chr17 ch17 17
18 FALSE TRUE FALSE 18 chr18 ch18 18
19 FALSE TRUE FALSE 19 chr19 ch19 19
20 FALSE TRUE FALSE 20 chr20 ch20 20
21 FALSE TRUE FALSE 21 chr21 ch21 21
22 FALSE TRUE FALSE 22 chr22 ch22 22
23 FALSE FALSE TRUE X chrX chX X
24 FALSE FALSE TRUE Y chrY chY Y
25 TRUE FALSE FALSE MT chrM chMT MT
> "UCSC" %in% names(genomeStyles("Homo_sapiens"))
[1] TRUE
>
> ## Extract seqlevels based on species, style and group:
> ## The 'group' argument can be 'sex', 'auto', 'circular' or 'all'.
>
> ## All:
> extractSeqlevels(species="Drosophila_melanogaster", style="Ensembl")
[1] "2L" "2R"
[3] "3L" "3R"
[5] "4" "X"
[7] "dmel_mitochondrion_genome" "2LHet"
[9] "2RHet" "3LHet"
[11] "3RHet" "XHet"
[13] "YHet" "U"
[15] "Uextra"
>
> ## Sex chromosomes:
> extractSeqlevelsByGroup(species="Homo_sapiens", style="UCSC", group="sex")
[1] "chrX" "chrY"
>
> ## Autosomes:
> extractSeqlevelsByGroup(species="Homo_sapiens", style="UCSC", group="auto")
[1] "chr1" "chr2" "chr3" "chr4" "chr5" "chr6" "chr7" "chr8" "chr9"
[10] "chr10" "chr11" "chr12" "chr13" "chr14" "chr15" "chr16" "chr17" "chr18"
[19] "chr19" "chr20" "chr21" "chr22"
>
>
> ## Identify which seqnames belong to a particular 'group':
> newchr <- paste0("chr",c(1:22,"X","Y","M","1_gl000192_random","4_ctg9"))
> seqlevelsInGroup(newchr, group="sex")
[1] "chrX" "chrY"
>
> newchr <- as.character(c(1:22,"X","Y","MT"))
> seqlevelsInGroup(newchr, group="all","Homo_sapiens","NCBI")
[1] "1" "2" "3" "4" "5" "6" "7" "8" "9" "10" "11" "12" "13" "14" "15"
[16] "16" "17" "18" "19" "20" "21" "22" "X" "Y" "MT"
>
> ## Identify which seqnames belong to a species and style:
> seqnames <- c("chr1","chr9", "chr2", "chr3", "chr10")
> all(seqnames %in% extractSeqlevels("Homo_sapiens", "UCSC"))
[1] TRUE
>
> ## Find mapped seqlevelsStyles for exsiting seqnames:
> mapSeqlevels(c("chrII", "chrIII", "chrM"), "NCBI")
chrII chrIII chrM
"II" "III" "MT"
> mapSeqlevels(c("chrII", "chrIII", "chrM"), "Ensembl")
chrII chrIII chrM
[1,] "II" "III" "MtDNA"
[2,] "II" "III" "Mito"
>
>
>
>
>
>
> dev.off()
null device
1
>