Any object having a Seqinfo class in which the seqlevels will
be kept, dropped or renamed.
value
A named or unnamed character vector.
Names are ignored by keepSeqlevels and dropSeqlevels.
Only the values in the character vector dictate which seqlevels to keep
or drop.
In the case of renameSeqlevels, the names are used to map new
sequence levels to the old (names correspond to the old levels). When
value is unnamed, the replacement vector must the same length
and in the same order as the original seqlevels(x).
species
The species name of the Seqinfo class in which the seqlevels will
be kept, dropped or renamed.
Details
Matching and overlap operations on range objects often require that the
seqlevels match before a comparison can be made (e.g., findOverlaps).
keepSeqlevels, dropSeqlevels and renameSeqlevels are
high-level convenience functions that wrap the low-level seqlevels
setter.
keepSeqlevels, dropSeqlevels: Subsetting operations
that modify the size of x. keepSeqlevels keeps only the
seqlevels in value and removes all others. dropSeqlevels
drops the levels in value and retains all others. If value
does not match any seqlevels in x an empty object is returned.
renameSeqlevels: Rename the seqlevels in x to those in
value. If value is a named character vector, the names are used
to map the new seqlevels to the old. When value is unnamed, the
replacement vector must be the same length and in the same order as the
original seqlevels(x).
restoreSeqlevels: Perform seqlevels(txdb) <- seqlevels0(txdb),
that is, restore the seqlevels in x back to the original values.
Applicable only when x is a TxDb object.
keepStandardChromosomes:Subsetting operation that returns only the
'standard' Chromosomes. We define 'standard chromosomes' as those chromosomes
which represent sequences in the assembly that are not scaffolds. Also
referred to as 'assembly molecule' on NCBI. Applicable when x
has a Seqinfo object.This function determines which seqlevels need to be kept
using the organism's supported by GenomeInfoDb. The user can also specify the
species to get the standard Chromsomes in x.
Value
The x object with seqlevels removed or renamed. If x has
no seqlevels (empty object) or no replacement values match the current
seqlevels in x the unchanged x is returned.
## ---------------------------------------------------------------------
## keepSeqlevels / dropSeqlevels
## ---------------------------------------------------------------------
## GRanges / GAlignments:
library(GenomicRanges)
gr <- GRanges(c("chr1", "chr1", "chr2", "chr3"), IRanges(1:4, width=3))
seqlevels(gr)
## Keep only 'chr1'
chr1 <- keepSeqlevels(gr, "chr1")
## Drop 'chr1'. Both 'chr2' and 'chr3' are kept.
chr2 <- dropSeqlevels(gr, "chr1")
library(Rsamtools) # for the ex1.bam file
library(GenomicAlignments) # for readGAlignments()
fl <- system.file("extdata", "ex1.bam", package="Rsamtools")
gal <- readGAlignments(fl)
## If 'value' is named, the names are ignored.
seq2 <- keepSeqlevels(gal, c(foo="seq2"))
seqlevels(seq2)
## GRangesList / GAlignmentsList:
grl <- split(gr, as.character(seqnames(gr)))
dropSeqlevels(grl, c("chr1", "chr2"))
galist <- split(gal, as.character(seqnames(gal)))
keepSeqlevels(galist, "seq2")
## TxDb:
## A TxDb cannot be directly subset with 'keepSeqlevels'
## and 'dropSeqlevels'.
library(TxDb.Dmelanogaster.UCSC.dm3.ensGene)
txdb <- TxDb.Dmelanogaster.UCSC.dm3.ensGene
seqlevels(txdb)
## Not run:
keepSeqlevels(txdb, "chr2L") ## fails
## End(Not run)
## GRanges or GRangesLists extracted from the TxDb can be subset.
txbygene <- transcriptsBy(txdb, "gene")
seqlevels(txbygene)
chr2L <- keepSeqlevels(txbygene, "chr2L")
seqlevels(chr2L)
## ---------------------------------------------------------------------
## renameSeqlevels
## ---------------------------------------------------------------------
## GAlignments:
seqlevels(gal)
## Rename 'seq2' to 'chr2' with a named vector.
gal2a <- renameSeqlevels(gal, c(seq2="chr2"))
## Rename 'seq2' to 'chr2' with an unnamed vector that includes all
## seqlevels as they appear in the object.
gal2b <- renameSeqlevels(gal, c("seq1", "chr2"))
## Names that do not match existing seqlevels are ignored.
## This attempt at renaming does nothing.
gal3 <- renameSeqlevels(gal, c(foo="chr2"))
identical(seqlevels(gal), seqlevels(gal3))
## TxDb:
seqlevels(txdb)
## When the seqlevels of a TxDb are renamed, all future
## extractions reflect the modified seqlevels.
renameSeqlevels(txdb, sub("chr", "CH", seqlevels(txdb)))
renameSeqlevels(txdb, c(CHM="M"))
seqlevels(txdb)
transcripts <- transcripts(txdb)
identical(seqlevels(txdb), seqlevels(transcripts))
## ---------------------------------------------------------------------
## restoreSeqlevels
## ---------------------------------------------------------------------
## Restore seqlevels in a TxDb to original values.
## Not run:
txdb <- restoreSeqlevels(txdb)
seqlevels(txdb)
## End(Not run)
## ---------------------------------------------------------------------
## keepStandardChromosomes
## ---------------------------------------------------------------------
gr <- GRanges(c(paste0("chr",c(1:3)), "chr1_gl000191_random",
"chr1_gl000192_random"), IRanges(1:5, width=3))
gr
grl <- split(gr,seqnames(gr))
##GRanges
keepStandardChromosomes(gr)
##GRangesList
keepStandardChromosomes(grl)
plantgr <- GRanges(c(1:5,"MT","Pltd","wrong"), IRanges(1:8,width=5))
keepStandardChromosomes(plantgr, species="Arabidopsis thaliana")
Results
R version 3.3.1 (2016-06-21) -- "Bug in Your Hair"
Copyright (C) 2016 The R Foundation for Statistical Computing
Platform: x86_64-pc-linux-gnu (64-bit)
R is free software and comes with ABSOLUTELY NO WARRANTY.
You are welcome to redistribute it under certain conditions.
Type 'license()' or 'licence()' for distribution details.
R is a collaborative project with many contributors.
Type 'contributors()' for more information and
'citation()' on how to cite R or R packages in publications.
Type 'demo()' for some demos, 'help()' for on-line help, or
'help.start()' for an HTML browser interface to help.
Type 'q()' to quit R.
> library(GenomeInfoDb)
Loading required package: stats4
Loading required package: BiocGenerics
Loading required package: parallel
Attaching package: 'BiocGenerics'
The following objects are masked from 'package:parallel':
clusterApply, clusterApplyLB, clusterCall, clusterEvalQ,
clusterExport, clusterMap, parApply, parCapply, parLapply,
parLapplyLB, parRapply, parSapply, parSapplyLB
The following objects are masked from 'package:stats':
IQR, mad, xtabs
The following objects are masked from 'package:base':
Filter, Find, Map, Position, Reduce, anyDuplicated, append,
as.data.frame, cbind, colnames, do.call, duplicated, eval, evalq,
get, grep, grepl, intersect, is.unsorted, lapply, lengths, mapply,
match, mget, order, paste, pmax, pmax.int, pmin, pmin.int, rank,
rbind, rownames, sapply, setdiff, sort, table, tapply, union,
unique, unsplit
Loading required package: S4Vectors
Attaching package: 'S4Vectors'
The following objects are masked from 'package:base':
colMeans, colSums, expand.grid, rowMeans, rowSums
Loading required package: IRanges
> png(filename="/home/ddbj/snapshot/RGM3/R_BC/result/GenomeInfoDb/seqlevels-wrappers.Rd_%03d_medium.png", width=480, height=480)
> ### Name: seqlevels-wrappers
> ### Title: Convenience wrappers to the seqlevels() getter and setter
> ### Aliases: seqlevels-wrappers keepSeqlevels dropSeqlevels renameSeqlevels
> ### restoreSeqlevels keepStandardChromosomes
> ### Keywords: methods utilities
>
> ### ** Examples
>
>
> ## ---------------------------------------------------------------------
> ## keepSeqlevels / dropSeqlevels
> ## ---------------------------------------------------------------------
>
> ## GRanges / GAlignments:
>
> library(GenomicRanges)
> gr <- GRanges(c("chr1", "chr1", "chr2", "chr3"), IRanges(1:4, width=3))
> seqlevels(gr)
[1] "chr1" "chr2" "chr3"
> ## Keep only 'chr1'
> chr1 <- keepSeqlevels(gr, "chr1")
> ## Drop 'chr1'. Both 'chr2' and 'chr3' are kept.
> chr2 <- dropSeqlevels(gr, "chr1")
>
> library(Rsamtools) # for the ex1.bam file
Loading required package: Biostrings
Loading required package: XVector
> library(GenomicAlignments) # for readGAlignments()
Loading required package: SummarizedExperiment
Loading required package: Biobase
Welcome to Bioconductor
Vignettes contain introductory material; view with
'browseVignettes()'. To cite Bioconductor, see
'citation("Biobase")', and for packages 'citation("pkgname")'.
>
> fl <- system.file("extdata", "ex1.bam", package="Rsamtools")
> gal <- readGAlignments(fl)
> ## If 'value' is named, the names are ignored.
> seq2 <- keepSeqlevels(gal, c(foo="seq2"))
> seqlevels(seq2)
[1] "seq2"
>
> ## GRangesList / GAlignmentsList:
>
> grl <- split(gr, as.character(seqnames(gr)))
> dropSeqlevels(grl, c("chr1", "chr2"))
GRangesList object of length 1:
$chr3
GRanges object with 1 range and 0 metadata columns:
seqnames ranges strand
<Rle> <IRanges> <Rle>
[1] chr3 [4, 6] *
-------
seqinfo: 1 sequence from an unspecified genome; no seqlengths
> galist <- split(gal, as.character(seqnames(gal)))
> keepSeqlevels(galist, "seq2")
GAlignmentsList object of length 1:
$seq2
GAlignments object with 1789 alignments and 0 metadata columns:
seqnames strand cigar qwidth start end width njunc
[1] seq2 + 36M 36 1 36 36 0
[2] seq2 + 35M 35 1 35 35 0
[3] seq2 - 35M 35 1 35 35 0
[4] seq2 + 35M 35 2 36 35 0
[5] seq2 + 35M 35 5 39 35 0
... ... ... ... ... ... ... ... ...
[1785] seq2 + 35M 35 1524 1558 35 0
[1786] seq2 + 35M 35 1524 1558 35 0
[1787] seq2 - 35M 35 1528 1562 35 0
[1788] seq2 - 35M 35 1532 1566 35 0
[1789] seq2 - 35M 35 1533 1567 35 0
-------
seqinfo: 1 sequence from an unspecified genome
>
> ## TxDb:
>
> ## A TxDb cannot be directly subset with 'keepSeqlevels'
> ## and 'dropSeqlevels'.
> library(TxDb.Dmelanogaster.UCSC.dm3.ensGene)
Loading required package: GenomicFeatures
Loading required package: AnnotationDbi
> txdb <- TxDb.Dmelanogaster.UCSC.dm3.ensGene
> seqlevels(txdb)
[1] "chr2L" "chr2R" "chr3L" "chr3R" "chr4" "chrX"
[7] "chrU" "chrM" "chr2LHet" "chr2RHet" "chr3LHet" "chr3RHet"
[13] "chrXHet" "chrYHet" "chrUextra"
> ## Not run:
> ##D keepSeqlevels(txdb, "chr2L") ## fails
> ## End(Not run)
>
> ## GRanges or GRangesLists extracted from the TxDb can be subset.
> txbygene <- transcriptsBy(txdb, "gene")
> seqlevels(txbygene)
[1] "chr2L" "chr2R" "chr3L" "chr3R" "chr4" "chrX"
[7] "chrU" "chrM" "chr2LHet" "chr2RHet" "chr3LHet" "chr3RHet"
[13] "chrXHet" "chrYHet" "chrUextra"
> chr2L <- keepSeqlevels(txbygene, "chr2L")
> seqlevels(chr2L)
[1] "chr2L"
>
> ## ---------------------------------------------------------------------
> ## renameSeqlevels
> ## ---------------------------------------------------------------------
>
> ## GAlignments:
>
> seqlevels(gal)
[1] "seq1" "seq2"
> ## Rename 'seq2' to 'chr2' with a named vector.
> gal2a <- renameSeqlevels(gal, c(seq2="chr2"))
> ## Rename 'seq2' to 'chr2' with an unnamed vector that includes all
> ## seqlevels as they appear in the object.
> gal2b <- renameSeqlevels(gal, c("seq1", "chr2"))
> ## Names that do not match existing seqlevels are ignored.
> ## This attempt at renaming does nothing.
> gal3 <- renameSeqlevels(gal, c(foo="chr2"))
Warning message:
In renameSeqlevels(gal, c(foo = "chr2")) : invalid seqlevels 'foo' ignored
> identical(seqlevels(gal), seqlevels(gal3))
[1] TRUE
>
> ## TxDb:
>
> seqlevels(txdb)
[1] "chr2L" "chr2R" "chr3L" "chr3R" "chr4" "chrX"
[7] "chrU" "chrM" "chr2LHet" "chr2RHet" "chr3LHet" "chr3RHet"
[13] "chrXHet" "chrYHet" "chrUextra"
> ## When the seqlevels of a TxDb are renamed, all future
> ## extractions reflect the modified seqlevels.
> renameSeqlevels(txdb, sub("chr", "CH", seqlevels(txdb)))
TxDb object:
# Db type: TxDb
# Supporting package: GenomicFeatures
# Data source: UCSC
# Genome: dm3
# Organism: Drosophila melanogaster
# Taxonomy ID: 7227
# UCSC Table: ensGene
# Resource URL: http://genome.ucsc.edu/
# Type of Gene ID: Ensembl gene ID
# Full dataset: yes
# miRBase build ID: NA
# transcript_nrow: 29173
# exon_nrow: 76920
# cds_nrow: 62135
# Db created by: GenomicFeatures package from Bioconductor
# Creation time: 2015-10-07 18:15:53 +0000 (Wed, 07 Oct 2015)
# GenomicFeatures version at creation time: 1.21.30
# RSQLite version at creation time: 1.0.0
# DBSCHEMAVERSION: 1.1
> renameSeqlevels(txdb, c(CHM="M"))
TxDb object:
# Db type: TxDb
# Supporting package: GenomicFeatures
# Data source: UCSC
# Genome: dm3
# Organism: Drosophila melanogaster
# Taxonomy ID: 7227
# UCSC Table: ensGene
# Resource URL: http://genome.ucsc.edu/
# Type of Gene ID: Ensembl gene ID
# Full dataset: yes
# miRBase build ID: NA
# transcript_nrow: 29173
# exon_nrow: 76920
# cds_nrow: 62135
# Db created by: GenomicFeatures package from Bioconductor
# Creation time: 2015-10-07 18:15:53 +0000 (Wed, 07 Oct 2015)
# GenomicFeatures version at creation time: 1.21.30
# RSQLite version at creation time: 1.0.0
# DBSCHEMAVERSION: 1.1
> seqlevels(txdb)
[1] "CH2L" "CH2R" "CH3L" "CH3R" "CH4" "CHX"
[7] "CHU" "M" "CH2LHet" "CH2RHet" "CH3LHet" "CH3RHet"
[13] "CHXHet" "CHYHet" "CHUextra"
>
> transcripts <- transcripts(txdb)
> identical(seqlevels(txdb), seqlevels(transcripts))
[1] TRUE
>
> ## ---------------------------------------------------------------------
> ## restoreSeqlevels
> ## ---------------------------------------------------------------------
>
> ## Restore seqlevels in a TxDb to original values.
> ## Not run:
> ##D txdb <- restoreSeqlevels(txdb)
> ##D seqlevels(txdb)
> ## End(Not run)
>
> ## ---------------------------------------------------------------------
> ## keepStandardChromosomes
> ## ---------------------------------------------------------------------
>
> gr <- GRanges(c(paste0("chr",c(1:3)), "chr1_gl000191_random",
+ "chr1_gl000192_random"), IRanges(1:5, width=3))
> gr
GRanges object with 5 ranges and 0 metadata columns:
seqnames ranges strand
<Rle> <IRanges> <Rle>
[1] chr1 [1, 3] *
[2] chr2 [2, 4] *
[3] chr3 [3, 5] *
[4] chr1_gl000191_random [4, 6] *
[5] chr1_gl000192_random [5, 7] *
-------
seqinfo: 5 sequences from an unspecified genome; no seqlengths
> grl <- split(gr,seqnames(gr))
>
> ##GRanges
> keepStandardChromosomes(gr)
GRanges object with 3 ranges and 0 metadata columns:
seqnames ranges strand
<Rle> <IRanges> <Rle>
[1] chr1 [1, 3] *
[2] chr2 [2, 4] *
[3] chr3 [3, 5] *
-------
seqinfo: 3 sequences from an unspecified genome; no seqlengths
>
> ##GRangesList
> keepStandardChromosomes(grl)
GRangesList object of length 3:
$chr1
GRanges object with 1 range and 0 metadata columns:
seqnames ranges strand
<Rle> <IRanges> <Rle>
[1] chr1 [1, 3] *
$chr2
GRanges object with 1 range and 0 metadata columns:
seqnames ranges strand
[1] chr2 [2, 4] *
$chr3
GRanges object with 1 range and 0 metadata columns:
seqnames ranges strand
[1] chr3 [3, 5] *
-------
seqinfo: 3 sequences from an unspecified genome; no seqlengths
>
> plantgr <- GRanges(c(1:5,"MT","Pltd","wrong"), IRanges(1:8,width=5))
> keepStandardChromosomes(plantgr, species="Arabidopsis thaliana")
GRanges object with 7 ranges and 0 metadata columns:
seqnames ranges strand
<Rle> <IRanges> <Rle>
[1] 1 [1, 5] *
[2] 2 [2, 6] *
[3] 3 [3, 7] *
[4] 4 [4, 8] *
[5] 5 [5, 9] *
[6] MT [6, 10] *
[7] Pltd [7, 11] *
-------
seqinfo: 7 sequences from an unspecified genome; no seqlengths
>
>
>
>
>
> dev.off()
null device
1
>