Last data update: 2014.03.03

GeneidFilter-class    R Documentation

Filter results fetched from the Ensembl database

Description

These classes allow users to specify which entries (i.e. genes, transcripts or exons) should be retrieved from the database.

Details

ExonidFilter

Allows filtering based on the Ensembl exon identifier.

ExonrankFilter

Allows filtering based on the rank (index) of the exon within the transcript model. Exons are always numbered from the 5' to the 3' end of the transcript; thus, also on the reverse strand, exon 1 is the most 5' exon of the transcript.
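To make the numbering rule concrete, here is a minimal base-R sketch. The helper `exon_rank()` and the exon coordinates are hypothetical and not part of ensembldb; the sketch ranks the exons of a single transcript from 5' to 3', which on the reverse strand means ranking by decreasing start coordinate.

```r
# Hypothetical helper (not part of ensembldb): rank the exons of one
# transcript from its 5' end to its 3' end.
exon_rank <- function(starts, strand) {
  # On "+" the most 5' exon has the smallest start coordinate,
  # on "-" it has the largest one.
  ord <- order(starts, decreasing = (strand == "-"))
  rank <- integer(length(starts))
  rank[ord] <- seq_along(starts)
  rank
}

# Hypothetical start coordinates of three exons, in genomic order
starts <- c(100, 500, 900)
exon_rank(starts, "+")  # 1 2 3
exon_rank(starts, "-")  # 3 2 1: the rightmost exon is exon 1
```

With `ExonrankFilter(3, condition="<")`, as used in the examples, only exons of rank 1 or 2 are kept; in this sketch that corresponds to `which(exon_rank(starts, "-") < 3)`.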

EntrezidFilter

Filter results based on the NCBI Entrezgene identifiers of the genes.

GenebiotypeFilter

Filter results based on the gene biotype as defined in the Ensembl database. Use the listGenebiotypes method to get a complete list of all available gene biotypes.

GeneidFilter

Filter results based on the Ensembl gene identifiers.

GenenameFilter

Allows filtering on the gene names (symbols) of the genes.

GRangesFilter

Allows fetching features within or overlapping the specified genomic region(s)/range(s). This filter takes a GRanges object as input and, if condition="within" (the default), restricts results to features (genes, transcripts or exons) that lie completely within the region. Alternatively, specifying condition="overlapping" returns all features (i.e. genes for a call to genes, transcripts for a call to transcripts and exons for a call to exons) that at least partially overlap the region, i.e. whose start coordinate is smaller than the end coordinate of the region and whose end coordinate is larger than the start coordinate of the region. Thus, genes and transcripts that have an intron overlapping the region are also returned.
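The two conditions can be illustrated with plain R comparisons. This sketch is not how ensembldb evaluates the filter internally; it only applies the coordinate rules stated above to the example region and to the four ZBTB16 transcripts reported in the Results section of this page. Treating the boundaries of "within" as inclusive is an assumption.

```r
# Region from the GRangesFilter example (chromosome 11, "+" strand)
region_start <- 114000000
region_end   <- 114000050

# Transcript coordinates as printed by transcripts(edb, filter=grf)
tx <- data.frame(
  tx_id = c("ENST00000335953", "ENST00000541602",
            "ENST00000392996", "ENST00000539918"),
  start = c(113930315, 113930447, 113931229, 113935134),
  end   = c(114121398, 114060486, 114121374, 114118066),
  stringsAsFactors = FALSE
)

# condition = "within": the feature lies completely inside the region
# (inclusive boundaries are an assumption of this sketch)
is_within <- tx$start >= region_start & tx$end <= region_end

# condition = "overlapping": feature start before the region end and
# feature end after the region start
is_overlapping <- tx$start < region_end & tx$end > region_start

data.frame(tx_id = tx$tx_id, is_within, is_overlapping)
```

No transcript lies within the 51 bp region, while all four overlap it, here through an intron (exons() returns no exon for this region).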

Calls to the methods exonsBy, cdsBy and transcriptsBy use the start and end coordinates of the feature type specified with argument by (i.e. "gene", "transcript" or "exon") for the filtering.

Note: if the specified GRanges object defines multiple regions, all features within (or overlapping) any of these regions are returned.
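For a GRanges with several regions, a feature is returned if it satisfies the condition for at least one of them. A minimal base-R sketch of this "any region" rule for condition="overlapping"; the regions, feature identifiers and coordinates are hypothetical and all placed on a single chromosome.

```r
# Two hypothetical regions on the same chromosome
regions <- data.frame(start = c(100, 1000), end = c(200, 1100))

# Hypothetical features
features <- data.frame(id    = c("a", "b", "c"),
                       start = c(150, 500, 1050),
                       end   = c(180, 600, 1200),
                       stringsAsFactors = FALSE)

# TRUE if the feature [fs, fe] overlaps at least one region
overlaps_any <- function(fs, fe, regions) {
  any(fs < regions$end & fe > regions$start)
}

keep <- mapply(overlaps_any, features$start, features$end,
               MoreArgs = list(regions = regions))
features$id[keep]  # "a" overlaps region 1, "c" overlaps region 2
```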

Chromosome names/seqnames can be provided in UCSC format (e.g. "chrX") or Ensembl format (e.g. "X"); see seqlevelsStyle for more information.

SeqendFilter

Filter based on the chromosomal end coordinate of the exons, transcripts or genes.

SeqnameFilter

Filter on the sequence name on which the features are encoded (mostly the chromosome names). Supports UCSC chromosome names (e.g. "chrX") and Ensembl chromosome names (e.g. "X").

SeqstartFilter

Filter based on the chromosomal start coordinates of the exons, transcripts or genes.

SeqstrandFilter

Filter based on the strand on which the features are encoded.

TxbiotypeFilter

Filter on the transcript biotype defined in Ensembl. Use the listTxbiotypes method to get a complete list of all available transcript biotypes.

TxidFilter

Filter on the Ensembl transcript identifiers.

Objects from the Class

While objects can be created by calls of the form new("GeneidFilter", ...), users are strongly encouraged to use the dedicated constructor functions: GeneidFilter, EntrezidFilter, GenenameFilter, GenebiotypeFilter, GRangesFilter, TxidFilter, TxbiotypeFilter, ExonidFilter, ExonrankFilter, SeqnameFilter, SeqstrandFilter, SeqstartFilter and SeqendFilter.

See examples below for usage.

Slots

condition:

Object of class "character": either "=", "in" or "like" to filter on character values (e.g. gene id, gene biotype, seqname etc.), or "=", ">" or "<" for numerical values (chromosome/sequence coordinates). Note that for "like" the value should be an SQL pattern (e.g. "ENS%").
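The following sketch shows how a condition and a value could be rendered as an SQL WHERE fragment, in the same format as the string returned by where(Gif) in the examples. The helper `build_where()` is hypothetical and is not ensembldb's implementation; the caller supplies the column name, and values are not escaped.

```r
# Hypothetical helper, NOT ensembldb's implementation: combine a column,
# a condition and a value into an SQL WHERE fragment (no escaping).
build_where <- function(column, condition, value) {
  # Quote character values; print numbers without scientific notation
  rendered <- if (is.character(value)) {
    paste0("'", value, "'")
  } else {
    format(value, scientific = FALSE, trim = TRUE)
  }
  switch(condition,
         "=" = , ">" = , "<" = , "like" =
           paste(column, condition, rendered[1]),
         "in" = paste0(column, " in (", paste(rendered, collapse = ","), ")"),
         stop("unsupported condition: ", condition))
}

build_where("gene_id", "=", "ENSG00000012817")
# [1] "gene_id = 'ENSG00000012817'"
build_where("gene_id", "like", "ENS%")
build_where("gene_id", "in", c("ENSG00000012817", "ENSG00000109906"))
```

Quoting only character values mirrors the split in the Slots description between character conditions ("=", "in", "like") and numeric ones ("=", ">", "<").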

value:

Object of class "character": the value to be used for filtering.

Extends

Class BasicFilter, directly.

Methods for all BasicFilter objects

Note: these methods are applicable to all classes extending the BasicFilter class.

column

signature(object = "GeneidFilter", db="EnsDb", with.tables="character"): returns the column (attribute name) to be used for the filtering. Submitting the db parameter ensures that the returned column is valid in the corresponding database schema. The optional argument with.tables specifies in which database table(s) the function should look for the attribute/column name. By default the method checks all database tables.

column

signature(object = "GeneidFilter", db="EnsDb", with.tables="missing"): returns the column (attribute name) to be used for the filtering. Submitting the db parameter ensures that the returned column is valid in the corresponding database schema.

column

signature(object = "GeneidFilter", db="missing", with.tables="missing"): returns the column (table column name) to be used for the filtering.

condition

signature(x="BasicFilter"): returns the value for the condition slot.

condition<-

Setter method for the condition slot.

value

signature(x="BasicFilter", db="EnsDb"): returns the value of the value slot of the filter object.

value<-

Setter method for the value slot.
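The getter/setter pattern behind condition, condition<-, value and value<- can be sketched with a small S4 class. Everything here is hypothetical: the class ToyFilter and the generics toyCondition and toyCondition<- are invented for illustration (named so they do not clash with ensembldb), and the validity check simply reuses the condition values listed under Slots.

```r
library(methods)

# Hypothetical toy class with the two slots described above; it is NOT
# ensembldb's BasicFilter and only illustrates the getter/setter pattern.
setClass("ToyFilter",
         slots = c(condition = "character", value = "character"),
         prototype = list(condition = "=", value = character()),
         validity = function(object) {
           allowed <- c("=", "in", "like", ">", "<")
           if (!object@condition %in% allowed)
             return(paste("unsupported condition:", object@condition))
           TRUE
         })

# Getter
setGeneric("toyCondition", function(x) standardGeneric("toyCondition"))
setMethod("toyCondition", "ToyFilter", function(x) x@condition)

# Setter: assign the slot, then re-check validity
setGeneric("toyCondition<-",
           function(x, value) standardGeneric("toyCondition<-"))
setReplaceMethod("toyCondition", "ToyFilter", function(x, value) {
  x@condition <- value
  validObject(x)  # signals an error for unsupported conditions
  x
})

f <- new("ToyFilter", value = "ENSG00000012817")
toyCondition(f)            # "=" (from the prototype)
toyCondition(f) <- "like"
toyCondition(f)            # "like"
```

Calling validObject() inside the setter means an invalid assignment fails immediately rather than when the filter is later turned into SQL.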

where

signature(object = "GeneidFilter", db="EnsDb", with.tables="character"): returns the where condition for the SQL call. Also submitting the db parameter ensures that the columns are valid in the corresponding database schema. The optional argument with.tables specifies in which database table(s) the function should look for the attribute/column name. By default the method checks all database tables.

where

signature(object = "GeneidFilter", db="EnsDb", with.tables="missing"): returns the where condition for the SQL call. Also submitting the db parameter ensures that the columns are valid in the corresponding database schema.

where

signature(object = "GeneidFilter", db="missing", with.tables="missing"): returns the where condition for the SQL call.

Methods for GRangesFilter objects

start, end, strand

Get the start and end coordinates and the strand of the GRanges within the filter.

seqlevels, seqnames

Get the names of the sequences from the GRanges of the filter.

Note

The column and where methods should always be called with the EnsDb object, as this ensures that the returned column names are valid for the database schema. The optional argument with.tables, on the other hand, should be used only rarely, as it is mainly intended for internal use.

Note that the database column "entrezid" queried by EntrezidFilter can contain multiple ";"-separated Entrezgene IDs; at present this filter might therefore not return all matching entries from the database.
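Why an exact comparison misses such entries can be seen with a few base-R lines. The column values below are hypothetical (only 7704 appears in this page's output), and splitting on ";" is shown only to illustrate a per-ID match; it is not something EntrezidFilter does.

```r
# Hypothetical "entrezid" column values; the second entry holds two IDs
entrezid <- c("7704", "1234;5678", "5678")
query <- "5678"

# Exact "=" comparison: the multi-ID entry is missed
exact <- entrezid == query

# Splitting on ";" first finds every entry that contains the ID
per_id <- vapply(strsplit(entrezid, ";", fixed = TRUE),
                 function(ids) query %in% ids, logical(1))

rbind(exact, per_id)
```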

Author(s)

Johannes Rainer

See Also

genes, transcripts, exons, listGenebiotypes, listTxbiotypes

Examples


## create a filter that could be used to retrieve all information for
## the respective gene.
Gif <- GeneidFilter("ENSG00000012817")
Gif
## returns the where condition of the SQL query
where(Gif)

## create a filter for a chromosomal end position of a gene
Sef <- SeqendFilter(10000, condition=">", "gene")
Sef

## for additional examples see the help page of "genes"


## Example for GRangesFilter:
## retrieve all genes overlapping the specified region
grf <- GRangesFilter(GRanges("11", ranges=IRanges(114000000, 114000050),
                             strand="+"), condition="overlapping")
library(EnsDb.Hsapiens.v75)
edb <- EnsDb.Hsapiens.v75
genes(edb, filter=grf)

## Get also all transcripts overlapping that region
transcripts(edb, filter=grf)

## Retrieve all transcripts for the above gene
gn <- genes(edb, filter=grf)
txs <- transcripts(edb, filter=GenenameFilter(gn$gene_name))
## Next we simply plot their start and end coordinates.
plot(3, 3, pch=NA, xlim=c(start(gn), end(gn)), ylim=c(0, length(txs)), yaxt="n", ylab="")
## Highlight the GRangesFilter region
rect(xleft=start(grf), xright=end(grf), ybottom=0, ytop=length(txs), col="red", border="red")
for (i in seq_along(txs)) {
    current <- txs[i]
    rect(xleft=start(current), xright=end(current), ybottom=i-0.975, ytop=i-0.125, border="grey")
    text(start(current), y=i-0.5,pos=4, cex=0.75, labels=current$tx_id)
}
## Thus, we can see that only 4 transcripts of that gene are indeed overlapping the region.


## No exon overlaps that region, so the result is empty
exons(edb, filter=grf)


## Example for ExonrankFilter
## Extract exons 1 and (if present) 2 of all transcripts encoded on the
## Y chromosome
exons(edb, columns=c("tx_id", "exon_idx"),
      filter=list(SeqnameFilter("Y"),
                  ExonrankFilter(3, condition="<")))


Results


R version 3.3.1 (2016-06-21) -- "Bug in Your Hair"
Copyright (C) 2016 The R Foundation for Statistical Computing
Platform: x86_64-pc-linux-gnu (64-bit)

> library(ensembldb)
Loading required package: BiocGenerics
Loading required package: parallel

Attaching package: 'BiocGenerics'

The following objects are masked from 'package:parallel':

    clusterApply, clusterApplyLB, clusterCall, clusterEvalQ,
    clusterExport, clusterMap, parApply, parCapply, parLapply,
    parLapplyLB, parRapply, parSapply, parSapplyLB

The following objects are masked from 'package:stats':

    IQR, mad, xtabs

The following objects are masked from 'package:base':

    Filter, Find, Map, Position, Reduce, anyDuplicated, append,
    as.data.frame, cbind, colnames, do.call, duplicated, eval, evalq,
    get, grep, grepl, intersect, is.unsorted, lapply, lengths, mapply,
    match, mget, order, paste, pmax, pmax.int, pmin, pmin.int, rank,
    rbind, rownames, sapply, setdiff, sort, table, tapply, union,
    unique, unsplit

Loading required package: GenomicRanges
Loading required package: S4Vectors
Loading required package: stats4

Attaching package: 'S4Vectors'

The following objects are masked from 'package:base':

    colMeans, colSums, expand.grid, rowMeans, rowSums

Loading required package: IRanges
Loading required package: GenomeInfoDb
Loading required package: GenomicFeatures
Loading required package: AnnotationDbi
Loading required package: Biobase
Welcome to Bioconductor

    Vignettes contain introductory material; view with
    'browseVignettes()'. To cite Bioconductor, see
    'citation("Biobase")', and for packages 'citation("pkgname")'.

> png(filename="/home/ddbj/snapshot/RGM3/R_BC/result/ensembldb/GeneidFilter-class.Rd_%03d_medium.png", width=480, height=480)
> ### Name: GeneidFilter-class
> ### Title: Filter results fetched from the Ensembl database
> ### Aliases: BasicFilter-class EntrezidFilter-class GeneidFilter-class
> ###   GenebiotypeFilter-class GenenameFilter-class TxidFilter-class
> ###   TxbiotypeFilter-class ExonidFilter-class SeqnameFilter-class
> ###   SeqstrandFilter-class SeqstartFilter-class SeqendFilter-class
> ###   GRangesFilter-class ExonrankFilter-class
> ###   column,EntrezidFilter,missing,missing-method
> ###   column,GeneidFilter,missing,missing-method
> ###   column,GenenameFilter,missing,missing-method
> ###   column,GenebiotypeFilter,missing,missing-method
> ###   column,TxidFilter,missing,missing-method
> ###   column,TxbiotypeFilter,missing,missing-method
> ###   column,ExonidFilter,missing,missing-method
> ###   column,ExonrankFilter,missing,missing-method
> ###   column,SeqnameFilter,missing,missing-method
> ###   column,SeqstrandFilter,missing,missing-method
> ###   column,SeqstartFilter,missing,missing-method
> ###   column,SeqendFilter,missing,missing-method
> ###   column,GRangesFilter,missing,missing-method
> ###   where,EntrezidFilter,missing,missing-method
> ###   where,GeneidFilter,missing,missing-method
> ###   where,GenenameFilter,missing,missing-method
> ###   where,GenebiotypeFilter,missing,missing-method
> ###   where,TxidFilter,missing,missing-method
> ###   where,TxbiotypeFilter,missing,missing-method
> ###   where,ExonidFilter,missing,missing-method
> ###   where,ExonrankFilter,missing,missing-method
> ###   where,SeqnameFilter,missing,missing-method
> ###   where,SeqstrandFilter,missing,missing-method
> ###   where,SeqstartFilter,missing,missing-method
> ###   where,SeqendFilter,missing,missing-method
> ###   where,GRangesFilter,missing,missing-method
> ###   column,EntrezidFilter,EnsDb,missing-method
> ###   column,GeneidFilter,EnsDb,missing-method
> ###   column,GenenameFilter,EnsDb,missing-method
> ###   column,GenebiotypeFilter,EnsDb,missing-method
> ###   column,TxidFilter,EnsDb,missing-method
> ###   column,TxbiotypeFilter,EnsDb,missing-method
> ###   column,ExonidFilter,EnsDb,missing-method
> ###   column,ExonrankFilter,EnsDb,missing-method
> ###   column,SeqnameFilter,EnsDb,missing-method
> ###   column,SeqstrandFilter,EnsDb,missing-method
> ###   column,SeqstartFilter,EnsDb,missing-method
> ###   column,SeqendFilter,EnsDb,missing-method
> ###   column,GRangesFilter,EnsDb,missing-method
> ###   where,EntrezidFilter,EnsDb,missing-method
> ###   where,GeneidFilter,EnsDb,missing-method
> ###   where,GenenameFilter,EnsDb,missing-method
> ###   where,GenebiotypeFilter,EnsDb,missing-method
> ###   where,TxidFilter,EnsDb,missing-method
> ###   where,TxbiotypeFilter,EnsDb,missing-method
> ###   where,ExonidFilter,EnsDb,missing-method
> ###   where,ExonrankFilter,EnsDb,missing-method
> ###   where,SeqnameFilter,EnsDb,missing-method
> ###   where,SeqstrandFilter,EnsDb,missing-method
> ###   where,SeqstartFilter,EnsDb,missing-method
> ###   where,SeqendFilter,EnsDb,missing-method
> ###   where,GRangesFilter,EnsDb,missing-method
> ###   column,EntrezidFilter,EnsDb,character-method
> ###   column,GeneidFilter,EnsDb,character-method
> ###   column,GenenameFilter,EnsDb,character-method
> ###   column,GenebiotypeFilter,EnsDb,character-method
> ###   column,TxidFilter,EnsDb,character-method
> ###   column,TxbiotypeFilter,EnsDb,character-method
> ###   column,ExonidFilter,EnsDb,character-method
> ###   column,ExonrankFilter,EnsDb,character-method
> ###   column,SeqnameFilter,EnsDb,character-method
> ###   column,SeqstrandFilter,EnsDb,character-method
> ###   column,SeqstartFilter,EnsDb,character-method
> ###   column,SeqendFilter,EnsDb,character-method
> ###   column,GRangesFilter,EnsDb,character-method
> ###   where,EntrezidFilter,EnsDb,character-method
> ###   where,GeneidFilter,EnsDb,character-method
> ###   where,GenenameFilter,EnsDb,character-method
> ###   where,GenebiotypeFilter,EnsDb,character-method
> ###   where,TxidFilter,EnsDb,character-method
> ###   where,TxbiotypeFilter,EnsDb,character-method
> ###   where,ExonidFilter,EnsDb,character-method
> ###   where,ExonrankFilter,EnsDb,character-method
> ###   where,SeqnameFilter,EnsDb,character-method
> ###   where,SeqstrandFilter,EnsDb,character-method
> ###   where,SeqstartFilter,EnsDb,character-method
> ###   where,SeqendFilter,EnsDb,character-method
> ###   where,GRangesFilter,EnsDb,character-method
> ###   condition,BasicFilter-method condition<-,BasicFilter-method
> ###   condition<- condition,GRangesFilter-method
> ###   condition<-,GRangesFilter-method show,BasicFilter-method
> ###   show,GRangesFilter-method print,BasicFilter-method
> ###   where,BasicFilter,missing,missing-method
> ###   where,BasicFilter,EnsDb,missing-method
> ###   where,BasicFilter,EnsDb,character-method
> ###   where,list,EnsDb,character-method where,list,EnsDb,missing-method
> ###   where,list,missing,missing-method value,BasicFilter,missing-method
> ###   value<- value<-,BasicFilter-method value<-,ExonrankFilter-method
> ###   value,BasicFilter,EnsDb-method value,GRangesFilter,missing-method
> ###   value,GRangesFilter,EnsDb-method value,SeqnameFilter,EnsDb-method
> ###   condition value column where end,GRangesFilter-method
> ###   seqlevels,GRangesFilter-method seqnames,GRangesFilter-method
> ###   start,GRangesFilter-method strand,GRangesFilter-method
> ### Keywords: classes
> 
> ### ** Examples
> 
> 
> ## create a filter that could be used to retrieve all informations for
> ## the respective gene.
> Gif <- GeneidFilter("ENSG00000012817")
> Gif
|  GeneidFilter 
| condition:  = 
| value:  ENSG00000012817 
> ## returns the where condition of the SQL querys
> where(Gif)
[1] "gene_id = 'ENSG00000012817'"
> 
> ## create a filter for a chromosomal end position of a gene
> Sef <- SeqendFilter(10000, condition=">", "gene")
> Sef
|  SeqendFilter 
| condition:  > 
| value:  10000 
> 
> ## for additional examples see the help page of "genes"
> 
> 
> ## Example for GRangesFilter:
> ## retrieve all genes overlapping the specified region
> grf <- GRangesFilter(GRanges("11", ranges=IRanges(114000000, 114000050),
+                              strand="+"), condition="overlapping")
> library(EnsDb.Hsapiens.v75)
> edb <- EnsDb.Hsapiens.v75
> genes(edb, filter=grf)
GRanges object with 1 range and 5 metadata columns:
                  seqnames                 ranges strand |         gene_id
                     <Rle>              <IRanges>  <Rle> |     <character>
  ENSG00000109906       11 [113930315, 114121398]      + | ENSG00000109906
                    gene_name    entrezid   gene_biotype seq_coord_system
                  <character> <character>    <character>      <character>
  ENSG00000109906      ZBTB16        7704 protein_coding       chromosome
  -------
  seqinfo: 1 sequence from GRCh37 genome
> 
> ## Get also all transcripts overlapping that region
> transcripts(edb, filter=grf)
GRanges object with 4 ranges and 5 metadata columns:
                  seqnames                 ranges strand |           tx_id
                     <Rle>              <IRanges>  <Rle> |     <character>
  ENST00000335953       11 [113930315, 114121398]      + | ENST00000335953
  ENST00000541602       11 [113930447, 114060486]      + | ENST00000541602
  ENST00000392996       11 [113931229, 114121374]      + | ENST00000392996
  ENST00000539918       11 [113935134, 114118066]      + | ENST00000539918
                               tx_biotype tx_cds_seq_start tx_cds_seq_end
                              <character>        <numeric>      <numeric>
  ENST00000335953          protein_coding        113934023      114121277
  ENST00000541602         retained_intron             <NA>           <NA>
  ENST00000392996          protein_coding        113934023      114121277
  ENST00000539918 nonsense_mediated_decay        113935134      113992549
                          gene_id
                      <character>
  ENST00000335953 ENSG00000109906
  ENST00000541602 ENSG00000109906
  ENST00000392996 ENSG00000109906
  ENST00000539918 ENSG00000109906
  -------
  seqinfo: 1 sequence from GRCh37 genome
> 
> ## Retrieve all transcripts for the above gene
> gn <- genes(edb, filter=grf)
> txs <- transcripts(edb, filter=GenenameFilter(gn$gene_name))
> ## Next we simply plot their start and end coordinates.
> plot(3, 3, pch=NA, xlim=c(start(gn), end(gn)), ylim=c(0, length(txs)), yaxt="n", ylab="")
> ## Highlight the GRangesFilter region
> rect(xleft=start(grf), xright=end(grf), ybottom=0, ytop=length(txs), col="red", border="red")
> for(i in 1:length(txs)){
+     current <- txs[i]
+     rect(xleft=start(current), xright=end(current), ybottom=i-0.975, ytop=i-0.125, border="grey")
+     text(start(current), y=i-0.5,pos=4, cex=0.75, labels=current$tx_id)
+ }
> ## Thus, we can see that only 4 transcripts of that gene are indeed overlapping the region.
> 
> 
> ## No exon is overlapping that region, thus we're not getting anything
> exons(edb, filter=grf)
GRanges object with 0 ranges and 1 metadata column:
   seqnames    ranges strand |     exon_id
      <Rle> <IRanges>  <Rle> | <character>
  -------
  seqinfo: no sequences
> 
> 
> ## Example for ExonrankFilter
> ## Extract all exons 1 and (if present) 2 for all genes encoded on the
> ## Y chromosome
> exons(edb, columns=c("tx_id", "exon_idx"),
+       filter=list(SeqnameFilter("Y"),
+                   ExonrankFilter(3, condition="<")))
GRanges object with 1287 ranges and 3 metadata columns:
                  seqnames               ranges strand |         exon_id
                     <Rle>            <IRanges>  <Rle> |     <character>
  ENSE00002088309        Y   [2652790, 2652894]      + | ENSE00002088309
  ENSE00001494622        Y   [2654896, 2655740]      - | ENSE00001494622
  ENSE00002323146        Y   [2655049, 2655069]      - | ENSE00002323146
  ENSE00002201849        Y   [2655075, 2655644]      - | ENSE00002201849
  ENSE00002214525        Y   [2655145, 2655168]      - | ENSE00002214525
              ...      ...                  ...    ... .             ...
  ENSE00001632993        Y [28737695, 28737748]      - | ENSE00001632993
  ENSE00001616687        Y [28772667, 28773306]      - | ENSE00001616687
  ENSE00001638296        Y [28779492, 28779578]      - | ENSE00001638296
  ENSE00001797328        Y [28780670, 28780799]      - | ENSE00001797328
  ENSE00001794473        Y [59001391, 59001635]      + | ENSE00001794473
                            tx_id  exon_idx
                      <character> <integer>
  ENSE00002088309 ENST00000516032         1
  ENSE00001494622 ENST00000383070         1
  ENSE00002323146 ENST00000525526         2
  ENSE00002201849 ENST00000525526         1
  ENSE00002214525 ENST00000534739         2
              ...             ...       ...
  ENSE00001632993 ENST00000456738         1
  ENSE00001616687 ENST00000435741         1
  ENSE00001638296 ENST00000435945         2
  ENSE00001797328 ENST00000435945         1
  ENSE00001794473 ENST00000431853         1
  -------
  seqinfo: 1 sequence from GRCh37 genome
> 
> dev.off()
null device 
          1 
>