Last data update: 2014.03.03

R: Retrieving the Bgee database data
Bgee-class    R Documentation

Retrieving the Bgee database data

Description

A Reference Class that retrieves the annotation and expression data available in Bgee for a given species and data type (rna_seq or affymetrix).

Details

The expression calls come from Bgee (http://r.bgee.org), which integrates different expression data types (RNA-seq, Affymetrix microarray, ESTs, and in situ hybridizations) in multiple animal species. Expression patterns are based exclusively on curated "normal", healthy expression data (e.g., no gene knock-out, no treatment, no disease), to provide a reference of normal gene expression. This class retrieves the annotation of all experiments in the Bgee database (get_annotation), downloads the data (get_data), and formats the data into an expression matrix (format_data). See the examples and the vignette.

Value

  • get_annotation() returns a list with the annotation of the experiments for the chosen species.

  • get_data() returns, for the chosen species, a list of experiments if no experiment ID is given, or the data frame of the chosen experiment if an experiment ID is given.

  • format_data() transforms the data into a matrix of expression values, e.g. RPKMs or raw counts.

Fields

species

A character string giving the species name as listed in Bgee. The available species are:

  • "Anolis_carolinensis"

  • "Bos_taurus"

  • "Caenorhabditis_elegans"

  • "Danio_rerio"

  • "Drosophila_melanogaster"

  • "Gallus_gallus"

  • "Gorilla_gorilla"

  • "Homo_sapiens"

  • "Macaca_mulatta"

  • "Monodelphis_domestica"

  • "Mus_musculus"

  • "Ornithorhynchus_anatinus"

  • "Pan_paniscus"

  • "Pan_troglodytes"

  • "Rattus_norvegicus"

  • "Sus_scrofa"

  • "Xenopus_tropicalis"

Homo sapiens is the default species.

datatype

A character string giving the data platform. Two types of datasets can be downloaded:

  • "rna_seq"

  • "affymetrix"

By default, RNA-seq data are retrieved from the database.
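
As an illustration of the species and datatype fields, the following minimal sketch checks both values against the lists given on this page before a Bgee object is created. The helper check_bgee_args() is hypothetical and not part of BgeeDB. The package itself may validate its arguments differently, or not at all.

```r
# Species and data types exactly as listed on this help page.
bgee_species <- c("Anolis_carolinensis", "Bos_taurus", "Caenorhabditis_elegans",
                  "Danio_rerio", "Drosophila_melanogaster", "Gallus_gallus",
                  "Gorilla_gorilla", "Homo_sapiens", "Macaca_mulatta",
                  "Monodelphis_domestica", "Mus_musculus",
                  "Ornithorhynchus_anatinus", "Pan_paniscus", "Pan_troglodytes",
                  "Rattus_norvegicus", "Sus_scrofa", "Xenopus_tropicalis")
bgee_datatypes <- c("rna_seq", "affymetrix")

# Hypothetical helper (not part of BgeeDB): stop early on an unknown value.
# The defaults mirror the documented ones (Homo sapiens, RNA-seq).
check_bgee_args <- function(species = "Homo_sapiens", datatype = "rna_seq") {
  if (!is.character(species) || length(species) != 1L ||
      !species %in% bgee_species) {
    stop("Unknown species: ", species, call. = FALSE)
  }
  if (!is.character(datatype) || length(datatype) != 1L ||
      !datatype %in% bgee_datatypes) {
    stop("Unknown datatype: ", datatype, call. = FALSE)
  }
  list(species = species, datatype = datatype)
}

args <- check_bgee_args("Mus_musculus", "rna_seq")
```

The species names are case-sensitive and use an underscore, so a value such as "mus musculus" is rejected by this check.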

experiment.id

A character string. The default is NULL, which takes all available data for that species. If an ID matching GSE[0-9]+ is given (e.g. GSE30617), only the specified experiment is taken.
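
The GSE[0-9]+ pattern can be checked with a regular expression before any download is attempted. The sketch below is only an illustration: is_valid_experiment_id() is a hypothetical helper, not a BgeeDB function, and it assumes that the whole string must match the pattern.

```r
# Hypothetical helper (not part of BgeeDB).
# NULL is accepted because it means "all experiments for the species".
is_valid_experiment_id <- function(experiment.id) {
  if (is.null(experiment.id)) {
    return(TRUE)
  }
  is.character(experiment.id) && length(experiment.id) == 1L &&
    grepl("^GSE[0-9]+$", experiment.id)
}

is_valid_experiment_id(NULL)
is_valid_experiment_id("GSE30617")
is_valid_experiment_id("gse30617")
```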

data

A data frame of downloaded Bgee data.

calltype

A character string. There are two types of expression calls in Bgee, present and absent. The accepted values are:

  • "expressed"

  • "all"

Users can retrieve only the expressed (present) calls with "expressed", or both present and absent calls with "all". The default is "expressed".
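
The difference between the two calltype values can be shown on a toy table. This is a sketch under stated assumptions: the column name detection_flag and the helper filter_calls() are hypothetical, chosen for illustration, and need not match the columns of the downloaded Bgee files or the internals of format_data.

```r
# Toy data; the column names are hypothetical, not the real Bgee layout.
toy <- data.frame(
  gene_id        = c("g1", "g2", "g3"),
  detection_flag = c("present", "absent", "present"),
  stringsAsFactors = FALSE
)

# Hypothetical helper (not part of BgeeDB).
# "expressed" keeps only the present calls; "all" keeps every row.
filter_calls <- function(df, calltype = c("expressed", "all")) {
  calltype <- match.arg(calltype)
  if (calltype == "expressed") {
    df[df$detection_flag == "present", , drop = FALSE]
  } else {
    df
  }
}

expressed_only <- filter_calls(toy)          # default: "expressed"
all_calls      <- filter_calls(toy, "all")
```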

stats

A character string. The expression values can be retrieved as RPKMs or as raw counts:

  • "rpkm"

  • "counts"

The default is RPKMs.
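
To make the role of stats concrete, the sketch below reshapes a long toy table into a gene-by-library matrix, using either the RPKM or the count column. The column names (gene_id, library_id, rpkm, counts) and the helper to_matrix() are hypothetical. This shows the general idea of building an expression matrix, not the actual implementation of format_data.

```r
# Toy long-format data; the column names are hypothetical.
toy <- data.frame(
  gene_id    = c("g1", "g2", "g1", "g2"),
  library_id = c("libA", "libA", "libB", "libB"),
  rpkm       = c(1.5, 0.0, 2.5, 4.0),
  counts     = c(15L, 0L, 30L, 48L),
  stringsAsFactors = FALSE
)

# Hypothetical helper (not part of BgeeDB): one row per gene, one column
# per library, filled with the chosen statistic.
to_matrix <- function(df, stats = c("rpkm", "counts")) {
  stats <- match.arg(stats)
  genes <- unique(df$gene_id)
  libs  <- unique(df$library_id)
  m <- matrix(NA_real_, nrow = length(genes), ncol = length(libs),
              dimnames = list(genes, libs))
  # A two-column character matrix indexes cells by row and column name.
  m[cbind(df$gene_id, df$library_id)] <- df[[stats]]
  m
}

rpkm_matrix   <- to_matrix(toy)             # default: "rpkm"
counts_matrix <- to_matrix(toy, "counts")
```

Gene and library pairs that are missing from the input stay NA in the resulting matrix.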

Author(s)

Andrea Komljenovic andrea.komljenovic at unil.ch.

Examples

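{
 library(BgeeDB)
 # Create a Bgee object for mouse RNA-seq data
 bgee <- Bgee$new(species = "Mus_musculus", datatype = "rna_seq")
 # List the annotation of the mouse RNA-seq experiments
 annotation_bgee_mouse <- bgee$get_annotation()
 # Download all available experiments for the species
 data_bgee_mouse <- bgee$get_data()
 # Download a single experiment by its ID
 data_bgee_mouse_gse30617 <- bgee$get_data(experiment.id = "GSE30617")
 # Format the downloaded data into a matrix of RPKMs for expressed calls
 gene.expression.mouse.rpkm <- bgee$format_data(data_bgee_mouse_gse30617,
                                                calltype = "expressed", stats = "rpkm")
}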


Results


R version 3.3.1 (2016-06-21) -- "Bug in Your Hair"
Copyright (C) 2016 The R Foundation for Statistical Computing
Platform: x86_64-pc-linux-gnu (64-bit)

R is free software and comes with ABSOLUTELY NO WARRANTY.
You are welcome to redistribute it under certain conditions.
Type 'license()' or 'licence()' for distribution details.

R is a collaborative project with many contributors.
Type 'contributors()' for more information and
'citation()' on how to cite R or R packages in publications.

Type 'demo()' for some demos, 'help()' for on-line help, or
'help.start()' for an HTML browser interface to help.
Type 'q()' to quit R.

> library(BgeeDB)
Loading required package: topGO
Loading required package: BiocGenerics
Loading required package: parallel

Attaching package: 'BiocGenerics'

The following objects are masked from 'package:parallel':

    clusterApply, clusterApplyLB, clusterCall, clusterEvalQ,
    clusterExport, clusterMap, parApply, parCapply, parLapply,
    parLapplyLB, parRapply, parSapply, parSapplyLB

The following objects are masked from 'package:stats':

    IQR, mad, xtabs

The following objects are masked from 'package:base':

    Filter, Find, Map, Position, Reduce, anyDuplicated, append,
    as.data.frame, cbind, colnames, do.call, duplicated, eval, evalq,
    get, grep, grepl, intersect, is.unsorted, lapply, lengths, mapply,
    match, mget, order, paste, pmax, pmax.int, pmin, pmin.int, rank,
    rbind, rownames, sapply, setdiff, sort, table, tapply, union,
    unique, unsplit

Loading required package: graph
Loading required package: Biobase
Welcome to Bioconductor

    Vignettes contain introductory material; view with
    'browseVignettes()'. To cite Bioconductor, see
    'citation("Biobase")', and for packages 'citation("pkgname")'.

Loading required package: GO.db
Loading required package: AnnotationDbi
Loading required package: stats4
Loading required package: IRanges
Loading required package: S4Vectors

Attaching package: 'S4Vectors'

The following objects are masked from 'package:base':

    colMeans, colSums, expand.grid, rowMeans, rowSums


Loading required package: SparseM

Attaching package: 'SparseM'

The following object is masked from 'package:base':

    backsolve


groupGOTerms: 	GOBPTerm, GOMFTerm, GOCCTerm environments built.

Attaching package: 'topGO'

The following object is masked from 'package:IRanges':

    members

Loading required package: tidyr

Attaching package: 'tidyr'

The following object is masked from 'package:S4Vectors':

    expand

> png(filename="/home/ddbj/snapshot/RGM3/R_BC/result/BgeeDB/Bgee-class.Rd_%03d_medium.png", width=480, height=480)
> ### Name: Bgee-class
> ### Title: Retrieving the Bgee database data
> ### Aliases: Bgee Bgee-class
> 
> ### ** Examples
> 
> {
+  bgee <- Bgee$new(species = "Mus_musculus", datatype = "rna_seq")
+  annotation_bgee_mouse <- bgee$get_annotation()
+  data_bgee_mouse <- bgee$get_data()
+  data_bgee_mouse_gse30617 <- bgee$get_data(experiment.id = "GSE30617")
+  gene.expression.mouse.rpkm <- bgee$format_data(data_bgee_mouse_gse30617,
+  calltype = "expressed", stats = "rpkm")
+  }
Downloading annotation files...
trying URL 'ftp://ftp.bgee.org/current/download/processed_expr_values/rna_seq/Mus_musculus//Mus_musculus_RNA-Seq_experiments_libraries.zip'
ftp data connection made, file length 9292 bytes
==================================================
downloaded 9292 bytes

Saved annotation files in Mus_musculus folder.
The experiment is not defined. Hence taking all rna_seq available for Mus_musculus .
Downloading expression data...
trying URL 'ftp://ftp.bgee.org/current/download/processed_expr_values/rna_seq/Mus_musculus//Mus_musculus_RNA-Seq_read_counts_RPKM.zip'
ftp data connection made, file length 32872528 bytes
==================================================
downloaded 31.3 MB

Saved expression data file in Mus_musculus folder.
Unzipping file...
Read 1410444 rows and 13 (of 13) columns from 0.210 GB file in 00:00:03
Saving all data in .rds file...
Downloading expression data for the experiment GSE30617 
Downloading expression data...
trying URL 'ftp://ftp.bgee.org/current/download/processed_expr_values/rna_seq/Mus_musculus//Mus_musculus_RNA-Seq_read_counts_RPKM_GSE30617.tsv.zip'
ftp data connection made, file length 10651169 bytes
==================================================
downloaded 10.2 MB

Saved expression data file in Mus_musculus folder.
Unzipping file...
Error in fread(x) : 
  'input' must be a single character string containing a file name, a command, full path to a file, a URL starting 'http[s]://', 'ftp[s]://' or 'file://', or the input data itself
Calls: <Anonymous> -> lapply -> FUN -> as.data.frame -> fread
In addition: Warning message:
In FUN(X[[i]], ...) : error 1 in extracting from zip file
Execution halted
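
The transcript above ends with a zip extraction warning followed by an fread error about its input. One plausible reading, offered here as an assumption rather than a diagnosis, is that the extraction step produced no file path and the empty result was passed on to the reader. The sketch below shows a defensive pattern for that situation. read_zipped_table() is a hypothetical helper, not part of BgeeDB, and it uses base R's read.delim in place of fread to stay self-contained.

```r
# Hypothetical helper (not part of BgeeDB): extract a zip archive that is
# expected to hold a single tab-separated file, and fail with a clear
# message when extraction does not yield exactly one file.
read_zipped_table <- function(zip_path, exdir = tempdir()) {
  if (!file.exists(zip_path)) {
    stop("Zip file not found: ", zip_path, call. = FALSE)
  }
  # utils::unzip() returns the extracted paths invisibly. Any warning
  # raised during extraction is treated here as "nothing extracted".
  extracted <- tryCatch(
    utils::unzip(zip_path, exdir = exdir),
    warning = function(w) character(0)
  )
  if (length(extracted) != 1L || !file.exists(extracted)) {
    stop("Could not extract exactly one file from ", zip_path, call. = FALSE)
  }
  utils::read.delim(extracted, stringsAsFactors = FALSE)
}

# Usage on a file that is not a valid archive: the helper stops with its
# own message instead of handing an empty path to the reader.
bad_zip <- tempfile(fileext = ".zip")
writeLines("not a zip archive", bad_zip)
result <- try(read_zipped_table(bad_zip), silent = TRUE)
```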