A Reference Class that provides the annotation and expression data available in Bgee for a given species and data type (rna_seq or affymetrix).
Details
The expression calls come from Bgee (http://r.bgee.org), which integrates different expression data types (RNA-seq, Affymetrix microarray, ESTs, and in situ hybridizations) in multiple animal species. Expression patterns are based exclusively on curated "normal", healthy expression data (e.g., no gene knock-out, no treatment, no disease), to provide a reference of normal gene expression.
This class retrieves the annotation of all experiments in the Bgee database (get_annotation), downloads the data (get_data), and formats the data into an expression matrix (format_data). See the examples and the vignette.
Value
get_annotation() returns a list with the annotation of the experiments for the chosen species.
get_data() returns, for the chosen species, a list of experiments if no experiment ID is given, or the data frame of the chosen experiment if an experiment ID is given.
format_data() transforms the data into a matrix of expression values, e.g. RPKMs or raw counts.
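To illustrate what "a matrix of expression values" means, the following is a minimal sketch in base R that pivots a long-format table into a genes-by-libraries matrix. It does not use BgeeDB: the toy data and the column names (gene, library, rpkm) are assumptions made for illustration and are not the actual Bgee column headers.

```r
# Toy long-format expression table.
# Column names are hypothetical, not Bgee's actual headers.
long <- data.frame(
  gene    = c("g1", "g2", "g1", "g2"),
  library = c("libA", "libA", "libB", "libB"),
  rpkm    = c(1.5, 0.0, 2.5, 4.0),
  stringsAsFactors = FALSE
)

# Pivot to a matrix with one row per gene and one column per library.
# Each (gene, library) pair occurs once here, so sum() just returns the value.
expr_matrix <- tapply(long$rpkm, list(long$gene, long$library), FUN = sum)

print(expr_matrix)
```

Rows are genes and columns are libraries, so expr_matrix["g1", "libB"] looks up a single expression value.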
Fields
species
A character string giving the species name as listed in Bgee. The available species are:
"Anolis_carolinensis"
"Bos_taurus"
"Caenorhabditis_elegans"
"Danio_rerio"
"Drosophila_melanogaster"
"Gallus_gallus"
"Gorilla_gorilla"
"Homo_sapiens"
"Macaca_mulatta"
"Monodelphis_domestica"
"Mus_musculus"
"Ornithorhynchus_anatinus"
"Pan_paniscus"
"Pan_troglodytes"
"Rattus_norvegicus"
"Sus_scrofa"
"Xenopus_tropicalis"
Homo sapiens is the default species.
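Because the species field only accepts the names listed above, it can be useful to check a name before starting a download. The sketch below is a standalone check in base R. The helper check_species is hypothetical and not part of BgeeDB, and the default "Homo_sapiens" assumes that the default species is spelled in the same underscore form as the list.

```r
# Species names copied from the list in this help page.
bgee_species <- c(
  "Anolis_carolinensis", "Bos_taurus", "Caenorhabditis_elegans",
  "Danio_rerio", "Drosophila_melanogaster", "Gallus_gallus",
  "Gorilla_gorilla", "Homo_sapiens", "Macaca_mulatta",
  "Monodelphis_domestica", "Mus_musculus", "Ornithorhynchus_anatinus",
  "Pan_paniscus", "Pan_troglodytes", "Rattus_norvegicus",
  "Sus_scrofa", "Xenopus_tropicalis"
)

# Hypothetical helper (not a BgeeDB function): returns the name unchanged
# if it is one of the listed species, and stops with a message otherwise.
check_species <- function(species = "Homo_sapiens") {
  ok <- is.character(species) && length(species) == 1L &&
    species %in% bgee_species
  if (!ok) {
    stop("Unknown species; expected one of: ",
         paste(bgee_species, collapse = ", "))
  }
  species
}

check_species("Mus_musculus")
```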
datatype
A character string giving the data platform. Two types of datasets can be downloaded:
"rna_seq"
"affymetrix"
By default, RNA-seq data are retrieved from the database.
experiment.id
A character string.
The default is NULL, which takes all available data for that species.
If of the form GSE[0-9]+, takes the specified experiment, e.g. GSE30617.
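The accepted forms of experiment.id described above can be expressed as a small check. This is a sketch in base R, not code from the package. The helper is_valid_experiment_id is hypothetical, and anchoring the pattern with ^ and $ so that the whole string must match is an assumption.

```r
# Hypothetical helper (not a BgeeDB function).
# Returns TRUE for NULL (meaning all experiments) or for a single
# string made of "GSE" followed by one or more digits.
is_valid_experiment_id <- function(experiment.id = NULL) {
  is.null(experiment.id) ||
    (is.character(experiment.id) &&
       length(experiment.id) == 1L &&
       grepl("^GSE[0-9]+$", experiment.id))
}

is_valid_experiment_id("GSE30617")
is_valid_experiment_id("GSE")
```

Under these assumptions the first call yields TRUE and the second FALSE, because "GSE" has no digits after the prefix.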
data
A data frame of downloaded Bgee data.
calltype
A character string. There are two types of expression calls in Bgee: present and absent.
"expressed"
"all"
The user can retrieve only expressed (present) calls, or mixed (present and absent) calls. The default calltype is expressed (present).
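The difference between the two calltype values can be pictured on a toy table. The sketch below is an illustration in base R and not the package's implementation. The helper filter_calls, the column name call, and the choice to drop absent rows (rather than, say, masking their values) are all assumptions.

```r
# Toy table with one call per row; the column names are hypothetical.
toy <- data.frame(
  gene = c("g1", "g2", "g3"),
  call = c("present", "absent", "present"),
  stringsAsFactors = FALSE
)

# Hypothetical helper (not a BgeeDB function).
# match.arg() picks the first choice, "expressed", when none is given,
# mirroring the default described above.
filter_calls <- function(df, calltype = c("expressed", "all")) {
  calltype <- match.arg(calltype)
  if (calltype == "expressed") {
    df[df$call == "present", , drop = FALSE]  # keep present calls only
  } else {
    df                                        # keep present and absent calls
  }
}

filter_calls(toy)          # default: present calls only
filter_calls(toy, "all")   # present and absent calls
```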
stats
A character string. The expression values can be retrieved as RPKMs or as raw counts.
R version 3.3.1 (2016-06-21) -- "Bug in Your Hair"
Platform: x86_64-pc-linux-gnu (64-bit)
> library(BgeeDB)
> ### Name: Bgee-class
> ### Title: Retrieving the Bgee database data
> ### Aliases: Bgee Bgee-class
>
> ### ** Examples
>
> {
+ bgee <- Bgee$new(species = "Mus_musculus", datatype = "rna_seq")
+ annotation_bgee_mouse <- bgee$get_annotation()
+ data_bgee_mouse <- bgee$get_data()
+ data_bgee_mouse_gse30617 <- bgee$get_data(experiment.id = "GSE30617")
+ gene.expression.mouse.rpkm <- bgee$format_data(data_bgee_mouse_gse30617,
+ calltype = "expressed", stats = "rpkm")
+ }
Downloading annotation files...
trying URL 'ftp://ftp.bgee.org/current/download/processed_expr_values/rna_seq/Mus_musculus//Mus_musculus_RNA-Seq_experiments_libraries.zip'
ftp data connection made, file length 9292 bytes
==================================================
downloaded 9292 bytes
Saved annotation files in Mus_musculus folder.
The experiment is not defined. Hence taking all rna_seq available for Mus_musculus .
Downloading expression data...
trying URL 'ftp://ftp.bgee.org/current/download/processed_expr_values/rna_seq/Mus_musculus//Mus_musculus_RNA-Seq_read_counts_RPKM.zip'
ftp data connection made, file length 32872528 bytes
==================================================
downloaded 31.3 MB
Saved expression data file in Mus_musculus folder.
Unzipping file...
Read 1410444 rows and 13 (of 13) columns from 0.210 GB file in 00:00:03
Saving all data in .rds file...
Downloading expression data for the experiment GSE30617
Downloading expression data...
trying URL 'ftp://ftp.bgee.org/current/download/processed_expr_values/rna_seq/Mus_musculus//Mus_musculus_RNA-Seq_read_counts_RPKM_GSE30617.tsv.zip'
ftp data connection made, file length 10651169 bytes
==================================================
downloaded 10.2 MB
Saved expression data file in Mus_musculus folder.
Unzipping file...
Error in fread(x) :
'input' must be a single character string containing a file name, a command, full path to a file, a URL starting 'http[s]://', 'ftp[s]://' or 'file://', or the input data itself
Calls: <Anonymous> -> lapply -> FUN -> as.data.frame -> fread
In addition: Warning message:
In FUN(X[[i]], ...) : error 1 in extracting from zip file
Execution halted
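The log above suggests that extraction of the downloaded zip file failed ("error 1 in extracting from zip file") and that the empty result was then passed on to fread. The following is a minimal sketch of a defensive extraction step that stops with a clear message before any reader function is called. The helper safe_unzip is hypothetical and not part of BgeeDB, and the expectation that an archive holds exactly one file is an assumption.

```r
# Hypothetical helper (not a BgeeDB function).
# Extracts a zip archive and returns the path of the single extracted file,
# stopping early with an explicit message if anything goes wrong.
safe_unzip <- function(zip_path, exdir = tempdir()) {
  if (!file.exists(zip_path)) {
    stop("Zip file not found: ", zip_path)
  }
  # utils::unzip() signals extraction problems as warnings;
  # turn them into errors so they cannot be overlooked.
  extracted <- tryCatch(
    utils::unzip(zip_path, exdir = exdir),
    warning = function(w) {
      stop("Extraction failed for ", zip_path, ": ", conditionMessage(w))
    }
  )
  # Assumption: one archive contains exactly one data file.
  if (length(extracted) != 1L || !file.exists(extracted)) {
    stop("Expected exactly one extracted file from ", zip_path)
  }
  extracted
}

# Example: a missing archive is reported before any parsing is attempted.
result <- tryCatch(
  safe_unzip(file.path(tempdir(), "does_not_exist.zip")),
  error = function(e) conditionMessage(e)
)
print(result)
```

With a check of this kind, the path returned by safe_unzip could be handed to a reader such as fread only after extraction is known to have succeeded.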