Last data update: 2014.03.03

R: Generate Profiles
generateprofiles {DrugVsDisease}    R Documentation

Generate Profiles

Description

Processing Affymetrix data to generate ranked lists of differential gene expression and associated p-values.

Usage

generateprofiles(input = c("AE", "GEO", "localAE", "local"),
                 normalisation = c("rma", "mas5"),
                 accession = NULL, customfile = NULL,
                 celfilepath = NULL, sdrfpath = NULL,
                 case = c("disease", "drug"),
                 statistic = c("coef", "t", "diff"),
                 annotation = NULL, factorvalue = NULL,
                 annotationmap = NULL,
                 type = c("average", "medpolish", "maxvar", "max"),
                 outputgenedata = FALSE)
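Several arguments in the signature list their allowed values as a character vector. A minimal sketch of how such arguments are commonly resolved in R is given below. It assumes the choices are matched with match.arg(), which the source does not confirm, and choose_options is a hypothetical stand-in, not part of the package.

```r
# Hypothetical stand-in that mirrors two arguments of generateprofiles().
# Assumption: choices are resolved with match.arg(), so the first entry
# of each vector acts as the default.
choose_options <- function(input = c("AE", "GEO", "localAE", "local"),
                           statistic = c("coef", "t", "diff")) {
  input <- match.arg(input)
  statistic <- match.arg(statistic)
  c(input = input, statistic = statistic)
}

# No arguments supplied: the first choice of each vector is used.
defaults <- choose_options()

# Explicit choices must be one of the listed values.
custom <- choose_options(input = "GEO", statistic = "t")
```

Under this assumption, a value outside the listed choices stops with an error instead of being silently accepted.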

Arguments

input

Character string denoting the source of the data. One of AE (default), GEO, localAE or local.

normalisation

Character string denoting the normalisation procedure as implemented in the affy package. One of mas5 (default) or rma.

accession

Optional character string giving the database reference for use with either the AE or GEO options.

customfile

Optional character string giving the path of a file containing the factor values associated with the CEL files in the folder given by celfilepath.

celfilepath

Optional character string giving the path of a folder containing CEL files to analyse.

sdrfpath

Optional character string giving the path of an sdrf file corresponding to the CEL files in celfilepath.

case

Character string, one of disease (default) or drug denoting whether the input profiles are disease or drug profiles.

statistic

Character string, one of coef (default), t or diff.

annotation

Optional character string giving the platform of the Affymetrix files.

factorvalue

Optional character string giving the name of the factor value in the GEO database.

annotationmap

Optional matrix, or character string giving the path to a text file, containing an annotation map to convert from probes (first column) to HUGO gene symbols (second column). If a file path is passed, the text file should have exactly two columns, without row names or headers.

type

The type of statistic used to combine multiple probes into a single gene. One of average (default), the average of the expression values; medpolish, median polish; maxvar, the single probe in the set which has maximum variance; or max, the probe with maximal variance.

outputgenedata

Logical, FALSE by default. If TRUE, outputs the gene data produced by generateprofiles instead of the fitted coefficients from the linear models.
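As an illustration of the two forms accepted by the annotationmap argument, the sketch below builds a two-column matrix and writes it to a text file with no row names or headers. The three probe-to-symbol pairs are illustrative only, and the tab separator is an assumption, since the description above does not name a delimiter.

```r
# Illustrative map: probe identifiers in column 1, HUGO symbols in column 2.
annotationmap <- cbind(
  c("1007_s_at", "1053_at", "117_at"),
  c("DDR1", "RFC2", "HSPA6")
)

# Write the map without row names or headers, as the argument requires.
# Assumption: tab-separated text; the source does not specify a delimiter.
mapfile <- tempfile(fileext = ".txt")
write.table(annotationmap, file = mapfile, sep = "\t",
            quote = FALSE, row.names = FALSE, col.names = FALSE)

# Read the file back to confirm it holds exactly two unlabelled columns.
check <- read.table(mapfile, sep = "\t", header = FALSE,
                    stringsAsFactors = FALSE)
```

Either the matrix annotationmap or the path mapfile could then be supplied as the annotationmap argument.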

Details

Input types AE and GEO use raw data downloaded from ArrayExpress with the ArrayExpress package [1], or processed GDS files downloaded from GEO with the GEOquery package [2]. CEL files and sdrf files downloaded from ArrayExpress and stored locally can be processed using the localAE option, with the sdrf file path given in sdrfpath and the path of the folder containing the CEL files given in celfilepath. User data stored locally can be processed using the local option, with the CEL file folder in celfilepath and the factors associated with the CEL files in customfile. Where metadata are missing from the GEO database, the platform annotation can be specified using the annotation parameter, and the name of the main factor value (e.g. disease status or compound treatment) using the factorvalue option.

Raw CEL files are normalised (rma or mas5) [3] and the data are converted from probes to genes using BioMart annotations [4]. Linear models are fitted using the database factor values or, for locally stored data, the user-provided factors [5]. The differential expression is calculated for HUGO genes, with the mapping performed automatically using BioMart for the Affymetrix platforms HGU133A, HGU133Plus2 and HGU133A2. The differential expression statistic is one of coef (default), which corresponds to the log (base 2) fold change; diff, which is the difference between raw (non-logged) expression values; or t, the t-statistic based on log base 2 expression values.
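The probe-to-gene conversion mentioned above depends on the type argument. The sketch below shows, on a toy matrix, what three of the options could compute for the probes of a single gene. It is a generic base R illustration and not the package's internal code; the data and object names are made up.

```r
# Toy log2 expression values: 3 probes (rows) of one gene in 4 samples (columns).
probes <- matrix(c(7.1, 7.4, 8.0, 8.3,
                   6.9, 7.0, 7.9, 8.1,
                   5.0, 5.1, 5.0, 5.2),
                 nrow = 3, byrow = TRUE,
                 dimnames = list(c("probe1", "probe2", "probe3"),
                                 paste0("sample", 1:4)))

# average: mean of the probes within each sample.
gene_average <- colMeans(probes)

# medpolish: Tukey's median polish; the per-sample summary is the
# overall effect plus the column (sample) effect.
mp <- medpolish(probes, trace.iter = FALSE)
gene_medpolish <- mp$overall + mp$col

# maxvar: keep the single probe whose values vary most across samples.
probe_var <- apply(probes, 1, var)
gene_maxvar <- probes[which.max(probe_var), ]
```

Each result is one value per sample, so the three probes collapse to a single row for the gene.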

Value

List with two elements:

Ranklist

Matrix containing the ranks of gene expression. Rows correspond to genes and columns to the different profiles.

Pvalues

Matrix containing the p-values associated with the differential expression profiles in Ranklist.
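To make the shape of the return value concrete, the sketch below computes the three statistic options on toy data and assembles a list with Ranklist and Pvalues elements. It rests on several assumptions: an ordinary equal-variance t-test stands in for the package's linear model fit, rank 1 is given to the largest t-statistic, and the element names follow the spelling used above. All data and object names are made up.

```r
# Toy raw (non-logged) intensities: 3 genes (rows), 3 controls then 3 cases.
raw <- matrix(c(100, 120, 110, 400, 440, 480,
                200, 210, 190, 205, 195, 200,
                800, 760, 840, 210, 190, 200),
              nrow = 3, byrow = TRUE,
              dimnames = list(c("geneA", "geneB", "geneC"), NULL))
group <- factor(rep(c("control", "case"), each = 3),
                levels = c("control", "case"))
logged <- log2(raw)

# coef: difference in mean log2 expression (log2 fold change, case minus control).
coef_stat <- rowMeans(logged[, group == "case"]) -
  rowMeans(logged[, group == "control"])

# diff: difference in mean raw (non-logged) expression.
diff_stat <- rowMeans(raw[, group == "case"]) -
  rowMeans(raw[, group == "control"])

# t and p-value per gene from an ordinary t-test on the log2 values
# (a stand-in for the linear model fit, not the package's method).
tests <- apply(logged, 1, function(x) {
  tt <- t.test(x[group == "case"], x[group == "control"], var.equal = TRUE)
  c(t = unname(tt$statistic), p = tt$p.value)
})

# List shaped like the documented value: genes in rows, profiles in columns.
# Assumption: rank 1 is the gene with the largest t-statistic.
mock_profiles <- list(
  Ranklist = matrix(rank(-tests["t", ]), ncol = 1,
                    dimnames = list(rownames(raw), "case-control")),
  Pvalues  = matrix(tests["p", ], ncol = 1,
                    dimnames = list(rownames(raw), "case-control"))
)
```

With an object of this shape, mock_profiles$Ranklist[, 1] gives the ranks for the first profile and mock_profiles$Pvalues[, 1] the matching p-values.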

Author(s)

C. Pacini

References

[1] Kauffmann et al. (2009). Importing ArrayExpress datasets into R/Bioconductor. Bioinformatics, 25(16), 2092-2094.

[2] Davis et al. (2007). GEOquery: a bridge between the Gene Expression Omnibus (GEO) and BioConductor. Bioinformatics, 23(14), 1846-1847.

[3] Irizarry et al. (2003). Summaries of Affymetrix GeneChip probe level data. Nucleic Acids Research, 31(4), e15.

[4] Durinck et al. (2009). Mapping identifiers for the integration of genomic datasets with the R/Bioconductor package biomaRt. Nature Protocols, 4, 1184-1191.

[5] Smyth (2004). Linear models and empirical Bayes methods for assessing differential expression in microarray experiments. Statistical Applications in Genetics and Molecular Biology, 3(1), Article 3.

See Also

classifyprofile

Examples


profiles <- generateprofiles(input = "GEO", accession = "GDS2617",
                             case = "disease", statistic = "t",
                             annotation = "hgu133a")


Results


> library(DrugVsDisease)
Loading required package: affy
Loading required package: BiocGenerics
Loading required package: parallel

Attaching package: 'BiocGenerics'

The following objects are masked from 'package:parallel':

    clusterApply, clusterApplyLB, clusterCall, clusterEvalQ,
    clusterExport, clusterMap, parApply, parCapply, parLapply,
    parLapplyLB, parRapply, parSapply, parSapplyLB

The following objects are masked from 'package:stats':

    IQR, mad, xtabs

The following objects are masked from 'package:base':

    Filter, Find, Map, Position, Reduce, anyDuplicated, append,
    as.data.frame, cbind, colnames, do.call, duplicated, eval, evalq,
    get, grep, grepl, intersect, is.unsorted, lapply, lengths, mapply,
    match, mget, order, paste, pmax, pmax.int, pmin, pmin.int, rank,
    rbind, rownames, sapply, setdiff, sort, table, tapply, union,
    unique, unsplit

Loading required package: Biobase
Welcome to Bioconductor

    Vignettes contain introductory material; view with
    'browseVignettes()'. To cite Bioconductor, see
    'citation("Biobase")', and for packages 'citation("pkgname")'.

Loading required package: limma

Attaching package: 'limma'

The following object is masked from 'package:BiocGenerics':

    plotMA

Loading required package: biomaRt
Loading required package: ArrayExpress
Loading required package: GEOquery
Setting options('download.file.method.GEOquery'='auto')
Setting options('GEOquery.inmemory.gpl'=FALSE)
Loading required package: DrugVsDiseasedata
Loading required package: cMap2data

Attaching package: 'cMap2data'

The following object is masked from 'package:DrugVsDiseasedata':

    genelist

Loading required package: qvalue
> png(filename="/home/ddbj/snapshot/RGM3/R_BC/result/DrugVsDisease/generateprofiles.Rd_%03d_medium.png", width=480, height=480)
> ### Name: generateprofiles
> ### Title: Generate Profiles
> ### Aliases: generateprofiles
> ### Keywords: ~kwd1 ~kwd2
> 
> ### ** Examples
> 
> 
> profiles<-generateprofiles(input="GEO",accession="GDS2617",case="disease",statistic="t",annotation="hgu133a")
File stored at: 
./GDS2617.soft.gz
File stored at: 
/tmp/Rtmptyv6ZZ/GPL96.annot.gz
Note: Ensembl genes do not match genelist in reference data. Consider uploading pre-processed lists to classifyprofiles
[1] "Fitted linear models: "
[1] "factornon.tumorigenic.cancer.cell-factornormal:all"                 
[2] "factornon.tumorigenic.cancer.cell-factortumorigenic.cancer.cell:all"
[3] "factornormal-factortumorigenic.cancer.cell:all"                     
[4] "factornormal-factornon.tumorigenic.cancer.cell:all"                 
[5] "factortumorigenic.cancer.cell-factornon.tumorigenic.cancer.cell:all"
[6] "factortumorigenic.cancer.cell-factornormal:all"                     
There were 32 warnings (use warnings() to see them)
> 
> dev.off()
null device 
          1 
>