GSA.func

R Documentation

Gene set analysis without permutations

Description

Determines the significance of pre-defined sets of genes with respect to an outcome variable, such as a group indicator, quantitative variable or survival time. This is the basic function called by GSA.

Usage

GSA.func(x, y, genesets, genenames, geneset.names = NULL,
         method = c("maxmean", "mean", "absmean"),
         resp.type = c("Quantitative", "Two class unpaired", "Survival",
                       "Multiclass", "Two class paired", "tCorr", "taCorr"),
         censoring.status = NULL, first.time = TRUE, return.gene.ind = TRUE,
         ngenes = NULL, gs.mat = NULL, gs.ind = NULL,
         catalog = NULL, catalog.unique = NULL,
         s0 = NULL, s0.perc = NULL, minsize = 15, maxsize = 500,
         restand = TRUE, restand.basis = c("catalog", "data"))

Arguments

`x`	Data x: p by n matrix of features, one observation per column (missing values allowed)
`y`	Vector of response values: 1,2 for two class problem, or 1,2,3 ... for multiclass problem, or real numbers for quantitative or survival problems
`genesets`	Gene set collection (a list)
`genenames`	Vector of gene names in the expression dataset
`geneset.names`	Optional vector of gene set names
`method`	Method for summarizing a gene set: "maxmean" (default), "mean" or "absmean"
`resp.type`	Problem type: "Quantitative" for a continuous parameter; "Two class unpaired"; "Survival" for censored survival outcome; "Multiclass" for more than 2 groups; "Two class paired" for paired outcomes, coded -1,1 (first pair), -2,2 (second pair), etc.
`censoring.status`	Vector of censoring status values for survival problems: 1 means death or failure, 0 means censored
`first.time`	internal use
`return.gene.ind`	internal use
`ngenes`	internal use
`gs.mat`	internal use
`gs.ind`	internal use
`catalog`	internal use
`catalog.unique`	internal use
`s0`	Exchangeability factor for denominator of test statistic; default is automatic choice
`s0.perc`	Percentile of standard deviation values to use for s0; default is automatic choice. -1 means s0=0 (different from s0.perc=0, which means s0 = zeroth percentile of standard deviation values = minimum of sd values)
`minsize`	Minimum number of genes in genesets to be considered
`maxsize`	Maximum number of genes in genesets to be considered
`restand`	Should restandardization be done? Default TRUE
`restand.basis`	What should be used to do the restandardization? The set of genes in the genesets ("catalog", the default) or the genes in the data set ("data")
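
The interaction between `genesets`, `genenames`, `minsize` and `maxsize` can be illustrated with a small base R sketch. It assumes that a gene set's size is counted as the number of its genes that also appear in `genenames`; that counting rule and the toy set names (`small`, `medium`, `partial`) are assumptions made for illustration, not a statement of what the package does internally.

```r
# Toy data: 200 gene names and three hypothetical gene sets.
genenames <- paste("g", 1:200, sep = "")
genesets <- list(
  small   = paste("g", 1:5, sep = ""),                 # 5 genes, all in the data
  medium  = paste("g", 1:40, sep = ""),                # 40 genes, all in the data
  partial = c(paste("g", 1:20, sep = ""),              # 20 genes in the data ...
              paste("x", 1:10, sep = ""))              # ... plus 10 that are not
)

minsize <- 15
maxsize <- 500

# Assumed rule: count only the genes of each set that occur in the data.
n_present <- vapply(genesets, function(gs) sum(gs %in% genenames), integer(1))

# Sets whose (assumed) size falls inside [minsize, maxsize].
keep <- n_present >= minsize & n_present <= maxsize
print(data.frame(n_present = n_present, keep = keep))
```

Under this assumption the five-gene set would be dropped with the default `minsize = 15`, while the other two would be kept.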

Details

Carries out a gene set analysis, computing the gene set scores. This function does not perform any permutations for estimating false discovery rates; GSA calls this function when it estimates FDRs.
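
To make the three `method` options concrete, the following base R sketch summarizes a vector of individual gene scores for one gene set. It follows the maxmean idea described in the Efron and Tibshirani report (average the positive parts and the negative parts separately, then keep the larger in magnitude). The helper name `maxmean_score` and the toy scores `z` are hypothetical, and the sketch omits restandardization, so its output is not expected to match `GSA.func` numerically.

```r
# Hypothetical helper: maxmean-style summary of gene scores z for one gene set.
maxmean_score <- function(z) {
  pos <- mean(pmax(z, 0))     # average of the positive parts
  neg <- mean(pmax(-z, 0))    # average of the negative parts, as a magnitude
  if (pos >= neg) pos else -neg   # keep the larger one, with its sign
}

# Toy gene scores (e.g. t-statistics) for the genes in one set.
z <- c(2.1, 1.4, -0.3, 0.2, -0.1)

summary_scores <- c(
  maxmean = maxmean_score(z),
  mean    = mean(z),
  absmean = mean(abs(z))      # assumed meaning of "absmean"
)
print(summary_scores)
```

The plain mean lets positive and negative scores cancel, and the absolute mean ignores direction; the maxmean summary keeps the direction of the dominant side.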

Value

A list with components

`scores`	Gene set scores for each gene set
`norm.scores`	Gene set scores transformed by the inverse Gaussian cdf
`mean`	Means of gene expression values for each sample
`sd`	Standard deviation of gene expression values for each sample
`gene.ind`	List indicating which genes in each positive gene set had positive individual scores, and similarly for negative gene sets
`geneset.names`	Names of the gene sets
`nperms`	Number of permutations used
`gene.scores`	Individual gene scores (eg t-statistics for two class problem)
`s0`	Computed exchangeability factor
`s0.perc`	Computed percentile of standard deviation values
`stand.info`	Information computed for use in the restandardization process
`method`	Method used (from call to GSA.func)
`call`	The call to GSA
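
As a rough illustration of working with the returned list, the sketch below ranks gene sets by the magnitude of their score. The object `res` is a hand-made stand-in that only mimics the `scores` and `geneset.names` components listed above; its values are invented for illustration and are not output from `GSA.func`. Because this function performs no permutations, such a ranking says nothing about significance.

```r
# Hand-made stand-in for a GSA.func result (values are invented).
res <- list(
  scores        = c(0.9, -1.3, 0.2),
  geneset.names = c("set1", "set2", "set3")
)

# Order gene sets from the largest to the smallest absolute score.
ord <- order(abs(res$scores), decreasing = TRUE)
ranked <- data.frame(
  geneset = res$geneset.names[ord],
  score   = res$scores[ord],
  stringsAsFactors = FALSE
)
print(ranked)
```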

Author(s)

Robert Tibshirani

References

Efron, B. and Tibshirani, R. (2006). On testing the significance of sets of genes. Stanford technical report. http://www-stat.stanford.edu/~tibs/ftp/GSA.pdf

Examples


######### two class unpaired comparison
# y must take values 1,2

library(GSA)

set.seed(100)
# 1000 genes (rows) by 20 samples (columns)
x <- matrix(rnorm(1000 * 20), ncol = 20)
# choose 100 genes and shift their values in samples 11-20 (class 2)
dd <- sample(1:1000, size = 100)
u <- matrix(2 * rnorm(100), ncol = 10, nrow = 100)
x[dd, 11:20] <- x[dd, 11:20] + u
y <- c(rep(1, 10), rep(2, 10))

genenames <- paste("g", 1:1000, sep = "")

# create some random gene sets
genesets <- vector("list", 50)
for (i in 1:50) {
  genesets[[i]] <- paste("g", sample(1:1000, size = 30), sep = "")
}
geneset.names <- paste("set", as.character(1:50), sep = "")

GSA.func.obj <- GSA.func(x, y, genenames = genenames, genesets = genesets,
                         geneset.names = geneset.names,
                         resp.type = "Two class unpaired")

# to use a "real" gene set collection, we read it in from a gmt file:
#
#   geneset.obj <- GSA.read.gmt("file.gmt")
#
# where file.gmt is a gene set collection from the GSEA collection,
# from the website http://www-stat.stanford.edu/~tibs/GSA, or one
# that you have created yourself. Then
#
#   GSA.func.obj <- GSA.func(x, y, genenames = genenames,
#                            genesets = geneset.obj$genesets,
#                            resp.type = "Two class unpaired")
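
The "Two class paired" coding of `y` described under Arguments can be sketched in the same way. The example below simulates 10 pairs and builds `y` as -1,1,-2,2,...; the object names (`y_paired`, `x_paired`, `paired.obj`) and the interleaved column layout are assumptions chosen for illustration. The call to `GSA.func` is only attempted if the GSA package is installed.

```r
set.seed(101)
n_pairs <- 10

# Pair k is coded -k for one member and k for the other: -1, 1, -2, 2, ...
y_paired <- as.vector(rbind(-(1:n_pairs), 1:n_pairs))

# 1000 genes (rows) by 20 samples (columns), columns in the same order as y_paired
x_paired <- matrix(rnorm(1000 * 2 * n_pairs), ncol = 2 * n_pairs)

genenames <- paste("g", 1:1000, sep = "")

# 50 random gene sets of 30 genes each
genesets <- lapply(1:50, function(i) {
  paste("g", sample(1:1000, size = 30), sep = "")
})

# Run the analysis only when the GSA package is available.
if (requireNamespace("GSA", quietly = TRUE)) {
  paired.obj <- GSA::GSA.func(x_paired, y_paired, genenames = genenames,
                              genesets = genesets,
                              resp.type = "Two class paired")
}
```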
