Determines the significance of pre-defined sets of genes with respect to an outcome
variable, such as a group indicator, a quantitative variable or a survival time.
This is the basic function called by GSA.
x
Data matrix: p by n matrix of features, one observation per column (missing values allowed)
y
Vector of response values: 1,2 for a two class problem, or 1,2,3 ... for a multiclass
problem, or real numbers for quantitative or survival problems
genesets
Gene set collection (a list)
genenames
Vector of genenames in expression dataset
geneset.names
Optional vector of gene set names
method
Method for summarizing a gene set: "maxmean" (default), "mean" or "absmean"
resp.type
Problem type: "quantitative" for a continuous outcome; "Two class unpaired";
"Survival" for a censored survival outcome; "Multiclass" for more than 2 groups;
"Two class paired" for paired outcomes, coded -1,1 (first pair), -2,2 (second pair), etc.
censoring.status
Vector of censoring status values for survival problems
(1 means death or failure, 0 means censored)
first.time
internal use
return.gene.ind
internal use
ngenes
internal use
gs.mat
internal use
gs.ind
internal use
catalog
internal use
catalog.unique
internal use
s0
Exchangeability factor for the denominator of the test statistic; default is an automatic choice
s0.perc
Percentile of standard deviation values to use for s0; default is an
automatic choice. -1 means s0=0 (this differs from s0.perc=0, which means
s0 = zeroth percentile of the standard deviation values = minimum of the sd values)
minsize
Minimum number of genes in genesets to be considered
maxsize
Maximum number of genes in genesets to be considered
restand
Should restandardization be done? Default TRUE
restand.basis
What should be used to do the restandardization?
The set of genes in the genesets ("catalog", the default) or the
genes in the data set ("data")
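
The relationship between s0.perc and s0 described above can be illustrated with a minimal base R sketch. It assumes that s0.perc is given on a 0 to 100 scale and that s0 is added to the per-gene standard deviation in the denominator of the gene score; both points are assumptions, not confirmed by this page. The helper s0_from_perc and the simulated data are hypothetical and are not part of the GSA package.

```r
# Sketch only: how s0.perc could map to s0, ASSUMING a 0-100 percentile scale.
# s0_from_perc() is a hypothetical helper, not a GSA function.
s0_from_perc <- function(sd_values, s0.perc) {
  if (s0.perc == -1) {
    return(0)  # s0.perc = -1: no offset in the denominator
  }
  stopifnot(s0.perc >= 0, s0.perc <= 100)
  # s0.perc = 0 yields the minimum of the sd values, which is not zero
  unname(quantile(sd_values, probs = s0.perc / 100))
}

set.seed(1)
sd_values <- runif(200, min = 0.5, max = 2)  # made-up per-gene standard deviations
effect    <- rnorm(200)                      # made-up numerators (e.g. mean differences)

s0 <- s0_from_perc(sd_values, 50)
# ASSUMED form of the gene score: numerator / (sd + s0)
moderated_score <- effect / (sd_values + s0)
```

A larger s0 damps the scores of genes whose standard deviation is very small, which is why s0.perc = -1 and s0.perc = 0 are not interchangeable.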
Details
Carries out a gene set analysis, computing a score for each gene set.
This function does not perform any permutations, so it does not itself estimate false discovery rates.
GSA calls this function when estimating FDRs.
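
The three summaries accepted by the method argument can be sketched in base R as follows. This is a simplified illustration based on the definitions in the Efron and Tibshirani report, not the package's own code: it ignores restandardization, and the helper set_score is hypothetical.

```r
# Sketch only: summarizing the individual gene scores z of one gene set.
# set_score() is a hypothetical helper; restandardization is NOT applied here.
set_score <- function(z, method = c("maxmean", "mean", "absmean")) {
  method <- match.arg(method)
  switch(method,
    mean    = mean(z),
    absmean = mean(abs(z)),
    maxmean = {
      pos <- mean(pmax(z, 0))   # average of positive parts, over all genes in the set
      neg <- mean(pmax(-z, 0))  # average of negative parts, over all genes in the set
      if (pos >= neg) pos else -neg  # keep the larger one, with its sign
    }
  )
}

z <- c(2, -1, 0, 3)        # made-up gene scores for a set of four genes
set_score(z, "mean")
set_score(z, "absmean")
set_score(z, "maxmean")
```

Unlike "mean", the "maxmean" summary does not let positive and negative gene scores cancel each other, and unlike "absmean" it retains a direction.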
Value
A list with components
scores
Gene set scores for each gene set
norm.scores
Gene set scores transformed by the inverse Gaussian cdf
mean
Means of gene expression values for each sample
sd
Standard deviation of gene expression values for each sample
gene.ind
List indicating which genes in each positive gene set
had positive individual scores, and similarly for negative gene sets
geneset.names
Names of the gene sets
nperms
Number of permutations used
gene.scores
Individual gene scores (e.g. t-statistics for a two class problem)
s0
Computed exchangeability factor
s0.perc
Computed percentile of standard deviation values
stand.info
Information computed and used in the restandardization process
method
Method used (from call to GSA.func)
call
The call to GSA
Author(s)
Robert Tibshirani
References
Efron, B. and Tibshirani, R. (2006).
On testing the significance of sets of genes. Stanford technical report.
http://www-stat.stanford.edu/~tibs/ftp/GSA.pdf
Examples
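######### two class unpaired comparison
# y must take values 1,2
set.seed(100)
# expression matrix: 1000 genes (rows) by 20 samples (columns)
x <- matrix(rnorm(1000*20), ncol=20)
# pick 100 genes and shift them in samples 11-20 (class 2)
dd <- sample(1:1000, size=100)
# 100 gene-specific shifts, recycled so each gene gets the same shift in all 10 samples
u <- matrix(2*rnorm(100), ncol=10, nrow=100)
x[dd,11:20] <- x[dd,11:20] + u
y <- c(rep(1,10), rep(2,10))
genenames <- paste("g", 1:1000, sep="")
# create some random gene sets of 30 genes each
genesets <- vector("list", 50)
for(i in 1:50){
  genesets[[i]] <- paste("g", sample(1:1000, size=30), sep="")
}
geneset.names <- paste("set", as.character(1:50), sep="")
GSA.func.obj <- GSA.func(x, y, genenames=genenames, genesets=genesets, geneset.names=geneset.names, resp.type="Two class unpaired")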
# to use a "real" gene set collection, read it in from a gmt file:
#
# geneset.obj <- GSA.read.gmt("file.gmt")
#
# where file.gmt is a gene set collection from the GSEA collection,
# from the website http://www-stat.stanford.edu/~tibs/GSA, or one
# that you have created yourself. Then
# GSA.func.obj <- GSA.func(x, y, genenames=genenames, genesets=geneset.obj$genesets, resp.type="Two class unpaired")
#
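
For readers who want to build their own collection, the following base R sketch writes a tiny gmt file and parses it back into a list. It assumes the usual GMT layout (one gene set per line, tab-separated: set name, description, then gene names) and does not call GSA.read.gmt. The helper read_gmt_sketch and the component names other than genesets are assumptions made for illustration.

```r
# Sketch only: parse a GMT-style file with base R.
# read_gmt_sketch() is a hypothetical helper, not the package's GSA.read.gmt().
read_gmt_sketch <- function(path) {
  lines  <- readLines(path, warn = FALSE)
  lines  <- lines[nzchar(lines)]                 # drop empty lines
  fields <- strsplit(lines, "\t", fixed = TRUE)  # split each line on tabs
  list(
    geneset.names        = vapply(fields, function(f) f[1], character(1)),
    geneset.descriptions = vapply(fields, function(f) f[2], character(1)),
    genesets             = lapply(fields, function(f) f[-(1:2)])
  )
}

# write a two-set demo file, then read it back
gmt_file <- tempfile(fileext = ".gmt")
writeLines(c("set1\tdemo set one\tg1\tg2\tg3",
             "set2\tdemo set two\tg4\tg5"), gmt_file)
gs <- read_gmt_sketch(gmt_file)
str(gs$genesets)
```

The genesets component is a list of character vectors, which is the same shape as the randomly generated genesets object in the example above.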