A character string or number indicating the gene set under
consideration.
nf
The number of PCs used in the calculation of gene set scores.
The default is NA, which means using all the PCs in the mogsa object. This
should work in most cases.
barcol
The color of the bars, which is used to distinguish features/genes from
different datasets, so its length should be the same as the number of
data sets.
topN
A positive integer specifying the number of top influencers that should
be returned.
plot
A logical indicating whether the result should be plotted.
Fvalue
A logical indicating whether the GIS should be calculated in a supervised manner.
ff
A vector indicating the group of columns used for calculating the F-ratio when
Fvalue=TRUE.
cor
A logical indicating whether to use the correlation between the reconstructed expression and the GSS.
This is faster than the standard GIS.
Details
The importance of a single feature is evaluated in either a supervised
or an unsupervised manner.
In the unsupervised manner, the value is calculated by:
log2(var(GS_-i)/var(GS))
where GS is the gene set score and GS_-i is the gene set score
recalculated without the i'th feature. var() is the variance.
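The leave-one-out idea behind the unsupervised formula can be sketched on toy data. This is only an illustration under stated assumptions: the matrix `expr` and the helper `toy_score` are hypothetical, and the gene set score is replaced by a plain column sum rather than the projection-based score that mogsa computes.

```r
# Toy illustration only: the "gene set score" is a plain column sum over the
# features in the set, NOT the score computed by mogsa.
set.seed(1)
expr <- matrix(rnorm(5 * 8), nrow = 5,
               dimnames = list(paste0("gene", 1:5), paste0("sample", 1:8)))
expr["gene5", ] <- 2                   # a feature that does not vary across samples

toy_score <- function(m) colSums(m)    # hypothetical stand-in for the GS

gs_full <- toy_score(expr)

# For each feature i, recompute the score without it and compare variances.
unsup <- sapply(rownames(expr), function(g) {
  gs_minus_i <- toy_score(expr[rownames(expr) != g, , drop = FALSE])
  log2(var(gs_minus_i) / var(gs_full))
})
unsup
```

In this sketch, removing the constant feature `gene5` leaves the variance of the score unchanged, so its value is zero up to rounding; features whose removal changes the variance receive non-zero values.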
In the supervised manner, the value is calculated as the F-ratio over
a class vector:
log2(F(GS_-i)/F(GS))
where F() is the calculation of the F-ratio. The unsupervised GIS is recommended,
since it works better in most cases in practice.
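The supervised variant can be sketched in the same way. The example below is an assumption-laden illustration, not the package's implementation: `toy_score`, `f_ratio`, `expr` and `groups` are hypothetical names, the gene set score is again a plain column sum, and the F-ratio is taken from a one-way ANOVA via `anova(lm(...))`.

```r
# Toy illustration only: column sums stand in for the gene set score, and the
# F-ratio is taken from a one-way ANOVA of the score over a class vector.
set.seed(2)
groups <- factor(rep(c("A", "B"), each = 4))
expr <- matrix(rnorm(4 * 8), nrow = 4,
               dimnames = list(paste0("gene", 1:4), paste0("sample", 1:8)))
# Shift gene1 in group B so that it carries most of the class signal.
expr["gene1", ] <- expr["gene1", ] + ifelse(groups == "B", 3, 0)

toy_score <- function(m) colSums(m)    # hypothetical stand-in for the GS
f_ratio <- function(score, cls) anova(lm(score ~ cls))[["F value"]][1]

f_full <- f_ratio(toy_score(expr), groups)

# For each feature i, recompute the score without it and compare F-ratios.
sup <- sapply(rownames(expr), function(g) {
  gs_minus_i <- toy_score(expr[rownames(expr) != g, , drop = FALSE])
  log2(f_ratio(gs_minus_i, groups) / f_full)
})
sup
```

The class vector `groups` plays the role of the ff argument: one entry per column, giving the group that column belongs to.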
Value
An object of class data.frame containing three columns. The first column is the feature name,
the second column is the gene influential score, and the third column indicates the data set
from which the feature/gene is selected.
R version 3.3.1 (2016-06-21) -- "Bug in Your Hair"
Copyright (C) 2016 The R Foundation for Statistical Computing
Platform: x86_64-pc-linux-gnu (64-bit)
> library(mogsa)
> png(filename="/home/ddbj/snapshot/RGM3/R_BC/result/mogsa/GIS.Rd_%03d_medium.png", width=480, height=480)
> ### Name: GIS
> ### Title: calculate gene influential scores of genes in a gene set.
> ### Aliases: GIS
>
> ### ** Examples
>
> # library(mogsa)
> # loading gene expression data and supplementary data
> data(NCI60_4array_supdata)
> data(NCI60_4arrays)
> mgsa <- mogsa(x = NCI60_4arrays, sup=NCI60_4array_supdata, nf=9,
+ proc.row = "center_ssq1", w.data = "inertia", statis = TRUE)
> allgs <- colnames(NCI60_4array_supdata[[1]])
>
> # unsupervised measurement
> GIS(mgsa, allgs[1], topN = 5)
  feature      GIS     data
1   MYO1C 1.012768   hgu133
2   PALLD 1.012174 hgu133p2
3  RETSAT 1.011508   hgu133
4    ANLN 1.011359   hgu133
5   DERL2 1.011304  agilent
>
> # supervised measurement
> tissueType <- as.factor(sapply(strsplit(colnames(NCI60_4arrays$agilent), split="\\."), "[", 1))
> GIS(mgsa, allgs[1], topN = 5, Fvalue = TRUE, ff = tissueType)
  feature       GIS     data
1  ANAPC4 1.0000000 hgu133p2
2  REPIN1 0.9753993 hgu133p2
3  GTF3C5 0.9162151  agilent
4    IMP3 0.8400354 hgu133p2
5   H2AFY 0.8388857    hgu95
> # use more PCs in the calculation
> GIS(mgsa, allgs[1], nf = 20, topN = 5, Fvalue = TRUE, ff = tissueType)
  feature       GIS     data
1  REPIN1 1.0000000 hgu133p2
2  GTF3C5 0.8995660  agilent
3    ACN9 0.8842652   hgu133
4    IMP3 0.8695248 hgu133p2
5  DNAJA3 0.8693984   hgu133
>
> dev.off()
null device
1
>