This function will perform dual KS discriminant analysis on a
training set of gene expression data (in the form of an
ExpressionSet) and a vector of classes describing which of
(two or more) classes each column of data corresponds to. Genes
will be ranked based on the degree to which they are
upregulated or downregulated in each class, or both.
Discriminant gene signatures are then extracted using
dksSelectGenes and applied to new samples with dksClassify.
Usage
dksTrain(eset, class, type = "up", verbose = FALSE, weights = FALSE, logweights = TRUE, method = "kort")
Arguments
eset
Gene expression data in the form of an
ExpressionSet or matrix
class
A factor with two or more levels indicating which
class each sample in the expression set belongs to, or
an integer indicating which column of pData(eset)
contains this information.
type
One of "up", "down", or "both", indicating whether to
analyze and classify based on upregulated genes,
downregulated genes, or both. Note that classification of
samples based on downregulated genes from single-color
experiments should not be expected to work well because
of the noise at low expression levels. Therefore,
"down" or "both" should only be used for two-color
experiments or for one-color data that has been converted
to ratios based on some reference sample(s).
verbose
Set to TRUE if you want more evidence of progress
while data is being processed. Set to FALSE if you
want your CPU cycles to be used on analysis and not
printing messages.
weights
Value determines whether and how genes are weighted
when building the signatures. See details.
logweights
Should the weights be log10 transformed prior to applying?
method
Two methods are supported. The 'kort' method returns
the maximum of the running sum. The 'yang' method
returns the sum of the maximum and the minimum of the
running sum, thereby penalizing genes that are highly enriched
in a subset of samples of a given class, but highly
down regulated in another subset of that same class.
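The difference between the two methods can be illustrated with a small base R sketch. It is not the package's internal code: the step sizes of the running sum (+1/n_in for an in-class sample, -1/n_out otherwise) and the helper names running_sum, score_kort and score_yang are assumptions made for illustration only.

```r
# Toy running sum for one gene: samples are sorted by decreasing expression,
# and the sum steps up at each in-class sample and down at every other sample.
# The step sizes (+1/n_in, -1/n_out) are an assumption made for illustration.
running_sum <- function(expr, in_class) {
  ord  <- order(expr, decreasing = TRUE)
  hits <- in_class[ord]
  step <- ifelse(hits, 1 / sum(in_class), -1 / sum(!in_class))
  cumsum(step)
}

score_kort <- function(rs) max(rs)            # maximum of the running sum
score_yang <- function(rs) max(rs) + min(rs)  # maximum plus minimum

in_class <- c(TRUE, TRUE, TRUE, TRUE, FALSE, FALSE, FALSE, FALSE)

# Gene A: every in-class sample is high.
gene_a <- c(9, 8, 7, 6, 4, 3, 2, 1)
# Gene B: two in-class samples are high, the other two are the lowest of all.
gene_b <- c(9, 8, 1, 0.5, 6, 5, 4, 3)

rs_a <- running_sum(gene_a, in_class)
rs_b <- running_sum(gene_b, in_class)

c(kort = score_kort(rs_a), yang = score_yang(rs_a))  # 1.0 and 1
c(kort = score_kort(rs_b), yang = score_yang(rs_b))  # 0.5 and 0
```

In this toy case gene B still earns a positive score under the maximum-only rule, while the maximum-plus-minimum rule cancels it out because half of the class sits at the bottom of the ranking.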
Details
This function calculates the Kolmogorov-Smirnov rank sum statistic for
each gene and each level of 'class'. The highest scoring genes can
then be extracted for use in classification.
If weights=FALSE, signatures are defined based on the ranks of members
of each class when sorted on each gene. Those genes for which a given
class has the highest rank when sorting samples by those genes will
be included in the classifier, with no regard to the absolute expression
level of those genes. This is the classic KS statistic.
Very discriminant genes identified in this way may or may not be the
highest expressed genes. The result is that signatures identified
in this way have arbitrary "baseline" values. This may lead to
misclassification when comparing two signatures (using, for example,
dksClassify). Therefore, one may wish to weight genes
based on absolute expression level, or some other metric.
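The rank-only behavior described above can be seen in a short base R sketch. The two toy genes and the helper class_order are hypothetical; the point is only that a gene expressed at a low level and a gene expressed at a much higher level produce the same ordering of class members, so an unweighted rank statistic cannot tell them apart.

```r
# Class membership of six samples (TRUE = member of the class of interest).
in_class <- c(TRUE, FALSE, TRUE, FALSE, FALSE, TRUE)

# Two toy genes with the same pattern but very different absolute levels.
low_gene  <- c(5.1, 2.0, 4.7, 1.2, 3.3, 4.9)
high_gene <- low_gene * 100 + 1000

# Order of class membership after sorting samples by decreasing expression.
class_order <- function(expr) in_class[order(expr, decreasing = TRUE)]

class_order(low_gene)   # TRUE TRUE TRUE FALSE FALSE FALSE
class_order(high_gene)  # same ordering, despite the higher baseline

# Any statistic computed from this ordering alone is identical for both genes.
identical(class_order(low_gene), class_order(high_gene))  # TRUE
```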
Setting weights = TRUE causes the genes to be weighted according
to the log (base 10) of the relative rank of the mean expression of
each gene in each class. Alternatively, you may provide your own weight
matrix as the argument to weights. This matrix must have one
column for each possible value of class, and one row for each
gene in eset. Note that for type='down' or the down
component of type='both', the weight matrix will be inverted
as 1 - matrix, so the weights should range from 0 to 1 for each
class. NAs are handled "gracefully" by discarding any
genes for which any column of the corresponding row of weights
is NA. Our experience has been that weights that are a linear function
of some feature of the gene expression (like mean) can be too subtle. The
effect of the weights can be increased by setting logweights=TRUE
(which is the default).
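As a rough sketch of the shape a user-supplied weight matrix needs, the base R code below builds one from toy data. The toy expression matrix, the class labels, and the choice of a scaled rank of the class mean as the weight are all assumptions for illustration; this is not a reproduction of what weights=TRUE computes internally.

```r
# Toy data: 5 genes (rows) by 6 samples (columns), three classes of two samples.
set.seed(1)
expr <- matrix(rexp(30, rate = 0.1), nrow = 5,
               dimnames = list(paste0("gene", 1:5), paste0("s", 1:6)))
cls <- factor(rep(c("normal", "osteo", "rheumatoid"), each = 2))

# Mean expression of each gene within each class (genes x classes).
class_means <- sapply(levels(cls), function(k) {
  rowMeans(expr[, cls == k, drop = FALSE])
})

# Relative rank of each gene's mean within its class, scaled into (0, 1].
# One row per gene and one column per class, as the weights argument requires.
w <- apply(class_means, 2, function(m) rank(m) / length(m))

# For the 'down' component the weights are inverted as 1 - w, which is why
# they need to stay between 0 and 1.
w_down <- 1 - w

dim(w)       # 5 genes by 3 classes
colnames(w)  # "normal" "osteo" "rheumatoid"
```

A matrix built along these lines from the real expression data, with rows matching the genes in eset, could then be supplied as the weights argument to dksTrain.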
> library(dualKS)
> png(filename="/home/ddbj/snapshot/RGM3/R_BC/result/dualKS/dksTrain.Rd_%03d_medium.png", width=480, height=480)
> ### Name: dksTrain
> ### Title: Perform Dual KS Discriminant Analysis
> ### Aliases: dksTrain
> ### Keywords: classif
>
> ### ** Examples
>
> data("dks")
> tr <- dksTrain(eset, 1, "up")
> cl <- dksSelectGenes(tr, 100)
> pr <- dksClassify(eset, cl)
> summary(pr, pData(eset)[,1])
Dual KS Classification Summary:
Predicted class frequencies:
    normal      osteo rheumatoid
        11          0          4
Concordance rate (predicted==actual): 60 %
> show(pr)
     sample predicted class prediction score
1  GSM34379          normal         1024.367
2  GSM34383          normal         1073.083
3  GSM34385          normal         1116.797
4  GSM34388          normal            971.7
5  GSM34391          normal         1159.983
6  GSM34393          normal            592.5
7  GSM34394          normal          671.763
8  GSM34395          normal          610.143
9  GSM34396          normal           624.89
10 GSM34397          normal          604.087
11 GSM34398          normal          604.613
12 GSM34399      rheumatoid          599.083
13 GSM34400      rheumatoid          727.853
14 GSM34401      rheumatoid          606.457
15 GSM34402      rheumatoid           657.28
> plot(pr, actual=pData(eset)[,1])
> dev.off()
null device
1
>