R Graphical Manual


Last data update: 2014.03.03

R: Perform Dual KS Discriminant Analysis

dksTrain

R Documentation

Perform Dual KS Discriminant Analysis

Description

This function performs dual KS discriminant analysis on a training set of gene expression data (an ExpressionSet or matrix) and a vector of classes indicating which of two or more classes each column (sample) of the data belongs to. Genes are ranked by the degree to which they are upregulated, downregulated, or both, in each class. Discriminant gene signatures are then extracted with dksSelectGenes and applied to new samples with dksClassify.

Usage

	dksTrain(eset, class, type = "up", verbose = FALSE, weights = FALSE, logweights = TRUE, method = "kort")

Arguments

`eset`	Gene expression data in the form of an `ExpressionSet` or `matrix`
`class`	A factor with two or more levels indicating the class to which each sample in the expression set belongs, or an integer indicating which column of pData(eset) contains this information.
`type`	One of "up", "down", or "both", indicating whether to analyze and classify samples based on upregulated genes, downregulated genes, or both. (Note that classification based on downregulated genes from single-color experiments should not be expected to work well, because of the noise at low expression levels. Therefore, "down" or "both" should only be used for two-color experiments, or for single-color data that have been converted to ratios against some reference sample(s).)
`verbose`	Set to TRUE if you want more evidence of progress while data is being processed. Set to FALSE if you want your CPU cycles to be used on analysis and not printing messages.
`weights`	Value determines whether and how genes are weighted when building the signatures. See details.
`logweights`	Should the weights be log10-transformed before they are applied?
`method`	Two methods are supported. The 'kort' method returns the maximum of the running sum. The 'yang' method returns the sum of the maximum and the minimum of the running sum, thereby penalizing genes that are highly enriched in a subset of the samples of a given class but strongly downregulated in another subset of that same class.
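To illustrate the difference between the two methods, here is a minimal R sketch of a running sum for one gene and one class. It assumes a GSEA-style walk with unweighted steps (+1/n for each class member, -1/m for every other sample) down the samples sorted by decreasing expression. The helpers ks_running_sum() and ks_score() are hypothetical and are not the dualKS internals.

```r
# Hypothetical helpers (not part of dualKS): unweighted KS running sum
# for one gene and one class, assuming GSEA-style step sizes.
ks_running_sum <- function(x, in_class) {
  hit <- in_class[order(x, decreasing = TRUE)]  # walk from highest to lowest expression
  n_in  <- sum(hit)                             # samples in the class
  n_out <- length(hit) - n_in                   # samples in all other classes
  # step up for a class member, down otherwise; the sum ends at 0
  cumsum(ifelse(hit, 1 / n_in, -1 / n_out))
}

ks_score <- function(x, in_class, method = c("kort", "yang")) {
  method <- match.arg(method)
  rs <- ks_running_sum(x, in_class)
  # 'kort': maximum only; 'yang': maximum plus the (non-positive) minimum
  if (method == "kort") max(rs) else max(rs) + min(rs)
}

x <- 6:1                                               # one gene, six samples
coherent <- c(TRUE, TRUE, FALSE, FALSE, FALSE, FALSE)  # class members both at the top
divided  <- c(TRUE, FALSE, FALSE, FALSE, FALSE, TRUE)  # one at the top, one at the bottom

ks_score(x, coherent, "kort")  # 1
ks_score(x, coherent, "yang")  # 1
ks_score(x, divided, "kort")   # 0.5
ks_score(x, divided, "yang")   # 0
```

For the class whose members sit at both ends of the ordering, 'kort' still reports a positive score, whereas 'yang' cancels it to zero.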

Details

This function calculates the Kolmogorov-Smirnov rank sum statistic for each gene and each level of 'class'. The highest scoring genes can then be extracted for use in classification.
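Applied to a whole expression matrix, the statistic yields one score per gene and per class level. The sketch below uses a hypothetical helper, ks_scores(). It assumes unweighted GSEA-style steps and handles type = "down" simply by sorting in increasing order. dualKS itself may compute the statistic differently.

```r
# Hypothetical helper (not the dualKS implementation): genes x classes matrix
# of unweighted KS scores (maximum of the running sum).
ks_scores <- function(expr, groups, type = c("up", "down")) {
  type <- match.arg(type)
  groups <- factor(groups)
  score_one <- function(x, in_class) {
    # "up": highest expression first; "down": lowest expression first
    hit <- in_class[order(x, decreasing = (type == "up"))]
    max(cumsum(ifelse(hit, 1 / sum(hit), -1 / sum(!hit))))
  }
  # one column per class level, one row per gene
  sapply(levels(groups), function(k) apply(expr, 1, score_one, in_class = groups == k))
}

expr <- rbind(g1 = c(9, 8, 7, 1, 2, 3),   # high in class A, low in class B
              g2 = c(1, 2, 3, 9, 8, 7))   # high in class B, low in class A
groups <- c("A", "A", "A", "B", "B", "B")

up   <- ks_scores(expr, groups, "up")
down <- ks_scores(expr, groups, "down")
up
down
```

Each gene scores close to 1 for the class in which it is uniformly high ("up") or uniformly low ("down"), and close to 0 otherwise.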

If weights=FALSE, signatures are defined by the ranks of the members of each class when the samples are sorted on each gene. Genes for which a given class ranks highest when the samples are sorted by those genes are included in the classifier, regardless of the absolute expression level of those genes. This is the classic KS statistic.

Highly discriminant genes identified in this way may or may not be the most highly expressed genes. As a result, signatures identified in this way have arbitrary "baseline" values, which may lead to misclassification when two signatures are compared (using, for example, dksClassify). Therefore, one may wish to weight genes by absolute expression level or by some other metric.
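Because the unweighted statistic depends only on the ordering of the samples, rescaling a gene leaves its score unchanged, which is why its "baseline" is arbitrary. The hypothetical ks_max() below (unweighted, GSEA-style steps assumed, not the dualKS code) gives a barely expressed gene and a strongly expressed gene with the same sample ordering identical scores.

```r
# Hypothetical helper: unweighted KS score (maximum of the running sum).
ks_max <- function(x, in_class) {
  hit <- in_class[order(x, decreasing = TRUE)]
  max(cumsum(ifelse(hit, 1 / sum(hit), -1 / sum(!hit))))
}

groups    <- c("A", "A", "A", "B", "B", "B")
low_gene  <- c(6, 5, 4, 3, 2, 1)                    # barely expressed
high_gene <- c(6000, 5000, 4000, 3000, 2000, 1000)  # strongly expressed, same ordering

# identical scores: only the ranks of the class A samples matter, not the level
ks_max(low_gene, groups == "A")
ks_max(high_gene, groups == "A")
```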

Setting weights=TRUE causes genes to be weighted according to the log (base 10) of the relative rank of the mean expression of each gene in each class. Alternatively, you may supply your own weight matrix as the weights argument. This matrix must have one column for each level of class and one row for each gene in eset. Note that for type='down', or the down component of type='both', the weight matrix is inverted as 1 - weights, so the weights should lie between 0 and 1 for each class. NAs are handled "gracefully" by discarding any gene for which any column of the corresponding row of weights is NA. In our experience, weights that are a linear function of some feature of the gene expression (such as the mean) can be too subtle; their effect can be amplified by setting logweights=TRUE (the default).
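A user-supplied weight matrix can be built with a few lines of base R. The hypothetical make_rank_weights() below is only a sketch: it interprets "relative rank of the mean expression" as rank(mean) divided by the number of genes, computed separately for each class, which is an assumption rather than the formula dualKS uses internally. The result has one row per gene and one column per class, with values in (0, 1]; genes with missing values get NA weights.

```r
# Hypothetical helper: genes x classes weight matrix with values in (0, 1].
make_rank_weights <- function(expr, groups) {
  groups <- factor(groups)
  # mean expression of every gene within each class (genes x levels)
  means <- sapply(levels(groups), function(k)
    rowMeans(expr[, groups == k, drop = FALSE]))
  # relative rank of each gene's class mean (lowest = 1/n, highest = 1);
  # NA means stay NA so that such genes can be discarded downstream
  w <- apply(means, 2, function(m) rank(m, na.last = "keep") / sum(!is.na(m)))
  rownames(w) <- rownames(expr)
  w
}

expr <- rbind(g1 = c(1, 2, 3, 4),
              g2 = c(8, 7, 6, 5),
              g3 = c(2, 2, 9, 9))
groups <- c("A", "A", "B", "B")

w <- make_rank_weights(expr, groups)
w        # one row per gene, one column per class
1 - w    # the inversion described above for type = "down"
```

Such a matrix could then be passed as dksTrain(eset, 1, "up", weights = w), provided its rows correspond to the genes in eset.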

Value

An object of class DKSGeneScores.

Author(s)

Eric J. Kort, Yarong Yang

Examples

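	library(dualKS)                   # attaches dualKS (and Biobase, for pData)
	data("dks")                       # loads the example ExpressionSet 'eset'
	tr <- dksTrain(eset, 1, "up")     # classes are taken from column 1 of pData(eset)
	cl <- dksSelectGenes(tr, 100)     # select the 100 top-scoring genes for the signatures
	pr <- dksClassify(eset, cl)       # score each sample against each signature
	summary(pr, pData(eset)[,1])      # compare predicted with actual classes
	show(pr)                          # predicted class and score for each sample
	plot(pr, actual=pData(eset)[,1])  # plot the predictions against the actual classes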

Results


R version 3.3.1 (2016-06-21) -- "Bug in Your Hair"
Copyright (C) 2016 The R Foundation for Statistical Computing
Platform: x86_64-pc-linux-gnu (64-bit)

> library(dualKS)
Loading required package: Biobase
Loading required package: BiocGenerics
Loading required package: parallel

Loading required package: affy
> 	data("dks")
> 	tr <- dksTrain(eset, 1, "up")
> 	cl <- dksSelectGenes(tr, 100)
> 	pr <- dksClassify(eset, cl)
> 	summary(pr, pData(eset)[,1])


Dual KS Classification Summary:

Predicted class frequencies:

    normal      osteo rheumatoid 
        11          0          4 


Concordance rate (predicted==actual):  60 %

> 	show(pr)
     sample predicted class prediction score
1  GSM34379          normal         1024.367
2  GSM34383          normal         1073.083
3  GSM34385          normal         1116.797
4  GSM34388          normal            971.7
5  GSM34391          normal         1159.983
6  GSM34393          normal            592.5
7  GSM34394          normal          671.763
8  GSM34395          normal          610.143
9  GSM34396          normal           624.89
10 GSM34397          normal          604.087
11 GSM34398          normal          604.613
12 GSM34399      rheumatoid          599.083
13 GSM34400      rheumatoid          727.853
14 GSM34401      rheumatoid          606.457
15 GSM34402      rheumatoid           657.28
> 	plot(pr, actual=pData(eset)[,1])	