Selection of Differential Distributions with Kullback-Leibler Distance


Ranks features from largest to smallest Kullback-Leibler distance between the classes and chooses the set of top-ranked features that gives the best resubstitution performance.


  ## S4 method for signature 'matrix'
KullbackLeiblerSelection(expression, classes, ...)
  ## S4 method for signature 'ExpressionSet'
KullbackLeiblerSelection(expression, datasetName,
                                      trainParams, predictParams, resubstituteParams, ...,
                                      selectionName, verbose = 3)



expression: Either a matrix or ExpressionSet containing the training data. For a matrix, the rows are features, and the columns are samples.


classes: A vector of class labels.


datasetName: A name for the dataset used. Stored in the result.


trainParams: A container of class TrainParams describing the classifier to use for training.


predictParams: A container of class PredictParams describing how prediction is to be done.


resubstituteParams: An object of class ResubstituteParams describing the performance measure to consider and the numbers of top features to try for resubstitution classification.


...: Variables passed to getLocationsAndScales.


selectionName: A name to identify this selection method by. Stored in the result.


verbose: A number between 0 and 3 for the amount of progress messages to give. This function only prints progress messages if the value is 3.


The distance is defined as 0.5 * ((location1 - location2)^2 / scale1^2 + (location1 - location2)^2 / scale2^2 + scale1^2 / scale2^2 + scale2^2 / scale1^2)

The numbers 1 and 2 denote the class for which each parameter is calculated.
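
As an illustration only, the distance can be sketched in base R for a single feature and then used to rank the rows of a matrix. This is not the package's implementation: klDistance, toyMatrix and toyClasses are hypothetical names, the location and scale are assumed to be the mean and standard deviation (the function itself obtains them from getLocationsAndScales), and the factor 0.5 is assumed to apply to the whole sum.

```r
# Hypothetical helper, not part of the package: distance between the two
# classes for one feature, following the formula in the text above.
# Assumptions: location = mean, scale = standard deviation, and the
# factor 0.5 multiplies the whole sum.
klDistance <- function(values, classes) {
  groups <- split(values, classes)  # one numeric vector per class
  location1 <- mean(groups[[1]])
  scale1 <- sd(groups[[1]])
  location2 <- mean(groups[[2]])
  scale2 <- sd(groups[[2]])
  locationDifference <- location1 - location2
  0.5 * (locationDifference^2 / scale1^2 + locationDifference^2 / scale2^2 +
         scale1^2 / scale2^2 + scale2^2 / scale1^2)
}

set.seed(1)
# Toy data: 5 features (rows) by 20 samples (columns).
# Only feature 1 has a different location in the first 10 samples.
toyMatrix <- matrix(rnorm(100, 10, 1), nrow = 5)
toyMatrix[1, 1:10] <- toyMatrix[1, 1:10] + 5
toyClasses <- factor(rep(c("Poor", "Good"), each = 10))

# One distance per feature, then feature indices from largest to smallest distance.
distances <- apply(toyMatrix, 1, klDistance, classes = toyClasses)
ranking <- order(distances, decreasing = TRUE)
```

Under these assumptions the expression is symmetric in the two classes, and it takes its smallest value, 1, when both classes have the same location and the same scale. Either a shift in location or a change in spread therefore moves a feature up the ranking.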


An object of class SelectResult, or a list of such objects if the classifier used to determine the resubstitution error rate produced more than one variety of prediction.


Dario Strbenac


    # The first 20 features have a bimodal distribution for the Poor class. The other 80 features
    # have a normal distribution for both classes.
    genesMatrix <- sapply(1:25, function(sampleIndex) c(rnorm(20, sample(c(8, 12), 20, replace = TRUE), 1), rnorm(80, 10, 1)))
    genesMatrix <- cbind(genesMatrix, sapply(1:25, function(sampleIndex) rnorm(100, 10, 1)))
    # Columns 1 to 25 are the Poor samples and columns 26 to 50 are the Good samples.
    classes <- factor(rep(c("Poor", "Good"), each = 25))
    KullbackLeiblerSelection(genesMatrix, classes, "Example",
                             trainParams = TrainParams(naiveBayesKernel, FALSE, doesTests = TRUE),
                             predictParams = PredictParams(function(){}, FALSE, getClasses = function(result) result),
                             resubstituteParams = ResubstituteParams(nFeatures = seq(10, 100, 10), performanceType = "balanced", better = "lower"))


Loading required package: sparsediscrim
An object of class 'SelectResult'.
Dataset Name: Example.
Feature Selection Name: Kullback-Leibler Divergence.
Features Considered: 100.
Selections: List of length 1.
Selection Size : 40 features.

$`weighted=weighted,weight=crossover distance`
An object of class 'SelectResult'.
Dataset Name: Example.
Feature Selection Name: Kullback-Leibler Divergence.
Features Considered: 100.
Selections: List of length 1.
Selection Size : 40 features.

$`weighted=weighted,weight=height difference`
An object of class 'SelectResult'.
Dataset Name: Example.
Feature Selection Name: Kullback-Leibler Divergence.
Features Considered: 100.
Selections: List of length 1.
Selection Size : 60 features.

$`weighted=weighted,weight=sum differences`
An object of class 'SelectResult'.
Dataset Name: Example.
Feature Selection Name: Kullback-Leibler Divergence.
Features Considered: 100.
Selections: List of length 1.
Selection Size : 40 features.

