selectionPlot
R Documentation
Plot Pair-wise Overlap or Selection Size Distribution of Selected Features
Description
Pair-wise overlaps can be calculated for two types of analyses. Firstly, each cross-validation iteration
can be considered within a single classification. This explores the feature selection stability. Secondly, the
overlap may be considered between different classification results. This approach compares the feature selection
commonality between different selection methods. Two summaries of commonality can be analysed. One is
the average pair-wise overlap of each level of the comparison factor with all other levels. The other is the
pair-wise overlap of each non-reference level of the comparison factor with the reference level.
The overlaps are converted to percentages and plotted as lineplots.
Additionally, a heatmap of selection size frequencies can be made.
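The two summaries can be illustrated with a minimal base R sketch. It is not the package's implementation: it assumes a single feature set per level, uses the intersection-over-union percentage given in the Details section, and all names in it (overlapPercent, selected, methodA and so on) are hypothetical.

```r
# Percentage overlap: size of the intersection over size of the union, times 100.
overlapPercent <- function(setA, setB) {
  100 * length(intersect(setA, setB)) / length(union(setA, setB))
}

# Hypothetical features chosen by three levels of the comparison factor.
selected <- list(
  methodA = c("geneA", "geneB", "geneC", "geneD"),
  methodB = c("geneA", "geneB", "geneC"),
  methodC = c("geneC", "geneD", "geneE")
)
levelNames <- names(selected)

# Summary 1: for each level, the mean overlap with every other level.
averageOverlap <- sapply(levelNames, function(level) {
  others <- setdiff(levelNames, level)
  mean(sapply(others, function(other)
    overlapPercent(selected[[level]], selected[[other]])))
})

# Summary 2: each non-reference level compared to the reference level only.
referenceLevel <- "methodA"
versusReference <- sapply(setdiff(levelNames, referenceLevel), function(level)
  overlapPercent(selected[[level]], selected[[referenceLevel]]))

print(averageOverlap)
print(versusReference)
```

The first summary returns one value per level, whereas the second returns one value per non-reference level.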
Arguments
comparison
The aspect of the experimental design to compare. See the Details section for the
possible values.
referenceLevel
The level of the comparison factor to use as the reference to compare each
non-reference level to. If NULL, then the average pair-wise overlap of each level
with all other levels is calculated.
xVariable
The factor whose levels are each plotted as a separate box in the boxplot.
boxFillColouring
A factor to colour the boxes by.
boxFillColours
A vector of colours, one for each level of boxFillColouring. If NULL,
a default palette is used.
boxFillBinBoundaries
Used only if comparison is "size". A vector of integers, specifying the bin
boundaries for the percentages observed in each size bin, e.g. 0, 10, 20, 30, 40, 50.
setSizeBinBoundaries
Used only if comparison is "size". A vector of integers, specifying the bin
boundaries of set size bins. e.g. 50, 100, 150, 200, 250.
boxLineColouring
A factor to colour the box lines by.
boxLineColours
A vector of colours, one for each level of boxLineColouring. If NULL,
a default palette is used.
rowVariable
The slot name whose different levels are plotted as separate rows of boxplots.
columnVariable
The slot name whose different levels are plotted as separate columns of boxplots.
yMax
The maximum value of the percentage to plot.
fontSizes
A vector of length 4. The first number is the size of the title.
The second number is the size of the axes titles. The third number is
the size of the axes values. The fourth number is the font size of the
titles of grouped plots, if any are produced; that is, when
rowVariable or columnVariable is not NULL.
title
An overall title for the plot.
xLabel
Label to be used for the x-axis.
yLabel
Label to be used for the y-axis of overlap percentages.
margin
The margin to have around the plot.
rotate90
Logical. If TRUE, the boxplot is horizontal.
showLegend
If TRUE, a legend is plotted next to the plot. If FALSE, it is hidden.
plot
Logical. If TRUE, a plot is produced on the current graphics device.
parallelParams
An object of class MulticoreParam or SnowParam.
Details
Possible values for characteristics are "datasetName", "classificationName", "size",
"selectionName", and "validation". If "None", then that graphical element is not used.
If comparison is "within", then the feature selection overlaps are compared within a particular
analysis. The result will inform how stable the selections are between different iterations of cross-validation
for a particular analysis.
If comparison is "classificationName", then the feature selections are compared across different
classification algorithm types, for each level of "datasetName", "selectionName" and "validation".
The result will inform how stable the feature selections are between different classification algorithms,
for every cross-validation scheme, selection algorithm and dataset.
If comparison is "selectionName", then the feature selections are compared across different feature
selection algorithms, for each level of "datasetName", "classificationName" and "validation".
The result will inform how stable the feature selections are between feature selection algorithms,
for every dataset, classification algorithm and cross-validation scheme.
If comparison is "validation", then the feature selections are compared across different cross-validation
schemes, for each level of "classificationName", "selectionName" and "datasetName". The result
will inform how stable the feature selections are between different cross-validation schemes, for every
selection algorithm, classification algorithm and dataset.
If comparison is "datasetName", then the feature selections are compared across different datasets,
for each level of "classificationName", "selectionName" and "validation".
The result will inform how stable the feature selections are between different datasets, for every
classification algorithm, selection algorithm and cross-validation scheme. This could be used to consider
whether different experimental studies have a highly overlapping feature selection pattern.
Calculating all pair-wise set overlaps can be time-consuming. This stage can be done on multiple CPUs by
providing the relevant options to parallelParams. The percentage is calculated as the size of the
intersection of two sets of features divided by the size of their union, multiplied by 100.
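The percentage formula and the all-pairs calculation can be sketched in base R as follows. This is an illustration rather than the package's code; the helper overlapPercent, the list selections and the feature names are hypothetical.

```r
# Percentage overlap: size of the intersection over size of the union, times 100.
overlapPercent <- function(setA, setB) {
  unionSize <- length(union(setA, setB))
  if (unionSize == 0) return(NA_real_)  # Both sets empty: overlap is undefined.
  100 * length(intersect(setA, setB)) / unionSize
}

# Hypothetical features chosen in three cross-validation iterations.
selections <- list(
  iteration1 = c("geneA", "geneB", "geneC"),
  iteration2 = c("geneB", "geneC", "geneD"),
  iteration3 = c("geneA", "geneB", "geneC", "geneD")
)

# Enumerate every unordered pair of iterations and compute its overlap.
iterationPairs <- combn(names(selections), 2, simplify = FALSE)
pairOverlaps <- sapply(iterationPairs, function(pair)
  overlapPercent(selections[[pair[1]]], selections[[pair[2]]]))
names(pairOverlaps) <- sapply(iterationPairs, paste, collapse = " vs ")

print(pairOverlaps)
```

With n sets there are n(n - 1)/2 pairs, so the number of overlaps grows quadratically with the number of iterations, which is why this stage can become slow.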
For the selection size mode, boxFillBinBoundaries is used with cut to create bins in which the first bin
includes the lowest boundary value and the last bin includes the highest boundary value.
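The binning behaviour described above matches cut called with include.lowest = TRUE. The sketch below assumes that is how the boundaries are applied; the set sizes are made up, and the two boundary vectors are the example values from the argument descriptions.

```r
# Hypothetical numbers of features selected in six cross-validation iterations.
setSizes <- c(50, 60, 120, 130, 240, 250)
setSizeBinBoundaries <- c(50, 100, 150, 200, 250)

# cut makes right-closed intervals by default, so 250 falls in the last bin.
# include.lowest = TRUE also closes the first bin on the left, so 50 is kept
# instead of becoming NA.
sizeBins <- cut(setSizes, breaks = setSizeBinBoundaries, include.lowest = TRUE)

# Percentage of selections falling into each size bin.
sizePercents <- 100 * as.vector(table(sizeBins)) / length(setSizes)
names(sizePercents) <- levels(sizeBins)

# Bin the percentages themselves, as would be done for the fill colours.
boxFillBinBoundaries <- c(0, 10, 20, 30, 40, 50)
fillBins <- cut(sizePercents, breaks = boxFillBinBoundaries, include.lowest = TRUE)

print(data.frame(sizeBin = levels(sizeBins), percent = sizePercents, fillBin = fillBins))
```

Any value outside the outermost boundaries becomes NA, so the boundaries should span the full range of observed values.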
Value
An object of class ggplot. If plot is TRUE, the plot is also drawn on the current graphics device.