AggregateMC
(Package: GeneSelector) :
Aggregation of repeated rankings using a Markov chain approach
All obtained rankings are aggregated on the basis of a Markov chain model, in which each gene constitutes an element of the state space. For details, see DeConde et al. (2006).
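To make the state-space idea concrete, the following is a minimal Python sketch of one common Markov chain aggregation rule. It is an illustration under stated assumptions, not the code behind AggregateMC: the majority-vote transition rule, the `damping` parameter, and the function name `aggregate_markov_chain` are all choices made for this example. From gene i, the chain proposes a gene j uniformly at random and moves there if j is ranked better than i in a majority of the lists; genes with high stationary probability end up at the top of the aggregated list.

```python
def aggregate_markov_chain(ranks, damping=0.05, max_iter=10000, tol=1e-12):
    """Aggregate rankings via the stationary distribution of a Markov chain.

    ranks[g][l] is the rank of gene g in list l (1 = best).
    The transition rule and the damping term are assumptions of this sketch.
    Returns (aggregated_ranks, stationary_distribution).
    """
    n = len(ranks)
    n_lists = len(ranks[0])

    # Build the transition matrix: one state per gene.
    P = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i == j:
                continue
            wins = sum(1 for l in range(n_lists) if ranks[j][l] < ranks[i][l])
            if 2 * wins > n_lists:        # j beats i in a majority of lists
                P[i][j] = 1.0 / n         # j was proposed with probability 1/n
        P[i][i] = 1.0 - sum(P[i])         # otherwise the chain stays at i

    # Mix in a small uniform jump so the chain has a unique stationary distribution.
    P = [[(1.0 - damping) * p + damping / n for p in row] for row in P]

    # Power iteration for the stationary distribution pi = pi P.
    pi = [1.0 / n] * n
    for _ in range(max_iter):
        new = [sum(pi[i] * P[i][j] for i in range(n)) for j in range(n)]
        delta = sum(abs(a - b) for a, b in zip(new, pi))
        pi = new
        if delta < tol:
            break

    # Higher stationary probability means a better aggregated rank.
    order = sorted(range(n), key=lambda g: -pi[g])
    agg = [0] * n
    for position, g in enumerate(order, start=1):
        agg[g] = position
    return agg, pi


# Four genes ranked by three (hypothetical) procedures.
example_ranks = [
    [1, 1, 2],
    [2, 2, 1],
    [3, 4, 3],
    [4, 3, 4],
]
aggregated, stationary = aggregate_markov_chain(example_ranks)
print(aggregated)
```

In this toy input, gene 0 is preferred to every other gene by a majority of the lists, so it collects most of the stationary mass and is placed first.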
The idea behind this form of aggregation is to find a compromise between quality on the one hand, represented by the list position/rank, and variability on the other hand. The latter is assessed by calling the function dispersion.
AggregateSVD
(Package: GeneSelector) :
Aggregation of repeated rankings using the singular value decomposition (SVD)
A matrix storing all rankings is centered rowwise (=genewise), and then approximated using only the first singular value and the first singular vectors (see Golub and Van Loan (1983) for details about the SVD). The rowwise mean vector is added back afterwards, and the rowwise means of the resulting matrix are finally used as the aggregate. A weighting scheme giving more weight to top genes is incorporated by an (iteratively) weighted SVD, which is re-computed until convergence. Note that the SVD is closely related to principal component analysis, a standard tool for dimension reduction in high-dimensional datasets.
The term 'GeneSelector' refers to a filter selecting those genes which are consistently identified as differentially expressed using various statistical procedures. 'Selected' genes are those present at the top of the list in various featured ranking methods (currently 14). In addition, the stability of the findings can be taken into account in the final ranking by examining perturbed versions of the original data set, e.g. by leaving out samples, swapping class labels, generating bootstrap replicates or adding noise.
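The perturbation schemes mentioned above can be sketched in a few lines of Python. This is only an illustration of the idea and does not reproduce the GeneSelector interface; the function names, the binary 0/1 label coding, and the `seed` argument are assumptions made for the example. Each function produces the information needed to build one perturbed data set, on which a ranking procedure would then be re-run.

```python
import random


def leave_one_out(n_samples):
    """Yield index lists, each omitting exactly one sample."""
    for left_out in range(n_samples):
        yield [i for i in range(n_samples) if i != left_out]


def swap_one_label(labels):
    """Yield copies of a binary (0/1) label vector with exactly one label flipped."""
    for i in range(len(labels)):
        flipped = list(labels)
        flipped[i] = 1 - flipped[i]
        yield flipped


def bootstrap_indices(n_samples, n_replicates, seed=0):
    """Return index lists drawn with replacement, one per bootstrap replicate."""
    rng = random.Random(seed)
    return [
        [rng.randrange(n_samples) for _ in range(n_samples)]
        for _ in range(n_replicates)
    ]


# Three samples with hypothetical class labels.
print(list(leave_one_out(3)))
print(list(swap_one_label([0, 1, 1])))
```

Comparing the rankings obtained on these perturbed data sets with the ranking on the original data gives a picture of how stable the top of the list is.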
Given GeneRankings or AggregatedRankings obtained from several ranking procedures, the aim is to find a unifying output. A threshold equal to the maximum rank/list position which is still relevant for the question of interest may be provided by the user, or the threshold can be determined adaptively via significance analysis in multiple testing procedures. Then, each gene is checked for whether its rank falls below this threshold consistently in all ranking procedures used. If this holds, then the gene is selected. A final order of the genes is defined by the following criteria