CLERE methodology for simultaneous variable clustering and regression


The methodology clusters the variables of a high-dimensional linear regression model in order to reduce its dimensionality. A model-based approach is proposed and fitted with a stochastic EM-Gibbs algorithm (SEM-Gibbs).
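To see why clustering variables reduces dimensionality, consider the idealized case in which every coefficient in a cluster takes exactly the same value. The linear predictor then depends only on the within-cluster sums of the covariates, so a regression with p coefficients collapses to one with g coefficients. The base R sketch below illustrates this with known cluster labels. The names (z, b, Z, S) and the simulated values are assumptions made for illustration; the sketch does not use the package's model or the SEM-Gibbs algorithm, where the cluster labels are unknown and must be estimated.

```r
# Illustrative sketch only: cluster labels are assumed known here.
set.seed(1)
n <- 50   # number of observations
p <- 100  # number of covariates
g <- 2    # number of clusters of covariates

x <- matrix(rnorm(n * p), nrow = n, ncol = p)

z    <- rep(1:g, length.out = p)  # hypothetical cluster label of each covariate
b    <- c(0.5, -0.5)              # one shared coefficient per cluster (assumed)
beta <- b[z]                      # full coefficient vector of length p
y    <- as.vector(x %*% beta) + rnorm(n, sd = 0.1)

# p x g indicator matrix: Z[j, k] = 1 if covariate j belongs to cluster k
Z <- outer(z, 1:g, "==") * 1

# n x g matrix of within-cluster sums of covariates
S <- x %*% Z

# x %*% beta equals S %*% b, so g coefficients suffice instead of p
fit   <- lm(y ~ S)
b_hat <- unname(coef(fit)[-1])    # drop the intercept
print(round(b_hat, 2))
```

With n = 50 observations, an ordinary least squares fit on all p = 100 covariates would not be identifiable, whereas the reduced regression on g = 2 cluster sums is.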


The main function of the package is fitClere (an example is given below), which fits the model from a matrix of covariates and a response vector. The package also provides a summary method, a graphical summary (plot method) showing the trajectory of each parameter across the SEM-Gibbs iterations, and a predict method for making predictions from a new design matrix.


Loic Yengo


Yengo L., Jacques J. and Biernacki C. Variable clustering in high dimensional linear regression, Journal de la Societe Francaise de Statistique (2013).

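 # Simple example using simulated data
 # to see how to use the main function fitClere
 library(clere)

 # Simulate 50 observations of 100 covariates and an independent response
 x     <- matrix(rnorm(50 * 100), nrow = 50, ncol = 100)
 y     <- rnorm(50)

 # Fit the model with g = 2 clusters of variables, without plotting
 model <- fitClere(y = y, x = x, g = 2, plotit = FALSE)

 # Extract the cluster assignment of each variable
 clus  <- clusters(model, threshold = NULL)

 # Predict the response for a new design matrix
 predict(model, newx = x + 1)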
