Last data update: 2014.03.03

R: fitClere function
fitClereR Documentation

fitClere function

Description

This function runs the CLERE Model. It returns an object of class Clere. For more details please refer to clere.

Usage

    fitClere(y = rnorm(10), x = matrix(rnorm(50), nrow = 10), 
              g = 1, nItMC = 50, nItEM = 1000, nBurn = 200, 
              dp = 5, nsamp = 200, maxit = 500, tol = 0.001,
              nstart=2, parallel = FALSE,seed = NULL, plotit = FALSE, sparse = FALSE, 
              analysis = "fit",algorithm = "SEM",theta0 = NULL, Z0 = NULL)

Arguments

y

[numeric]: The vector of observed responses - size n.

x

[matrix]: The matrix of predictors - size n rows and p columns.

g

[integer]: Either the number or the maximum of groups for fitting CLERE. Maximum number of groups is considered when model selection is required.

nItMC

[numeric]: Number of Gibbs iterations to generate the partitions. After the nBurn iterations, this number is automatically set to 1.

nItEM

[numeric]: Number of SEM iterations.

nBurn

[numeric]: Number of SEM iterations discarded before calculating the MLE which is averaged over SEM draws.

dp

[numeric]: Number of iterations between sampled partitions when calculating the likelihood at the end of the run.

nsamp

[numeric]: Number of sampled partitions for calculating the likelihood at the end of the run.

maxit

[numeric]: An EM algorithm is used inside the SEM to maximize the complete log-likelihood p(y, Z|theta). maxit stands as the maximum number of EM iterations for the internal EM.

tol

[numeric]: Maximum increased in complete log-likelihood for the internal EM (stopping criterion).

nstart

[integer]: Number of random starting points to be used for fitting the model.

parallel

[logical]: Should the estimation from nstart random starting points run in parallel?

seed

[integer]: An integer given as a seed for random number generation. If set to NULL, then a random seed is generated between 1 and 1000.

plotit

[logical]: Should a summary plot (base plot) be drawn after the run?

sparse

[logical]: Should a 0 class be imposed to the model?

analysis

[character]: Which analysis is to be performed. Values are "fit", "bic", "aic" and "icl".

algorithm

[character]: The algorithm to be chosen to fit the model. Either the SEM-Gibbs algorithm or the MCEM algorithm. The most efficient algorithm being the SEM-Gibbs approach. MCEM is not available for binary response.

theta0

[vector(numeric)]: An initial guess of the model parameters. When considering g components, the length of theta0 must be 2*g+3 and theta0 should be filled as intercept, the b_k's (g real numbers), the pi_k's (g real numbers summing to 1), sigma^2 and gamma^2 (two positive numbers).

Z0

[vector(integer)]: A vector of integers representing an initial partition for the variables. For 10 variables and 3 groups Z0 can be defined as Z0 = c(rep(0, 2), rep(1, 3), rep(2, 5)).

Value

Object of class Clere.

Author(s)

Loic Yengo loic.yengo@gmail.com

See Also

Overview : clere-package
Classes : Clere
Methods : show, plot, clusters, predict, summary
Functions : fitClere fitPacs

Examples

# library(clere)
plotit    <- FALSE
sparse    <- FALSE
nItEM     <- 100
nBurn     <- nItEM / 2
nsamp     <- 100
analysis  <- "fit"
algorithm <- "SEM"
nItMC     <- 1
dp        <- 2
maxit     <- 200
tol       <- 1e-3

n         <- 50
p         <- 50
intercept <- 0
sigma     <- 10
gamma     <- 10
rho       <- 0.5

g         <- 5 
probs     <- c(0.36, 0.28, 0.20, 0.12, 0.04)
Eff       <- p * probs
a         <- 5
B         <- a**(0:(g-1))-1
Z         <- matrix(0, nrow = p, ncol = g)
imax      <- 0
imin      <- 1

for (k in 1:g) {
    imin <- imax+1
    imax <- imax+Eff[k]
    Z[imin:imax, k] <- 1
}
Z <- Z[sample(1:p, p), ]
if (g>1) {
    Beta <- rnorm(p, mean = c(Z%*%B), sd = gamma)
} else {
    Beta <- rnorm(p, mean = B, sd = gamma)
}

theta0 <- NULL # c(intercept, B, probs, sigma^2, gamma^2)
Z0     <- NULL # apply(Z, 1, which.max)-1

gmax <- 7

## Prediction
eps  <- rnorm(n, mean = 0, sd = sigma)
X    <- matrix(rnorm(n*p), nrow = n, ncol = p)
Y    <- as.numeric(intercept+X%*%Beta+eps)
tt   <- system.time(mod <- fitClere(y = Y, x = X, g = gmax, 
                        analysis = analysis,algorithm = algorithm,
                        plotit = plotit, 
                        sparse = FALSE,nItEM = nItEM, 
                        nBurn = nBurn, nItMC = nItMC, 
                        nsamp = nsamp, theta0 = theta0, Z0 = Z0) )
plot(mod)
Yv <- predict(object = mod, newx = X)

Results