R Graphical Manual

Browse All

Last data update: 2014.03.03

R: fitClere function

fitClere

R Documentation

fitClere function

Description

This function runs the CLERE Model. It returns an object of class Clere. For more details please refer to clere.

Usage

    fitClere(y = rnorm(10), x = matrix(rnorm(50), nrow = 10), 
              g = 1, nItMC = 50, nItEM = 1000, nBurn = 200, 
              dp = 5, nsamp = 200, maxit = 500, tol = 0.001,
              nstart=2, parallel = FALSE,seed = NULL, plotit = FALSE, sparse = FALSE, 
              analysis = "fit",algorithm = "SEM",theta0 = NULL, Z0 = NULL)

Arguments

`y`	[numeric]: The vector of observed responses - size `n`.
`x`	[matrix]: The matrix of predictors - size `n` rows and `p` columns.
`g`	[integer]: Either the number or the maximum of groups for fitting CLERE. Maximum number of groups is considered when model selection is required.
`nItMC`	[numeric]: Number of Gibbs iterations to generate the partitions. After the `nBurn` iterations, this number is automatically set to `1`.
`nItEM`	[numeric]: Number of SEM iterations.
`nBurn`	[numeric]: Number of SEM iterations discarded before calculating the MLE which is averaged over SEM draws.
`dp`	[numeric]: Number of iterations between sampled partitions when calculating the likelihood at the end of the run.
`nsamp`	[numeric]: Number of sampled partitions for calculating the likelihood at the end of the run.
`maxit`	[numeric]: An EM algorithm is used inside the SEM to maximize the complete log-likelihood p(y, Z\|theta). `maxit` stands as the maximum number of EM iterations for the internal EM.
`tol`	[numeric]: Maximum increased in complete log-likelihood for the internal EM (stopping criterion).
`nstart`	[integer]: Number of random starting points to be used for fitting the model.
`parallel`	[logical]: Should the estimation from `nstart` random starting points run in parallel?
`seed`	[integer]: An integer given as a seed for random number generation. If set to `NULL`, then a random seed is generated between `1` and `1000`.
`plotit`	[logical]: Should a summary plot (base plot) be drawn after the run?
`sparse`	[logical]: Should a `0` class be imposed to the model?
`analysis`	[character]: Which analysis is to be performed. Values are `"fit"`, `"bic"`, `"aic"` and `"icl"`.
`algorithm`	[character]: The algorithm to be chosen to fit the model. Either the SEM-Gibbs algorithm or the MCEM algorithm. The most efficient algorithm being the SEM-Gibbs approach. MCEM is not available for binary response.
`theta0`	[vector(numeric)]: An initial guess of the model parameters. When considering g components, the length of `theta0` must be `2*g+3` and `theta0` should be filled as intercept, the b_k's (g real numbers), the pi_k's (g real numbers summing to 1), sigma^2 and gamma^2 (two positive numbers).
`Z0`	[vector(integer)]: A vector of integers representing an initial partition for the variables. For 10 variables and 3 groups `Z0` can be defined as `Z0 = c(rep(0, 2), rep(1, 3), rep(2, 5))`.

Value

Object of class Clere.

Author(s)

Loic Yengo loic.yengo@gmail.com

Examples

# library(clere)
plotit    <- FALSE
sparse    <- FALSE
nItEM     <- 100
nBurn     <- nItEM / 2
nsamp     <- 100
analysis  <- "fit"
algorithm <- "SEM"
nItMC     <- 1
dp        <- 2
maxit     <- 200
tol       <- 1e-3

n         <- 50
p         <- 50
intercept <- 0
sigma     <- 10
gamma     <- 10
rho       <- 0.5

g         <- 5 
probs     <- c(0.36, 0.28, 0.20, 0.12, 0.04)
Eff       <- p * probs
a         <- 5
B         <- a**(0:(g-1))-1
Z         <- matrix(0, nrow = p, ncol = g)
imax      <- 0
imin      <- 1

for (k in 1:g) {
    imin <- imax+1
    imax <- imax+Eff[k]
    Z[imin:imax, k] <- 1
}
Z <- Z[sample(1:p, p), ]
if (g>1) {
    Beta <- rnorm(p, mean = c(Z%*%B), sd = gamma)
} else {
    Beta <- rnorm(p, mean = B, sd = gamma)
}

theta0 <- NULL # c(intercept, B, probs, sigma^2, gamma^2)
Z0     <- NULL # apply(Z, 1, which.max)-1

gmax <- 7

## Prediction
eps  <- rnorm(n, mean = 0, sd = sigma)
X    <- matrix(rnorm(n*p), nrow = n, ncol = p)
Y    <- as.numeric(intercept+X%*%Beta+eps)
tt   <- system.time(mod <- fitClere(y = Y, x = X, g = gmax, 
                        analysis = analysis,algorithm = algorithm,
                        plotit = plotit, 
                        sparse = FALSE,nItEM = nItEM, 
                        nBurn = nBurn, nItMC = nItMC, 
                        nsamp = nsamp, theta0 = theta0, Z0 = Z0) )
plot(mod)
Yv <- predict(object = mod, newx = X)