Last data update: 2014.03.03

R: Fit a GLM with fusion penalty for data integration
metafuse R Documentation


Description

Fits a generalized linear model (GLM) with a fusion penalty on the study-specific coefficients of each covariate, and generates the solution path used for model selection.
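To make the idea of a fusion penalty concrete, the sketch below evaluates one plausible form of the penalty for a single covariate: studies are ordered by their initial estimates, adjacent differences are penalized with adaptive weights, and `alpha` scales an adaptive sparsity term. The exact penalty used by metafuse is not stated on this page, so the helper `fusion_penalty`, its arguments, and its functional form are assumptions for illustration only.

```r
# Hypothetical helper, NOT part of metafuse: an assumed adaptive
# fusion-plus-sparsity penalty for one covariate across K studies.
#   beta  : length-K vector of study-specific coefficients for the covariate
#   init  : length-K vector of initial (e.g. unpenalized) estimates
#   alpha : ratio of the sparsity penalty to the fusion penalty
#   gamma : exponent of the adaptive weights
fusion_penalty <- function(beta, init, alpha = 0, gamma = 1) {
  stopifnot(length(beta) == length(init))
  ord <- order(init)                    # order studies by initial estimates
  b  <- beta[ord]
  b0 <- init[ord]
  # adaptive weights on adjacent differences of the ordered initial estimates
  w_fuse <- 1 / abs(diff(b0))^gamma
  fuse <- sum(w_fuse * abs(diff(b)))
  # adaptive sparsity term on each coefficient, scaled by alpha
  w_sparse <- 1 / abs(b0)^gamma
  sparse <- sum(w_sparse * abs(b))
  fuse + alpha * sparse
}

init <- c(0.1, 0.9, 0.5)
beta <- c(0, 1, 0.5)
fusion_penalty(beta, init)             # fusion part only (alpha = 0)
fusion_penalty(beta, init, alpha = 1)  # fusion plus sparsity
```

Because the weights are inverse distances between initial estimates, studies whose initial estimates are close are pushed hardest toward a common value; tied initial estimates would give infinite weights, which this sketch does not handle.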

Usage

metafuse(X = X, y = y, sid = sid, fuse.which = c(0:ncol(X)),
  family = "gaussian", intercept = TRUE, alpha = 0, criterion = "EBIC",
  verbose = TRUE, plots = TRUE, loglambda = TRUE)

Arguments

X

a matrix (or vector) of predictors with dimensions N x p, where N is the total sample size across all studies

y

a response vector of length N, the total sample size across all studies

sid

a vector of study IDs of length N, numbered from 1 to K, where K is the number of studies
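Since X, y, and sid all describe the pooled data, per-study data sets have to be stacked before calling metafuse. The sketch below shows one way to do this in base R; the helper `stack_studies` and the assumed input layout (a list with one element per study, each holding `X` and `y`) are hypothetical and not part of the package.

```r
# Hypothetical helper, NOT part of metafuse: stack a list of per-study
# data sets (each a list with components X and y) into the pooled
# (X, y, sid) layout, with studies numbered 1..K in list order.
stack_studies <- function(studies) {
  X <- do.call(rbind, lapply(studies, function(d) as.matrix(d$X)))
  y <- unlist(lapply(studies, function(d) d$y), use.names = FALSE)
  n_k <- vapply(studies, function(d) length(d$y), integer(1))
  sid <- rep(seq_along(studies), times = n_k)  # one study ID per observation
  list(X = X, y = y, sid = sid)
}

studies <- list(
  list(X = matrix(1:6, nrow = 3, ncol = 2), y = c(1, 2, 3)),  # study 1, n = 3
  list(X = matrix(7:10, nrow = 2, ncol = 2), y = c(4, 5))     # study 2, n = 2
)
s <- stack_studies(studies)
s$sid  # 1 1 1 2 2
```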

fuse.which

an integer vector, a subset of 0 to p, indicating which covariates are considered for fusion; 0 corresponds to the intercept
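The indexing convention for fuse.which can be illustrated by prepending an intercept column to X: index 0 then points at the intercept and index j at the j-th column of X. This is only an illustration of the convention described above, not a description of how metafuse builds its design matrix internally; the column names are made up.

```r
# Illustration of the fuse.which convention (assumed layout, not metafuse
# internals): 0 = intercept, j = j-th column of X.
X <- matrix(rnorm(20), nrow = 10, ncol = 2,
            dimnames = list(NULL, c("x1", "x2")))
design <- cbind(`(Intercept)` = 1, X)   # intercept becomes column 1

fuse.which <- c(0, 2)
fused   <- colnames(design)[fuse.which + 1]                          # shift 0-based to 1-based
unfused <- colnames(design)[setdiff(0:ncol(X), fuse.which) + 1]
fused    # "(Intercept)" "x2"
unfused  # "x1"
```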

family

"gaussian" for continuous response, "binomial" for binary response, "poisson" for count response

intercept

if TRUE, an intercept is included in the model

alpha

the ratio of the sparsity penalty to the fusion penalty; the default, 0, imposes no sparsity penalty

criterion

"AIC" for AIC, "BIC" for BIC, "EBIC" for extended BIC
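For reference, the sketch below computes the three criteria from a fitted model's log-likelihood, its degrees of freedom, and the sample size. The EBIC form follows Chen and Chen (2008) with gamma = 1 and P candidate parameters; whether metafuse uses exactly this form, and what it takes P to be, are assumptions. The helper name `info_criteria` is hypothetical.

```r
# Hypothetical helper, NOT part of metafuse.
#   loglik : maximized log-likelihood of the fit
#   df     : degrees of freedom (number of distinct estimated parameters)
#   n      : sample size
#   P      : number of candidate parameters (assumed, e.g. K * p)
#   gamma  : EBIC tuning constant in [0, 1]; gamma = 0 reduces EBIC to BIC
info_criteria <- function(loglik, df, n, P, gamma = 1) {
  c(AIC  = -2 * loglik + 2 * df,
    BIC  = -2 * loglik + log(n) * df,
    EBIC = -2 * loglik + log(n) * df + 2 * gamma * lchoose(P, df))
}

ic <- info_criteria(loglik = -100, df = 3, n = 100, P = 20)
ic  # the model with the smallest value of the chosen criterion is selected
```

The extra `lchoose(P, df)` term penalizes the size of the model space, so EBIC tends to select sparser (more fused) models than BIC when P is large.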

verbose

if TRUE, print the fusion events and the corresponding values of the tuning parameter lambda

plots

if TRUE, plot the solution paths and clustering trees

loglambda

if TRUE, lambda is plotted on a log10 scale

Details

The adaptive lasso penalty is used; see Zou (2006) for details.
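The adaptive lasso weights each coefficient by 1 / |initial estimate|^gamma, so coefficients with large initial estimates are penalized less. Zou (2006) notes that the weighted problem can be solved as an ordinary lasso after rescaling the columns of X; the sketch below checks that reparameterization numerically in base R. It illustrates the general technique only and is not taken from the metafuse source; `adaptive_weights` is a hypothetical helper.

```r
# Adaptive lasso weights: w_j = 1 / |beta_init_j|^gamma.
adaptive_weights <- function(beta_init, gamma = 1) 1 / abs(beta_init)^gamma

set.seed(1)
X <- matrix(rnorm(30), nrow = 10, ncol = 3)
beta_init <- c(0.5, 2, -4)
w <- adaptive_weights(beta_init)     # 2, 0.5, 0.25

# Rescale column j of X by 1 / w_j and coefficient j by w_j.
X_star    <- sweep(X, 2, w, "/")
beta      <- c(0.3, -1, 2)           # any coefficient vector
beta_star <- w * beta

# Same linear predictor ...
same_fit <- isTRUE(all.equal(drop(X %*% beta), drop(X_star %*% beta_star)))
# ... and the weighted L1 penalty becomes a plain L1 penalty.
same_pen <- isTRUE(all.equal(sum(w * abs(beta)), sum(abs(beta_star))))
c(same_fit, same_pen)  # TRUE TRUE
```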

Value

A list containing the following components:

family

the model family ("gaussian", "binomial", or "poisson")

criterion

the model selection criterion used

alpha

the ratio of sparsity penalty to fusion penalty

if.fuse

for each covariate, whether it is fused (1) or not (0)

betahat

the estimated coefficients

betainfo

additional information about the fit for each covariate, including the degrees of freedom, the optimal lambda (lambda optimal), the fusion lambda (lambda fuse), and the friction of fusion
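Fused coefficients share a common value, so the number of distinct values per covariate gives the number of study clusters for that covariate. The sketch below counts them, assuming a K x p coefficient matrix with one row per study and one column per covariate; that this matches the layout of `betahat` is an assumption, and `count_groups` is a hypothetical helper. It is checked on the true coefficient matrix `beta0` from the Examples section.

```r
# Hypothetical helper, NOT part of metafuse: number of distinct coefficient
# values (clusters of studies) in each column of a K x p matrix.
count_groups <- function(beta_mat, tol = 1e-8) {
  apply(beta_mat, 2, function(b) {
    b <- sort(b)
    1 + sum(diff(b) > tol)   # a gap larger than tol starts a new cluster
  })
}

K <- 10; p <- 3
beta0 <- matrix(c(0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,   # intercept
                  0.0,0.0,0.0,0.0,0.0,1.0,1.0,1.0,1.0,1.0,   # beta_1
                  0.0,0.0,0.0,0.0,0.5,0.5,0.5,1.0,1.0,1.0),  # beta_2
                K, p)
count_groups(beta0)  # 1 2 3
```

The tolerance guards against fused estimates that agree only up to numerical precision.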

Examples

library(metafuse)

n <- 200    # sample size in each study
K <- 10     # number of studies
p <- 3      # number of covariates in X (including intercept)
N <- n*K    # total sample size

# the coefficient matrix; use this to set the desired heterogeneity pattern (depends on p and K)
beta0 <- matrix(c(0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0, # intercept
                  0.0,0.0,0.0,0.0,0.0,1.0,1.0,1.0,1.0,1.0, # beta_1, etc.
                  0.0,0.0,0.0,0.0,0.5,0.5,0.5,1.0,1.0,1.0), K, p)

# generate a data set, family=c("gaussian", "binomial", "poisson")
data <- datagenerator(n=n, beta0=beta0, family="gaussian", seed=123)

# prepare the input (y, X, studyID)
y       <- data$y
sid     <- data$group
X       <- data[,-c(1,ncol(data))]

# fuse slopes of X1 (it is heterogeneous with 2 groups)
metafuse(X=X, y=y, sid=sid, fuse.which=c(1), family="gaussian", intercept=TRUE, alpha=0,
          criterion="EBIC", verbose=TRUE, plots=TRUE, loglambda=TRUE)

# fuse slopes of X2 (it is heterogeneous with 3 groups)
metafuse(X=X, y=y, sid=sid, fuse.which=c(2), family="gaussian", intercept=TRUE, alpha=0,
          criterion="EBIC", verbose=TRUE, plots=TRUE, loglambda=TRUE)

# fuse all three covariates
metafuse(X=X, y=y, sid=sid, fuse.which=c(0,1,2), family="gaussian", intercept=TRUE, alpha=0,
          criterion="EBIC", verbose=TRUE, plots=TRUE, loglambda=TRUE)

# fuse all three covariates, with sparsity penalty
metafuse(X=X, y=y, sid=sid, fuse.which=c(0,1,2), family="gaussian", intercept=TRUE, alpha=1,
          criterion="EBIC", verbose=TRUE, plots=TRUE, loglambda=TRUE)
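The examples rely on datagenerator() returning a data frame whose first column is y, whose last column is the study ID, and whose remaining columns are the predictors. As a rough stand-in for readers without the package, the base-R sketch below simulates Gaussian data with that assumed layout from the same beta0. The helper `simulate_gaussian` is hypothetical: it does not reproduce datagenerator()'s random draws or its handling of other families.

```r
# Hypothetical stand-in, NOT datagenerator(): Gaussian data with study-specific
# coefficients, laid out as (y, x1, ..., x_{p-1}, group).
simulate_gaussian <- function(n, beta0, sd = 1, seed = NULL) {
  if (!is.null(seed)) set.seed(seed)
  K <- nrow(beta0); p <- ncol(beta0)
  group <- rep(seq_len(K), each = n)                 # study ID per observation
  X <- matrix(rnorm(n * K * (p - 1)), ncol = p - 1,
              dimnames = list(NULL, paste0("x", seq_len(p - 1))))
  design <- cbind(1, X)                              # column 1 = intercept
  eta <- rowSums(design * beta0[group, ])            # row i uses its study's betas
  y <- eta + rnorm(n * K, sd = sd)
  data.frame(y = y, X, group = group)
}

K <- 10; p <- 3
beta0 <- matrix(c(rep(0, 10),
                  rep(0, 5), rep(1, 5),
                  rep(0, 4), rep(0.5, 3), rep(1, 3)), K, p)
dat <- simulate_gaussian(n = 200, beta0 = beta0, seed = 123)
X <- dat[, -c(1, ncol(dat))]   # same column selection as in the examples
names(X)                       # "x1" "x2"
```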

Results