R Graphical Manual


Last data update: 2014.03.03

fit a GLM with fusion penalty for data integration

metafuse

R Documentation


Description

Fits a generalized linear model (GLM) with a fusion penalty on the study-specific coefficients of each covariate, and generates the solution path used for model selection.
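Fusion acts within one covariate at a time. The study-specific coefficients of a covariate are merged into groups that share a common value. The sketch below uses only base R and the coefficient matrix from the Examples section. It counts how many distinct values, or groups of studies, each covariate has. This is the heterogeneity pattern that fusion is meant to recover. The variable names are illustrative assumptions and not part of metafuse.

```r
# Study-specific coefficients from the example: rows are the K = 10 studies,
# columns are the intercept, X1 and X2 (matrix() fills column by column).
K <- 10
beta0 <- matrix(c(0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,   # intercept
                  0.0,0.0,0.0,0.0,0.0,1.0,1.0,1.0,1.0,1.0,   # X1
                  0.0,0.0,0.0,0.0,0.5,0.5,0.5,1.0,1.0,1.0),  # X2
                nrow = K, ncol = 3,
                dimnames = list(NULL, c("intercept", "X1", "X2")))

# Number of distinct values (groups of studies) per covariate;
# a fully fused covariate has a single common value.
n_groups <- apply(beta0, 2, function(b) length(unique(b)))
print(n_groups)

# Which studies share a value, e.g. for X2
groups_X2 <- split(seq_len(K), beta0[, "X2"])
print(groups_X2)
```

Here the intercept has 1 group, X1 has 2 and X2 has 3. This matches the comments on the example calls below.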

Usage

metafuse(X = X, y = y, sid = sid, fuse.which = c(0:ncol(X)),
  family = "gaussian", intercept = TRUE, alpha = 0, criterion = "EBIC",
  verbose = TRUE, plots = TRUE, loglambda = TRUE)

Arguments

`X`	a matrix (or vector) of predictors with dimensions N*p, where N is the total sample size across all studies and p is the number of covariates
`y`	a vector of responses of length N
`sid`	a vector of study IDs, numbered from 1 to K, where K is the number of studies
`fuse.which`	a vector of integers between 0 and p indicating which covariates are considered for fusion; 0 corresponds to the intercept
`family`	"gaussian" for a continuous response, "binomial" for a binary response, "poisson" for a count response
`intercept`	if TRUE, an intercept is included in the model
`alpha`	the ratio of the sparsity penalty to the fusion penalty; the default, 0, applies no sparsity penalty
`criterion`	the model selection criterion: "AIC" for AIC, "BIC" for BIC, "EBIC" for extended BIC
`verbose`	if TRUE, print the fusion events and the tuning parameter lambda
`plots`	if TRUE, plot the solution paths and clustering trees
`loglambda`	if TRUE, plot lambda on a log10 scale
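The `criterion` argument chooses how a single fit is selected from the solution path. The sketch below computes the textbook forms of the three criteria. EBIC adds a 2 * gamma * log(choose(P, df)) term to BIC. The helper `info_criterion`, the value gamma = 1 and the toy path values are all assumptions for illustration. metafuse's internal definitions may differ, for example in how the degrees of freedom are counted.

```r
# Hypothetical helper (not part of metafuse): textbook AIC, BIC and extended BIC.
#   loglik : maximized log-likelihood of a candidate fit
#   df     : its degrees of freedom (number of distinct free coefficients)
#   n      : total sample size N
#   P      : number of candidate coefficients (e.g. K * p study-specific ones)
#   gamma  : EBIC constant in [0, 1]; gamma = 1 is an assumption here
info_criterion <- function(loglik, df, n, P,
                           type = c("AIC", "BIC", "EBIC"), gamma = 1) {
  type <- match.arg(type)
  switch(type,
         AIC  = -2 * loglik + 2 * df,
         BIC  = -2 * loglik + log(n) * df,
         EBIC = -2 * loglik + log(n) * df + 2 * gamma * lchoose(P, df))
}

# Made-up log-likelihoods and df along a lambda path (illustration only):
loglik_path <- c(-530, -512, -508, -507)
df_path     <- c(3, 5, 8, 12)
ebic <- info_criterion(loglik_path, df_path, n = 2000, P = 30, type = "EBIC")
best <- which.min(ebic)   # index of the selected lambda
print(best)
```

The helper is vectorized over `loglik` and `df`, so it can score a whole path in one call. Setting gamma = 0 reduces EBIC to BIC.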

Details

An adaptive lasso penalty is used; see Zou (2006) for details.
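The adaptive lasso weights each absolute-value term by the inverse of an initial estimate. Terms that start out small are therefore shrunk more strongly. The sketch below is a minimal illustration that makes several assumptions, and it is not the metafuse objective. For one covariate, it orders the studies by their initial estimates and penalizes adjacent differences, with weights 1/|gap|^gamma. It also adds an adaptive sparsity term scaled by `alpha`, following the "ratio of sparsity penalty to fusion penalty" reading of that argument. The function name, gamma = 1 and the toy inputs are hypothetical.

```r
# Hypothetical sketch, not the metafuse source: an adaptive fused-lasso penalty
# for ONE covariate, plus an optional adaptive-lasso sparsity term.
#   b      : candidate study-specific coefficients (length K)
#   b_init : initial estimates (e.g. separate per-study fits) used for weights
#   lambda : overall tuning parameter
#   alpha  : ratio of sparsity to fusion penalty (alpha = 0: fusion only)
#   gamma  : adaptive-lasso exponent (gamma = 1 is an assumption here)
adaptive_fusion_penalty <- function(b, b_init, lambda, alpha = 0, gamma = 1) {
  ord    <- order(b_init)              # neighbours defined by the initial estimates
  d_init <- diff(b_init[ord])          # gaps between neighbouring initial estimates
  if (any(d_init == 0) || any(b_init == 0))
    stop("initial estimates must be distinct and nonzero")
  w_fuse   <- 1 / abs(d_init)^gamma    # small gap -> large weight -> fused sooner
  w_sparse <- 1 / abs(b_init)^gamma    # small estimate -> shrunk to zero sooner
  fusion   <- sum(w_fuse * abs(diff(b[ord])))
  sparsity <- sum(w_sparse * abs(b))
  lambda * (fusion + alpha * sparsity)
}

b_init <- c(1.2, 0.1, 0.9)   # made-up initial estimates for K = 3 studies
adaptive_fusion_penalty(b = c(1, 0, 1), b_init = b_init, lambda = 1)
```

When all coefficients of the covariate are equal, the fusion term is zero. With alpha = 0, that fully fused solution carries no penalty at all.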

Value

A list with the following components is returned:

`family`	the model type
`criterion`	the model selection criterion used
`alpha`	the ratio of the sparsity penalty to the fusion penalty
`if.fuse`	whether each covariate is fused (1) or not (0)
`betahat`	the estimated coefficients
`betainfo`	additional information about the fit, including the degrees of freedom, the optimal lambda (lambda optimal), lambda fuse, and the friction of fusion for each covariate

Examples

library(metafuse)

n <- 200    # sample size in each study
K <- 10     # number of studies
p <- 3      # number of coefficients per study (intercept plus 2 covariates in X)
N <- n*K    # total sample size

# the K x p coefficient matrix (one row per study); use it to set the desired heterogeneity pattern
beta0 <- matrix(c(0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0, # intercept
                  0.0,0.0,0.0,0.0,0.0,1.0,1.0,1.0,1.0,1.0, # beta_1
                  0.0,0.0,0.0,0.0,0.5,0.5,0.5,1.0,1.0,1.0), K, p) # beta_2

# generate a data set; family can be "gaussian", "binomial", or "poisson"
data <- datagenerator(n=n, beta0=beta0, family="gaussian", seed=123)

# prepare the input (y, X, study ID)
y       <- data$y
sid     <- data$group
X       <- data[,-c(1,ncol(data))]  # drop y (first column) and group (last column)

# fuse the slopes of X1 (heterogeneous, with 2 groups of studies)
metafuse(X=X, y=y, sid=sid, fuse.which=c(1), family="gaussian", intercept=TRUE, alpha=0,
          criterion="EBIC", verbose=TRUE, plots=TRUE, loglambda=TRUE)

# fuse the slopes of X2 (heterogeneous, with 3 groups of studies)
metafuse(X=X, y=y, sid=sid, fuse.which=c(2), family="gaussian", intercept=TRUE, alpha=0,
          criterion="EBIC", verbose=TRUE, plots=TRUE, loglambda=TRUE)

# fuse the intercept and both covariates
metafuse(X=X, y=y, sid=sid, fuse.which=c(0,1,2), family="gaussian", intercept=TRUE, alpha=0,
          criterion="EBIC", verbose=TRUE, plots=TRUE, loglambda=TRUE)

# fuse the intercept and both covariates, with a sparsity penalty (alpha = 1)
metafuse(X=X, y=y, sid=sid, fuse.which=c(0,1,2), family="gaussian", intercept=TRUE, alpha=1,
          criterion="EBIC", verbose=TRUE, plots=TRUE, loglambda=TRUE)
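The example assumes that the simulated data contain the response `y` first, the covariates next and the study ID `group` last. It relies on this layout when it builds `y`, `sid` and `X`. The base-R sketch below is a hypothetical stand-in for the Gaussian case of `datagenerator()`, which makes that layout explicit. It is not the package's implementation and will not reproduce its random draws. The function name, standard normal covariates and an error standard deviation of 1 are all assumptions.

```r
# Hypothetical stand-in for datagenerator(), gaussian case only. It is NOT the
# package's implementation and will not reproduce its random draws; it only
# mimics the layout the example relies on: y first, X1..Xq next, group last.
simulate_gaussian_studies <- function(n, beta0, sigma = 1, seed = NULL) {
  if (!is.null(seed)) set.seed(seed)
  K <- nrow(beta0)                        # one row of coefficients per study
  q <- ncol(beta0) - 1                    # column 1 is the intercept
  group <- rep(seq_len(K), each = n)      # study id, 1..K
  X <- matrix(rnorm(n * K * q), ncol = q,
              dimnames = list(NULL, paste0("X", seq_len(q))))
  B <- beta0[group, , drop = FALSE]       # study-specific coefficients, row by row
  eta <- B[, 1] + rowSums(X * B[, -1, drop = FALSE])
  y <- eta + rnorm(n * K, sd = sigma)     # sigma = 1 is an assumption
  data.frame(y = y, X, group = group)
}

# Same heterogeneity pattern as beta0 in the example
beta0 <- matrix(c(rep(0, 10),
                  rep(c(0, 1), each = 5),
                  rep(c(0, 0.5, 1), times = c(4, 3, 3))), 10, 3)
dat <- simulate_gaussian_studies(n = 200, beta0 = beta0, seed = 123)
y   <- dat$y
sid <- dat$group
X   <- dat[, -c(1, ncol(dat))]            # drop y (first) and group (last)
```

Data stacked this way, with one row per observation and a study ID column, is the long format that the `X`, `y` and `sid` arguments describe.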