Last data update: 2014.03.03

hergm    R Documentation

Estimate and simulate hierarchical exponential-family random graph models with local dependence

Description

The function hergm estimates and simulates three classes of hierarchical exponential-family random graph models:

1. The p_1 model of Holland and Leinhardt (1981) in exponential-family form and extensions by Vu, Hunter, and Schweinberger (2013) and Schweinberger, Petrescu-Prahova, and Vu (2014) to both directed and undirected random graphs with additional model terms, with and without covariates, and with parametric and nonparametric priors (see arcs_i, arcs_j, edges_i, edges_ij, mutual_i, mutual_ij).

2. The stochastic block model of Snijders and Nowicki (1997) and Nowicki and Snijders (2001) in exponential-family form and extensions by Vu, Hunter, and Schweinberger (2013) and Schweinberger, Petrescu-Prahova, and Vu (2014) with additional model terms, with and without covariates, and with parametric and nonparametric priors (see arcs_i, arcs_j, edges_i, edges_ij, mutual_i, mutual_ij).

3. The exponential-family random graph models with local dependence of Schweinberger and Handcock (2015), with and without covariates, and with parametric and nonparametric priors (see arcs_i, arcs_j, edges_i, edges_ij, mutual_i, mutual_ij, twostar_ijk, triangle_ijk, ttriple_ijk, ctriple_ijk). These models replace the long-range dependence of conventional exponential-family random graph models with short-range dependence, that is, dependence confined to blocks (neighborhoods). Replacing strong dependence with weak dependence in this way reduces the problem of model degeneracy (Handcock, 2003; Schweinberger, 2011) and improves goodness-of-fit (Schweinberger and Handcock, 2015).
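The base-R sketch below illustrates local dependence. It is not code from hergm. It simulates a small undirected stochastic block model and counts the triangles whose three nodes all belong to the same block. For illustration only, it assumes that a term such as triangle_ijk counts within-block triangles. The helper names simulate_sbm and within_block_triangles are hypothetical.

```r
# Hypothetical helper: simulate an undirected stochastic block model.
# z: block memberships (labels 1..K); P: K x K matrix of edge probabilities.
simulate_sbm <- function(z, P) {
  n <- length(z)
  A <- matrix(0L, n, n)
  for (i in seq_len(n - 1)) {
    for (j in (i + 1):n) {
      A[i, j] <- rbinom(1, 1, P[z[i], z[j]])
      A[j, i] <- A[i, j]  # undirected: keep the adjacency matrix symmetric
    }
  }
  A
}

# Hypothetical helper: count triangles whose three nodes share a block
# (assumption: this mirrors what a within-block triangle term counts).
within_block_triangles <- function(A, z) {
  total <- 0
  for (b in unique(z)) {
    idx <- which(z == b)
    if (length(idx) < 3) next
    Ab <- A[idx, idx, drop = FALSE]
    # trace(Ab^3) counts each undirected triangle 6 times
    total <- total + sum(diag(Ab %*% Ab %*% Ab)) / 6
  }
  total
}

set.seed(1)
z <- rep(1:3, each = 10)
P <- matrix(0.05, 3, 3)
diag(P) <- 0.5
A <- simulate_sbm(z, P)
within_block_triangles(A, z)
```

Triangles that span blocks are not counted. Any dependence induced by such a term therefore stays inside blocks.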

The function hergm also postprocesses the MCMC output it generates. If hergm is called with relabel > 0, it solves the so-called label-switching problem. This problem arises because the likelihood function is invariant to permutations of the block labels. As a consequence, raw MCMC samples from the posterior cannot be used to make inferences about block-dependent quantities, such as block parameters and block memberships. The label-switching problem can be solved in a Bayesian decision-theoretic framework, by choosing a loss function and minimizing the posterior expected loss. Two loss functions are implemented in hergm: the loss function of Schweinberger and Handcock (2015) (relabel == 1) and the loss function of Peng and Carvalho (2015) (relabel == 2). The first loss function appears to be superior in terms of the reported clustering probabilities, but it requires more computing time. A rule of thumb is to use the first loss function when max_number < 15 and the second loss function otherwise.
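The base-R sketch below makes the label-switching problem concrete. It aligns each MCMC draw of block memberships to a reference draw. To do so, it searches over all permutations of the K labels and keeps the one with the fewest disagreements (a Hamming-type loss). This simple loss is an illustrative stand-in only. It is not the loss function of Schweinberger and Handcock (2015) or of Peng and Carvalho (2015). The helper names all_permutations and relabel_draws are hypothetical.

```r
# Hypothetical helper: all permutations of the vector v, one per row.
all_permutations <- function(v) {
  if (length(v) <= 1) return(matrix(v, nrow = 1))
  do.call(rbind, lapply(seq_along(v), function(i)
    cbind(v[i], all_permutations(v[-i]))))
}

# Hypothetical helper: relabel each draw (row of z_draws, labels 1..K)
# so that it agrees as much as possible with the reference draw.
relabel_draws <- function(z_draws, K, reference = z_draws[1, ]) {
  perms <- all_permutations(seq_len(K))
  out <- z_draws
  for (s in seq_len(nrow(z_draws))) {
    z <- z_draws[s, ]
    # permutation p maps old label l to new label p[l];
    # loss = number of nodes that disagree with the reference
    losses <- apply(perms, 1, function(p) sum(p[z] != reference))
    out[s, ] <- perms[which.min(losses), ][z]
  }
  out
}

draws <- rbind(c(1, 1, 2, 2, 3, 3),   # reference draw
               c(2, 2, 3, 3, 1, 1))   # same partition, labels permuted
relabel_draws(draws, K = 3)
```

The brute-force search visits K! permutations, so it is only practical for small K. Relabeling cost grows quickly with max_number.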

Usage


hergm(formula,
                  max_number = NULL,
                  hierarchical = TRUE,
                  parametric = FALSE,
                  initialize = FALSE,
                  perturb = FALSE,
                  scaling = NULL,
                  alpha = NULL,
                  alpha_shape = NULL,
                  alpha_rate = NULL,
                  eta = NULL,
                  eta_mean = NULL,
                  eta_sd = NULL,
                  eta_mean_mean = NULL,
                  eta_mean_sd = NULL,
                  eta_precision_shape = NULL,
                  eta_precision_rate = NULL,
                  mean_between = NULL,
                  all_indicators_fixed = FALSE,
                  indicators_fixed = FALSE,
                  indicator = NULL,
                  parallel = 1,
                  simulate = FALSE,
                  seeds = NULL,
                  samplesize = 1e+5,
                  interval = 1024,
                  burnin = 16*interval,
                  mh.scale = 0.25,
                  variational = FALSE,
                  temperature = c(1,100),
                  predictions = FALSE,
                  posterior.burnin = 0,
                  posterior.thinning = 1,
                  relabel = 0,
                  number.runs = 1,
                  verbose = 1,
                  ...)

Arguments

formula

formula of the form network ~ terms. Networks can be created by calling the function network. Possible terms can be found in ergm.terms and hergm.terms.

max_number

maximum number of blocks.

hierarchical

hierarchical prior; if hierarchical == TRUE, prior is hierarchical (i.e., the means and variances of block parameters are governed by a hyper-prior), otherwise non-hierarchical (i.e., the means and variances of block parameters are fixed).

parametric

parametric prior; if parametric == FALSE, prior is truncated Dirichlet process prior, otherwise parametric Dirichlet prior.

initialize

if initialize == TRUE, initialize block memberships by spectral clustering.

perturb

if initialize == TRUE and perturb == TRUE, initialize block memberships by spectral clustering and perturb.
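The documentation does not state which spectral clustering variant hergm uses. As a hedged sketch of the general idea, assuming an undirected adjacency matrix, the base-R function below proceeds in two steps:

1. It takes the eigenvectors of the symmetric normalized adjacency matrix D^(-1/2) A D^(-1/2) that belong to the K largest eigenvalues.
2. It runs k-means on the rows of those eigenvectors.

The name spectral_init is hypothetical, and the optional perturbation step is not shown.

```r
# Hypothetical helper: spectral clustering of an undirected adjacency matrix A
# into K groups (one common variant, not necessarily the one used by hergm).
spectral_init <- function(A, K, nstart = 20) {
  d <- rowSums(A)
  d[d == 0] <- 1                        # guard against isolated nodes
  D_inv_sqrt <- diag(1 / sqrt(d))
  L <- D_inv_sqrt %*% A %*% D_inv_sqrt  # symmetric normalized adjacency matrix
  ev <- eigen(L, symmetric = TRUE)      # eigenvalues in decreasing order
  X <- ev$vectors[, seq_len(K), drop = FALSE]
  kmeans(X, centers = K, nstart = nstart)$cluster
}

# Two 5-node cliques joined by a single edge between nodes 5 and 6
A <- matrix(0, 10, 10)
A[1:5, 1:5] <- 1
A[6:10, 6:10] <- 1
diag(A) <- 0
A[5, 6] <- A[6, 5] <- 1
set.seed(42)
spectral_init(A, K = 2)
```

The two cliques should receive different labels. Which clique gets which label is arbitrary, which is the same invariance that causes label switching.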

scaling

if scaling == TRUE, use size-dependent parameterizations which ensure that the scaling of between- and within-neighborhood terms is consistent with sparse edge terms.

alpha

concentration parameter of truncated Dirichlet process prior of natural parameters of exponential-family model.

alpha_shape, alpha_rate

shape and rate parameters of Gamma prior of concentration parameter alpha.
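A standard way to draw block probabilities from a Dirichlet process prior truncated at max_number blocks is stick-breaking:

- V_k ~ Beta(1, alpha) for k < max_number, and V_max_number = 1.
- p_k = V_k times the product of (1 - V_j) over j < k.

The base-R sketch below also draws alpha from its Gamma(alpha_shape, alpha_rate) prior. It illustrates the prior under the assumption of the standard stick-breaking construction and is not code taken from hergm. The function name draw_truncated_dp_weights is hypothetical.

```r
# Hypothetical helper: one draw of block probabilities from a truncated
# Dirichlet process prior via stick-breaking, with a Gamma prior on alpha.
draw_truncated_dp_weights <- function(max_number, alpha_shape, alpha_rate) {
  alpha <- rgamma(1, shape = alpha_shape, rate = alpha_rate)
  # Beta(1, alpha) stick proportions; the last stick takes the remainder
  v <- c(rbeta(max_number - 1, 1, alpha), 1)
  # weight k = v_k times the length of stick left after the first k - 1 breaks
  weights <- v * cumprod(c(1, 1 - v[-max_number]))
  list(alpha = alpha, weights = weights)
}

set.seed(3)
draw <- draw_truncated_dp_weights(max_number = 5, alpha_shape = 1, alpha_rate = 1)
draw$weights
sum(draw$weights)
```

Small values of alpha tend to put most of the mass on a few blocks. Large values spread it over more blocks.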

eta

natural parameters of exponential-family random graph model.

eta_mean, eta_sd

means and standard deviations of Gaussian baseline distribution of Dirichlet process prior of natural parameters.

eta_mean_mean, eta_mean_sd

means and standard deviations of Gaussian prior of mean of Gaussian baseline distribution of Dirichlet process prior.

eta_precision_shape, eta_precision_rate

shape and rate (inverse scale) parameter of Gamma prior of precision parameter of Gaussian baseline distribution of Dirichlet process prior.
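The base-R sketch below illustrates how the hierarchical prior might generate the block parameters of one hergm-term. It assumes the conventional normal-gamma structure suggested by the argument names:

- the mean of the Gaussian baseline distribution is drawn from N(eta_mean_mean, eta_mean_sd^2);
- its precision is drawn from Gamma(eta_precision_shape, eta_precision_rate);
- the block parameters are drawn from N(mean, 1/precision).

The function name draw_block_parameters is hypothetical, and the internal parameterization of hergm may differ.

```r
# Hypothetical helper: draw block parameters of one hergm-term under a
# hierarchical normal-gamma prior (an assumed structure, not hergm's code).
draw_block_parameters <- function(max_number,
                                  eta_mean_mean, eta_mean_sd,
                                  eta_precision_shape, eta_precision_rate) {
  eta_mean <- rnorm(1, mean = eta_mean_mean, sd = eta_mean_sd)
  eta_precision <- rgamma(1, shape = eta_precision_shape, rate = eta_precision_rate)
  # block parameters from the Gaussian baseline distribution;
  # the standard deviation is the inverse square root of the precision
  eta <- rnorm(max_number, mean = eta_mean, sd = 1 / sqrt(eta_precision))
  list(eta_mean = eta_mean, eta_precision = eta_precision, eta = eta)
}

set.seed(7)
out <- draw_block_parameters(4, eta_mean_mean = -1, eta_mean_sd = 1,
                             eta_precision_shape = 2, eta_precision_rate = 1)
str(out)
```

Under a non-hierarchical prior (hierarchical == FALSE), the mean and standard deviation of the baseline distribution would be held fixed instead of drawn.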

mean_between

if simulate == TRUE and eta == NULL, then mean_between specifies the mean-value parameter of edges between blocks.

all_indicators_fixed

indicates whether all indicators of the block memberships are fixed at the specified indicators; if some block memberships are unspecified, spectral clustering is used to initialize all block memberships.

indicators_fixed

indicates whether the indicators of the block memberships are fixed at the specified indicators.

indicator

if the indicators of block memberships are specified as numbers between 1 and max_number, the specified indicators are either used as starting values (indicators_fixed == FALSE) or the indicators are fixed at the specified indicators (indicators_fixed == TRUE), which is useful when indicators of block memberships are observed (as is the case in multilevel networks).

parallel

number of computing nodes; if parallel > 1, hergm is run on parallel computing nodes.

simulate

if simulate == TRUE, simulation of networks, otherwise Bayesian inference.

seeds

seed of pseudo-random number generator; if parallel > 1, number of seeds must equal number of computing nodes.

samplesize

if simulate == TRUE, number of network draws, otherwise number of posterior draws; if parallel > 1, number of draws on each computing node.

interval

if simulate == TRUE, number of proposals between sampled networks.

burnin

if simulate == TRUE, number of burn-in iterations.
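The default burnin = 16*interval relies on R's lazy evaluation of default arguments. The default is computed from whatever value of interval the caller supplies. The hypothetical function simulation_schedule below reuses the same three defaults to show this. It also tallies the total number of proposals, under the assumption that one network is recorded every interval proposals after the burn-in.

```r
# Hypothetical stand-in with the same defaults as the samplesize, interval
# and burnin arguments of hergm(); burnin is evaluated lazily from interval.
simulation_schedule <- function(samplesize = 1e+5, interval = 1024,
                                burnin = 16 * interval) {
  # assumption: burn-in first, then one recorded network every `interval` proposals
  total <- burnin + samplesize * interval
  c(burnin = burnin, interval = interval, samplesize = samplesize,
    total_proposals = total)
}

simulation_schedule()                              # burnin = 16 * 1024
simulation_schedule(samplesize = 100, interval = 10)  # burnin = 16 * 10
```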

mh.scale

if simulate == FALSE, scale factor of candidate-generating distribution of Metropolis-Hastings algorithm.
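The base-R sketch below shows the role of a scale factor such as mh.scale. It runs a random-walk Metropolis-Hastings sampler with a symmetric Gaussian candidate-generating distribution whose standard deviation is the scale factor, targeting a standard normal density. The Gaussian random-walk proposal is an assumption; hergm may scale its proposals differently. The function name rw_metropolis is hypothetical.

```r
# Hypothetical helper: random-walk Metropolis-Hastings for a scalar parameter.
rw_metropolis <- function(log_target, init, n_iter, mh.scale = 0.25) {
  draws <- numeric(n_iter)
  current <- init
  current_lp <- log_target(current)
  accepted <- 0
  for (t in seq_len(n_iter)) {
    # symmetric Gaussian candidate with standard deviation mh.scale
    proposal <- current + rnorm(1, mean = 0, sd = mh.scale)
    proposal_lp <- log_target(proposal)
    # symmetric proposal: accept with probability min(1, target ratio)
    if (log(runif(1)) < proposal_lp - current_lp) {
      current <- proposal
      current_lp <- proposal_lp
      accepted <- accepted + 1
    }
    draws[t] <- current
  }
  list(draws = draws, acceptance_rate = accepted / n_iter)
}

set.seed(11)
log_target <- function(x) dnorm(x, log = TRUE)
small <- rw_metropolis(log_target, 0, 5000, mh.scale = 0.25)
large <- rw_metropolis(log_target, 0, 5000, mh.scale = 5)
c(small = small$acceptance_rate, large = large$acceptance_rate)
```

Smaller scale factors raise the acceptance rate but move the chain in small steps. Larger scale factors take bigger steps that are rejected more often.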

variational

if simulate == FALSE and variational == TRUE, variational methods are used to construct the proposal distributions of block memberships; limited to selected models.

temperature

if simulate == FALSE and variational == TRUE, minimum and maximum temperature. The temperature is used to melt down the proposal distributions of indicators. These proposals are based on the full conditional distributions of indicators, which can have low entropy and cause slow mixing of the Markov chain. The temperature is a function of the entropy of the full conditional distributions and is designed to increase the entropy of the proposal distributions; the minimum and maximum temperature are user-defined lower and upper bounds on the temperature.
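The base-R sketch below shows how a temperature can melt down a low-entropy distribution. It raises the probabilities to the power 1/temperature and renormalizes, so higher temperatures give higher entropy. This power-tempering rule is an assumption for illustration. The rule by which hergm sets the temperature from the entropy is not reproduced. The helper names entropy, temper and clamp_temperature are hypothetical.

```r
# Shannon entropy (natural logarithm) of a discrete distribution
entropy <- function(p) {
  p <- p[p > 0]
  -sum(p * log(p))
}

# Hypothetical tempering rule: raise probabilities to 1/temp and renormalize
temper <- function(p, temp) {
  q <- p^(1 / temp)
  q / sum(q)
}

# Hypothetical helper: keep a temperature within user-defined bounds,
# mirroring the role of temperature = c(1, 100)
clamp_temperature <- function(temp, bounds = c(1, 100)) {
  min(max(temp, bounds[1]), bounds[2])
}

p <- c(0.97, 0.02, 0.01)  # low-entropy full conditional of one node's block
temps <- c(1, 10, 100)
sapply(temps, function(temp) entropy(temper(p, temp)))
clamp_temperature(500)
```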

predictions

if predictions == TRUE and simulate == FALSE, returns posterior predictions of statistics in the model.

posterior.burnin

number of burn-in iterations; if parallel > 1, number of burn-in iterations per computing node.

posterior.thinning

if posterior.thinning > 1, every posterior.thinning-th sample point is used while all others are discarded.
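The base-R sketch below shows how burn-in and thinning select posterior draws together. It assumes that the first posterior.burnin draws are discarded and that thinning starts from the first draw kept after burn-in. The helper name kept_indices is hypothetical.

```r
# Hypothetical helper: indices of posterior draws kept after discarding
# posterior.burnin initial draws and keeping every posterior.thinning-th draw.
kept_indices <- function(n_draws, posterior.burnin = 0, posterior.thinning = 1) {
  if (posterior.burnin >= n_draws) return(integer(0))
  seq(from = posterior.burnin + 1, to = n_draws, by = posterior.thinning)
}

kept_indices(20, posterior.burnin = 5, posterior.thinning = 3)
# 6 9 12 15 18
```

With parallel > 1, the same selection would be applied to the draws of each computing node separately.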

relabel

if relabel > 0, relabel MCMC sample by minimizing the posterior expected loss of Schweinberger and Handcock (2015) (relabel == 1) or Peng and Carvalho (2015) (relabel == 2).

number.runs

if relabel == 1, number of runs of relabeling algorithm.

verbose

if verbose == -1, no console output; if verbose == 0, short console output; if verbose == +1, long console output.

...

additional arguments, to be passed to lower-level functions in the future.

Value

If called with the option simulate == TRUE, the function hergm returns a sample of networks; otherwise, it returns an MCMC sample from the posterior.

ergm_theta

parameters of ergm-terms.

alpha

concentration parameter of truncated Dirichlet process prior of parameters of hergm-terms.

eta_mean

mean parameters of Gaussian base distribution of parameters of hergm-terms.

eta_precision

precision parameters of Gaussian base distribution of parameters of hergm-terms.

hergm_theta

parameters of hergm-terms.

loss

if relabel > 0, local minimum of loss function.

p_k

probabilities of membership to blocks.

indicator

indicators of memberships of nodes.

p_i_k

probabilities of membership of nodes to blocks.

prediction

posterior predictions of statistics.
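Posterior membership probabilities such as p_i_k can be estimated from relabeled draws of block memberships. The estimate for node i and block k is the fraction of draws in which node i is in block k. The base-R sketch below assumes the draws are stored as a matrix with one row per draw and one column per node. This layout is an assumption and need not match the structure of the object returned by hergm. The helper name membership_probabilities is hypothetical.

```r
# Hypothetical helper: z_draws is an S x n matrix of (relabeled) block
# memberships with labels 1..K; returns an n x K matrix of estimated
# probabilities that node i belongs to block k.
membership_probabilities <- function(z_draws, K) {
  n <- ncol(z_draws)
  p_i_k <- matrix(0, nrow = n, ncol = K)
  for (k in seq_len(K)) {
    # proportion of draws in which each node is in block k
    p_i_k[, k] <- colMeans(z_draws == k)
  }
  p_i_k
}

z_draws <- rbind(c(1, 1, 2),
                 c(1, 2, 2),
                 c(1, 1, 2),
                 c(2, 1, 2))
P <- membership_probabilities(z_draws, K = 2)
P
apply(P, 1, which.max)  # most probable block of each node
```

Computing these proportions before relabeling would mix up blocks whose labels were switched across draws.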

References

Handcock, M. S. (2003). Assessing degeneracy in statistical models of social networks. Technical report, Center for Statistics and the Social Sciences, University of Washington, Seattle, http://www.csss.washington.edu/Papers.

Holland, P. W. and S. Leinhardt (1981). An exponential family of probability distributions for directed graphs. Journal of the American Statistical Association, Theory & Methods, 76, 33–65.

Nowicki, K. and T. A. B. Snijders (2001). Estimation and prediction for stochastic blockstructures. Journal of the American Statistical Association, Theory & Methods, 96, 1077–1087.

Peng, L. and L. Carvalho (2015). Bayesian degree-corrected stochastic block models for community detection. Technical report, Boston University, arXiv:1309.4796v1.

Schweinberger, M. (2011). Instability, sensitivity, and degeneracy of discrete exponential families. Journal of the American Statistical Association, Theory & Methods, 106, 1361–1370.

Schweinberger, M. and M. S. Handcock (2015). Local dependence in random graph models: characterization, properties, and statistical inference. Journal of the Royal Statistical Society, Series B (Statistical Methodology), 7, 1–30, in press.

Schweinberger, M., Petrescu-Prahova, M. and D. Q. Vu (2014). Disaster response on September 11, 2001 through the lens of statistical network analysis. Social Networks, 37, 42–55.

Snijders, T. A. B. and K. Nowicki (1997). Estimation and prediction for stochastic blockmodels for graphs with latent block structure. Journal of Classification 14, 75–100.

Vu, D. Q., Hunter, D. R. and M. Schweinberger (2013). Model-based clustering of large networks. Annals of Applied Statistics, 7, 1010–1039.

See Also

network, ergm.terms, hergm.terms, hergm.gof, hergm.plot, summary

Examples

## Not run: 
data(example)

hergm(d ~ edges_i)

hergm(d ~ edges_ij)

hergm(d ~ edges_ij + triangle_ijk)

data(sampson)

hergm(samplike ~ arcs_i + arcs_j)

hergm(samplike ~ edges_ij + mutual_ij)

hergm(samplike ~ edges_ij + mutual_ij + ttriple_ijk)

## End(Not run)
