hergm
R Documentation
Estimate and simulate hierarchical exponential-family random graph models with local dependence
Description
The function hergm estimates and simulates three classes of hierarchical exponential-family random graph models:
1. The p_1 model of Holland and Leinhardt (1981) in exponential-family form and extensions by Vu, Hunter, and Schweinberger (2013) and Schweinberger, Petrescu-Prahova, and Vu (2014) to both directed and undirected random graphs with additional model terms, with and without covariates, and with parametric and nonparametric priors (see arcs_i, arcs_j, edges_i, edges_ij, mutual_i, mutual_ij).
2. The stochastic block model of Snijders and Nowicki (1997) and Nowicki and Snijders (2001) in exponential-family form and extensions by Vu, Hunter, and Schweinberger (2013) and Schweinberger, Petrescu-Prahova, and Vu (2014) with additional model terms, with and without covariates, and with parametric and nonparametric priors (see arcs_i, arcs_j, edges_i, edges_ij, mutual_i, mutual_ij).
3. The exponential-family random graph models with local dependence of Schweinberger and Handcock (2015), with and without covariates, and with parametric and nonparametric priors (see arcs_i, arcs_j, edges_i, edges_ij, mutual_i, mutual_ij, twostar_ijk, triangle_ijk, ttriple_ijk, ctriple_ijk).
The exponential-family random graph models with local dependence replace the long-range dependence of conventional exponential-family random graph models by short-range dependence.
Therefore, the exponential-family random graph models with local dependence replace the strong dependence of conventional exponential-family random graph models by weak dependence,
reducing the problem of model degeneracy (Handcock, 2003; Schweinberger, 2011) and improving goodness-of-fit (Schweinberger and Handcock, 2015).
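The notion of local dependence can be written compactly. The display below is a sketch that follows the definition in Schweinberger and Handcock (2015) as summarized here. The notation is an assumption made for illustration and does not come from this page: $Z$ denotes the block memberships, $K$ the number of blocks, $X_{k,k}$ the subgraph within block $k$, and $X_{k,l}$ the dyads between blocks $k$ and $l$. Dependence is allowed within blocks, whereas the between-block factors are assumed to factorize further over dyads.

```latex
P_\theta(X = x \mid Z = z)
  = \prod_{k=1}^{K} \Bigl[
      P_\theta\bigl(X_{k,k} = x_{k,k} \mid Z = z\bigr)
      \prod_{l=1}^{k-1} P_\theta\bigl(X_{k,l} = x_{k,l} \mid Z = z\bigr)
    \Bigr]
```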
The function hergm also postprocesses the MCMC sample it generates.
If hergm is called with relabel > 0,
it solves the so-called label-switching problem.
The label-switching problem is rooted in the invariance of the likelihood function to permutations of the labels of blocks, and implies that raw MCMC samples from the posterior cannot be used to infer block-dependent entities.
The label-switching problem can be solved in a Bayesian decision-theoretic framework by choosing a loss function and minimizing the posterior expected loss.
Two loss functions are implemented in hergm: the loss function of Schweinberger and Handcock (2015) (relabel == 1) and the loss function of Peng and Carvalho (2015) (relabel == 2).
The first loss function seems to be superior in terms of the reported clustering probabilities, but is more expensive in terms of computing time.
A rule of thumb is to use the first loss function when max_number < 15 and use the second loss function otherwise.
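The invariance behind the label-switching problem can be seen in a few lines of base R. The sketch below is illustrative only: the vectors z1 and z2 and the helper choose_relabel() are hypothetical names and are not part of hergm. Permuting the block labels leaves the partition of nodes unchanged, which is why raw label draws cannot simply be averaged. The helper merely encodes the rule of thumb stated above.

```r
# Two labelings of five nodes that differ only by a permutation of block labels.
z1 <- c(1, 1, 2, 2, 3)
perm <- c(2, 3, 1)                 # relabel block 1 -> 2, 2 -> 3, 3 -> 1
z2 <- perm[z1]                     # 2 2 3 3 1

# Co-membership matrices: entry (i, j) is TRUE if nodes i and j share a block.
same_block_1 <- outer(z1, z1, "==")
same_block_2 <- outer(z2, z2, "==")
partition_unchanged <- identical(same_block_1, same_block_2)

# Averaging the raw labels of the two "draws" mixes incompatible labelings
# and yields values that are not valid block labels.
naive_average <- (z1 + z2) / 2     # 1.5 1.5 2.5 2.5 2.0

# Hypothetical helper encoding the rule of thumb for the relabel argument.
choose_relabel <- function(max_number) {
  if (max_number < 15) 1 else 2
}
```

Here partition_unchanged is TRUE although z1 and z2 share no label for any node, which is the situation relabeling is meant to resolve.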
Arguments
formula
formula of the form network ~ terms.
Networks can be created by calling the function network.
Possible terms can be found in ergm.terms and hergm.terms.
max_number
maximum number of blocks.
hierarchical
hierarchical prior; if hierarchical == TRUE, prior is hierarchical (i.e., the means and variances of block parameters are governed by a hyper-prior), otherwise non-hierarchical (i.e., the means and variances of block parameters are fixed).
parametric
parametric prior; if parametric == FALSE, prior is truncated Dirichlet process prior, otherwise parametric Dirichlet prior.
initialize
if initialize == TRUE, initialize block memberships by spectral clustering.
perturb
if initialize == TRUE and perturb == TRUE, initialize block memberships by spectral clustering and perturb.
scaling
if scaling == TRUE, use size-dependent parameterizations which ensure that the scaling of between- and within-neighborhood terms is consistent with sparse edge terms.
alpha
concentration parameter of truncated Dirichlet process prior of natural parameters of exponential-family model.
alpha_shape, alpha_rate
shape and rate (inverse scale) parameters of Gamma prior of concentration parameter alpha.
eta
natural parameters of exponential-family random graph model.
eta_mean, eta_sd
means and standard deviations of Gaussian baseline distribution of Dirichlet process prior of natural parameters.
eta_mean_mean, eta_mean_sd
means and standard deviations of Gaussian prior of mean of Gaussian baseline distribution of Dirichlet process prior.
eta_precision_shape, eta_precision_rate
shape and rate (inverse scale) parameter of Gamma prior of precision parameter of Gaussian baseline distribution of Dirichlet process prior.
mean_between
if simulate == TRUE and eta == NULL, then mean_between specifies the mean-value parameter of edges between blocks.
all_indicators_fixed
indicates whether all indicators of the block memberships are fixed at the specified indicators; if some block memberships are unspecified, spectral clustering is used to initialize all block memberships.
indicators_fixed
indicates whether the indicators of the block memberships are fixed at the specified indicators.
indicator
if the indicators of block memberships are specified as numbers between 1 and max_number, the specified indicators are either used as starting values (indicators_fixed == FALSE) or fixed at the specified values (indicators_fixed == TRUE); fixing them is useful when the block memberships are observed, as is the case in multilevel networks.
parallel
number of computing nodes; if parallel > 1, hergm is run on parallel computing nodes.
simulate
if simulate == TRUE, simulation of networks, otherwise Bayesian inference.
seeds
seed of pseudo-random number generator; if parallel > 1, number of seeds must equal number of computing nodes.
samplesize
if simulate == TRUE, number of network draws, otherwise number of posterior draws; if parallel > 1, number of draws on each computing node.
interval
if simulate == TRUE, number of proposals between sampled networks.
burnin
if simulate == TRUE, number of burn-in iterations.
mh.scale
if simulate == FALSE, scale factor of candidate-generating distribution of Metropolis-Hastings algorithm.
variational
if simulate == FALSE and variational == TRUE, variational methods are used to construct the proposal distributions of block memberships; limited to selected models.
temperature
if simulate == FALSE and variational == TRUE, minimum and maximum temperature. The proposal distributions of indicators are based on the full conditional distributions of indicators, which can have low entropy and thus cause slow mixing of the Markov chain. The temperature is used to flatten these proposal distributions: it is a function of the entropy of the full conditional distributions and is designed to increase the entropy of the proposal distributions. The minimum and maximum temperature are user-defined lower and upper bounds on the temperature.
predictions
if predictions == TRUE and simulate == FALSE, returns posterior predictions of statistics in the model.
posterior.burnin
number of burn-in iterations; if computing is parallel, number of burn-in iterations per processor.
posterior.thinning
if posterior.thinning > 1, every posterior.thinning-th sample point is used while all others are discarded.
relabel
if relabel > 0, relabel MCMC sample by minimizing the posterior expected loss of Schweinberger and Handcock (2015) (relabel == 1) or Peng and Carvalho (2015) (relabel == 2).
number.runs
if relabel == 1, number of runs of relabeling algorithm.
verbose
if verbose == -1, no console output; if verbose == 0, short console output; if verbose == +1, long console output.
...
additional arguments, to be passed to lower-level functions in the future.
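For parametric == FALSE, the arguments alpha and max_number describe a truncated Dirichlet process prior. One standard way to draw block probabilities from such a prior is stick-breaking, truncated at max_number blocks. The base-R sketch below illustrates that construction. It is not hergm's internal code, and the function name rtruncated_stick() is hypothetical.

```r
# Draw block probabilities from a truncated stick-breaking construction.
# max_number: truncation level (maximum number of blocks)
# alpha:      concentration parameter of the Dirichlet process
rtruncated_stick <- function(max_number, alpha) {
  stopifnot(max_number >= 2, alpha > 0)
  # At each step a Beta(1, alpha) fraction of the remaining stick is broken
  # off; the last block takes whatever is left, so the pieces sum to one.
  v <- c(rbeta(max_number - 1, shape1 = 1, shape2 = alpha), 1)
  remaining <- cumprod(c(1, 1 - v[-max_number]))
  v * remaining
}

set.seed(1)
p_small_alpha <- rtruncated_stick(max_number = 5, alpha = 0.5)
p_large_alpha <- rtruncated_stick(max_number = 5, alpha = 20)
```

Small values of alpha tend to put most of the mass on the first few blocks, whereas large values tend to push mass toward later blocks, with the final block absorbing whatever is left by the truncation.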
Value
If called with the option simulate == TRUE,
the function hergm returns a sample of networks,
otherwise an MCMC sample from the posterior.
ergm_theta
parameters of ergm-terms.
alpha
concentration parameter of truncated Dirichlet process prior of parameters of hergm-terms.
eta_mean
mean parameters of Gaussian base distribution of parameters of hergm-terms.
eta_precision
precision parameters of Gaussian base distribution of parameters of hergm-terms.
hergm_theta
parameters of hergm-terms.
loss
if relabel > 0, local minimum of loss function.
p_k
probabilities of membership to blocks.
indicator
indicators of memberships of nodes.
p_i_k
probabilities of membership of nodes to blocks.
prediction
posterior predictions of statistics.
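The relation between sampled indicators and the membership probabilities p_i_k and p_k can be sketched in base R. The example below works on a simulated stand-in for a relabeled indicator sample, not on actual hergm output. The assumed layout (one row per posterior draw, one column per node) is an assumption, as are all object names, and burnin and thinning merely play the role of posterior.burnin and posterior.thinning.

```r
set.seed(42)
draws <- 1000
max_number <- 3
truth <- c(1, 1, 2, 2, 3, 3)       # hypothetical block memberships of six nodes
n_nodes <- length(truth)

# Stand-in for a relabeled indicator sample: each draw reports the true block
# with probability 0.9 and a uniformly chosen block otherwise.
indicator <- t(replicate(draws, ifelse(runif(n_nodes) < 0.9,
                                       truth,
                                       sample(max_number, n_nodes, replace = TRUE))))

# Discard burn-in draws and keep every thinning-th draw afterwards.
burnin <- 200
thinning <- 5
keep <- seq(burnin + 1, draws, by = thinning)
kept <- indicator[keep, , drop = FALSE]

# Relative frequency of each block per node: rows are nodes, columns are blocks.
p_i_k <- t(apply(kept, 2, function(z) tabulate(z, nbins = max_number) / length(z)))

# Overall block proportions and the most probable block of each node.
p_k <- colMeans(p_i_k)
most_probable_block <- apply(p_i_k, 1, which.max)
```

Frequencies of this kind are only meaningful after the label-switching problem has been addressed, since raw draws may use different labels for the same block.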
References
Handcock, M. S. (2003). Assessing degeneracy in statistical models of social networks. Technical report, Center for Statistics and the Social Sciences, University of Washington, Seattle, http://www.csss.washington.edu/Papers.
Holland, P. W. and S. Leinhardt (1981). An exponential family of probability distributions for directed graphs. Journal of the American Statistical Association, Theory & Methods, 76, 33–65.
Nowicki, K. and T. A. B. Snijders (2001). Estimation and prediction for stochastic blockstructures. Journal of the American Statistical Association, Theory & Methods, 96, 1077–1087.
Peng, L. and L. Carvalho (2015). Bayesian degree-corrected stochastic block models for community detection. Technical report, Boston University, arXiv:1309.4796v1.
Snijders, T. A. B. and K. Nowicki (1997). Estimation and prediction for stochastic blockmodels for graphs with latent block structure. Journal of Classification 14, 75–100.
Schweinberger, M. (2011). Instability, sensitivity, and degeneracy of discrete exponential families. Journal of the American Statistical Association, Theory & Methods, 106, 1361–1370.
Schweinberger, M. and M. S. Handcock (2015). Local dependence in random graph models: characterization, properties, and statistical inference. Journal of the Royal Statistical Society, Series B (Statistical Methodology), 7, 1–30, in press.
Schweinberger, M., Petrescu-Prahova, M. and D. Q. Vu (2014). Disaster response on September 11, 2001 through the lens of statistical network analysis. Social Networks, 37, 42–55.
Vu, D. Q., Hunter, D. R. and M. Schweinberger (2013). Model-based clustering of large networks. Annals of Applied Statistics, 7, 1010–1039.
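Examples
A call to hergm might look like the sketch below. It is an illustration under stated assumptions rather than a tested recipe: the toy network is simulated in base R, the argument names (max_number, samplesize, relabel, verbose) and the model terms (edges_ij, triangle_ijk) are taken from the description on this page, and other versions of the package may use different names. The fit is switched off by default through the hypothetical flag run_fit, because the MCMC run can be slow.

```r
# Toy undirected network: two groups of five nodes, dense within, sparse between.
set.seed(1)
n <- 10
block <- rep(1:2, each = n / 2)
tie_prob <- ifelse(outer(block, block, "=="), 0.8, 0.05)
adj <- matrix(0L, n, n)
upper <- upper.tri(adj)
adj[upper] <- rbinom(sum(upper), size = 1, prob = tie_prob[upper])
adj <- adj + t(adj)                # symmetric 0/1 adjacency matrix, zero diagonal

run_fit <- FALSE                   # set to TRUE to run the (potentially slow) MCMC
if (run_fit && requireNamespace("hergm", quietly = TRUE)) {
  library(hergm)                   # makes the hergm model terms available
  net <- network::network(adj, directed = FALSE)
  fit <- hergm(net ~ edges_ij + triangle_ijk,
               max_number = 2,     # at most two blocks
               samplesize = 1e4,   # number of posterior draws
               relabel = 1,        # loss of Schweinberger and Handcock (2015)
               verbose = 0)
  str(fit, max.level = 1)          # inspect the components of the returned object
}
```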