Last data update: 2014.03.03

R: The Random Subgraph Model
rsmR Documentation

The Random Subgraph Model

Description

Partition the vertices of a directed network with typed edges into clusters describing the connection patterns of subgraphs given as inputs. The inference is performed using a variational Bayes EM algorithm and the estimation of the number of clusters is obtained through a variational approximation of the marginal log likelihood.

Usage

rsm(X, sub, Klist, nbredo = 5, disp = F, maxit = 50)

Arguments

X

an adjacency matrix with oriented and typed edges.

sub

a vector indicating the subgraph in which each vertex belongs.

Klist

a vector indicating the range of possible values for the number of latent clusters; must be at least 2.

nbredo

number of initializations to be made; to avoid local optimum during variational Bayes EM algorithm.

disp

logical; if TRUE, display comments during the estimation.

maxit

maximum number of iterations for the algorithm.

Details

rsm implements the variational Bayes EM algorithm for the random subgraph model (RSM) as proposed by Jernite et al.(2012). The function takes a directed network with typed edges along with a decomposition of the network into known subgraphs as inputs.

The RSM model assumes that the presence of an edge between two vertices depends on the subgraphs they belong to. If an edge is present, its type is then assumed to be sampled from a multinomial distribution, depending on latent clusters. In practice, the subgraphs are known and given as inputs while the clusters have to be infered from the data. The clustering of the vertices is performed using a variational Bayes algorithm and the number of clusters is obtained with a model selection criterion which is a variational approximation of the marginal log likelihood.

The algorithm is runned for various values of the number of clusters (Klist). For each value of K in Klist, the algorithm is initialized nbredo times.

Assuming that the number of clusters is K, output[[K]]$lower represents the lower bound of the marginal log likelihood. output[[K]]$Pi contains the cluster connectivity. Each element of the K*K matrix is a vector of probability of C elements. Hence, output[[K]]$Pi is an array of size K*K*C.

In order to have a better chance of reaching a global optimum, VBEM is run for several initializations of a kmeans like algorithm (by default, nbredo = 5).

Value

Returns a list including:

N

Number of vertices of the network

R

Number of subgraphs

C

Number of edges types

Klist

vector indicating the range of possible values for the number of latent cluster; must be greater or equal to 2

X

an adjacency matrix with oriented and typed edges.

K_star

the number of clusters estimated.

gama

matrix of size R*R containing the probabilities of connection between subgraphs.

output

output list of length(Klist) + 1 items. Each item contains the result of the estimation for a given number of class K. Details of output field:

output[[K]]$lower

lower bound of the marginal log likelihood.

output[[K]]$alpha

matrix of size R*K containing the posterior proportions of each cluster in subgraphs.

output[[K]]$Pi

array of size K*K*C containing the cluster connectivity. Each element of the K*K matrix is a vector of C elements.

output[[K]]$Tau

matrix of posterior probabilities (estimated probabilies for each vertex to belong to the different clusters.

output[[K]]$Zcol

vector indicating the cluster of each vertex (MAP estimate).

Author(s)

Yacine Jernite, Laetitia Nouedoui, Charles Bouveyron, Pierre Latouche.

References

Yacine Jernite, Pierre Latouche, Charles Bouveyron, Patrick Rivera, Laurent Jegou and Stephane Lamasse(2012), The Random Subgraph Model for the Analysis of an Ecclesiastical Network in Merovingian Gaul, http://arxiv.org/abs/1212.5497

See Also

summary.rsm, plot.rsm

Examples

data(Regions)
res <- rsm(Regions$X, Regions$sub, Klist=2:4, nbredo=1, maxit=5) 
plot(res)
summary(res)

Results


R version 3.3.1 (2016-06-21) -- "Bug in Your Hair"
Copyright (C) 2016 The R Foundation for Statistical Computing
Platform: x86_64-pc-linux-gnu (64-bit)

R is free software and comes with ABSOLUTELY NO WARRANTY.
You are welcome to redistribute it under certain conditions.
Type 'license()' or 'licence()' for distribution details.

R is a collaborative project with many contributors.
Type 'contributors()' for more information and
'citation()' on how to cite R or R packages in publications.

Type 'demo()' for some demos, 'help()' for on-line help, or
'help.start()' for an HTML browser interface to help.
Type 'q()' to quit R.

> library(Rambo)
Loading required package: sna
sna: Tools for Social Network Analysis
Version 2.3-2 created on 2014-01-13.
copyright (c) 2005, Carter T. Butts, University of California-Irvine
 For citation information, type citation("sna").
 Type help(package="sna") to get started.

> png(filename="/home/ddbj/snapshot/RGM3/R_CC/result/Rambo/rsm.Rd_%03d_medium.png", width=480, height=480)
> ### Name: rsm
> ### Title: The Random Subgraph Model
> ### Aliases: rsm
> ### Keywords: subgraph rsm
> 
> ### ** Examples
> 
> data(Regions)
> res <- rsm(Regions$X, Regions$sub, Klist=2:4, nbredo=1, maxit=5) 
> plot(res)
> summary(res)
Initial settings:
 30 vertices 
 2 subgraphs
 3 relations types

 The optimal number of cluster is K =  3 ; 
 with the lower bound equal:  -739.5941 
> 
> 
> 
> 
> 
> dev.off()
null device 
          1 
>