Last data update: 2014.03.03

GetStatistics {PAFit}    R Documentation

A function to summarize input data into sufficient statistics for estimating the attachment function and node fitness

Description

The function summarizes the input data into sufficient statistics for estimating the attachment function and node fitness, together with additional information about the data such as the total number of nodes, the number of time-steps, the maximum degree, and the final degree of each node. It also provides mechanisms for dealing with very large datasets by binning the degree, setting a degree threshold, or discarding time-steps.

Usage

GetStatistics(data, net_type, only_PA, Binning, G, start_deg, deg_threshold,
              CompressMode, CompressRatio, CustomTime)

Arguments

data

Matrix. A three-column matrix in which each row describes one edge in the form (from_node id, to_node id, time_stamp). from_node id is the id of the source node, to_node id is the id of the destination node, and time_stamp is the arrival time of the edge. from_node id and to_node id are assumed to be integers starting from 0. time_stamp can be either numeric or a string. We assume that a smaller time_stamp represents an earlier arrival time.
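For illustration, the base R snippet below builds a small, made-up edge list in this format. The values are arbitrary; a matrix like this could then be passed to GetStatistics as in the Examples section.

```r
# A made-up edge list: each row is (from_node id, to_node id, time_stamp),
# with node ids starting from 0. Values are arbitrary, for illustration only.
edges <- rbind(
  c(1, 0, 1),   # node 1 links to node 0 at time 1
  c(2, 0, 2),   # two edges arrive at time 2 ...
  c(3, 1, 2),   # ... the second one points to node 1
  c(4, 0, 3)
)
# Basic sanity checks on the shape and id convention
stopifnot(ncol(edges) == 3, min(edges[, 1:2]) == 0)
```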

net_type

String. Indicates the type of network. Default value is "directed".

only_PA

Logical. Indicates whether only the statistics needed for estimating the attachment function A_k are summarized. If TRUE, memory is saved, at the cost of not being able to estimate node fitness. Default value is FALSE.

Binning

Logical. Indicates whether degrees should be binned. Default value is TRUE.

G

Integer. Number of bins. Default value is 1000.

start_deg

Integer. The degree from which binning starts. Default value is 0.
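The binning rule itself is not spelled out here. As a rough sketch, assuming logarithmic binning (a common choice for heavy-tailed degree distributions, swapped in for illustration and not confirmed as PAFit's exact rule), the hypothetical helper make_log_bins below produces per-bin begin and end degrees analogous to the begin_deg, end_deg and interval_length fields. In this sketch every degree below start_deg keeps a bin of its own, so the total number of bins is at most start_deg + G.

```r
# Hypothetical helper, for illustration only (not PAFit's code):
# logarithmic binning of the degrees 0..deg_max.
make_log_bins <- function(deg_max, G, start_deg = 0) {
  stopifnot(deg_max >= start_deg, G >= 1)
  # Degrees 0, ..., start_deg - 1 each get their own bin
  single <- seq_len(start_deg) - 1
  # Bin starts for degrees >= start_deg, spaced evenly on a log scale
  grid   <- exp(seq(log(start_deg + 1), log(deg_max + 1), length.out = G))
  # pmax() guards against floating-point round-off at the lower end
  starts <- pmax(start_deg, floor(grid) - 1)
  begin_deg <- unique(c(single, starts))
  end_deg   <- c(begin_deg[-1] - 1, deg_max)
  list(begin_deg = begin_deg, end_deg = end_deg,
       interval_length = end_deg - begin_deg + 1)
}

bins <- make_log_bins(deg_max = 100, G = 5)
# Index of the bin that contains each degree
bin_of <- findInterval(c(0, 7, 100), bins$begin_deg)
```

Because the bins tile 0..deg_max without gaps, the interval lengths always sum to deg_max + 1, and findInterval can map any degree to its bin.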

deg_threshold

Integer. Only the fitnesses of nodes whose number of new edges acquired is not less than deg_threshold will be estimated. The fitnesses of all other nodes are fixed at 1. Default value is 1.

CompressMode

Integer. Indicates whether, and how, the timeline should be compressed. Default value is 0. Possible values of CompressMode:

0: No compression.

1: Compress by using a subset of equally spaced time stamps. The size of this subset is CompressRatio times the number of all time stamps.

2: Compress by using only the time-steps from the first time-step at which CompressRatio*100 percent of the total number of edges (in the final state of the network) have been added to the network.

3: Use only the time stamps supplied by the user in CustomTime. This mode offers the most flexibility; it can be used, for example, to investigate how the attachment function or node fitness changes across different time intervals.
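The hypothetical helper select_times below sketches which time stamps each CompressMode would keep. It is written from the descriptions above and is not PAFit's code; in particular, for mode 2 it assumes the CompressRatio threshold counts as reached once all edges of a time-step have been added, and for mode 1 it rounds the subset size to the nearest integer.

```r
# Hypothetical helper, for illustration only: time stamps kept by each mode.
select_times <- function(edge_times, mode = 0, ratio = 0.5, custom = NULL) {
  stopifnot(ratio > 0, ratio <= 1)
  u <- sort(unique(edge_times))
  if (mode == 0) return(u)                      # no compression
  if (mode == 1) {
    # About ratio * length(u) equally spaced time stamps
    n_keep <- max(1, round(ratio * length(u)))
    return(u[unique(round(seq(1, length(u), length.out = n_keep)))])
  }
  if (mode == 2) {
    # Start at the first step by which ratio of all edges has arrived
    cum   <- cumsum(table(factor(edge_times, levels = u)))
    start <- which(cum >= ratio * length(edge_times))[1]
    return(u[start:length(u)])
  }
  if (mode == 3) return(u[u %in% custom])       # user-supplied stamps only
  stop("mode must be 0, 1, 2 or 3")
}

edge_times <- c(1, 1, 2, 3, 4, 4, 5, 6, 7, 8, 9, 10)
select_times(edge_times, mode = 1, ratio = 0.5)   # 5 equally spaced stamps
select_times(edge_times, mode = 2, ratio = 0.5)   # from the step reaching 50% of edges
select_times(edge_times, mode = 3, custom = c(2, 5))
```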

CompressRatio

Numeric. Controls how strongly the timeline is compressed; it is used by CompressMode 1 and 2 as described above. Default value is 0.5.

CustomTime

Vector. Custom time stamps. Only effective if CompressMode == 3, in which case only these time stamps are used.

Value

An object of class PAFitData, which is a list. Some important fields are:

offset_tk

A matrix where the (t,k+1) element is the number of nodes with degree k at time t, counting only the nodes whose number of new edges acquired is less than deg_thresh

n_tk

A matrix where the (t,k+1) element is the number of nodes with degree k at time t

m_tk

A matrix where the (t,k+1) element is the number of new edges connecting to a degree-k node at time t

Sum_m_k

A vector where the (k+1)-th element is the total number of new edges that connected to a degree-k node, summed over all time steps
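To make n_tk, m_tk and Sum_m_k concrete, the sketch below computes them from a small edge list in base R. It rests on assumptions not stated in this page: degree means in-degree, a node exists from the first time-step in which it appears, only nodes that existed before time-step t are counted at t, and degrees are measured just before the edges of time-step t arrive. The function name summarize_tk is hypothetical and this is not PAFit's implementation.

```r
# Hypothetical sketch, not PAFit's code: n_tk, m_tk and Sum_m_k from an edge list.
summarize_tk <- function(edges) {
  times <- sort(unique(edges[, 3]))
  ids   <- sort(unique(c(edges[, 1], edges[, 2])))
  step  <- match(edges[, 3], times)        # time-step index of each edge
  dest  <- match(edges[, 2], ids)          # destination index of each edge
  # First time-step in which each node appears, as source or destination
  first_t <- sapply(seq_along(ids), function(i)
    min(step[edges[, 1] == ids[i] | edges[, 2] == ids[i]]))
  k_max <- max(tabulate(dest, nbins = length(ids)))  # largest final in-degree
  n_tk  <- matrix(0, length(times), k_max + 1)
  m_tk  <- matrix(0, length(times), k_max + 1)
  deg   <- numeric(length(ids))            # in-degree before the current step
  for (t in seq_along(times)) {
    alive <- first_t < t                   # nodes that existed before step t
    # Column k + 1 counts nodes (or edge targets) with degree k
    n_tk[t, ] <- tabulate(deg[alive] + 1, nbins = k_max + 1)
    d <- dest[step == t]                   # destinations of this step's edges
    m_tk[t, ] <- tabulate(deg[d[alive[d]]] + 1, nbins = k_max + 1)
    deg <- deg + tabulate(d, nbins = length(ids))    # update after step t
  }
  list(n_tk = n_tk, m_tk = m_tk, Sum_m_k = colSums(m_tk))
}

edges <- rbind(c(1, 0, 1), c(2, 0, 2), c(3, 1, 2), c(4, 0, 3))
res <- summarize_tk(edges)
```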

node_degree

A matrix recording the degree of all nodes at each time step

m_t

A vector where the t-th element is the number of new edges at time t

z_j

A vector where the j-th element is the total number of new edges that connected to node j
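Similarly, m_t and z_j are simple counts over the edge list. The base R sketch below also picks out the nodes whose fitness would be estimated under deg_threshold. It treats the "number of new edges acquired" by a node as its z_j, which is an assumption made for illustration.

```r
# Hypothetical sketch: m_t, z_j and the deg_threshold selection from an edge list.
edges <- rbind(c(1, 0, 1), c(2, 0, 2), c(3, 1, 2), c(4, 0, 3))
times <- sort(unique(edges[, 3]))
ids   <- sort(unique(c(edges[, 1], edges[, 2])))
# m_t: number of new edges at each time step, in time order
m_t <- tabulate(match(edges[, 3], times), nbins = length(times))
# z_j: total number of new edges received by each node, in the order of ids
z_j <- tabulate(match(edges[, 2], ids), nbins = length(ids))
# Assumption: nodes with z_j below the threshold keep their fitness fixed at 1
deg_threshold <- 2
estimated <- ids[z_j >= deg_threshold]
```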

N

Numeric. The number of nodes in the network

T

Numeric. The number of time steps

deg.max

Numeric. The maximum degree in the final network

node_id

A vector containing the ids of all nodes

final_deg

A vector containing the final degree of each node

deg_thresh

Numeric. The specified degree threshold.

f_position

Numeric vector. The indices, in the node_id vector, of the nodes whose fitness is estimated (i.e. nodes whose number of new edges acquired is not less than deg_thresh)

start_deg

Numeric. The degree from which binning starts.

begin_deg

Numeric vector containing the beginning degree of each bin

end_deg

Numeric vector containing the ending degree of each bin

interval_length

Numeric vector containing the length of each bin.

Binning

Logical. Indicates whether binning was applied or not.

G

Integer. Number of bins

TimeCompressMode

Integer. The mode of time compression.

T_compressed

Integer. The number of time stamps actually used

compressed_unique_time

The time stamps that are actually used

CompressRatio

Numeric. The compression ratio used.

CustomTime

Vector. The time stamps specified by user.

Author(s)

Thong Pham thongpham@thongpham.net

References

1. Pham, T., Sheridan, P. and Shimodaira, H. (2015). Nonparametric estimation of the preferential attachment function in complex networks: evidence of deviations from log linearity. Proceedings of ECCS 2014: European Conference on Complex Systems (in press).

2. Pham, T., Sheridan, P. and Shimodaira, H. (2015). PAFit: A Statistical Method for Measuring Preferential Attachment in Temporal Complex Networks. PLoS ONE 10(9): e0137796. doi:10.1371/journal.pone.0137796 (http://dx.doi.org/10.1371/journal.pone.0137796)

Examples

library("PAFit")
data   <- GenerateNet(N = 1000, m = 1, mode = 1, alpha = 1, shape = 5, rate = 5)
stats  <- GetStatistics(data$graph)
