R/ergmitomethods.R
, R/ergmito.R
ergmito.Rd
ergmito
uses Maximum Likelihood Estimation (MLE) to fit Exponential Random
Graph Models for single or multiple small networks, the later using
pooleddata MLE. To do so we use exact likelihoods, which implies fully
enumerating the support of the model. Overall, the exact likelihood
calculation is only possible when dealing with directed (undirected) networks
size 5 (7). In general, directed (undirected) graphs with more than 5 (7)
vertices should not be fitted using MLE, but instead other methods such as
the MCMLE algorithm or the RobbinsMonro Stochastic Approximation algorithm,
both of which are available in the ergm R package.The workhorse function of
ergmito
is the ergm
package function ergm::ergm.allstats()
.
# S3 method for ergmito vcov(object, solver = NULL, ...) ergmito( model, model_update = NULL, stats_weights = NULL, stats_statmat = NULL, optim.args = list(), init = NULL, use.grad = TRUE, target_stats = NULL, ntries = 1L, keep.stats = TRUE, target_offset = NULL, stats_offset = NULL, ... )
object  An object of class 

solver  Function. Used to compute the inverse of the hessian matrix. When
not null, the variancecovariance matrix is recomputed using that function.
By default, 
...  Further arguments passed to the method. In the case of 
model  Model to estimate. See ergm::ergm. The only difference with

model_update  A 
stats_weights  Either an integer vector or a list of integer vectors (see exact_loglik). 
stats_statmat  Either a matrix or a list of matrices (see exact_loglik). 
optim.args  List. Passed to stats::optim. 
init  A numeric vector. Sets the starting parameters for the optimization routine. Default is a vector of zeros. 
use.grad  Logical. When 
target_stats  A matrix of target statistics (see ergm::ergm). 
ntries  Integer scalar. Number of tries to estimate the MLE (see details). 
keep.stats  Logical scalar. When 
target_offset, stats_offset  See 
An list of class ergmito
:
call
The program call.
coef
Named vector. Parameter estimates.
iterations
Integer. Number of times the loglikelihood was evaluated
(see stats::optim).
mle.lik
Numeric. Final value of the objective function.
null.lik
Numeric. Final value of the objective function for the null model.
covar
Square matrix of size length(coef)
. Variancecovariance matrix
computed using the exact hessian as implemented in exact_hessian.
coef.init
Named vector of length length(coef)
. Initial set of parameters
used in the optimization.
formulae
An object of class ergmito_loglik.
nobs
Integer scalar. Number of networks in the model.
network
Networks passed via model
.
optim.out
,optim.args
Results from the optim call and arguments passed to it.
status
,note
Convergence code. See check_convergence
best_try
Integer scalar. Index of the run with the highest loglikelihood value.
history
Matrix of size ntries * (k + 1)
. History of the parameter
estimates and the reached loglikelihood values.
timer
Vector of times (for benchmarking). Each unit marks the starting
point of the step.
Methods base::print()
, base::summary()
, stats::coef()
, stats::logLik()
,
stats::nobs()
, stats::vcov()
, stats::AIC()
, stats::BIC()
,
stats::confint()
, and stats::formula()
are available.
The support of the sufficient statistics is calculated using ERGM's
ergm::ergm.allstats()
function.
Maximum Likelihood Estimates are obtained using the stats::optim function.
The default method for maximization is BFGS
using both the loglikelihood
function and its corresponding gradient.
Another important factor to consider is the existence of the MLE estimates As shown in Handcock (2003), if the observed statistics are near the border if the support function (e.g. too many edges or almost none), then, even if the MLE estimates exists, the optimization function may not be able to reach the optima. Moreover, if the target (observed) statistics live in the boundary, then the MLE estimates do not exists. In general, this should not be an issue in the context of the pooled model, as the variability of observed statistics should be enough to avoid those situations.
The function ergmito
will try to identify possible cases of nonexistence,
of the MLE, and if identified then try to re estimate the model parameters using
larger values than the ones obtained, if the loglikelihood is greater, then it is
assumed that the model is degenerate and the corresponding values will be
replaced with either +Inf
or Inf
. By default, this behavior is checked
anytime that the absolute value of the estimates is greater than 5, or the
sufficient statistics were flagged as potentially outside of the interior of
the support (close to zero or to its max).
In the case of ntries
, the optimization is repeated that number of times,
each time perturbing the init
parameter by adding a Normally distributed
vector. The result which reaches the highest loglikelihood will be the one
reported as parameter estimates. This feature is intended for testing only.
Anecdotally, optim
reaches the max in the first try.
The function plot.ergmito()
and gof_ergmito()
for postestimation
diagnostics.
# Generating a small graph set.seed(12) n < 4 net < rbernoulli(n, p = .3) model < net ~ edges + mutual library(ergm) ans_ergmito < ergmito(model)#> Warning: The observed statistics (target.statistics) are near or at the boundary of its support, i.e. the Maximum Likelihood Estimates maynot exist or be hard to be estimated. In particular, the statistic(s) "edges", "mutual".#>#>#>#>#>#>#>#>#>#>#>#>#>#>#>#>#>#>#>#>#>#>#>#>#>#>#>#>#>#>#>#>#>#>#>#>#>#>#>#> [,1] #> [1,] 7.977968#> [,1] #> [1,] 7.977994summary(ans_ergmito)#> #> ERGMito estimates (MLE) #> This model includes 1 networks. #> #> formula: #> net ~ edges + mutual #> <environment: 0x11d8d5a0> #> #> Estimate Std. Error z value Pr(>z) #> edges 0.69312 1.11802 0.6199 0.5353 #> mutual 1.38627 1.73205 0.8004 0.4235 #> AIC: 19.95594 BIC: 20.92575 (Smaller is better.)summary(ans_ergm)#> #> ========================== #> Summary of model fit #> ========================== #> #> Formula: net ~ edges + mutual #> <environment: 0x11d8d5a0> #> #> Iterations: 2 out of 20 #> #> Monte Carlo MLE Results: #> Estimate Std. Error MCMC % z value Pr(>z) #> edges 0.689 1.141 0 0.604 0.546 #> mutual 1.375 1.750 0 0.786 0.432 #> #> Null Deviance: 16.64 on 12 degrees of freedom #> Residual Deviance: 15.93 on 10 degrees of freedom #> #> AIC: 19.93 BIC: 20.9 (Smaller is better.)# Example 2: Estimating an ERGMito using data with know DGP parameters  data(fivenets) model1 < ergmito(fivenets ~ edges + nodematch("female"))#> Warning: The observed statistics (target.statistics) are near or at the boundary of its support, i.e. the Maximum Likelihood Estimates maynot exist or be hard to be estimated. In particular, the statistic(s) "edges", "nodematch.female".#> #> ERGMito estimates (MLE) #> This model includes 5 networks. #> #> formula: #> fivenets ~ edges + nodematch("female") #> <environment: 0x11d8d5a0> #> #> Estimate Std. Error z value Pr(>z) #> edges 1.70475 0.54356 3.1363 0.001711 ** #> nodematch.female 1.58697 0.64305 2.4679 0.013592 * #>  #> Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1 #> AIC: 73.34109 BIC: 77.52978 (Smaller is better.)# Example 3: Likelihood ratio test using the lmtest R package if (require(lmtest)) { data(fivenets) model1 < ergmito(fivenets ~ edges + nodematch("female")) model2 < ergmito(fivenets ~ edges + nodematch("female") + mutual) lrtest(model1, model2) # Likelihood ratio test # # Model 1: fivenets ~ edges + nodematch("female") # Model 2: fivenets ~ edges + nodematch("female") + mutual # #Df LogLik Df Chisq Pr(>Chisq) # 1 2 34.671 # 2 3 34.205 1 0.9312 0.3346 }#>#>#> #>#>#> #>#> Warning: The observed statistics (target.statistics) are near or at the boundary of its support, i.e. the Maximum Likelihood Estimates maynot exist or be hard to be estimated. In particular, the statistic(s) "edges", "nodematch.female".#> Warning: The observed statistics (target.statistics) are near or at the boundary of its support, i.e. the Maximum Likelihood Estimates maynot exist or be hard to be estimated. In particular, the statistic(s) "edges", "nodematch.female", "mutual".#> Likelihood ratio test #> #> Model 1: fivenets ~ edges + nodematch("female") #> Model 2: fivenets ~ edges + nodematch("female") + mutual #> #Df LogLik Df Chisq Pr(>Chisq) #> 1 2 34.671 #> 2 3 34.205 1 0.9312 0.3346# Example 4: Adding an reference term for edgecount  # Simulating networks of different sizes set.seed(12344) nets < rbernoulli(c(rep(4, 10), rep(5, 10)), c(rep(.2, 10), rep(.1, 10))) # Fitting an ergmito under the Bernoulli model ans0 < ergmito(nets ~ edges)#> Warning: The observed statistics (target.statistics) are near or at the boundary of its support, i.e. the Maximum Likelihood Estimates maynot exist or be hard to be estimated. In particular, the statistic(s) "edges".summary(ans0)#> #> ERGMito estimates (MLE) #> This model includes 20 networks. #> #> formula: #> nets ~ edges #> <environment: 0x11d8d5a0> #> #> Estimate Std. Error z value Pr(>z) #> edges 1.68640 0.15396 10.954 < 2.2e16 *** #>  #> Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1 #> AIC: 279.3753 BIC: 283.1436 (Smaller is better.)# # ERGMito estimates # # formula: # nets ~ edges # # Estimate Std. Error z value Pr(>z) # edges 1.68640 0.15396 10.954 < 2.2e16 *** #  # Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1 # AIC: 279.3753 BIC: 283.1436 (Smaller is better.) # Fitting the model including a reference term for networks of size 5. # Notice that the variable n and other graph attributes can be used # with model_update. ans1 < ergmito(nets ~ edges, model_update = ~ I(edges * (n == 5)))#> Warning: The observed statistics (target.statistics) are near or at the boundary of its support, i.e. the Maximum Likelihood Estimates maynot exist or be hard to be estimated. In particular, the statistic(s) "edges", "I(edges * (n == 5))".summary(ans1)#> #> ERGMito estimates (MLE) #> This model includes 20 networks. #> #> formula: #> nets ~ edges + I(edges * (n == 5)) #> <environment: 0x11d8d5a0> #> #> Estimate Std. Error z value Pr(>z) #> edges 1.18958 0.21583 5.5116 3.556e08 *** #> I(edges * (n == 5)) 0.90116 0.31250 2.8837 0.00393 ** #>  #> Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1 #> AIC: 272.9916 BIC: 280.5282 (Smaller is better.)# # ERGMito estimates # # formula: # nets ~ edges + I(edges * (n == 5)) # # Estimate Std. Error z value Pr(>z) # edges 1.18958 0.21583 5.5116 3.556e08 *** # I(edges * (n == 5)) 0.90116 0.31250 2.8837 0.00393 ** #  # Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1 # AIC: 272.9916 BIC: 280.5282 (Smaller is better.) # The resulting parameter for the edgecount is smaller for networks # of size five plogis(coef(ans1)[1]) # 0.23#> edges #> 0.2333333#> [1] 0.11# We can see that in this case the difference in edgecount matters. if (require(lmtest)) { lrtest(ans0, ans1) # Likelihood ratio test # # Model 1: nets ~ edges # Model 2: nets ~ edges + I(edges * (n == 5)) # #Df LogLik Df Chisq Pr(>Chisq) # 1 1 138.69 # 2 2 134.50 1 8.3837 0.003786 ** #  # Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1 }#> Likelihood ratio test #> #> Model 1: nets ~ edges #> Model 2: nets ~ edges + I(edges * (n == 5)) #> #Df LogLik Df Chisq Pr(>Chisq) #> 1 1 138.69 #> 2 2 134.50 1 8.3837 0.003786 ** #>  #> Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1