R/ergmito-methods.R
, R/ergmito.R
ergmito.Rd
ergmito
uses Maximum Likelihood Estimation (MLE) to fit Exponential Random
Graph Models for single or multiple small networks, the later using
pooled-data MLE. To do so we use exact likelihoods, which implies fully
enumerating the support of the model. Overall, the exact likelihood
calculation is only possible when dealing with directed (undirected) networks
size 5 (7). In general, directed (undirected) graphs with more than 5 (7)
vertices should not be fitted using MLE, but instead other methods such as
the MC-MLE algorithm or the Robbins-Monro Stochastic Approximation algorithm,
both of which are available in the ergm R package.The workhorse function of
ergmito
is the ergm
package function ergm::ergm.allstats()
.
# S3 method for ergmito vcov(object, solver = NULL, ...) ergmito( model, model_update = NULL, stats_weights = NULL, stats_statmat = NULL, optim.args = list(), init = NULL, use.grad = TRUE, target_stats = NULL, ntries = 1L, keep.stats = TRUE, target_offset = NULL, stats_offset = NULL, ... )
object | An object of class |
---|---|
solver | Function. Used to compute the inverse of the hessian matrix. When
not null, the variance-covariance matrix is recomputed using that function.
By default, |
... | Further arguments passed to the method. In the case of |
model | Model to estimate. See ergm::ergm. The only difference with
|
model_update | A |
stats_weights | Either an integer vector or a list of integer vectors (see exact_loglik). |
stats_statmat | Either a matrix or a list of matrices (see exact_loglik). |
optim.args | List. Passed to stats::optim. |
init | A numeric vector. Sets the starting parameters for the optimization routine. Default is a vector of zeros. |
use.grad | Logical. When |
target_stats | A matrix of target statistics (see ergm::ergm). |
ntries | Integer scalar. Number of tries to estimate the MLE (see details). |
keep.stats | Logical scalar. When |
target_offset, stats_offset | See |
An list of class ergmito
:
call
The program call.
coef
Named vector. Parameter estimates.
iterations
Integer. Number of times the log-likelihood was evaluated
(see stats::optim).
mle.lik
Numeric. Final value of the objective function.
null.lik
Numeric. Final value of the objective function for the null model.
covar
Square matrix of size length(coef)
. Variance-covariance matrix
computed using the exact hessian as implemented in exact_hessian.
coef.init
Named vector of length length(coef)
. Initial set of parameters
used in the optimization.
formulae
An object of class ergmito_loglik.
nobs
Integer scalar. Number of networks in the model.
network
Networks passed via model
.
optim.out
,optim.args
Results from the optim call and arguments passed to it.
status
,note
Convergence code. See check_convergence
best_try
Integer scalar. Index of the run with the highest log-likelihood value.
history
Matrix of size ntries * (k + 1)
. History of the parameter
estimates and the reached log-likelihood values.
timer
Vector of times (for benchmarking). Each unit marks the starting
point of the step.
Methods base::print()
, base::summary()
, stats::coef()
, stats::logLik()
,
stats::nobs()
, stats::vcov()
, stats::AIC()
, stats::BIC()
,
stats::confint()
, and stats::formula()
are available.
The support of the sufficient statistics is calculated using ERGM's
ergm::ergm.allstats()
function.
Maximum Likelihood Estimates are obtained using the stats::optim function.
The default method for maximization is BFGS
using both the log-likelihood
function and its corresponding gradient.
Another important factor to consider is the existence of the MLE estimates As shown in Handcock (2003), if the observed statistics are near the border if the support function (e.g. too many edges or almost none), then, even if the MLE estimates exists, the optimization function may not be able to reach the optima. Moreover, if the target (observed) statistics live in the boundary, then the MLE estimates do not exists. In general, this should not be an issue in the context of the pooled model, as the variability of observed statistics should be enough to avoid those situations.
The function ergmito
will try to identify possible cases of non-existence,
of the MLE, and if identified then try to re estimate the model parameters using
larger values than the ones obtained, if the log-likelihood is greater, then it is
assumed that the model is degenerate and the corresponding values will be
replaced with either +Inf
or -Inf
. By default, this behavior is checked
anytime that the absolute value of the estimates is greater than 5, or the
sufficient statistics were flagged as potentially outside of the interior of
the support (close to zero or to its max).
In the case of ntries
, the optimization is repeated that number of times,
each time perturbing the init
parameter by adding a Normally distributed
vector. The result which reaches the highest log-likelihood will be the one
reported as parameter estimates. This feature is intended for testing only.
Anecdotally, optim
reaches the max in the first try.
The function plot.ergmito()
and gof_ergmito()
for post-estimation
diagnostics.
# Generating a small graph set.seed(12) n <- 4 net <- rbernoulli(n, p = .3) model <- net ~ edges + mutual library(ergm) ans_ergmito <- ergmito(model)#> Warning: The observed statistics (target.statistics) are near or at the boundary of its support, i.e. the Maximum Likelihood Estimates maynot exist or be hard to be estimated. In particular, the statistic(s) "edges", "mutual".#>#>#>#>#>#>#>#>#>#>#>#>#>#>#>#>#>#>#>#>#>#>#>#>#>#>#>#>#>#>#>#>#>#>#>#>#>#>#>#> [,1] #> [1,] -7.977968#> [,1] #> [1,] -7.977994summary(ans_ergmito)#> #> ERGMito estimates (MLE) #> This model includes 1 networks. #> #> formula: #> net ~ edges + mutual #> <environment: 0x11d8d5a0> #> #> Estimate Std. Error z value Pr(>|z|) #> edges 0.69312 1.11802 0.6199 0.5353 #> mutual -1.38627 1.73205 -0.8004 0.4235 #> AIC: 19.95594 BIC: 20.92575 (Smaller is better.)summary(ans_ergm)#> #> ========================== #> Summary of model fit #> ========================== #> #> Formula: net ~ edges + mutual #> <environment: 0x11d8d5a0> #> #> Iterations: 2 out of 20 #> #> Monte Carlo MLE Results: #> Estimate Std. Error MCMC % z value Pr(>|z|) #> edges 0.689 1.141 0 0.604 0.546 #> mutual -1.375 1.750 0 -0.786 0.432 #> #> Null Deviance: 16.64 on 12 degrees of freedom #> Residual Deviance: 15.93 on 10 degrees of freedom #> #> AIC: 19.93 BIC: 20.9 (Smaller is better.)# Example 2: Estimating an ERGMito using data with know DGP parameters ----- data(fivenets) model1 <- ergmito(fivenets ~ edges + nodematch("female"))#> Warning: The observed statistics (target.statistics) are near or at the boundary of its support, i.e. the Maximum Likelihood Estimates maynot exist or be hard to be estimated. In particular, the statistic(s) "edges", "nodematch.female".#> #> ERGMito estimates (MLE) #> This model includes 5 networks. #> #> formula: #> fivenets ~ edges + nodematch("female") #> <environment: 0x11d8d5a0> #> #> Estimate Std. Error z value Pr(>|z|) #> edges -1.70475 0.54356 -3.1363 0.001711 ** #> nodematch.female 1.58697 0.64305 2.4679 0.013592 * #> --- #> Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1 #> AIC: 73.34109 BIC: 77.52978 (Smaller is better.)# Example 3: Likelihood ratio test using the lmtest R package if (require(lmtest)) { data(fivenets) model1 <- ergmito(fivenets ~ edges + nodematch("female")) model2 <- ergmito(fivenets ~ edges + nodematch("female") + mutual) lrtest(model1, model2) # Likelihood ratio test # # Model 1: fivenets ~ edges + nodematch("female") # Model 2: fivenets ~ edges + nodematch("female") + mutual # #Df LogLik Df Chisq Pr(>Chisq) # 1 2 -34.671 # 2 3 -34.205 1 0.9312 0.3346 }#>#>#> #>#>#> #>#> Warning: The observed statistics (target.statistics) are near or at the boundary of its support, i.e. the Maximum Likelihood Estimates maynot exist or be hard to be estimated. In particular, the statistic(s) "edges", "nodematch.female".#> Warning: The observed statistics (target.statistics) are near or at the boundary of its support, i.e. the Maximum Likelihood Estimates maynot exist or be hard to be estimated. In particular, the statistic(s) "edges", "nodematch.female", "mutual".#> Likelihood ratio test #> #> Model 1: fivenets ~ edges + nodematch("female") #> Model 2: fivenets ~ edges + nodematch("female") + mutual #> #Df LogLik Df Chisq Pr(>Chisq) #> 1 2 -34.671 #> 2 3 -34.205 1 0.9312 0.3346# Example 4: Adding an reference term for edge-count ---------------------- # Simulating networks of different sizes set.seed(12344) nets <- rbernoulli(c(rep(4, 10), rep(5, 10)), c(rep(.2, 10), rep(.1, 10))) # Fitting an ergmito under the Bernoulli model ans0 <- ergmito(nets ~ edges)#> Warning: The observed statistics (target.statistics) are near or at the boundary of its support, i.e. the Maximum Likelihood Estimates maynot exist or be hard to be estimated. In particular, the statistic(s) "edges".summary(ans0)#> #> ERGMito estimates (MLE) #> This model includes 20 networks. #> #> formula: #> nets ~ edges #> <environment: 0x11d8d5a0> #> #> Estimate Std. Error z value Pr(>|z|) #> edges -1.68640 0.15396 -10.954 < 2.2e-16 *** #> --- #> Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1 #> AIC: 279.3753 BIC: 283.1436 (Smaller is better.)# # ERGMito estimates # # formula: # nets ~ edges # # Estimate Std. Error z value Pr(>|z|) # edges -1.68640 0.15396 -10.954 < 2.2e-16 *** # --- # Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1 # AIC: 279.3753 BIC: 283.1436 (Smaller is better.) # Fitting the model including a reference term for networks of size 5. # Notice that the variable -n- and other graph attributes can be used # with -model_update-. ans1 <- ergmito(nets ~ edges, model_update = ~ I(edges * (n == 5)))#> Warning: The observed statistics (target.statistics) are near or at the boundary of its support, i.e. the Maximum Likelihood Estimates maynot exist or be hard to be estimated. In particular, the statistic(s) "edges", "I(edges * (n == 5))".summary(ans1)#> #> ERGMito estimates (MLE) #> This model includes 20 networks. #> #> formula: #> nets ~ edges + I(edges * (n == 5)) #> <environment: 0x11d8d5a0> #> #> Estimate Std. Error z value Pr(>|z|) #> edges -1.18958 0.21583 -5.5116 3.556e-08 *** #> I(edges * (n == 5)) -0.90116 0.31250 -2.8837 0.00393 ** #> --- #> Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1 #> AIC: 272.9916 BIC: 280.5282 (Smaller is better.)# # ERGMito estimates # # formula: # nets ~ edges + I(edges * (n == 5)) # # Estimate Std. Error z value Pr(>|z|) # edges -1.18958 0.21583 -5.5116 3.556e-08 *** # I(edges * (n == 5)) -0.90116 0.31250 -2.8837 0.00393 ** # --- # Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1 # AIC: 272.9916 BIC: 280.5282 (Smaller is better.) # The resulting parameter for the edge-count is smaller for networks # of size five plogis(coef(ans1)[1]) # 0.23#> edges #> 0.2333333#> [1] 0.11# We can see that in this case the difference in edge-count matters. if (require(lmtest)) { lrtest(ans0, ans1) # Likelihood ratio test # # Model 1: nets ~ edges # Model 2: nets ~ edges + I(edges * (n == 5)) # #Df LogLik Df Chisq Pr(>Chisq) # 1 1 -138.69 # 2 2 -134.50 1 8.3837 0.003786 ** # --- # Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1 }#> Likelihood ratio test #> #> Model 1: nets ~ edges #> Model 2: nets ~ edges + I(edges * (n == 5)) #> #Df LogLik Df Chisq Pr(>Chisq) #> 1 1 -138.69 #> 2 2 -134.50 1 8.3837 0.003786 ** #> --- #> Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1