Movatterモバイル変換

Hoffman, M., Block, P., & Snijders, T. A. (2023). Modelingpartitions of individuals. Sociological Methodology, 53(1), 1-41.

The model allows for the analysis of emergent group compositions inpartitions, i.e., sets of non-overlapping groups. For a given partitionof individuals (or nodes, to follow the language of network analysis),we can use this model to understand group formation processes that ledto the observation of this partition, based on individual attributes,relations between individuals, and size-related factors. It can be seenas an extension of Exponential Random Graph Models (ERGMs) to the caseof partition objects. With this package, one can either simulate thismodel or estimate its parameters for a given dataset.

This package also provides a longitudinal extension of the model, fora list of partitions, where each partition depends on the previouspartitions. It follows the definition proposed in:

Hoffman, M., & Chabot, T. (2023). The role of selection insocioeconomic homophily: Evidence from an adolescent summer camp. SocialNetworks, 74, 259-274.

Note from the developers

The package and the documentation might still have bugs or errors, oryou might not be able to do what you want. In that case, or if you areunsure please create an issue here or directly send an email to thepackage maintainer (marion.hoffman[at]iast.fr).

Installation

# from GitHub:# install.packages("remotes")remotes::install_github("stocnet/ERPM")

# from CRAN:install.packages("ERPM")

Cross-sectional example

library(ERPM)

The Data

Let us define a set of n = 6 nodes with three attributes (label,gender, and age), and an arbitrary covariate matrix (friendship). Wecreate the following dataframe.

n<-6nodes<-data.frame(label =c("A","B","C","D","E","F"),gender =c(1,1,2,1,2,2),age =c(20,22,25,30,30,31))friendship<-matrix(c(0,1,1,1,0,0,1,0,0,0,1,0,1,0,0,0,1,0,1,0,0,0,0,0,0,1,1,0,0,1,0,0,0,0,1,0),6,6,TRUE)

We consider a partition for these 6 individuals. We define a vectorwith six elements, indicating the id of each individual’s group.

partition<-c(1,1,2,2,2,3)

Model specification

First, we need to choose the effects (i.e., explaining variables) wewant to include. For example we set four (which is of course notreasonable for 6 nodes): 1. “num_groups”: tendency to form more groups2. “same” (for the gender covariate): tendency to form groups withindividuals with the same gender 3. “diff” (for the age covariate):tendency to form groups with individuals with high age differences 4.“tie” (for the friendship covariate): tendency to form groups withfriends

effects<-list(names =c("num_groups","same","diff","tie"),objects =c("partition","gender","age","friendship"))objects<-list()objects[[1]]<-list(name ="friendship",object = friendship)

The effect objects should contain names of pre-written functions inthe package (see manual for all effect names) as well as objects theyare referring to (either the partition, or covariates). When objects arenot individual covariates, we need to create an additional list to storethese extra objects.

Estimation

The parameters of the model can be estimated using Maximum-likelihoodestimation (equivalent to the Method of Moment estimation). For moredetails on the parametrization of the estimation algorithm, see themanual.

estimation<-estimate_ERPM(partition,                          nodes,                          objects,                          effects,startingestimates =c(-1.5,0.2,-0.2,0.2),burnin =100,thining =20,length.p1 =500,# number of samples in phase 1multiplication.iter.p2 =20,# multiplication factor for the number of iteration in phase 2 subphasesnum.steps.p2 =4,# number of phase 2 subphaseslength.p3 =1000)# number of samples in phase 3estimation$results#>       effect     object        est   std.err         conv#> 1 num_groups  partition -1.7227170 2.1461381  0.002953806#> 2       same     gender  0.4383805 1.3031423 -0.003575942#> 3       diff        age -0.1900982 0.1493854  0.057271044#> 4        tie friendship  0.1845696 1.8675821 -0.007750238

Simulation

We can check how the model reproduces statistics of the observed databy simulating the estimated model. We can also use this function tosimulate theoretical models.

nsimulations<-1000simulations<-draw_Metropolis_single(theta = estimation$results$est,first.partition = partition,nodes = nodes,effects = effects,objects = objects,burnin =100,thining =20,num.steps = nsimulations,neighborhood =c(1,1,1),sizes.allowed =1:n,sizes.simulated =1:n,return.all.partitions = T)

Log-likelihood and AIC

Finally, we can estimate the log-likelihood and AIC of the model(useful to compare two models for example). First we need to estimatethe ML estimates of a simple model with only one parameter for number ofgroups (this parameter should be in the model!).

likelihood_function<-function(x){exp(x*max(partition))/compute_numgroups_denominator(n,x)}curve(likelihood_function,from=-2,to=0)

parameter_base<-optimize(likelihood_function,interval=c(-2,0),maximum=TRUE)parameters_basemodel<-c(parameter_base$maximum,0,0,0)

logL_AIC<-estimate_logL(partition,                         nodes,                         effects,                         objects,theta = estimation$results$est,theta_0 = parameters_basemodel,M =3,num.steps =200,burnin =100,thining =20)logL_AIC$logL#>           [,1]#> [1,] -4.342806logL_AIC$AIC#>           [,1]#> [1,] 0.6856111

More …

For more details on the longitudinal version of the model or otherfunctions, have a look at the manual in the documentation folder or theexample script in the scripts folder.