1 Spatial Dispersion Index (SDI) for Analysis of Activity Outreach in Spatial and Geographic Networks

Consider a network of movements or exchanges between places. Such networks are commonplace in socio-economic activities. For example, when you order something from Amazon, the movement of your package from one warehouse to the next is part of Amazon’s shipment network, or even part of the global shipment network. Our commutes to work in the morning can be considered a commute network between neighborhoods, cities, or offices. Each of these cases is a specialized instance of the mathematical concept of a ‘graph’ called a spatial graph: a graph consisting of vertices with fixed locations, and arcs/edges connecting these vertices.

In the social and economic sciences, analysis of relations in such a network is very interesting and, with the recent availability and coverage of spatial network data, very useful for managerial planning in private firms and for policy decisions such as urban planning in public agencies. On the other hand, metrics concerning spatial aspects of networks are almost always problem specific rather than general. The Spatial Dispersion Index (SDI) aims to address this problem: it is a generalized measurement index, or rather a family of indices, for evaluating the spatial distances of movements in a network in a problem-neutral way. rSDI computes and optionally visualizes this index with minimal hassle:

library(rSDI)
SDI(TurkiyeMigration.flows, TurkiyeMigration.nodes, variant="vow") %>%
  plotSDI(variant="vow")

The core idea of the SDI index was conceived as part of a large-scale, government-commissioned study report, written in Turkish (Gençer et al. 2020). The index was later generalized and published on its own merit, and is explained in detail in the paper (Gençer 2023).

The rSDI package provides functions to compute the SDI family of indices for spatial graphs, following their definition in the paper (Gençer 2023). rSDI also provides some convenience functions to visualize SDI index measurements. While visualization is not the package's primary purpose, it is often very practical for the user to have some preliminary visualization within arm’s reach. In sections 2 and 3 below we first explain the concept of spatial networks and their data, then review the mathematical graph formalism used to represent spatial networks. Section 4 then introduces the calculation of the SDI index family, its interpretation, and rules of thumb for choosing an index for your analyses. The last two sections provide a run-through of the index calculation and visualization features of the rSDI package using an example data set provided by the package, on human migration between provinces of Türkiye.

2 Spatial networks and their data

Spatial networks are represented as a particular type of graph where the graph nodes (vertices) are fixed locations and each graph arc/edge represents a flow/relation between two of these nodes. In most real-life cases these networks represent varying flows of people (e.g. transportation), goods (e.g. trade, shopping), or information (e.g. Internet data transfer, phone calls). Thus the graph is weighted and directed, and has arcs rather than edges. In most cases the network is also geospatial. In geospatial networks the vertices of the representing graph are, for example, cities or airports, and their locations are defined by latitude and longitude. This is the case for most examples of movements related to trade, migration, education, services, etc. In other cases the spatial network may span a smaller space and is instead measured in its own Cartesian reference frame; for example, student movement on a campus, or movement of parts in a production facility. In those latter cases vertices (e.g. the campus library, a welding station) have an x-y position defined with respect to a chosen corner or center of the campus, production facility, etc.

Spatial network data consists of two data frames: one representing the flows and the other detailing the locations, and possibly labels, of the nodes in the network. The following is a simple, imaginary spatial network:

Data frames providing flows (first table) and nodes (second table) for an imaginary spatial network

from	to	weight
A	B	10
B	A	20
A	C	5

id	x	y
A	0	3
B	4	0
C	0	0
D	4	3

This spatial network is visualized below, showing node locations as well as flow amounts (weights) on the lines representing edges:

[Figure: the toy spatial network, with node locations and flow weights on the edges]

3 Graph notation to represent spatial networks

A spatial network, \(N\), is represented with the mathematical concept of a graph, which consists of vertices, \(V\), representing the locations/nodes in the spatial network, and ties/edges, \(E\), representing the flows tying them together into a network; thus \(N=(V,E)\). To capture a flow over an edge \(e_{ij}\) from vertex \(i\) to vertex \(j\), let us denote the amount of flow on the edge as the edge weight \(w_{i\rightarrow j}\). In graph-theoretic terms this corresponds to a directed and weighted graph.

To capture spatial aspects of the network, let \(p_i=\langle x_i,y_i\rangle\) and \(p_j=\langle x_j,y_j\rangle\) denote the locations of vertices \(i\) and \(j\), respectively, in some two-dimensional space such as Cartesian or geographic locations. In the latter, the coordinates \(x\) and \(y\) denote the longitude and latitude of a geographical location, respectively. One can now speak of a spatial distance, \(\delta_{ij}\), between any two vertices. In the case of geographical networks the Haversine distance is appropriate for determining spherical distances between two locations:

\[\begin{equation}\delta^{H}_{ij}=2R\arcsin\left(\sqrt{\sin^{2}\left(\frac{y_j-y_i}{2}\right)+\cos(y_i)\cos(y_j)\sin^{2}\left(\frac{x_j-x_i}{2}\right)}\right)\end{equation}\] where \(R\) is the radius of the Earth, which is roughly \(6{,}371\) km, and the coordinates are expressed in radians.
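
The Haversine formula above can be sketched in base R as follows. This is only an illustration under stated assumptions: haversine_km is a hypothetical helper name and is not claimed to be how rSDI computes distances internally. It takes coordinates in decimal degrees and converts them to radians before applying the formula. The sample coordinates are those of the first and third rows of the node data printed in section 6.

```r
# Hypothetical helper (not part of rSDI): Haversine distance in km.
# Inputs are longitude/latitude in decimal degrees; R is the Earth radius in km.
haversine_km <- function(lon1, lat1, lon2, lat2, R = 6371) {
  to_rad <- pi / 180
  dlat <- (lat2 - lat1) * to_rad   # latitude difference, in radians
  dlon <- (lon2 - lon1) * to_rad   # longitude difference, in radians
  a <- sin(dlat / 2)^2 +
    cos(lat1 * to_rad) * cos(lat2 * to_rad) * sin(dlon / 2)^2
  2 * R * asin(sqrt(a))
}

# Distance between provinces TR100 and TR212 of the sample node data
d <- haversine_km(28.96711, 41.00893, 26.55596, 41.67717)
d
```

A quick sanity check is that two points a quarter of the way around the equator, haversine_km(0, 0, 90, 0), are \(R\pi/2\) apart.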

In the case of a more local spatial network we would probably have Cartesian coordinates, e.g. x-y coordinates within a production plant in which we analyse flows of parts between stations. In those cases a Euclidean distance can be used instead: \[\begin{equation}\delta^E_{ij}=\sqrt{(x_i-x_j)^2+(y_i-y_j)^2}\end{equation}\]

In our toy example from the previous section, the Euclidean distances can be easily calculated for each edge, since nodes A, B, and C form a simple 3-4-5 triangle. We default to Euclidean distance since there is no latitude/longitude information in this example:

The flow data with distances between the source and target locations of flows added
from	to	weight	distance
A	B	10	5
B	A	20	5
A	C	5	3
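
The distance column in the table above can be reproduced with a few lines of base R. This is a minimal sketch that applies the Euclidean formula directly to the toy data frames; it does not call rSDI, and the variable names src and dst are illustrative.

```r
# Toy flows and node coordinates from section 2
flows <- data.frame(from = c("A", "B", "A"),
                    to = c("B", "A", "C"),
                    weight = c(10, 20, 5))
nodes <- data.frame(id = c("A", "B", "C", "D"),
                    x = c(0, 4, 0, 4),
                    y = c(3, 0, 0, 3))

# Row index of each flow's source and target node in the node table
src <- match(flows$from, nodes$id)
dst <- match(flows$to, nodes$id)

# Euclidean distance between source and target of every flow
flows$distance <- sqrt((nodes$x[src] - nodes$x[dst])^2 +
                       (nodes$y[src] - nodes$y[dst])^2)
flows$distance
#> [1] 5 5 3
```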

4 SDI: Definition and uses

In order to quantify the spatial reach of the flows in a spatial network, the spatial distance between two nodes should be combined with the flow between them. The Spatial Dispersion Index is a direct translation of this idea and is broadly defined as the average distance that the network flows span, weighted by flow amounts. The key idea was conceived by the author, explained thoroughly, and put into use in a broader field study report (Gençer 2023). A brief discussion and definition is presented here.

SDI is a family of indices rather than a single index. Its variants reflect the different research interests one may have when analyzing spatial networks. Here we explain these variations. Further below we introduce the three-letter notation, XXX, used to denote the corresponding SDI variants:

First: SDI can be computed for the whole network, \(\textrm{SDI}(N)\), or on a per-node (per-vertex) basis, \(\textrm{SDI}(i)\).
Second: Particularly when it is computed on a per-node basis, one may be interested in the dispersion of out-flows only, in-flows only, or both (undirected), i.e. \(\textrm{SDI}_{+}(\ldots)\), \(\textrm{SDI}_{-}(\ldots)\), or \(\textrm{SDI}_{\pm}(\ldots)\), where a + sign denotes out-dispersion and a - sign denotes in-dispersion indices.
Third: In social network analysis theory, there is an important discussion about the strength of relations (Opsahl, Agneessens, and Skvoretz 2010): the presence of a relation has an importance distinct from its strength. This warns us against, for example, assuming that a flow of one bird from location A to B in a bird migration network is negligible when compared to a flow of one thousand birds from A to C. The very existence of the A-to-B flow has an importance in its own right, separate from its strength, e.g. for its future effect on the spread of a bird species. For this reason there are several variants of metric calculations in social network analysis, which:
- consider only the presence of relations: this corresponds to the unweighted SDI variant, which treats each flow as having unit strength: \(\textrm{SDI}_{\ldots}^{u}(\ldots)\)
- consider only the strength of relations: this corresponds to the weighted SDI variant, which takes into account only the strengths/amounts of flows: \(\textrm{SDI}_{\ldots}^{w}(\ldots)\)
- consider both the presence and the strength of relations, with a user-supplied tuning parameter expressing the preference between them: this corresponds to the generalized SDI calculation: \(\textrm{SDI}_{\ldots}^{w\alpha}(\ldots)\)

As an illustrative example, the network level, weighted SDI index would be computed as follows: \[\begin{equation}\textrm{SDI}^w(N)=\frac{\sum_{i \rightarrow j \in E}{(w_{i\rightarrow j}\cdot \delta_{ij})}}{\sum_{i\rightarrow j \in E}{w_{i\rightarrow j}}}\end{equation}\] For our toy problem this could be computed as: \(\textrm{SDI}^w(N)=(10\cdot 5+20\cdot 5+5\cdot 3)/(10+20+5)=165/35\approx 4.71\)
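
The toy calculation above can be checked in a few lines of base R. This is a minimal sketch that hard-codes the toy weights and distances; the variable names are illustrative and nothing here calls rSDI.

```r
# Flow weights and edge distances of the toy network (A->B, B->A, A->C)
weight   <- c(10, 20, 5)
distance <- c(5, 5, 3)

# Network level, weighted SDI: distance averaged with flow weights
sdi_w_network <- sum(weight * distance) / sum(weight)
sdi_w_network
#> [1] 4.714286
```

The same value is returned by base R's weighted.mean(distance, weight), which makes explicit that the index is simply a weighted average of distances.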

A node level, unweighted, out-flows only index, on the other hand, would be computed by replacing all weights with 1s: \[\begin{equation}\textrm{SDI}^u_{+}(i)=\frac{\sum_{i\rightarrow j \in E}{(1 \cdot\delta_{ij})}}{\sum_{i\rightarrow j \in E}{1}}\end{equation}\] which is simply the average distance of the flows leaving the focal node. For our toy problem’s node A, this can be computed as \(\textrm{SDI}^u_{+}(A)=(5+3)/2=4\)
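
The node level, unweighted, out-flows only case can be sketched in base R in the same spirit. The function name sdi_u_out is hypothetical and used only for illustration. Returning NA for a node without out-flows is an assumption made here, chosen to mirror how the package reports indices that cannot be computed.

```r
# Toy flows with their Euclidean distances
flows <- data.frame(from = c("A", "B", "A"),
                    to = c("B", "A", "C"),
                    distance = c(5, 5, 3))

# Hypothetical helper: plain average of the distances of a node's out-flows
sdi_u_out <- function(node, flows) {
  d <- flows$distance[flows$from == node]
  if (length(d) == 0) return(NA_real_)  # no out-flows: index undefined
  mean(d)
}

sdi_u_out("A", flows)
#> [1] 4
```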

Please consult the source paper, Gençer (2023), and the help pages for an extensive description of index calculation for the above cases.

SDI computation uses a three-letter index variant code to represent a variant of the index. This LDS code corresponds to the Level, Direction, and Strength of the network ties used, respectively. For example, an LDS code of “nuw” would mean a network level, undirected, and weighted SDI variant. Each part of the LDS code can take the following values:

Level: network (n) or vertex (v)
Direction: use in-flows only (i), use out-flows only (o), or undirected (u; use all edges in an undirected graph, or both in- and out-flows in a directed graph)
Strength: weighted (w), unweighted (u), or generalized (g; a combination of weighted and unweighted, controlled by the alpha parameter)
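
To make the coding scheme concrete, the sketch below decodes a three-letter LDS code into its parts. decode_lds is a hypothetical helper written for this illustration and is not part of rSDI. The letter mapping (n/v, i/o/u, w/u/g) is inferred from the variant strings used in this document, such as "nuw", "vow", and "vug".

```r
# Hypothetical helper (not part of rSDI): decode a three-letter LDS code
decode_lds <- function(code) {
  stopifnot(is.character(code), length(code) == 1, nchar(code) == 3)
  parts <- strsplit(code, "")[[1]]
  level     <- c(n = "network", v = "vertex")
  direction <- c(i = "in", o = "out", u = "undirected")
  strength  <- c(w = "weighted", u = "unweighted", g = "generalized")
  out <- c(level     = unname(level[parts[1]]),
           direction = unname(direction[parts[2]]),
           strength  = unname(strength[parts[3]]))
  # An unknown letter yields NA in the lookup above
  if (anyNA(out)) stop("unrecognized LDS code: ", code)
  out
}

decode_lds("nuw")
decode_lds("vow")
```

Note that the letter "u" means undirected in the second position and unweighted in the third, so the position of each letter matters.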

5 A simple example

rSDI functions consume an igraph object and return their output as an igraph object with additional edge, vertex, and/or graph attributes. Let us start with an example involving the helper function dist_calc(). This function does not need to be called explicitly in a normal workflow; it is normally invoked by SDI(), the main entry point of SDI calculations. It computes the distance between the pair of nodes connected by each graph edge. The computed distances are returned as edge attributes of the returned graph. Consider the following spatial network data frames for the fictional spatial network above:

flows <- data.frame(from=c("A","B","A"), to=c("B","A","C"), weight=c(10,20,5))
nodes <- data.frame(id=c("A","B","C","D"), x=c(0,4,0,4), y=c(3,0,0,3))
library(igraph)
toyGraph <- graph_from_data_frame(flows, directed=TRUE, vertices=nodes)

The edges of the graph have only the ‘weight’ attribute:

edge_attr_names(toyGraph)
#> [1] "weight"

rSDI’s main function is SDI(). SDI() works in a similar fashion to dist_calc() and adds its output as graph and vertex attributes (in addition to computing and adding edge distance attributes if they are missing, which is a prerequisite for all SDI metrics):

toyGraphWithSDI <- SDI(toyGraph) # same as SDI(toyGraph, level="vertex", directionality="undirected", weight.use="weighted")
edge_attr_names(toyGraphWithSDI)
#> [1] "weight"   "distance"
vertex_attr_names(toyGraphWithSDI)
#> [1] "name"    "x"       "y"       "SDI_vuw"

To help its users follow the theoretical distinctions explained in the previous section, rSDI letter-codes the indices it computes in accordance with that classification. In the example above, the call to the SDI function computes the (1) vertex level, (2) undirected, and (3) weighted SDI index, which are the defaults. Thus it adds an attribute named ‘SDI_vuw’ to each vertex of its input graph. The attribute is added to each vertex even if the index cannot be computed. This is the case for vertex D, which has no flows and therefore has an NA value stored in its ‘SDI_vuw’ attribute:

vertex_attr(toyGraphWithSDI, "SDI_vuw")
#> [1] 4.714286 5.000000 3.000000       NA
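
These per-vertex values can be reproduced by hand in base R, which may help when checking results on a small network. The sketch below is an independent recomputation under the definitions of section 4, not the package's implementation: for each vertex it takes every flow that touches the vertex (in or out) and averages the distances, weighted by flow amounts.

```r
# Toy flows with weights and Euclidean distances
flows <- data.frame(from = c("A", "B", "A"),
                    to = c("B", "A", "C"),
                    weight = c(10, 20, 5),
                    distance = c(5, 5, 3))
node_ids <- c("A", "B", "C", "D")

# Vertex level, undirected, weighted SDI for every node
sdi_vuw <- vapply(node_ids, function(v) {
  touches <- flows$from == v | flows$to == v   # in- and out-flows of v
  if (!any(touches)) return(NA_real_)          # isolated node: undefined
  sum(flows$weight[touches] * flows$distance[touches]) /
    sum(flows$weight[touches])
}, numeric(1))

unname(sdi_vuw)
#> [1] 4.714286 5.000000 3.000000       NA
```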

If the index is computed at the network level, the vertices will not have additional attributes but the graph itself will, following the same convention:

toyGraphWithNetworkSDI <- SDI(toyGraph, level="network", directionality="undirected", weight.use="weighted")
graph_attr_names(toyGraphWithNetworkSDI)
#> [1] "SDI_nuw"
graph_attr(toyGraphWithNetworkSDI, "SDI_nuw")
#> [1] 4.714286

Once you are comfortable with this convention you can shorten your calls to SDI() using the ‘variant’ parameter as follows, which is equivalent to the call in the example above:

toyGraphWithNetworkSDI <- SDI(toyGraph, variant="nuw")

SDI will leave previously computed indices untouched. Thus, for example, you can compute several indices in a pipe:

toyGraph %>%
  SDI(variant="nuw") %>%
  SDI(variant="nuu") %>%
  SDI(variant="vuw") %>%
  SDI(variant="vuu") -> toyGraphWithSeveralSDI
graph_attr_names(toyGraphWithSeveralSDI)
#> [1] "SDI_nuw" "SDI_nuu"
vertex_attr_names(toyGraphWithSeveralSDI)
#> [1] "name"    "x"       "y"       "SDI_vuw" "SDI_vuu"

The same can be achieved by using a vector of variants in a single call:

toyGraphWithSeveralSDI <- SDI(toyGraph, variant=c("nuw","nuu","vuw","vuu"))
graph_attr_names(toyGraphWithSeveralSDI)
#> [1] "SDI_nuw" "SDI_nuu"
vertex_attr_names(toyGraphWithSeveralSDI)
#> [1] "name"    "x"       "y"       "SDI_vuw" "SDI_vuu"

Note that for the generalized SDI variant you must provide the additional \(\alpha\) parameter:

toyGraphWithGeneralizedSDI <- SDI(toyGraph, variant="vug", alpha=0.5)
vertex_attr_names(toyGraphWithGeneralizedSDI)
#> [1] "name"    "x"       "y"       "SDI_vug"
vertex_attr(toyGraphWithGeneralizedSDI, "SDI_vug")
#> [1] 4.252907 4.472136 3.464102       NA

5.1 Optional distance calculation

Calling the dist_calc() helper function adds a distance attribute to the edges of an input graph. This is performed automatically when SDI() is called, but you may invoke it separately if needed. For the toy network above, the call is made as follows:

toyGraphWithDistances <- dist_calc(toyGraph)
edge_attr_names(toyGraphWithDistances)
#> [1] "weight"   "distance"

Having seen the coordinate attributes as ‘x’ and ‘y’ (rather than as ‘latitude’ and ‘longitude’), the function opts for a Euclidean distance calculation and returns the 3-4-5 triangle distances:

edge_attr(toyGraphWithDistances, "distance")
#> [1] 5 5 3

6 Example: Computing and plotting SDI for a geospatial network

The rSDI package comes with a real-world data set consisting of two data frames. TurkiyeMigration.flows contains data on the migration of people between Türkiye’s provinces in the years 2016, 2017, and 2018, a consolidated version of raw data from the Turkish Statistical Institute. TurkiyeMigration.nodes contains the labels and geographic coordinates (latitude and longitude) of the provinces:

head(TurkiyeMigration.flows)
#>    from    to    weight
#> 1 TRC12 TR621  737.0000
#> 2 TR332 TR621  319.6667
#> 3 TRA21 TR621  213.0000
#> 4 TR712 TR621  412.6667
#> 5 TR834 TR621  158.3333
#> 6 TR510 TR621 2594.6667
head(TurkiyeMigration.nodes)
#>      id            label longitude latitude
#> 1 TR100   \\u0130stanbul  28.96711 41.00893
#> 2 TR211   Tekirda\\u011f  27.51167 40.97809
#> 3 TR212           Edirne  26.55596 41.67717
#> 4 TR213 K\\u0131rklareli  27.22437 41.73547
#> 5 TR221  Bal\\u0131kesir  27.88834 39.65046
#> 6 TR222  \\u00c7anakkale  26.40859 40.14672

You may call the SDI() function either with an igraph object you compose yourself from the flow and node data, or by passing the data frames directly to SDI(), as follows:

TMSDI <- SDI(TurkiyeMigration.flows, TurkiyeMigration.nodes, variant="vuw")
#   -- OR --
library(igraph)
TMgraph <- graph_from_data_frame(TurkiyeMigration.flows, directed=TRUE, TurkiyeMigration.nodes)
TMSDI <- SDI(TMgraph, variant="vuw")

rSDI plotting functions make use of the open map packages available in the R ecosystem to make a geographical plot of SDI measurements. The plotSDI() function produces a visualization where the circle for each node has an area proportional to the node’s selected SDI measure. The function will try to optimize the circle sizes as best it can, but you can customize circle sizes, fill colors, etc. by overriding its parameters. For example, you can scale the circle sizes relative to the default as:

plotSDI(TMSDI, variant="vuw", circle.size.scale=1)

Please refer to the documentation of plotSDI() for further fine-grained control of its plotting parameters.

You may want to visualize the network flows along with the SDI index measurements. This particular combination is provided as a convenience. You can turn on the display of network edges using the ‘edges’ argument of the SDI plotter:

plotSDI(TMSDI, variant="vuw", edges=TRUE)

Please note that this combination is based on several graph visualization and geospatial packages. If you need fine control over all these underlying visualization layers, we recommend building a custom solution using packages such as ggraph, sf, and rnaturalearth.


rSDI

Mehmet Gençer
