rSDI.R
Consider a network of movements or exchanges between places. Such networks are commonplace in socio-economic activities. For example, when you order something from Amazon, the movement of your package from one warehouse to the next is part of Amazon’s shipment network, or even part of the global shipment network. Our commutes to work in the morning can be considered a commute network between neighborhoods, cities, or offices. Each of these cases can be considered a specialized instance of the mathematical concept of a ‘graph’, called a spatial graph: a graph consisting of vertices with fixed locations, and arcs/edges connecting these vertices.
In the social and economic sciences, analysis of relations in such a network is very interesting and, with the recent availability and coverage of spatial network data, very useful for managerial planning in private firms and for policy decisions such as urban planning in public agencies. On the other hand, metrics concerning spatial aspects of networks are almost always problem specific rather than general. The Spatial Dispersion Index (SDI) aims to address this problem: it is a generalized measurement index, or rather a family of indices, for evaluating the spatial distances of movements in a network in a problem-neutral way. rSDI computes and optionally visualizes this index with minimal hassle:
```r
library(rSDI)
SDI(TurkiyeMigration.flows, TurkiyeMigration.nodes, variant = "vow") %>%
  plotSDI(variant = "vow")
```
The core idea of the SDI index was conceived as part of a large-scale, government-commissioned study report, written in Turkish (Gençer et al. 2020). The SDI index was later generalized and published on its own merit, and is explained in detail in the paper (Gençer 2023).
The rSDI package provides functions to compute the SDI family of indices for spatial graphs in accordance with their definition in the paper (Gençer 2023). rSDI also provides some convenience functions to visualize SDI index measurements. While this is not its primary reason for existence, it is often very practical for the user to have some preliminary visualization at arm’s length. In sections 2 and 3 below we first explain the concept of spatial networks and their data, then review the mathematical graph formalism used to represent spatial networks. Then we introduce the SDI index family’s calculation, its interpretation, and rules of thumb for choosing an index for your analyses. The last two sections provide a run-through of the index calculation and visualization features of the rSDI package, using an example data set provided by the package on human migration between provinces of Türkiye.
Spatial networks are represented as a particular type of graph where the graph nodes (vertices) are fixed locations and each graph arc/edge represents a flow/relation between two of these nodes. In most real-life cases these networks represent varying flows of people (e.g. transportation), goods (e.g. trade, shopping), or information (e.g. Internet data transfer, phone calls). Thus the graph is weighted and directed, and has arcs rather than edges. Also, in most cases the network is geospatial. In geospatial networks the vertices of the representing graph are, for example, cities or airports, and their locations are defined by latitude and longitude. This is the case for most examples of movements related to trade, migration, education, services, etc. In other cases the spatial network may span a smaller space and is instead measured in its own Cartesian reference frame, for example in the case of student movement on a campus, or movement of parts in a production facility. In those latter cases vertices (e.g. the campus library, a welding station) have an x-y position defined with respect to a chosen corner or center of the campus, production facility, etc.
Spatial network data consists of two data frames: one representing the flows, and the other detailing the locations, and possibly labels, of the nodes in the network. The following is a simple, imaginary spatial network data set:
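Flows:

| from | to | weight |
|---|---|---|
| A | B | 10 |
| B | A | 20 |
| A | C | 5 |

Nodes:

| id | x | y |
|---|---|---|
| A | 0 | 3 |
| B | 4 | 0 |
| C | 0 | 0 |
| D | 4 | 3 |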
This spatial network is visualized below, showing node locations as well as flow amounts (weights) on the lines representing edges:
[Figure: the toy spatial network, with nodes at their x-y locations and flow weights on the edges.]
A spatial network, \(N\), is represented with the mathematical concept of a graph, which consists of vertices, \(V\), representing the locations/nodes in the spatial network, and ties/edges, \(E\), representing the flows tying them together into a network, thus \(N=(V,E)\). To capture a flow over an edge \(e_{ij}\) from vertex \(i\) to vertex \(j\), let us denote the amount of flow on the edge as the edge weight \(w_{i\rightarrow j}\). In graph-theoretic terms this corresponds to a directed and weighted graph.
To capture the spatial aspects of the network, let \(p_i=\langle x_i,y_i\rangle\) and \(p_j=\langle x_j,y_j\rangle\) denote the locations of vertices \(i\) and \(j\), respectively, in some two-dimensional space such as Cartesian or geographic locations. In the latter, the coordinates \(x\) and \(y\) would denote the longitude and latitude of a geographical location, respectively. One can now speak of a spatial distance, \(\delta_{ij}\), between any two vertices. In the case of geographical networks, the Haversine distance would be appropriate for determining spherical distances between two locations:
\[\begin{equation}\delta^{H}_{ij}=2R\arcsin\left(\sqrt{\sin^{2}\left(\frac{y_j-y_i}{2}\right)+\cos(y_i)\cos(y_j)\sin^{2}\left(\frac{x_j-x_i}{2}\right)}\right)\end{equation}\]
where \(R\) is the radius of the Earth, which is roughly 6,371 km, and the longitudes \(x\) and latitudes \(y\) are expressed in radians.
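The Haversine formula above can be sketched in base R as follows. This is only an illustration of the formula, not the code rSDI uses internally; the function name `haversine_km` and its arguments are assumptions made for this example. It takes coordinates in decimal degrees and converts them to radians before applying the formula. The sample coordinates are those of İstanbul and Edirne from the `TurkiyeMigration.nodes` data shown later in this document.

```r
# Hypothetical helper (not part of rSDI): Haversine distance in kilometres
# between two points given as longitude/latitude in decimal degrees.
haversine_km <- function(lon1, lat1, lon2, lat2, R = 6371) {
  to_rad <- pi / 180
  dlat <- (lat2 - lat1) * to_rad
  dlon <- (lon2 - lon1) * to_rad
  a <- sin(dlat / 2)^2 +
    cos(lat1 * to_rad) * cos(lat2 * to_rad) * sin(dlon / 2)^2
  # pmin() guards against rounding pushing the value slightly above 1
  2 * R * asin(sqrt(pmin(1, a)))
}

# Distance between İstanbul (TR100) and Edirne (TR212)
haversine_km(28.96711, 41.00893, 26.55596, 41.67717)
```

Because the function uses only vectorized arithmetic, it can also be applied to whole columns of coordinates at once.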
In the case of a more local spatial network we would probably have Cartesian coordinates, e.g. x-y coordinates within a production plant where we analyse flows of parts between stations. In those cases a Euclidean distance can be used instead:
\[\begin{equation}\delta^E_{ij}=\sqrt{(x_i-x_j)^2+(y_i-y_j)^2}\end{equation}\]
In our toy example from the previous section the Euclidean distances can be easily calculated for each edge, since the nodes form a simple 3-4-5 triangle. We default to the Euclidean distance because there is no latitude/longitude information in this example:
| from | to | weight | distance |
|---|---|---|---|
| A | B | 10 | 5 |
| B | A | 20 | 5 |
| A | C | 5 | 3 |
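The distances in the table can be reproduced with a few lines of base R. This is a minimal sketch that does not call rSDI; it rebuilds the toy flows and nodes data frames used elsewhere in this document and applies the Euclidean formula edge by edge.

```r
# Toy spatial network: flows between nodes, and node coordinates
flows <- data.frame(from = c("A", "B", "A"),
                    to = c("B", "A", "C"),
                    weight = c(10, 20, 5))
nodes <- data.frame(id = c("A", "B", "C", "D"),
                    x = c(0, 4, 0, 4),
                    y = c(3, 0, 0, 3))

# Look up the row of each edge's origin and destination node
from_idx <- match(flows$from, nodes$id)
to_idx <- match(flows$to, nodes$id)

# Euclidean distance spanned by each edge
flows$distance <- sqrt((nodes$x[from_idx] - nodes$x[to_idx])^2 +
                         (nodes$y[from_idx] - nodes$y[to_idx])^2)
flows$distance
#> [1] 5 5 3
```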
In order to quantify the spatial reach of the flows in a spatial network, the spatial distance between two nodes should be combined with the flow between them. The Spatial Dispersion Index is a direct translation of this idea and is broadly defined as the average distance that the network flows span, weighted by flow amounts. The key idea was conceived by the author, explained thoroughly, and put into use in a broader field study report (Gençer 2023). A brief discussion and definition is presented here.
SDI is a family of indices rather than a single index. Its variants exist because research interests differ when analyzing spatial networks. Here we explain these variations. Further below we introduce the three-letter, XXX, notation used to symbolize the corresponding SDI variants.
As an illustrative example, the network-level, weighted SDI index would be computed as follows[^1]:
\[\begin{equation}\textrm{SDI}^w(N)=\frac{\sum_{i \rightarrow j \in E}{(w_{i\rightarrow j}\cdot \delta_{ij})}}{\sum_{i\rightarrow j \in E}{w_{i\rightarrow j}}}\end{equation}\]
For our toy problem this could be computed as \(\textrm{SDI}^w(N)=(10\cdot 5+20\cdot 5+5\cdot 3)/(10+20+5)=165/35\approx 4.71\).
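The weighted network-level index is an ordinary weighted mean, so the toy calculation can be checked in base R. This sketch does not call rSDI; the vectors simply restate the weights and distances of the three toy edges.

```r
# Weights and Euclidean distances of the toy edges A->B, B->A, A->C
weight <- c(10, 20, 5)
distance <- c(5, 5, 3)

# Weighted average distance spanned by the flows
sdi_w <- sum(weight * distance) / sum(weight)
sdi_w
#> [1] 4.714286
```

The same value is obtained with `weighted.mean(distance, weight)`.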
A node-level, unweighted, out-flows-only index, by contrast, would be computed by replacing all weights with 1s:
\[\begin{equation}\textrm{SDI}^u_{+}(i)=\frac{\sum_{i\rightarrow j \in E}{(1 \cdot\delta_{ij})}}{\sum_{i\rightarrow j \in E}{1}}\end{equation}\]
This is simply the average of the distances of the flows leaving the focal node. For our toy problem’s node A, this can be computed as \(\textrm{SDI}^u_{+}(A)=(5+3)/2=4\).
Please consult the source paper, Gençer (2023), and the help pages for an extensive description of index calculation for the above cases.
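As a quick check of the node-level, unweighted, out-flows-only calculation, the following base R sketch averages edge distances grouped by origin node. It does not call rSDI, and the data frame restates the toy edges with their distances.

```r
# Toy edges with their Euclidean distances
flows <- data.frame(from = c("A", "B", "A"),
                    to = c("B", "A", "C"),
                    distance = c(5, 5, 3))

# Unweighted mean distance of the out-flows of each origin node
sdi_u_out <- tapply(flows$distance, flows$from, mean)

# Node A has out-flows to B (distance 5) and C (distance 3)
unname(sdi_u_out["A"])
#> [1] 4
```

Nodes that never appear as an origin, such as C and D here, have no out-flows and therefore do not appear in the result.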
SDI computation uses a three-letter index variant code to represent a variant of the index. The LDS code corresponds to the Level, Direction, and Strength of network ties, respectively. For example, an LDS code of “nuw” would mean a network-level, undirected, and weighted SDI variant. Each part of the LDS code takes one of a small set of single-letter values.
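To make the structure of the code concrete, here is a small base R sketch that splits a variant code into its three positions and builds the attribute name under which rSDI stores the result (for example ‘SDI_nuw’). The helper `parse_lds` is hypothetical and not part of rSDI; it does not validate the individual letters.

```r
# Hypothetical helper (not part of rSDI): split a three-letter LDS code
# into its Level, Direction and Strength letters.
parse_lds <- function(variant) {
  stopifnot(is.character(variant), length(variant) == 1, nchar(variant) == 3)
  parts <- strsplit(variant, "")[[1]]
  list(level = parts[1],
       direction = parts[2],
       strength = parts[3],
       attribute = paste0("SDI_", variant))
}

parse_lds("nuw")$attribute
#> [1] "SDI_nuw"
```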
rSDI functions consume an igraph object and return their output as an igraph object which has additional edge, vertex, and/or graph attributes. Let us start with an example involving the helper function `dist_calc()`. This function need not be called explicitly in a normal workflow; it is normally invoked by `SDI()`, the main entry point of SDI calculations. It computes the distance between the pair of nodes connected by each graph edge. The computed distances are returned as edge attributes of the returned graph. Consider the following spatial network data frames for the fictional spatial network above:
```r
flows <- data.frame(from = c("A", "B", "A"), to = c("B", "A", "C"), weight = c(10, 20, 5))
nodes <- data.frame(id = c("A", "B", "C", "D"), x = c(0, 4, 0, 4), y = c(3, 0, 0, 3))
library(igraph)
toyGraph <- graph_from_data_frame(flows, directed = TRUE, vertices = nodes)
```
The edges of the graph have only the ‘weight’ attribute:
```r
edge_attr_names(toyGraph)
#> [1] "weight"
```
rSDI’s main function is `SDI()`. It works in a similar fashion to `dist_calc()` and adds its output as graph and vertex attributes. If the edge distance attributes are missing, it also computes and adds them, since they are a prerequisite for all SDI metrics:
```r
toyGraphWithSDI <- SDI(toyGraph)
# same as SDI(toyGraph, level="vertex", directionality="undirected", weight.use="weighted")
edge_attr_names(toyGraphWithSDI)
#> [1] "weight"   "distance"
vertex_attr_names(toyGraphWithSDI)
#> [1] "name"    "x"       "y"       "SDI_vuw"
```
To help its users follow the theoretical distinctions explained in the previous section, rSDI letter-codes the index measurements it computes in accordance with that classification. In the example above, the call to the `SDI()` function computes the (1) vertex-level, (2) undirected, and (3) weighted SDI index, which are the defaults. Thus it adds an attribute named ‘SDI_vuw’ to each vertex of its input graph. The attribute is added to each vertex even if the index cannot be computed. This is the case for vertex D, which takes part in no flows and therefore has an NA value stored in its ‘SDI_vuw’ attribute.
If the index is computed at the network level the vertices will not have additional attributes but the graph itself will, following the same convention:
```r
toyGraphWithNetworkSDI <- SDI(toyGraph, level = "network", directionality = "undirected", weight.use = "weighted")
graph_attr_names(toyGraphWithNetworkSDI)
#> [1] "SDI_nuw"
graph_attr(toyGraphWithNetworkSDI, "SDI_nuw")
#> [1] 4.714286
```
Once you are comfortable with this convention you can shorten your calls to `SDI()` using the ‘variant’ parameter. For example, `SDI(toyGraph, variant="nuw")` is equivalent to the call in the example above.
`SDI()` will leave previously computed indices untouched. Thus, for example, you can compute several indices in a pipe:
```r
toyGraph %>%
  SDI(variant = "nuw") %>%
  SDI(variant = "nuu") %>%
  SDI(variant = "vuw") %>%
  SDI(variant = "vuu") -> toyGraphWithSeveralSDI
graph_attr_names(toyGraphWithSeveralSDI)
#> [1] "SDI_nuw" "SDI_nuu"
vertex_attr_names(toyGraphWithSeveralSDI)
#> [1] "name"    "x"       "y"       "SDI_vuw" "SDI_vuu"
```
The same can be achieved by using a vector of variants in a single call:
```r
toyGraphWithSeveralSDI <- SDI(toyGraph, variant = c("nuw", "nuu", "vuw", "vuu"))
graph_attr_names(toyGraphWithSeveralSDI)
#> [1] "SDI_nuw" "SDI_nuu"
vertex_attr_names(toyGraphWithSeveralSDI)
#> [1] "name"    "x"       "y"       "SDI_vuw" "SDI_vuu"
```
Note that for the generalized SDI variant you must provide the additional \(\alpha\) parameter:
```r
toyGraphWithGeneralizedSDI <- SDI(toyGraph, variant = "vug", alpha = 0.5)
vertex_attr_names(toyGraphWithGeneralizedSDI)
#> [1] "name"    "x"       "y"       "SDI_vug"
vertex_attr(toyGraphWithGeneralizedSDI, "SDI_vug")
#> [1] 4.252907 4.472136 3.464102       NA
```

## Optional distance calculation
Calling the `dist_calc()` helper function adds a distance attribute to an input graph. This is performed automatically when `SDI()` is called, but you may invoke it separately if needed. For the example in the previous section the call is made as follows:
```r
toyGraphWithDistances <- dist_calc(toyGraph)
edge_attr_names(toyGraphWithDistances)
#> [1] "weight"   "distance"
```
Having seen the coordinate attributes as ‘x’ and ‘y’ (rather than as ‘latitude’ and ‘longitude’), the function opts for a Euclidean distance calculation and returns the 3-4-5 triangle distances, that is 5, 5, and 3 for the three edges.
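The choice between the two distance formulas can be pictured as a simple check on the coordinate column names. The following base R sketch only illustrates that idea; `guess_distance_method` is a hypothetical helper, not rSDI’s actual implementation, and preferring geographic coordinates when both kinds of columns are present is an assumption made for this example.

```r
# Hypothetical helper (not part of rSDI): pick a distance formula
# based on which coordinate columns the nodes data frame provides.
guess_distance_method <- function(nodes) {
  cols <- tolower(names(nodes))
  if (all(c("latitude", "longitude") %in% cols)) {
    "Haversine"   # geographic coordinates
  } else if (all(c("x", "y") %in% cols)) {
    "Euclidean"   # Cartesian coordinates
  } else {
    stop("nodes must have either latitude/longitude or x/y columns")
  }
}

toy_nodes <- data.frame(id = c("A", "B", "C", "D"),
                        x = c(0, 4, 0, 4),
                        y = c(3, 0, 0, 3))
guess_distance_method(toy_nodes)
#> [1] "Euclidean"
```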
The rSDI package comes with a real-world data set consisting of two data frames. `TurkiyeMigration.flows` contains data on the migration of people between Türkiye’s provinces in the period 2016-2017-2018, a consolidated version of raw data from the Turkish Statistical Institute. `TurkiyeMigration.nodes` contains the labels and geographic coordinates (latitude and longitude) of the provinces:
```r
head(TurkiyeMigration.flows)
#>    from    to    weight
#> 1 TRC12 TR621  737.0000
#> 2 TR332 TR621  319.6667
#> 3 TRA21 TR621  213.0000
#> 4 TR712 TR621  412.6667
#> 5 TR834 TR621  158.3333
#> 6 TR510 TR621 2594.6667
head(TurkiyeMigration.nodes)
#>      id            label longitude latitude
#> 1 TR100   \\u0130stanbul  28.96711 41.00893
#> 2 TR211   Tekirda\\u011f  27.51167 40.97809
#> 3 TR212           Edirne  26.55596 41.67717
#> 4 TR213 K\\u0131rklareli  27.22437 41.73547
#> 5 TR221  Bal\\u0131kesir  27.88834 39.65046
#> 6 TR222  \\u00c7anakkale  26.40859 40.14672
```
You may call the `SDI()` function either with an igraph object you compose yourself from the flow and node data, or by giving the data frames directly to `SDI()`, as follows:
```r
TMSDI <- SDI(TurkiyeMigration.flows, TurkiyeMigration.nodes, variant = "vuw")
# -- OR --
library(igraph)
TMgraph <- graph_from_data_frame(TurkiyeMigration.flows, directed = TRUE, vertices = TurkiyeMigration.nodes)
TMSDI <- SDI(TMgraph, variant = "vuw")
```
rSDI plotting functions make use of available open map packages in the R ecosystem to make a geographical plot of SDI measurements. The `plotSDI()` function produces a visualization where the circle for each node has an area proportional to the node’s selected SDI measure. The function will try to optimize the circle sizes as best it can, but you can customize circle sizes, fill colors, etc. by overriding its parameters, for example to scale the circle sizes relative to their defaults.
Please refer to the documentation of `plotSDI()` for further fine-grained control of its plotting parameters.
You may want to visualize the network flows along with the SDI index measurements. This particular combination is provided as a convenience. You can turn on the display of network edges using the ‘edges’ argument of the SDI plotter.
Please note that this combination is based on several graph visualization and geospatial packages. If you need fine control over all of these underlying visualization layers, we recommend building your own solution using packages such as ggraph, sf, and rnaturalearth.
[^1]: Please note that when run over the whole network, directionality makes no difference, so we omit the \(\textrm{SDI}_{\pm}(\ldots)\) notation in this case.