
This vignette describes a typical use case of the netrankr package. It contrasts the usual centrality analysis based on indices with the dominance-based assessment. It is advisable to read the other vignettes before going through this example.

Most central family (Index approach)

The network is often used to benchmark new centrality indices. The premise is that the Medici should always emerge as one of the most central ones (if not the most central).

We start by applying some of the standard centrality indices given in the igraph package.

cent.df <- data.frame(
  degree = degree(florentine_m),
  betweenness = betweenness(florentine_m),
  closeness = closeness(florentine_m),
  eigenvector = eigen_centrality(florentine_m)$vector,
  subgraph = subgraph_centrality(florentine_m)
)

# most central family according to the 5 indices
V(florentine_m)$name[apply(cent.df, 2, which.max)]

## [1] "Medici" "Medici" "Medici" "Medici" "Medici"

In all cases, the Medici are considered to be the most central family. However, it is possible to find indices that rank other families on top. An example is odd subgraph centrality, which can be assembled with the netrankr package.

# odd subgraph centrality
sc_odd <- florentine_m %>%
  indirect_relations(type = "walks", FUN = walks_exp_odd) %>%
  aggregate_positions(type = "self")

# family with highest score
V(florentine_m)$name[which.max(sc_odd)]

## [1] "Strozzi"

In this example, the Strozzi family are considered to be the most central family and the Medici are ranked third.
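To give a feeling for what such an index measures, the sketch below assumes the common definition of odd subgraph centrality as the diagonal of $\sinh(A) = \sum_k A^{2k+1}/(2k+1)!$, i.e. closed walks of odd length weighted by inverse factorials. It uses base R only and a hypothetical toy graph (a triangle with one pendant node), and it is an assumption that this corresponds to what `walks_exp_odd` combined with `type = "self"` computes internally.

```r
# Hypothetical toy graph: triangle 1-2-3 with a pendant node 4 attached to 3
A <- matrix(0, 4, 4)
edges <- rbind(c(1, 2), c(1, 3), c(2, 3), c(3, 4))
A[edges] <- 1
A[edges[, 2:1]] <- 1

# sinh(A) via the spectral decomposition A = V diag(lambda) V^T
eig <- eigen(A, symmetric = TRUE)
sinhA <- eig$vectors %*% diag(sinh(eig$values)) %*% t(eig$vectors)
sc_odd_toy <- diag(sinhA)

# Cross-check against the truncated series sum_k A^(2k+1) / (2k+1)!
series <- matrix(0, 4, 4)
Ak <- A
for (k in 0:15) {
  series <- series + Ak / factorial(2 * k + 1)
  Ak <- Ak %*% A %*% A
}

round(sc_odd_toy, 4)
```

Node 3 sits on the triangle and has the extra pendant neighbor, so it collects the most odd closed walks in this toy graph.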

Although we have found $5$ indices that consider the Medici the most central family, we cannot guarantee that there do not exist hundreds (or thousands?) of indices that would give an entirely different result.

Most central family (Dominance approach)

We start by calculating the neighborhood-inclusion preorder, the most general requirement for any centrality index.

P <- neighborhood_inclusion(florentine_m)

With the function comparable_pairs() we can assess how many pairs of families are already ordered, before applying any index.

comparable_pairs(P)

## [1] 0.152381

Only around 15% of pairs of families are comparable, leaving 85% of pairs of families that could be ordered (basically) arbitrarily.
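The two steps above can be mimicked by hand on a small graph. The following base R sketch assumes the usual definition of neighborhood-inclusion ($u$ is dominated by $v$ if $N(u) \subseteq N[v]$, with $N[v]$ the closed neighborhood) and a hypothetical path graph on four nodes. It is a toy re-implementation for illustration, not the code behind neighborhood_inclusion() or comparable_pairs().

```r
# Hypothetical toy graph: the path 1-2-3-4
n <- 4
A <- matrix(0, n, n)
edges <- rbind(c(1, 2), c(2, 3), c(3, 4))
A[edges] <- 1
A[edges[, 2:1]] <- 1

# Closed neighborhoods N[v] = N(v) plus v itself
closed <- A + diag(n)

# P_toy[u, v] = 1 if N(u) is a subset of N[v], i.e. u is dominated by v
P_toy <- matrix(0, n, n)
for (u in seq_len(n)) for (v in seq_len(n)) {
  if (u != v && all(A[u, ] <= closed[v, ])) P_toy[u, v] <- 1
}

# Fraction of node pairs that are ordered in at least one direction
comparable <- (P_toy + t(P_toy)) > 0
frac_comparable <- sum(comparable[upper.tri(comparable)]) / choose(n, 2)
frac_comparable
```

In this path the two end nodes are each dominated by both inner nodes, while the pairs {1, 4} and {2, 3} remain unordered, so four of the six pairs are comparable.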

If we want to visually assess the dominance relations we can use the function dominance_graph().

d <- dominance_graph(P)
V(d)$name <- V(florentine_m)$name

set.seed(113)
plot(d,
  vertex.label.color = "black", vertex.color = "white",
  vertex.frame.color = "gray", edge.arrow.size = 0.5
)

The Castellan family neither dominates nor is dominated by any other family. This means that we can find indices that potentially rank them on top, at the bottom, or anywhere in between.

To better assess the potential ranks of nodes, we can plot the rank intervals.

plot(rank_intervals(P))

Observe how big the intervals are, indicating that there is ample scope to rank the families differently. These intervals, however, only give us a rough estimate of this arbitrariness and not any rank probabilities. To get all exact probabilities, we use the function exact_rank_prob().

res <- exact_rank_prob(P)

There are 3,972,630,480 different possibilities to rank the families (the value is stored in res$lin.ext). This means that, theoretically, we can find almost 4 billion indices that rank the families differently.

The rank probabilities of families can be found in res$rank.prob. They are returned as a matrix where rows are families and columns are ranks. That is, an entry in row $u$ and column $k$ gives the probability that $u$ has rank $k$ (larger $k$ indicates a higher rank). Mostly, you will be interested in the probability of being the most central node of a network. Below, we calculate these probabilities for all families and return the ones that have a probability higher than $0.1$.

top_rank_prob <- res$rank.prob[, 15]
names(top_rank_prob) <- V(florentine_m)$name
round(top_rank_prob[top_rank_prob > 0.1], 3)

##  Albizzi Guadagni   Medici Salviati  Strozzi
##    0.109    0.106    0.123    0.111    0.133

The Strozzi family, with $0.13$, has the highest probability to be top ranked, followed by the Medici with $0.12$.

If we are only interested in a subset of nodes, in our case maybe the Strozzi and Medici, we can assess the relative rank probabilities in res$relative.rank. Again, probabilities are returned as a matrix, where an entry in row $u$ and column $v$ gives the probability that $u$ is ranked below $v$. Below we calculate this probability for the Strozzi and Medici.

id_strozzi <- which(V(florentine_m)$name == "Strozzi")
id_medici <- which(V(florentine_m)$name == "Medici")
res$relative.rank[id_strozzi, id_medici]

## [1] 0.5219845

The probability that the Strozzi are less central than the Medici is 0.52 and thus very close to a “fifty-fifty” chance.

The last result of interest returned by exact_rank_prob() is the expected ranks in res$expected.rank. The expected ranks, as the name indicates, are the ranks that we expect families to have in a centrality ranking.

Name	Expected
Medici	11.09
Albizzi	10.72
Strozzi	10.67
Salviati	10.60
Guadagni	10.51
Tornabuon	10.11
Bischeri	9.64
Barbadori	9.37
Ridolfi	9.37
Castellan	8.00
Peruzzi	5.33
Pazzi	4.54
Ginori	4.01
Lambertes	3.28
Acciaiuol	2.74

Although the Strozzi have a higher probability to be the most central family, overall we still expect the Medici to be the most central.
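To make the quantities discussed above concrete, the following base R sketch enumerates all admissible rankings of a tiny, hypothetical dominance relation by brute force and derives the number of rankings, the rank probabilities, the relative rank probabilities and the expected ranks from them. This only illustrates the definitions under the assumption that every admissible ranking is equally likely; exact_rank_prob() does not enumerate rankings this way, and brute force is feasible only for a handful of nodes.

```r
# Toy dominance relation on 4 nodes: P_toy[u, v] = 1 means u is dominated by v
# (hypothetical example: nodes 1 and 4 are both dominated by nodes 2 and 3)
n <- 4
P_toy <- matrix(0, n, n)
P_toy[rbind(c(1, 2), c(1, 3), c(4, 2), c(4, 3))] <- 1

# All permutations of a vector (small helper, usable for tiny n only)
perms <- function(v) {
  if (length(v) <= 1) return(list(v))
  out <- list()
  for (i in seq_along(v)) {
    for (p in perms(v[-i])) out[[length(out) + 1]] <- c(v[i], p)
  }
  out
}

# A ranking r (r[u] = rank of node u) is admissible if it preserves dominance
dom <- which(P_toy == 1, arr.ind = TRUE)
admissible <- Filter(
  function(r) all(r[dom[, 1]] < r[dom[, 2]]),
  perms(seq_len(n))
)
ext <- do.call(rbind, admissible) # rows: rankings, columns: nodes

lin_ext <- nrow(ext)
# rank_prob[u, k]: share of rankings in which node u has rank k
rank_prob <- sapply(seq_len(n), function(k) colMeans(ext == k))
# relative_rank[u, v]: share of rankings in which u is ranked below v
relative_rank <- sapply(seq_len(n), function(v) colMeans(ext < ext[, v]))
expected_rank <- as.vector(rank_prob %*% seq_len(n))
```

Here `lin_ext` plays the role of res$lin.ext, and the three derived objects mirror res$rank.prob, res$relative.rank and res$expected.rank on the toy relation.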

This very general assessment gives us a general idea of the scope of potential centrality analyses. The more possible rankings we have (as in this case!) the more unreliable an index-driven approach can be. We will explore this in more detail in the following section.

Centrality as explanatory variable (Index approach)

Usually, we are not simply interested in a ranking of nodes, but we rather would like to use centrality to explain certain node attributes. In our case, we might be interested in the question: “can an index explain the wealth of families?”, or if we already have a more concrete idea “can proximity to other families explain the wealth attribute?”

“Proximity” can be translated to the graph-theoretic concept of shortest path distances, such that closeness centrality would be an adequate candidate as an index. We here use the pipeline approach of the netrankr package instead of the closeness() function of igraph. The reasons will become evident in the next section.

# Closeness
c_C <- florentine_m %>%
  indirect_relations(type = "dist_sp") %>%
  aggregate_positions(type = "invsum")

cor(c_C, V(florentine_m)$wealth, method = "kendall")

## [1] 0.08823953

The correlation between closeness and wealth (0.0882) is far too low to conclude that “proximity” is related to wealth. However, there exist various other indices that are based on the shortest path distances in a graph. Refer to the literature for more details on these indices.

# harmonic closeness
c_HC <- florentine_m %>%
  indirect_relations(type = "dist_sp", FUN = dist_inv) %>%
  aggregate_positions(type = "sum")

# residual closeness (Dangalchev, 2006)
c_RC <- florentine_m %>%
  indirect_relations(type = "dist_sp", FUN = dist_2pow) %>%
  aggregate_positions(type = "sum")

# integration centrality (Valente & Foreman, 1998)
dist_integration <- function(x) {
  x <- 1 - (x - 1) / max(x)
}
c_IN <- florentine_m %>%
  indirect_relations(type = "dist_sp", FUN = dist_integration) %>%
  aggregate_positions(type = "sum")

c(
  cor(c_HC, V(florentine_m)$wealth, method = "kendall"),
  cor(c_RC, V(florentine_m)$wealth, method = "kendall"),
  cor(c_IN, V(florentine_m)$wealth, method = "kendall")
)

## [1] 0.11594338 0.11594338 0.07804971

The highest correlation (0.1159) is achieved by both harmonic and residual closeness; however, this is still too low to conclude that proximity is related to wealth.

Besides the indices already considered, there exist further ones that include a free parameter. The idea is that the parameter can be tuned to maximize the correlation between the index and the attribute under consideration. Again, the mathematical details can be found in the respective literature.

# generalized closeness (Agneessens et al., 2017) (alpha > 0) sum(dist^-alpha)
alpha <- c(seq(0.01, 0.99, 0.01), seq(1, 10, 0.1))
scores <- sapply(alpha, function(x) {
  florentine_m %>%
    indirect_relations(type = "dist_sp", FUN = dist_dpow, alpha = x) %>%
    aggregate_positions(type = "sum")
})
cors_gc <- apply(
  scores, 2,
  function(x) cor(x, V(florentine_m)$wealth, method = "kendall")
)
res_gc <- c(max(cors_gc), alpha[which.max(cors_gc)])

# decay centrality (Jackson, 2010) (alpha in [0,1]) sum(alpha^dist)
alpha <- seq(0.01, 0.99, 0.01)
scores <- sapply(alpha, function(x) {
  florentine_m %>%
    indirect_relations(type = "dist_sp", FUN = dist_powd, alpha = x) %>%
    aggregate_positions(type = "sum")
})
cors_dc <- apply(
  scores, 2,
  function(x) cor(x, V(florentine_m)$wealth, method = "kendall")
)
res_dc <- c(max(cors_dc), alpha[which.max(cors_dc)])

The highest correlation for generalized closeness is 0.1250058, achieved for $\alpha = 0.55$.

The highest correlation for decay centrality is 0.1250058, achieved for $\alpha = 0.31$.
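The indices used in this section differ only in how shortest path distances are transformed before they are aggregated. The base R sketch below spells this out on a hypothetical path graph, using the formulas named in the code comments above (`sum(dist^-alpha)`, `sum(alpha^dist)`) and the standard definitions of closeness, harmonic and residual closeness. Dropping the diagonal before summing and the value of `alpha` are assumptions of the sketch, not statements about the netrankr internals.

```r
# Shortest path distances on a hypothetical path graph 1-2-3-4
D <- abs(outer(1:4, 1:4, "-"))

# Sum a transformed distance matrix over all *other* nodes
# (dropping the diagonal is an assumption of this sketch)
sum_others <- function(M) {
  diag(M) <- 0
  rowSums(M)
}

alpha <- 0.5 # arbitrary illustrative parameter value
indices <- cbind(
  closeness   = 1 / rowSums(D),         # inverse of the summed distances
  harmonic    = sum_others(1 / D),      # sum of inverse distances
  residual    = sum_others(2^(-D)),     # sum of 2^(-dist)
  generalized = sum_others(D^(-alpha)), # sum(dist^-alpha)
  decay       = sum_others(alpha^D)     # sum(alpha^dist)
)
round(indices, 3)
```

On this toy graph all five variants order the nodes identically (inner nodes above end nodes); on larger networks the orderings can differ, which is why each variant yields a different correlation above.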

We could now accept that there is no index based on shortest path distances that could explain wealth. Or, we could start to craft new indices that might yield a better correlation with wealth. However, we then enter the dilemma that was mentioned at the end of the last section. If we find one, we cannot be certain that there is not an even better one out there. Conversely, if we do not succeed, we cannot guarantee that no index with a higher correlation exists.

Centrality as explanatory variable (Dominance approach)

Since we are postulating a connection between proximity and wealth, we compute the pairwise shortest path distances as our indirect relation of interest and calculate the positional dominance relations.

D <- florentine_m %>%
  indirect_relations(type = "dist_sp") %>%
  positional_dominance(benefit = FALSE)

comparable_pairs(D)

## [1] 0.152381

Note that exactly the same pairs are comparable as for neighborhood-inclusion. However, with one additional assumption, we will be able to increase the number of comparable pairs significantly (and thus reduce the space of potential rankings). By summing up distances in various ways, as done by the indices above, we assume families to be homogeneous. It doesn’t matter to whom we have a small distance, it just matters that the distances are small.

If we can safely comply with this assumption, we can use positional dominance under total homogeneity. It is important to note that if a family is dominated by another under this premise, it will have a lower score in any distance-based centrality index.

D <- florentine_m %>%
  indirect_relations(type = "dist_sp") %>%
  positional_dominance(benefit = FALSE, map = TRUE)

comparable_pairs(D)

## [1] 0.8190476

The fraction of comparable pairs increased from $0.15$ to $0.82$, thus significantly reducing the space of potential centrality rankings based on distances.
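A minimal base R sketch of dominance under total homogeneity, assuming the standard reading that only the sorted distance profile of a node matters: $u$ is dominated by $v$ if every entry of $u$'s sorted distance vector is at least as large as the corresponding entry of $v$'s. The toy tree and all object names are hypothetical, and this is not the implementation behind `positional_dominance(map = TRUE)`.

```r
# Hypothetical toy tree: hub 1 with leaves 2 and 3, plus the path 1-4-5-6
n <- 6
A <- matrix(0, n, n)
edges <- rbind(c(1, 2), c(1, 3), c(1, 4), c(4, 5), c(5, 6))
A[edges] <- 1
A[edges[, 2:1]] <- 1

# Shortest path distances via Floyd-Warshall
D_toy <- ifelse(A == 1, 1, Inf)
diag(D_toy) <- 0
for (k in seq_len(n)) D_toy <- pmin(D_toy, outer(D_toy[, k], D_toy[k, ], "+"))

# Under homogeneity only the sorted distance profile of a node matters
S <- t(apply(D_toy, 1, sort))

# dom[u, v] = TRUE if u is dominated by v: every sorted distance of u is
# at least as large as the corresponding one of v
dom <- matrix(FALSE, n, n)
for (u in seq_len(n)) for (v in seq_len(n)) {
  if (u != v) dom[u, v] <- all(S[u, ] >= S[v, ])
}
comparable <- dom | t(dom)
frac_comparable <- sum(comparable[upper.tri(comparable)]) / choose(n, 2)

# Harmonic closeness as one example of a distance-based index
harmonic <- rowSums(1 / (D_toy + diag(Inf, n)))
```

In this tree only nodes 1 and 4 stay incomparable: node 1 has more direct neighbors, but node 4 has the smaller maximum distance. For every dominated pair, the harmonic closeness of the dominated node does not exceed that of the dominating one, in line with the statement above.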

We proceed to explore if there is potential for a distance-based ranking to explain wealth perfectly. This is only possible if families with lower wealth do not dominate wealthier families. Otherwise they would always be ranked higher, prohibiting a perfect correlation.

The figure below shows the dominance relations as a directed graph, where the x coordinate of nodes is proportional to the wealth attribute and the y coordinate to the number of dominated families. Any edge pointing to the left (shown in red) denotes a pair of “wrongly” ordered families, i.e. a wealthy family is dominated by a less wealthy one.


In total, we find $41$ such pairs ($39$% of all pairs). This implies that we are (potentially) quite far from being able to explain wealth perfectly with shortest path distances. In the following we will explore how far away.
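Counting such "wrongly" ordered pairs amounts to comparing the dominance matrix with the pairwise order of the attribute. The sketch below does this in base R for a hypothetical dominance relation and a hypothetical attribute vector. Reading `dom[u, v] = 1` as "u is dominated by v" is an assumption of the sketch.

```r
# dom[u, v] = 1 means u is dominated by v (hypothetical toy relation)
dom <- matrix(0, 4, 4)
dom[rbind(c(1, 2), c(1, 3), c(4, 2), c(4, 3))] <- 1

# Hypothetical node attribute, standing in for wealth
wealth <- c(10, 5, 20, 1)

# A dominated node u with a larger attribute value than its dominator v is
# "wrongly" ordered: every admissible ranking places u below v
wrong <- dom == 1 & outer(wealth, wealth, ">")
n_wrong <- sum(wrong)
share_wrong <- n_wrong / choose(length(wealth), 2)

which(wrong, arr.ind = TRUE)
```

As soon as `n_wrong` is larger than zero, no ranking that respects the dominance relation can correlate perfectly with the attribute.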

We start by calculating the rank intervals to illustrate the difference to neighborhood-inclusion.

plot(rank_intervals(D))

All intervals shrank considerably, and for two families (Pazzi and Medici) they even collapse to a single point. This implies that no matter which distance-based index we use, the Pazzi family will always be ranked last and the Medici always on top.

For a more exact assessment we again use the function exact_rank_prob().

res <- exact_rank_prob(D)

In total, there are 654 distance-based rankings possible. This is a huge reduction from the general case where almost $4$ billion are possible.

To determine the best possible correlation between wealth and any distance-based ranking, we first need to determine all 654 rankings. For this purpose, we rerun the previous analysis with only.results = FALSE to obtain the necessary data structure.

res <- exact_rank_prob(D, only.results = FALSE)

Now, we can use the function get_rankings(), which returns all rankings as a matrix.

all_ranks <- get_rankings(res)
dim(all_ranks)

## [1]  15 654

Now we can simply loop over all rankings and calculate the correlation between each ranking and the wealth attribute.

dist_cor <- apply(
  all_ranks, 2,
  function(x) cor(V(florentine_m)$wealth, x, method = "kendall")
)
c(max_cor = max(dist_cor), mean_cor = mean(dist_cor))

##    max_cor   mean_cor
## 0.15459118 0.04600506

The highest achievable correlation is 0.1546.

We can conclude that there cannot be any distance-based centrality index that can reasonably explain the wealth attribute.

We can additionally consider the correlation between degree and wealth, calculated below.

cor(degree(florentine_m), V(florentine_m)$wealth, method = "kendall")

## [1] 0.1958605

The correlation is higher than any distance-based index can achieve. Thus, we can additionally conclude that marriage ties are more indicative of wealth than proximity in the marriage network.

