Weights are used by most aggregation methods to optionally alter the contribution of each indicator in an aggregation group, as well as by aggregates themselves if they are further aggregated. Weighting is therefore part of aggregation, but this vignette deals with it separately because there are a few special tools for weighting in COINr.
First, let’s see what weights look like in practice. When a coin is built using new_coin(), the iMeta data frame (an input to new_coin()) has a “Weight” column, which is also required. Therefore, every coin should have a set of weights in it by default, which you had to specify as part of its construction. Sets of weights are stored in the .$Meta$Weights sub-list. Each set of weights is stored as a data frame with a name. The set of weights created when calling new_coin() is called “Original”. We can see this by building the example coin and accessing the “Original” set directly:
```r
library(COINr)

# build example coin
coin <- build_example_coin(up_to = "Normalise", quietly = TRUE)

# view weights
head(coin$Meta$Weights$Original)
#>       iCode Level Weight
#> 9     Goods     1      1
#> 10 Services     1      1
#> 11      FDI     1      1
#> 12   PRemit     1      1
#> 13  ForPort     1      1
#> 31    Renew     1      1
```

The weight set simply has the indicator code, the level, and the weight itself. Notice that the indicator codes also include aggregate codes, up to the index:
```r
# view rows not in level 1
coin$Meta$Weights$Original[coin$Meta$Weights$Original$Level != 1, ]
#>        iCode Level Weight
#> 50  Physical     2      1
#> 51  ConEcFin     2      1
#> 52 Political     2      1
#> 53    Instit     2      1
#> 54       P2P     2      1
#> 55   Environ     2      1
#> 56    Social     2      1
#> 57  SusEcFin     2      1
#> 58      Conn     3      1
#> 59      Sust     3      1
#> 60     Index     4      1
```

Note that although the index itself has a weight, this weight is not actually used, because the index is not aggregated into any higher level. Notice also that weights can be specified relative to one another: when an aggregation group is aggregated, the weights within that group are first scaled to sum to 1. This means that weights are relative within groups, but not between groups.
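As a quick illustration of this rescaling (not COINr code, just base R arithmetic on a hypothetical group), only the proportions between the weights in a group matter:

```r
# hypothetical weights of three indicators in the same aggregation group
w_group <- c(1, 2, 1)

# before aggregating, weights within a group are scaled to sum to 1
w_group / sum(w_group)
#> [1] 0.25 0.50 0.25
```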
To change weights, one way is to simply go back to the original iMeta data frame that you used to build the coin, and edit it. If you don’t want to do that, you can also create a new weight set. This simply involves copying an existing set of weights, modifying the copy, and storing it back in the coin under a new name.
For example, if we want to change the weighting of the “Conn” and “Sust” sub-indices, we could do this:
```r
# copy original weights
w1 <- coin$Meta$Weights$Original

# modify weights of Conn and Sust to 0.3 and 0.7 respectively
w1$Weight[w1$iCode == "Conn"] <- 0.3
w1$Weight[w1$iCode == "Sust"] <- 0.7

# put weight set back with new name
coin$Meta$Weights$MyFavouriteWeights <- w1
```

Now, to actually use these weights in aggregation, we have to direct the Aggregate() function to find them. When weights are stored in the “Weights” sub-list as we have done here, this is easy because we only have to pass the name of the weights to Aggregate():
```r
coin <- Aggregate(coin, dset = "Normalised", w = "MyFavouriteWeights")
#> Written data set to .$Data$Aggregated
```

Alternatively, we can pass the data frame itself to Aggregate() if we don’t want to store it in the coin for some reason:
```r
coin <- Aggregate(coin, dset = "Normalised", w = w1)
#> Written data set to .$Data$Aggregated
#> (overwritten existing data set)
```

When altering weights we may wish to compare the outcomes of alternative sets of weights. See the Adjustments and comparisons vignette for details on how to do this.
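In the meantime, a quick manual comparison can be sketched by re-aggregating with each weight set and pulling the index scores directly out of the coin. This is only an illustrative sketch (it assumes the “uCode” and “Index” columns of the example coin used here); the dedicated comparison functions described in that vignette are the more convenient route.

```r
# aggregate with the original weights and keep the index scores
coin <- Aggregate(coin, dset = "Normalised", w = "Original")
index_orig <- coin$Data$Aggregated$Index

# re-aggregate with the alternative weights and keep the new index scores
coin <- Aggregate(coin, dset = "Normalised", w = "MyFavouriteWeights")
index_alt <- coin$Data$Aggregated$Index

# compare the resulting ranks side by side (rank 1 = highest score)
head(data.frame(
  uCode = coin$Data$Aggregated$uCode,
  Rank_Original = rank(-index_orig),
  Rank_MyFavouriteWeights = rank(-index_alt)
))
```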
COINr has some statistical tools for adjusting weights, as explained in the following sections. Before that, it is also interesting to look at “effective weights”. At the index level, the weighting of an indicator depends not just on its own weight, but also on the weights of every aggregation group it belongs to, as well as the number of indicators/aggregates in each group. This means that the final weighting of each indicator at the index level is slightly complex to work out. COINr has a built-in function to get these “effective weights”:
```r
w_eff <- get_eff_weights(coin, out2 = "df")

head(w_eff)
#>       iCode Level Weight  EffWeight
#> 9     Goods     1      1 0.02000000
#> 10 Services     1      1 0.02000000
#> 11      FDI     1      1 0.02000000
#> 12   PRemit     1      1 0.02000000
#> 13  ForPort     1      1 0.02000000
#> 31    Renew     1      1 0.03333333
```

The “EffWeight” column is the effective weight of each component at the highest level of aggregation (the index). For example, “Goods” takes a scaled weight of 1/5 within its group of five indicators, that group takes 1/5 among the five pillars of “Conn”, and “Conn” takes 1/2 at the index level, giving an effective weight of (1/5) × (1/5) × (1/2) = 0.02. These weights sum to 1 for each level:
```r
# get sum of effective weights for each level
tapply(w_eff$EffWeight, w_eff$Level, sum)
#> 1 2 3 4 
#> 1 1 1 1
```

The effective weights can also be viewed using the plot_framework() function, where the angle of each indicator/aggregate is proportional to its effective weight.
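The plot itself is not reproduced here, but a minimal call using the default settings of plot_framework() should generate it:

```r
# framework plot: segment angles are proportional to effective weights
plot_framework(coin)
```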
The get_PCA() function can be used to return a set of weights which maximises the explained variance within aggregation groups. This function is already discussed in the Analysis vignette, so we will only focus on the weighting aspect here.
First of all, PCA weights come with a number of caveats which need to be mentioned (these are also detailed in the get_PCA() function help). First, what constitutes “PCA weights” in composite indicators is not very well-defined. In COINr, a simple option is adopted: the loadings of the first principal component are taken as the weights. The logic here is that these loadings should maximise the explained variance, the implication being that if we use them as weights in an aggregation, we should maximise the explained variance and hence the information passed from the indicators to the aggregate value. This is a nice property in a composite indicator, where one of the aims is to represent many indicators by a single composite. See here for a discussion on this.
But the weights that result from PCA have a number of downsides. First, they can often include negative weights, which can be hard to justify. PCA may also arbitrarily flip the axes (since from a variance point of view the direction is not important). In the quest for maximum variance, PCA will also give the highest weights to the strongest-correlating indicators, which means that other indicators may be neglected. In short, it often results in a very unbalanced set of weights. Moreover, PCA can only be performed on one level at a time.
The result is that PCA weights should be used carefully. All that said, let’s see how to get PCA weights. We simply run the get_PCA() function with out2 = "coin", specifying the name of the weights to use. Here, we will calculate PCA weights at level 2, i.e. at the first level of aggregation. To do this, we need to use the “Aggregated” data set because the PCA needs the level 2 scores to work with:
```r
coin <- get_PCA(coin, dset = "Aggregated", Level = 2,
                weights_to = "PCAwtsLev2", out2 = "coin")
#> Weights written to .$Meta$Weights$PCAwtsLev2
```

This stores the new set of weights in the Weights sub-list, with the name we gave it. Let’s have a look at the resulting weights. The only weights that have changed are at level 2, so we look at those:
```r
coin$Meta$Weights$PCAwtsLev2[coin$Meta$Weights$PCAwtsLev2$Level == 2, ]
#>        iCode Level     Weight
#> 50  Physical     2  0.5117970
#> 51  ConEcFin     2  0.3049926
#> 52 Political     2  0.3547671
#> 53    Instit     2  0.5081540
#> 54       P2P     2  0.5108455
#> 55   Environ     2  0.6513188
#> 56    Social     2 -0.7443677
#> 57  SusEcFin     2  0.1473108
```

This shows the nature of PCA weights: in this case it is actually not too severe, but the Social dimension is negatively weighted because it is negatively correlated with the other components in its group. In any case, the weights can sometimes be “strange” to look at, and that may or may not be a problem. As explained above, to actually use these weights we can refer to them by name when calling Aggregate().
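For example (not run here), we could re-aggregate the normalised data using the new PCA weight set by name, exactly as we did earlier:

```r
# use the PCA weights in the aggregation
coin <- Aggregate(coin, dset = "Normalised", w = "PCAwtsLev2")
```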
While PCA is based on linear algebra, another way to statistically weight indicators is via numerical optimisation. Optimisation is a numerical search method which finds a set of values that maximises or minimises some criterion, called the “objective function”.
In composite indicators, different objectives are conceivable. The get_opt_weights() function gives two options in this respect: either to look for the set of weights that “balances” the indicators, or the set that maximises the information transferred (see here). This is done by looking at the correlations between indicators and the index. This needs a little explanation.
If weights are chosen to match the opinions of experts, or indeed your own opinion, there is a catch that is not very obvious. Put simply, weights do not directly translate into importance.
To understand why, we must first define what “importance” means. Actually there is more than one way to look at this, but one possible measure is to use the (possibly nonlinear) correlation between each indicator and the overall index. If the correlation is high, the indicator is well-reflected in the index scores, and vice versa.
If we accept this definition of importance, then it’s important to realise that this correlation is affected not only by the weights attached to each indicator, but also by the correlations between indicators. This means that these correlations must be accounted for when choosing weights that are meant to reflect the importances assigned by a group of experts (or by you).
In fact, it is possible to reverse-engineer the weights either analytically, using a linear solution, or numerically, using a nonlinear solution. While the former method is far quicker than a nonlinear optimisation, it is only applicable in the case of a single level of aggregation, with an arithmetic mean, and using linear correlation as the measure. Therefore, in COINr the second method is used.
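To make the idea of “balancing” a little more concrete, here is a purely conceptual sketch of the kind of objective such an optimisation minimises. This is illustrative only and is not COINr’s internal implementation; the function name and the exact form of the penalty are invented for this example.

```r
# Conceptual sketch (not COINr's internal code): a "balancing" objective
# penalises deviation of the normalised component-index correlations from
# a set of target importances (here, equal importance).
balance_objective <- function(cors, targets = rep(1 / length(cors), length(cors))) {
  # express correlations as shares so they are comparable with the targets
  shares <- cors / sum(cors)
  # sum of squared deviations from the target importances
  sum((shares - targets)^2)
}

# e.g. two sub-indices correlating 0.94 and 0.84 with the index give a small
# positive value (roughly 0.0016); an optimiser would adjust the weights to
# push this towards zero
balance_objective(c(0.94, 0.84))
```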
Let’s now see how to use get_opt_weights() in practice. As with PCA weights, we can only optimise one level at a time. We also need to say what kind of optimisation to perform. Here, we will search for the set of weights that results in equal influence of the sub-indices (level 3) on the index. We need a coin with an aggregated data set already present, because the function needs to know which kind of aggregation method you are using. Before doing that, we will first check what the correlations look like between level 3 and the index, using equal weighting:
```r
# build example coin
coin <- build_example_coin(quietly = TRUE)

# check correlations between level 3 and index
get_corr(coin, dset = "Aggregated", Levels = c(3, 4))
#>    Var1 Var2 Correlation
#> 1 Index Conn   0.9397805
#> 2 Index Sust   0.8382873
```

This shows that the correlations are similar but not the same. Now let’s run the optimisation:
```r
# optimise weights at level 3
coin <- get_opt_weights(coin, itarg = "equal", dset = "Aggregated", Level = 3,
                        weights_to = "OptLev3", out2 = "coin")
#> iterating... objective function = -7.11287670895252
#> iterating... objective function = -6.75731482891423
#> iterating... objective function = -7.5563175412706
#> iterating... objective function = -8.21181051402935
#> iterating... objective function = -10.0802172796095
#> iterating... objective function = -13.3043247136273
#> iterating... objective function = -8.7011048855954
#> iterating... objective function = -7.93721550859392
#> iterating... objective function = -9.92111795779074
#> iterating... objective function = -8.57337082557942
#> iterating... objective function = -13.0490317878554
#> iterating... objective function = -10.1205749624737
#> iterating... objective function = -11.4698196057753
#> iterating... objective function = -11.5046209642509
#> iterating... objective function = -12.938292451273
#> Optimisation successful!
#> Optimised weights written to .$Meta$Weights$OptLev3
```

We can view the optimised weights (weights will only change at level 3):
```r
coin$Meta$Weights$OptLev3[coin$Meta$Weights$OptLev3$Level == 3, ]
#>    iCode Level    Weight
#> 58  Conn     3 0.3902439
#> 59  Sust     3 0.6097561
```

To see whether this was successful in balancing correlations, let’s re-aggregate using these weights and check the correlations again:
```r
# re-aggregate
coin <- Aggregate(coin, dset = "Normalised", w = "OptLev3")
#> Written data set to .$Data$Aggregated
#> (overwritten existing data set)

# check correlations between level 3 and index
get_corr(coin, dset = "Aggregated", Levels = c(3, 4))
#>    Var1 Var2 Correlation
#> 1 Index Conn   0.8971336
#> 2 Index Sust   0.8925119
```

This shows that the correlations are indeed now well-balanced: the optimisation has worked.
We will not explore all the features of get_opt_weights() here, especially because optimisations can take a significant amount of CPU time. However, the main options include specifying a vector of “importances” rather than aiming for equal importance, and optimising to maximise total correlation rather than balancing. There are also some numerical optimisation parameters that could help if the optimisation doesn’t converge.
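As a closing sketch, specifying target importances simply means passing a numeric vector instead of "equal"; my reading is that this goes through the itarg argument, but check ?get_opt_weights before relying on it. The other options mentioned (maximising total correlation, and the convergence-related settings) are controlled by further arguments documented on the same help page.

```r
# sketch: ask for Conn to be twice as "important" as Sust
# (itarg as a numeric vector of target importances is an assumption here -
# verify against ?get_opt_weights)
coin <- get_opt_weights(coin, itarg = c(2, 1), dset = "Aggregated", Level = 3,
                        weights_to = "OptLev3_2to1", out2 = "coin")
```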