This vignette gives a guide to building “coins”, which are the object class representing a composite indicator used throughout COINr, and “purses”, which are time-indexed collections of coins.
COINr functions are designed to work in particular on an S3 object class called a “coin”. To introduce this, consider what constitutes a composite indicator:
Meanwhile, in the process of building a composite indicator, a range of analysis data is generated, including information on data availability, statistics on individual indicators, correlations and information about data treatment.
If a composite indicator is built from scratch, it is easy to generate an environment with dozens of variables and parameters. If an alternative version of the composite indicator is built, multiple sets of variables may need to be generated. With this in mind, it makes sense to structure all the ingredients of a composite indicator, from input data to methodology and results, into a single object, which is called a “coin” in COINr.
How to construct a coin, and some details of its contents, will be explained in more detail in the following sections. Although coins are the main object class used in COINr, a number of COINr functions also have methods for data frames and vectors. This is explained in other vignettes.
To build a coin you need to use the `new_coin()` function. Its two main input arguments are data frames: `iData` (the indicator data) and `iMeta` (the indicator metadata). This builds a coin-class object containing the raw data, which can then be developed and expanded by COINr functions, e.g. by normalising, treating data, imputing, aggregating and so on.
Before proceeding, we have to define a couple of things. The “things” that are being benchmarked/compared by the indicators and composite indicator are more generally referred to as *units* (quite often, units correspond to countries). Units are compared using *indicators*, which are measured variables that are relevant to the overall concept of the composite indicator.
The first data frame, `iData`, specifies the value of each indicator for each unit. It can also contain further attributes and metadata of units, for example groups, names, and denominating variables (variables which are used to adjust for size effects of indicators).
To see an example of what `iData` looks like, we can look at the built-in ASEM data set. This data set is from a composite indicator covering 51 countries with 49 indicators, and is used for examples throughout COINr:
```r
head(ASEM_iData[1:20], 5)
#>      uName uCode GDP_group GDPpc_group Pop_group EurAsia_group Time   Area
#> 1  Austria   AUT         L          XL         M        Europe 2018  83871
#> 2  Belgium   BEL         L           L         L        Europe 2018  30528
#> 3 Bulgaria   BGR         S           S         M        Europe 2018 110879
#> 4  Croatia   HRV         S           M         S        Europe 2018  56594
#> 5   Cyprus   CYP         S           L         S        Europe 2018   9251
#>   Energy       GDP Population      LPI  Flights      Ship Bord       Elec
#> 1  27.00 390.79999   8735.453 4.097985 29.01725  0.000000   35 35.3697298
#> 2  41.83 467.95527  11429.336 4.108538 31.88546 20.567121   48 26.5330467
#> 3   9.96  53.23964   7084.571 2.807685  9.23588  7.919366   18 11.2775842
#> 4   7.01  51.23100   4189.353 3.160829  9.24529 12.440452   41 19.5283620
#> 5   1.43  20.04623   1179.551 2.999061  8.75467 11.689495    0  0.4393643
#>      Gas ConSpeed Cov4G     Goods
#> 1  0.273     14.1 98.00 278.42640
#> 2 36.100     16.3 99.89 597.87230
#> 3  0.312     15.5 56.73  42.82515
#> 4  0.422      8.6 98.00  28.36795
#> 5  0.029      6.9 60.00   8.76681
```

Only a few rows and columns are shown here for illustration. The countries are in Asia and Europe, and the data are at the national level. Notice that each row is an observation (here, a country), and each column is a variable (mostly indicators, but also unit names and codes, groups and denominators).
Columns can be named whatever you want, although a few names are reserved:
- `uName` [optional] gives the name of each unit. Here, units are countries, so these are the names of each country.
- `uCode` [required] is a unique code assigned to each unit (country). This is the main “reference” inside COINr for units. If the units are countries, ISO Alpha-3 codes should ideally be used, because these are recognised by COINr for generating maps.
- `Time` [optional] gives the reference time of the data. This is used if panel data is passed to `new_coin()`. See Purses and panel data.

This means that at a minimum, you need to supply a data frame with a `uCode` column and some indicator columns.
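As a minimal sketch of this requirement, the base R snippet below builds a hypothetical `iData`-style data frame. Apart from the reserved names `uCode` and `uName`, every column name and value here is made up for illustration: `Size_group` stands in for a grouping column and `Population` for a denominator.

```r
# Hypothetical toy iData: one row per unit, one column per variable
iData_toy <- data.frame(
  uCode      = c("AAA", "BBB", "CCC"),                   # required, unique unit codes
  uName      = c("Country A", "Country B", "Country C"), # optional unit names
  Size_group = c("S", "L", "M"),                         # a grouping variable
  Population = c(1200, 56000, 8300),                     # a possible denominator
  Ind1       = c(2.5, 3.1, 4.0),                         # indicator
  Ind2       = c(10, 35, 22),                            # indicator
  stringsAsFactors = FALSE
)
str(iData_toy)
```

Each row is a unit and each column a variable, matching the layout of `ASEM_iData` shown above.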
Aside from the reserved names above, columns can be assigned to different uses using the corresponding `iMeta` data frame; this is clarified in the next section.
Some important rules and tips to keep in mind are:
The `iData` data frame will be checked when it is passed to `new_coin()`. You can also perform this check yourself in advance by calling `check_iData()`.
If there are issues with your `iData` data frame, this should produce informative error messages which can help to correct the problem.
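The exact checks performed by `check_iData()` are not listed here. As a rough illustration only, the base R sketch below implements a few rules that follow from the text: a `uCode` column must be present, unit codes must be unique (per time point, if a `Time` column is supplied), and `Time`, if present, must be numeric. The function name `sketch_check_iData()` is hypothetical and is not part of COINr.

```r
# Hypothetical helper, NOT part of COINr: a few basic iData checks
sketch_check_iData <- function(iData) {
  if (!is.data.frame(iData)) {
    stop("iData must be a data frame.")
  }
  if (!("uCode" %in% names(iData))) {
    stop("iData must contain a 'uCode' column.")
  }
  if ("Time" %in% names(iData)) {
    if (!is.numeric(iData$Time)) stop("The 'Time' column must be numeric.")
    # with panel data, each unit may appear once per time point
    key <- paste(iData$uCode, iData$Time)
  } else {
    # without a Time column, each unit may appear only once
    key <- iData$uCode
  }
  if (anyDuplicated(key) > 0) {
    stop("Duplicate uCode entries found.")
  }
  invisible(TRUE)
}

# Usage: a valid and an invalid example
ok  <- data.frame(uCode = c("AAA", "BBB"), Ind1 = c(1.2, 3.4))
bad <- data.frame(uCode = c("AAA", "AAA"), Ind1 = c(1.2, 3.4))
sketch_check_iData(ok)        # passes silently
try(sketch_check_iData(bad))  # reports the duplicated uCode
```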
The `iMeta` data frame specifies everything about each column in `iData`, including whether it is an indicator, a group, or something else; its name, its units, and where it appears in the structure of the index. `iMeta` also requires entries for any aggregates which will be created by aggregating indicators. Let’s look at the built-in example.
```r
head(ASEM_iMeta, 5)
#>   Level   iCode                                    iName Direction Weight
#> 1     1     LPI              Logistics Performance Index         1      1
#> 2     1 Flights International flights passenger capacity         1      1
#> 3     1    Ship        Liner Shipping Connectivity Index         1      1
#> 4     1    Bord                         Border crossings         1      1
#> 5     1    Elec                     Trade in electricity         1      1
#>                  Unit     Target Denominator   Parent      Type
#> 1           Score 1-5   4.118031        <NA> Physical Indicator
#> 2      Thousand seats 200.332655  Population Physical Indicator
#> 3               Score  20.113377        <NA> Physical Indicator
#> 4 Number of crossings 115.900000        Area Physical Indicator
#> 5                 TWh 104.670585      Energy Physical Indicator
```

Required columns for `iMeta` are:
- `Level`: the level in aggregation, where 1 is indicator level, 2 is the level resulting from aggregating indicators, 3 is the result of aggregating level 2, and so on. Set to `NA` for entries that are not included in the index (groups, denominators, etc).
- `iCode`: indicator code, alphanumeric. Must not start with a number. These entries generally correspond to the column names of `iData`.
- `Parent`: the group (`iCode`) to which the indicator/aggregate belongs in the level immediately above. Each entry here should also be found in `iCode`. Set to `NA` only for the highest (Index) level (no parent), or for entries that are not included in the index (groups, denominators, etc).
- `Direction`: numeric, either -1 or 1.
- `Weight`: numeric weight, which will be re-scaled to sum to 1 within each aggregation group. Set to `NA` for entries that are not included in the index (groups, denominators, etc).
- `Type`: the type of the entry corresponding to `iCode`. Can be either `Indicator`, `Aggregate`, `Group`, `Denominator`, or `Other`.

Optional columns that are recognised in certain functions are:
- `iName`: name of the indicator: a longer name which is used in some plotting functions.
- `Denominator`: specifies which denominator variable should be used to denominate the indicator, if `Denominate()` is called. See the Denomination vignette.
- `Unit`: the unit of the indicator, e.g. USD, thousands, score, etc. Used in some plots if available.
- `Target`: a target for the indicator. Used if the normalisation type is distance-to-target.

`iMeta` can also include other columns if needed for specific uses, as long as they don’t use the names listed above.
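The `Weight` entry above says that weights are re-scaled to sum to 1 within each aggregation group. Assuming this means dividing each weight by the total weight of all entries sharing the same `Parent`, the idea can be sketched in base R as follows. The codes and weights are hypothetical, and this is not COINr's internal code.

```r
# Hypothetical iMeta fragment: four indicators in two aggregation groups
iMeta_toy <- data.frame(
  iCode  = c("Ind1", "Ind2", "Ind3", "Ind4"),
  Parent = c("P1", "P1", "P2", "P2"),
  Weight = c(1, 3, 2, 2)
)

# Divide each weight by the total weight of its group (same Parent)
iMeta_toy$Weight_rescaled <- ave(iMeta_toy$Weight, iMeta_toy$Parent,
                                 FUN = function(w) w / sum(w))
iMeta_toy
```

Here `Ind1` and `Ind2` end up with 0.25 and 0.75, while `Ind3` and `Ind4` both get 0.5, so only the relative weights within a group matter.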
The `iMeta` data frame essentially gives details about each of the columns found in `iData`, as well as details about additional data columns eventually created by aggregating indicators. This means that the entries in `iMeta` must include *all* columns in `iData`, *except* the three “special” column names: `uCode`, `uName`, and `Time`. In other words, all column names of `iData` should appear in `iMeta$iCode`, except the three special cases mentioned.
The `Type` column specifies the type of the entry: `Indicator` should be used for indicators at level 1, and `Aggregate` for aggregates created by aggregating indicators or other aggregates. Otherwise, set it to `Group` if the variable is not used for building the index but instead defines groups of units. Set it to `Denominator` if the variable is to be used for scaling (denominating) other indicators. Finally, set it to `Other` if the variable should be ignored but passed through. Any other entries here will cause an error.
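The cross-referencing rule (every `iData` column except `uCode`, `uName` and `Time` must appear in `iMeta$iCode`) and the restriction on `Type` values can both be expressed with base R set operations. The toy tables below are hypothetical, and this sketch is not how `check_iMeta()` or `new_coin()` are implemented.

```r
# Hypothetical toy inputs
iData_toy <- data.frame(
  uCode = c("AAA", "BBB"), uName = c("Country A", "Country B"),
  Size_group = c("S", "L"), Population = c(1200, 56000),
  Ind1 = c(2.5, 3.1), Ind2 = c(10, 35)
)
iMeta_toy <- data.frame(
  iCode = c("Ind1", "Ind2", "Index", "Size_group", "Population"),
  Type  = c("Indicator", "Indicator", "Aggregate", "Group", "Denominator")
)

# iData columns (excluding the three special ones) that iMeta fails to describe
special <- c("uCode", "uName", "Time")
missing_in_iMeta <- setdiff(setdiff(names(iData_toy), special), iMeta_toy$iCode)

# Type entries outside the allowed set
allowed_types <- c("Indicator", "Aggregate", "Group", "Denominator", "Other")
invalid_types <- setdiff(iMeta_toy$Type, allowed_types)

missing_in_iMeta  # empty: every column is covered
invalid_types     # empty: all Type entries are valid
```

Note that the `Index` aggregate has no counterpart column in `iData`, which is allowed: the rule only runs from `iData` to `iMeta`.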
Apart from the indicator entries shown above, we can see aggregate entries:
```r
ASEM_iMeta[ASEM_iMeta$Type == "Aggregate", ]
#>    Level     iCode                        iName Direction Weight  Unit Target
#> 50     2  Physical                     Physical         1      1 Score     NA
#> 51     2  ConEcFin Economic and Financial (Con)         1      1 Score     NA
#> 52     2 Political                    Political         1      1 Score     NA
#> 53     2    Instit                Institutional         1      1 Score     NA
#> 54     2       P2P             People to People         1      1 Score     NA
#> 55     2   Environ                Environmental         1      1 Score     NA
#> 56     2    Social                       Social         1      1 Score     NA
#> 57     2  SusEcFin Economic and Financial (Sus)         1      1 Score     NA
#> 58     3      Conn                 Connectivity         1      1 Score     NA
#> 59     3      Sust               Sustainability         1      1 Score     NA
#> 60     4     Index     Sustainable Connectivity         1      1 Score     NA
#>    Denominator Parent      Type
#> 50        <NA>   Conn Aggregate
#> 51        <NA>   Conn Aggregate
#> 52        <NA>   Conn Aggregate
#> 53        <NA>   Conn Aggregate
#> 54        <NA>   Conn Aggregate
#> 55        <NA>   Sust Aggregate
#> 56        <NA>   Sust Aggregate
#> 57        <NA>   Sust Aggregate
#> 58        <NA>  Index Aggregate
#> 59        <NA>  Index Aggregate
#> 60        <NA>   <NA> Aggregate
```

These are the aggregates that will be created by aggregating indicators. These values will only be created when we call the `Aggregate()` function (see the relevant vignette). We also have groups:
```r
ASEM_iMeta[ASEM_iMeta$Type == "Group", ]
#>    Level         iCode                iName Direction Weight Unit Target
#> 61    NA     GDP_group            GDP group        NA     NA <NA>     NA
#> 62    NA   GDPpc_group GDP per capita group        NA     NA <NA>     NA
#> 63    NA     Pop_group     Population group        NA     NA <NA>     NA
#> 64    NA EurAsia_group       Europe or Asia        NA     NA <NA>     NA
#>    Denominator Parent  Type
#> 61        <NA>   <NA> Group
#> 62        <NA>   <NA> Group
#> 63        <NA>   <NA> Group
#> 64        <NA>   <NA> Group
```

Notice that the `iCode` entries here correspond to column names of `iData`. There are also denominators:
```r
ASEM_iMeta[ASEM_iMeta$Type == "Denominator", ]
#>    Level      iCode              iName Direction Weight               Unit
#> 65    NA       Area          Land area        NA     NA Thousand square km
#> 66    NA     Energy Energy consumption        NA     NA               Unit
#> 67    NA        GDP                GDP        NA     NA             USD Bn
#> 68    NA Population         Population        NA     NA          Thousands
#>    Target Denominator Parent        Type
#> 65     NA        <NA>   <NA> Denominator
#> 66     NA        <NA>   <NA> Denominator
#> 67     NA        <NA>   <NA> Denominator
#> 68     NA        <NA>   <NA> Denominator
```

Denominators are used to divide or “scale” other indicators. They are ideally included in `iData` because this ensures that they are matched to the same units (and, with panel data, the same time points) as the indicators.
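To show the arithmetic behind “dividing” an indicator by a denominator, the snippet below copies the `Flights` and `Population` values for the five countries printed from `ASEM_iData` earlier (in `iMeta`, `Flights` is paired with the `Population` denominator) and divides one by the other in base R. This is only a sketch of the basic operation; `Denominate()` may handle details such as scaling factors differently.

```r
# Values copied from the head(ASEM_iData) output shown earlier
flights <- c(AUT = 29.01725, BEL = 31.88546, BGR = 9.23588,
             HRV = 9.24529, CYP = 8.75467)          # thousand seats
population <- c(AUT = 8735.453, BEL = 11429.336, BGR = 7084.571,
                HRV = 4189.353, CYP = 1179.551)     # thousands

# Denominate: (thousand seats) / (thousand people) = seats per person
flights_per_pop <- flights / population
round(flights_per_pop, 5)
```

Before denominating, Belgium has the most seats in absolute terms among these five; after dividing by population, small Cyprus moves up considerably, which is the kind of size effect that denominators are meant to adjust for.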
The `Parent` column requires a few extra words. It is used to define the structure of the index. Simply put, it specifies the aggregation group to which the indicator or aggregate belongs in the level immediately above. For indicators in level 1, this should refer to `iCode`s in level 2, and for aggregates in level 2, it should refer to `iCode`s in level 3. Every entry in `Parent` must refer to an entry that can be found in the `iCode` column, or else be `NA` for the highest aggregation level or for groups, denominators and other `iData` columns that are not included in the index.
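The `Parent` rules just described can be checked mechanically. The base R sketch below uses a hypothetical three-level structure and tests that every non-missing `Parent` appears in `iCode`, that a missing `Parent` only occurs at the top level or for entries outside the index (`Level` is `NA`), and that each parent sits exactly one level above its child. It illustrates the rules only and is not COINr's own validation code.

```r
# Hypothetical iMeta: 3 indicators -> 2 pillars -> 1 index, plus a group column
iMeta_toy <- data.frame(
  Level  = c(1, 1, 1, 2, 2, 3, NA),
  iCode  = c("Ind1", "Ind2", "Ind3", "P1", "P2", "Index", "Size_group"),
  Parent = c("P1", "P1", "P2", "Index", "Index", NA, NA),
  Type   = c(rep("Indicator", 3), rep("Aggregate", 3), "Group")
)

# 1. Every non-NA Parent must itself appear in iCode
parents <- iMeta_toy$Parent[!is.na(iMeta_toy$Parent)]
orphans <- setdiff(parents, iMeta_toy$iCode)

# 2. NA Parent only allowed at the top level or for entries outside the index
top_level <- max(iMeta_toy$Level, na.rm = TRUE)
na_parent <- is.na(iMeta_toy$Parent)
bad_na <- na_parent & !is.na(iMeta_toy$Level) & iMeta_toy$Level != top_level

# 3. Each parent must sit exactly one level above its child
parent_level <- iMeta_toy$Level[match(iMeta_toy$Parent, iMeta_toy$iCode)]
bad_level <- !na_parent & (parent_level != iMeta_toy$Level + 1)

length(orphans) == 0 && !any(bad_na) && !any(bad_level)
```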
The `iMeta` data frame is more complex than `iData`, so it is easier to make errors. Use the `check_iMeta()` function (which is called anyway by `new_coin()`) to check the validity of your `iMeta`. Informative error messages are included where possible to help correct any errors.
When `new_coin()` is run, additional cross-checks are run between `iData` and `iMeta`.
`new_coin()`

With the `iData` and `iMeta` data frames prepared, you can build a coin using the `new_coin()` function. This has some other arguments and options that we will see in a minute, but by default it looks like this:
```r
# build a new coin using example data
coin <- new_coin(iData = ASEM_iData,
                 iMeta = ASEM_iMeta,
                 level_names = c("Indicator", "Pillar", "Sub-index", "Index"))
#> iData checked and OK.
#> iMeta checked and OK.
#> Written data set to .$Data$Raw
```

The `new_coin()` function checks and cross-checks both input data frames, and outputs a coin-class object. It also tells us that it has written a data set to `.$Data$Raw`: this is the sub-list that contains the various data sets that will be created each time we run a coin-building function.
We can see a summary of the coin by calling the coin print method. This is done simply by calling the name of the coin at the command line, or equivalently `print(coin)`:
```r
coin
#> --------------
#> A coin with...
#> --------------
#> Input:
#>   Units: 51 (AUS, AUT, BEL, ...)
#>   Indicators: 49 (Goods, Services, FDI, ...)
#>   Denominators: 4 (Area, Energy, GDP, ...)
#>   Groups: 4 (GDP_group, GDPpc_group, Pop_group, ...)
#>
#> Structure:
#>   Level 1 Indicator: 49 indicators (FDI, ForPort, Goods, ...)
#>   Level 2 Pillar: 8 groups (ConEcFin, Instit, P2P, ...)
#>   Level 3 Sub-index: 2 groups (Conn, Sust)
#>   Level 4 Index: 1 groups (Index)
#>
#> Data sets:
#>   Raw (51 units)
```

This tells us some details about the coin: the number of units, indicators, denominators and groups; the structure of the index (notice that the `level_names` argument is used to describe each level); and the data sets present in the coin. Currently this only consists of the “Raw” data set, which is created by default when we run `new_coin()` and simply consists of the indicator data plus the `uCode` column. Indeed, we can retrieve any data set from within a coin at any time using the `get_dset()` function:
```r
# first few cols and rows of Raw data set
data_raw <- get_dset(coin, "Raw")
head(data_raw[1:5], 5)
#>    uCode      LPI  Flights      Ship Bord
#> 31   AUS 3.793385 36.05498 14.004198    0
#> 1    AUT 4.097985 29.01725  0.000000   35
#> 2    BEL 4.108538 31.88546 20.567121   48
#> 32   BGD 2.663902  4.27955  9.698165   16
#> 3    BGR 2.807685  9.23588  7.919366   18
```

By default, calling `get_dset()` returns only the unit code plus the indicator/aggregate columns. We can also attach other columns such as groups and names by using the `also_get` argument. This can be used to attach any of the `iData` “metadata” columns that were originally passed when calling `new_coin()`, such as groups, etc.
```r
get_dset(coin, "Raw", also_get = c("uName", "Pop_group"))[1:5] |>
  head(5)
#>   uCode      uName Pop_group      LPI  Flights
#> 1   AUS  Australia         L 3.793385 36.05498
#> 2   AUT    Austria         M 4.097985 29.01725
#> 3   BEL    Belgium         L 4.108538 31.88546
#> 4   BGD Bangladesh        XL 2.663902  4.27955
#> 5   BGR   Bulgaria         M 2.807685  9.23588
```

Apart from the `level_names` argument, `new_coin()` also allows you to pass forward only a subset of the indicators in `iMeta`, using the `exclude` argument. This is useful when testing alternative sets of indicators (see the vignette on adjustments and comparisons).
```r
# exclude two indicators
coin <- new_coin(iData = ASEM_iData,
                 iMeta = ASEM_iMeta,
                 level_names = c("Indicator", "Pillar", "Sub-index", "Index"),
                 exclude = c("LPI", "Flights"))
#> iData checked and OK.
#> iMeta checked and OK.
#> Written data set to .$Data$Raw
coin
#> --------------
#> A coin with...
#> --------------
#> Input:
#>   Units: 51 (AUS, AUT, BEL, ...)
#>   Indicators: 47 (Goods, Services, FDI, ...)
#>   Denominators: 4 (Area, Energy, GDP, ...)
#>   Groups: 4 (GDP_group, GDPpc_group, Pop_group, ...)
#>
#> Structure:
#>   Level 1 Indicator: 47 indicators (FDI, ForPort, Goods, ...)
#>   Level 2 Pillar: 8 groups (ConEcFin, Instit, P2P, ...)
#>   Level 3 Sub-index: 2 groups (Conn, Sust)
#>   Level 4 Index: 1 groups (Index)
#>
#> Data sets:
#>   Raw (51 units)
```

Here, `new_coin()` has removed the two excluded indicator columns from `iData` and the corresponding entries in `iMeta`, leaving 47 indicators. However, the full original `iData` and `iMeta` tables are still stored in the coin.
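Conceptually, excluding indicators amounts to dropping their columns from `iData` and their rows from `iMeta`. The base R sketch below shows only that step, on hypothetical toy tables; it does not reproduce anything else `new_coin()` does, such as keeping the full original tables inside the coin.

```r
# Hypothetical toy inputs
iData_toy <- data.frame(uCode = c("AAA", "BBB"),
                        Ind1 = c(2.5, 3.1), Ind2 = c(10, 35), Ind3 = c(0.4, 0.9))
iMeta_toy <- data.frame(iCode = c("Ind1", "Ind2", "Ind3"),
                        Type = "Indicator", Parent = "P1")

exclude <- c("Ind2")

# Drop the excluded indicator columns from iData ...
iData_sub <- iData_toy[, !(names(iData_toy) %in% exclude), drop = FALSE]
# ... and the corresponding rows from iMeta
iMeta_sub <- iMeta_toy[!(iMeta_toy$iCode %in% exclude), , drop = FALSE]

names(iData_sub)
iMeta_sub$iCode
```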
The `new_coin()` function includes a thorough series of checks on its input arguments, which may cause some errors at first while the format is corrected. The objective is that if you can successfully assemble a coin, it should work smoothly with all COINr functions.
COINr includes a built-in example coin which is constructed using the function `build_example_coin()`. This is useful for learning how the package works and for testing, and it is used extensively in the COINr documentation because many functions require a coin as an input. Here we build the example coin (which again uses the ASEM data set built into COINr) and inspect its contents:
```r
ASEM <- build_example_coin(quietly = TRUE)
ASEM
#> --------------
#> A coin with...
#> --------------
#> Input:
#>   Units: 51 (AUS, AUT, BEL, ...)
#>   Indicators: 49 (Goods, Services, FDI, ...)
#>   Denominators: 4 (Area, Energy, GDP, ...)
#>   Groups: 4 (GDP_group, GDPpc_group, Pop_group, ...)
#>
#> Structure:
#>   Level 1 Indicator: 49 indicators (FDI, ForPort, Goods, ...)
#>   Level 2 Pillar: 8 groups (ConEcFin, Instit, P2P, ...)
#>   Level 3 Sub-index: 2 groups (Conn, Sust)
#>   Level 4 Index: 1 groups (Index)
#>
#> Data sets:
#>   Raw (51 units)
#>   Denominated (51 units)
#>   Imputed (51 units)
#>   Screened (51 units)
#>   Treated (51 units)
#>   Normalised (51 units)
#>   Aggregated (51 units)
```

This shows that the example is a fully populated coin with various data sets, each resulting from running COINr functions, up to the aggregation step.
A coin offers wide methodological flexibility, but some things are kept fixed throughout. One is that the set of indicators does not change once the coin has been created. The other is that each coin represents a single point in time.
If you have panel data, i.e. multiple observations for each unit-indicator pair, indexed by time, then `new_coin()` allows you to create multiple coins in one go. Coins are collected into a single object called a “purse”, and many COINr functions work on purses directly.
Here we simply explore how to create a purse. The procedure is almost the same as creating a coin: you need the `iData` and `iMeta` data frames, and you call `new_coin()`. The difference is that `iData` must now have a `Time` column, which must be numeric and records the time point of each observation. To see an example, we can look at the built-in (artificial) panel data set `ASEM_iData_p`.
```r
# sample of 2018 observations
ASEM_iData_p[ASEM_iData_p$Time == 2018, 1:15] |>
  head(5)
#>      uName uCode GDP_group GDPpc_group Pop_group EurAsia_group Time   Area
#> 1  Austria   AUT         L          XL         M        Europe 2018  83871
#> 2  Belgium   BEL         L           L         L        Europe 2018  30528
#> 3 Bulgaria   BGR         S           S         M        Europe 2018 110879
#> 4  Croatia   HRV         S           M         S        Europe 2018  56594
#> 5   Cyprus   CYP         S           L         S        Europe 2018   9251
#>   Energy       GDP Population      LPI  Flights      Ship Bord
#> 1  27.00 390.79999   8735.453 4.097985 29.01725  0.000000   35
#> 2  41.83 467.95527  11429.336 4.108538 31.88546 20.567121   48
#> 3   9.96  53.23964   7084.571 2.807685  9.23588  7.919366   18
#> 4   7.01  51.23100   4189.353 3.160829  9.24529 12.440452   41
#> 5   1.43  20.04623   1179.551 2.999061  8.75467 11.689495    0
```

```r
# sample of 2019 observations
ASEM_iData_p[ASEM_iData_p$Time == 2019, 1:15] |>
  head(5)
#>       uName uCode GDP_group GDPpc_group Pop_group EurAsia_group Time   Area
#> 52  Austria   AUT         L          XL         M        Europe 2019  83871
#> 53  Belgium   BEL         L           L         L        Europe 2019  30528
#> 54 Bulgaria   BGR         S           S         M        Europe 2019 110879
#> 55  Croatia   HRV         S           M         S        Europe 2019  56594
#> 56   Cyprus   CYP         S           L         S        Europe 2019   9251
#>    Energy       GDP Population      LPI  Flights       Ship      Bord
#> 52  27.00 390.79999   8735.453 4.153182 37.53763  0.6054851 39.752508
#> 53  41.83 467.95527  11429.336 4.149371 41.53901 21.2045607 52.123937
#> 54   9.96  53.23964   7084.571 2.868647 15.82871  7.9467542 23.203648
#> 55   7.01  51.23100   4189.353 3.230168 16.06586 13.0958316 46.566308
#> 56   1.43  20.04623   1179.551 3.098577 10.92502 12.3571194  3.993825
```

This data set has five years of data, spanning 2018-2022 (the data are artificially generated; at some point I will replace this with a real example). This means that each row now corresponds to a set of indicator values for a unit at a given time point.
To build a purse from this data, we input it into `new_coin()`:
```r
# build purse from panel data
purse <- new_coin(iData = ASEM_iData_p,
                  iMeta = ASEM_iMeta,
                  split_to = "all",
                  quietly = TRUE)
```

Notice here that the `iMeta` argument is the same as when we assembled a single coin. This is because a purse is supposed to consist of coins with the same indicators and structure, i.e. the aim is to calculate a composite indicator over several points in time, and generally to apply the same methodology to all coins in the purse. It is however possible to have different units between coins in the same purse; this might occur because of differences in data availability at different time points.
The `split_to` argument should be set to `"all"` to create a coin from each time point found in the data. Alternatively, you can include only a subset of time points by specifying them as a vector.
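To illustrate what splitting panel data by time point involves, the base R sketch below applies `split()` to a hypothetical long-format data frame, first keeping every time point (analogous to `split_to = "all"`) and then only a chosen subset. It mimics the grouping step only; `new_coin()` additionally builds and checks a coin from each subset.

```r
# Hypothetical panel data: two units observed in three years
iData_p_toy <- data.frame(
  uCode = rep(c("AAA", "BBB"), times = 3),
  Time  = rep(c(2018, 2019, 2020), each = 2),
  Ind1  = c(2.5, 3.1, 2.7, 3.0, 2.9, 3.4)
)

# One data frame per time point (cf. split_to = "all")
by_time <- split(iData_p_toy, iData_p_toy$Time)
names(by_time)

# Only selected time points (cf. passing a vector of years to split_to)
keep <- c(2019, 2020)
by_time_subset <- by_time[as.character(keep)]
names(by_time_subset)
```

Within each subset every unit appears once, which is why a single coin can represent one point in time.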
A quick way to check the contents of the purse is to call its print method:
```r
purse
#> -----------------------------
#> A purse with... 5 coins
#> -----------------------------
#>
#>  Time n_Units n_Inds n_dsets
#>  2018      51     49       1
#>  2019      51     49       1
#>  2020      51     49       1
#>  2021      51     49       1
#>  2022      51     49       1
#>
#> -----------------------------------
#> Sample from first coin (2018):
#> -----------------------------------
#>
#> Input:
#>   Units: 51 (AUS, AUT, BEL, ...)
#>   Indicators: 49 (Goods, Services, FDI, ...)
#>   Denominators: 4 (Area, Energy, GDP, ...)
#>   Groups: 4 (GDP_group, GDPpc_group, Pop_group, ...)
#>
#> Structure:
#>   Level 1 : 49 indicators (FDI, ForPort, Goods, ...)
#>   Level 2 : 8 groups (ConEcFin, Instit, P2P, ...)
#>   Level 3 : 2 groups (Conn, Sust)
#>   Level 4 : 1 groups (Index)
#>
#> Data sets:
#>   Raw (51 units)
```

This tells us how many coins there are, the number of indicators and units, and gives some structural information from one of the coins.
A purse is an S3 class object like a coin. In fact, it is simply a data frame with a `Time` column and a `coin` column, where entries in the `coin` column are coin objects (in a so-called “list column”). This is convenient to work with, but if you try to view it in RStudio, for example, it can be a little messy.
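The “list column” idea can be seen with a plain data frame. In the hypothetical base R sketch below, ordinary lists stand in for coin objects; it shows only the storage layout described above, not how COINr actually constructs a purse or sets its class.

```r
# Hypothetical stand-ins for coins (real coins are much richer objects)
coin_2018 <- list(Time = 2018, Data = list(Raw = data.frame(uCode = "AAA", Ind1 = 2.5)))
coin_2019 <- list(Time = 2019, Data = list(Raw = data.frame(uCode = "AAA", Ind1 = 2.7)))

# A data frame with an ordinary Time column ...
purse_like <- data.frame(Time = c(2018, 2019))
# ... plus a list column whose entries are whole objects
purse_like$coin <- list(coin_2018, coin_2019)

# Extract one "coin" and reach into it
purse_like$coin[[2]]$Data$Raw
```

Each row pairs a time point with an entire object, which is why filtering by `Time` and then extracting from the `coin` column is a natural way to work with such a structure.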
As with coins, the purse class also has a function in COINr which produces an example purse:
```r
ASEM_purse <- build_example_purse(quietly = TRUE)
ASEM_purse
#> -----------------------------
#> A purse with... 5 coins
#> -----------------------------
#>
#>  Time n_Units n_Inds n_dsets
#>  2018      51     49       5
#>  2019      51     49       5
#>  2020      51     49       5
#>  2021      51     49       5
#>  2022      51     49       5
#>
#> -----------------------------------
#> Sample from first coin (2018):
#> -----------------------------------
#>
#> Input:
#>   Units: 51 (AUS, AUT, BEL, ...)
#>   Indicators: 49 (Goods, Services, FDI, ...)
#>   Denominators: 4 (Area, Energy, GDP, ...)
#>   Groups: 4 (GDP_group, GDPpc_group, Pop_group, ...)
#>
#> Structure:
#>   Level 1 Indicator: 49 indicators (FDI, ForPort, Goods, ...)
#>   Level 2 Pillar: 8 groups (ConEcFin, Instit, P2P, ...)
#>   Level 3 Sub-index: 2 groups (Conn, Sust)
#>   Level 4 Index: 1 groups (Index)
#>
#> Data sets:
#>   Raw (51 units)
#>   Screened (46 units)
#>   Treated (46 units)
#>   Normalised (46 units)
#>   Aggregated (46 units)
```

The purse class can be used directly with COINr functions; this allows you to, for example, impute, normalise, treat or aggregate all coins with a single command.
COINr is mostly designed to work with coins and purses. However, many key functions also have methods for data frames or vectors. This means that COINr can either be used as an “ecosystem” of functions built around coins and purses, or simply as a toolbox for doing your own work with data frames and other objects.