lpinzari/homogeneity-location-index


The Homogeneity & Location Index: A Statistical Framework for the Classification of Ordinal Categorical Data

The objective of this work is to provide tools for the classification of ordinal categorical distributions. To this end, we propose a Homogeneity Index (HI) and a Location Index (LI) to measure, respectively, the concentration and the central value of an ordinal categorical distribution. We also provide a transparent set of criteria that a user can follow to establish whether a given HI value indicates a "high" or "low" concentration of values around the central value of a distribution. Finally, we provide a Concentration Index (CI) for the classification of nominal categorical variables.

We applied our framework to assess the socioeconomic homogeneity of the commonly used [SA3](https://www.abs.gov.au/websitedbs/D3310114.nsf/home/Australian+Statistical+Geography+Standard+(ASGS)) Australian Census geography. In particular, we look at how the population of each SA3 is distributed across the IRSD (Index of Relative Socioeconomic Disadvantage) decile categories.

For more information about this work, the interested reader can refer to the publication: A framework for the identification and classification of homogeneous socioeconomic areas in the analysis of health care variation.

Figure 1. Conceptual framework for the classification of homogeneous areas. Source: International Journal of Health Geographics

Description

Conceptually, the HI of a distribution (pdf) is a number between 0 (uniform pdf) and 1 (singleton pdf) that measures the degree to which the population of an area is concentrated among the set of categories. For example, in the case of the IRSD decile, an HI of zero expresses minimal concentration and occurs when the population is equally distributed among all decile categories (i.e. each IRSD decile contains 10% of the population). Conversely, an HI of 1 is attained when the whole population is concentrated in a single decile. In the latter case, there is no variation within the area in that characteristic, and the geography is uniquely identified by the central value of the distribution.

The LI of a distribution is the category that can be considered representative of the entire population in a unit. For example, for the IRSD decile distribution, the LI is an integer ranging from 1 (most disadvantaged) to 10 (least disadvantaged).

The formal definitions and statistical properties of the HI and LI are given in the Additional File of the publication: Model

Datasets

The dataSa3.csv file contains 330 SA3s and 15 columns:

id: SA3 sequential identifier
SA3_code: ABS 2016 SA3 code
SA3_name: SA3 name
State_code: ABS 2016 state code
State_name: State name
Columns 6-15 (d1, d2, ..., d10): number of people in each IRSD decile
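A minimal Python sketch for loading the decile counts and turning each SA3's counts into a pdf (proportions that sum to 1), the form of distribution the HI and LI are defined on. It assumes the CSV header uses exactly the column names listed above (`SA3_code`, `d1` to `d10`); the helper names `counts_to_pdf` and `read_sa3_pdfs` are hypothetical and not part of the repository.

```python
import csv
from typing import Dict, Iterable, List

# Decile count columns as listed in the Datasets section (d1 to d10).
# Assumption: the CSV header spells them exactly like this.
DECILE_COLUMNS = [f"d{i}" for i in range(1, 11)]


def counts_to_pdf(counts: List[float]) -> List[float]:
    """Normalise population counts to proportions that sum to 1."""
    total = sum(counts)
    if total <= 0:
        raise ValueError("an area with no population has no pdf")
    return [c / total for c in counts]


def read_sa3_pdfs(lines: Iterable[str]) -> Dict[str, List[float]]:
    """Map each SA3_code to the pdf of its population over the IRSD deciles.

    `lines` can be an open file (opened with newline="") or any iterable of
    CSV lines, as accepted by csv.DictReader.
    """
    pdfs: Dict[str, List[float]] = {}
    for row in csv.DictReader(lines):
        counts = [float(row[col]) for col in DECILE_COLUMNS]
        pdfs[row["SA3_code"]] = counts_to_pdf(counts)
    return pdfs
```

For example, `with open("dataSa3.csv", newline="") as f: pdfs = read_sa3_pdfs(f)` should give one pdf per SA3.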

ABS: Australian Bureau of Statistics.

R files

The hi_li.R file contains the implementation of the Homogeneity Index (HI) function [uni.hom] and the Location Index (LI) function [uni.loc].

It also includes the following statistical utilities:

[uni.conCI]: computes the convolution of two vectors
[uni.corr]: computes the autocorrelation of a vector
[uni.div]: computes the Divergence Index (DI), a variance-like measure of dispersion for ordinal categorical variables. Please refer to Model.
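Convolution and autocorrelation are standard operations on sequences. As an illustration only, here are the textbook definitions of full discrete convolution and autocorrelation in plain Python; the function names are hypothetical, and this is not a port of `uni.conCI` or `uni.corr`, whose normalisation and output layout may differ.

```python
from typing import List, Sequence


def convolve(a: Sequence[float], b: Sequence[float]) -> List[float]:
    """Full discrete convolution: out[k] = sum over i of a[i] * b[k - i]."""
    out = [0.0] * (len(a) + len(b) - 1)
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            out[i + j] += x * y
    return out


def autocorrelation(p: Sequence[float]) -> List[float]:
    """Full autocorrelation of p, i.e. p convolved with its reversed copy.

    Returns 2n - 1 values; the result is symmetric and its centre (lag 0)
    equals sum(p_i ** 2).
    """
    return convolve(p, list(reversed(p)))
```

For a pdf concentrated in one category, the autocorrelation is a single spike at lag 0; spreading the mass over more categories spreads the autocorrelation over more lags.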

The create_Sa3.R script generates a new table with the first 15 columns of dataSa3.csv and 4 additional columns:

Hom: the value of the Homogeneity Index, HI ∈ [0, 1]
DI: the value of the Divergence Index, DI ∈ [0, 1]
LI: the value of the Location Index, LI ∈ {1, 2, ..., 10}
CL: the homogeneity class, CL ∈ {A, B, C, D} (see Table 1)

Table 1. SA3 IRSD HI classification criteria

CL	HI (%)	Decision support system
A	[68.53 - 100]	Acceptably Homogeneous
B	[57.62 - 68.53)	Marginal Heterogeneity
C	[46.62 - 57.62)	Judgement Required
D	[0 - 46.62)	Heterogeneous
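The Table 1 criteria amount to a threshold lookup. A minimal Python sketch, assuming the HI is expressed in percent; the `Hom` column produced by the R script lies in [0, 1], so it would need to be multiplied by 100 first. The name `classify_hi` is hypothetical.

```python
# Class breaks from Table 1, in HI percent (0-100).
# Each entry is (inclusive lower bound, class label, decision support text),
# ordered from the most to the least homogeneous class.
HI_CLASSES = [
    (68.53, "A", "Acceptably Homogeneous"),
    (57.62, "B", "Marginal Heterogeneity"),
    (46.62, "C", "Judgement Required"),
    (0.0, "D", "Heterogeneous"),
]


def classify_hi(hi_percent: float) -> str:
    """Return the Table 1 class (A-D) for an HI value given in percent."""
    if not 0.0 <= hi_percent <= 100.0:
        raise ValueError("HI must lie in [0, 100] percent")
    for lower, label, _ in HI_CLASSES:
        if hi_percent >= lower:
            return label
    # Not reached: the range check guarantees hi_percent >= 0.
    return HI_CLASSES[-1][1]
```

Each break belongs to the class above it, matching the closed lower bounds in Table 1 (e.g. an HI of exactly 68.53 is class A).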

Table 2. HI(s): the HI value of s equally populated deciles clustered in s consecutive bins

s	HI(s) (%)	pdf vector
1	100	[1, 0, 0, 0, 0, 0, 0, 0, 0, 0]
4	68.53	[1/4, 1/4, 1/4, 1/4, 0, 0, 0, 0, 0, 0]
5	57.62	[1/5, 1/5, 1/5, 1/5, 1/5, 0, 0, 0, 0, 0]
6	46.62	[1/6, 1/6, 1/6, 1/6, 1/6, 1/6, 0, 0, 0, 0]
10	0	[1/10, 1/10, 1/10, 1/10, 1/10, 1/10, 1/10, 1/10, 1/10, 1/10]

Table 1 shows the partition of the HI range into four classes. The breaks between classes are given by HI(s), the HI value of s equally populated deciles clustered in s consecutive bins. Here, the parameter s is the size of the smallest interval of consecutive categories that contains all the data. Consider, for example, the value 68.53 (i.e. s = 4, HI(4) = 68.53; Table 2): all distributions with an HI of at least this value (CL = A) are equivalent to a community whose socioeconomic groups are concentrated in at most four consecutive deciles.
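To make the role of s concrete, the sketch below builds the Table 2 pdf vectors and measures the length of the smallest run of consecutive categories that holds all the population. It is illustrative only and the function names are hypothetical: for the idealised Table 2 vectors this span equals s, but for a real SA3 the equivalence with s is made through the HI value, not through the span (a handful of residents in an outlying decile is enough to widen the span).

```python
from typing import List, Sequence


def clustered_pdf(s: int, start: int = 0, n: int = 10) -> List[float]:
    """pdf with s equally populated categories on consecutive bins.

    With start=0 and n=10 this reproduces the pdf vectors of Table 2.
    """
    if not 1 <= s <= n or not 0 <= start <= n - s:
        raise ValueError("the s bins must fit inside the n categories")
    return [1.0 / s if start <= k < start + s else 0.0 for k in range(n)]


def support_span(pdf: Sequence[float]) -> int:
    """Length of the smallest run of consecutive categories holding all mass."""
    occupied = [k for k, p in enumerate(pdf) if p > 0]
    if not occupied:
        raise ValueError("pdf has no mass")
    return occupied[-1] - occupied[0] + 1
```

For instance, `support_span(clustered_pdf(4))` is 4, while a pdf with mass only in deciles 2 and 4 has a span of 3 because the empty decile in between still counts.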

The parameter s is also known in the ecological literature as true diversity. Clearly, other criteria can be chosen for the identification of homogeneous distributions, and there is no definitive or "optimal" HI threshold. However, we believe that specifying s helps users to picture the homogeneity of a distribution and serves as a guide for interpreting dimensionless concentration indices. For more information about the classification criteria and the notion of true diversity, the interested reader can refer to the publication: A framework for the identification and classification of homogeneous socioeconomic areas in the analysis of health care variation, sections "Concentration Index and true diversity" and "Homogeneity Index and true diversity".
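One standard formalisation of true diversity in ecology is the Hill number of order q, which equals s for any s equally populated categories, whatever the order. The sketch below assumes that definition (q = 1 taken as the exp-Shannon limit, q = 2 as the inverse Simpson index); it does not compute the HI or CI, the order used in the publication may differ, and `hill_number` is a hypothetical helper name.

```python
import math
from typing import Sequence


def hill_number(pdf: Sequence[float], q: float = 2.0) -> float:
    """Hill number (true diversity) of order q for a probability vector.

    q = 1 is the limit exp(Shannon entropy); q = 2 gives the inverse Simpson
    index 1 / sum(p_i ** 2). Zero-probability categories are ignored.
    """
    probs = [p for p in pdf if p > 0]
    if q == 1:
        return math.exp(-sum(p * math.log(p) for p in probs))
    return sum(p ** q for p in probs) ** (1.0 / (1.0 - q))
```

On the s = 4 vector of Table 2, `hill_number([0.25] * 4 + [0.0] * 6)` returns 4.0, i.e. four "effective" deciles.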

Dependencies

To run the scripts, the following software requirements apply:

R version 3.3.2 or later
R package data.table (to read the dataset)

Usage

Run the hi_li.R script (e.g. with source("hi_li.R")) to load its functions into the Global Environment of your R session. Then place create_Sa3.R and dataSa3.csv in the working directory and run the create_Sa3.R script from the R console. The output, SA3db.csv, is a 330 x 20 table.

Feel free to use the hi_li.R library to classify your categorical dataset. Enjoy 😊!

Guidelines for contributing

I welcome contributions to the hi_li.R library. Please see the CONTRIBUTING file for detailed guidelines on how to contribute.

Author

Ludovico Pinzari

License

The homogeneity-location-index package is licensed under the MIT License. See the LICENSE file for more details.

Funding

This work was funded through a partnership agreement between the Capital Markets Cooperative Research Centre and the Australian Institute of Health and Welfare, which provided me with a Ph.D. scholarship.

Future work

A complete discussion of the mathematical model is included in my Ph.D. thesis, to be submitted in June 2019. I'll share a link to it soon, and I'll also push new documentation and files to this repo.

Notes

If you wish to reproduce the results illustrated in the publication A framework for the identification and classification of homogeneous socioeconomic areas in the analysis of health care variation, please use the following dataset: data

Contacts

For any enquiries about my work, please visit my website: contacts, or contact me through my LinkedIn profile: ludovico-pinzari

About

www.ludovicopinzari.net/portfolio.html
