litalbarkai/open-redatam
Open Redatam is open source software for extracting raw information from REDATAM databases. It was created to recover information from REDATAM databases for statistical analysis using standard tools such as SPSS, STATA, R, etc.
Please read our article for the full context of this project (Open Access):
Vargas Sepúlveda, Mauricio and Barkai, Lital. 2025. "The REDATAM format and its challenges for data access and information creation in public policy." Data & Policy 7 (January): e18. https://dx.doi.org/10.1017/dap.2025.4.
This software is a full C++ ground-up rewrite of the original Redatam Converter created by Pablo de Grande and written in C#. Rewriting the original C# code in C++ allows for better portability and the ability to use the program within R, Python, and other languages.
If you use R: We have an R package 📦 that reads REDATAM databases directly in R.
If you use Python: We have a Python package 📦 that reads REDATAM databases directly in Python.
If you only need the processed data: We provide tidy microdata 📊 in R and CSV format.
For a given census, such as the Chilean Census 2017, the following options are equivalent.

    redatam input-dir/dictionary.dicx output-dir

The REDATAM database will be exported to CSV files and an XML summary of the tables and variables.
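As a rough illustration, the command above could be scripted from Python. This is a minimal sketch, assuming the `redatam` executable is on the PATH and that the exported CSV files are written directly into the output directory with a `.csv` extension. The helper names `build_redatam_command` and `export_redatam` are hypothetical and not part of Open Redatam; only the two positional arguments shown above are used.

```python
import os
import shutil
import subprocess
from pathlib import Path


def build_redatam_command(dictionary, output_dir, executable="redatam"):
    """Return the argument list for: redatam <dictionary> <output-dir>."""
    return [executable, os.fspath(dictionary), os.fspath(output_dir)]


def export_redatam(dictionary, output_dir, executable="redatam"):
    """Run the converter and return the CSV files found in output_dir.

    Assumption: the CSV files land directly in output_dir with a .csv suffix.
    """
    if shutil.which(executable) is None:
        raise FileNotFoundError(f"{executable!r} was not found on PATH")
    out = Path(output_dir)
    out.mkdir(parents=True, exist_ok=True)
    # check=True raises CalledProcessError if the converter exits with an error.
    subprocess.run(build_redatam_command(dictionary, out, executable), check=True)
    return sorted(out.glob("*.csv"))
```

Calling `export_redatam("input-dir/dictionary.dicx", "output-dir")` would then run the same conversion as the command line and list the resulting CSV files, under the assumptions stated above.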
On Linux, download the DEB file and install it. This will install redatam and redatamgui in /usr/local/bin/ with the necessary dependencies and a desktop entry. The installer creates an entry in the Applications folder for Open Redatam GUI and you can also use the command line tool from the Terminal by calling redatam.
On Mac, download the DMG file. The image contains redatam and redatamgui, which you can copy to "Applications". The app is not verified because it does not make sense for us to pay 200 USD/year just to sign one image. You can install it anyway by going to System Settings in the Apple menu and then:
- Select Privacy & Security.
- Scroll down to the Security section.
- Click "Open Anyway" beneath the message "RedatamGUI was blocked for use because it is not from an identified developer."
On Windows, download the EXE file and install it. The installer creates an entry in the Start Menu for Open Redatam GUI and you can also use the command line tool from PowerShell by calling C:\Program Files (x86)\Open Redatam\redatam.exe.
The software requires C++11 or higher to compile.
On Linux, run the following commands:

    git clone https://github.com/pachadotdev/open-redatam.git
    cd open-redatam
    sudo apt-get update
    sudo apt-get install -y qtbase5-dev qtbase5-dev-tools qt5-qmake
    make

Then run ./redatam or ./redatamgui.
On Mac, run the following commands:

    git clone https://github.com/pachadotdev/open-redatam.git
    cd open-redatam
    brew install qt@5
    export PATH="/opt/homebrew/opt/qt@5/bin:$PATH"
    export LDFLAGS="-L/opt/homebrew/opt/qt@5/lib"
    export CPPFLAGS="-I/opt/homebrew/opt/qt@5/include"
    export PKG_CONFIG_PATH="/opt/homebrew/opt/qt@5/lib/pkgconfig"
    make

Then run ./redatam or ./redatamgui.
On Windows, you need Visual Studio 2019 with C++ development tools and Qt 5 for MSVC 2019 64-bit.
Then run the following commands:

    git clone https://github.com/pachadotdev/open-redatam.git
    cd redatamwindows
    cmake -G "Visual Studio 16 2019" .
    cmake --build . --config Release
    cmake --install . --config Release
    cd redatamguiwindows
    cmake . -G "Visual Studio 16 2019"
    cmake --build . --config Release
    "C:\Qt\5.15.2\msvc2019_64\bin\windeployqt.exe" --release .\Release\redatamgui.exe
    cd ..

A simple validation exercise is to compare counts and percentages by sex and age group against the IPUMS data to determine whether the data were converted correctly.
Some census files contain only a sample. For this exercise, we used the dictionaries, in both DIC and DICX formats, for the countries whose files contain the full universe.
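The comparison can be sketched in a few lines of Python. This is an illustrative sketch, not the script used to produce the tables: the function name `compare_counts` is hypothetical, and the column names simply mirror the ones in the tables of this section (`n_ipums`, `pct_ipums`, `n_rdtm`, `pct_rdtm`, `n_diff`, `pct_diff`). Percentages are each group's share of its own source total, and the differences are IPUMS minus REDATAM. The example counts are the male and female totals from one of the sex tables in this section.

```python
def compare_counts(ipums, redatam):
    """Compare group counts and percentage shares from two sources.

    Both arguments map a group label to a count. A group missing from
    `redatam` gets None in the REDATAM and difference columns.
    """
    total_ipums = sum(ipums.values())
    total_rdtm = sum(redatam.values())
    rows = []
    for group, n_ipums in ipums.items():
        pct_ipums = 100 * n_ipums / total_ipums
        n_rdtm = redatam.get(group)
        if n_rdtm is None:
            pct_rdtm = n_diff = pct_diff = None
        else:
            pct_rdtm = 100 * n_rdtm / total_rdtm
            n_diff = n_ipums - n_rdtm
            pct_diff = pct_ipums - pct_rdtm
        rows.append({
            "group": group,
            "n_ipums": n_ipums, "pct_ipums": pct_ipums,
            "n_rdtm": n_rdtm, "pct_rdtm": pct_rdtm,
            "n_diff": n_diff, "pct_diff": pct_diff,
        })
    return rows


# Counts taken from one of the sex tables in this section.
ipums_sex = {"Male": 7178200, "Female": 7304130}
rdtm_sex = {"Male": 7177683, "Female": 7305816}

for row in compare_counts(ipums_sex, rdtm_sex):
    print(row)
```

Small values of `n_diff` relative to the totals, and `pct_diff` values close to zero, suggest that the conversion preserved the distribution.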
DIC and DICX parsing leads to the same results.

| sex | n_ipums | pct_ipums | n_rdtm | pct_rdtm | n_diff | pct_diff |
|---|---|---|---|---|---|---|
| 1 [Male] | 5020766 | 49.9 | 5019447 | 49.9 | 1319 | 0.00839 |
| 2 [Female] | 5040041 | 50.1 | 5040409 | 50.1 | -368 | -0.00839 |

| age2 | n_ipums | pct_ipums | n_rdtm | pct_rdtm |
|---|---|---|---|---|
| 1 [0 to 4] | 1091540 | 10.8 | 1089948 | 10.8 |
| 2 [5 to 9] | 988984 | 9.83 | 992654 | 9.87 |
| 3 [10 to 14] | 1076853 | 10.7 | 1078164 | 10.7 |
| 4 [15 to 19] | 1107708 | 11.0 | 1106284 | 11.0 |
| 12 [20 to 24] | 981961 | 9.76 | 978606 | 9.73 |
| 13 [25 to 29] | 812924 | 8.08 | 817395 | 8.13 |
| 14 [30 to 34] | 754287 | 7.50 | 753831 | 7.49 |
| 15 [35 to 39] | 631461 | 6.28 | 631032 | 6.27 |
| 16 [40 to 44] | 543112 | 5.40 | 544701 | 5.41 |
| 17 [45 to 49] | 461436 | 4.59 | 461984 | 4.59 |
| 18 [50 to 54] | 404489 | 4.02 | 403220 | 4.01 |
| 19 [55 to 59] | 322182 | 3.20 | 324025 | 3.22 |
| 20 [60 to 64] | 280089 | 2.78 | 279867 | 2.78 |
| 21 [65 to 69] | 207331 | 2.06 | 204529 | 2.03 |
| 22 [70 to 74] | 152851 | 1.52 | 152423 | 1.52 |
| 23 [75 to 79] | 99799 | 0.992 | 99276 | 0.987 |
| 24 [80 to 84] | 81862 | 0.814 | 81095 | 0.806 |
| 25 [85+] | 61937 | 0.616 | 60822 | 0.605 |

DIC and DICX parsing leads to the same results.

| sex | n_ipums | pct_ipums | n_rdtm | pct_rdtm | n_diff | pct_diff |
|---|---|---|---|---|---|---|
| 1 [Male] | 8607570 | 49.0 | 8601989 | 48.9 | 5581 | 0.0457 |
| 2 [Female] | 8961420 | 51.0 | 8972014 | 51.1 | -10594 | -0.0457 |

| age2 | n_ipums | pct_ipums | n_rdtm | pct_rdtm | n_diff | pct_diff |
|---|---|---|---|---|---|---|
| 1 [0 to 4] | 1157530 | 6.59 | 1166146 | 6.64 | -8616 | -0.0471 |
| 2 [5 to 9] | 1215960 | 6.92 | 1210189 | 6.89 | 5771 | 0.0348 |
| 3 [10 to 14] | 1147610 | 6.53 | 1147415 | 6.53 | 195 | 0.00297 |
| 4 [15 to 19] | 1236800 | 7.04 | 1244697 | 7.08 | -7897 | -0.0429 |
| 12 [20 to 24] | 1384320 | 7.88 | 1387822 | 7.90 | -3502 | -0.0177 |
| 13 [25 to 29] | 1475500 | 8.40 | 1474150 | 8.39 | 1350 | 0.0101 |
| 14 [30 to 34] | 1294480 | 7.37 | 1293637 | 7.36 | 843 | 0.00690 |
| 15 [35 to 39] | 1202670 | 6.85 | 1207777 | 6.87 | -5107 | -0.0271 |
| 16 [40 to 44] | 1195460 | 6.80 | 1198503 | 6.82 | -3043 | -0.0154 |
| 17 [45 to 49] | 1162380 | 6.62 | 1160763 | 6.61 | 1617 | 0.0111 |
| 18 [50 to 54] | 1186210 | 6.75 | 1184954 | 6.74 | 1256 | 0.00907 |
| 19 [55 to 59] | 1047910 | 5.96 | 1047779 | 5.96 | 131 | 0.00245 |
| 20 [60 to 64] | 849470 | 4.84 | 846915 | 4.82 | 2555 | 0.0159 |
| 21 [65 to 69] | 656700 | 3.74 | 653002 | 3.72 | 3698 | 0.0221 |
| 22 [70 to 74] | 518120 | 2.95 | 515909 | 2.94 | 2211 | 0.0134 |
| 23 [75 to 79] | 365350 | 2.08 | 363589 | 2.07 | 1761 | 0.0106 |
| 24 [80 to 84] | 240640 | 1.37 | 239446 | 1.36 | 1194 | 0.00718 |
| 25 [85+] | 231880 | 1.32 | 231310 | 1.32 | 570 | 0.00362 |

DIC and DICX parsing leads to the same results.

| sex | n_ipums | pct_ipums | n_rdtm | pct_rdtm | n_diff | pct_diff |
|---|---|---|---|---|---|---|
| 1 [Male] | 4270670 | 49.8 | 4265215 | 49.8 | 5455 | -0.0149 |
| 2 [Female] | 4305390 | 50.2 | 4297326 | 50.2 | 8064 | 0.0149 |

| age2 | n_ipums | pct_ipums | n_rdtm | pct_rdtm | n_diff | pct_diff |
|---|---|---|---|---|---|---|
| 1 [0 to 4] | 977620 | 11.4 | 973644 | 11.4 | 3976 | 0.0284 |
| 2 [5 to 9] | 974680 | 11.4 | 971881 | 11.4 | 2799 | 0.0147 |
| 3 [10 to 14] | 960410 | 11.2 | 959338 | 11.2 | 1072 | -0.00516 |
| 4 [15 to 19] | 839980 | 9.79 | 838239 | 9.79 | 1741 | 0.00487 |
| 12 [20 to 24] | 784940 | 9.15 | 785802 | 9.18 | -862 | -0.0245 |
| 13 [25 to 29] | 687440 | 8.02 | 687785 | 8.03 | -345 | -0.0167 |
| 14 [30 to 34] | 649820 | 7.58 | 646112 | 7.55 | 3708 | 0.0313 |
| 15 [35 to 39] | 591470 | 6.90 | 590750 | 6.90 | 720 | -0.00248 |
| 16 [40 to 44] | 475180 | 5.54 | 476647 | 5.57 | -1467 | -0.0259 |
| 17 [45 to 49] | 379460 | 4.42 | 380028 | 4.44 | -568 | -0.0136 |
| 18 [50 to 54] | 331210 | 3.86 | 330713 | 3.86 | 497 | -0.000293 |
| 19 [55 to 59] | 233910 | 2.73 | 233976 | 2.73 | -66 | -0.00508 |
| 20 [60 to 64] | 209800 | 2.45 | 207933 | 2.43 | 1867 | 0.0179 |
| 21 [65 to 69] | 157940 | 1.84 | 158365 | 1.85 | -425 | -0.00787 |
| 22 [70 to 74] | 135330 | 1.58 | 136068 | 1.59 | -738 | -0.0111 |
| 23 [75 to 79] | 77990 | 0.909 | 77871 | 0.909 | 119 | -0.0000460 |
| 24 [80 to 84] | 54960 | 0.641 | 54402 | 0.635 | 558 | 0.00550 |
| 25 [85+] | 53660 | 0.626 | 52987 | 0.619 | 673 | 0.00687 |
| 98 [Unknown] | 260 | 0.00303 | NA | NA | NA | NA |

DIC and DICX parsing leads to the same results.

| sex | n_ipums | pct_ipums | n_rdtm | pct_rdtm | n_diff | pct_diff |
|---|---|---|---|---|---|---|
| 1 [Male] | 7178200 | 49.6 | 7177683 | 49.6 | 517 | 0.00757 |
| 2 [Female] | 7304130 | 50.4 | 7305816 | 50.4 | -1686 | -0.00757 |

| age2 | n_ipums | pct_ipums | n_rdtm | pct_rdtm |
|---|---|---|---|---|
| 1 [0 to 4] | 1459980 | 10.1 | 1462277 | 10.1 |
| 2 [5 to 9] | 1530380 | 10.6 | 1526806 | 10.5 |
| 3 [10 to 14] | 1542170 | 10.6 | 1539342 | 10.6 |
| 4 [15 to 19] | 1421490 | 9.82 | 1419537 | 9.80 |
| 12 [20 to 24] | 1288560 | 8.90 | 1292126 | 8.92 |
| 13 [25 to 29] | 1203030 | 8.31 | 1200564 | 8.29 |
| 14 [30 to 34] | 1066870 | 7.37 | 1067289 | 7.37 |
| 15 [35 to 39] | 941870 | 6.50 | 938726 | 6.48 |
| 16 [40 to 44] | 819470 | 5.66 | 819002 | 5.65 |
| 17 [45 to 49] | 753060 | 5.20 | 750141 | 5.18 |
| 18 [50 to 54] | 608370 | 4.20 | 610132 | 4.21 |
| 19 [55 to 59] | 513910 | 3.55 | 515893 | 3.56 |
| 20 [60 to 64] | 397050 | 2.74 | 400759 | 2.77 |
| 21 [65 to 69] | 322970 | 2.23 | 323817 | 2.24 |
| 22 [70 to 74] | 238280 | 1.65 | 240091 | 1.66 |
| 23 [75 to 79] | 164550 | 1.14 | 165218 | 1.14 |
| 24 [80 to 84] | 115750 | 0.799 | 115552 | 0.798 |
| 25 [85+] | 94570 | 0.653 | 96227 | 0.664 |

DIC and DICX parsing leads to the same results.

| sex | n_ipums | pct_ipums | n_rdtm | pct_rdtm | n_diff | pct_diff |
|---|---|---|---|---|---|---|
| 1 [Male] | 2718590 | 47.3 | 2719371 | 47.3 | -781 | -0.00970 |
| 2 [Female] | 3025050 | 52.7 | 3024742 | 52.7 | 308 | 0.00970 |

| age2 | n_ipums | pct_ipums | n_rdtm | pct_rdtm | n_diff | pct_diff |
|---|---|---|---|---|---|---|
| 1 [0 to 4] | 556940 | 9.70 | 555893 | 9.68 | 1047 | 0.0190 |
| 2 [5 to 9] | 683410 | 11.9 | 684727 | 11.9 | -1317 | -0.0219 |
| 3 [10 to 14] | 706600 | 12.3 | 706347 | 12.3 | 253 | 0.00542 |
| 4 [15 to 19] | 599880 | 10.4 | 600565 | 10.5 | -685 | -0.0111 |
| 12 [20 to 24] | 485100 | 8.45 | 486542 | 8.47 | -1442 | -0.0244 |
| 13 [25 to 29] | 459660 | 8.00 | 457890 | 7.97 | 1770 | 0.0315 |
| 14 [30 to 34] | 400100 | 6.97 | 402249 | 7.00 | -2149 | -0.0368 |
| 15 [35 to 39] | 355850 | 6.20 | 353147 | 6.15 | 2703 | 0.0476 |
| 16 [40 to 44] | 303920 | 5.29 | 303631 | 5.29 | 289 | 0.00547 |
| 17 [45 to 49] | 251620 | 4.38 | 252122 | 4.39 | -502 | -0.00838 |
| 18 [50 to 54] | 217660 | 3.79 | 215734 | 3.76 | 1926 | 0.0338 |
| 19 [55 to 59] | 182920 | 3.18 | 183075 | 3.19 | -155 | -0.00244 |
| 20 [60 to 64] | 150930 | 2.63 | 151864 | 2.64 | -934 | -0.0160 |
| 21 [65 to 69] | 125590 | 2.19 | 125157 | 2.18 | 433 | 0.00772 |
| 22 [70 to 74] | 97110 | 1.69 | 97457 | 1.70 | -347 | -0.00590 |
| 23 [75 to 79] | 75400 | 1.31 | 75984 | 1.32 | -584 | -0.0101 |
| 24 [80 to 84] | 46770 | 0.814 | 46870 | 0.816 | -100 | -0.00167 |
| 25 [85+] | 44180 | 0.769 | 44859 | 0.781 | -679 | -0.0118 |

DIC and DICX parsing leads to the same results.

| sex | n_ipums | pct_ipums | n_rdtm | pct_rdtm | n_diff | pct_diff |
|---|---|---|---|---|---|---|
| 1 [Male] | 13659450 | 49.7 | 13622640 | 49.7 | 36810 | 0.0494 |
| 2 [Female] | 13799500 | 50.3 | 13789517 | 50.3 | 9983 | -0.0494 |

| age2 | n_ipums | pct_ipums | n_rdtm | pct_rdtm | n_diff | pct_diff |
|---|---|---|---|---|---|---|
| 1 [0 to 4] | 2730740 | 9.94 | 2724620 | 9.94 | 6120 | 0.00535 |
| 2 [5 to 9] | 2689060 | 9.79 | 2683928 | 9.79 | 5132 | 0.00200 |
| 3 [10 to 14] | 2947320 | 10.7 | 2948985 | 10.8 | -1665 | -0.0244 |
| 4 [15 to 19] | 2732690 | 9.95 | 2730785 | 9.96 | 1905 | -0.0100 |
| 12 [20 to 24] | 2542550 | 9.26 | 2531554 | 9.24 | 10996 | 0.0243 |
| 13 [25 to 29] | 2304810 | 8.39 | 2291865 | 8.36 | 12945 | 0.0329 |
| 14 [30 to 34] | 2073730 | 7.55 | 2074691 | 7.57 | -961 | -0.0164 |
| 15 [35 to 39] | 1880710 | 6.85 | 1871852 | 6.83 | 8858 | 0.0206 |
| 16 [40 to 44] | 1653130 | 6.02 | 1642059 | 5.99 | 11071 | 0.0301 |
| 17 [45 to 49] | 1374440 | 5.01 | 1371385 | 5.00 | 3055 | 0.00260 |
| 18 [50 to 54] | 1152430 | 4.20 | 1152647 | 4.20 | -217 | -0.00796 |
| 19 [55 to 59] | 885160 | 3.22 | 892143 | 3.25 | -6983 | -0.0310 |
| 20 [60 to 64] | 731310 | 2.66 | 730956 | 2.67 | 354 | -0.00325 |
| 21 [65 to 69] | 579310 | 2.11 | 579302 | 2.11 | 8 | -0.00357 |
| 22 [70 to 74] | 452130 | 1.65 | 452998 | 1.65 | -868 | -0.00598 |
| 23 [75 to 79] | 342750 | 1.25 | 343999 | 1.25 | -1249 | -0.00669 |
| 24 [80 to 84] | 200710 | 0.731 | 203636 | 0.743 | -2926 | -0.0119 |
| 25 [85+] | 185970 | 0.677 | 184752 | 0.674 | 1218 | 0.00329 |

DIC and DICX parsing leads to the same results.

| sex | n_ipums | pct_ipums | n_rdtm | pct_rdtm | n_diff | pct_diff |
|---|---|---|---|---|---|---|
| 1 [Male] | 1577700 | 48.0 | 1577416 | 48.0 | 284 | 0.0324 |
| 2 [Female] | 1706550 | 52.0 | 1708461 | 52.0 | -1911 | -0.0324 |

| age2 | n_ipums | pct_ipums | n_rdtm | pct_rdtm | n_diff | pct_diff |
|---|---|---|---|---|---|---|
| 1 [0 to 4] | 219860 | 6.69 | 220345 | 6.71 | -485 | -0.0114 |
| 2 [5 to 9] | 236080 | 7.19 | 238068 | 7.25 | -1988 | -0.0569 |
| 3 [10 to 14] | 255570 | 7.78 | 256552 | 7.81 | -982 | -0.0260 |
| 4 [15 to 19] | 263090 | 8.01 | 261691 | 7.96 | 1399 | 0.0465 |
| 12 [20 to 24] | 240700 | 7.33 | 241006 | 7.33 | -306 | -0.00568 |
| 13 [25 to 29] | 229820 | 7.00 | 228385 | 6.95 | 1435 | 0.0471 |
| 14 [30 to 34] | 232930 | 7.09 | 233365 | 7.10 | -435 | -0.00973 |
| 15 [35 to 39] | 221690 | 6.75 | 222521 | 6.77 | -831 | -0.0219 |
| 16 [40 to 44] | 203740 | 6.20 | 203098 | 6.18 | 642 | 0.0226 |
| 17 [45 to 49] | 198160 | 6.03 | 198773 | 6.05 | -613 | -0.0157 |
| 18 [50 to 54] | 195040 | 5.94 | 194565 | 5.92 | 475 | 0.0174 |
| 19 [55 to 59] | 172560 | 5.25 | 173007 | 5.27 | -447 | -0.0110 |
| 20 [60 to 64] | 151050 | 4.60 | 150775 | 4.59 | 275 | 0.0106 |
| 21 [65 to 69] | 132090 | 4.02 | 131563 | 4.00 | 527 | 0.0180 |
| 22 [70 to 74] | 111500 | 3.39 | 112395 | 3.42 | -895 | -0.0256 |
| 23 [75 to 79] | 94810 | 2.89 | 93659 | 2.85 | 1151 | 0.0365 |
| 24 [80 to 84] | 69880 | 2.13 | 70505 | 2.15 | -625 | -0.0180 |
| 25 [85+] | 55680 | 1.70 | 55604 | 1.69 | 76 | 0.00315 |

ByteArrayReader:
- Provides basic file reading and byte manipulation.
- Used by: BitArrayReader, FuzzyEntityParser, Variable.
BitArrayReader:
- Splits binary data into variable-sized chunks.
- Used by: Variable.
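
To make "variable-sized chunks" concrete: values that need fewer than eight bits can be packed back to back, so a reader has to cut the byte stream at bit boundaries rather than byte boundaries. The following is a minimal sketch of that idea only. The function name `unpack_bits` is hypothetical, and the most-significant-bit-first order is an assumption made for illustration; it is not taken from the REDATAM format or from the C++ BitArrayReader.

```python
def unpack_bits(data: bytes, width: int, count: int) -> list:
    """Split `data` into `count` unsigned integers of `width` bits each.

    Assumption: values are packed back to back, most significant bit first.
    """
    if width <= 0:
        raise ValueError("width must be positive")
    total_bits = 8 * len(data)
    if width * count > total_bits:
        raise ValueError("not enough data for the requested values")
    as_int = int.from_bytes(data, "big")  # whole buffer as one big integer
    mask = (1 << width) - 1
    values = []
    for i in range(count):
        # Shift so that the i-th chunk ends at the least significant bit.
        shift = total_bits - width * (i + 1)
        values.append((as_int >> shift) & mask)
    return values


# One byte, 0b10110100, read as two 3-bit values: 101 and 101.
print(unpack_bits(bytes([0b10110100]), width=3, count=2))
```

With a width of 4, the two bytes `0xAB 0xCD` would split into four values, one per hexadecimal digit.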
FuzzyEntityParser:
- Reads and parses .dic files.
- Depends on: ByteArrayReader.
- Used by: RedatamDatabase.
XMLParser:
- Reads and parses .dicx files.
- Depends on: pugixml (src/vendor).
- Used by: RedatamDatabase.
Variable:
- Represents a data structure within the system.
- Depends on: ByteArrayReader, BitArrayReader.
- Used by: Entity.
Entity:
- A higher-level representation grouping Variable and other metadata.
- Depends on: Variable.
- Used by: RedatamDatabase.
CSVExporter:
- Converts entities to CSV-compatible structures.
- Depends on: Entity, ParentIDCalculator and utils.
- Used by: RedatamDatabase.
XMLExporter:
- Converts entities and their variables into an XML-compatible structure.
- Depends on: Entity, utils, and pugixml (src/vendor).
- Used by: RedatamDatabase.
ParentIDCalculator:
- Calculates parent IDs for a child Entity based on row data.
- Depends on: Entity.
- Used by: CSVExporter.
RListExporter (R package):
- Converts entities to R-compatible structures.
- Depends on: Entity.
- Used by: RedatamDatabase.
PyDictExporter (Python package):
- Converts entities to Python-compatible structures.
- Depends on: Entity.
- Used by: RedatamDatabase.
RedatamDatabase:
- The central orchestrator that manages entities and interacts with parsers and exporters.
- Depends on: Entity, FuzzyEntityParser, XMLParser, RListExporter, PyDictExporter.
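
The relationships listed above form a directed graph, and writing them down as data is one way to check that the description is consistent. The sketch below is an illustration, not project code: it encodes the "Depends on" and "Used by" entries from the list as a Python dictionary and uses the standard library `graphlib.TopologicalSorter` (Python 3.9 or later) to produce an order in which every component appears after the components it relies on. External pieces (pugixml, utils) are left out, and the variable names are hypothetical.

```python
from graphlib import TopologicalSorter

# Component -> components it relies on, transcribed from the list above.
# Edges that appear only as a "Used by" entry are included as well.
dependencies = {
    "ByteArrayReader": set(),
    "BitArrayReader": {"ByteArrayReader"},
    "FuzzyEntityParser": {"ByteArrayReader"},
    "XMLParser": set(),
    "Variable": {"ByteArrayReader", "BitArrayReader"},
    "Entity": {"Variable"},
    "ParentIDCalculator": {"Entity"},
    "CSVExporter": {"Entity", "ParentIDCalculator"},
    "XMLExporter": {"Entity"},
    "RListExporter": {"Entity"},
    "PyDictExporter": {"Entity"},
    "RedatamDatabase": {
        "Entity", "FuzzyEntityParser", "XMLParser",
        "CSVExporter", "XMLExporter", "RListExporter", "PyDictExporter",
    },
}

# static_order() raises graphlib.CycleError if the description is circular.
build_order = list(TopologicalSorter(dependencies).static_order())

for name in build_order:
    print(name)
```

Because every other component is reachable from RedatamDatabase, it comes last in any valid order, which matches its role as the central orchestrator.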
If you find this software useful, please consider donating. You can donate via BuyMeACoffee.
De Grande, Pablo. 2016. “El formato Redatam.” Estudios demográficos y urbanos 31 (3): 811–32.

Ruggles, Steven, Lara Cleveland, Rodrigo Lovaton, Sula Sarkar, Matthew Sobek, Derek Burk, Dan Ehrlich, Quinn Heimann, and Jane Lee. 2024. “Integrated Public Use Microdata Series (IPUMS).” 2024. https://international.ipums.org/international/.
Open Redatam was created and is supported by Lital Barkai (barkailital@gmail.com).
The tests, installation instructions, and R and Python packages were created by Mauricio "Pacha" Vargas Sepulveda (m.sepulveda@mail.utoronto.ca).
The original converter was created by Pablo De Grande. See here for more information.
This project uses pugixml, created by Arseny Kapoulkine, to structure a part of the output data.
The author wishes to acknowledge the statistical offices that provided the underlying data used for the validation: National Institute of Statistics, Bolivia; National Institute of Statistics, Chile; National Institute of Statistics and Censuses, Ecuador; National Institute of Statistics, Uruguay.