litalbarkai/open-redatam

About

Open Redatam is open source software for extracting raw data from REDATAM databases. It was created to recover the information stored in REDATAM databases for statistical analysis with standard tools such as SPSS, Stata, and R.

Please read our article for the full context of this project (Open Access):

Vargas Sepúlveda, Mauricio and Barkai, Lital. 2025. "The REDATAM format and its challenges for data access and information creation in public policy." Data & Policy 7 (January): e18. https://dx.doi.org/10.1017/dap.2025.4.

This software is a full C++ ground-up rewrite of the original Redatam Converter created by Pablo de Grande and written in C#. Rewriting the original C# code in C++ allows for better portability and the ability to use the program within R, Python, and other languages.

For R and Python users (otherwise skip this section)

If you use R: We have an R package 📦 that reads REDATAM databases directly in R.

If you use Python: We have a Python package 📦 that reads REDATAM databases directly in Python.

If you only need the processed data: We provide tidy microdata 📊 in R and CSV format.

Usage

For a given census, such as the Chilean Census 2017, the following options are equivalent.

Desktop app

Open Redatam GUI

Command line

redatam input-dir/dictionary.dicx output-dir

The REDATAM database will be exported to CSV files and an XML summary of the tables and variables.
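The command above can also be driven from a script. The sketch below is a minimal Python wrapper, assuming the redatam executable is on the PATH and that the CSV files are written somewhere under the output directory. The function names are hypothetical and are not part of Open Redatam.

```python
import subprocess
from pathlib import Path


def build_command(dictionary: Path, output_dir: Path) -> list:
    """Build the same call as the command line example: redatam <dictionary> <output-dir>."""
    return ["redatam", str(dictionary), str(output_dir)]


def exported_csv_files(output_dir: Path) -> list:
    """List CSV files under output_dir (assumption: the exports use a .csv suffix)."""
    return sorted(Path(output_dir).rglob("*.csv"))


def convert(dictionary, output_dir) -> list:
    """Run the converter and return the CSV files found afterwards."""
    output_dir = Path(output_dir)
    # Creating the folder up front is a precaution, not a documented requirement.
    output_dir.mkdir(parents=True, exist_ok=True)
    subprocess.run(build_command(Path(dictionary), output_dir), check=True)
    return exported_csv_files(output_dir)


# Example call (requires redatam to be installed):
# files = convert("input-dir/dictionary.dicx", "output-dir")
```

With check=True, a non-zero exit status from redatam raises an exception instead of silently returning an empty list.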

Installation

From binaries

Ubuntu

Download the DEB file and install it. This will install redatam and redatamgui in /usr/local/bin/ with the necessary dependencies and a desktop entry. The installer creates an entry in the Applications folder for Open Redatam GUI, and you can also use the command line tool from the Terminal by calling redatam.

Mac

Download the DMG file. The image contains redatam and redatamgui, which you can copy to "Applications". The app is not verified because it does not make sense for us to pay 200 USD/year just to sign one image. You can still install it by going to System Settings in the Apple menu and then:

  1. Select Privacy & Security.
  2. Scroll down to the Security section.
  3. Click "Open Anyway" beneath the message "RedatamGUI was blocked for use because it is not from an identified developer."

Windows

Download the EXE file and install it. The installer creates an entry in the Start Menu for Open Redatam GUI, and you can also use the command line tool from PowerShell by calling C:\Program Files (x86)\Open Redatam\redatam.exe.

From source

The software requires C++11 or higher to compile.

On Linux, run the following commands:

git clone https://github.com/pachadotdev/open-redatam.git
sudo apt-get update
sudo apt-get install -y qtbase5-dev qtbase5-dev-tools qt5-qmake
make

Then run ./redatam or ./redatamgui.

On Mac, run the following commands:

git clone https://github.com/pachadotdev/open-redatam.git
brew install qt@5
export PATH="/opt/homebrew/opt/qt@5/bin:$PATH"
export LDFLAGS="-L/opt/homebrew/opt/qt@5/lib"
export CPPFLAGS="-I/opt/homebrew/opt/qt@5/include"
export PKG_CONFIG_PATH="/opt/homebrew/opt/qt@5/lib/pkgconfig"
make

Then run ./redatam or ./redatamgui.

On Windows, you need Visual Studio 2019 with C++ development tools and Qt 5 for MSVC 2019 64-bit.

Then run the following commands:

git clone https://github.com/pachadotdev/open-redatam.git
cd redatamwindows
cmake -G "Visual Studio 16 2019" .
cmake --build . --config Release
cmake --install . --config Release
cd redatamguiwindows
cmake . -G "Visual Studio 16 2019"
cmake --build . --config Release
"C:\Qt\5.15.2\msvc2019_64\bin\windeployqt.exe" --release .\Release\redatamgui.exe
cd ..

Validation against IPUMS data

A simple validation exercise is to compare counts and percentages by sex and age group against the IPUMS data to determine whether the data was converted correctly.

Some census files feature a sample. For this exercise, we used the dictionaries for the countries containing the universe in DIC and DICX formats.
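The comparison columns in the tables below can be reproduced from the counts alone. The following Python sketch assumes, based on the reported values rather than on any statement by the authors, that n_diff is n_ipums minus n_rdtm and that pct_diff is the difference between the two percentage shares, in percentage points. The function name is hypothetical.

```python
def compare_counts(n_ipums: dict, n_rdtm: dict) -> dict:
    """Return n_diff and pct_diff (IPUMS minus Open Redatam) for each category."""
    total_ipums = sum(n_ipums.values())
    total_rdtm = sum(n_rdtm.values())
    result = {}
    for category, count_ipums in n_ipums.items():
        count_rdtm = n_rdtm.get(category)
        if count_rdtm is None:
            # Category present only in IPUMS (shown as NA in the tables).
            result[category] = {"n_diff": None, "pct_diff": None}
            continue
        pct_ipums = 100 * count_ipums / total_ipums
        pct_rdtm = 100 * count_rdtm / total_rdtm
        result[category] = {
            "n_diff": count_ipums - count_rdtm,
            "pct_diff": pct_ipums - pct_rdtm,
        }
    return result


# Sex counts taken from the Bolivia 2012 table in this section.
bolivia = compare_counts(
    {"Male": 5020766, "Female": 5040041},
    {"Male": 5019447, "Female": 5040409},
)
print(bolivia["Male"]["n_diff"])    # 1319
print(bolivia["Female"]["n_diff"])  # -368
```

Because both sets of percentages sum to 100, the pct_diff values within one table sum to zero, which explains the mirrored signs in the two-row sex tables.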

Bolivia 2012

DIC and DICX parsing leads to the same results.

  sex         n_ipums pct_ipums   n_rdtm pct_rdtm n_diff pct_diff
  <int+lbl>     <dbl>     <dbl>    <int>    <dbl>  <dbl>    <dbl>
1 1 [Male]   5020766.      49.9  5019447     49.9  1319.  0.00839
2 2 [Female] 5040041.      50.1  5040409     50.1  -368. -0.00839

   age2           n_ipums pct_ipums   n_rdtm pct_rdtm
   <int+lbl>        <dbl>     <dbl>    <int>    <dbl>
 1 1 [0 to 4]    1091540.      10.8  1089948     10.8
 2 2 [5 to 9]     988984.      9.83   992654     9.87
 3 3 [10 to 14]  1076853.      10.7  1078164     10.7
 4 4 [15 to 19]  1107708.      11.0  1106284     11.0
 5 12 [20 to 24]  981961.      9.76   978606     9.73
 6 13 [25 to 29]  812924.      8.08   817395     8.13
 7 14 [30 to 34]  754287.      7.50   753831     7.49
 8 15 [35 to 39]  631461.      6.28   631032     6.27
 9 16 [40 to 44]  543112.      5.40   544701     5.41
10 17 [45 to 49]  461436.      4.59   461984     4.59
11 18 [50 to 54]  404489.      4.02   403220     4.01
12 19 [55 to 59]  322182.      3.20   324025     3.22
13 20 [60 to 64]  280089.      2.78   279867     2.78
14 21 [65 to 69]  207331.      2.06   204529     2.03
15 22 [70 to 74]  152851.      1.52   152423     1.52
16 23 [75 to 79]   99799.     0.992    99276    0.987
17 24 [80 to 84]   81862.     0.814    81095    0.806
18 25 [85+]        61937.     0.616    60822    0.605

Chile 2017

DIC and DICX parsing leads to the same results.

  sex         n_ipums pct_ipums   n_rdtm pct_rdtm n_diff pct_diff
  <int+lbl>     <dbl>     <dbl>    <int>    <dbl>  <dbl>    <dbl>
1 1 [Male]    8607570      49.0  8601989     48.9   5581   0.0457
2 2 [Female]  8961420      51.0  8972014     51.1 -10594  -0.0457

   age2           n_ipums pct_ipums   n_rdtm pct_rdtm n_diff   pct_diff
   <int+lbl>        <dbl>     <dbl>    <int>    <dbl>  <dbl>      <dbl>
 1 1 [0 to 4]     1157530      6.59  1166146     6.64  -8616    -0.0471
 2 2 [5 to 9]     1215960      6.92  1210189     6.89   5771     0.0348
 3 3 [10 to 14]   1147610      6.53  1147415     6.53    195    0.00297
 4 4 [15 to 19]   1236800      7.04  1244697     7.08  -7897    -0.0429
 5 12 [20 to 24]  1384320      7.88  1387822     7.90  -3502    -0.0177
 6 13 [25 to 29]  1475500      8.40  1474150     8.39   1350     0.0101
 7 14 [30 to 34]  1294480      7.37  1293637     7.36    843    0.00690
 8 15 [35 to 39]  1202670      6.85  1207777     6.87  -5107    -0.0271
 9 16 [40 to 44]  1195460      6.80  1198503     6.82  -3043    -0.0154
10 17 [45 to 49]  1162380      6.62  1160763     6.61   1617     0.0111
11 18 [50 to 54]  1186210      6.75  1184954     6.74   1256    0.00907
12 19 [55 to 59]  1047910      5.96  1047779     5.96    131    0.00245
13 20 [60 to 64]   849470      4.84   846915     4.82   2555     0.0159
14 21 [65 to 69]   656700      3.74   653002     3.72   3698     0.0221
15 22 [70 to 74]   518120      2.95   515909     2.94   2211     0.0134
16 23 [75 to 79]   365350      2.08   363589     2.07   1761     0.0106
17 24 [80 to 84]   240640      1.37   239446     1.36   1194    0.00718
18 25 [85+]        231880      1.32   231310     1.32    570    0.00362

Dominican Republic 2002

DIC and DICX parsing leads to the same results.

  sex         n_ipums pct_ipums   n_rdtm pct_rdtm n_diff pct_diff
  <int+lbl>     <dbl>     <dbl>    <int>    <dbl>  <dbl>    <dbl>
1 1 [Male]    4270670      49.8  4265215     49.8   5455  -0.0149
2 2 [Female]  4305390      50.2  4297326     50.2   8064   0.0149

   age2           n_ipums pct_ipums   n_rdtm pct_rdtm n_diff   pct_diff
   <dbl+lbl>        <dbl>     <dbl>    <int>    <dbl>  <dbl>      <dbl>
 1 1 [0 to 4]      977620      11.4   973644     11.4   3976     0.0284
 2 2 [5 to 9]      974680      11.4   971881     11.4   2799     0.0147
 3 3 [10 to 14]    960410      11.2   959338     11.2   1072   -0.00516
 4 4 [15 to 19]    839980      9.79   838239     9.79   1741    0.00487
 5 12 [20 to 24]   784940      9.15   785802     9.18   -862    -0.0245
 6 13 [25 to 29]   687440      8.02   687785     8.03   -345    -0.0167
 7 14 [30 to 34]   649820      7.58   646112     7.55   3708     0.0313
 8 15 [35 to 39]   591470      6.90   590750     6.90    720   -0.00248
 9 16 [40 to 44]   475180      5.54   476647     5.57  -1467    -0.0259
10 17 [45 to 49]   379460      4.42   380028     4.44   -568    -0.0136
11 18 [50 to 54]   331210      3.86   330713     3.86    497  -0.000293
12 19 [55 to 59]   233910      2.73   233976     2.73    -66   -0.00508
13 20 [60 to 64]   209800      2.45   207933     2.43   1867     0.0179
14 21 [65 to 69]   157940      1.84   158365     1.85   -425   -0.00787
15 22 [70 to 74]   135330      1.58   136068     1.59   -738    -0.0111
16 23 [75 to 79]    77990     0.909    77871    0.909    119 -0.0000460
17 24 [80 to 84]    54960     0.641    54402    0.635    558    0.00550
18 25 [85+]         53660     0.626    52987    0.619    673    0.00687
19 98 [Unknown]       260   0.00303       NA       NA     NA         NA

Ecuador 2010

DIC and DICX parsing leads to the same results.

  sex         n_ipums pct_ipums   n_rdtm pct_rdtm n_diff pct_diff
  <int+lbl>     <dbl>     <dbl>    <int>    <dbl>  <dbl>    <dbl>
1 1 [Male]    7178200      49.6  7177683     49.6    517  0.00757
2 2 [Female]  7304130      50.4  7305816     50.4  -1686 -0.00757

   age2           n_ipums pct_ipums   n_rdtm pct_rdtm
   <int+lbl>        <dbl>     <dbl>    <int>    <dbl>
 1 1 [0 to 4]     1459980      10.1  1462277     10.1
 2 2 [5 to 9]     1530380      10.6  1526806     10.5
 3 3 [10 to 14]   1542170      10.6  1539342     10.6
 4 4 [15 to 19]   1421490      9.82  1419537     9.80
 5 12 [20 to 24]  1288560      8.90  1292126     8.92
 6 13 [25 to 29]  1203030      8.31  1200564     8.29
 7 14 [30 to 34]  1066870      7.37  1067289     7.37
 8 15 [35 to 39]   941870      6.50   938726     6.48
 9 16 [40 to 44]   819470      5.66   819002     5.65
10 17 [45 to 49]   753060      5.20   750141     5.18
11 18 [50 to 54]   608370      4.20   610132     4.21
12 19 [55 to 59]   513910      3.55   515893     3.56
13 20 [60 to 64]   397050      2.74   400759     2.77
14 21 [65 to 69]   322970      2.23   323817     2.24
15 22 [70 to 74]   238280      1.65   240091     1.66
16 23 [75 to 79]   164550      1.14   165218     1.14
17 24 [80 to 84]   115750     0.799   115552    0.798
18 25 [85+]         94570     0.653    96227    0.664

El Salvador 2007

DIC and DICX parsing leads to the same results.

  sex         n_ipums pct_ipums   n_rdtm pct_rdtm n_diff pct_diff
  <int+lbl>     <dbl>     <dbl>    <int>    <dbl>  <dbl>    <dbl>
1 1 [Male]    2718590      47.3  2719371     47.3   -781 -0.00970
2 2 [Female]  3025050      52.7  3024742     52.7    308  0.00970

   age2           n_ipums pct_ipums   n_rdtm pct_rdtm n_diff   pct_diff
   <dbl+lbl>        <dbl>     <dbl>    <int>    <dbl>  <dbl>      <dbl>
 1 1 [0 to 4]      556940      9.70   555893     9.68   1047     0.0190
 2 2 [5 to 9]      683410      11.9   684727     11.9  -1317    -0.0219
 3 3 [10 to 14]    706600      12.3   706347     12.3    253    0.00542
 4 4 [15 to 19]    599880      10.4   600565     10.5   -685    -0.0111
 5 12 [20 to 24]   485100      8.45   486542     8.47  -1442    -0.0244
 6 13 [25 to 29]   459660      8.00   457890     7.97   1770     0.0315
 7 14 [30 to 34]   400100      6.97   402249     7.00  -2149    -0.0368
 8 15 [35 to 39]   355850      6.20   353147     6.15   2703     0.0476
 9 16 [40 to 44]   303920      5.29   303631     5.29    289    0.00547
10 17 [45 to 49]   251620      4.38   252122     4.39   -502   -0.00838
11 18 [50 to 54]   217660      3.79   215734     3.76   1926     0.0338
12 19 [55 to 59]   182920      3.18   183075     3.19   -155   -0.00244
13 20 [60 to 64]   150930      2.63   151864     2.64   -934    -0.0160
14 21 [65 to 69]   125590      2.19   125157     2.18    433    0.00772
15 22 [70 to 74]    97110      1.69    97457     1.70   -347   -0.00590
16 23 [75 to 79]    75400      1.31    75984     1.32   -584    -0.0101
17 24 [80 to 84]    46770     0.814    46870    0.816   -100   -0.00167
18 25 [85+]         44180     0.769    44859    0.781   -679    -0.0118

Peru 2017

DIC and DICX parsing leads to the same results.

  sex         n_ipums pct_ipums   n_rdtm pct_rdtm n_diff pct_diff
  <int+lbl>     <dbl>     <dbl>    <int>    <dbl>  <dbl>    <dbl>
1 1 [Male]   13659450      49.7 13622640     49.7  36810   0.0494
2 2 [Female] 13799500      50.3 13789517     50.3   9983  -0.0494

   age2           n_ipums pct_ipums   n_rdtm pct_rdtm n_diff   pct_diff
   <dbl+lbl>        <dbl>     <dbl>    <int>    <dbl>  <dbl>      <dbl>
 1 1 [0 to 4]     2730740      9.94  2724620     9.94   6120    0.00535
 2 2 [5 to 9]     2689060      9.79  2683928     9.79   5132    0.00200
 3 3 [10 to 14]   2947320      10.7  2948985     10.8  -1665    -0.0244
 4 4 [15 to 19]   2732690      9.95  2730785     9.96   1905    -0.0100
 5 12 [20 to 24]  2542550      9.26  2531554     9.24  10996     0.0243
 6 13 [25 to 29]  2304810      8.39  2291865     8.36  12945     0.0329
 7 14 [30 to 34]  2073730      7.55  2074691     7.57   -961    -0.0164
 8 15 [35 to 39]  1880710      6.85  1871852     6.83   8858     0.0206
 9 16 [40 to 44]  1653130      6.02  1642059     5.99  11071     0.0301
10 17 [45 to 49]  1374440      5.01  1371385     5.00   3055    0.00260
11 18 [50 to 54]  1152430      4.20  1152647     4.20   -217   -0.00796
12 19 [55 to 59]   885160      3.22   892143     3.25  -6983    -0.0310
13 20 [60 to 64]   731310      2.66   730956     2.67    354   -0.00325
14 21 [65 to 69]   579310      2.11   579302     2.11      8   -0.00357
15 22 [70 to 74]   452130      1.65   452998     1.65   -868   -0.00598
16 23 [75 to 79]   342750      1.25   343999     1.25  -1249   -0.00669
17 24 [80 to 84]   200710     0.731   203636    0.743  -2926    -0.0119
18 25 [85+]        185970     0.677   184752    0.674   1218    0.00329

Uruguay 2011

DIC and DICX parsing leads to the same results.

  sex         n_ipums pct_ipums   n_rdtm pct_rdtm n_diff pct_diff
  <int+lbl>     <dbl>     <dbl>    <int>    <dbl>  <dbl>    <dbl>
1 1 [Male]    1577700      48.0  1577416     48.0    284   0.0324
2 2 [Female]  1706550      52.0  1708461     52.0  -1911  -0.0324

   age2           n_ipums pct_ipums   n_rdtm pct_rdtm n_diff   pct_diff
   <int+lbl>        <dbl>     <dbl>    <int>    <dbl>  <dbl>      <dbl>
 1 1 [0 to 4]      219860      6.69   220345     6.71   -485    -0.0114
 2 2 [5 to 9]      236080      7.19   238068     7.25  -1988    -0.0569
 3 3 [10 to 14]    255570      7.78   256552     7.81   -982    -0.0260
 4 4 [15 to 19]    263090      8.01   261691     7.96   1399     0.0465
 5 12 [20 to 24]   240700      7.33   241006     7.33   -306   -0.00568
 6 13 [25 to 29]   229820      7.00   228385     6.95   1435     0.0471
 7 14 [30 to 34]   232930      7.09   233365     7.10   -435   -0.00973
 8 15 [35 to 39]   221690      6.75   222521     6.77   -831    -0.0219
 9 16 [40 to 44]   203740      6.20   203098     6.18    642     0.0226
10 17 [45 to 49]   198160      6.03   198773     6.05   -613    -0.0157
11 18 [50 to 54]   195040      5.94   194565     5.92    475     0.0174
12 19 [55 to 59]   172560      5.25   173007     5.27   -447    -0.0110
13 20 [60 to 64]   151050      4.60   150775     4.59    275     0.0106
14 21 [65 to 69]   132090      4.02   131563     4.00    527     0.0180
15 22 [70 to 74]   111500      3.39   112395     3.42   -895    -0.0256
16 23 [75 to 79]    94810      2.89    93659     2.85   1151     0.0365
17 24 [80 to 84]    69880      2.13    70505     2.15   -625    -0.0180
18 25 [85+]         55680      1.70    55604     1.69     76    0.00315

Code structure

Readers

ByteArrayReader:

  • Provides basic file reading and byte manipulation.
  • Used by: BitArrayReader, FuzzyEntityParser, Variable.

BitArrayReader:

  • Splits binary data into variable-sized chunks.
  • Used by: Variable.
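To illustrate what splitting binary data into variable-sized chunks means, the sketch below unpacks a byte string into integers of a chosen bit width. It is a Python illustration only, not the project's C++ BitArrayReader: the function name is hypothetical, and the most-significant-bit-first order is an assumption that may not match the layout used in REDATAM files.

```python
def split_bits(data: bytes, width: int) -> list:
    """Split data into consecutive unsigned integers of `width` bits each.

    Assumption: bits are consumed most significant bit first. Trailing bits
    that do not fill a whole chunk are discarded.
    """
    if width <= 0:
        raise ValueError("width must be a positive number of bits")
    total_bits = len(data) * 8
    as_int = int.from_bytes(data, "big")  # view the whole buffer as one integer
    mask = (1 << width) - 1
    values = []
    for i in range(total_bits // width):
        shift = total_bits - (i + 1) * width  # position of the i-th chunk
        values.append((as_int >> shift) & mask)
    return values


print(split_bits(bytes([0b10110001]), 2))  # [2, 3, 0, 1]
print(split_bits(bytes([0xAB, 0xCD]), 4))  # [10, 11, 12, 13]
```

Packing values this way lets a variable with few categories use fewer than eight bits per record, which is why the chunk width changes from one variable to another.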

FuzzyEntityParser:

  • Reads and parses .dic files.
  • Depends on: ByteArrayReader.
  • Used by: RedatamDatabase.

XMLParser:

  • Reads and parses .dicx files.
  • Depends on: pugixml (src/vendor).
  • Used by: RedatamDatabase.

Entities

Variable:

  • Represents a data structure within the system.
  • Depends on: ByteArrayReader, BitArrayReader.
  • Used by: Entity.

Entity:

  • A higher-level representation grouping Variable objects and other metadata.
  • Depends on: Variable.
  • Used by: RedatamDatabase.

Exporters

CSVExporter:

  • Converts entities to CSV-compatible structures.
  • Depends on: Entity, ParentIDCalculator, and utils.
  • Used by: RedatamDatabase.

XMLExporter:

  • Converts entities and their variables into an XML-compatible structure.
  • Depends on: Entity, utils, and pugixml (src/vendor).
  • Used by: RedatamDatabase.

ParentIDCalculator:

  • Calculates parent IDs for a child Entity based on row data.
  • Depends on: Entity.
  • Used by: CSVExporter.
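This README does not describe how ParentIDCalculator derives parent IDs. As an illustration of the general idea only, the Python sketch below assumes that each parent row stores the cumulative number of child rows up to and including its own children. That layout, the 1-based IDs, and the function name are all assumptions and are not taken from the Open Redatam source.

```python
from bisect import bisect_right


def parent_ids(cumulative_child_counts: list) -> list:
    """Return the 1-based parent ID of every child row.

    cumulative_child_counts[p] is the assumed running total of child rows
    after parent p, so child row r belongs to the first parent whose running
    total is greater than r.
    """
    if not cumulative_child_counts:
        return []
    total_children = cumulative_child_counts[-1]
    return [
        bisect_right(cumulative_child_counts, row) + 1
        for row in range(total_children)
    ]


# Parent 1 has two children, parent 2 has none, parent 3 has three.
print(parent_ids([2, 2, 5]))  # [1, 1, 3, 3, 3]
```

A binary search keeps each lookup logarithmic in the number of parents, which matters when a census has millions of person rows nested under households.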

RListExporter (R package):

  • Converts entities to R-compatible structures.
  • Depends on: Entity.
  • Used by: RedatamDatabase.

PyDictExporter (Python package):

  • Converts entities to Python-compatible structures.
  • Depends on: Entity.
  • Used by: RedatamDatabase.

Database

RedatamDatabase:

  • The central orchestrator that manages entities and interacts with parsers and exporters.
  • Depends on: Entity, FuzzyEntityParser, XMLParser, RListExporter, PyDictExporter.

Donating

If you find this software useful, please consider donating. You can donate via BuyMeACoffee.

References

De Grande, Pablo. 2016. “El formato Redatam.” Estudios demográficos y urbanos 31 (3): 811–32.

Ruggles, Steven, Lara Cleveland, Rodrigo Lovaton, Sula Sarkar, Matthew Sobek, Derek Burk, Dan Ehrlich, Quinn Heimann, and Jane Lee. 2024. “Integrated Public Use Microdata Series (IPUMS).” 2024. https://international.ipums.org/international/.

Credits

Open Redatam was created and is supported by Lital Barkai (barkailital@gmail.com).

The tests, installation instructions, and R and Python packages were created by Mauricio "Pacha" Vargas Sepulveda (m.sepulveda@mail.utoronto.ca).

The original converter was created by Pablo De Grande. See here for more information.

This project uses pugixml, created by Arseny Kapoulkine, to structure a part of the output data.

The author wishes to acknowledge the statistical offices that provided the underlying data used for the validation: National Institute of Statistics, Bolivia; National Institute of Statistics, Chile; National Institute of Statistics and Censuses, Ecuador; National Institute of Statistics, Uruguay.
