Reader of R datasets in .rda format, in Python
A Python library for R datasets.
The package rdata offers a lightweight way to import R datasets/objects stored in the ".rda" and ".rds" formats into Python. Its main advantages are:
- It is a pure Python implementation, with no dependencies on the R language or related libraries. Thus, it can be used anywhere Python is supported, including the web using Pyodide.
- It attempts to support all R objects that can be meaningfully translated. As opposed to other solutions, you are not limited to importing dataframes or data with a particular structure.
- It allows users to easily customize the conversion of R classes to Python ones. Does your data use custom R classes? Worry no longer, as it is possible to define custom conversions to the Python classes of your choosing.
- It has a permissive license (MIT). As opposed to other packages that depend on R libraries and thus need to adhere to the GPL license, you can use rdata as a dependency in MIT, BSD or even closed source projects.
rdata is on PyPI and can be installed using pip:
pip install rdata
It is also available for conda using the conda-forge channel:
conda install -c conda-forge rdata
The current version from the develop branch can be installed with:
pip install git+https://github.com/vnmabus/rdata.git@develop
The documentation of rdata is in ReadTheDocs.
Examples of use are available in ReadTheDocs.
If you find this software useful in your work, please cite the following paper:
@article{ramos-carreno+rossi_2024_rdata,
    author = {Ramos-Carreño, Carlos and Rossi, Tuomas},
    doi = {10.21105/joss.07540},
    journal = {Journal of Open Source Software},
    month = dec,
    number = {104},
    pages = {1--4},
    title = {{rdata: A Python library for R datasets}},
    url = {https://joss.theoj.org/papers/10.21105/joss.07540#},
    volume = {9},
    year = {2024}
}
You can additionally cite the software repository itself using:
@misc{ramos-carreno++_2024_rdata-repo,
    author = {The rdata developers},
    doi = {10.5281/zenodo.6382237},
    month = dec,
    title = {rdata: A Python library for R datasets},
    url = {https://github.com/vnmabus/rdata},
    year = {2024}
}
If you want to reference a particular version for reproducibility, check the version-specific DOIs available in Zenodo.
The common way of reading an R dataset is the following:
import rdata

converted = rdata.read_rda(rdata.TESTDATA_PATH / "test_vector.rda")
converted
which results in
{'test_vector': array([1., 2., 3.])}
Under the hood, this is equivalent to the following code:
import rdata

parsed = rdata.parser.parse_file(rdata.TESTDATA_PATH / "test_vector.rda")
converted = rdata.conversion.convert(parsed)
converted
This consists of two steps:
- First, the file is parsed using the function rdata.parser.parse_file. This provides a literal description of the file contents as a hierarchy of Python objects representing the basic R objects. This step is unambiguous and always the same.
- Then, each object must be converted to an appropriate Python object. In this step there are several choices as to which Python type is the most appropriate conversion for a given R object. Thus, we provide a default rdata.conversion.convert routine, which tries to select Python objects that preserve most of the information in the original R object. For custom R classes, it is also possible to specify conversion routines to Python objects.
The basic convert routine only constructs a SimpleConverter object and calls its convert method. All arguments of convert are directly passed to the SimpleConverter initialization method.
It is possible, although not trivial, to make a custom Converter object to change the way in which the basic R objects are transformed to Python objects. However, a more common situation is that one does not want to change how basic R objects are converted, but instead wants to provide conversions for specific R classes. This can be done by passing a dictionary to the SimpleConverter initialization method, containing as keys the names of R classes and as values, callables that convert an R object of that class to a Python object. By default, the dictionary used is DEFAULT_CLASS_MAP, which can convert commonly used R classes such as data.frame and factor.
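The idea of a class map can be illustrated with a small toy model. The sketch below is not rdata's implementation and does not import rdata: TOY_DEFAULT_CLASS_MAP and toy_convert are hypothetical names invented for illustration, and plain lists and dicts stand in for parsed R objects and their attributes. It only shows how a dictionary of callables keyed by R class name can be extended or overridden while keeping the remaining defaults.

```python
from typing import Any, Callable, Mapping

# Signature assumed for illustration: (object, attributes) -> Python object.
Constructor = Callable[[Any, Mapping[str, Any]], Any]

# Hypothetical stand-in for a default class map (NOT rdata's DEFAULT_CLASS_MAP).
TOY_DEFAULT_CLASS_MAP: dict[str, Constructor] = {
    "factor": lambda obj, attrs: [attrs["levels"][i - 1] for i in obj],
}


def toy_convert(obj, attrs, class_map=TOY_DEFAULT_CLASS_MAP):
    """Apply the constructor of the first R class found in the map."""
    for class_name in attrs.get("class", []):
        constructor = class_map.get(class_name)
        if constructor is not None:
            return constructor(obj, attrs)
    # No registered class: return the basic object unchanged.
    return obj


attrs = {"class": ["factor"], "levels": ["a", "b"]}
default_result = toy_convert([1, 2, 2], attrs)

# Override only the "factor" entry, keeping every other default.
custom_map = {
    **TOY_DEFAULT_CLASS_MAP,
    "factor": lambda obj, attrs: [attrs["levels"][i - 1].upper() for i in obj],
}
custom_result = toy_convert([1, 2, 2], attrs, custom_map)
```

Building the custom map by unpacking the defaults first means that only the listed classes change behavior, which is the same pattern used with the real class map.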
As an example, here is how we would implement a conversion routine for the factor class to bytes objects, instead of the default conversion to Pandas Categorical objects:
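import rdata


def factor_constructor(obj, attrs):
    # Map each 1-based factor code to its level name, encoded as bytes.
    values = [
        bytes(attrs['levels'][i - 1], 'utf8') if i >= 0 else None
        for i in obj
    ]
    return values


# Start from the default class map and override only the "factor" entry.
new_dict = {
    **rdata.conversion.DEFAULT_CLASS_MAP,
    "factor": factor_constructor,
}

converted = rdata.read_rda(
    rdata.TESTDATA_PATH / "test_dataframe.rda",
    constructor_dict=new_dict,
)
converted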
which has the following result:
{'test_dataframe':   class  value
 1  b'a'      1
 2  b'b'      2
 3  b'b'      3}
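To see what such a constructor does in isolation, it can be called by hand on small inputs. This is only a sketch: the plain list and dict below are stand-ins chosen for illustration, not the exact objects rdata passes to a constructor, and treating negative codes as missing values is an assumption read off the `i >= 0` check in the example above.

```python
def factor_constructor(obj, attrs):
    """Map 1-based factor codes to their level names, encoded as bytes."""
    return [
        bytes(attrs["levels"][i - 1], "utf8") if i >= 0 else None
        for i in obj
    ]


# Hand-made stand-ins: integer codes and an attribute mapping with the levels.
codes = [1, 2, 2]
attrs = {"levels": ["a", "b"]}

result = factor_constructor(codes, attrs)
print(result)  # [b'a', b'b', b'b']

# A negative code (assumed here to mark a missing value) becomes None.
missing_result = factor_constructor([-1, 1], attrs)
print(missing_result)  # [None, b'a']
```

Testing a constructor this way, before registering it in the dictionary, makes it easier to separate mistakes in the conversion logic from problems with the file being read.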
Additional examples illustrating the functionalities of this package can be found in the ReadTheDocs documentation.