vnmabus/rdata

Reader of R datasets in .rda format, in Python

rdata.readthedocs.io

License

MIT license

rdata

A Python library for R datasets.

The package rdata offers a lightweight way to import and export, in Python, R datasets/objects stored in the ".rda" and ".rds" formats. Its main advantages are:

It is a pure Python implementation, with no dependencies on the R language or related libraries. Thus, it can be used anywhere Python is supported, including the web using Pyodide.
It attempts to support all objects that can be meaningfully translated between R and Python. Unlike other solutions, you are not limited to importing dataframes or data with a particular structure.
It allows users to easily customize the conversion of R classes to Python ones and vice versa. Does your data use custom R classes? Worry no longer, as it is possible to define custom conversions to the Python classes of your choosing.
It has a permissive license (MIT). Unlike other packages that depend on R libraries and thus need to adhere to the GPL license, you can use rdata as a dependency in MIT, BSD, or even closed-source projects.

Installation

Installing a stable release

The rdata package is on PyPI and can be installed using pip:

pip install rdata

The package is also available for conda using the conda-forge channel:

conda install -c conda-forge rdata

Installing a develop version

The current version from the develop branch can be installed as

pip install git+https://github.com/vnmabus/rdata.git@develop
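After either installation route, it can be handy to confirm which version ended up in the active environment. The following is a minimal sketch using only the standard library; the helper name installed_version is an illustrative choice, not part of rdata.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(distribution: str = "rdata") -> Optional[str]:
    """Return the installed version of a distribution, or None if it is absent.

    `installed_version` is a hypothetical helper written for this example.
    """
    try:
        return version(distribution)
    except PackageNotFoundError:
        return None


found = installed_version()
if found is None:
    print("rdata is not installed in this environment")
else:
    print(f"rdata {found} is installed")
```

A version installed from the develop branch will typically report a different version string than the latest stable release.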

Documentation

The documentation of rdata is available on ReadTheDocs.

Examples

Examples of use are available on ReadTheDocs.

Citing rdata

If you find this software useful in your work, please cite the following paper:

@article{ramos-carreno+rossi_2024_rdata,
    author = {Ramos-Carreño, Carlos and Rossi, Tuomas},
    doi = {10.21105/joss.07540},
    journal = {Journal of Open Source Software},
    month = dec,
    number = {104},
    pages = {1--4},
    title = {{rdata: A Python library for R datasets}},
    url = {https://joss.theoj.org/papers/10.21105/joss.07540#},
    volume = {9},
    year = {2024}
}

You can additionally cite the software repository itself using:

@misc{ramos-carreno++_2024_rdata-repo,
    author = {The rdata developers},
    doi = {10.5281/zenodo.6382237},
    month = dec,
    title = {rdata: A Python library for R datasets},
    url = {https://github.com/vnmabus/rdata},
    year = {2024}
}

If you want to reference a particular version for reproducibility, check the version-specific DOIs available in Zenodo.

Usage

Read an R dataset

The common way of reading an rds file is:

import rdata

converted = rdata.read_rds(rdata.TESTDATA_PATH / "test_dataframe.rds")
print(converted)

which returns the read dataframe:

  class  value
1     a      1
2     b      2
3     b      3

The analogous rda file can be read in a similar way:

import rdata

converted = rdata.read_rda(rdata.TESTDATA_PATH / "test_dataframe.rda")
print(converted)

which returns a dictionary mapping the variable name defined in the file (test_dataframe) to the dataframe:

{'test_dataframe':   class  value
1     a      1
2     b      2
3     b      3}

Under the hood, these reading functions are equivalent to the following two-step code:

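import rdata

# Step 1: parse the file into a hierarchy of basic R objects.
parsed = rdata.parser.parse_file(rdata.TESTDATA_PATH / "test_dataframe.rda")
# Step 2: convert the parsed R objects to Python objects.
converted = rdata.conversion.convert(parsed)
print(converted)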

This consists of two steps:

First, the file is parsed using the function rdata.parser.parse_file. This provides a literal description of the file contents as a hierarchy of Python objects representing the basic R objects. This step is unambiguous and always the same.
Then, each object must be converted to an appropriate Python object. In this step there are several choices as to which Python type is the most appropriate conversion for a given R object. Thus, we provide a default rdata.conversion.convert routine, which tries to select Python objects that preserve most of the information in the original R object. For custom R classes, it is also possible to specify conversion routines to Python objects, as exemplified in the documentation.
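The parse-then-convert design described above can be illustrated with a toy model. This is only a sketch of the idea, using the standard library: ParsedObject, TOY_CLASS_MAP, convert and the two converter functions are hypothetical names invented for this example and are not rdata's actual classes or API. It assumes that an R factor stores 1-based integer codes into its levels, and that an R Date stores days since 1970-01-01.

```python
from dataclasses import dataclass, field
from datetime import date, timedelta
from typing import Any, Callable, Dict, List, Optional


@dataclass
class ParsedObject:
    """Toy stand-in for the literal, parsed description of an R object."""

    kind: str                      # basic R type, e.g. "integer" or "double"
    values: List[Any]
    r_class: Optional[str] = None  # value of the R "class" attribute, if any
    attributes: Dict[str, Any] = field(default_factory=dict)


Converter = Callable[[ParsedObject], Any]


def convert_factor(obj: ParsedObject) -> List[str]:
    # Factor codes are 1-based indices into the "levels" attribute.
    levels = obj.attributes["levels"]
    return [levels[code - 1] for code in obj.values]


def convert_date(obj: ParsedObject) -> List[date]:
    # Dates are stored as a number of days since 1970-01-01.
    epoch = date(1970, 1, 1)
    return [epoch + timedelta(days=int(days)) for days in obj.values]


# Toy default mapping from R class name to a converter (hypothetical).
TOY_CLASS_MAP: Dict[str, Converter] = {"factor": convert_factor}


def convert(obj: ParsedObject,
            class_map: Optional[Dict[str, Converter]] = None) -> Any:
    """Second step: choose a Python representation for a parsed object."""
    mapping = TOY_CLASS_MAP if class_map is None else class_map
    if obj.r_class in mapping:
        return mapping[obj.r_class](obj)
    # No converter is registered for this class: fall back to the raw values.
    return list(obj.values)


parsed_factor = ParsedObject("integer", [1, 2, 2], r_class="factor",
                             attributes={"levels": ["a", "b"]})
parsed_date = ParsedObject("double", [0, 1], r_class="Date")

# Extending the default map adds support for a class it does not cover.
custom_map = {**TOY_CLASS_MAP, "Date": convert_date}

print(convert(parsed_factor))            # converted by the default map
print(convert(parsed_date))              # unknown class: raw values
print(convert(parsed_date, custom_map))  # converted by the custom map
```

Keeping the unambiguous parsing step separate from the conversion step means that customizing the result only requires supplying a different class-to-converter mapping; the parsed hierarchy itself never changes.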

Write an R dataset

The common way of writing data to an rds file is:

import pandas as pd
import rdata

df = pd.DataFrame({"class": pd.Categorical(["a", "b", "b"]), "value": [1, 2, 3]})
print(df)
rdata.write_rds("data.rds", df)

which prints the dataframe and writes it to the file data.rds:

  class  value
0     a      1
1     b      2
2     b      3

Similarly, the dataframe can be written to an rda file with a given variable name:

import pandas as pd
import rdata

df = pd.DataFrame({"class": pd.Categorical(["a", "b", "b"]), "value": [1, 2, 3]})
data = {"my_dataframe": df}
print(data)
rdata.write_rda("data.rda", data)

which prints the name-to-dataframe dictionary and writes it to the file data.rda:

{'my_dataframe':   class  value
0     a      1
1     b      2
2     b      3}

Under the hood, these writing functions are equivalent to the following two-step code:

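import pandas as pd
import rdata

df = pd.DataFrame({"class": pd.Categorical(["a", "b", "b"]), "value": [1, 2, 3]})
data = {"my_dataframe": df}

# Step 1: convert the Python objects to an R data representation.
r_data = rdata.conversion.convert_python_to_r_data(data, file_type="rda")
# Step 2: unparse (serialize) that representation to a file.
rdata.unparser.unparse_file("data.rda", r_data, file_type="rda")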

This consists of two steps (the reverse of reading):

First, each Python object is converted to an appropriate R object. As in reading, there are several choices, and the default rdata.conversion.convert_python_to_r_data routine tries to select R objects that preserve most of the information in the original Python object. For Python classes, it is also possible to specify custom conversion routines to R classes, as exemplified in the documentation.
Then, the created RData representation is unparsed to a file using the function rdata.unparser.unparse_file.

Additional examples

Additional examples illustrating the functionalities of this package can be found in the ReadTheDocs documentation.

Releases

8 releases. Latest: version 0.11.2 (Mar 4, 2024).
