Added functions to support IO for Parquet files. #562
Conversation
Added test_parquet_io.py.
Fixed minor issue.
codecov bot commented Apr 28, 2025 • edited
Codecov Report
All modified and coverable lines are covered by tests ✅

Additional details and impacted files

@@            Coverage Diff            @@
##              main      #562   +/-   ##
=========================================
  Coverage   100.00%   100.00%
=========================================
  Files           32        32
  Lines         1786      1856   +70
=========================================
+ Hits          1786      1856   +70

☔ View full report in Codecov by Sentry.
@niksirbi Can you please review this?
Thanks for your work on this, @ShigrafS. Sorry we didn't have time to get to this earlier, as we are busy with other development priorities (which are more urgent) and, in my case, being away attending conferences. I will give you feedback on this when I manage to review it in detail.
for more information, see https://pre-commit.ci
Thanks for working on this, @ShigrafS.
This is a solid start, but I propose narrowing the scope of this PR to make it more manageable.
Specifically, let’s first focus solely on the from_tidy_df and to_tidy_df functions. I envisage these functions as providing a one-to-one mapping between our xarray dataset format and a “tidy” pandas DataFrame.
The columns I’d expect in such a tidy DataFrame, closely following animovement’s format, are:
- time: derived directly from the time dimension of the xarray dataset, in whatever units are used there (you’re currently using frame). This means that in the from_tidy_df function, the fps parameter will no longer be used to populate the time column. Instead, it will only be stored in the dataset’s attrs if provided.
- individual: this is missing from the animovement docs, but we definitely need it to handle multi-animal data. You currently call this track_id, but it would be good to match the xarray dimension name.
- keypoint
- x
- y
- z (only if the data is 3D)
- confidence
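The one-to-one mapping described above could be sketched roughly as follows. This is only an illustration, not the implementation from this PR: it assumes a dataset with a `position` variable over the dimensions `time`, `space`, `keypoints`, `individuals` and a `confidence` variable over `time`, `keypoints`, `individuals`. The helper `make_example_dataset` is hypothetical and exists only to produce test data.

```python
from typing import Optional

import numpy as np
import pandas as pd
import xarray as xr


def make_example_dataset() -> xr.Dataset:
    """Build a small 2D, two-individual dataset (hypothetical test data)."""
    position = np.arange(24, dtype=float).reshape(3, 2, 2, 2)
    confidence = np.linspace(0.0, 1.0, 12).reshape(3, 2, 2)
    return xr.Dataset(
        data_vars={
            "position": (("time", "space", "keypoints", "individuals"), position),
            "confidence": (("time", "keypoints", "individuals"), confidence),
        },
        coords={
            "time": [0.0, 0.5, 1.0],
            "space": ["x", "y"],
            "keypoints": ["centroid", "snout"],
            "individuals": ["id_0", "id_1"],
        },
    )


def to_tidy_df(ds: xr.Dataset) -> pd.DataFrame:
    """Return one row per (time, individual, keypoint)."""
    # Split the "space" dimension into separate x / y (/ z) variables.
    wide = ds["position"].to_dataset(dim="space")
    wide["confidence"] = ds["confidence"]
    df = wide.to_dataframe(
        dim_order=["time", "individuals", "keypoints"]
    ).reset_index()
    df = df.rename(columns={"individuals": "individual", "keypoints": "keypoint"})
    space_cols = [str(s) for s in ds["space"].values]
    return df[["time", "individual", "keypoint", *space_cols, "confidence"]]


def from_tidy_df(df: pd.DataFrame, fps: Optional[float] = None) -> xr.Dataset:
    """Rebuild the dataset; fps is only stored in attrs, never used for time."""
    space_cols = [c for c in ("x", "y", "z") if c in df.columns]
    indexed = df.rename(
        columns={"individual": "individuals", "keypoint": "keypoints"}
    ).set_index(["time", "individuals", "keypoints"])
    wide = xr.Dataset.from_dataframe(indexed)
    # Stack the x / y (/ z) variables back into a single "space" dimension.
    position = (
        wide[space_cols]
        .to_array(dim="space")
        .transpose("time", "space", "keypoints", "individuals")
    )
    data_vars = {"position": position}
    if "confidence" in wide:
        data_vars["confidence"] = wide["confidence"].transpose(
            "time", "keypoints", "individuals"
        )
    ds = xr.Dataset(data_vars)
    if fps is not None:
        ds.attrs["fps"] = fps
    return ds
```

Calling `from_tidy_df(to_tidy_df(ds))` on the example dataset should give back the same `position` and `confidence` values. Note that `Dataset.from_dataframe` builds coordinates from the sorted index levels, so label order may differ from the input if the original labels were not sorted.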
For now, please omit the from_animovement_file and to_animovement_file functions. I’d like to discuss some details with animovement’s developer, who is currently on extended leave. We can easily add those functions later once we have the tidy DataFrame functions sorted.
Additionally, please remove any unrelated changes that have crept into other functions (such as the loaders for SLEAP, Anipose, etc.). Perhaps these were inadvertently introduced when merging from the main branch?
The easiest approach might be to close this PR and open a fresh one, starting from the latest main branch, implementing only the necessary tidy DataFrame functions and their corresponding tests.
Thanks again for your effort on this!
Thank you for your review, @niksirbi. I'll follow up on this with a new PR incorporating your suggestions.
Thanks for the update, @ShigrafS. In that case, I'll convert this PR to draft for now. Make sure to reference this when you open your new PR.



Closes #307
Description
What is this PR
Summary
This PR adds Parquet file support to the movement package, enabling it to read and write pose tracking data in the tidy DataFrame format used by the [animovement](https://github.com/roaldarbol/animovement) R package. It enhances interoperability, supports efficient data storage, and simplifies integration with modern data analysis tools.
🧩 Related Issue
#307
✨ What's New
✅ Load Functions (movement/io/load_poses.py)
- from_tidy_df: Converts a tidy pandas DataFrame into an xarray.Dataset.
- from_animovement_file: Reads a .parquet file and converts it using from_tidy_df.
- Updated from_file to support source_software="animovement".
✅ Save Functions (movement/io/save_poses.py)
- to_tidy_df: Converts an xarray.Dataset to a tidy DataFrame with optional confidence values.
- to_animovement_file: Saves a dataset to a .parquet file via to_tidy_df.
✅ Dependency Update
- Added pyarrow to pyproject.toml to support Parquet I/O via pandas.
✅ Tests (tests/test_parquet_io.py)
💡 Why This Matters
- [...] animovement package.
- [...] movement closer to data science best practices.
How has this PR been tested?
Local pytest and CI tests.
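As a rough illustration of the Parquet save/load pair this PR describes, the sketch below writes a tidy DataFrame to a `.parquet` file with pandas and reads it back. The helper names `save_tidy_parquet`, `load_tidy_parquet`, and `roundtrip` are hypothetical and are not the functions added in this PR; the sketch assumes `pyarrow` is installed as the Parquet engine.

```python
import tempfile
from pathlib import Path

import pandas as pd


def save_tidy_parquet(df: pd.DataFrame, path: Path) -> None:
    """Write a tidy DataFrame to Parquet without the row index."""
    df.to_parquet(path, engine="pyarrow", index=False)


def load_tidy_parquet(path: Path) -> pd.DataFrame:
    """Read a tidy DataFrame back from a Parquet file."""
    return pd.read_parquet(path, engine="pyarrow")


def roundtrip(df: pd.DataFrame) -> pd.DataFrame:
    """Save to a temporary .parquet file and load the result again."""
    with tempfile.TemporaryDirectory() as tmp:
        path = Path(tmp) / "poses.parquet"
        save_tidy_parquet(df, path)
        return load_tidy_parquet(path)
```

A round-trip test in this style would compare `roundtrip(df)` against `df` with `pd.testing.assert_frame_equal`.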
Is this a breaking change?
If this PR breaks any existing functionality, please explain how and why.
Does this PR require an update to the documentation?
If any features have changed or have been added, please explain how the
documentation has been updated.
Checklist: