This PR adds Parquet file support to the movement package, enabling it to read and write pose tracking data in the tidy DataFrame format used by the [animovement](https://github.com/roaldarbol/animovement) R package. It enhances interoperability, supports efficient data storage, and simplifies integration with modern data analysis tools.

🧩 Related Issue

#307

Support tidy DataFrame and Parquet I/O to facilitate data exchange with animovement

✨ What's New

✅ Load Functions (`movement/io/load_poses.py`)

- Added `from_tidy_df`: converts a tidy pandas DataFrame into an `xarray.Dataset`.
- Added `from_animovement_file`: reads a `.parquet` file and converts it using `from_tidy_df`.
- Updated `from_file` to support `source_software="animovement"`.

✅ Save Functions (`movement/io/save_poses.py`)

- Added `to_tidy_df`: converts an `xarray.Dataset` to a tidy DataFrame with optional confidence values.
- Added `to_animovement_file`: saves a dataset to a `.parquet` file via `to_tidy_df`.
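As a rough illustration of the dataset-to-tidy direction, here is a minimal sketch. It is not the code in this PR: the function name `to_tidy_df_sketch`, the variable names `position` and `confidence`, and the dimension names `time`, `space`, `keypoints`, and `individuals` are assumptions made for the example.

```python
import numpy as np
import pandas as pd
import xarray as xr


def to_tidy_df_sketch(ds: xr.Dataset, include_confidence: bool = True) -> pd.DataFrame:
    """Flatten a poses dataset into one row per (time, keypoint, individual).

    Hypothetical helper: assumes a "position" variable with a labelled
    "space" dimension and an optional "confidence" variable.
    """
    # Split the "space" dimension into one variable per axis label (x, y, ...).
    wide = ds["position"].to_dataset(dim="space")
    if include_confidence and "confidence" in ds:
        wide["confidence"] = ds["confidence"]
    # to_dataframe() indexes rows by the remaining dimensions;
    # reset_index() turns those index levels into ordinary columns.
    return wide.to_dataframe().reset_index()


# Small synthetic dataset: 3 frames, 2 keypoints, 1 individual, 2D space.
rng = np.random.default_rng(0)
ds = xr.Dataset(
    data_vars={
        "position": (
            ("time", "space", "keypoints", "individuals"),
            rng.random((3, 2, 2, 1)),
        ),
        "confidence": (("time", "keypoints", "individuals"), rng.random((3, 2, 1))),
    },
    coords={
        "time": [0.0, 0.1, 0.2],
        "space": ["x", "y"],
        "keypoints": ["nose", "tail"],
        "individuals": ["mouse_0"],
    },
)

tidy = to_tidy_df_sketch(ds)
tidy_no_conf = to_tidy_df_sketch(ds, include_confidence=False)
```

The resulting frame has one column per dimension plus `x`, `y`, and (optionally) `confidence`, which is the shape most plotting and statistics tools expect.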

✅ Dependency Update

- Added `pyarrow` to `pyproject.toml` to support Parquet I/O via pandas.

✅ Tests (`tests/test_parquet_io.py`)

Added a new test suite covering:
- Conversion between tidy DataFrames and datasets
- Round-trip accuracy (DataFrame → dataset → DataFrame, and Parquet file round-trips)
- Edge cases like missing data, no confidence, and invalid inputs

💡 Why This Matters

- Interoperability: Enables seamless exchange with the animovement package.
- Performance: Parquet provides efficient columnar storage and compression.
- Usability: Tidy format is ideal for plotting, statistics, and tabular exploration.
- Reliability: Comprehensive test coverage ensures stable, correct behavior.
- Modernization: Brings movement closer to data science best practices.

How has this PR been tested?

Local pytest and CI tests.

Is this a breaking change?

If this PR breaks any existing functionality, please explain how and why.

Does this PR require an update to the documentation?

If any features have changed or have been added, please explain how the
documentation has been updated.

Checklist:

The code has been tested locally
Tests have been added to cover all new functionality
The documentation has been updated to reflect any changes
The code has been formatted with `pre-commit`

ShigrafS and others added 5 commits

April 27, 2025 17:09

Added functions to support IO for Parquet files.

6acd053

Added test_parquet_io.py.

Fixed failing mypy errors.

c2e396c

Replaced np.where with np.nonzero in load_poses.py.
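For context, the single-argument form of `np.where` is documented by NumPy as shorthand for `np.asarray(condition).nonzero()`, so this kind of replacement should not change behavior. The snippet below is an illustrative sketch with a made-up mask, not code from `load_poses.py`.

```python
import numpy as np

# Hypothetical boolean mask, e.g. marking rows that belong to one track.
mask = np.array([False, True, False, True, True])

# Both calls return a tuple with one index array per dimension.
idx_where = np.where(mask)
idx_nonzero = np.nonzero(mask)

# The first (and only) element holds the positions of the True entries.
selected = idx_nonzero[0]
```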

f2ee358

Replaced np.random.rand with a numpy.random.default_rng in test_parquet_io.py

4513df6
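A minimal sketch of what such a replacement typically looks like in test code follows. The seed value and array shape are made up for the example and are not taken from `test_parquet_io.py`.

```python
import numpy as np

# Legacy style (global random state): np.random.rand(10, 2)
# Generator style: an explicitly seeded, local random number generator.
rng = np.random.default_rng(seed=42)
position = rng.random((10, 2))  # uniform samples in [0, 1), shape (10, 2)

# A second generator with the same seed reproduces the same values,
# which keeps synthetic test data deterministic.
position_again = np.random.default_rng(seed=42).random((10, 2))
```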

Updated pyproject.toml to include pyarrow.

b3558e6

Fixed minor issue.

ShigrafS marked this pull request as ready for review

April 27, 2025 17:40

ShigrafS (Contributor, Author) commented Apr 28, 2025

@niksirbi @sfmig This PR is ready to be merged.
Kindly review it.

codecov (bot) commented Apr 28, 2025 (edited)

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 100.00%. Comparing base (fe28a5f) to head (4c8f835).

Additional details and impacted files

Coverage Diff

| | main | #562 | +/- |
|---|---|---|---|
| Coverage | 100.00% | 100.00% | |
| Files | 32 | 32 | |
| Lines | 1786 | 1856 | +70 |
| Hits | 1786 | 1856 | +70 |


Added tests to increase coverage to 100%.

a8c6c6c

ShigrafS (Contributor, Author) commented Apr 28, 2025

@niksirbi @sfmig I've added a few tests to increase the coverage to 100%.
This should solve the Codecov issue.

ShigrafS added 2 commits

April 30, 2025 21:21

Merge branch 'main' into parquet

642ad7c

Merge branch 'main' into parquet

f5a75fd

niksirbi self-requested a review

May 2, 2025 07:34

ShigrafS added 5 commits

May 8, 2025 06:49

Merge branch 'main' into parquet

e20964c

Merge branch 'main' into parquet

d41b46f

Merge branch 'main' into parquet

6d35843

Merge branch 'main' into parquet

cc3a97a

Merge branch 'main' into parquet

88937f5

ShigrafS (Contributor, Author) commented May 14, 2025

@niksirbi Can you please review this?

Merge branch 'main' into parquet

dfe4935

niksirbi (Member) commented May 19, 2025

> @niksirbi Can you please review this?

Thanks for your work on this, @ShigrafS. Sorry we didn't have time to get to this earlier, as we are busy with other development priorities (which are more urgent) and, in my case, being away attending conferences. I will give you feedback on this when I manage to review it in detail.

ShigrafS and others added 7 commits

May 19, 2025 23:44

Merge branch 'main' into parquet

52c17d0

Merge branch 'main' into parquet

6b8c246

Merge branch 'main' into parquet

6e25a7b

Fixed merge conflicts.

b30e542

[pre-commit.ci] auto fixes from pre-commit.com hooks

c9c775b

for more information, see https://pre-commit.ci

Fixed CI errors.

0a087da

Merge branch 'main' into parquet

4c8f835

sonarqubecloud (bot) commented Jun 8, 2025

Quality Gate passed

Issues
1 New issue
0 Accepted issues

Measures
0 Security Hotspots
0.0% Coverage on New Code
0.0% Duplication on New Code

See analysis details on SonarQube Cloud

niksirbi requested changes

Jun 11, 2025

niksirbi (Member) left a comment

Thanks for working on this, @ShigrafS.

This is a solid start, but I propose narrowing the scope of this PR to make it more manageable.

Specifically, let’s first focus solely on the `from_tidy_df` and `to_tidy_df` functions. I envisage these functions as providing a one-to-one mapping between our `xarray` dataset format and a “tidy” pandas DataFrame.

The columns I’d expect in such a tidy DataFrame, closely following animovement’s format, are:

- `time`: derived directly from the `time` dimension of the `xarray` dataset, in whatever units are used there (you’re currently using `frame`). This means that in the `from_tidy_df` function, the `fps` parameter will no longer be used to populate the `time` column. Instead, it will only be stored in the dataset’s `attrs` if provided.
- `individual`: this is missing from the animovement docs, but we definitely need it to handle multi-animal data. You currently call this `track_id`, but it would be good to match the `xarray` dimension name.
- `keypoint`
- `x`
- `y`
- `z` (only if the data is 3D)
- `confidence`
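The column layout proposed above could map back onto a dataset roughly as follows. This is only a sketch under stated assumptions, not the requested implementation: the name `from_tidy_df_sketch` is hypothetical, the dimension names simply mirror the column names, and `confidence` is treated as optional.

```python
from typing import Optional

import pandas as pd
import xarray as xr


def from_tidy_df_sketch(df: pd.DataFrame, fps: Optional[float] = None) -> xr.Dataset:
    """Rebuild a poses dataset from a tidy DataFrame (hypothetical helper)."""
    # "z" is optional: use it only when the DataFrame actually contains it.
    space = [axis for axis in ("x", "y", "z") if axis in df.columns]

    # Each index level becomes a dimension; (time, individual, keypoint)
    # combinations that are absent from the DataFrame are filled with NaN.
    wide = df.set_index(["time", "individual", "keypoint"]).to_xarray()

    # Stack the per-axis variables along a new, labelled "space" dimension.
    position = xr.concat(
        [wide[axis] for axis in space], dim=pd.Index(space, name="space")
    ).rename("position")
    position = position.transpose("time", "space", "keypoint", "individual")

    data_vars = {"position": position}
    if "confidence" in wide:
        data_vars["confidence"] = wide["confidence"]
    ds = xr.Dataset(data_vars)

    # fps is kept as metadata only; the time coordinate is taken from the
    # "time" column as-is, in whatever units it already uses.
    if fps is not None:
        ds.attrs["fps"] = fps
    return ds


tidy = pd.DataFrame(
    {
        "time": [0.0, 0.0, 0.5, 0.5],
        "individual": ["mouse_0"] * 4,
        "keypoint": ["nose", "tail", "nose", "tail"],
        "x": [1.0, 2.0, 1.5, 2.5],
        "y": [3.0, 4.0, 3.5, 4.5],
        "confidence": [0.9, 0.8, 0.95, 0.85],
    }
)
ds = from_tidy_df_sketch(tidy, fps=2.0)
```

Because the `time` column is used directly as the coordinate, a DataFrame expressed in frames and one expressed in seconds both round-trip without any rescaling.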

For now, please omit the `from_animovement_file` and `to_animovement_file` functions. I’d like to discuss some details with animovement’s developer, who is currently on extended leave. We can easily add those functions later once we have the tidy DataFrame functions sorted.

Additionally, please remove any unrelated changes that have crept into other functions (such as the loaders for SLEAP, Anipose, etc.). Perhaps these were inadvertently introduced when merging from the `main` branch?

The easiest approach might be to close this PR and open a fresh one, starting from the latest `main` branch, implementing only the necessary tidy DataFrame functions and their corresponding tests.

Thanks again for your effort on this!

ShigrafS (Contributor, Author) commented Jul 6, 2025

Thank you for your review, @niksirbi.
I had been a little preoccupied with other commitments.
I greatly appreciate your review.

I'll follow up on this with a new PR incorporating your suggestions.
Thanks again.

niksirbi (Member) commented Jul 7, 2025

Thanks for the update, @ShigrafS. In that case, I'll convert this PR to draft for now. Make sure to reference this when you open your new PR.

niksirbi marked this pull request as draft

July 7, 2025 19:18

Added functions to support IO for Parquet files. #562
