Movatterモバイル変換


[0]ホーム

URL:


Skip to content

Navigation Menu

Sign in
Appearance settings

Search code, repositories, users, issues, pull requests...

Provide feedback

We read every piece of feedback, and take your input very seriously.

Saved searches

Use saved searches to filter your results more quickly

Sign up
Appearance settings

Added functions to support IO for Parquet files.#562

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to ourterms of service andprivacy statement. We’ll occasionally send you account related emails.

Already on GitHub?Sign in to your account

Draft
ShigrafS wants to merge21 commits intoneuroinformatics-unit:main
base:main
Choose a base branch
Loading
fromShigrafS:parquet

Conversation

@ShigrafS
Copy link
Contributor

@ShigrafSShigrafS commentedApr 27, 2025
edited
Loading

Closes#307

Description

What is this PR

  • Bug fix
  • Addition of a new feature
  • Other

Summary

This PR addsParquet file support to themovement package, enabling it to read and write pose tracking data in thetidy DataFrame format used by the[animovement](https://github.com/roaldarbol/animovement) R package. It enhances interoperability, supports efficient data storage, and simplifies integration with modern data analysis tools.


🧩 Related Issue

#307

Support tidy dataframe and Parquet I/O to facilitate data exchange withanimovement


✨ What's New

✅ Load Functions (movement/io/load_poses.py)

  • Addedfrom_tidy_df: Converts a tidy pandas DataFrame into anxarray.Dataset.
  • Addedfrom_animovement_file: Reads a.parquet file and converts it usingfrom_tidy_df.
  • Updatedfrom_file to supportsource_software="animovement".

✅ Save Functions (movement/io/save_poses.py)

  • Addedto_tidy_df: Converts anxarray.Dataset to a tidy DataFrame with optional confidence values.
  • Addedto_animovement_file: Saves a dataset to a.parquet file viato_tidy_df.

✅ Dependency Update

  • Addedpyarrow topyproject.toml to support Parquet I/O via pandas.

✅ Tests (tests/test_parquet_io.py)

  • Added a new test suite covering:
    • Conversion between tidy DataFrames and datasets
    • Round-trip accuracy (DataFrame → dataset → DataFrame, and Parquet file round-trips)
    • Edge cases like missing data, no confidence, and invalid inputs

💡 Why This Matters

  • Interoperability: Enables seamless exchange with theanimovement package.
  • Performance: Parquet provides efficient columnar storage and compression.
  • Usability: Tidy format is ideal for plotting, statistics, and tabular exploration.
  • Reliability: Comprehensive test coverage ensures stable, correct behavior.
  • Modernization: Bringsmovement closer to data science best practices.

How has this PR been tested?

Local pytest and CI tests.

Is this a breaking change?

If this PR breaks any existing functionality, please explain how and why.

Does this PR require an update to the documentation?

If any features have changed, or have been added. Please explain how the
documentation has been updated.

Checklist:

  • The code has been tested locally
  • Tests have been added to cover all new functionality
  • The documentation has been updated to reflect any changes
  • The code has been formatted withpre-commit

niksirbi reacted with thumbs up emoji
@ShigrafSShigrafS marked this pull request as ready for reviewApril 27, 2025 17:40
@ShigrafS
Copy link
ContributorAuthor

@niksirbi@sfmig This PR is ready to be merged.
Kindly review it.

@codecov
Copy link

codecovbot commentedApr 28, 2025
edited
Loading

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 100.00%. Comparing base(fe28a5f) to head(4c8f835).

Additional details and impacted files
@@            Coverage Diff            @@##              main      #562   +/-   ##=========================================  Coverage   100.00%   100.00%           =========================================  Files           32        32             Lines         1786      1856   +70     =========================================+ Hits          1786      1856   +70

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report?Share it here.

🚀 New features to boost your workflow:
  • ❄️Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

@ShigrafS
Copy link
ContributorAuthor

@niksirbi@sfmig I've added a few tests to increase the coverage to 100%.
This should solve the Codecov issue.

@niksirbiniksirbi self-requested a reviewMay 2, 2025 07:34
@ShigrafS
Copy link
ContributorAuthor

@niksirbi Can you please review this?

@niksirbi
Copy link
Member

@niksirbi Can you please review this?

Thanks for your work on this@ShigrafS. Sorry we didn't have time to get to this earlier, as we are busy with other development priorities (which are more urgent) and, in my case, being away attending conferences. I will give you feedback on this when I manage to review it in detail.

@sonarqubecloud
Copy link

Copy link
Member

@niksirbiniksirbi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

Thanks for working on this,@ShigrafS.

This is a solid start, but I propose narrowing the scope of this PR to make it more manageable.

Specifically, let’s first focus solely on thefrom_tidy_df andto_tidy_df functions. I envisage these functions as providing a one-to-one mapping between ourxarray dataset format and a “tidy” pandas DataFrame.

The columns I’d expect in such a tidy DataFrame—closely followinganimovement’s format—are:

  • time: derived directly from thetime dimension of thexarray dataset, in whatever units are used there (you’re currently usingframe). This means that in thefrom_tidy_df function, thefps parameter will no longer be used to populate thetime column. Instead, it will only be stored in the dataset’sattrs if provided.
  • individual: this is missing from the animovement docs, but we definitely need it to handle multi-animal data. You currently call thistrack_id, but it would be good to match thexarray dimension name.
  • keypoint
  • x
  • y
  • z (only if the data is 3D)
  • confidence

For now, please omit thefrom_animovement_file andto_animovement_file functions. I’d like to discuss some details with animovement’s developer, who is currently on extended leave. We can easily add those functions later once we have the tidy DataFrame functions sorted.

Additionally, please remove any unrelated changes that have crept into other functions (such as the loaders for SLEAP, Anipose, etc.). Perhaps these were inadvertently introduced when merging from themain branch?

The easiest approach might be to close this PR and open a fresh one—starting from the latestmain branch—implementing only the necessary tidy DataFrame functions and their corresponding tests.

Thanks again for your effort on this!

@ShigrafS
Copy link
ContributorAuthor

Thank you for your review@niksirbi
I had been a little preoccupied with other commitments.
I greatly appreciate your review.

I'll follow up on this with a new PR incorporating your suggestions.
Thanks again.

niksirbi reacted with thumbs up emoji

@niksirbi
Copy link
Member

Thanks for the update@ShigrafS. In that case, I'll convert this PR to draft for now. Make sure to reference this when you open your new PR.

ShigrafS reacted with thumbs up emoji

@niksirbiniksirbi marked this pull request as draftJuly 7, 2025 19:18
Sign up for freeto join this conversation on GitHub. Already have an account?Sign in to comment

Reviewers

@niksirbiniksirbiniksirbi requested changes

Requested changes must be addressed to merge this pull request.

Assignees

No one assigned

Labels

None yet

Projects

None yet

Milestone

No milestone

Development

Successfully merging this pull request may close these issues.

Implement I/O for parquet files

2 participants

@ShigrafS@niksirbi

[8]ページ先頭

©2009-2025 Movatter.jp