lincc-frameworks/nested-pandas

An extension of pandas for efficient representation of nested associated datasets.

Nested-Pandas extends the pandas package with tooling and support for nested dataframes packed into values of top-level dataframe columns. Pyarrow is used internally to aid in scalability and performance.

Nested-Pandas allows data like this:

[Figure: pandas dataframes]

To instead be represented like this:

[Figure: nestedframe]

Where the nested data is represented as nested dataframes:

# Each row of "object_nf" now has its own sub-dataframe of matched rows from "source_df"
object_nf.loc[0]["nested_sources"]

[Figure: sub-dataframe]
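
To make the idea of a per-row sub-dataframe concrete, the following is a minimal plain-pandas sketch, not the Nested-Pandas API. It assumes two hypothetical flat tables that share an `object_id` index; the column names (`ra`, `dec`, `time`, `flux`) and all values are made up for illustration. The "matched rows" for one object are simply the rows of the source table that carry that object's index value.

```python
import pandas as pd

# Hypothetical example data; column names and values are illustrative only.
object_df = pd.DataFrame(
    {"ra": [10.0, 20.0], "dec": [-5.0, 15.0]},
    index=pd.Index([0, 1], name="object_id"),
)
source_df = pd.DataFrame(
    {"time": [1.0, 2.0, 1.0, 2.0, 3.0], "flux": [1.0, 3.0, 2.0, 4.0, 6.0]},
    index=pd.Index([0, 0, 1, 1, 1], name="object_id"),
)

# Select the rows of source_df whose index matches object 0. Conceptually,
# this is the sub-dataframe a nested column would hold for that object.
sources_of_object_0 = source_df.loc[[0]]
print(sources_of_object_0)
```

With the nested representation, this lookup does not need to be repeated against the flat table: each object row already carries its matched rows.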

Allowing powerful and straightforward operations, like:

# Compute the mean flux for each row of "object_nf"
import numpy as np
object_nf.reduce(np.mean, "nested_sources.flux")

[Figure: using reduce]
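
For comparison, here is a rough sketch of how the same per-object mean might be computed with plain pandas on flat tables. It is a baseline to contrast with, not part of Nested-Pandas. The tables, the `object_id` index name, and the `mean_flux` column name are assumptions made for this example.

```python
import pandas as pd

# Hypothetical flat tables sharing an "object_id" index (illustrative only).
object_df = pd.DataFrame(
    {"ra": [10.0, 20.0, 30.0]},
    index=pd.Index([0, 1, 2], name="object_id"),
)
source_df = pd.DataFrame(
    {"flux": [1.0, 3.0, 2.0, 4.0, 6.0, 5.0]},
    index=pd.Index([0, 0, 1, 1, 1, 2], name="object_id"),
)

# Group the flat source rows by object, then aggregate the flux column.
mean_flux = source_df.groupby(level="object_id")["flux"].mean()

# Join the per-object result back onto the object table.
result = object_df.join(mean_flux.rename("mean_flux"))
print(result)
```

This flat-table approach needs an explicit groupby and a join back onto the object table for every such computation, which is the pattern the nested representation is meant to avoid.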

Nested-Pandas is motivated by time-domain astronomy use cases, where we typically see two levels of information: information about astronomical objects, and an associated set of N measurements of those objects. Nested-Pandas offers a performant and memory-efficient package for working with these types of datasets.

Its core advantages are:

  • hierarchical column access
  • efficient packing of nested information into inputs to custom user functions
  • avoiding costly groupby operations

This is a LINCC Frameworks project - find more information about LINCC Frameworks here.

Acknowledgements

This project is supported by Schmidt Sciences.

