Efficient Pandas representation for nested associated datasets.
lincc-frameworks/nested-pandas
An extension of pandas for efficient representation of nested associated datasets.
Nested-Pandas extends the pandas package with tooling and support for nested dataframes packed into values of top-level dataframe columns. Pyarrow is used internally to aid in scalability and performance.
Nested-Pandas allows data like this:
To instead be represented like this:
Where the nested data is represented as nested dataframes:
    # Each row of "object_nf" now has its own sub-dataframe of matched rows from "source_df"
    object_nf.loc[0]["nested_sources"]
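To make the idea of a per-row sub-dataframe concrete, here is a minimal sketch in plain pandas, not the Nested-Pandas implementation. The tables, the `object_id` index name, and the `ra`, `flux`, and `time` columns are hypothetical and chosen only for illustration.

```python
import pandas as pd

# Hypothetical flat tables; names and values are illustrative assumptions.
object_df = pd.DataFrame(
    {"ra": [10.0, 20.0]},
    index=pd.Index([0, 1], name="object_id"),
)
source_df = pd.DataFrame(
    {"flux": [1.0, 3.0, 5.0], "time": [0.1, 0.2, 0.3]},
    index=pd.Index([0, 0, 1], name="object_id"),
)

# Collect the source rows matched to each object into its own sub-dataframe.
sub_frames = {
    obj_id: rows.reset_index(drop=True)
    for obj_id, rows in source_df.groupby(level="object_id")
}

# Conceptually analogous to object_nf.loc[0]["nested_sources"]:
# the two source rows that belong to object 0.
first_sources = sub_frames[0]
```

A dictionary of dataframes like this is only a mental model; the point of the nested representation is that the sub-dataframes live inside a column of the top-level dataframe instead of in a separate Python structure.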
Allowing powerful and straightforward operations, like:
    # Compute the mean flux for each row of "object_nf"
    import numpy as np
    object_nf.reduce(np.mean, "nested_sources.flux")
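For comparison, the same per-object mean on a flat source table in plain pandas requires a groupby. This is a sketch of the flat-table workflow only; the `object_id` column and the flux values are hypothetical.

```python
import pandas as pd

# Hypothetical flat source table: one row per measurement,
# with "object_id" linking each measurement to its object.
source_df = pd.DataFrame(
    {
        "object_id": [0, 0, 1, 1, 1],
        "flux": [1.0, 3.0, 2.0, 4.0, 6.0],
    }
)

# Plain-pandas equivalent of a per-object mean flux:
# group the measurements by object, then average each group.
mean_flux = source_df.groupby("object_id")["flux"].mean()
```

With the nested layout, the grouping is already encoded in the data structure, so the per-row reduction does not need to regroup the source rows on every call.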
Nested-Pandas is motivated by time-domain astronomy use cases, where we typically see two levels of information: information about astronomical objects, and an associated set of N measurements of those objects. Nested-Pandas offers a performant and memory-efficient package for working with these types of datasets.
Its core advantages are:
- hierarchical column access
- efficient packing of nested information into inputs to custom user functions
- avoiding costly groupby operations
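As an illustration of the kind of custom user function the second point refers to, the sketch below defines a function that takes one object's measurements as arrays and returns a single value. It is driven here by a plain pandas groupby, which is an assumption made to keep the example self-contained; `flux_slope`, the column names, and the data are all hypothetical.

```python
import numpy as np
import pandas as pd


def flux_slope(time, flux):
    """Least-squares slope of flux against time for one object (hypothetical helper)."""
    return float(np.polyfit(time, flux, 1)[0])


# Hypothetical flat source table with three measurements per object.
source_df = pd.DataFrame(
    {
        "object_id": [0, 0, 0, 1, 1, 1],
        "time": [0.0, 1.0, 2.0, 0.0, 1.0, 2.0],
        "flux": [1.0, 3.0, 5.0, 4.0, 4.0, 4.0],
    }
)

# Flat-table approach: regroup the rows, then hand each object's
# columns to the user function as NumPy arrays.
slopes = {
    obj_id: flux_slope(rows["time"].to_numpy(), rows["flux"].to_numpy())
    for obj_id, rows in source_df.groupby("object_id")
}
```

The user function itself only needs the arrays for a single object, which is why a layout that already stores each object's measurements together suits this pattern.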
This is a LINCC Frameworks project - find more information about LINCC Frameworks here.
This project is supported by Schmidt Sciences.