A lightweight data processing framework built on DuckDB and 3FS.
deepseek-ai/smallpond
- 🚀 High-performance data processing powered by DuckDB
- 🌍 Scalable to handle PB-scale datasets
- 🛠️ Easy operations with no long-running services
Python 3.8 to 3.12 is supported.
pip install smallpond
    # Download example data
    wget https://duckdb.org/data/prices.parquet

    import smallpond

    # Initialize session
    sp = smallpond.init()

    # Load data
    df = sp.read_parquet("prices.parquet")

    # Process data
    df = df.repartition(3, hash_by="ticker")
    df = sp.partial_sql("SELECT ticker, min(price), max(price) FROM {0} GROUP BY ticker", df)

    # Save results
    df.write_parquet("output/")

    # Show results
    print(df.to_pandas())
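The quick-start pipeline above hash-partitions rows by `ticker` and then runs an aggregate query over the partitioned data. The following is a minimal pure-Python sketch of that idea, assuming the query is evaluated once per partition (as the name `partial_sql` suggests). It is not smallpond's implementation: the sample rows and the helpers `hash_partition` and `min_max_by_ticker` are hypothetical names introduced only for illustration.

```python
import zlib


def hash_partition(rows, npartitions, key):
    """Assign each row to a partition using a stable hash of row[key]."""
    partitions = [[] for _ in range(npartitions)]
    for row in rows:
        # zlib.crc32 is deterministic across runs, unlike the built-in hash() for str
        idx = zlib.crc32(row[key].encode("utf-8")) % npartitions
        partitions[idx].append(row)
    return partitions


def min_max_by_ticker(partition):
    """Per-partition analogue of: SELECT ticker, min(price), max(price) ... GROUP BY ticker."""
    stats = {}
    for row in partition:
        price = row["price"]
        lo, hi = stats.get(row["ticker"], (price, price))
        stats[row["ticker"]] = (min(lo, price), max(hi, price))
    return stats


# Hypothetical sample data standing in for prices.parquet
rows = [
    {"ticker": "AAA", "price": 1.0},
    {"ticker": "BBB", "price": 5.0},
    {"ticker": "AAA", "price": 3.0},
    {"ticker": "CCC", "price": 4.0},
    {"ticker": "BBB", "price": 2.0},
]

partitions = hash_partition(rows, 3, "ticker")

# Every row for a given ticker lands in the same partition, so the
# per-partition results never overlap and can simply be merged.
results = {}
for part in partitions:
    results.update(min_max_by_ticker(part))
```

Partitioning on the same column that the query groups by is what makes the per-partition answers final: no second aggregation pass over the merged output is needed.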
For detailed guides and API reference:
We evaluated smallpond using the GraySort benchmark (script) on a cluster comprising 50 compute nodes and 25 storage nodes running 3FS. The benchmark sorted 110.5 TiB of data in 30 minutes and 14 seconds, achieving an average throughput of 3.66 TiB/min.
Details can be found in 3FS - Gray Sort.
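As a quick sanity check, the average throughput can be recomputed from the data size and elapsed time quoted above. This sketch uses only those published, rounded figures. The result comes out slightly below 3.66 TiB/min, presumably because the quoted size or time is itself rounded; that explanation is an assumption, not something the source states.

```python
# Figures as quoted in the benchmark summary (rounded as published)
data_tib = 110.5
elapsed_min = 30 + 14 / 60  # 30 minutes 14 seconds, in minutes

throughput = data_tib / elapsed_min  # TiB per minute
print(f"{throughput:.2f} TiB/min")
```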
    pip install .[dev]

    # run unit tests
    pytest -v tests/test*.py

    # build documentation
    pip install .[docs]
    cd docs
    make html
    python -m http.server --directory build/html
This project is licensed under the MIT License.