
deepseek-ai/smallpond


A lightweight data processing framework built on DuckDB and 3FS.

Features

  • 🚀 High-performance data processing powered by DuckDB
  • 🌍 Scalable to handle PB-scale datasets
  • 🛠️ Easy to operate, with no long-running services

Installation

Python 3.8 to 3.12 is supported.

pip install smallpond

Quick Start

# Download example data
wget https://duckdb.org/data/prices.parquet

import smallpond

# Initialize session
sp = smallpond.init()

# Load data
df = sp.read_parquet("prices.parquet")

# Process data
df = df.repartition(3, hash_by="ticker")
df = sp.partial_sql("SELECT ticker, min(price), max(price) FROM {0} GROUP BY ticker", df)

# Save results
df.write_parquet("output/")

# Show results
print(df.to_pandas())
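The Quick Start calls `repartition(3, hash_by="ticker")` before running a `GROUP BY` through `partial_sql`. The pure-Python sketch below illustrates why that ordering matters, assuming (as the name suggests) that `partial_sql` runs the query separately on each partition, with `{0}` standing for the input dataframe. Hashing on `ticker` sends every row for a given ticker to the same partition, so the per-partition aggregates can simply be concatenated. This is not smallpond's implementation and uses neither smallpond nor DuckDB; the helper names (`partition_of`, `hash_repartition`, `group_min_max`), the choice of CRC32 as the hash, and the sample rows are all hypothetical.

```python
# Conceptual sketch (NOT smallpond's implementation): why hash-repartitioning
# by "ticker" lets a per-partition GROUP BY produce globally correct results.
import zlib


def partition_of(key: str, num_partitions: int) -> int:
    # Hypothetical hash choice: CRC32 is deterministic across processes,
    # unlike Python's built-in hash() for strings.
    return zlib.crc32(key.encode("utf-8")) % num_partitions


def hash_repartition(rows, key, num_partitions):
    """Route every row to the partition chosen by hashing its key column."""
    parts = [[] for _ in range(num_partitions)]
    for row in rows:
        parts[partition_of(row[key], num_partitions)].append(row)
    return parts


def group_min_max(rows):
    """Plain-Python stand-in for: SELECT ticker, min(price), max(price) ... GROUP BY ticker."""
    stats = {}
    for row in rows:
        ticker, price = row["ticker"], row["price"]
        lo, hi = stats.get(ticker, (price, price))
        stats[ticker] = (min(lo, price), max(hi, price))
    return stats


# Made-up sample rows standing in for prices.parquet.
rows = [
    {"ticker": "AAA", "price": 10.0},
    {"ticker": "BBB", "price": 20.0},
    {"ticker": "AAA", "price": 12.5},
    {"ticker": "CCC", "price": 7.0},
    {"ticker": "BBB", "price": 18.0},
    {"ticker": "CCC", "price": 9.5},
]

partitions = hash_repartition(rows, "ticker", 3)

# Aggregate each partition independently, then concatenate the results.
partial_results = {}
for part in partitions:
    for ticker, agg in group_min_max(part).items():
        # Each ticker lives in exactly one partition, so no key repeats.
        assert ticker not in partial_results
        partial_results[ticker] = agg

# The concatenated per-partition answer equals a single global aggregation.
assert partial_results == group_min_max(rows)
print(sorted(partial_results.items()))
# prints [('AAA', (10.0, 12.5)), ('BBB', (18.0, 20.0)), ('CCC', (7.0, 9.5))]
```

If the rows were instead split without regard to `ticker` (for example round-robin), the same ticker could land in several partitions, and concatenating the per-partition outputs would yield duplicate tickers, each carrying only a partial min/max.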

Documentation

For detailed guides and the API reference, see the project documentation.

Performance

We evaluated smallpond using the GraySort benchmark (script) on a cluster comprising 50 compute nodes and 25 storage nodes running 3FS. The benchmark sorted 110.5 TiB of data in 30 minutes and 14 seconds, achieving an average throughput of 3.66 TiB/min.

Details can be found in 3FS - Gray Sort.

Development

pip install .[dev]

# run unit tests
pytest -v tests/test*.py

# build documentation
pip install .[docs]
cd docs
make html
python -m http.server --directory build/html

License

This project is licensed under the MIT License.
