feat: primitive parquet reader with page pruning #3199


Draft
kszucs wants to merge 1 commit into huggingface:main from kszucs:libviewer

kszucs (Member) commented Jun 10, 2025 (edited)

Prototype implementation of an arrow-rs based, page-pruning Parquet reader for low-latency limit/offset queries.

It is a standalone library for now and has not been integrated into the viewer yet.

Install

cd libs/libviewer
pip install maturin
maturin develop -r

Index Dataset

dv --use-cache nvidia/OpenCodeReasoning index

This uses huggingface_hub to download and cache the dataset files,
then creates a metadata file for each Parquet file in the dataset with
the offset index included.
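
To illustrate what the offset index carries, here is a minimal sketch in Python. The field names follow the `PageLocation` struct of the Parquet format (`offset`, `compressed_page_size`, `first_row_index`). The helper `build_offset_index` is hypothetical and is not part of this PR; it assumes the pages of a column chunk are laid out back to back in the file.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PageLocation:
    offset: int                # byte position of the page in the file
    compressed_page_size: int  # size of the page in bytes, header included
    first_row_index: int       # index of the page's first row in its row group


def build_offset_index(start_offset, pages):
    """Hypothetical helper: derive page locations for one column chunk.

    `pages` is an iterable of (compressed_size, num_rows) tuples in file
    order. Assumes the pages are contiguous, starting at `start_offset`.
    """
    locations = []
    byte_pos, row_pos = start_offset, 0
    for size, num_rows in pages:
        locations.append(PageLocation(byte_pos, size, row_pos))
        byte_pos += size      # next page starts right after this one
        row_pos += num_rows   # running row count gives the next first_row_index
    return locations
```

With this information stored next to the file metadata, a reader can map a row number to a byte range without touching the data pages themselves.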

Remove --use-cache to download the files directly from the hub.

Execute a limit/offset query

dv --use-cache nvidia/OpenCodeReasoning query --limit 10 --offset 0

This queries the dataset using the local metadata index files.
The scanner reads only the Parquet pages it needs, which minimizes
network traffic.
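
The page selection behind a limit/offset query can be sketched roughly as follows. This is an illustrative Python sketch, not the arrow-rs code in this PR: `pages_for_slice` is a hypothetical name, and it assumes a single column chunk whose sorted `first_row_index` values (starting at 0) are already known from the offset index.

```python
from bisect import bisect_right


def pages_for_slice(first_rows, num_rows, offset, limit):
    """Return the indices of pages overlapping rows [offset, offset + limit).

    `first_rows` holds the sorted first_row_index of each page (first entry
    is 0) and `num_rows` is the total row count of the column chunk.
    """
    start = min(offset, num_rows)
    stop = min(offset + limit, num_rows)
    if start >= stop:
        return range(0)  # nothing to read: empty or out-of-range slice
    # The page containing row r is the last page whose first row is <= r.
    first = bisect_right(first_rows, start) - 1
    last = bisect_right(first_rows, stop - 1) - 1
    return range(first, last + 1)
```

Only the byte ranges of the returned pages need to be fetched; every other page in the column chunk can be skipped.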

Remove --use-cache to query the data directly from the hub.

Integration and testing

Before covering it with tests, it would be nice to see what API the integration needs.

lhoestq reacted with heart emoji
lhoestq (Member) commented

back to this PR - sorry for the delay

lhoestq (Member) commented

I created #3213 to continue this PR and integrate it into the /rows service :)


2 participants
@kszucs @lhoestq
