[Xet] Basic shard creation #1633

Open
coyotte508 wants to merge 7 commits into main from shard-creation

coyotte508 (Member) commented Jul 16, 2025:

cc @Kakulukian @assafvayner for viz, follow-up to #1616

Based on https://github.com/huggingface/xet-core/blob/7e41fb0dd7cfb276222b9668d0b97a984647721e/spec/shard.md

Need to handle:

  • split into multiple shards when xorb or file info grows too big
  • uploading xorbs & shards (and we need to upload xorbs before shards referencing them)
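
The ordering constraint in the second item can be sketched as a small planning step. This is only an illustration under stated assumptions: `PendingXorb`, `PendingShard`, `UploadStep` and `planUploadOrder` are hypothetical names, not types or functions from this PR, and the sketch assumes every xorb a shard references is part of the same batch.

```typescript
// Hypothetical shapes for illustration; not the types used in the PR.
interface PendingXorb {
  hash: string;
}

interface PendingShard {
  id: string;
  // Hashes of the xorbs this shard references.
  xorbHashes: string[];
}

type UploadStep = { kind: "xorb"; hash: string } | { kind: "shard"; id: string };

// Returns an upload order in which every xorb precedes every shard.
// Throws if a shard references a xorb that is not in the batch
// (an assumption of this sketch; a real client might accept xorbs
// that were uploaded earlier).
function planUploadOrder(xorbs: PendingXorb[], shards: PendingShard[]): UploadStep[] {
  const steps: UploadStep[] = [];
  const planned = new Set<string>();

  for (const xorb of xorbs) {
    if (planned.has(xorb.hash)) continue; // identical xorbs need only one upload
    planned.add(xorb.hash);
    steps.push({ kind: "xorb", hash: xorb.hash });
  }

  for (const shard of shards) {
    const missing = shard.xorbHashes.filter((hash) => !planned.has(hash));
    if (missing.length > 0) {
      throw new Error(`shard ${shard.id} references unknown xorbs: ${missing.join(", ")}`);
    }
    steps.push({ kind: "shard", id: shard.id });
  }

  return steps;
}
```

Keeping the plan separate from the network calls makes the "xorbs first" rule easy to check without a server.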

Comment on lines +4 to +5
export function compute_range_verification_hash(chunkHashes: string[]): string;
export function compute_file_hash(chunks_array: Array<{ hash: string; length: number }>): string;
coyotte508 (Member, Author):

@assafvayner I need those two functions from the wasm :)

(Also, versions of those two, or at least the last one, with .update would be nice eventually.)

assafvayner (Collaborator):

What do you mean by .update for those functions?

coyotte508 (Member, Author):

A version where you can feed it data progressively before calling finalize() to get the hash.
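
As a rough sketch of the call pattern being asked for, the wrapper below exposes update() and finalize() on top of a one-shot function. It only buffers the chunks and hashes them all at finalize(), so it is not a true incremental hash. `BufferedFileHasher` and `OneShotFileHash` are hypothetical names; the one-shot function is injected, assuming it has the same shape as the compute_file_hash declaration shown above.

```typescript
type Chunk = { hash: string; length: number };

// Assumed to match the shape of the one-shot compute_file_hash declaration.
type OneShotFileHash = (chunks: Chunk[]) => string;

// Hypothetical wrapper: collects chunks via update(), hashes them at finalize().
class BufferedFileHasher {
  private readonly oneShot: OneShotFileHash;
  private readonly chunks: Chunk[] = [];
  private finalized = false;

  constructor(oneShot: OneShotFileHash) {
    this.oneShot = oneShot;
  }

  // Feed one chunk; returns `this` so calls can be chained.
  update(chunk: Chunk): this {
    if (this.finalized) throw new Error("update() called after finalize()");
    this.chunks.push(chunk);
    return this;
  }

  // Hash everything fed so far; the hasher cannot be reused afterwards.
  finalize(): string {
    if (this.finalized) throw new Error("finalize() called twice");
    this.finalized = true;
    return this.oneShot(this.chunks);
  }
}
```

Memory use still grows with the number of chunks here; a real streaming version would have to keep only partial hash state between update() calls.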

assafvayner (Collaborator), Jul 17, 2025:

Let's keep the xorb and range hash computation simple and take just an array of items, since those have a roughly reasonable limit of ~1K items.

For the file hash I can see the value, but we don't have this feature implemented in xet-core yet, and it might be a while (it's not simple). For now there's just a compute_file_hash function that takes all the chunks at once, but we may be able to update that later.

coyotte508 (Member, Author), Jul 17, 2025:

Hmm, I don't see the difference between the range hash and the file hash: they both have all the chunk hashes for a file, no? (The only difference is that the file hash has chunk lengths too.)

> the file hash I can see the value but we don't have this feature implemented in xet-core yet, and might be a while (it's not simple)

Yes, no problem.

assafvayner (Collaborator):

The range hash covers at most one xorb's worth of hashes (this is a bit odd to explain, which is why we need to write the whole spec).

let's say a file has the following structure:

xorb A chunks 0-1024 (out of 1024)
xorb B chunks 0-500 (out of 1024)
xorb A chunks 1-44

Then the range hashes for the verification section of the shard containing this file info will need to have:

range_hash(xorb_A.chunks_hashes.slice(0, 1025))
range_hash(xorb_B.chunks_hashes.slice(0, 501))
range_hash(xorb_A.chunks_hashes.slice(1, 45))

Notice that every reasonable input to the range_hash function contains at most as many hashes as there are chunks in a xorb.
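
The example above could be sketched as follows: one range hash per file segment, each computed over a slice of a single xorb's chunk hashes. `FileSegment`, `RangeHash` and `verificationHashes` are hypothetical names for illustration. The sketch reads ranges such as "chunks 1-44" as inclusive on both ends, matching the slice(1, 45) call in the example, and takes the range hash function as a parameter, assuming it has the shape of the compute_range_verification_hash declaration in this PR.

```typescript
// Hypothetical shapes for illustration; not the types used in the PR.
interface FileSegment {
  xorbId: string;
  firstChunk: number; // inclusive
  lastChunk: number; // inclusive, as in "chunks 1-44"
}

// Assumed to match the shape of compute_range_verification_hash.
type RangeHash = (chunkHashes: string[]) => string;

// Computes one verification hash per segment of a file.
function verificationHashes(
  segments: FileSegment[],
  xorbChunkHashes: Map<string, string[]>,
  rangeHash: RangeHash,
): string[] {
  return segments.map((segment) => {
    const hashes = xorbChunkHashes.get(segment.xorbId);
    if (hashes === undefined) {
      throw new Error(`unknown xorb ${segment.xorbId}`);
    }
    if (segment.firstChunk < 0 || segment.lastChunk < segment.firstChunk) {
      throw new Error(`invalid chunk range for xorb ${segment.xorbId}`);
    }
    // slice() excludes its end index, hence the +1. The slice can never
    // hold more hashes than the xorb itself has chunks.
    return rangeHash(hashes.slice(segment.firstChunk, segment.lastChunk + 1));
  });
}
```

Because each call only ever sees a slice of one xorb, a plain array argument stays small, which fits the earlier point about keeping the range hash a one-shot function.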

Reviewers: assafvayner left review comments


2 participants: @coyotte508, @assafvayner
