GatherBlockQuantized: Fix 4 bit uint8 case #26506

Draft
jambayk wants to merge 3 commits into main from jambayk/gbq

jambayk (Contributor) commented Nov 5, 2025 (edited)

Description

  • When uint8 packing is used for 4-bit data, the packing happens along the quantization axis.
    • When the number of blocks is odd, each row of the zero-point tensor contains one additional padding block. The zero-point indexing is updated to account for this.
    • For the data tensor, the implementation appears to assume that the quantization dimension is divisible by the block size. This packing is supported so that weights can be shared with the LM head, which uses MatMulNBits, and that sharing only works when the quantization dimension is divisible by the block size; otherwise the final block of each data row would carry extra padding. Because the block size is a power of 2, this assumption means the data rows contain no padding. Without the assumption, the data indexing logic would need to be updated as well.
      • Even if the above assumption does not hold, the quantization dimension is still assumed to be even.
  • Fixed the default zero-point value for the uint8 case in the CUDA implementation.
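
The zero-point indexing issue for an odd number of blocks can be illustrated with a small sketch. This is plain Python for illustration only, not the C++/CUDA code in this PR. The function names, the low-nibble-first packing order, and the sample values are all assumptions made for the example.

```python
def pack_zero_points(rows):
    """Pack 4-bit zero points two per byte, padding each row to a whole byte.

    Assumption: the even-indexed block goes in the low nibble.
    """
    packed = []
    for row in rows:
        # An odd block count leaves one unused (padding) nibble per row.
        padded = list(row) + [0] * (len(row) % 2)
        for lo, hi in zip(padded[0::2], padded[1::2]):
            packed.append((lo & 0xF) | ((hi & 0xF) << 4))
    return bytes(packed)


def read_zero_point(packed, row, block, n_blocks):
    """Read the zero point for (row, block) using the padded row stride."""
    row_stride = (n_blocks + 1) // 2  # bytes per row, including padding
    byte = packed[row * row_stride + block // 2]
    return (byte >> 4) & 0xF if block % 2 else byte & 0xF


def read_zero_point_flat(packed, row, block, n_blocks):
    """Padding-unaware variant, shown only for contrast."""
    flat = row * n_blocks + block
    byte = packed[flat // 2]
    return (byte >> 4) & 0xF if flat % 2 else byte & 0xF


# 2 rows with 3 blocks each: an odd block count, so each row is padded.
zero_points = [[1, 2, 3], [4, 5, 6]]
packed = pack_zero_points(zero_points)

recovered = [
    [read_zero_point(packed, r, b, 3) for b in range(3)] for r in range(2)
]
```

With the padded row stride, `recovered` matches `zero_points`. The padding-unaware variant lands on the padding nibble of the first row when asked for the first block of the second row, which is the kind of mismatch the updated indexing is meant to avoid.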

Motivation and Context


Reviewers

xiaomsft: awaiting requested review
