add support for 64 block size on 32 warp size supported amd gpus #1748


Merged
matthewdouglas merged 14 commits into bitsandbytes-foundation:main from electron271:main on Nov 13, 2025

Conversation

@electron271 (Contributor)

Per https://rocm.docs.amd.com/en/latest/reference/gpu-arch-specs.html, most non-Instinct GPUs support a warp size of 32.

Tested on an RX 9070 XT; looking into getting this tested on AMD Instinct accelerators to ensure GPUs with a warp size of 64 still work.
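
A minimal sketch of how the warp (wavefront) size can be queried at runtime through PyTorch, assuming a build that exposes `warp_size` on device properties; the helper this PR actually adds may be implemented differently, and the function name below is illustrative:

```python
import torch

def get_warp_size(device: int = 0) -> int:
    # Illustrative only: PyTorch exposes the device's warp (wavefront)
    # size via device properties. On ROCm this is 64 for Instinct/CDNA
    # parts and 32 for most RDNA consumer GPUs; on CUDA it is always 32.
    return torch.cuda.get_device_properties(device).warp_size
```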

@matthewdouglas (Member)

Thanks for the PR! I don't have the bandwidth to test this personally at the moment, so I will defer to the AMD team. Also, I do not have any RDNA GPUs on hand.

cc: @pnunna93


@github-actions (bot)

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

@pnunna93 (Contributor) left a comment


Thanks for the PR! It's good to go once the warp size change is made.

@matthewdouglas (Member)

Hi @electron271,
There are still a couple of conflicts, mostly because we removed all of the imports related to IPEX. If you don't mind fixing those, I think we can merge after that. Thanks!

matthewdouglas previously approved these changes on Oct 3, 2025.
matthewdouglas added this to the v0.49.0 milestone on Oct 3, 2025.
@electron271 (Contributor, Author)

Will look through all this soon, sorry, I have been somewhat busy.

@matthewdouglas (Member)

Hi,
It looks like this breaks build compatibility for ROCm 6.1. I would be OK with dropping ROCm 6.1 compatibility if @pnunna93 agrees, but otherwise we would need to fix that build as well.

Apart from that, just a few linting issues to fix.

@pnunna93 (Contributor)

> It looks like this breaks build compatibility for ROCm 6.1. I would be OK with dropping ROCm 6.1 compatibility if @pnunna93 agrees, but otherwise we would need to fix that build as well.
>
> Apart from that, just a few linting issues to fix.

I agree, we can deprecate 6.1 compatibility

@matthewdouglas (Member)

I've opened #1788, which removes the ROCm 6.1 build.


@sstamenk (Contributor) commented on Nov 5, 2025 (edited)

Did some regression testing against the main branch on W7900 (gfx1100), R9700 (gfx1201), and MI300X (gfx942) using the rocm/vllm:latest Docker image. There don't seem to be any regressions. Of the 804 newly enabled tests on gfx1100 and gfx1201, 156 fail due to accuracy issues while the other 648 pass. Attaching some logs.

@matthewdouglas (Member)

Thanks @sstamenk, that's quite useful! The failing tests seem to be mostly gemv with fp32. I think that's OK for now and can be addressed separately.

@electron271 If we fix the lint issues and the merge conflict, I'm happy to merge this in!




ROCM_GPU_ARCH = get_rocm_gpu_arch()
ROCM_WARP_SIZE_64 = True if get_rocm_warpsize() == 64 else False
Review comment on the diff (Contributor):

Should we rename ROCM_WARP_SIZE_64 and get_rocm_warpsize() to something generic like WARP_SIZE_64 and get_warpsize(), since they technically cover both the HIP and CUDA cases? That would also make more sense for the unit test skip conditions. @matthewdouglas

Reply:

I understand the point, but practically speaking, warp size is always 32 on CUDA, so I'm OK with the naming as it is.

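
As a hedged illustration of the unit-test skip conditions mentioned above: the flag name comes from this PR's diff, but the import path and the test itself are hypothetical, so the test suite's actual decorators may differ.

```python
import pytest

# Hypothetical import path; ROCM_WARP_SIZE_64 is the flag from this PR's diff.
from bitsandbytes.cextension import ROCM_WARP_SIZE_64

@pytest.mark.skipif(
    ROCM_WARP_SIZE_64,
    reason="kernel path assumes a 32-wide warp/wavefront",
)
def test_blocksize_64_quantization():
    ...  # hypothetical test body
```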
matthewdouglas merged commit 3f9f6f3 into bitsandbytes-foundation:main on Nov 13, 2025.
53 checks passed.
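
For context, a sketch of what the merged change enables on 32-wide-warp AMD GPUs: 4-bit quantization with blocksize=64 via the public bitsandbytes functional API. Treat this as an illustration under those assumptions, not the PR's own test code.

```python
import torch
import bitsandbytes.functional as F

x = torch.randn(4096, device="cuda", dtype=torch.float16)

# blocksize=64 previously relied on a 64-wide wavefront in ROCm builds;
# after this PR it also works on RDNA GPUs, where the warp size is 32.
q, state = F.quantize_4bit(x, blocksize=64)
y = F.dequantize_4bit(q, state)
```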

Reviewers: @matthewdouglas, @pnunna93 (+1 more), @sstamenk
Milestone: v0.49.0
Participants: @electron271, @matthewdouglas, @pnunna93, @sstamenk
