add support for 64 block size on 32 warp size supported amd gpus #1748
Conversation
matthewdouglas commented Sep 8, 2025
Thanks for the PR! I don't have the bandwidth to test this personally at the moment, so I will defer to the AMD team. Also, I do not have any RDNA GPUs on hand. cc: @pnunna93
The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.
pnunna93 left a comment
Thanks for the PR! It's good to go once the warp size change is made.
matthewdouglas commented Oct 3, 2025
Hi @electron271
electron271 commented Oct 20, 2025
Will look through all this soon, sorry, have been somewhat busy.
Reuse BNB_WARP_SIZE macro
matthewdouglas commented Oct 27, 2025
Hi, apart from that, just a few linting issues to fix.
pnunna93 commented Oct 28, 2025
I agree, we can deprecate 6.1 compatibility.
matthewdouglas commented Oct 28, 2025
I've opened #1788, which removes the ROCm 6.1 build.
sstamenk commented Nov 5, 2025 • edited
Did some regression testing against the main branch on W7900 (gfx1100), R9700 (gfx1201), and MI300x (gfx942) using the rocm/vllm:latest Docker image. There don't seem to be any regressions. Of the 804 newly enabled tests on gfx1100 and gfx1201, 156 fail due to accuracy issues while the other 648 pass. Attaching some logs:
matthewdouglas commented Nov 5, 2025
Thanks @sstamenk - that's quite useful! The failing tests seem to be mostly gemv with fp32. I think that's OK for now and can be addressed separately. @electron271 If we fix the lint issues and the merge conflict, I'm happy to merge this in!
ROCM_GPU_ARCH = get_rocm_gpu_arch()
ROCM_WARP_SIZE_64 = True if get_rocm_warpsize() == 64 else False
Should we rename ROCM_WARP_SIZE_64 and get_rocm_warpsize() to something generic like WARP_SIZE_64 and get_warpsize(), since it technically covers both the HIP and CUDA cases? It would also make more sense for the unit test skip conditions. @matthewdouglas
I understand the point, but practically speaking, warp size is always 32 on CUDA, so I'm OK with the naming as it is.
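For reference, the helper under discussion might look roughly like the sketch below. The names get_rocm_warpsize() and ROCM_WARP_SIZE_64 come from the diff above, but the body is an assumption, not the actual bitsandbytes implementation; it falls back to 32, the universal CUDA (and RDNA) warp size, when no device is available.

```python
# Hypothetical sketch of the warp-size helper discussed above. The function
# name matches the PR diff; this implementation is an illustrative assumption.
def get_rocm_warpsize() -> int:
    """Return the active device's warp/wavefront size, defaulting to 32."""
    try:
        import torch

        if torch.cuda.is_available():
            # torch exposes the device warp size in its device properties
            # (on ROCm builds this reflects the hardware wavefront size)
            return torch.cuda.get_device_properties(0).warp_size
    except ImportError:
        pass
    # Every current NVIDIA GPU, and AMD RDNA GPUs, use a 32-wide warp
    return 32


ROCM_WARP_SIZE_64 = get_rocm_warpsize() == 64
```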
Merged 3f9f6f3 into bitsandbytes-foundation:main
Per https://rocm.docs.amd.com/en/latest/reference/gpu-arch-specs.html, most non-Instinct GPUs support a 32 warp size.
Tested on an RX 9070 XT; looking into getting this tested on AMD Instinct accelerators to ensure GPUs with a 64 warp size still work.
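The architecture split mentioned above can be summarized in a small sketch: GCN/CDNA parts (gfx9xx, i.e. the Instinct line) use a 64-wide wavefront, while RDNA parts (gfx10xx and newer, such as the GPUs tested in this PR) use 32. The helper name is illustrative, not from bitsandbytes.

```python
# Hypothetical helper mapping a ROCm gfx architecture string to its hardware
# wavefront size, per the AMD GPU architecture specs linked above.
def wavefront_size(gfx_arch: str) -> int:
    """Return the wavefront size for a ROCm gfx architecture string."""
    if gfx_arch.startswith("gfx9"):
        return 64  # GCN / CDNA (Instinct) GPUs
    return 32      # RDNA consumer GPUs (gfx10xx, gfx11xx, gfx12xx)


assert wavefront_size("gfx942") == 64   # MI300x
assert wavefront_size("gfx1100") == 32  # W7900
assert wavefront_size("gfx1201") == 32  # R9700 / RX 9070 XT
```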