Disable KAI int4mm kernels when no bf16 HW support. #170788
base:gh/robert-hardwick/12/base
pytorch-bot bot commented Dec 18, 2025 • edited
🔗 Helpful Links
🧪 See artifacts and rendered test results at hud.pytorch.org/pr/170788
Note: Links to docs will display an error until the docs builds have been completed.
❌ 1 New Failure
As of commit ba0a9fe with merge base 3854d69: NEW FAILURE - The following job has failed:
This comment was automatically generated by Dr. CI and updates every 15 minutes.
This PR needs a |
robert-hardwick commented Dec 18, 2025
@pytorchbot label "topic: not user facing"
robert-hardwick commented Dec 18, 2025
This should be added to the 2.10 patch release, as it's a regression, but I will wait for CI tests to pass first.
    if platform.machine().lower() not in ("arm64", "aarch64"):
        return False
    try:
        with open("/proc/cpuinfo") as f:
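The excerpt above is cut off after the `open` call. A minimal sketch of how such a `/proc/cpuinfo` check could be completed is shown here; it is not the code from this PR. The function names `parse_cpuinfo_features` and `has_arm_bf16_hw` are hypothetical, and the sketch assumes a Linux AArch64 kernel that lists `bf16` on the `Features` line.

```python
import platform
from typing import Optional, Set


def parse_cpuinfo_features(cpuinfo_text: str) -> Set[str]:
    """Collect the flags from every 'Features' line of /proc/cpuinfo-style text."""
    features: Set[str] = set()
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip().lower() == "features":
            features.update(value.split())
    return features


def has_arm_bf16_hw(
    machine: Optional[str] = None, cpuinfo_path: str = "/proc/cpuinfo"
) -> bool:
    """Hypothetical helper: True only on Arm64 when 'bf16' is advertised."""
    # Assumption: 'machine' and 'cpuinfo_path' are parameters added for
    # testability; the excerpt reads platform.machine() and /proc/cpuinfo directly.
    machine = platform.machine() if machine is None else machine
    if machine.lower() not in ("arm64", "aarch64"):
        return False
    try:
        with open(cpuinfo_path) as f:
            return "bf16" in parse_cpuinfo_features(f.read())
    except OSError:
        # No readable cpuinfo (e.g. non-Linux): report no bf16 support.
        return False
```

Treating an unreadable file as "no bf16" is a conservative choice made for this sketch. The review below suggests replacing this kind of text parsing with a wrapper around `cpuinfo_has_arm_bf16()`.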
A clean way to expose an API: https://github.com/pytorch/pytorch/blob/main/torch/_C/_cpu.pyi and use cpuinfo_has_arm_bf16()?
Ah ok, yeah, I didn't know about that file. I was looking for direct exposure of cpuinfo functions but didn't realise we had some existing wrappers. Yeah, we should add a wrapper function in https://github.com/pytorch/pytorch/blob/main/aten/src/ATen/cpu/Utils.cpp and in https://github.com/pytorch/pytorch/blob/main/torch/_C/_cpu.pyi. I will make that change tomorrow.
Can we perhaps separate eager and compile regressions and fix them in separate PRs?
Also, it feels a bit weird that Meta registrations need to check runtime rather than compile-time capabilities. Maybe we need to do the rerouting at a lower level.
malfet left a comment • edited
Eager part looks good to me. Let's move the compiler part and discuss it in different PR(s), i.e. maybe I should finally write a long-overdue PR that exposes CPU capabilities as a dictionary.
Stack from ghstack (oldest at bottom):
Ideally, we would have disabled this only for bf16 and kept float32 going through the Kleidi kernels. However, because weights are packed differently for Kleidi than for the fallback kernel, this wasn't possible: we don't know the input dtype at weight-packing time, so we had to disable Kleidi entirely when there is no bf16 hardware support.
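To illustrate the constraint described above, here is a rough Python sketch and not the actual ATen implementation. `PackedWeight`, `pack_weight`, `select_kernel`, and the layout and kernel name strings are all hypothetical. The sketch only models the point that the layout is fixed at packing time, before any input dtype is seen, so the matmul must use whichever kernel matches that layout.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PackedWeight:
    # Hypothetical tag for how the int4 weight was packed: "kai" or "fallback".
    layout: str


def pack_weight(kai_available: bool, has_bf16_hw: bool) -> PackedWeight:
    """Choose the packing layout. The input dtype is not known at this point,
    so the decision can depend only on build and hardware capabilities."""
    if kai_available and has_bf16_hw:
        return PackedWeight(layout="kai")
    return PackedWeight(layout="fallback")


def select_kernel(packed: PackedWeight, input_dtype: str) -> str:
    """Pick a kernel at matmul time. It must match the packed layout, so a
    float32 input cannot use the Kleidi kernel on fallback-packed weights."""
    prefix = "kai_int4mm" if packed.layout == "kai" else "fallback_int4mm"
    return f"{prefix}[{input_dtype}]"
```

Under these assumptions, a machine without bf16 support packs with the fallback layout, and both `float32` and `bfloat16` inputs are then served by the fallback kernel.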
Fixes failing unit test on AArch64 CPU without bf16
Fixes #170787
cc @jgong5 @mingfeima @XiaobingSuper @sanchitintel @ashokei @jingxu10 @jerryzh168 @aditew01