Ideally, we would have disabled this only for bf16 and kept float32 going through the kleidi kernels. That wasn't possible, because kleidi packs the weights differently from the fallback kernel and we don't know the input dtype at weight packing time. So we had to disable kleidi entirely when there is no bf16 hardware support.
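To illustrate the constraint described above, here is a minimal sketch of why the decision has to be made on hardware capability alone. It is not the PR's code: `PackFormat`, `choose_pack_format`, `choose_matmul_kernel`, and the kernel name strings are all hypothetical names invented for this illustration.

```python
from enum import Enum


class PackFormat(Enum):
    """Hypothetical weight layouts; the two are not interchangeable."""
    KLEIDI = "kleidi"
    FALLBACK = "fallback"


def choose_pack_format(kleidi_available: bool, has_bf16_hw: bool) -> PackFormat:
    # The input dtype is unknown at packing time, so the choice can only
    # depend on what the hardware supports. Without bf16 hardware support
    # the fallback layout is used for every dtype, float32 included.
    if kleidi_available and has_bf16_hw:
        return PackFormat.KLEIDI
    return PackFormat.FALLBACK


def choose_matmul_kernel(pack_format: PackFormat, input_dtype: str) -> str:
    # The matmul kernel has to match the layout chosen at packing time,
    # so it cannot switch per dtype after the weights are packed.
    if pack_format is PackFormat.KLEIDI:
        return f"kleidi_{input_dtype}"
    return f"fallback_{input_dtype}"
```

Under these assumptions, a float32 input on a CPU without bf16 support also ends up on the fallback path, because its weights were already packed in the fallback layout.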

Fixes failing unit test on AArch64 CPU without bf16

Fixes #170787

cc @jgong5 @mingfeima @XiaobingSuper @sanchitintel @ashokei @jingxu10 @jerryzh168 @aditew01

Update

ba0a9fe

[ghstack-poisoned]

pytorch-bot bot commented Dec 18, 2025 (edited)

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/170788

📄 Preview Python docs built from this PR
📄 Preview C++ docs built from this PR
❓ Need help or want to give feedback on the CI? Visit the bot commands wiki

Note: Links to docs will display an error until the docs builds have been completed.

❌ 1 New Failure

As of commit ba0a9fe with merge base 3854d69:

NEW FAILURE - The following job has failed:

Check Labels / Check labels (gh)
RuntimeError: Error checking labels: PR does not have required labels

This comment was automatically generated by Dr. CI and updates every 15 minutes.

pytorch-bot bot added the `module: cpu` label (CPU specific problem, e.g., perf, algorithm)

Dec 18, 2025

robert-hardwick added a commit that referenced this pull request

Dec 18, 2025

Disable KAI int4mm kernels when no bf16 HW support.

80928ec

Fixes failing unit test on AArch64 CPU without bf16
ghstack-source-id: 78e112b
Pull-Request: #170788

github-actions bot (Contributor) commented Dec 18, 2025

This PR needs a `release notes:` label

If your changes are user facing and intended to be a part of release notes, please use a label starting with `release notes:`.

If not, please add the `topic: not user facing` label.

To add a label, you can comment to pytorchbot, for example
@pytorchbot label "topic: not user facing"

For more information, see
https://github.com/pytorch/pytorch/wiki/PyTorch-AutoLabel-Bot#why-categorize-for-release-notes-and-how-does-it-work.

robert-hardwick (Collaborator, Author) commented Dec 18, 2025

@pytorchbot label "topic: not user facing"

pytorch-bot bot added the `topic: not user facing` label (topic category)

Dec 18, 2025

robert-hardwick added the `ciflow/linux-aarch64` label (linux aarch64 CI workflow) and removed the `topic: not user facing` label (topic category)

Dec 18, 2025

robert-hardwick (Collaborator, Author) commented Dec 18, 2025

This should be added to the 2.10 patch release, as it's a regression, but I will wait for the CI tests to pass first.

robert-hardwick requested review from aditew01, malfet and nikhil-arm

December 18, 2025 17:38

aditew01 requested changes

Dec 18, 2025


torch/_meta_registrations.py

		if platform.machine().lower() not in ("arm64", "aarch64"):
		    return False
		try:
		    with open("/proc/cpuinfo") as f:

aditew01 (Collaborator), Dec 18, 2025

Would a cleaner way be to expose an API in https://github.com/pytorch/pytorch/blob/main/torch/_C/_cpu.pyi and use `cpuinfo_has_arm_bf16()`?

robert-hardwick (Collaborator, Author), Dec 18, 2025

Ah ok, I didn't know about that file. I was looking for direct exposure of the cpuinfo functions but didn't realise we had some existing wrappers. Yes, we should add a wrapper function in
https://github.com/pytorch/pytorch/blob/main/aten/src/ATen/cpu/Utils.cpp and in https://github.com/pytorch/pytorch/blob/main/torch/_C/_cpu.pyi. I will make that change tomorrow.
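For reference, the `/proc/cpuinfo` approach under review could be sketched roughly as follows. The snippet quoted in this thread is truncated, so this is not the PR's actual code: the function names `cpuinfo_text_has_bf16` and `has_arm_bf16_hw` are hypothetical, and the sketch assumes that Linux reports the capability as a `bf16` token on the `Features` line of `/proc/cpuinfo`.

```python
import platform


def cpuinfo_text_has_bf16(cpuinfo_text: str) -> bool:
    """Return True if any 'Features' line lists the exact token 'bf16'."""
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip().lower() == "features":
            # Compare whole tokens so that flags such as 'svebf16'
            # are not mistaken for plain 'bf16'.
            if "bf16" in value.split():
                return True
    return False


def has_arm_bf16_hw(cpuinfo_path: str = "/proc/cpuinfo") -> bool:
    """Hypothetical runtime check: AArch64 machine with bf16 reported."""
    if platform.machine().lower() not in ("arm64", "aarch64"):
        return False
    try:
        with open(cpuinfo_path) as f:
            return cpuinfo_text_has_bf16(f.read())
    except OSError:
        # No readable cpuinfo file (e.g. non-Linux): assume no support.
        return False
```

Splitting the text parsing from the file access keeps the parsing testable on any machine. A wrapper around the cpuinfo library, as suggested above, would avoid the file parsing altogether.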

malfet (Contributor), Dec 18, 2025

Can we perhaps separate the eager and compile regressions and fix them in separate PRs?

Also, it feels a bit weird that Meta registrations need to check runtime rather than compile-time capabilities. Maybe we need to do the rerouting at a lower level.

pytorchbot added the open source label

Dec 18, 2025

malfet requested changes

Dec 18, 2025

malfet (Contributor) left a comment (edited)

The eager part looks good to me. Let's move the compiler part out and discuss it in separate PR(s); maybe I should finally write the long-overdue PR that exposes CPU capabilities as a dictionary.
