[Intel GPU] OneDNN primitive cache support for Int4 WOQ gemm on XPU #147693
Conversation
pytorch-bot bot commented Feb 23, 2025 • edited
🔗 Helpful Links 🧪 See artifacts and rendered test results at hud.pytorch.org/pr/147693
Note: Links to docs will display an error until the docs builds have been completed. ✅ No Failures. As of commit 562af36 with merge base 54f1f29. This comment was automatically generated by Dr. CI and updates every 15 minutes.
linux-foundation-easycla bot commented Feb 23, 2025 • edited
- const c10::optional<Tensor>& bias,
+ const std::optional<Tensor>& bias,
EikanWang commented Feb 24, 2025
@baodii, have you signed the EasyCLA?
(nit) Do you have plans to use src_desc(), ... queries elsewhere?
If not, why not call query_md() directly in the methods that create memory objects (like here)?
Hi, we use src_desc() the way the primitive_desc class does, to keep the code consistent with oneDNN.
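As a rough illustration of that convention, here is a minimal sketch (the wrapper name, members, and helper are assumptions, not the PR's actual primitive_ext code) of accessors that keep oneDNN's primitive_desc-style names and simply forward to query_md():

```cpp
#include <oneapi/dnnl/dnnl.hpp>
#include <utility>

// Hypothetical wrapper: keeps oneDNN's primitive_desc-style query names so
// call sites read like plain oneDNN code, while memory objects are always
// created from the queried descriptors.
struct primitive_ext_sketch {
  dnnl::matmul::primitive_desc pd;
  dnnl::matmul prim;

  explicit primitive_ext_sketch(dnnl::matmul::primitive_desc desc)
      : pd(std::move(desc)), prim(pd) {}

  // Each accessor is just query_md() underneath, exactly like dnnl::primitive_desc.
  dnnl::memory::desc src_desc() const { return pd.query_md(dnnl::query::src_md, 0); }
  dnnl::memory::desc weights_desc() const { return pd.query_md(dnnl::query::weights_md, 0); }
  dnnl::memory::desc dst_desc() const { return pd.query_md(dnnl::query::dst_md, 0); }

  // Memory objects built from the queried descriptors always match the primitive.
  dnnl::memory make_src(const dnnl::engine& eng, void* handle) const {
    return dnnl::memory(src_desc(), eng, handle);
  }
};
```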
(readability) maybe use a better name than m? (e.g. mem_arg_cache?)
Also, if you decide to hoist make_args calls out of the execute function, you might need to make this an unordered_map.
mem_arg_cache is better than m. I'll take it.
We adopted unordered_map at first, but when we tested on a BMG client with a low-end CPU, the unordered_map hurt performance. So we chose the simpler array.
What are slots 0 and 1 reserved for?
Since we only provide int4 gemm this time, slots 0 and 1 are reserved for the scale and zero point (zp) of the int4 gemm.
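A minimal sketch of what such a fixed-slot argument array could look like (the slot layout comes from this discussion; the struct name, array size, and helpers are assumptions): slots 0 and 1 always hold the weight scale and zero point, and the remaining exec args are appended after them, so the array can be consumed without building an unordered_map on every call.

```cpp
#include <oneapi/dnnl/dnnl.hpp>
#include <array>
#include <cstddef>

// Hypothetical fixed-size exec-arg cache: cheaper than an unordered_map on a
// low-end host CPU, at the cost of a fixed slot convention.
struct int4_gemm_args_sketch {
  static constexpr std::size_t kScaleSlot = 0;  // int4 weight scales
  static constexpr std::size_t kZpSlot = 1;     // int4 weight zero points
  std::array<dnnl_exec_arg_t, 8> args{};
  std::size_t count = 2;  // slots 0 and 1 are always reserved for int4 gemm

  void set_scale(const dnnl::memory& m) {
    args[kScaleSlot] = {DNNL_ARG_ATTR_SCALES | DNNL_ARG_WEIGHTS, m.get()};
  }
  void set_zero_point(const dnnl::memory& m) {
    args[kZpSlot] = {DNNL_ARG_ATTR_ZERO_POINTS | DNNL_ARG_WEIGHTS, m.get()};
  }
  void append(int dnnl_arg, const dnnl::memory& m) {
    // e.g. DNNL_ARG_SRC / DNNL_ARG_WEIGHTS / DNNL_ARG_DST
    args[count++] = {dnnl_arg, m.get()};
  }
};
```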
For that test to work, there is an assumption that the memory::desc is constant across multiple calls to execute. Just two notes on that:
- this will no longer be true if you start using the oneDNN runtime_dimension feature (in that case, you will have to call make_args with the proper shape on every execute call).
- if you don't use the oneDNN runtime_dimension feature, you can actually hoist the make_args calls out of this execute call (e.g. into the primitive_ext constructor) and only use set_data_handle here (see the sketch below).
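A sketch of that suggestion under the static-shape assumption (names and structure are illustrative, not the PR's code): the memory objects and the argument map are built once from the primitive_desc, and execute only swaps data handles.

```cpp
#include <oneapi/dnnl/dnnl.hpp>
#include <unordered_map>

// Hypothetical: valid only when descriptors are static (no oneDNN runtime
// dimensions), so the make_args work happens once instead of on every execute.
struct cached_matmul_sketch {
  dnnl::matmul prim;
  dnnl::memory src_m, wei_m, dst_m;  // created once from the queried descriptors
  std::unordered_map<int, dnnl::memory> args;

  cached_matmul_sketch(const dnnl::matmul::primitive_desc& pd, const dnnl::engine& eng)
      : prim(pd),
        src_m(pd.src_desc(), eng, nullptr),
        wei_m(pd.weights_desc(), eng, nullptr),
        dst_m(pd.dst_desc(), eng, nullptr),
        args{{DNNL_ARG_SRC, src_m}, {DNNL_ARG_WEIGHTS, wei_m}, {DNNL_ARG_DST, dst_m}} {}

  void run(dnnl::stream& strm, void* src, void* wei, void* dst) {
    // Descriptors are fixed; only the underlying buffers change between calls.
    src_m.set_data_handle(src);
    wei_m.set_data_handle(wei);
    dst_m.set_data_handle(dst);
    prim.execute(strm, args);
  }
};
```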
Why not take the transpose and the types as parameters and hash them as part of the key?
That would allow a single cache instead of a separate cache for each combination (with separate caches it is harder to control the total cache size).
Also, do you plan to have a separate cache for each primitive?
Yes, the cache key relies only on shape information and nothing else. The other factors can be handled by switch cases, which greatly reduces hash conflicts and speeds up lookup.
Using an unordered_map in primitive_ext would help hide this implementation detail.
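For illustration, a shape-only key might look like the sketch below (the key fields, hash combine, and cache alias are assumptions; the PR's actual scheme may differ). Keeping one such map per transpose/dtype combination is what the switch-case approach amounts to, whereas folding those factors into the key would give a single, easier-to-bound cache.

```cpp
#include <cstddef>
#include <cstdint>
#include <functional>
#include <initializer_list>
#include <unordered_map>

// Hypothetical shape-only cache key for the int4 GEMM primitive cache.
struct gemm_shape_key {
  int64_t m, n, k, group_size;
  bool operator==(const gemm_shape_key& o) const {
    return m == o.m && n == o.n && k == o.k && group_size == o.group_size;
  }
};

struct gemm_shape_key_hash {
  std::size_t operator()(const gemm_shape_key& key) const {
    std::size_t h = std::hash<int64_t>{}(key.m);
    for (int64_t v : {key.n, key.k, key.group_size})
      h = h * 31 + std::hash<int64_t>{}(v);  // simple polynomial hash combine
    return h;
  }
};

// One cache per (transpose, dtype) combination; only the shape is hashed.
template <class CachedPrimitive>
using shape_keyed_cache =
    std::unordered_map<gemm_shape_key, CachedPrimitive, gemm_shape_key_hash>;
```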
To add the ciflow label: this helps ensure we don't trigger CI on this PR until it is actually authorized to do so. Please ping one of the reviewers if you do not have access to approve and run workflows.
a = a_bf16.to(dtype=dtype)
b = b_bf16.to(dtype=dtype)
b_scales = b_scales.to(dtype=dtype)
ref = torch.mm(a, b)
ZhiweiYan-96 Apr 24, 2025 • edited
Please do not change the test logic of the int4 gemm UT.
- GPTQ on XPU is not supported in AO for now; the UT is only intended for the RTN recipe.
- Using the dequantized weight to construct the reference is not recommended; we previously found that it would hide bugs in weight prepack.
- The UT follows the usage of the PyTorch-designed cases, and we want to keep the same logic.
ZhiweiYan-96 commented Apr 24, 2025 • edited
Overall, the PR looks like it deprecates the current in-tree int4 gemm and uses the ipex-style int4 gemm (used by HF and other frameworks) instead. I thought we wanted to introduce the primitive cache in the current int4 gemm, specifically intended for torchAO. Besides, maintaining two sets of int4 gemm may result in further maintenance effort.
To add the ciflow label: this helps ensure we don't trigger CI on this PR until it is actually authorized to do so. Please ping one of the reviewers if you do not have access to approve and run workflows.
guangyey commented May 28, 2025
@pytorchbot merge
pytorchmergebot commented May 28, 2025
Merge started. Your change will be merged once all checks pass (ETA 0-4 hours). Learn more about merging in the wiki. Questions? Feedback? Please reach out to the PyTorch DevX Team.
pytorchmergebot commented May 28, 2025
Merge failed. Reason: 1 job has failed; the first few of them are: xpu / linux-jammy-xpu-2025.1-py3.9 / test (default, 4, 6, linux.idc.xpu). Details for Dev Infra team: raised by workflow job.
ZhiweiYan-96 commented May 28, 2025
Hi @etaf, is this new error being tracked? Thanks.
etaf commented May 29, 2025
liangan1 commented May 30, 2025 • edited
@pytorchbot label ciflow/xpu
❌ 🤖 pytorchbot command failed: Try
liangan1 commented May 30, 2025
@pytorchbot label ciflow/xpu
liangan1 commented May 30, 2025
@pytorchbot merge
pytorchmergebot commented May 30, 2025
Merge started. Your change will be merged once all checks pass (ETA 0-4 hours). Learn more about merging in the wiki. Questions? Feedback? Please reach out to the PyTorch DevX Team.
…ytorch#147693)
* add onednn primitive cache for int4 gemm for xpu
Pull Request resolved: pytorch#147693
Approved by: https://github.com/EikanWang, https://github.com/liangan1, https://github.com/guangyey, https://github.com/ZhiweiYan-96
Co-authored-by: Yan, Zhiwei <zhiwei.yan@intel.com>
Co-authored-by: Yu, Guangye <106960996+guangyey@users.noreply.github.com>
cc @jgong5 @mingfeima @XiaobingSuper @sanchitintel @ashokei @jingxu10 @jerryzh168 @gujinghui @PenghuiCheng @jianyuh @min-jean-cho @yanbing-j @Guobing-Chen @Xia-Weiwen @snadampal @xmfan @EikanWang @fengyuan14 @guangyey @H-Huang @awgu @wanchaol @fegin @fduwjj @wz337 @wconstab @d4l3k @mcarilli @ptrblck @leslie-fang-intel @voznesenskym @penguinwu @zhuhaozhe @blzheng @wenzhe-nrv @jiayisunx @ipiszy @chenyang78 @kadeng @muchulee8 @amjames @chauhang @aakhundov @CaoZhongZ @rogerxfeng8