[Intel GPU] Enable safe softmax for XPU SDPA #151999
Conversation
pytorch-bot commented Apr 23, 2025 • edited
🔗 Helpful Links
🧪 See artifacts and rendered test results at hud.pytorch.org/pr/151999
Note: Links to docs will display an error until the docs builds have been completed.
❌ 3 New Failures, 3 Pending. As of commit 7ba3ce8 with merge base 9f5153b. NEW FAILURES: the following jobs have failed.
This comment was automatically generated by Dr. CI and updates every 15 minutes.
To add the ciflow label: this helps ensure we don't trigger CI on this PR until it is actually authorized to do so. Please ping one of the reviewers if you do not have access to approve and run workflows.
guangyey commented Apr 24, 2025
Could you elaborate in the PR description on the issues we would encounter if this PR were not applied? And give a test case if possible.
LuFinch commented Apr 24, 2025
@guangyey Updated the PR description and added a UT.
guangyey commented Apr 25, 2025
Thanks for the update.
LuFinch commented Jun 4, 2025 • edited
@guangyey oneDNN has been upgraded to v3.8. This PR is ready to merge. Could you help review and trigger CI?
guangyey commented Jun 13, 2025
@pytorchbot merge
pytorchmergebot commented Jun 13, 2025
Merge started. Your change will be merged once all checks pass (ETA 0-4 hours). Learn more about merging in the wiki. Questions? Feedback? Please reach out to the PyTorch DevX Team.
pytorchmergebot commented Jun 13, 2025
Merge failed. Reason: 1 mandatory check(s) failed. Dig deeper by viewing the failures on hud.
guangyey commented Jun 13, 2025
@pytorchbot merge
pytorchmergebot commented Jun 13, 2025
Merge started. Your change will be merged once all checks pass (ETA 0-4 hours). Learn more about merging in the wiki. Questions? Feedback? Please reach out to the PyTorch DevX Team.
pytorchmergebot commented Jun 13, 2025
Merge failed. Reason: 3 mandatory check(s) failed. Dig deeper by viewing the failures on hud.
guangyey commented Jun 13, 2025
@pytorchbot merge -f "lint is green, XPU CI pass, ignore unrelated failure and queuing rocm CI"
pytorchmergebot commented Jun 13, 2025
Merge started. Your change will be merged immediately since you used the force (-f) flag, bypassing any CI checks (ETA: 1-5 minutes). Learn more about merging in the wiki. Questions? Feedback? Please reach out to the PyTorch DevX Team.
Fix intel/torch-xpu-ops#1432 (comment)

When one row of the Q*K attention score is masked with -inf, softmax(score) would output NaN for the whole row, which would corrupt the model output. With this new flag, it outputs 0 for the whole row, which is aligned with PyTorch CPU/CUDA behavior (see the sketch below).

cc @jgong5 @mingfeima @XiaobingSuper @sanchitintel @ashokei @jingxu10 @jerryzh168 @voznesenskym @penguinwu @EikanWang @Guobing-Chen @zhuhaozhe @blzheng @wenzhe-nrv @jiayisunx @ipiszy @chenyang78 @kadeng @muchulee8 @amjames @chauhang @aakhundov @gujinghui @fengyuan14 @guangyey
0for whole row which is aligned with Pytorch CPU/CUDA's behavior.cc@jgong5@mingfeima@XiaobingSuper@sanchitintel@ashokei@jingxu10@jerryzh168@voznesenskym@penguinwu@EikanWang@Guobing-Chen@zhuhaozhe@blzheng@wenzhe-nrv@jiayisunx@ipiszy@chenyang78@kadeng@muchulee8@amjames@chauhang@aakhundov@gujinghui@fengyuan14@guangyey