- Notifications
You must be signed in to change notification settings - Fork5.2k
Use AVX512 to zero locals#91166
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to ourterms of service andprivacy statement. We’ll occasionally send you account related emails.
Already on GitHub?Sign in to your account
Uh oh!
There was an error while loading.Please reload this page.
Conversation
ghost commentedAug 27, 2023
Tagging subscribers to this area:@JulieLeeMSFT,@jakobbotsch Issue DetailsExtends#32538 to use AVX-512 (and AVX1) to zero locals for non-loop path. I am going to slightly refactor it to use AVX in the loop path too but later, this seems to be a low-hanging fruit with nice diffs.
|
EgorBo commentedAug 27, 2023 • edited
Loading Uh oh!
There was an error while loading.Please reload this page.
edited
Uh oh!
There was an error while loading.Please reload this page.
@dotnet/jit-contrib PTAL, simple change withnice diffs (-122kb for benchmarks.pgo collection, -0.13% TP for the same collection). The logic has plenty of opportunities to optimize futher, e.g. use AVX in the loop - I didn't change it here because for that we need to align data to 32/64 bytes + remainder can be handled with overlapping -- but I am leaving it for future follow ups. I was mostly interested in removing loops by allowing up to 6*64=384 bytes to be zeroed directly with avx512 where previously we switched to the loop for >96 bytes. |
Uh oh!
There was an error while loading.Please reload this page.
Uh oh!
There was an error while loading.Please reload this page.
Extends#32538 to use AVX-512 (and AVX1) to zero locals for non-loop path. I am going to slightly refactor it to use AVX in the loop path too but later, this seems to be a low-hanging fruit withnice diffs.
Diff example:
(apparently this collection has no avx-512, but still looks better)