Use AVX512 to zero locals #91166

Merged
EgorBo merged 2 commits into dotnet:main from EgorBo:zero-locals-avx512
Aug 28, 2023

EgorBo commented Aug 27, 2023

Extends #32538 to use AVX-512 (and AVX1) to zero locals on the non-loop path. I am going to slightly refactor it to use AVX in the loop path too, but later; this seems to be low-hanging fruit with nice diffs.

Diff example:

@@ -17,17 +17,13 @@ G_M59697_IG01:        ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref,
        push     rbx
        sub      rsp, 160
        vxorps   xmm4, xmm4, xmm4
-       vmovdqa  xmmword ptr [rsp+0x20], xmm4
-       vmovdqa  xmmword ptr [rsp+0x30], xmm4
-       mov      rax, -96
-       vmovdqa  xmmword ptr [rsp+rax+0xA0], xmm4
-       vmovdqa  xmmword ptr [rsp+rax+0xB0], xmm4
-       vmovdqa  xmmword ptr [rsp+rax+0xC0], xmm4
-       add      rax, 48
-       jne      SHORT  -5 instr
+       vmovdqu  ymmword ptr [rsp+0x20], ymm4
+       vmovdqu  ymmword ptr [rsp+0x40], ymm4
+       vmovdqu  ymmword ptr [rsp+0x60], ymm4
+       vmovdqu  ymmword ptr [rsp+0x80], ymm4
        mov      rbx, rcx
        ; gcrRegs +[rbx]
-;; size=70 bbWeight=1 PerfScore 13.33
+;; size=42 bbWeight=1 PerfScore 9.83
 G_M59697_IG02:        ; bbWeight=1, gcrefRegs=0008 {rbx}, byrefRegs=0000 {}, byref
        lea      rcx, [rsp+0x20]
        call     [<unknown method>]
@@ -46,7 +42,7 @@ G_M59697_IG03:        ; bbWeight=1, epilog, nogc, extend
        ret
 ;; size=9 bbWeight=1 PerfScore 1.75

(apparently this collection has no AVX-512, but the diff still looks better)
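To make the diff easier to read: both versions zero the same 128 bytes, from [rsp+0x20] up to [rsp+0xA0). The old code used two 16-byte stores plus a loop of two iterations with three 16-byte stores each; the new code uses four 32-byte stores. The sketch below only reproduces that offset arithmetic. `planZeroStores` is a hypothetical helper written for illustration, not a function from the JIT.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper (not a JIT API): returns the stack offsets at which
// full-width zeroing stores would land for a region of `size` bytes that
// starts at offset `start`, using stores of `width` bytes each.
std::vector<std::size_t> planZeroStores(std::size_t start, std::size_t size, std::size_t width)
{
    // This sketch only handles regions that are an exact multiple of the
    // store width, which is the case in the diff above.
    assert(width > 0 && size % width == 0);

    std::vector<std::size_t> offsets;
    for (std::size_t off = 0; off < size; off += width)
    {
        offsets.push_back(start + off);
    }
    return offsets;
}
```

With `start = 0x20`, `size = 128` and `width = 32`, the helper yields the offsets 0x20, 0x40, 0x60 and 0x80, matching the four `vmovdqu ymmword ptr` stores in the new code. With `width = 16` it yields eight offsets, which is the number of 16-byte stores the old code executed in total.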

ghost added the area-CodeGen-coreclr label (CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI) Aug 27, 2023
ghost assigned EgorBo Aug 27, 2023

Tagging subscribers to this area: @JulieLeeMSFT, @jakobbotsch
See info in area-owners.md if you want to be subscribed.


EgorBo commented Aug 27, 2023

@dotnet/jit-contrib PTAL, simple change with nice diffs (-122 KB for the benchmarks.pgo collection, -0.13% TP for the same collection).

The logic has plenty of opportunities to optimize further, e.g. using AVX in the loop. I didn't change that here because it requires aligning data to 32/64 bytes (the remainder can be handled with overlapping stores), so I am leaving it for future follow-ups. I was mostly interested in removing loops by allowing up to 6*64=384 bytes to be zeroed directly with AVX-512, where previously we switched to the loop for >96 bytes.
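A minimal sketch of the size check described above, under stated assumptions: the limit of six unrolled stores is inferred from the "6*64=384" figure and from the old 96-byte limit (6*16), and the names `kMaxUnrolledStores`, `chooseZeroStrategy` and `overlappingTailOffset` are hypothetical, not taken from the JIT sources. The second helper illustrates the overlapping-remainder idea mentioned as future work: when the region is not a multiple of the store width, the last store is moved back so that it ends exactly at the end of the region.

```cpp
#include <cassert>
#include <cstddef>

// Assumption: at most 6 straight-line stores before falling back to a loop,
// inferred from "6*64=384" and the previous 96-byte (6*16) limit.
constexpr std::size_t kMaxUnrolledStores = 6;

enum class ZeroStrategy
{
    Unrolled, // straight-line SIMD stores, no loop
    Loop      // counted loop of SIMD stores
};

// Hypothetical decision function: `simdWidth` is the widest store available
// (16 for SSE/xmm, 32 for AVX/ymm, 64 for AVX-512/zmm).
ZeroStrategy chooseZeroStrategy(std::size_t bytes, std::size_t simdWidth)
{
    assert(simdWidth == 16 || simdWidth == 32 || simdWidth == 64);
    return bytes <= kMaxUnrolledStores * simdWidth ? ZeroStrategy::Unrolled
                                                   : ZeroStrategy::Loop;
}

// Hypothetical helper for the overlapping-remainder idea: returns the offset
// of a final store that ends exactly at `bytes`, re-zeroing part of the
// previous store instead of emitting narrower tail stores.
std::size_t overlappingTailOffset(std::size_t bytes, std::size_t simdWidth)
{
    assert(bytes >= simdWidth);
    return bytes - simdWidth;
}
```

Under these assumptions, the 128-byte region from the diff takes the loop with 16-byte stores (128 > 96) but is unrolled with 32-byte stores (128 <= 192), and AVX-512 raises the unrolled limit to 384 bytes. For a 100-byte region and 32-byte stores, the final overlapping store would start at offset 68.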

EgorBo merged commit 3a1570f into dotnet:main Aug 28, 2023
EgorBo deleted the zero-locals-avx512 branch August 28, 2023 16:29
EgorBo mentioned this pull request Sep 3, 2023 (56 tasks)
ghost locked as resolved and limited conversation to collaborators Oct 5, 2023

Reviewers

tannergooding approved these changes

Assignees

EgorBo

Labels

area-CodeGen-coreclr (CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI)
avx512 (Related to the AVX-512 architecture)


2 participants

@EgorBo, @tannergooding
