Movatterモバイル変換


[0]ホーム

URL:


Skip to content

Navigation Menu

Sign in
Appearance settings

Search code, repositories, users, issues, pull requests...

Provide feedback

We read every piece of feedback, and take your input very seriously.

Saved searches

Use saved searches to filter your results more quickly

Sign up
Appearance settings

Faster Vector128/64 compare on arm64#75864

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to ourterms of service andprivacy statement. We’ll occasionally send you account related emails.

Already on GitHub?Sign in to your account

Merged
EgorBo merged 2 commits intodotnet:mainfromEgorBo:arm64-faster-vector-cmp
Sep 20, 2022

Conversation

@EgorBo
Copy link
Member

@EgorBoEgorBo commentedSep 19, 2022
edited
Loading

Apply@TamarChristinaArm's suggestions for faster vector comparison in#75849

boolTest1(Vector128<int>a,Vector128<int>b)=>a==b;boolTest2(Vector64<float>a,Vector64<float>b)=>a!=b;

Now emits:

; Method Tests:Test1G_M48391_IG01:                      A9BF7BFD          stp     fp, lr, [sp, #-0x10]!        910003FD          mov     fp, spG_M48391_IG02:                      6EA18C10          cmeq    v16.4s, v0.4s, v1.4s-       6E31AA10          uminv   b16, v16.16b-       0E013E00          umov    w0, v16.b[0]-       7100001F          cmp     w0, #0-       9A9F07E0          cset    x0, ne+       6EB0AE10          uminp   v16.4s, v16.4s, v16.4s+       4E083E00          umov    x0, v16.d[0]+       B100041F          cmn     x0, #1+       9A9F17E0          cset    x0, eqG_M48391_IG03:                      A8C17BFD          ldp     fp, lr, [sp], #0x10        D65F03C0          ret     lr; Total bytes of code: 36; Method Tests:Test2G_M64388_IG01:                      A9BF7BFD          stp     fp, lr, [sp, #-0x10]!        910003FD          mov     fp, sp;; size=8 bbWeight=1    PerfScore 1.50G_M64388_IG02:                      0E21E410          fcmeq   v16.2s, v0.2s, v1.2s-       2E31AA10          uminv   b16, v16.8b-       0E013E00          umov    w0, v16.b[0]-       7100001F          cmp     w0, #0-       9A9F17E0          cset    x0, eq+       4E083E00          umov    x0, v16.d[0]+       B100041F          cmn     x0, #1+       9A9F07E0          cset    x0, neG_M64388_IG03:                      A8C17BFD          ldp     fp, lr, [sp], #0x10        D65F03C0          ret     lr-; Total bytes of code: 36+; Total bytes of code: 32

PaulusParssinen, nietras, and thushan reacted with hooray emoji
@ghostghost added the area-CodeGen-coreclrCLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI labelSep 19, 2022
@ghostghost assignedEgorBoSep 19, 2022
@ghost
Copy link

Tagging subscribers to this area:@JulieLeeMSFT,@jakobbotsch
See info inarea-owners.md if you want to be subscribed.

Issue Details

Apply@TamarChristinaArm's suggestions for faster vector comparison in#75849

boolTest1(Vector128<int>a,Vector128<int>b)=>a==b;

Now emits:

; Method Test1G_M3164_IG01:        A9BF7BFD          stp     fp, lr,[sp, #-0x10]!        910003FDmov     fp,spG_M3164_IG02:        6EA18C10          cmeq    v16.4s, v0.4s, v1.4s        6EB0AE10          uminp   v16.4s, v16.4s, v16.4s        4E083E00          umov    x0, v16.d[0]        B100041F          cmn     x0, #1        9A9F17E0          cset    x0, eqG_M3164_IG03:        A8C17BFD          ldp     fp, lr,[sp], #0x10        D65F03C0ret     lr; Total bytes of code: 36
Author:EgorBo
Assignees:-
Labels:

area-CodeGen-coreclr

Milestone:-

@TamarChristinaArm
Copy link
Contributor

G_M64388_IG02:        0E21E410          fcmeq   v16.2s, v0.2s, v1.2s        0E043E00          umov    w0, v16.s[0]12800001          movn    w1, #0        6B01001Fcmp     w0, w1        9A9F07E0          cset    x0, ne

That doesn't look right, The 64-bit case should be transferring the entire register, so I'm expecting the samed[0] transfer here to anx register and the same compare as the 128-bit case, just without the compression step. Looks like this has ignored the top 32-bits.

@EgorBo
Copy link
MemberAuthor

G_M64388_IG02:        0E21E410          fcmeq   v16.2s, v0.2s, v1.2s        0E043E00          umov    w0, v16.s[0]12800001          movn    w1, #0        6B01001Fcmp     w0, w1        9A9F07E0          cset    x0, ne

That doesn't look right, The 64-bit case should be transferring the entire register, so I'm expecting the samed[0] transfer here to anx register and the same compare as the 128-bit case, just without the compression step. Looks like this has ignored the top 32-bits.

Ah, so I thought, let me fix it 🙂

@TamarChristinaArm
Copy link
Contributor

I assume the diff for the 128-bit one is reversed? but otherwise that looks good to me now.

@EgorBo
Copy link
MemberAuthor

I assume the diff for the 128-bit one is reversed? but otherwise that looks good to me now.

Yes, should be correct now, thanks!

TamarChristinaArm reacted with thumbs up emoji

@EgorBo
Copy link
MemberAuthor

@kunalspathak PTAL this one too please

Copy link
Contributor

@kunalspathakkunalspathak left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

LGTM

@EgorBoEgorBo merged commitbff967b intodotnet:mainSep 20, 2022
@EgorBoEgorBo deleted the arm64-faster-vector-cmp branchSeptember 20, 2022 15:57
@EgorBo
Copy link
MemberAuthor

EgorBo commentedSep 21, 2022
edited
Loading

@TamarChristinaArm it's too early to estimate the whole impact but I already see nice improvements from this, e.g.:
image

And even more here:

image

TamarChristinaArm, creyke, PaulusParssinen, martincostello, thushan, and neon-sunset reacted with rocket emoji

@TamarChristinaArm
Copy link
Contributor

Naise!

davidfowl reacted with heart emoji

@DrewScoggins
Copy link
Member

Improvements:dotnet/perf-autofiling-issues#8660

Sign up for freeto subscribe to this conversation on GitHub. Already have an account?Sign in.

Reviewers

1 more reviewer

@kunalspathakkunalspathakkunalspathak approved these changes

Reviewers whose approvals may not affect merge requirements

Assignees

@EgorBoEgorBo

Labels

area-CodeGen-coreclrCLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI

Projects

None yet

Milestone

No milestone

Development

Successfully merging this pull request may close these issues.

4 participants

@EgorBo@TamarChristinaArm@DrewScoggins@kunalspathak

[8]ページ先頭

©2009-2025 Movatter.jp