
Faster Vector128/64 compare on arm64 #75864

Merged

EgorBo merged 2 commits into dotnet:main from EgorBo:arm64-faster-vector-cmp

Sep 20, 2022

Member

EgorBo commented Sep 19, 2022 (edited)

Apply @TamarChristinaArm's suggestions for faster vector comparison in #75849

bool Test1(Vector128<int> a, Vector128<int> b) => a == b;
bool Test2(Vector64<float> a, Vector64<float> b) => a != b;

Now emits:

 ; Method Tests:Test1
 G_M48391_IG01:
         A9BF7BFD          stp     fp, lr, [sp, #-0x10]!
         910003FD          mov     fp, sp
 G_M48391_IG02:
         6EA18C10          cmeq    v16.4s, v0.4s, v1.4s
-        6E31AA10          uminv   b16, v16.16b
-        0E013E00          umov    w0, v16.b[0]
-        7100001F          cmp     w0, #0
-        9A9F07E0          cset    x0, ne
+        6EB0AE10          uminp   v16.4s, v16.4s, v16.4s
+        4E083E00          umov    x0, v16.d[0]
+        B100041F          cmn     x0, #1
+        9A9F17E0          cset    x0, eq
 G_M48391_IG03:
         A8C17BFD          ldp     fp, lr, [sp], #0x10
         D65F03C0          ret     lr
 ; Total bytes of code: 36

 ; Method Tests:Test2
 G_M64388_IG01:
         A9BF7BFD          stp     fp, lr, [sp, #-0x10]!
         910003FD          mov     fp, sp
                         ;; size=8 bbWeight=1    PerfScore 1.50
 G_M64388_IG02:
         0E21E410          fcmeq   v16.2s, v0.2s, v1.2s
-        2E31AA10          uminv   b16, v16.8b
-        0E013E00          umov    w0, v16.b[0]
-        7100001F          cmp     w0, #0
-        9A9F17E0          cset    x0, eq
+        4E083E00          umov    x0, v16.d[0]
+        B100041F          cmn     x0, #1
+        9A9F07E0          cset    x0, ne
 G_M64388_IG03:
         A8C17BFD          ldp     fp, lr, [sp], #0x10
         D65F03C0          ret     lr
-; Total bytes of code: 36
+; Total bytes of code: 32
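For readers less familiar with the instructions involved, here is a minimal scalar sketch of what the new `Test1` sequence computes. It is only a model of the instruction semantics, not the JIT's implementation; the class and method names (`VectorEqualitySketch`, `CompareEqualLanes`, `PairwiseMinLow64`, `IsAllOnes`, `Equals128`) are hypothetical, and plain arrays stand in for the vector registers.

```csharp
using System;

// Hypothetical scalar model of the emitted arm64 sequence; arrays stand in for registers.
public static class VectorEqualitySketch
{
    // cmeq v16.4s, v0.4s, v1.4s: each 32-bit lane becomes all ones if equal, else zero.
    public static uint[] CompareEqualLanes(int[] a, int[] b)
    {
        var mask = new uint[a.Length];
        for (int i = 0; i < a.Length; i++)
            mask[i] = a[i] == b[i] ? 0xFFFFFFFFu : 0u;
        return mask;
    }

    // uminp v16.4s, v16.4s, v16.4s with the same register as both sources:
    // the low two result lanes are min(m0, m1) and min(m2, m3).
    // umov x0, v16.d[0] then reads those two lanes as one 64-bit value.
    public static ulong PairwiseMinLow64(uint[] m)
    {
        uint lo = Math.Min(m[0], m[1]);
        uint hi = Math.Min(m[2], m[3]);
        return ((ulong)hi << 32) | lo;
    }

    // cmn x0, #1 sets the zero flag when x0 + 1 wraps to 0, i.e. x0 is all ones;
    // cset x0, eq turns that flag into 0 or 1.
    public static bool IsAllOnes(ulong x) => unchecked(x + 1) == 0;

    public static bool Equals128(int[] a, int[] b) =>
        IsAllOnes(PairwiseMinLow64(CompareEqualLanes(a, b)));

    public static void Main()
    {
        Console.WriteLine(Equals128(new[] { 1, 2, 3, 4 }, new[] { 1, 2, 3, 4 })); // True
        Console.WriteLine(Equals128(new[] { 1, 2, 3, 4 }, new[] { 1, 2, 3, 5 })); // False
    }
}
```

Because every mask lane is either all ones or zero, a pairwise minimum is zero whenever either lane of the pair is zero, so a single 64-bit check against all ones covers all four lanes.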

Faster Vector compare on arm64

97b8eba

ghost added the area-CodeGen-coreclr label (CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI)

Sep 19, 2022

ghost assigned EgorBo

Sep 19, 2022

ghost commented Sep 19, 2022

Tagging subscribers to this area: @JulieLeeMSFT, @jakobbotsch
See info in area-owners.md if you want to be subscribed.

Issue Details

Apply @TamarChristinaArm's suggestions for faster vector comparison in #75849

bool Test1(Vector128<int> a, Vector128<int> b) => a == b;

Now emits:

; Method Test1
G_M3164_IG01:
        A9BF7BFD          stp     fp, lr, [sp, #-0x10]!
        910003FD          mov     fp, sp
G_M3164_IG02:
        6EA18C10          cmeq    v16.4s, v0.4s, v1.4s
        6EB0AE10          uminp   v16.4s, v16.4s, v16.4s
        4E083E00          umov    x0, v16.d[0]
        B100041F          cmn     x0, #1
        9A9F17E0          cset    x0, eq
G_M3164_IG03:
        A8C17BFD          ldp     fp, lr, [sp], #0x10
        D65F03C0          ret     lr
; Total bytes of code: 36

Author:	EgorBo
Assignees:	-
Labels:	`area-CodeGen-coreclr`
Milestone:	-

Contributor

TamarChristinaArm commented Sep 19, 2022

G_M64388_IG02:
        0E21E410          fcmeq   v16.2s, v0.2s, v1.2s
        0E043E00          umov    w0, v16.s[0]
        12800001          movn    w1, #0
        6B01001F          cmp     w0, w1
        9A9F07E0          cset    x0, ne

That doesn't look right. The 64-bit case should be transferring the entire register, so I'm expecting the same d[0] transfer here to an x register and the same compare as the 128-bit case, just without the compression step. Looks like this has ignored the top 32 bits.
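To make the concern concrete, here is a small scalar sketch of the two readouts for the `Vector64<float>` inequality case. It is an illustration under assumptions rather than the JIT's code: the names (`Vector64InequalitySketch`, `NotEqualLow32`, `NotEqualFull64`) are hypothetical, and two-element arrays stand in for the 64-bit register.

```csharp
using System;

// Hypothetical scalar model; a two-element array stands in for a 64-bit vector register.
public static class Vector64InequalitySketch
{
    // fcmeq v16.2s, v0.2s, v1.2s: per-lane mask, all ones where the floats compare equal.
    public static uint[] CompareEqualLanes(float[] a, float[] b) =>
        new[] { a[0] == b[0] ? 0xFFFFFFFFu : 0u, a[1] == b[1] ? 0xFFFFFFFFu : 0u };

    // Quoted sequence: umov w0, v16.s[0] reads lane 0 only, then compares with ~0.
    public static bool NotEqualLow32(float[] a, float[] b) =>
        CompareEqualLanes(a, b)[0] != 0xFFFFFFFFu;

    // Expected sequence: umov x0, v16.d[0] reads both lanes as one 64-bit value.
    public static bool NotEqualFull64(float[] a, float[] b)
    {
        uint[] m = CompareEqualLanes(a, b);
        ulong x = ((ulong)m[1] << 32) | m[0];
        return x != ulong.MaxValue;
    }

    public static void Main()
    {
        var a = new[] { 1.0f, 2.0f };
        var b = new[] { 1.0f, 3.0f }; // differs only in the upper lane
        Console.WriteLine(NotEqualLow32(a, b));  // False: the difference is missed
        Console.WriteLine(NotEqualFull64(a, b)); // True
    }
}
```

With inputs that differ only in the upper lane, the 32-bit readout reports the vectors as equal, which is the failure mode described above.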

Member, Author

EgorBo commented Sep 19, 2022

> G_M64388_IG02:
>         0E21E410          fcmeq   v16.2s, v0.2s, v1.2s
>         0E043E00          umov    w0, v16.s[0]
>         12800001          movn    w1, #0
>         6B01001F          cmp     w0, w1
>         9A9F07E0          cset    x0, ne
>
> That doesn't look right. The 64-bit case should be transferring the entire register, so I'm expecting the same d[0] transfer here to an x register and the same compare as the 128-bit case, just without the compression step. Looks like this has ignored the top 32 bits.