JIT: Faster vector == Vector128.Zero on arm64 #65632


Merged
EgorBo merged 11 commits into dotnet:main from EgorBo:arm-fast-cmp-zero-vec
Feb 23, 2022


@EgorBo (Member) commented Feb 20, 2022 (edited)

Closes #63829

static bool IsZero(Vector128<int> vec) => vec == Vector128<int>.Zero;

Codegen diff:

; Assembly listing for method IsZero(System.Runtime.Intrinsics.Vector128`1[Int32]):bool
    stp     fp, lr, [sp,#-16]!
    mov     fp, sp
-   cmeq    v16.4s, v0.4s, #0
-   uminv   b16, v16.16b
-   umov    w0, v16.b[0]
+   umaxv   b16, v0.16b
+   umov    w0, v16.s[0]
    cmp     w0, #0
-   cset    x0, ne
+   cset    x0, eq
    ldp     fp, lr, [sp],#16
    ret     lr
-; Total bytes of code 36
+; Total bytes of code 32
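To see why the shorter sequence gives the same answer, here is a scalar C++ model of both instruction sequences. It is only an illustrative sketch: `Vec4x32`, `is_zero_old`, `is_zero_new`, and `check_models_agree` are hypothetical names that do not exist in the JIT or in this PR, and the loops merely emulate what `cmeq`/`uminv` and `umaxv` compute over a 128-bit register.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

// Scalar stand-in for a 128-bit register viewed as four 32-bit lanes.
using Vec4x32 = std::array<std::uint32_t, 4>;

// Models the old sequence: cmeq against zero (all-ones mask per matching lane),
// uminv across the 16 mask bytes, then "minimum != 0" (cset ne).
bool is_zero_old(const Vec4x32& v) {
    std::uint8_t min_byte = 0xFF;
    for (std::uint32_t lane : v) {
        const std::uint32_t mask = (lane == 0) ? 0xFFFFFFFFu : 0u;
        for (int b = 0; b < 4; ++b) {
            const auto byte = static_cast<std::uint8_t>(mask >> (8 * b));
            if (byte < min_byte) min_byte = byte;
        }
    }
    return min_byte != 0;
}

// Models the new sequence: umaxv across the 16 input bytes,
// then "maximum == 0" (cset eq). No compare instruction is needed.
bool is_zero_new(const Vec4x32& v) {
    std::uint8_t max_byte = 0;
    for (std::uint32_t lane : v) {
        for (int b = 0; b < 4; ++b) {
            const auto byte = static_cast<std::uint8_t>(lane >> (8 * b));
            if (byte > max_byte) max_byte = byte;
        }
    }
    return max_byte == 0;
}

// Both models must agree for any input.
void check_models_agree(const Vec4x32& v) {
    assert(is_zero_old(v) == is_zero_new(v));
}
```

The point of the model: a vector is all zero exactly when its largest unsigned byte is zero, so the per-lane compare can be dropped and only the final condition flips from `ne` to `eq`.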

This is needed for faster IndexOf from #63285. Also, #65288 relies on it.

Perf_Regex_Industry_RustLang_Sherlock Benchmark:

| Method | Toolchain | Pattern | Mean | Error | StdDev | Ratio |
|---|---|---|---|---|---|---|
| Count | /Core_Root_PR/corerun | Sherlock Holmes | 68.51 us | 0.331 us | 0.294 us | 1.00 |
| Count | /Core_Root_base/corerun | Sherlock Holmes | 72.90 us | 1.352 us | 1.265 us | 1.06 |
| Count | /Core_Root_PR/corerun | sherlock | 56.52 us | 0.226 us | 0.200 us | 1.00 |
| Count | /Core_Root_base/corerun | sherlock | 59.63 us | 0.211 us | 0.198 us | 1.06 |
| Count | /Core_Root_PR/corerun | zqj | 54.03 us | 0.257 us | 0.241 us | 1.00 |
| Count | /Core_Root_base/corerun | zqj | 57.08 us | 0.188 us | 0.176 us | 1.06 |

Diffs

am11 reacted with thumbs up emoji
@ghost added the area-CodeGen-coreclr label (CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI) Feb 20, 2022
@ghost assigned EgorBo Feb 20, 2022
@ghost

Tagging subscribers to this area: @JulieLeeMSFT
See info in area-owners.md if you want to be subscribed.


@EgorBo (Member, Author)

PTAL @echesakovMSFT @TIHan

@EgorBo (Member, Author)

cc @vargaz @fanyang-mono the test I added in this PR crashes on Mono llvmaot Pri0 Runtime Tests Run Linux x64 release:

aot-compile: compiling /__w/1/s/artifacts/tests/coreclr/Linux.x64.Release/JIT/HardwareIntrinsics/General/HwiOp/CompareVectorWithZero/CompareVectorWithZero.dll; MONO_PATH: /__w/1/s/artifacts/tests/coreclr/Linux.x64.Release/JIT/HardwareIntrinsics/General/HwiOp/CompareVectorWithZero:/__w/1/s/artifacts/tests/coreclr/Linux.x64.Release/Tests/Core_Root
2022-02-20T17:19:30.2982792Z   Mono Ahead of Time compiler - compiling assembly /__w/1/s/artifacts/tests/coreclr/Linux.x64.Release/JIT/HardwareIntrinsics/General/HwiOp/CompareVectorWithZero/CompareVectorWithZero.dll
2022-02-20T17:19:30.2986165Z   AOTID A5710361-7321-642D-C172-A16090734D6F
2022-02-20T17:19:30.2988419Z   * Assertion: should not be reached at /__w/1/s/src/mono/mono/mini/simd-intrinsics.c:344
2022-02-20T17:19:30.2990056Z
2022-02-20T17:19:30.3025150Z
2022-02-20T17:19:30.3026584Z   =================================================================
2022-02-20T17:19:30.3061763Z   Native Crash Reporting
2022-02-20T17:19:30.3083915Z   =================================================================
2022-02-20T17:19:30.3104269Z   Got a SIGABRT while executing native code. This usually indicates
2022-02-20T17:19:30.3910283Z   a fatal error in the mono runtime or one of the native libraries
2022-02-20T17:19:30.3912329Z
2022-02-20T17:19:30.3913499Z   =================================================================
2022-02-20T17:19:30.3914698Z   External Debugger Dump:
2022-02-20T17:19:30.3916100Z   =================================================================
2022-02-20T17:19:30.4109204Z   used by your application.
2022-02-20T17:19:30.4111015Z   =================================================================
2022-02-20T17:19:30.4112230Z
2022-02-20T17:19:30.4113797Z   =================================================================
2022-02-20T17:19:30.4115082Z   Native stacktrace:
2022-02-20T17:19:30.4116339Z   =================================================================
2022-02-20T17:19:30.4118693Z   0x7ff0567b89c2 - Unknown
2022-02-20T17:19:30.4120408Z   0x7ff05675936e - Unknown
2022-02-20T17:19:30.4122083Z   0x7ff0567b8298 - Unknown
2022-02-20T17:19:30.4138832Z   0x7ff058a31630 - Unknown
2022-02-20T17:19:30.4140970Z   0x7ff057e6a387 - Unknown
2022-02-20T17:19:30.4142772Z   0x7ff057e6ba78 - Unknown
2022-02-20T17:19:30.4144476Z   0x7ff056834875 - Unknown
2022-02-20T17:19:30.4146183Z   0x7ff056646a33 - Unknown
2022-02-20T17:19:30.4147862Z   0x7ff056834cdd - Unknown
2022-02-20T17:19:30.4149610Z   0x7ff056834e45 - Unknown
2022-02-20T17:19:30.4151197Z   0x7ff056834ea4 - Unknown
2022-02-20T17:19:30.4152791Z   0x7ff05676c466 - Unknown
2022-02-20T17:19:30.4154346Z   0x7ff05678bda3 - Unknown
2022-02-20T17:19:30.4156012Z   0x7ff0566ddace - Unknown
2022-02-20T17:19:30.4159329Z   0x7ff0566afec1 - Unknown
2022-02-20T17:19:30.4161158Z   0x7ff056736e39 - Unknown
2022-02-20T17:19:30.4162645Z   0x7ff056727b72 - Unknown
2022-02-20T17:19:30.4164267Z   0x7ff0567195d2 - Unknown
2022-02-20T17:19:30.4165995Z   0x7ff05679480e - Unknown
2022-02-20T17:19:30.4167524Z   0x55dcb433a5aa - Unknown
2022-02-20T17:19:30.4169123Z   0x7ff057e56555 - Unknown
2022-02-20T17:19:30.4170801Z   0x55dcb4338029 - Unknown
2022-02-20T17:19:30.6293373Z   [New LWP 10203]
2022-02-20T17:19:30.6295390Z   [Thread debugging using libthread_db enabled]
2022-02-20T17:19:30.6296826Z   Using host libthread_db library "/lib64/libthread_db.so.1".
2022-02-20T17:19:31.2218749Z EXEC : warning : the debug information found in "/__w/1/s/artifacts/tests/coreclr/Linux.x64.Release/Tests/Core_Root/libcoreclr.so.dbg" does not match "/__w/1/s/artifacts/tests/coreclr/Linux.x64.Release/Tests/Core_Root/libcoreclr.so" (CRC mismatch). [/__w/1/s/src/mono/msbuild/aot-compile.proj]
2022-02-20T17:19:31.2222043Z
2022-02-20T17:19:31.2239353Z   Missing separate debuginfo for /__w/1/s/artifacts/tests/coreclr/Linux.x64.Release/Tests/Core_Root/libcoreclr.so
2022-02-20T17:19:31.2242226Z   Try: yum --enablerepo='*debug*' install /usr/lib/debug/.build-id/92/5d8bab89f15c6990ba944db4dfd44d746cfdb8.debug
2022-02-20T17:19:31.2395130Z   0x00007ff058a311d9 in waitpid () from /lib64/libpthread.so.0
2022-02-20T17:19:31.2412985Z     Id   Target Id         Frame
2022-02-20T17:19:31.2414838Z     2    Thread 0x7ff054fff700 (LWP 10203) "SGen worker" 0x00007ff058a2da35 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
2022-02-20T17:19:31.2432285Z   * 1    Thread 0x7ff059056740 (LWP 10202) "corerun" 0x00007ff058a311d9 in waitpid () from /lib64/libpthread.so.0
2022-02-20T17:19:31.2579202Z
2022-02-20T17:19:31.2590000Z   Thread 2 (Thread 0x7ff054fff700 (LWP 10203)):
2022-02-20T17:19:31.2591755Z   #0  0x00007ff058a2da35 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
2022-02-20T17:19:31.2593635Z   #1  0x00007ff05669f7f3 in ?? () from /__w/1/s/artifacts/tests/coreclr/Linux.x64.Release/Tests/Core_Root/libcoreclr.so
2022-02-20T17:19:31.2609971Z   #2  0x00007ff058a29ea5 in start_thread () from /lib64/libpthread.so.0
2022-02-20T17:19:31.2611736Z   #3  0x00007ff057f329fd in clone () from /lib64/libc.so.6
2022-02-20T17:19:31.2623338Z
2022-02-20T17:19:31.2633759Z   Thread 1 (Thread 0x7ff059056740 (LWP 10202)):
2022-02-20T17:19:31.2635970Z   #0  0x00007ff058a311d9 in waitpid () from /lib64/libpthread.so.0
2022-02-20T17:19:31.2637754Z   #1  0x00007ff0567b8b07 in ?? () from /__w/1/s/artifacts/tests/coreclr/Linux.x64.Release/Tests/Core_Root/libcoreclr.so
2022-02-20T17:19:31.2649895Z   #2  0x00007ff05675936e in ?? () from /__w/1/s/artifacts/tests/coreclr/Linux.x64.Release/Tests/Core_Root/libcoreclr.so
2022-02-20T17:19:31.2659790Z   #3  0x00007ff0567b8298 in ?? () from /__w/1/s/artifacts/tests/coreclr/Linux.x64.Release/Tests/Core_Root/libcoreclr.so
2022-02-20T17:19:31.2661758Z   #4  <signal handler called>
2022-02-20T17:19:31.2663461Z   #5  0x00007ff057e6a387 in raise () from /lib64/libc.so.6
2022-02-20T17:19:31.2664790Z   #6  0x00007ff057e6ba78 in abort () from /lib64/libc.so.6
2022-02-20T17:19:31.2666396Z   #7  0x00007ff056834875 in ?? () from /__w/1/s/artifacts/tests/coreclr/Linux.x64.Release/Tests/Core_Root/libcoreclr.so
2022-02-20T17:19:31.2678513Z   #8  0x00007ff056646a33 in ?? () from /__w/1/s/artifacts/tests/coreclr/Linux.x64.Release/Tests/Core_Root/libcoreclr.so
2022-02-20T17:19:31.2685693Z   #9  0x00007ff056834cdd in ?? () from /__w/1/s/artifacts/tests/coreclr/Linux.x64.Release/Tests/Core_Root/libcoreclr.so
2022-02-20T17:19:31.2698988Z   #10 0x00007ff056834e45 in ?? () from /__w/1/s/artifacts/tests/coreclr/Linux.x64.Release/Tests/Core_Root/libcoreclr.so
2022-02-20T17:19:31.2714534Z   #11 0x00007ff056834ea4 in ?? () from /__w/1/s/artifacts/tests/coreclr/Linux.x64.Release/Tests/Core_Root/libcoreclr.so
2022-02-20T17:19:31.2728485Z   #12 0x00007ff05676c466 in ?? () from /__w/1/s/artifacts/tests/coreclr/Linux.x64.Release/Tests/Core_Root/libcoreclr.so
2022-02-20T17:19:31.2742435Z   #13 0x00007ff05678bda3 in ?? () from /__w/1/s/artifacts/tests/coreclr/Linux.x64.Release/Tests/Core_Root/libcoreclr.so
2022-02-20T17:19:31.2755997Z   #14 0x00007ff0566ddace in ?? () from /__w/1/s/artifacts/tests/coreclr/Linux.x64.Release/Tests/Core_Root/libcoreclr.so
2022-02-20T17:19:31.2769258Z   #15 0x00007ff0566afec1 in ?? () from /__w/1/s/artifacts/tests/coreclr/Linux.x64.Release/Tests/Core_Root/libcoreclr.so
2022-02-20T17:19:31.2783243Z   #16 0x00007ff056736e39 in ?? () from /__w/1/s/artifacts/tests/coreclr/Linux.x64.Release/Tests/Core_Root/libcoreclr.so
2022-02-20T17:19:31.2796491Z   #17 0x00007ff056727b72 in ?? () from /__w/1/s/artifacts/tests/coreclr/Linux.x64.Release/Tests/Core_Root/libcoreclr.so
2022-02-20T17:19:31.2809466Z   #18 0x00007ff0567195d2 in mono_main () from /__w/1/s/artifacts/tests/coreclr/Linux.x64.Release/Tests/Core_Root/libcoreclr.so
2022-02-20T17:19:31.2823406Z   #19 0x00007ff05679480e in monovm_execute_assembly () from /__w/1/s/artifacts/tests/coreclr/Linux.x64.Release/Tests/Core_Root/libcoreclr.so
2022-02-20T17:19:31.3348964Z   #20 0x000055dcb433a5aa in run (config=...) at /__w/1/s/src/coreclr/hosts/corerun/corerun.cpp:368
2022-02-20T17:19:31.3363557Z   #21 main (argc=<optimized out>, argv=<optimized out>) at /__w/1/s/src/coreclr/hosts/corerun/corerun.cpp:563
2022-02-20T17:19:31.3425311Z   [Inferior 1 (process 10202) detached]
2022-02-20T17:19:31.3507046Z
2022-02-20T17:19:31.3508333Z   =================================================================
2022-02-20T17:19:31.3509529Z   Basic Fault Address Reporting
2022-02-20T17:19:31.3510661Z   =================================================================
2022-02-20T17:19:31.3512250Z   Memory around native instruction pointer (0x7ff057e6a387):0x7ff057e6a377  48 63 d7 48 63 f6 48 63 f9 b8 ea 00 00 00 0f 05  Hc.Hc.Hc........
2022-02-20T17:19:31.3513887Z   0x7ff057e6a387  48 3d 00 f0 ff ff 77 1e f3 c3 0f 1f 80 00 00 00  H=....w.........
2022-02-20T17:19:31.3515279Z   0x7ff057e6a397  00 85 c9 7f db 89 c8 f7 d8 81 e1 ff ff ff 7f 0f  ................
2022-02-20T17:19:31.3519503Z   0x7ff057e6a3a7  44 c6 89 c1 eb ca 48 8b 15 9c 0a 39 00 f7 d8 64  D.....H....9...d
2022-02-20T17:19:31.9097999Z   Mono Ahead of Time compiler - compiling assembly /__w/1/s/artifacts/tests/coreclr/Linux.x64.Release/JIT/HardwareIntrinsics/Arm/Rdm/Rdm_ro/Rdm_ro.dll
2022-02-20T17:19:31.9101723Z   AOTID 496F54B8-0F3D-AA06-C32F-9EB82E567870
2022-02-20T17:19:31.9104767Z   Executing opt: "/__w/1/s/artifacts/bin/mono/Linux.x64.Release/opt" -f -O2 -disable-tail-calls -place-safepoints -spp-all-backedges -mattr=sse4.2,popcnt,lzcnt,bmi,bmi2,pclmul,aes -o "mono_aot_6QfcWC/temp.opt.bc" "mono_aot_6QfcWC/temp.bc"
2022-02-20T17:19:31.9109761Z   Executing llc: "/__w/1/s/artifacts/bin/mono/Linux.x64.Release/llc"  -march=x86-64 -mcpu=generic -enable-implicit-null-checks -disable-fault-maps -asm-verbose=false -disable-gnu-eh-frame -enable-mono-eh-frame -mono-eh-frame-symbol=mono_aot_Rdm_ro_eh_frame -disable-tail-calls -no-x86-call-frame-opt -relocation-model=pic -filetype=obj -mattr=sse4.2,popcnt,lzcnt,bmi,bmi2,pclmul,aes -o "mono_aot_6QfcWC/temp-llvm.o" "mono_aot_6QfcWC/temp.opt.bc"
2022-02-20T17:19:31.9113118Z   Compiled: 2033/2033
2022-02-20T17:19:31.9115090Z   Executing the native assembler: "as" --64  -o /tmp/mono_aot_Af32J0.o /tmp/mono_aot_Af32J0
2022-02-20T17:19:31.9118024Z   Executing the native linker: "ld" -shared -o /__w/1/s/artifacts/tests/coreclr/Linux.x64.Release/JIT/HardwareIntrinsics/Arm/Rdm/Rdm_ro/Rdm_ro.dll.so.tmp "mono_aot_6QfcWC/temp-llvm.o" /tmp/mono_aot_Af32J0.o

@vargaz (Contributor)

This will fix it:
https://gist.github.com/vargaz/78f0b5d0710c5de7a0131b0f9a6ea5d3

EgorBo reacted with thumbs up emoji

@TIHan (Contributor) left a comment

Looks good! Only a few comments.

@fanyang-mono (Member) commented Feb 22, 2022 (edited)

I created an issue to clean up the code for type checks of vector elements (#65318). I haven't gotten to it yet.

TIHan reacted with thumbs up emoji

@echesakov (Contributor) left a comment

Left some comments

if (!varTypeIsFloating(simdBaseType) && (op != nullptr))
{
    GenTree* cmp =
        comp->gtNewSimdHWIntrinsicNode(simdType, op, NI_AdvSimd_Arm64_MaxAcross, CORINFO_TYPE_UBYTE, simdSize);
Contributor

According to the Arm® Cortex®-A76 Software Optimization Guide:
UMAXV, 16B has Exec latency 6 and Execution throughput 1/2
while UMAXV, 4H/4S has Exec latency 3 and Execution throughput 1

Do we want CORINFO_TYPE_USHORT/CORINFO_TYPE_UINT as a base type instead?

@EgorBo (Member, Author)

Sure, let's change it. I had this in mind when I was benchmarking and saw zero difference, but now I see why:
[image]
(Apple M1, Firestorm core)
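Switching the base type is safe for this check because "the maximum is zero" does not depend on lane width. A minimal scalar sketch in C++, assuming nothing about the JIT itself: `Bytes16`, `max_across_16b`, `max_across_4s`, `is_zero_16b`, `is_zero_4s`, and `check_arrangements_agree` are hypothetical names that only emulate the 16B and 4S arrangements of `umaxv`.

```cpp
#include <algorithm>
#include <array>
#include <cassert>
#include <cstdint>
#include <cstring>

// Scalar stand-in for the raw 16 bytes of a 128-bit register.
using Bytes16 = std::array<std::uint8_t, 16>;

// Emulates the 16B arrangement: unsigned maximum over sixteen 8-bit lanes.
std::uint8_t max_across_16b(const Bytes16& v) {
    return *std::max_element(v.begin(), v.end());
}

// Emulates the 4S arrangement: unsigned maximum over four 32-bit lanes.
std::uint32_t max_across_4s(const Bytes16& v) {
    std::array<std::uint32_t, 4> lanes;
    std::memcpy(lanes.data(), v.data(), sizeof(lanes));
    return *std::max_element(lanes.begin(), lanes.end());
}

bool is_zero_16b(const Bytes16& v) { return max_across_16b(v) == 0; }
bool is_zero_4s(const Bytes16& v) { return max_across_4s(v) == 0; }

// The zero test gives the same answer for either arrangement.
void check_arrangements_agree(const Bytes16& v) {
    assert(is_zero_16b(v) == is_zero_4s(v));
}
```

The maxima themselves differ between arrangements, but each is zero only when every bit of the register is zero, so the wider and cheaper arrangement can be used for the comparison.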

@kunalspathak (Contributor) commented Mar 3, 2022 (edited)

Improvements in dotnet/perf-autofiling-issues#3833 and dotnet/perf-autofiling-issues#3829

EgorBo reacted with heart emoji

@EgorBo (Member, Author)

wow, it's more than I expected

@EgorBo deleted the arm-fast-cmp-zero-vec branch March 3, 2022 17:56
@ghost locked as resolved and limited conversation to collaborators Apr 2, 2022
Reviewers

@TIHan approved these changes
@echesakov approved these changes
@vargaz: awaiting requested review
@lambdageek: awaiting requested review
@SamMonoRT: awaiting requested review
@imhameed: awaiting requested review

Assignees

@EgorBo

Labels
area-CodeGen-coreclr (CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI)
Projects
None yet
Milestone
No milestone
Development

Successfully merging this pull request may close these issues.

JIT: Faster comparison against Vector128<>.Zero
6 participants
@EgorBo @vargaz @fanyang-mono @kunalspathak @TIHan @echesakov
