
JIT: Faster vector == Vector128.Zero on arm64 #65632

Merged

EgorBo merged 11 commits into dotnet:main from EgorBo:arm-fast-cmp-zero-vec

Feb 23, 2022

Conversation

Member

EgorBo commented Feb 20, 2022 (edited)

Closes #63829

static bool IsZero(Vector128<int> vec) => vec == Vector128<int>.Zero;

Codegen diff:

; Assembly listing for method IsZero(System.Runtime.Intrinsics.Vector128`1[Int32]):bool
    stp     fp, lr, [sp,#-16]!
    mov     fp, sp
-   cmeq    v16.4s, v0.4s, #0
-   uminv   b16, v16.16b
-   umov    w0, v16.b[0]
+   umaxv   b16, v0.16b
+   umov    w0, v16.s[0]
    cmp     w0, #0
-   cset    x0, ne
+   cset    x0, eq
    ldp     fp, lr, [sp],#16
    ret     lr
-; Total bytes of code 36
+; Total bytes of code 32

This is needed for faster IndexOf from #63285. Also, #65288 relies on it.
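The codegen diff swaps a per-lane compare followed by a minimum reduction for a single maximum reduction over the raw bytes. The following is a minimal scalar sketch in C++ of why the two sequences give the same answer. It is a model only, not JIT code: `Vec128`, `is_zero_cmeq_uminv`, `is_zero_umaxv`, and `check_strategies_agree` are hypothetical names introduced for illustration, and the loops stand in for the `cmeq`/`uminv` and `umaxv` instructions.

```cpp
#include <algorithm>
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Scalar model of a 128-bit vector viewed as 16 unsigned bytes (the .16b arrangement).
// Vec128 is a hypothetical illustration type, not a JIT or .NET type.
using Vec128 = std::array<std::uint8_t, 16>;

// Models the old sequence: cmeq v16.4s, v0.4s, #0 ; uminv b16, v16.16b ; cset ne.
// Each 32-bit lane becomes all-ones if it is zero, then the minimum byte is tested.
bool is_zero_cmeq_uminv(const Vec128& v)
{
    Vec128 mask{};
    for (std::size_t lane = 0; lane < 4; ++lane)
    {
        bool laneIsZero = true;
        for (std::size_t b = 0; b < 4; ++b)
        {
            if (v[lane * 4 + b] != 0)
            {
                laneIsZero = false;
            }
        }
        for (std::size_t b = 0; b < 4; ++b)
        {
            mask[lane * 4 + b] = laneIsZero ? 0xFF : 0x00;
        }
    }
    const std::uint8_t minByte = *std::min_element(mask.begin(), mask.end());
    return minByte != 0; // every lane matched zero
}

// Models the new sequence: umaxv b16, v0.16b ; cset eq.
// The largest unsigned byte is zero only when every byte is zero.
bool is_zero_umaxv(const Vec128& v)
{
    const std::uint8_t maxByte = *std::max_element(v.begin(), v.end());
    return maxByte == 0;
}

// Asserts that both models agree on a given input.
void check_strategies_agree(const Vec128& v)
{
    assert(is_zero_cmeq_uminv(v) == is_zero_umaxv(v));
}
```

Calling `check_strategies_agree` on the all-zero vector and on vectors with a single non-zero byte exercises both outcomes. The maximum reduction needs no mask, which is consistent with the one-instruction saving shown in the diff.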

Perf_Regex_Industry_RustLang_Sherlock benchmark:

Method	Toolchain	Pattern	Mean	Error	StdDev	Ratio
Count	/Core_Root_PR/corerun	Sherlock Holmes	68.51 us	0.331 us	0.294 us	1.00
Count	/Core_Root_base/corerun	Sherlock Holmes	72.90 us	1.352 us	1.265 us	1.06

Count	/Core_Root_PR/corerun	sherlock	56.52 us	0.226 us	0.200 us	1.00
Count	/Core_Root_base/corerun	sherlock	59.63 us	0.211 us	0.198 us	1.06

Count	/Core_Root_PR/corerun	zqj	54.03 us	0.257 us	0.241 us	1.00
Count	/Core_Root_base/corerun	zqj	57.08 us	0.188 us	0.176 us	1.06
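For readers checking the Ratio column, here is a small C++ sketch of the arithmetic. It assumes the ratio is each row's mean divided by the mean of the row marked 1.00 (the PR build here); `ratio` and `percent_faster` are hypothetical helper names, and the inputs in the usage note are the means copied from the table.

```cpp
#include <cassert>

// Ratio of the base build's mean time to the PR build's mean time.
// Assumes the PR row is the reference row (Ratio 1.00) in the table.
double ratio(double baseMeanUs, double prMeanUs)
{
    assert(prMeanUs > 0.0);
    return baseMeanUs / prMeanUs;
}

// Share of the base build's time saved by the PR build, in percent.
double percent_faster(double baseMeanUs, double prMeanUs)
{
    assert(baseMeanUs > 0.0);
    return (baseMeanUs - prMeanUs) / baseMeanUs * 100.0;
}
```

With the table's means, `ratio(72.90, 68.51)`, `ratio(59.63, 56.52)`, and `ratio(57.08, 54.03)` all round to 1.06, and `percent_faster` gives roughly 5 to 6 percent for each pattern.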

Diffs

EgorBo added 2 commits

February 20, 2022 14:48

Optimize vec == Vector.Zero for arm64

466afe6

Clean up

3139218

ghost added the area-CodeGen-coreclr label (CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI)

Feb 20, 2022

ghost assigned EgorBo

Feb 20, 2022

ghost commented Feb 20, 2022

Tagging subscribers to this area: @JulieLeeMSFT
See info in area-owners.md if you want to be subscribed.

EgorBo added 3 commits

February 20, 2022 16:54

Update lowerarmarch.cpp

a4143c4

Update lowerarmarch.cpp

f9d45e0

Update lowerarmarch.cpp

51f727f

Member, Author

EgorBo commented Feb 20, 2022

PTAL @echesakovMSFT @TIHan

Member, Author

EgorBo commented Feb 20, 2022

cc @vargaz @fanyang-mono the test I added in this PR crashes on Mono llvmaot Pri0 Runtime Tests Run Linux x64 release:

aot-compile: compiling /__w/1/s/artifacts/tests/coreclr/Linux.x64.Release/JIT/HardwareIntrinsics/General/HwiOp/CompareVectorWithZero/CompareVectorWithZero.dll; MONO_PATH: /__w/1/s/artifacts/tests/coreclr/Linux.x64.Release/JIT/HardwareIntrinsics/General/HwiOp/CompareVectorWithZero:/__w/1/s/artifacts/tests/coreclr/Linux.x64.Release/Tests/Core_Root
2022-02-20T17:19:30.2982792Z   Mono Ahead of Time compiler - compiling assembly /__w/1/s/artifacts/tests/coreclr/Linux.x64.Release/JIT/HardwareIntrinsics/General/HwiOp/CompareVectorWithZero/CompareVectorWithZero.dll
2022-02-20T17:19:30.2986165Z   AOTID A5710361-7321-642D-C172-A16090734D6F
2022-02-20T17:19:30.2988419Z   * Assertion: should not be reached at /__w/1/s/src/mono/mono/mini/simd-intrinsics.c:344
2022-02-20T17:19:30.2990056Z
2022-02-20T17:19:30.3025150Z
2022-02-20T17:19:30.3026584Z   =================================================================
2022-02-20T17:19:30.3061763Z   Native Crash Reporting
2022-02-20T17:19:30.3083915Z   =================================================================
2022-02-20T17:19:30.3104269Z   Got a SIGABRT while executing native code. This usually indicates
2022-02-20T17:19:30.3910283Z   a fatal error in the mono runtime or one of the native libraries
2022-02-20T17:19:30.3912329Z
2022-02-20T17:19:30.3913499Z   =================================================================
2022-02-20T17:19:30.3914698Z   External Debugger Dump:
2022-02-20T17:19:30.3916100Z   =================================================================
2022-02-20T17:19:30.4109204Z   used by your application.
2022-02-20T17:19:30.4111015Z   =================================================================
2022-02-20T17:19:30.4112230Z
2022-02-20T17:19:30.4113797Z   =================================================================
2022-02-20T17:19:30.4115082Z   Native stacktrace:
2022-02-20T17:19:30.4116339Z   =================================================================
2022-02-20T17:19:30.4118693Z   0x7ff0567b89c2 - Unknown
2022-02-20T17:19:30.4120408Z   0x7ff05675936e - Unknown
2022-02-20T17:19:30.4122083Z   0x7ff0567b8298 - Unknown
2022-02-20T17:19:30.4138832Z   0x7ff058a31630 - Unknown
2022-02-20T17:19:30.4140970Z   0x7ff057e6a387 - Unknown
2022-02-20T17:19:30.4142772Z   0x7ff057e6ba78 - Unknown
2022-02-20T17:19:30.4144476Z   0x7ff056834875 - Unknown
2022-02-20T17:19:30.4146183Z   0x7ff056646a33 - Unknown
2022-02-20T17:19:30.4147862Z   0x7ff056834cdd - Unknown
2022-02-20T17:19:30.4149610Z   0x7ff056834e45 - Unknown
2022-02-20T17:19:30.4151197Z   0x7ff056834ea4 - Unknown
2022-02-20T17:19:30.4152791Z   0x7ff05676c466 - Unknown
2022-02-20T17:19:30.4154346Z   0x7ff05678bda3 - Unknown
2022-02-20T17:19:30.4156012Z   0x7ff0566ddace - Unknown
2022-02-20T17:19:30.4159329Z   0x7ff0566afec1 - Unknown
2022-02-20T17:19:30.4161158Z   0x7ff056736e39 - Unknown
2022-02-20T17:19:30.4162645Z   0x7ff056727b72 - Unknown
2022-02-20T17:19:30.4164267Z   0x7ff0567195d2 - Unknown
2022-02-20T17:19:30.4165995Z   0x7ff05679480e - Unknown
2022-02-20T17:19:30.4167524Z   0x55dcb433a5aa - Unknown
2022-02-20T17:19:30.4169123Z   0x7ff057e56555 - Unknown
2022-02-20T17:19:30.4170801Z   0x55dcb4338029 - Unknown
2022-02-20T17:19:30.6293373Z   [New LWP 10203]
2022-02-20T17:19:30.6295390Z   [Thread debugging using libthread_db enabled]
2022-02-20T17:19:30.6296826Z   Using host libthread_db library "/lib64/libthread_db.so.1".
2022-02-20T17:19:31.2218749Z EXEC : warning : the debug information found in "/__w/1/s/artifacts/tests/coreclr/Linux.x64.Release/Tests/Core_Root/libcoreclr.so.dbg" does not match "/__w/1/s/artifacts/tests/coreclr/Linux.x64.Release/Tests/Core_Root/libcoreclr.so" (CRC mismatch). [/__w/1/s/src/mono/msbuild/aot-compile.proj]
2022-02-20T17:19:31.2222043Z
2022-02-20T17:19:31.2239353Z   Missing separate debuginfo for /__w/1/s/artifacts/tests/coreclr/Linux.x64.Release/Tests/Core_Root/libcoreclr.so
2022-02-20T17:19:31.2242226Z   Try: yum --enablerepo='*debug*' install /usr/lib/debug/.build-id/92/5d8bab89f15c6990ba944db4dfd44d746cfdb8.debug
2022-02-20T17:19:31.2395130Z   0x00007ff058a311d9 in waitpid () from /lib64/libpthread.so.0
2022-02-20T17:19:31.2412985Z     Id   Target Id         Frame
2022-02-20T17:19:31.2414838Z     2    Thread 0x7ff054fff700 (LWP 10203) "SGen worker" 0x00007ff058a2da35 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
2022-02-20T17:19:31.2432285Z   * 1    Thread 0x7ff059056740 (LWP 10202) "corerun" 0x00007ff058a311d9 in waitpid () from /lib64/libpthread.so.0
2022-02-20T17:19:31.2579202Z
2022-02-20T17:19:31.2590000Z   Thread 2 (Thread 0x7ff054fff700 (LWP 10203)):
2022-02-20T17:19:31.2591755Z   #0  0x00007ff058a2da35 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
2022-02-20T17:19:31.2593635Z   #1  0x00007ff05669f7f3 in ?? () from /__w/1/s/artifacts/tests/coreclr/Linux.x64.Release/Tests/Core_Root/libcoreclr.so
2022-02-20T17:19:31.2609971Z   #2  0x00007ff058a29ea5 in start_thread () from /lib64/libpthread.so.0
2022-02-20T17:19:31.2611736Z   #3  0x00007ff057f329fd in clone () from /lib64/libc.so.6
2022-02-20T17:19:31.2623338Z
2022-02-20T17:19:31.2633759Z   Thread 1 (Thread 0x7ff059056740 (LWP 10202)):
2022-02-20T17:19:31.2635970Z   #0  0x00007ff058a311d9 in waitpid () from /lib64/libpthread.so.0
2022-02-20T17:19:31.2637754Z   #1  0x00007ff0567b8b07 in ?? () from /__w/1/s/artifacts/tests/coreclr/Linux.x64.Release/Tests/Core_Root/libcoreclr.so
2022-02-20T17:19:31.2649895Z   #2  0x00007ff05675936e in ?? () from /__w/1/s/artifacts/tests/coreclr/Linux.x64.Release/Tests/Core_Root/libcoreclr.so
2022-02-20T17:19:31.2659790Z   #3  0x00007ff0567b8298 in ?? () from /__w/1/s/artifacts/tests/coreclr/Linux.x64.Release/Tests/Core_Root/libcoreclr.so
2022-02-20T17:19:31.2661758Z   #4  <signal handler called>
2022-02-20T17:19:31.2663461Z   #5  0x00007ff057e6a387 in raise () from /lib64/libc.so.6
2022-02-20T17:19:31.2664790Z   #6  0x00007ff057e6ba78 in abort () from /lib64/libc.so.6
2022-02-20T17:19:31.2666396Z   #7  0x00007ff056834875 in ?? () from /__w/1/s/artifacts/tests/coreclr/Linux.x64.Release/Tests/Core_Root/libcoreclr.so
2022-02-20T17:19:31.2678513Z   #8  0x00007ff056646a33 in ?? () from /__w/1/s/artifacts/tests/coreclr/Linux.x64.Release/Tests/Core_Root/libcoreclr.so
2022-02-20T17:19:31.2685693Z   #9  0x00007ff056834cdd in ?? () from /__w/1/s/artifacts/tests/coreclr/Linux.x64.Release/Tests/Core_Root/libcoreclr.so
2022-02-20T17:19:31.2698988Z   #10 0x00007ff056834e45 in ?? () from /__w/1/s/artifacts/tests/coreclr/Linux.x64.Release/Tests/Core_Root/libcoreclr.so
2022-02-20T17:19:31.2714534Z   #11 0x00007ff056834ea4 in ?? () from /__w/1/s/artifacts/tests/coreclr/Linux.x64.Release/Tests/Core_Root/libcoreclr.so
2022-02-20T17:19:31.2728485Z   #12 0x00007ff05676c466 in ?? () from /__w/1/s/artifacts/tests/coreclr/Linux.x64.Release/Tests/Core_Root/libcoreclr.so
2022-02-20T17:19:31.2742435Z   #13 0x00007ff05678bda3 in ?? () from /__w/1/s/artifacts/tests/coreclr/Linux.x64.Release/Tests/Core_Root/libcoreclr.so
2022-02-20T17:19:31.2755997Z   #14 0x00007ff0566ddace in ?? () from /__w/1/s/artifacts/tests/coreclr/Linux.x64.Release/Tests/Core_Root/libcoreclr.so
2022-02-20T17:19:31.2769258Z   #15 0x00007ff0566afec1 in ?? () from /__w/1/s/artifacts/tests/coreclr/Linux.x64.Release/Tests/Core_Root/libcoreclr.so
2022-02-20T17:19:31.2783243Z   #16 0x00007ff056736e39 in ?? () from /__w/1/s/artifacts/tests/coreclr/Linux.x64.Release/Tests/Core_Root/libcoreclr.so
2022-02-20T17:19:31.2796491Z   #17 0x00007ff056727b72 in ?? () from /__w/1/s/artifacts/tests/coreclr/Linux.x64.Release/Tests/Core_Root/libcoreclr.so
2022-02-20T17:19:31.2809466Z   #18 0x00007ff0567195d2 in mono_main () from /__w/1/s/artifacts/tests/coreclr/Linux.x64.Release/Tests/Core_Root/libcoreclr.so
2022-02-20T17:19:31.2823406Z   #19 0x00007ff05679480e in monovm_execute_assembly () from /__w/1/s/artifacts/tests/coreclr/Linux.x64.Release/Tests/Core_Root/libcoreclr.so
2022-02-20T17:19:31.3348964Z   #20 0x000055dcb433a5aa in run (config=...) at /__w/1/s/src/coreclr/hosts/corerun/corerun.cpp:368
2022-02-20T17:19:31.3363557Z   #21 main (argc=<optimized out>, argv=<optimized out>) at /__w/1/s/src/coreclr/hosts/corerun/corerun.cpp:563
2022-02-20T17:19:31.3425311Z   [Inferior 1 (process 10202) detached]
2022-02-20T17:19:31.3507046Z
2022-02-20T17:19:31.3508333Z   =================================================================
2022-02-20T17:19:31.3509529Z   Basic Fault Address Reporting
2022-02-20T17:19:31.3510661Z   =================================================================
2022-02-20T17:19:31.3512250Z   Memory around native instruction pointer (0x7ff057e6a387):
0x7ff057e6a377  48 63 d7 48 63 f6 48 63 f9 b8 ea 00 00 00 0f 05  Hc.Hc.Hc........
2022-02-20T17:19:31.3513887Z   0x7ff057e6a387  48 3d 00 f0 ff ff 77 1e f3 c3 0f 1f 80 00 00 00  H=....w.........
2022-02-20T17:19:31.3515279Z   0x7ff057e6a397  00 85 c9 7f db 89 c8 f7 d8 81 e1 ff ff ff 7f 0f  ................
2022-02-20T17:19:31.3519503Z   0x7ff057e6a3a7  44 c6 89 c1 eb ca 48 8b 15 9c 0a 39 00 f7 d8 64  D.....H....9...d
2022-02-20T17:19:31.9097999Z   Mono Ahead of Time compiler - compiling assembly /__w/1/s/artifacts/tests/coreclr/Linux.x64.Release/JIT/HardwareIntrinsics/Arm/Rdm/Rdm_ro/Rdm_ro.dll
2022-02-20T17:19:31.9101723Z   AOTID 496F54B8-0F3D-AA06-C32F-9EB82E567870
2022-02-20T17:19:31.9104767Z   Executing opt: "/__w/1/s/artifacts/bin/mono/Linux.x64.Release/opt" -f -O2 -disable-tail-calls -place-safepoints -spp-all-backedges -mattr=sse4.2,popcnt,lzcnt,bmi,bmi2,pclmul,aes -o "mono_aot_6QfcWC/temp.opt.bc" "mono_aot_6QfcWC/temp.bc"
2022-02-20T17:19:31.9109761Z   Executing llc: "/__w/1/s/artifacts/bin/mono/Linux.x64.Release/llc"  -march=x86-64 -mcpu=generic -enable-implicit-null-checks -disable-fault-maps -asm-verbose=false -disable-gnu-eh-frame -enable-mono-eh-frame -mono-eh-frame-symbol=mono_aot_Rdm_ro_eh_frame -disable-tail-calls -no-x86-call-frame-opt -relocation-model=pic -filetype=obj -mattr=sse4.2,popcnt,lzcnt,bmi,bmi2,pclmul,aes -o "mono_aot_6QfcWC/temp-llvm.o" "mono_aot_6QfcWC/temp.opt.bc"
2022-02-20T17:19:31.9113118Z   Compiled: 2033/2033
2022-02-20T17:19:31.9115090Z   Executing the native assembler: "as" --64  -o /tmp/mono_aot_Af32J0.o /tmp/mono_aot_Af32J0
2022-02-20T17:19:31.9118024Z   Executing the native linker: "ld" -shared -o /__w/1/s/artifacts/tests/coreclr/Linux.x64.Release/JIT/HardwareIntrinsics/Arm/Rdm/Rdm_ro/Rdm_ro.dll.so.tmp "mono_aot_6QfcWC/temp-llvm.o" /tmp/mono_aot_Af32J0.o

Contributor

vargaz commented Feb 20, 2022

This will fix it:
https://gist.github.com/vargaz/78f0b5d0710c5de7a0131b0f9a6ea5d3

EgorBo added 2 commits

February 21, 2022 15:21

Apply Zoltan's patch

70e18b7

Merge branch 'main' of https://github.com/dotnet/runtime into arm-fast-cmp-zero-vec

1550865

EgorBo requested review from vargaz, lambdageek, SamMonoRT and imhameed as code owners

February 21, 2022 12:21

TIHan reviewed

Feb 21, 2022

src/coreclr/jit/lowerarmarch.cpp (outdated, resolved)

TIHan reviewed

Feb 21, 2022

src/coreclr/jit/lowerarmarch.cpp (outdated, resolved)

TIHan approved these changes

Feb 21, 2022

Contributor

TIHan left a comment

Looks good! Only a few comments.

Address feedback

c1c7831

Member

fanyang-mono commented Feb 22, 2022 (edited)

I created an issue (#65318) to clean up the code for type checks of vector elements, but I haven't gotten to it yet.

echesakov reviewed

Feb 22, 2022

Contributor

echesakov left a comment

Left some comments

src/coreclr/jit/lowerarmarch.cpp (outdated, resolved)

src/coreclr/jit/lowerarmarch.cpp (outdated)

        if (!varTypeIsFloating(simdBaseType) && (op != nullptr))
        {
            GenTree* cmp =
                comp->gtNewSimdHWIntrinsicNode(simdType, op, NI_AdvSimd_Arm64_MaxAcross, CORINFO_TYPE_UBYTE, simdSize);

Contributor

echesakov, Feb 22, 2022

According to the Arm® Cortex®-A76 Software Optimization Guide:
UMAXV, 16B has Exec latency 6 and Execution throughput 1/2,
while UMAXV, 4H/4S has Exec latency 3 and Execution throughput 1.

Do we want CORINFO_TYPE_USHORT/CORINFO_TYPE_UINT as a base type instead?
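The suggestion works because the zero test does not depend on lane width: an unsigned maximum over four 32-bit lanes is zero exactly when an unsigned maximum over sixteen bytes is zero. The following is a minimal scalar sketch of that equivalence in C++. It models the reductions with plain loops rather than the `UMAXV` instruction, `Bytes16`, `is_zero_max_bytes`, and `is_zero_max_words` are hypothetical names, and it says nothing about the latency figures quoted above.

```cpp
#include <algorithm>
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// 128 bits of vector data held as raw bytes (hypothetical illustration type).
using Bytes16 = std::array<std::uint8_t, 16>;

// Models a byte-lane reduction (UMAXV over the .16b arrangement).
bool is_zero_max_bytes(const Bytes16& v)
{
    return *std::max_element(v.begin(), v.end()) == 0;
}

// Models a 32-bit-lane reduction (UMAXV over the .4s arrangement).
// The same 16 bytes are reinterpreted as four unsigned 32-bit lanes.
bool is_zero_max_words(const Bytes16& v)
{
    std::array<std::uint32_t, 4> lanes{};
    static_assert(sizeof(lanes) == sizeof(Bytes16), "both views cover 128 bits");
    std::memcpy(lanes.data(), v.data(), sizeof(lanes));
    return *std::max_element(lanes.begin(), lanes.end()) == 0;
}

// Asserts that the two lane widths give the same zero test for an input.
void check_lane_widths_agree(const Bytes16& v)
{
    assert(is_zero_max_bytes(v) == is_zero_max_words(v));
}
```

Any non-zero byte makes its containing 32-bit lane non-zero, so both functions return the same result for every input. The choice of base type is therefore a performance question only, not a correctness one.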