JIT: Faster vector == Vector128.Zero on arm64 #65632
ghost commented Feb 20, 2022
Tagging subscribers to this area: @JulieLeeMSFT

Issue Details

Closes #63829

    static bool IsZero(Vector128<int> vec) => vec == Vector128<int>.Zero;

Codegen diff:

    ; Assembly listing for method IsZero(System.Runtime.Intrinsics.Vector128`1[Int32]):bool
            stp     fp, lr, [sp,#-16]!
            mov     fp, sp
    -       cmeq    v16.4s, v0.4s, #0
    -       uminv   b16, v16.16b
    +       umaxv   b16, v0.16b
            umov    w0, v16.b[0]
            cmp     w0, #0
    -       cset    x0, ne
    +       cset    x0, eq
            ldp     fp, lr, [sp],#16
            ret     lr
    -; Total bytes of code 36
    +; Total bytes of code 32

This is needed for faster IndexOf from #63285. Also, #65288 relies on it.

Perf_Regex_Industry_RustLang_SherlockBenchmark:
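To see why the shorter sequence gives the same answer, here is a minimal scalar sketch of both instruction sequences from the codegen diff. It is only an illustration, not JIT code: `Vec128`, `AsBytes`, `IsZeroOld`, and `IsZeroNew` are hypothetical names introduced for this example, and each function models the NEON instructions with plain loops. The old sequence builds a per-lane "equals zero" mask and checks that its minimum byte is non-zero; the new one checks that the maximum byte of the input itself is zero.

```cpp
#include <algorithm>
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical scalar stand-in for a 128-bit vector of four 32-bit lanes.
using Vec128 = std::array<std::uint32_t, 4>;

// Reinterpret the four 32-bit lanes as sixteen bytes (the ".16b" view).
static std::array<std::uint8_t, 16> AsBytes(const Vec128& v)
{
    std::array<std::uint8_t, 16> bytes{};
    std::memcpy(bytes.data(), v.data(), 16);
    return bytes;
}

// Models the old sequence: cmeq v16.4s, v0.4s, #0 ; uminv b16, v16.16b ; cset ne
bool IsZeroOld(const Vec128& v)
{
    Vec128 mask{};
    for (std::size_t i = 0; i < mask.size(); ++i)
    {
        // cmeq: a lane becomes all ones when the input lane is zero.
        mask[i] = (v[i] == 0) ? 0xFFFFFFFFu : 0u;
    }
    const auto bytes = AsBytes(mask);
    // uminv: the minimum byte is non-zero only if every lane matched.
    const std::uint8_t minByte = *std::min_element(bytes.begin(), bytes.end());
    return minByte != 0;
}

// Models the new sequence: umaxv b16, v0.16b ; cset eq
bool IsZeroNew(const Vec128& v)
{
    const auto bytes = AsBytes(v);
    // umaxv: the maximum byte is zero only if every byte is zero.
    const std::uint8_t maxByte = *std::max_element(bytes.begin(), bytes.end());
    return maxByte == 0;
}
```

Both functions agree on every input, because "all lanes equal zero" and "no byte is non-zero" describe the same condition. The new form simply skips the compare step.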
PTAL @echesakovMSFT @TIHan
cc @vargaz @fanyang-mono the test I added in this PR crashes on
This will fix it:
Looks good! Only a few comments.
fanyang-mono commented Feb 22, 2022 • edited
I created an issue to clean up the code for type checks of vector elements (#65318). I haven't gotten to it yet.
Left some comments
src/coreclr/jit/lowerarmarch.cpp Outdated
if (!varTypeIsFloating(simdBaseType) && (op != nullptr))
{
    GenTree* cmp =
        comp->gtNewSimdHWIntrinsicNode(simdType, op, NI_AdvSimd_Arm64_MaxAcross, CORINFO_TYPE_UBYTE, simdSize);
According to the Arm® Cortex®-A76 Software Optimization Guide, UMAXV, 16B has Exec latency 6 and Execution throughput 1/2, while UMAXV, 4H/4S has Exec latency 3 and Execution throughput 1. Do we want CORINFO_TYPE_USHORT/CORINFO_TYPE_UINT as a base type instead?
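The suggestion works because the zero test does not depend on the lane width used for the reduction: the maximum over sixteen bytes is zero exactly when the maximum over four 32-bit words is zero. The sketch below models this in scalar C++. The names `Bytes16`, `MaxAcrossBytes`, and `MaxAcrossWords` are hypothetical, and the sketch says nothing about which base type the PR finally used.

```cpp
#include <algorithm>
#include <array>
#include <cassert>
#include <cstdint>
#include <cstring>

// Hypothetical scalar stand-in for the raw 128 bits of a vector register.
using Bytes16 = std::array<std::uint8_t, 16>;

// Models UMAXV with the 16B arrangement (base type CORINFO_TYPE_UBYTE).
std::uint8_t MaxAcrossBytes(const Bytes16& v)
{
    return *std::max_element(v.begin(), v.end());
}

// Models UMAXV with the 4S arrangement (base type CORINFO_TYPE_UINT).
std::uint32_t MaxAcrossWords(const Bytes16& v)
{
    std::array<std::uint32_t, 4> words{};
    std::memcpy(words.data(), v.data(), 16);
    return *std::max_element(words.begin(), words.end());
}
```

For any input, `MaxAcrossBytes(v) == 0` and `MaxAcrossWords(v) == 0` give the same result, so the choice between the two arrangements can be made purely on the latency and throughput figures quoted above.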
kunalspathak commented Mar 3, 2022 • edited
Improvements in dotnet/perf-autofiling-issues#3833 and dotnet/perf-autofiling-issues#3829
Wow, it's more than I expected.