Use ldp/stp with SIMD registers on Arm64 #84135


Merged
kunalspathak merged 2 commits into dotnet:main from SwapnilGaikwad:github-float-ldpStp
Mar 31, 2023

SwapnilGaikwad (Contributor) commented Mar 30, 2023 (edited by kunalspathak)

Use pairwise load/stores for

1. the instructions using SIMD registers

```
ldr     q1, [x0, #0x20]
ldr     q2, [x0, #0x30]     =>  ldp     q1, q2, [x0, #0x20]
```

(Fixes #83773)

2. the instructions using base and base plus immediate offset format

```
ldr     w1, [x20]
ldr     w2, [x20, #0x04]    =>  ldp     w1, w2, [x20]

ldr     q1, [x0]
ldr     q2, [x0, #0x10]     =>  ldp     q1, q2, [x0]
```

(Fixes #35133) Contributes to #35133. We still need to fix #81278 to cover all the cases.
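The two rewrites above can be modelled with a small predicate. This is a minimal sketch in Python, not the JIT's actual C++ code: `MemOp`, `reg_key`, `can_pair` and `format_pair` are hypothetical names introduced for illustration, only ascending offsets are handled, and the checks reflect the A64 `ldp`/`stp` signed-offset form (a signed 7-bit immediate scaled by the access size) rather than whatever exact conditions the PR implements.

```python
from dataclasses import dataclass

# LDP/STP (signed offset form) encode the offset as a signed 7-bit
# immediate that is scaled by the access size.
IMM7_MIN, IMM7_MAX = -64, 63


@dataclass(frozen=True)
class MemOp:
    """Hypothetical model of `op rt, [base, #offset]`."""
    op: str      # "ldr" or "str"
    rt: str      # transfer register, e.g. "w1", "x2", "q1"
    base: str    # base register, e.g. "x0"
    offset: int  # byte offset; 0 models the base-only form `[x0]`
    size: int    # access size in bytes (4, 8 or 16)


def reg_key(name: str):
    """Map a register name to (file, index) so that w1 and x1 compare equal."""
    if name[0] in "wx" and name[1:].isdigit():
        return ("gp", int(name[1:]))
    if name[0] in "sdq" and name[1:].isdigit():
        return ("simd", int(name[1:]))
    return ("named", name)  # sp, xzr, wzr, ...


def can_pair(first: MemOp, second: MemOp) -> bool:
    """True if two adjacent accesses could be merged into one ldp/stp."""
    if first.op != second.op or first.size != second.size:
        return False
    if first.rt[0] != second.rt[0]:  # same register width and file
        return False
    if reg_key(first.base) != reg_key(second.base):
        return False
    if second.offset != first.offset + first.size:  # must be contiguous
        return False
    if first.offset % first.size != 0:  # immediate is scaled by the size
        return False
    if not IMM7_MIN <= first.offset // first.size <= IMM7_MAX:
        return False
    if first.op == "ldr":
        # The first load must not clobber the base used by the second,
        # and an ldp must not name the same destination twice.
        if reg_key(first.rt) == reg_key(first.base):
            return False
        if reg_key(first.rt) == reg_key(second.rt):
            return False
    return True


def format_pair(first: MemOp, second: MemOp) -> str:
    """Render the merged instruction in the style used in this PR."""
    mnemonic = "ldp" if first.op == "ldr" else "stp"
    if first.offset == 0:
        addr = f"[{first.base}]"
    else:
        sign = "-" if first.offset < 0 else ""
        addr = f"[{first.base}, #{sign}0x{abs(first.offset):02X}]"
    return f"{mnemonic}     {first.rt}, {second.rt}, {addr}"


first = MemOp("ldr", "q1", "x0", 0x20, 16)
second = MemOp("ldr", "q2", "x0", 0x30, 16)
if can_pair(first, second):
    print(format_pair(first, second))  # ldp     q1, q2, [x0, #0x20]
```

Treating an offset of 0 as the base-only form is what lets the second rewrite (`[x20]` followed by `[x20, #0x04]`) fall out of the same check as the first.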
@ghost added the area-CodeGen-coreclr (CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI) and community-contribution (Indicates that the PR has been added by a community member) labels Mar 30, 2023
@ghost commented:

Tagging subscribers to this area: @JulieLeeMSFT, @jakobbotsch, @kunalspathak
See info in area-owners.md if you want to be subscribed.

Issue Details

Author: SwapnilGaikwad
Assignees: -
Labels: area-CodeGen-coreclr
Milestone: -

SwapnilGaikwad (Contributor, Author) commented:

I'm not sure yet whether this introduces potential GC holes, or how to confirm that.

The following SuperPMI (spmi) asmdiffs summary shows multiple matches, as expected.

Diffs are based on 1,469,735 contexts (402,470 MinOpts, 1,067,265 FullOpts).

MISSED contexts: 3 (0.00%)

Overall (-769,712 bytes)

| Collection | Base size (bytes) | Diff size (bytes) |
|---|---|---|
| benchmarks.run.linux.arm64.checked.mch | 19,307,636 | -4,632 |
| libraries_tests.pmi.linux.arm64.checked.mch | 160,979,832 | -46,408 |
| libraries.crossgen2.linux.arm64.checked.mch | 42,310,160 | -6,536 |
| libraries.pmi.linux.arm64.checked.mch | 65,290,988 | -24,572 |
| coreclr_tests.run.linux.arm64.checked.mch | 535,732,276 | -687,564 |

MinOpts (+0 bytes)

| Collection | Base size (bytes) | Diff size (bytes) |
|---|---|---|
| libraries_tests.pmi.linux.arm64.checked.mch | 5,439,544 | +0 |
| coreclr_tests.run.linux.arm64.checked.mch | 363,182,616 | +0 |

FullOpts (-769,712 bytes)

| Collection | Base size (bytes) | Diff size (bytes) |
|---|---|---|
| benchmarks.run.linux.arm64.checked.mch | 18,185,072 | -4,632 |
| libraries_tests.pmi.linux.arm64.checked.mch | 155,540,288 | -46,408 |
| libraries.crossgen2.linux.arm64.checked.mch | 42,308,524 | -6,536 |
| libraries.pmi.linux.arm64.checked.mch | 63,778,580 | -24,572 |
| coreclr_tests.run.linux.arm64.checked.mch | 172,549,660 | -687,564 |
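As a quick sanity check on the summary, the per-collection diffs can be added up and put in proportion to the base sizes. The sketch below copies the numbers from the "Overall" table; the conversion to an instruction count is an assumption (that the whole size change comes from removed 4-byte A64 instructions), not something the report states.

```python
# (base size, diff size) in bytes, copied from the "Overall" table.
OVERALL = {
    "benchmarks.run.linux.arm64.checked.mch": (19_307_636, -4_632),
    "libraries_tests.pmi.linux.arm64.checked.mch": (160_979_832, -46_408),
    "libraries.crossgen2.linux.arm64.checked.mch": (42_310_160, -6_536),
    "libraries.pmi.linux.arm64.checked.mch": (65_290_988, -24_572),
    "coreclr_tests.run.linux.arm64.checked.mch": (535_732_276, -687_564),
}

# Every A64 instruction is 4 bytes wide.
A64_INSTRUCTION_BYTES = 4

total_diff = sum(diff for _, diff in OVERALL.values())

for name, (base, diff) in OVERALL.items():
    # Relative change of this collection's code size.
    print(f"{name}: {diff:+,} bytes ({100 * diff / base:+.3f}%)")

# Assumption: all saved bytes are whole removed instructions.
removed = -total_diff // A64_INSTRUCTION_BYTES
print(f"total: {total_diff:+,} bytes, about {removed:,} fewer instructions")
```

The total matches the reported -769,712 bytes, and the MinOpts rows contribute nothing, so the whole reduction comes from FullOpts code.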
Example diffs
benchmarks.run.linux.arm64.checked.mch
-4 (-16.67%) : 2709.dasm - System.ValueTuple`2[long,System.DateTime]:.ctor(long,System.DateTime):this

    @@ -20,15 +20,14 @@ G_M30325_IG01:        ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref,
     ;; size=8 bbWeight=1 PerfScore 1.50
     G_M30325_IG02:        ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0001 {x0}, byref
                 ; byrRegs +[x0]
    -            str     x1, [x0]
    -            str     x2, [x0, #0x08]
    -;; size=8 bbWeight=1 PerfScore 2.00
    +            stp     x1, x2, [x0]
    +;; size=4 bbWeight=1 PerfScore 1.00
     G_M30325_IG03:        ; bbWeight=1, epilog, nogc, extend
                 ldp     fp, lr, [sp], #0x10
                 ret     lr
     ;; size=8 bbWeight=1 PerfScore 2.00

    -; Total bytes of code 24, prolog size 8, PerfScore 7.90, instruction count 6, allocated bytes for code 24 (MethodHash=b3c1898a) for method System.ValueTuple`2[long,System.DateTime]:.ctor(long,System.DateTime):this
    +; Total bytes of code 20, prolog size 8, PerfScore 6.50, instruction count 5, allocated bytes for code 20 (MethodHash=b3c1898a) for method System.ValueTuple`2[long,System.DateTime]:.ctor(long,System.DateTime):this
     ; ============================================================

     Unwind Info:
    @@ -39,7 +38,7 @@ Unwind Info:
       E bit             : 0
       X bit             : 0
       Vers              : 0
    -  Function Length   : 6 (0x00006) Actual length = 24 (0x000018)
    +  Function Length   : 5 (0x00005) Actual length = 20 (0x000014)
       ---- Epilog scopes ----
       ---- Scope 0
       Epilog Start Offset        : 3523193630 (0xd1ffab1e) Actual offset = 3523193630 (0xd1ffab1e) Offset from main function begin = 3523193630 (0xd1ffab1e)
-4 (-16.67%) : 925.dasm - System.Reflection.Emit.OpCode:.ctor(int,int):this

    @@ -19,15 +19,14 @@ G_M55742_IG01:        ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref,
     ;; size=8 bbWeight=1 PerfScore 1.50
     G_M55742_IG02:        ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0001 {x0}, byref
                 ; byrRegs +[x0]
    -            str     w1, [x0]
    -            str     w2, [x0, #0x04]
    -;; size=8 bbWeight=1 PerfScore 2.00
    +            stp     w1, w2, [x0]
    +;; size=4 bbWeight=1 PerfScore 1.00
     G_M55742_IG03:        ; bbWeight=1, epilog, nogc, extend
                 ldp     fp, lr, [sp], #0x10
                 ret     lr
     ;; size=8 bbWeight=1 PerfScore 2.00

    -; Total bytes of code 24, prolog size 8, PerfScore 7.90, instruction count 6, allocated bytes for code 24 (MethodHash=9e0a2641) for method System.Reflection.Emit.OpCode:.ctor(int,int):this
    +; Total bytes of code 20, prolog size 8, PerfScore 6.50, instruction count 5, allocated bytes for code 20 (MethodHash=9e0a2641) for method System.Reflection.Emit.OpCode:.ctor(int,int):this
     ; ============================================================

     Unwind Info:
    @@ -38,7 +37,7 @@ Unwind Info:
       E bit             : 0
       X bit             : 0
       Vers              : 0
    -  Function Length   : 6 (0x00006) Actual length = 24 (0x000018)
    +  Function Length   : 5 (0x00005) Actual length = 20 (0x000014)
       ---- Epilog scopes ----
       ---- Scope 0
       Epilog Start Offset        : 3523193630 (0xd1ffab1e) Actual offset = 3523193630 (0xd1ffab1e) Offset from main function begin = 3523193630 (0xd1ffab1e)
-8 (-16.67%) : 25440.dasm - System.Numerics.Tests.Perf_Matrix4x4:CreateRotationXWithCenterBenchmark():System.Numerics.Matrix4x4:this

    @@ -38,11 +38,9 @@ G_M63428_IG02:        ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0100 {x8}, byre
                 ldr     q17, [@RWD16]
                 ldr     q18, [@RWD32]
                 ldr     q19, [@RWD48]
    -            str     q19, [x8]
    -            str     q16, [x8, #0x10]
    -            str     q17, [x8, #0x20]
    -            str     q18, [x8, #0x30]
    -;; size=32 bbWeight=1 PerfScore 12.00
    +            stp     q19, q16, [x8]
    +            stp     q17, q18, [x8, #0x20]
    +;; size=24 bbWeight=1 PerfScore 10.00
     G_M63428_IG03:        ; bbWeight=1, epilog, nogc, extend
                 ldp     fp, lr, [sp], #0x10
                 ret     lr
    @@ -53,7 +51,7 @@ RWD32  dq 0000000000000000h, 3F80000000000000h
     RWD48  dq 000000003F800000h, 0000000000000000h


    -; Total bytes of code 48, prolog size 8, PerfScore 20.30, instruction count 12, allocated bytes for code 48 (MethodHash=0a46083b) for method System.Numerics.Tests.Perf_Matrix4x4:CreateRotationXWithCenterBenchmark():System.Numerics.Matrix4x4:this
    +; Total bytes of code 40, prolog size 8, PerfScore 17.50, instruction count 10, allocated bytes for code 40 (MethodHash=0a46083b) for method System.Numerics.Tests.Perf_Matrix4x4:CreateRotationXWithCenterBenchmark():System.Numerics.Matrix4x4:this
     ; ============================================================

     Unwind Info:
    @@ -64,7 +62,7 @@ Unwind Info:
       E bit             : 0
       X bit             : 0
       Vers              : 0
    -  Function Length   : 12 (0x0000c) Actual length = 48 (0x000030)
    +  Function Length   : 10 (0x0000a) Actual length = 40 (0x000028)
       ---- Epilog scopes ----
       ---- Scope 0
       Epilog Start Offset        : 3523193630 (0xd1ffab1e) Actual offset = 3523193630 (0xd1ffab1e) Offset from main function begin = 3523193630 (0xd1ffab1e)
+0 (0.00%) : 21886.dasm - System.Xml.XmlBinaryWriter:SetOutput(System.IO.Stream,System.Xml.IXmlDictionary,System.Xml.XmlBinaryWriterSession,bool):this

    @@ -86,8 +86,8 @@ G_M34423_IG04:        ; bbWeight=1, gcrefRegs=780000 {x19 x20 x21 x22}, byrefReg
                 strb    wzr, [x0, #0x26]
                 add     x14, x0, #64
                 ; byrRegs +[x14]
    -            str     xzr, [x14]
    -            stp     xzr, xzr, [x14, #0x08]
    +            stp     xzr, xzr, [x14]
    +            str     xzr, [x14, #0x10]
                 movn    w14, #0
                 ; byrRegs -[x14]
                 str     w14, [x0, #0x38]
+0 (0.00%) : 20223.dasm - System.Text.Json.JsonDocument:TryGetValue(int,byref):bool:this

    @@ -150,8 +150,8 @@ G_M19143_IG05:        ; bbWeight=1, gcrefRegs=80000 {x19}, byrefRegs=300000 {x20
                 blr     x1
                 cmp     w22, #12
                 blt     G_M19143_IG15
    -            ldr     w22, [x21]
    -            ldp     w23, w1, [x21, #0x04]
    +            ldp     w22, w23, [x21]
    +            ldr     w1, [x21, #0x08]
                 lsr     w1, w1, #28
                 uxtb    w1, w1
                 cmp     w1, #8
+0 (0.00%) : 32319.dasm - Microsoft.CodeAnalysis.CSharp.MethodCompiler:CompileSynthesizedMethods(Microsoft.CodeAnalysis.CSharp.TypeCompilationState):this

    @@ -188,10 +188,10 @@ G_M26982_IG06:        ; bbWeight=4, gcVars=00000000000000400000000401000010 {V00
                 add     x14, x14, #16
                 add     x14, x15, x14
                 ; byrRegs +[x14]
    -            ldr     x20, [x14]
    -            ; gcrRegs +[x20]
    -            ldp     x21, x22, [x14, #0x08]
    -            ; gcrRegs +[x21-x22]
    +            ldp     x20, x21, [x14]
    +            ; gcrRegs +[x20-x21]
    +            ldr     x22, [x14, #0x10]
    +            ; gcrRegs +[x22]
                 add     x14, x4, #40
                 mov     x15, x22
                 bl      CORINFO_HELP_ASSIGN_REF
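The +0 diffs in this collection keep the instruction count and only move the pair: a `str` followed by an `stp` becomes an `stp` followed by a `str`. One reading consistent with these diffs is a greedy, left-to-right peephole that previously skipped the base-only form `[xN]` and now accepts it. The sketch below simulates that reading; `greedy_pair` and its `pair_base_only` flag are hypothetical names, and the real JIT logic may differ.

```python
def greedy_pair(offsets, size, pair_base_only):
    """Greedily merge adjacent same-size stores to one base register.

    offsets: byte offsets of consecutive stores, in emission order.
    size: access size in bytes.
    pair_base_only: whether a store at offset 0 (the `[base]` form)
        may start a pair; False models the assumed old behaviour.
    Returns a list of ("str", offset) / ("stp", offset) tuples.
    """
    out = []
    for off in offsets:
        prev = out[-1] if out else None
        if (
            prev is not None
            and prev[0] == "str"              # only an unpaired store can be merged
            and off == prev[1] + size         # must be contiguous
            and (prev[1] != 0 or pair_base_only)
        ):
            out[-1] = ("stp", prev[1])        # fold into the previous store
        else:
            out.append(("str", off))
    return out


# Three 8-byte stores at offsets 0x00, 0x08, 0x10.
print(greedy_pair([0, 8, 16], 8, pair_base_only=False))
# [('str', 0), ('stp', 8)]
print(greedy_pair([0, 8, 16], 8, pair_base_only=True))
# [('stp', 0), ('str', 16)]
```

Both results contain two instructions, which is why such methods show a +0 byte diff, while a method with exactly two stores (offsets 0 and 8) goes from two instructions to one.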
libraries_tests.pmi.linux.arm64.checked.mch
-8 (-20.00%) : 128351.dasm - Microsoft.CodeAnalysis.Checksum+HashData:FromPointer(ulong):Microsoft.CodeAnalysis.Checksum+HashData

    @@ -26,19 +26,17 @@ G_M44009_IG01:        ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref,
     ;; size=8 bbWeight=1 PerfScore 1.50
     G_M44009_IG02:        ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0100 {x8}, byref
                 ; byrRegs +[x8]
    -            ldr     x1, [x0]
    -            ldr     x2, [x0, #0x08]
    +            ldp     x1, x2, [x0]
                 ldr     w0, [x0, #0x10]
    -            str     x1, [x8]
    -            str     x2, [x8, #0x08]
    +            stp     x1, x2, [x8]
                 str     w0, [x8, #0x10]
    -;; size=24 bbWeight=1 PerfScore 12.00
    +;; size=16 bbWeight=1 PerfScore 9.00
     G_M44009_IG03:        ; bbWeight=1, epilog, nogc, extend
                 ldp     fp, lr, [sp], #0x10
                 ret     lr
     ;; size=8 bbWeight=1 PerfScore 2.00

    -; Total bytes of code 40, prolog size 8, PerfScore 19.50, instruction count 10, allocated bytes for code 40 (MethodHash=a99b5416) for method Microsoft.CodeAnalysis.Checksum+HashData:FromPointer(ulong):Microsoft.CodeAnalysis.Checksum+HashData
    +; Total bytes of code 32, prolog size 8, PerfScore 15.70, instruction count 8, allocated bytes for code 32 (MethodHash=a99b5416) for method Microsoft.CodeAnalysis.Checksum+HashData:FromPointer(ulong):Microsoft.CodeAnalysis.Checksum+HashData
     ; ============================================================

     Unwind Info:
    @@ -49,7 +47,7 @@ Unwind Info:
       E bit             : 0
       X bit             : 0
       Vers              : 0
    -  Function Length   : 10 (0x0000a) Actual length = 40 (0x000028)
    +  Function Length   : 8 (0x00008) Actual length = 32 (0x000020)
       ---- Epilog scopes ----
       ---- Scope 0
       Epilog Start Offset        : 3523193630 (0xd1ffab1e) Actual offset = 3523193630 (0xd1ffab1e) Offset from main function begin = 3523193630 (0xd1ffab1e)
-4 (-16.67%) : 3265.dasm - System.Text.Json.Serialization.Tests.Point_2D_Struct_WithMultipleAttributes_OneNonPublic:.ctor(int):this

    @@ -19,15 +19,14 @@ G_M61621_IG01:        ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref,
     ;; size=8 bbWeight=1 PerfScore 1.50
     G_M61621_IG02:        ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0001 {x0}, byref
                 ; byrRegs +[x0]
    -            str     w1, [x0]
    -            str     wzr, [x0, #0x04]
    -;; size=8 bbWeight=1 PerfScore 2.00
    +            stp     w1, wzr, [x0]
    +;; size=4 bbWeight=1 PerfScore 1.00
     G_M61621_IG03:        ; bbWeight=1, epilog, nogc, extend
                 ldp     fp, lr, [sp], #0x10
                 ret     lr
     ;; size=8 bbWeight=1 PerfScore 2.00

    -; Total bytes of code 24, prolog size 8, PerfScore 7.90, instruction count 6, allocated bytes for code 24 (MethodHash=8aeb0f4a) for method System.Text.Json.Serialization.Tests.Point_2D_Struct_WithMultipleAttributes_OneNonPublic:.ctor(int):this
    +; Total bytes of code 20, prolog size 8, PerfScore 6.50, instruction count 5, allocated bytes for code 20 (MethodHash=8aeb0f4a) for method System.Text.Json.Serialization.Tests.Point_2D_Struct_WithMultipleAttributes_OneNonPublic:.ctor(int):this
     ; ============================================================

     Unwind Info:
    @@ -38,7 +37,7 @@ Unwind Info:
       E bit             : 0
       X bit             : 0
       Vers              : 0
    -  Function Length   : 6 (0x00006) Actual length = 24 (0x000018)
    +  Function Length   : 5 (0x00005) Actual length = 20 (0x000014)
       ---- Epilog scopes ----
       ---- Scope 0
       Epilog Start Offset        : 3523193630 (0xd1ffab1e) Actual offset = 3523193630 (0xd1ffab1e) Offset from main function begin = 3523193630 (0xd1ffab1e)
-4 (-16.67%) : 156801.dasm - SerializationTestTypes.KeyValue`2[long,System.Nullable`1[int]]:.ctor(long,System.Nullable`1[int]):this

    @@ -19,15 +19,14 @@ G_M24332_IG01:        ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref,
     ;; size=8 bbWeight=1 PerfScore 1.50
     G_M24332_IG02:        ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0001 {x0}, byref
                 ; byrRegs +[x0]
    -            str     x1, [x0]
    -            str     x2, [x0, #0x08]
    -;; size=8 bbWeight=1 PerfScore 2.00
    +            stp     x1, x2, [x0]
    +;; size=4 bbWeight=1 PerfScore 1.00
     G_M24332_IG03:        ; bbWeight=1, epilog, nogc, extend
                 ldp     fp, lr, [sp], #0x10
                 ret     lr
     ;; size=8 bbWeight=1 PerfScore 2.00

    -; Total bytes of code 24, prolog size 8, PerfScore 7.90, instruction count 6, allocated bytes for code 24 (MethodHash=e0c2a0f3) for method SerializationTestTypes.KeyValue`2[long,System.Nullable`1[int]]:.ctor(long,System.Nullable`1[int]):this
    +; Total bytes of code 20, prolog size 8, PerfScore 6.50, instruction count 5, allocated bytes for code 20 (MethodHash=e0c2a0f3) for method SerializationTestTypes.KeyValue`2[long,System.Nullable`1[int]]:.ctor(long,System.Nullable`1[int]):this
     ; ============================================================

     Unwind Info:
    @@ -38,7 +37,7 @@ Unwind Info:
       E bit             : 0
       X bit             : 0
       Vers              : 0
    -  Function Length   : 6 (0x00006) Actual length = 24 (0x000018)
    +  Function Length   : 5 (0x00005) Actual length = 20 (0x000014)
       ---- Epilog scopes ----
       ---- Scope 0
       Epilog Start Offset        : 3523193630 (0xd1ffab1e) Actual offset = 3523193630 (0xd1ffab1e) Offset from main function begin = 3523193630 (0xd1ffab1e)
+0 (0.00%) : 166976.dasm - Microsoft.CodeQuality.Analyzers.ApiDesignGuidelines.IdentifiersShouldHaveCorrectSuffixAnalyzer:.ctor():this

    @@ -40,10 +40,10 @@ G_M60409_IG02:        ; bbWeight=1, gcrefRegs=80000 {x19}, byrefRegs=0000 {}, by
                 movz    x0, #0xD1FFAB1E      // data for <unknown class>:<unknown field>
                 movk    x0, #0xD1FFAB1E LSL #16
                 movk    x0, #0xD1FFAB1E LSL #32
    -            ldr     x20, [x0]
    -            ; gcrRegs +[x20]
    -            ldp     x21, x22, [x0, #0x08]
    -            ; gcrRegs +[x21-x22]
    +            ldp     x20, x21, [x0]
    +            ; gcrRegs +[x20-x21]
    +            ldr     x22, [x0, #0x10]
    +            ; gcrRegs +[x22]
                 movz    x0, #0xD1FFAB1E
                 movk    x0, #0xD1FFAB1E LSL #16
                 movk    x0, #0xD1FFAB1E LSL #32
+0 (0.00%) : 210048.dasm - System.Net.Http.Tests.StreamToStreamCopyTest+d__5:MoveNext():this

    @@ -605,8 +605,8 @@ G_M59861_IG08:        ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=80000 {x19}, by
     G_M59861_IG09:        ; bbWeight=1.00, gcrefRegs=0000 {}, byrefRegs=80000 {x19}, byref, isz
                 movn    w14, #1
                 str     w14, [x19, #0x18]
    -            str     xzr, [x19]
    -            stp     xzr, xzr, [x19, #0x08]
    +            stp     xzr, xzr, [x19]
    +            str     xzr, [x19, #0x10]
                 add     x14, x19, #32
                 ; byrRegs +[x14]
                 ldr     x15, [x14]
    @@ -802,8 +802,8 @@ G_M59861_IG21:        ; bbWeight=0, gcVars=0000000000000001 {V00}, gcrefRegs=000
                 ldr     x19, [fp, #0x10]  // [V00 this]
                 ; byrRegs +[x19]
                 str     w0, [x19, #0x18]
    -            str     xzr, [x19]
    -            stp     xzr, xzr, [x19, #0x08]
    +            stp     xzr, xzr, [x19]
    +            str     xzr, [x19, #0x10]
                 add     x0, x19, #32
                 ; byrRegs +[x0]
                 movz    x2, #0xD1FFAB1E      // code for <unknown method>
+0 (0.00%) : 239744.dasm - System.Security.Cryptography.Pkcs.Tests.CryptographicAttributeObjectCollectionTests:CopyExceptions()

    @@ -142,10 +142,10 @@ G_M45722_IG02:        ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref,
                 movz    x0, #0xD1FFAB1E      // data for <unknown class>:<unknown field>
                 movk    x0, #0xD1FFAB1E LSL #16
                 movk    x0, #0xD1FFAB1E LSL #32
    -            ldr     x20, [x0]
    -            ; gcrRegs +[x20]
    -            ldp     x21, x22, [x0, #0x08]
    -            ; gcrRegs +[x21-x22]
    +            ldp     x20, x21, [x0]
    +            ; gcrRegs +[x20-x21]
    +            ldr     x22, [x0, #0x10]
    +            ; gcrRegs +[x22]
                 movz    x0, #0xD1FFAB1E
                 movk    x0, #0xD1FFAB1E LSL #16
                 movk    x0, #0xD1FFAB1E LSL #32
libraries.crossgen2.linux.arm64.checked.mch
-8 (-25.00%) : 34883.dasm - System.Numerics.Quaternion:.ctor(float,float,float,float):this

    @@ -22,17 +22,15 @@ G_M64168_IG01:        ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref,
     ;; size=8 bbWeight=1 PerfScore 1.50
     G_M64168_IG02:        ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0001 {x0}, byref
                 ; byrRegs +[x0]
    -            str     s0, [x0]
    -            str     s1, [x0, #0x04]
    -            str     s2, [x0, #0x08]
    -            str     s3, [x0, #0x0C]
    -;; size=16 bbWeight=1 PerfScore 4.00
    +            stp     s0, s1, [x0]
    +            stp     s2, s3, [x0, #0x08]
    +;; size=8 bbWeight=1 PerfScore 2.00
     G_M64168_IG03:        ; bbWeight=1, epilog, nogc, extend
                 ldp     fp, lr, [sp], #0x10
                 ret     lr
     ;; size=8 bbWeight=1 PerfScore 2.00

    -; Total bytes of code 32, prolog size 8, PerfScore 10.70, instruction count 8, allocated bytes for code 32 (MethodHash=c0090557) for method System.Numerics.Quaternion:.ctor(float,float,float,float):this
    +; Total bytes of code 24, prolog size 8, PerfScore 7.90, instruction count 6, allocated bytes for code 24 (MethodHash=c0090557) for method System.Numerics.Quaternion:.ctor(float,float,float,float):this
     ; ============================================================

     Unwind Info:
    @@ -43,7 +41,7 @@ Unwind Info:
       E bit             : 0
       X bit             : 0
       Vers              : 0
    -  Function Length   : 8 (0x00008) Actual length = 32 (0x000020)
    +  Function Length   : 6 (0x00006) Actual length = 24 (0x000018)
       ---- Epilog scopes ----
       ---- Scope 0
       Epilog Start Offset        : 3523193630 (0xd1ffab1e) Actual offset = 3523193630 (0xd1ffab1e) Offset from main function begin = 3523193630 (0xd1ffab1e)
-8 (-25.00%) : 169952.dasm - System.Drawing.RectangleF:.ctor(float,float,float,float):this

    @@ -22,17 +22,15 @@ G_M45207_IG01:        ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref,
     ;; size=8 bbWeight=1 PerfScore 1.50
     G_M45207_IG02:        ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0001 {x0}, byref
                 ; byrRegs +[x0]
    -            str     s0, [x0]
    -            str     s1, [x0, #0x04]
    -            str     s2, [x0, #0x08]
    -            str     s3, [x0, #0x0C]
    -;; size=16 bbWeight=1 PerfScore 4.00
    +            stp     s0, s1, [x0]
    +            stp     s2, s3, [x0, #0x08]
    +;; size=8 bbWeight=1 PerfScore 2.00
     G_M45207_IG03:        ; bbWeight=1, epilog, nogc, extend
                 ldp     fp, lr, [sp], #0x10
                 ret     lr
     ;; size=8 bbWeight=1 PerfScore 2.00

    -; Total bytes of code 32, prolog size 8, PerfScore 10.70, instruction count 8, allocated bytes for code 32 (MethodHash=1f014f68) for method System.Drawing.RectangleF:.ctor(float,float,float,float):this
    +; Total bytes of code 24, prolog size 8, PerfScore 7.90, instruction count 6, allocated bytes for code 24 (MethodHash=1f014f68) for method System.Drawing.RectangleF:.ctor(float,float,float,float):this
     ; ============================================================

     Unwind Info:
    @@ -43,7 +41,7 @@ Unwind Info:
       E bit             : 0
       X bit             : 0
       Vers              : 0
    -  Function Length   : 8 (0x00008) Actual length = 32 (0x000020)
    +  Function Length   : 6 (0x00006) Actual length = 24 (0x000018)
       ---- Epilog scopes ----
       ---- Scope 0
       Epilog Start Offset        : 3523193630 (0xd1ffab1e) Actual offset = 3523193630 (0xd1ffab1e) Offset from main function begin = 3523193630 (0xd1ffab1e)
-8 (-25.00%) : 169953.dasm - System.Drawing.RectangleF:.ctor(System.Drawing.PointF,System.Drawing.SizeF):this

    @@ -25,17 +25,15 @@ G_M36094_IG01:        ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref,
     ;; size=8 bbWeight=1 PerfScore 1.50
     G_M36094_IG02:        ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0001 {x0}, byref
                 ; byrRegs +[x0]
    -            str     s0, [x0]
    -            str     s1, [x0, #0x04]
    -            str     s2, [x0, #0x08]
    -            str     s3, [x0, #0x0C]
    -;; size=16 bbWeight=1 PerfScore 4.00
    +            stp     s0, s1, [x0]
    +            stp     s2, s3, [x0, #0x08]
    +;; size=8 bbWeight=1 PerfScore 2.00
     G_M36094_IG03:        ; bbWeight=1, epilog, nogc, extend
                 ldp     fp, lr, [sp], #0x10
                 ret     lr
     ;; size=8 bbWeight=1 PerfScore 2.00

    -; Total bytes of code 32, prolog size 8, PerfScore 10.70, instruction count 8, allocated bytes for code 32 (MethodHash=550a7301) for method System.Drawing.RectangleF:.ctor(System.Drawing.PointF,System.Drawing.SizeF):this
    +; Total bytes of code 24, prolog size 8, PerfScore 7.90, instruction count 6, allocated bytes for code 24 (MethodHash=550a7301) for method System.Drawing.RectangleF:.ctor(System.Drawing.PointF,System.Drawing.SizeF):this
     ; ============================================================

     Unwind Info:
    @@ -46,7 +44,7 @@ Unwind Info:
       E bit             : 0
       X bit             : 0
       Vers              : 0
    -  Function Length   : 8 (0x00008) Actual length = 32 (0x000020)
    +  Function Length   : 6 (0x00006) Actual length = 24 (0x000018)
       ---- Epilog scopes ----
       ---- Scope 0
       Epilog Start Offset        : 3523193630 (0xd1ffab1e) Actual offset = 3523193630 (0xd1ffab1e) Offset from main function begin = 3523193630 (0xd1ffab1e)
+0 (0.00%) : 65343.dasm - Microsoft.CodeAnalysis.CSharp.ForEachStatementInfo:GetHashCode():int:this

    @@ -58,12 +58,12 @@ G_M41916_IG01:        ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref,
                 ; byrRegs +[x19]
     ;; size=28 bbWeight=1 PerfScore 6.00
     G_M41916_IG02:        ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=80000 {x19}, byref
    -            ldr     x20, [x19]
    -            ; gcrRegs +[x20]
    -            ldp     x21, x22, [x19, #0x08]
    -            ; gcrRegs +[x21-x22]
    -            ldp     x23, x24, [x19, #0x18]
    -            ; gcrRegs +[x23-x24]
    +            ldp     x20, x21, [x19]
    +            ; gcrRegs +[x20-x21]
    +            ldp     x22, x23, [x19, #0x10]
    +            ; gcrRegs +[x22-x23]
    +            ldr     x24, [x19, #0x20]
    +            ; gcrRegs +[x24]
     ;; size=12 bbWeight=1 PerfScore 11.00
     G_M41916_IG03:        ; bbWeight=1, nogc, extend
                 add     x0, x19, #24
+0 (0.00%) : 136256.dasm - System.Text.RegularExpressions.RegexParser:.ctor(System.String,int,System.Globalization.CultureInfo,System.Collections.Hashtable,int,System.Collections.Hashtable,System.Span`1[int]):this

    @@ -96,9 +96,9 @@ G_M19169_IG04:        ; bbWeight=1, extend
                 blr     x12
                 ldr     x12, [x13], #0x08
                 str     x12, [x14], #0x08
    -            str     xzr, [x0]
    -            stp     xzr, xzr, [x0, #0x08]
    -            stp     xzr, xzr, [x0, #0x18]
    +            stp     xzr, xzr, [x0]
    +            stp     xzr, xzr, [x0, #0x10]
    +            str     xzr, [x0, #0x20]
                 str     wzr, [x0, #0x58]
                 stp     wzr, wzr, [x0, #0x60]
                 str     wzr, [x0, #0x68]
+0 (0.00%) : 173248.dasm - ILCompiler.Diagnostics.PerfMapWriter+PerfmapTokensForTarget:Equals(System.Object):bool:this

    @@ -65,9 +65,9 @@ G_M34908_IG04:        ; bbWeight=0.25, gcrefRegs=80000 {x19}, byrefRegs=100000 {
     G_M34908_IG05:        ; bbWeight=0.50, gcrefRegs=80000 {x19}, byrefRegs=100000 {x20}, byref, isz
                 add     x11, x19, #8
                 ; byrRegs +[x11]
    -            ldr     w19, [x11]
    +            ldp     w19, w21, [x11]
                 ; gcrRegs -[x19]
    -            ldp     w21, w22, [x11, #0x04]
    +            ldr     w22, [x11, #0x08]
                 adrp    x11, [HIGH RELOC #0xD1FFAB1E]      // function address
                 ; byrRegs -[x11]
                 add     x11, x11, [LOW RELOC #0xD1FFAB1E]
libraries.pmi.linux.arm64.checked.mch
-8 (-25.00%) : 250607.dasm - System.Drawing.RectangleF:.ctor(float,float,float,float):this

    @@ -21,17 +21,15 @@ G_M45207_IG01:        ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref,
     ;; size=8 bbWeight=1 PerfScore 1.50
     G_M45207_IG02:        ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0001 {x0}, byref
                 ; byrRegs +[x0]
    -            str     s0, [x0]
    -            str     s1, [x0, #0x04]
    -            str     s2, [x0, #0x08]
    -            str     s3, [x0, #0x0C]
    -;; size=16 bbWeight=1 PerfScore 4.00
    +            stp     s0, s1, [x0]
    +            stp     s2, s3, [x0, #0x08]
    +;; size=8 bbWeight=1 PerfScore 2.00
     G_M45207_IG03:        ; bbWeight=1, epilog, nogc, extend
                 ldp     fp, lr, [sp], #0x10
                 ret     lr
     ;; size=8 bbWeight=1 PerfScore 2.00

    -; Total bytes of code 32, prolog size 8, PerfScore 10.70, instruction count 8, allocated bytes for code 32 (MethodHash=1f014f68) for method System.Drawing.RectangleF:.ctor(float,float,float,float):this
    +; Total bytes of code 24, prolog size 8, PerfScore 7.90, instruction count 6, allocated bytes for code 24 (MethodHash=1f014f68) for method System.Drawing.RectangleF:.ctor(float,float,float,float):this
     ; ============================================================

     Unwind Info:
    @@ -42,7 +40,7 @@ Unwind Info:
       E bit             : 0
       X bit             : 0
       Vers              : 0
    -  Function Length   : 8 (0x00008) Actual length = 32 (0x000020)
    +  Function Length   : 6 (0x00006) Actual length = 24 (0x000018)
       ---- Epilog scopes ----
       ---- Scope 0
       Epilog Start Offset        : 3523193630 (0xd1ffab1e) Actual offset = 3523193630 (0xd1ffab1e) Offset from main function begin = 3523193630 (0xd1ffab1e)
-8 (-25.00%) : 250608.dasm - System.Drawing.RectangleF:.ctor(System.Drawing.PointF,System.Drawing.SizeF):this

    @@ -24,17 +24,15 @@ G_M36094_IG01:        ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref,
     ;; size=8 bbWeight=1 PerfScore 1.50
     G_M36094_IG02:        ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0001 {x0}, byref
                 ; byrRegs +[x0]
    -            str     s0, [x0]
    -            str     s1, [x0, #0x04]
    -            str     s2, [x0, #0x08]
    -            str     s3, [x0, #0x0C]
    -;; size=16 bbWeight=1 PerfScore 4.00
    +            stp     s0, s1, [x0]
    +            stp     s2, s3, [x0, #0x08]
    +;; size=8 bbWeight=1 PerfScore 2.00
     G_M36094_IG03:        ; bbWeight=1, epilog, nogc, extend
                 ldp     fp, lr, [sp], #0x10
                 ret     lr
     ;; size=8 bbWeight=1 PerfScore 2.00

    -; Total bytes of code 32, prolog size 8, PerfScore 10.70, instruction count 8, allocated bytes for code 32 (MethodHash=550a7301) for method System.Drawing.RectangleF:.ctor(System.Drawing.PointF,System.Drawing.SizeF):this
    +; Total bytes of code 24, prolog size 8, PerfScore 7.90, instruction count 6, allocated bytes for code 24 (MethodHash=550a7301) for method System.Drawing.RectangleF:.ctor(System.Drawing.PointF,System.Drawing.SizeF):this
     ; ============================================================

     Unwind Info:
    @@ -45,7 +43,7 @@ Unwind Info:
       E bit             : 0
       X bit             : 0
       Vers              : 0
    -  Function Length   : 8 (0x00008) Actual length = 32 (0x000020)
    +  Function Length   : 6 (0x00006) Actual length = 24 (0x000018)
       ---- Epilog scopes ----
       ---- Scope 0
       Epilog Start Offset        : 3523193630 (0xd1ffab1e) Actual offset = 3523193630 (0xd1ffab1e) Offset from main function begin = 3523193630 (0xd1ffab1e)
-8 (-20.00%) : 246149.dasm - System.IO.Hashing.XxHash128:WriteBigEndian128(byref,System.Span`1[ubyte])

    @@ -29,20 +29,18 @@ G_M11325_IG01:        ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref,
     ;; size=8 bbWeight=1 PerfScore 1.50
     G_M11325_IG02:        ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0003 {x0 x1}, byref
                 ; byrRegs +[x0-x1]
    -            ldr     x2, [x0]
    -            ldr     x0, [x0, #0x08]
    +            ldp     x2, x0, [x0]
                 ; byrRegs -[x0]
                 rev     x2, x2
                 rev     x0, x0
    -            str     x0, [x1]
    -            str     x2, [x1, #0x08]
    -;; size=24 bbWeight=1 PerfScore 9.00
    +            stp     x0, x2, [x1]
    +;; size=16 bbWeight=1 PerfScore 6.00
     G_M11325_IG03:        ; bbWeight=1, epilog, nogc, extend
                 ldp     fp, lr, [sp], #0x10
                 ret     lr
     ;; size=8 bbWeight=1 PerfScore 2.00

    -; Total bytes of code 40, prolog size 8, PerfScore 16.50, instruction count 10, allocated bytes for code 40 (MethodHash=ae9bd3c2) for method System.IO.Hashing.XxHash128:WriteBigEndian128(byref,System.Span`1[ubyte])
    +; Total bytes of code 32, prolog size 8, PerfScore 12.70, instruction count 8, allocated bytes for code 32 (MethodHash=ae9bd3c2) for method System.IO.Hashing.XxHash128:WriteBigEndian128(byref,System.Span`1[ubyte])
     ; ============================================================

     Unwind Info:
    @@ -53,7 +51,7 @@ Unwind Info:
       E bit             : 0
       X bit             : 0
       Vers              : 0
    -  Function Length   : 10 (0x0000a) Actual length = 40 (0x000028)
    +  Function Length   : 8 (0x00008) Actual length = 32 (0x000020)
       ---- Epilog scopes ----
       ---- Scope 0
       Epilog Start Offset        : 3523193630 (0xd1ffab1e) Actual offset = 3523193630 (0xd1ffab1e) Offset from main function begin = 3523193630 (0xd1ffab1e)
+0 (0.00%) : 219072.dasm - Microsoft.Cci.FullMetadataWriter:CreateReferenceVisitor():Microsoft.Cci.ReferenceIndexer:this

    @@ -89,10 +89,10 @@ G_M64343_IG02:        ; bbWeight=1, gcrefRegs=80000 {x19}, byrefRegs=0000 {}, by
                 ; byrRegs -[x14]
                 add     x0, x19, #0xD1FFAB1E
                 ; byrRegs +[x0]
    -            ldr     x21, [x0]
    -            ; gcrRegs +[x21]
    -            ldp     x23, x24, [x0, #0x08]
    -            ; gcrRegs +[x23-x24]
    +            ldp     x21, x23, [x0]
    +            ; gcrRegs +[x21 x23]
    +            ldr     x24, [x0, #0x10]
    +            ; gcrRegs +[x24]
                 movz    x25, #0xD1FFAB1E
                 movk    x25, #0xD1FFAB1E LSL #16
                 movk    x25, #0xD1FFAB1E LSL #32
+0 (0.00%) : 241792.dasm - System.Formats.Cbor.CborWriter+KeyValuePairEncodingRange:.ctor(int,int,int):this
@@ -20,8 +20,8 @@ G_M54047_IG01:        ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref,
 ;; size=8 bbWeight=1 PerfScore 1.50
 G_M54047_IG02:        ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0001 {x0}, byref
             ; byrRegs +[x0]
-            str     w1, [x0]
-            stp     w2, w3, [x0, #0x04]
+            stp     w1, w2, [x0]
+            str     w3, [x0, #0x08]
 ;; size=8 bbWeight=1 PerfScore 2.00
 G_M54047_IG03:        ; bbWeight=1, epilog, nogc, extend
             ldp     fp, lr, [sp], #0x10
+0 (0.00%) : 256640.dasm - System.IO.Pipelines.PipeAwaitable:ExtractCompletion(byref):this
@@ -39,10 +39,10 @@ G_M12398_IG01:        ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref,
 ;; size=24 bbWeight=1 PerfScore 5.50
 G_M12398_IG02:        ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0003 {x0 x1}, byref, isz
             ; byrRegs +[x0-x1]
-            ldr     x14, [x0]
-            ; gcrRegs +[x14]
-            ldp     x13, x12, [x0, #0x08]
-            ; gcrRegs +[x12-x13]
+            ldp     x14, x13, [x0]
+            ; gcrRegs +[x13-x14]
+            ldr     x12, [x0, #0x10]
+            ; gcrRegs +[x12]
             cbnz    x12, G_M12398_IG04
 ;; size=12 bbWeight=1 PerfScore 8.00
 G_M12398_IG03:        ; bbWeight=0.50, gcrefRegs=7000 {x12 x13 x14}, byrefRegs=0003 {x0 x1}, byref
@@ -71,8 +71,8 @@ G_M12398_IG07:        ; bbWeight=0.50, gcrefRegs=F000 {x12 x13 x14 x15}, byrefRe
 ;; size=4 bbWeight=0.50 PerfScore 1.50
 G_M12398_IG08:        ; bbWeight=1, gcrefRegs=1E000 {x13 x14 x15 xip0}, byrefRegs=0003 {x0 x1}, byref, isz
             ; gcrRegs -[x12]
-            str     xzr, [x0]
-            stp     xzr, xzr, [x0, #0x08]
+            stp     xzr, xzr, [x0]
+            str     xzr, [x0, #0x10]
             cbnz    x14, G_M12398_IG10
 ;; size=12 bbWeight=1 PerfScore 3.00
 G_M12398_IG09:        ; bbWeight=0.50, gcrefRegs=0000 {}, byrefRegs=0002 {x1}, byref
coreclr_tests.run.linux.arm64.checked.mch
-20 (-33.33%) : 243626.dasm - testout1+VT_0_4_4:.ctor(int):this
@@ -19,23 +19,18 @@ G_M41861_IG01:        ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref,
 G_M41861_IG02:        ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0001 {x0}, byref
             ; byrRegs +[x0]
             fmov    d16, #1.0000
-            str     d16, [x0]
-            str     d16, [x0, #0x08]
-            str     d16, [x0, #0x10]
-            str     d16, [x0, #0x18]
-            str     d16, [x0, #0x20]
-            str     d16, [x0, #0x28]
-            str     d16, [x0, #0x30]
-            str     d16, [x0, #0x38]
-            str     d16, [x0, #0x40]
-            str     d16, [x0, #0x48]
-;; size=44 bbWeight=1 PerfScore 10.50
+            stp     d16, d16, [x0]
+            stp     d16, d16, [x0, #0x10]
+            stp     d16, d16, [x0, #0x20]
+            stp     d16, d16, [x0, #0x30]
+            stp     d16, d16, [x0, #0x40]
+;; size=24 bbWeight=1 PerfScore 5.50
 G_M41861_IG03:        ; bbWeight=1, epilog, nogc, extend
             ldp     fp, lr, [sp], #0x10
             ret     lr
 ;; size=8 bbWeight=1 PerfScore 2.00
-; Total bytes of code 60, prolog size 8, PerfScore 20.00, instruction count 15, allocated bytes for code 60 (MethodHash=e0945c7a) for method testout1+VT_0_4_4:.ctor(int):this
+; Total bytes of code 40, prolog size 8, PerfScore 13.00, instruction count 10, allocated bytes for code 40 (MethodHash=e0945c7a) for method testout1+VT_0_4_4:.ctor(int):this
 ; ============================================================

 Unwind Info:
@@ -46,7 +41,7 @@ Unwind Info:
   E bit             : 0
   X bit             : 0
   Vers              : 0
-  Function Length   : 15 (0x0000f) Actual length = 60 (0x00003c)
+  Function Length   : 10 (0x0000a) Actual length = 40 (0x000028)
   ---- Epilog scopes ----
   ---- Scope 0
   Epilog Start Offset        : 3523193630 (0xd1ffab1e) Actual offset = 3523193630 (0xd1ffab1e) Offset from main function begin = 3523193630 (0xd1ffab1e)
-16 (-30.77%) : 243598.dasm - testout1+VT_0_7_8:.ctor(int):this
@@ -19,21 +19,17 @@ G_M55818_IG01:        ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref,
 G_M55818_IG02:        ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0001 {x0}, byref
             ; byrRegs +[x0]
             fmov    d16, #1.0000
-            str     d16, [x0]
-            str     d16, [x0, #0x08]
-            str     d16, [x0, #0x10]
-            str     d16, [x0, #0x18]
-            str     d16, [x0, #0x20]
-            str     d16, [x0, #0x28]
-            str     d16, [x0, #0x30]
-            str     d16, [x0, #0x38]
-;; size=36 bbWeight=1 PerfScore 8.50
+            stp     d16, d16, [x0]
+            stp     d16, d16, [x0, #0x10]
+            stp     d16, d16, [x0, #0x20]
+            stp     d16, d16, [x0, #0x30]
+;; size=20 bbWeight=1 PerfScore 4.50
 G_M55818_IG03:        ; bbWeight=1, epilog, nogc, extend
             ldp     fp, lr, [sp], #0x10
             ret     lr
 ;; size=8 bbWeight=1 PerfScore 2.00
-; Total bytes of code 52, prolog size 8, PerfScore 17.20, instruction count 13, allocated bytes for code 52 (MethodHash=a1af25f5) for method testout1+VT_0_7_8:.ctor(int):this
+; Total bytes of code 36, prolog size 8, PerfScore 11.60, instruction count 9, allocated bytes for code 36 (MethodHash=a1af25f5) for method testout1+VT_0_7_8:.ctor(int):this
 ; ============================================================

 Unwind Info:
@@ -44,7 +40,7 @@ Unwind Info:
   E bit             : 0
   X bit             : 0
   Vers              : 0
-  Function Length   : 13 (0x0000d) Actual length = 52 (0x000034)
+  Function Length   : 9 (0x00009) Actual length = 36 (0x000024)
   ---- Epilog scopes ----
   ---- Scope 0
   Epilog Start Offset        : 3523193630 (0xd1ffab1e) Actual offset = 3523193630 (0xd1ffab1e) Offset from main function begin = 3523193630 (0xd1ffab1e)
-16 (-28.57%) : 243629.dasm - testout1+VT_0_4_1:.ctor(int):this
@@ -19,22 +19,18 @@ G_M56448_IG01:        ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref,
 G_M56448_IG02:        ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0001 {x0}, byref
             ; byrRegs +[x0]
             fmov    d16, #1.0000
-            str     d16, [x0]
-            str     d16, [x0, #0x08]
-            str     d16, [x0, #0x10]
-            str     d16, [x0, #0x18]
-            str     d16, [x0, #0x20]
-            str     d16, [x0, #0x28]
-            str     d16, [x0, #0x30]
-            str     d16, [x0, #0x38]
+            stp     d16, d16, [x0]
+            stp     d16, d16, [x0, #0x10]
+            stp     d16, d16, [x0, #0x20]
+            stp     d16, d16, [x0, #0x30]
             str     d16, [x0, #0x40]
-;; size=40 bbWeight=1 PerfScore 9.50
+;; size=24 bbWeight=1 PerfScore 5.50
 G_M56448_IG03:        ; bbWeight=1, epilog, nogc, extend
             ldp     fp, lr, [sp], #0x10
             ret     lr
 ;; size=8 bbWeight=1 PerfScore 2.00
-; Total bytes of code 56, prolog size 8, PerfScore 18.60, instruction count 14, allocated bytes for code 56 (MethodHash=c6a1237f) for method testout1+VT_0_4_1:.ctor(int):this
+; Total bytes of code 40, prolog size 8, PerfScore 13.00, instruction count 10, allocated bytes for code 40 (MethodHash=c6a1237f) for method testout1+VT_0_4_1:.ctor(int):this
 ; ============================================================

 Unwind Info:
@@ -45,7 +41,7 @@ Unwind Info:
   E bit             : 0
   X bit             : 0
   Vers              : 0
-  Function Length   : 14 (0x0000e) Actual length = 56 (0x000038)
+  Function Length   : 10 (0x0000a) Actual length = 40 (0x000028)
   ---- Epilog scopes ----
   ---- Scope 0
   Epilog Start Offset        : 3523193630 (0xd1ffab1e) Actual offset = 3523193630 (0xd1ffab1e) Offset from main function begin = 3523193630 (0xd1ffab1e)
+0 (0.00%) : 388608.dasm - JIT.HardwareIntrinsics.Arm._AdvSimd.Arm64.SimpleTernaryOpTest__FusedMultiplyAddBySelectedScalar_Vector128_Single_Vector128_Single_3:.ctor():this
@@ -350,9 +350,10 @@ G_M34739_IG03:        ; bbWeight=4, isz, extend
             cmp     w0, #3
             bls     G_M34739_IG05
             str     s0, [x21, #0x1C]
-            ldr     x21, [x20]
-            ldp     x22, x20, [x20, #0x08]
-            ; gcrRegs +[x20 x22]
+            ldp     x21, x22, [x20]
+            ; gcrRegs +[x22]
+            ldr     x20, [x20, #0x10]
+            ; gcrRegs +[x20]
             movz    x0, #0xD1FFAB1E
             movk    x0, #0xD1FFAB1E LSL #16
             movk    x0, #0xD1FFAB1E LSL #32
+0 (0.00%) : 454528.dasm - JIT.HardwareIntrinsics.Arm._AdvSimd.SimpleTernaryOpTest__MultiplyBySelectedScalarWideningUpperAndSubtract_Vector128_UInt32_Vector64_UInt32_1:.ctor():this
@@ -268,9 +268,10 @@ G_M34358_IG02:        ; bbWeight=4, gcrefRegs=80000 {x19}, byrefRegs=0000 {}, by
             cmp     w1, #1
             bls     G_M34358_IG05
             str     w0, [x21, #0x14]
-            ldr     x21, [x20]
-            ldp     x22, x20, [x20, #0x08]
-            ; gcrRegs +[x20 x22]
+            ldp     x21, x22, [x20]
+            ; gcrRegs +[x22]
+            ldr     x20, [x20, #0x10]
+            ; gcrRegs +[x20]
 ;; size=764 bbWeight=4 PerfScore 1086.00
 G_M34358_IG03:        ; bbWeight=4, extend
             movz    x0, #0xD1FFAB1E
+0 (0.00%) : 458240.dasm - JIT.HardwareIntrinsics.Arm._AdvSimd.SimpleTernaryOpTest__MultiplySubtractByScalar_Vector64_Int16:.ctor():this
@@ -350,9 +350,10 @@ G_M3199_IG03:        ; bbWeight=4, isz, extend
             cmp     w1, #3
             bls     G_M3199_IG05
             strh    w0, [x21, #0x16]
-            ldr     x21, [x20]
-            ldp     x22, x20, [x20, #0x08]
-            ; gcrRegs +[x20 x22]
+            ldp     x21, x22, [x20]
+            ; gcrRegs +[x22]
+            ldr     x20, [x20, #0x10]
+            ; gcrRegs +[x20]
             movz    x0, #0xD1FFAB1E
             movk    x0, #0xD1FFAB1E LSL #16
             movk    x0, #0xD1FFAB1E LSL #32

Improvements/regressions per collection

| Collection | Contexts with diffs | Improvements | Regressions | Same size | Improvements (bytes) | Regressions (bytes) |
|---|---|---|---|---|---|---|
| benchmarks.run.linux.arm64.checked.mch | 593 | 555 | 0 | 38 | -4,632 | +0 |
| libraries_tests.pmi.linux.arm64.checked.mch | 5,064 | 4,805 | 0 | 259 | -46,408 | +0 |
| libraries.crossgen2.linux.arm64.checked.mch | 1,237 | 1,152 | 0 | 85 | -6,536 | +0 |
| libraries.pmi.linux.arm64.checked.mch | 3,963 | 3,779 | 0 | 184 | -24,572 | +0 |
| coreclr_tests.run.linux.arm64.checked.mch | 45,744 | 45,060 | 0 | 684 | -687,564 | +0 |
| | 56,601 | 55,351 | 0 | 1,250 | -769,712 | +0 |

Context information

| Collection | Diffed contexts | MinOpts | FullOpts | Missed, base | Missed, diff |
|---|---|---|---|---|---|
| benchmarks.run.linux.arm64.checked.mch | 42,108 | 6,912 | 35,196 | 0 (0.00%) | 0 (0.00%) |
| libraries_tests.pmi.linux.arm64.checked.mch | 367,550 | 7,902 | 359,648 | 0 (0.00%) | 0 (0.00%) |
| libraries.crossgen2.linux.arm64.checked.mch | 174,775 | 15 | 174,760 | 0 (0.00%) | 0 (0.00%) |
| libraries.pmi.linux.arm64.checked.mch | 257,063 | 4,760 | 252,303 | 0 (0.00%) | 0 (0.00%) |
| coreclr_tests.run.linux.arm64.checked.mch | 628,239 | 382,881 | 245,358 | 3 (0.00%) | 3 (0.00%) |
| | 1,469,735 | 402,470 | 1,067,265 | 3 (0.00%) | 3 (0.00%) |

jit-analyze output

@EgorBo
Member

@SwapnilGaikwad do you want us to kick various jitstress/gcstress jobs?

SwapnilGaikwad reacted with thumbs up emoji

@kunalspathak self-requested a review March 30, 2023 19:13
@kunalspathak
Contributor

/azp run runtime-coreclr gcstress0x3-gcstress0xc

@azure-pipelines

Azure Pipelines successfully started running 1 pipeline(s).

@kunalspathak
Contributor

/azp run runtime-coreclr jitstress

@azure-pipelines

Azure Pipelines successfully started running 1 pipeline(s).

Contributor

@kunalspathak left a comment

Nice diffs.

SwapnilGaikwad reacted with hooray emoji
 }

-if (lastInsFmt != fmt)
+if (lastInsFmt != fmt && !(lastInsFmt == IF_LS_2B && fmt == IF_LS_2A) &&
Contributor

Wondering why we have to add additional checks for IF_LS_2B and IF_LS_2A? Are they specifically because we are adding vector register support? Why were they not needed previously?

Contributor

So the issue is here that we use IF_LS_2A for base (no offset) and IF_LS_2B for base + offset? Presumably imm/prevImm are correctly zero for 2A?

Would it be easier to read inverted? i.e.,

const bool compatibleFmt = (lastInsFmt == fmt) || (lastInsFmt == IF_LS_2B && fmt == IF_LS_2A) ||
                           (lastInsFmt == IF_LS_2A && fmt == IF_LS_2B);
if (!compatibleFmt)
{
    ...
    return eRO_none;
}

?
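The inverted check can be tried in isolation as a small predicate. This is only a sketch: the `InsFormat` enum and the `IF_OTHER` member below are hypothetical stand-ins, not the emitter's real definitions, and the comments on `IF_LS_2A`/`IF_LS_2B` follow the reading given in this thread (base only versus base plus immediate).

```cpp
// Hypothetical stand-in for the emitter's instruction-format enum. Only the
// names IF_LS_2A and IF_LS_2B come from the review discussion.
enum InsFormat
{
    IF_LS_2A, // load/store with base register only, e.g. ldr x1, [x0]
    IF_LS_2B, // load/store with base + immediate, e.g. ldr x1, [x0, #0x08]
    IF_OTHER  // placeholder for any other format (assumption)
};

// Two consecutive loads/stores stay candidates for pairing when their formats
// match, or when one is base-only and the other is base + immediate.
bool isCompatibleFmt(InsFormat lastInsFmt, InsFormat fmt)
{
    return (lastInsFmt == fmt) ||
           (lastInsFmt == IF_LS_2B && fmt == IF_LS_2A) ||
           (lastInsFmt == IF_LS_2A && fmt == IF_LS_2B);
}
```

In the real peephole this predicate would be only one of several conditions; the offsets and registers still have to line up before the two instructions can be fused.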

SwapnilGaikwad reacted with thumbs up emoji
Contributor

Are there any general register (non-Vector) diffs from just this change?

Contributor

> So the issue is here that we use IF_LS_2A for base (no offset) and IF_LS_2B for base + offset?

Sure, but we don't do it for GPR?

Contributor, Author

Sorry that the title of the PR doesn't describe the full functionality. The explicit format check lets us catch consecutive ldr/str instructions where one uses an offset and the other does not. This applies to both general-purpose and SIMD/vector registers.

> Are there any general register (non-Vector) diffs from just this change?

Yup, there are multiple such changes. e.g.,

-4 (-16.67%) : 2709.dasm - System.ValueTuple`2[long,System.DateTime]:.ctor(long,System.DateTime):this
@@ -20,15 +20,14 @@ G_M30325_IG01:        ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref,
 ;; size=8 bbWeight=1 PerfScore 1.50
 G_M30325_IG02:        ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0001 {x0}, byref
             ; byrRegs +[x0]
-            str     x1, [x0]
-            str     x2, [x0, #0x08]
-;; size=8 bbWeight=1 PerfScore 2.00
+            stp     x1, x2, [x0]
+;; size=4 bbWeight=1 PerfScore 1.00
 G_M30325_IG03:        ; bbWeight=1, epilog, nogc, extend
             ldp     fp, lr, [sp], #0x10
             ret     lr
 ;; size=8 bbWeight=1 PerfScore 2.00
-; Total bytes of code 24, prolog size 8, PerfScore 7.90, instruction count 6, allocated bytes for code 24 (MethodHash=b3c1898a) for method System.ValueTuple`2[long,System.DateTime]:.ctor(long,System.DateTime):this
+; Total bytes of code 20, prolog size 8, PerfScore 6.50, instruction count 5, allocated bytes for code 20 (MethodHash=b3c1898a) for method System.ValueTuple`2[long,System.DateTime]:.ctor(long,System.DateTime):this
 ; ============================================================

> Sure, but we don't do it for GPR?

Yup, I think it was missed previously.

Matching consecutive ldr/str instructions with mixed formats lets us optimise further than the previous optimisation allowed, e.g.,

Previously, the following sequence

str     s0, [x0]
str     s1, [x0, #0x04]
str     s2, [x0, #0x08]
str     s3, [x0, #0x0C]

may have been optimised to

str     s0, [x0]
stp     s1, s2, [x0, #0x04]
str     s3, [x0, #0x0C]

but now would be optimised to

stp     s0, s1, [x0]
stp     s2, s3, [x0, #0x08]
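The difference between the two outcomes can be reproduced with a toy model of a greedy peephole. This is a sketch under stated assumptions, not the JIT's implementation: `Store`, `addr` and `pairStores` are hypothetical names, every store is assumed to be a 4-byte write of an S register through `x0`, and a store at offset 0 is assumed to use the base-only format while any other offset uses base plus immediate.

```cpp
#include <cstdio>
#include <string>
#include <vector>

// Hypothetical model of "str s<reg>, [x0, #offset]" for a 4-byte register.
struct Store
{
    int reg;
    int offset;
    // Assumption: offset 0 is emitted in the base-only format.
    bool isBaseOnly() const { return offset == 0; }
};

// Formats the address operand the way the listings in this thread print it.
static std::string addr(int offset)
{
    if (offset == 0)
        return "[x0]";
    char buf[32];
    std::snprintf(buf, sizeof(buf), "[x0, #0x%02X]", offset);
    return buf;
}

// Greedy peephole: each store is fused with the immediately preceding,
// still-unpaired store when the offsets are adjacent and the formats are
// acceptable. allowMixedFmt=false models the old "formats must be equal" rule.
std::vector<std::string> pairStores(const std::vector<Store>& stores, bool allowMixedFmt)
{
    const int size = 4; // bytes written by each store
    std::vector<std::string> out;
    bool havePrev = false;
    Store prev{};
    for (const Store& cur : stores)
    {
        const bool sameFmt = havePrev && (prev.isBaseOnly() == cur.isBaseOnly());
        const bool fmtOk   = havePrev && (sameFmt || allowMixedFmt);
        if (fmtOk && cur.offset == prev.offset + size)
        {
            // Replace the previously emitted str with a paired store.
            out.back() = "stp s" + std::to_string(prev.reg) + ", s" +
                         std::to_string(cur.reg) + ", " + addr(prev.offset);
            havePrev = false;
        }
        else
        {
            out.push_back("str s" + std::to_string(cur.reg) + ", " + addr(cur.offset));
            prev     = cur;
            havePrev = true;
        }
    }
    return out;
}
```

Calling `pairStores` on the four stores above with `allowMixedFmt` set to `false` leaves the first and last store unpaired, because the first pair straddles the two formats and the greedy pass then pairs the middle two. With `true` it yields the two `stp` instructions.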

Contributor, Author

@SwapnilGaikwad Mar 31, 2023 (edited)

> Would it be easier to read inverted? i.e.,

Sure, this is more readable. Done 👍

@ghost added the needs-author-action label (An issue or pull request that requires more info or actions from the author.) Mar 30, 2023
@kunalspathak
Contributor

All the gcstress failures are pre-existing ones.

SwapnilGaikwad reacted with thumbs up emoji

@ghost removed the needs-author-action label (An issue or pull request that requires more info or actions from the author.) Mar 31, 2023
Contributor

@kunalspathak left a comment

LGTM. Thanks for your contributions!

SwapnilGaikwad reacted with hooray emoji

Reviewers

2 more reviewers

@BruceForstall approved these changes

@kunalspathak approved these changes


Labels

area-CodeGen-coreclr: CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI
community-contribution: Indicates that the PR has been added by a community member


4 participants

@SwapnilGaikwad @EgorBo @kunalspathak @BruceForstall
