Use multi-reg load/store for EncodeToUtf8 #95513


Merged
kunalspathak merged 5 commits into dotnet:main from SwapnilGaikwad:github-encode-utf8
Jan 10, 2024

Conversation

@SwapnilGaikwad
Contributor

This implements the encode to UTF8 algorithm here.

@ghost added the needs-area-label and community-contribution labels, Dec 1, 2023
@SwapnilGaikwad
Contributor, Author

Hi @kunalspathak, this is an initial version of encode to UTF8 using multi-register load/stores. It currently fails some asserts in the LSRA phase while running crossgen for SPC. It fails while emitting StoreVector128x4AndZip. I'm not sure, but it seems it's not able to find four consecutive registers to emit ST4.

@danmoseley
Member

Does this need an entry in their part notices file?

Change-Id: Ie56b1786cdf8ac8d2067c0ba1fdfd3924dd9ca13
@SwapnilGaikwad
Contributor, Author

Does this need an entry in their part notices file?

Sorry @danmoseley, I didn't understand which part notices file you're referring to.

@SwapnilGaikwad
Contributor, Author

Initial benchmarking on an N1 system shows good performance results.

| Method | Toolchain | NumberOfBytes | Mean | Error | StdDev | Median | Min | Max | Ratio | MannWhitney(2%) |
|--- |--- |--- |---:|---:|---:|---:|---:|---:|---:|--- |
| Base64Encode | /main/artifacts/tests/coreclr/linux.arm64.Release/Tests/Core_Root/corerun | 1000 | 363.7 ns | 0.27 ns | 0.24 ns | 363.8 ns | 363.1 ns | 363.9 ns | 1.00 | Base |
| Base64Encode | /runtime/artifacts/tests/coreclr/linux.arm64.Release/Tests/Core_Root/corerun | 1000 | 185.7 ns | 0.26 ns | 0.22 ns | 185.6 ns | 185.5 ns | 186.2 ns | 0.51 | Faster |
| Base64EncodeDestinationTooSmall | /main/artifacts/tests/coreclr/linux.arm64.Release/Tests/Core_Root/corerun | 1000 | 369.3 ns | 0.09 ns | 0.09 ns | 369.4 ns | 369.2 ns | 369.4 ns | 1.00 | Base |
| Base64EncodeDestinationTooSmall | /runtime/artifacts/tests/coreclr/linux.arm64.Release/Tests/Core_Root/corerun | 1000 | 196.1 ns | 0.05 ns | 0.04 ns | 196.1 ns | 196.1 ns | 196.2 ns | 0.53 | Faster |
| ConvertToBase64CharArray | /main/artifacts/tests/coreclr/linux.arm64.Release/Tests/Core_Root/corerun | 1000 | 560.9 ns | 0.29 ns | 0.23 ns | 561.0 ns | 560.4 ns | 561.1 ns | 1.00 | Base |
| ConvertToBase64CharArray | /runtime/artifacts/tests/coreclr/linux.arm64.Release/Tests/Core_Root/corerun | 1000 | 400.0 ns | 0.20 ns | 0.19 ns | 399.9 ns | 399.7 ns | 400.3 ns | 0.71 | Faster |
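
For readers who want to check the Ratio column, here is a small Python sketch that recomputes it from the Mean values reported in the table. It assumes the ratio is simply the PR mean divided by the baseline mean, rounded to two decimals; the helper names `ratio` and `speedup` are made up for this illustration and are not part of the benchmark tooling.

```python
# Mean times in nanoseconds, copied from the benchmark table in this comment:
# (baseline "main" build, PR "runtime" build).
results = {
    "Base64Encode": (363.7, 185.7),
    "Base64EncodeDestinationTooSmall": (369.3, 196.1),
    "ConvertToBase64CharArray": (560.9, 400.0),
}


def ratio(base_ns: float, pr_ns: float) -> float:
    """PR mean divided by baseline mean, rounded to two decimals (assumed definition)."""
    return round(pr_ns / base_ns, 2)


def speedup(base_ns: float, pr_ns: float) -> float:
    """How many times faster the PR build is than the baseline."""
    return base_ns / pr_ns


for name, (base_ns, pr_ns) in results.items():
    print(f"{name}: ratio {ratio(base_ns, pr_ns):.2f}, "
          f"speedup {speedup(base_ns, pr_ns):.2f}x")
```

Under that assumption the recomputed ratios agree with the reported 0.51, 0.53 and 0.71.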
a74nh, kunalspathak, and GerardSmit reacted with rocket emoji

}

end = srcMax - 16;
if ((Ssse3.IsSupported || AdvSimd.Arm64.IsSupported) && BitConverter.IsLittleEndian && (end >= src))
Contributor


I'm not sure if we should remove the AdvSimd check here.

With this PR, it will use the vector128 version if the buffer length is <48 && >16.

That's probably the best option for speed, but results in a bigger library.
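
To make the trade-off in this comment concrete, the following Python sketch models which encode path would be chosen for a given number of remaining source bytes. It is only an illustration: the function name `pick_encode_path`, the path labels, and the exact thresholds (48 bytes for the multi-register path, 16 bytes for the Vector128 path, taken loosely from the `end = srcMax - 16` check and the "<48 && >16" remark) are assumptions, not the library's actual dispatch code.

```python
def pick_encode_path(remaining_bytes: int,
                     has_advsimd_arm64: bool = True,
                     has_ssse3: bool = False) -> str:
    """Return a label for the encode path a buffer of this size would take.

    Assumed thresholds (hypothetical, for illustration only):
      * the multi-register AdvSimd loop consumes 48 source bytes per iteration;
      * the Vector128 loop needs at least 16 readable source bytes.
    """
    if has_advsimd_arm64 and remaining_bytes >= 48:
        return "advsimd-multireg"
    if (has_ssse3 or has_advsimd_arm64) and remaining_bytes >= 16:
        return "vector128"
    return "scalar"


for size in (1000, 47, 16, 10):
    print(size, pick_encode_path(size))
```

Keeping the AdvSimd check on the Vector128 branch is what lets mid-sized buffers (the `vector128` result here) stay vectorized on Arm64, at the cost of compiling both paths into the library.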

@kunalspathak
Contributor

@SwapnilGaikwad - do you mind sharing the disassembly?

SwapnilGaikwad reacted with thumbs up emoji

@SwapnilGaikwad
Contributor, Author

SwapnilGaikwad commented Dec 13, 2023 (edited)

@SwapnilGaikwad - do you mind sharing the disassembly?

A separately compiled AdvSimdEncode emits the following assembly.

Full assembly
; Assembly listing for method JIT.HardwareIntrinsics.Arm._AdvSimd.Program:AdvSimdEncode(byref,byref,ulong,int,int,ulong,ulong,ulong) (FullOpts)
; Emitting BLENDED_CODE for generic ARM64 - Unix
; FullOpts code
; optimized code
; fp based frame
; fully interruptible
; No PGO data
; 4 inlinees with PGO data; 13 single block inlinees; 4 inlinees without PGO data
; Final local variable assignments
;
;  V00 arg0         [V00,T03] (  4,  4   )   byref  ->   x0         single-def
;  V01 arg1         [V01,T04] (  4,  4   )   byref  ->   x1         single-def
;  V02 arg2         [V02,T02] (  3, 10   )    long  ->   x2         single-def
;* V03 arg3         [V03    ] (  0,  0   )     int  ->  zero-ref    single-def
;* V04 arg4         [V04    ] (  0,  0   )     int  ->  zero-ref    single-def
;* V05 arg5         [V05    ] (  0,  0   )    long  ->  zero-ref    single-def
;* V06 arg6         [V06    ] (  0,  0   )    long  ->  zero-ref    single-def
;* V07 arg7         [V07    ] (  0,  0   )    long  ->  zero-ref    single-def
;  V08 loc0         [V08,T00] (  6, 34   )    long  ->   x3
;  V09 loc1         [V09,T01] (  5, 26   )    long  ->   x4
;* V10 loc2         [V10    ] (  0,  0   )  simd16  ->  zero-ref    HFA(simd16)  <System.Runtime.Intrinsics.Vector128`1[ubyte]>
;* V11 loc3         [V11    ] (  0,  0   )  simd16  ->  zero-ref    HFA(simd16)  <System.Runtime.Intrinsics.Vector128`1[ubyte]>
;* V12 loc4         [V12    ] (  0,  0   )  simd16  ->  zero-ref    HFA(simd16)  <System.Runtime.Intrinsics.Vector128`1[ubyte]>
;* V13 loc5         [V13    ] (  0,  0   )  simd16  ->  zero-ref    HFA(simd16)  <System.Runtime.Intrinsics.Vector128`1[ubyte]>
;* V14 loc6         [V14    ] (  0,  0   )  simd16  ->  zero-ref    HFA(simd16)  <System.Runtime.Intrinsics.Vector128`1[ubyte]>
;* V15 loc7         [V15    ] (  0,  0   )  simd16  ->  zero-ref    HFA(simd16)  <System.Runtime.Intrinsics.Vector128`1[ubyte]>
;* V16 loc8         [V16    ] (  0,  0   )  simd16  ->  zero-ref    HFA(simd16)  <System.Runtime.Intrinsics.Vector128`1[ubyte]>
;  V17 loc9         [V17,T06] (  5, 33   )  simd16  ->  d16         HFA(simd16)  <System.Runtime.Intrinsics.Vector128`1[ubyte]>
;  V18 loc10        [V18,T07] (  5, 33   )  simd16  ->  d17         HFA(simd16)  <System.Runtime.Intrinsics.Vector128`1[ubyte]>
;  V19 loc11        [V19,T08] (  5, 33   )  simd16  ->  d18         HFA(simd16)  <System.Runtime.Intrinsics.Vector128`1[ubyte]>
;  V20 loc12        [V20,T09] (  5, 33   )  simd16  ->  d19         HFA(simd16)  <System.Runtime.Intrinsics.Vector128`1[ubyte]>
;# V21 OutArgs      [V21    ] (  1,  1   )  struct ( 0) [sp+0x00]  do-not-enreg[XS] addr-exposed "OutgoingArgSpace"
;* V22 tmp1         [V22    ] (  0,  0   )  struct (16) zero-ref    ld-addr-op "NewObj constructor temp" <System.ReadOnlySpan`1[ubyte]>
;* V23 tmp2         [V23    ] (  0,  0   )  simd16  ->  zero-ref    "spilled call-like call argument"
;* V24 tmp3         [V24    ] (  0,  0   )  struct (16) zero-ref    ld-addr-op "NewObj constructor temp" <System.ReadOnlySpan`1[ubyte]>
;* V25 tmp4         [V25    ] (  0,  0   )  simd16  ->  zero-ref    "spilled call-like call argument"
;* V26 tmp5         [V26    ] (  0,  0   )  struct (16) zero-ref    ld-addr-op "NewObj constructor temp" <System.ReadOnlySpan`1[ubyte]>
;* V27 tmp6         [V27    ] (  0,  0   )  simd16  ->  zero-ref    "spilled call-like call argument"
;* V28 tmp7         [V28    ] (  0,  0   )  struct (16) zero-ref    ld-addr-op "NewObj constructor temp" <System.ReadOnlySpan`1[ubyte]>
;* V29 tmp8         [V29    ] (  0,  0   )  simd16  ->  zero-ref    "spilled call-like call argument"
;* V30 tmp9         [V30    ] (  0,  0   )  struct (48) zero-ref    HFA(simd16)  multireg-ret "Return value temp for multireg return" <System.ValueTuple`3[System.Runtime.Intrinsics.Vector128`1[ubyte],System.Runtime.Intrinsics.Vector128`1[ubyte],System.Runtime.Intrinsics.Vector128`1[ubyte]]>
;* V31 tmp10        [V31    ] (  0,  0   )  struct (64) zero-ref    HFA(simd16)  ld-addr-op "NewObj constructor temp" <System.ValueTuple`4[System.Runtime.Intrinsics.Vector128`1[ubyte],System.Runtime.Intrinsics.Vector128`1[ubyte],System.Runtime.Intrinsics.Vector128`1[ubyte],System.Runtime.Intrinsics.Vector128`1[ubyte]]>
;* V32 tmp11        [V32    ] (  0,  0   )  struct (64) zero-ref    HFA(simd16)  ld-addr-op "NewObj constructor temp" <System.ValueTuple`4[System.Runtime.Intrinsics.Vector128`1[ubyte],System.Runtime.Intrinsics.Vector128`1[ubyte],System.Runtime.Intrinsics.Vector128`1[ubyte],System.Runtime.Intrinsics.Vector128`1[ubyte]]>
;* V33 tmp12        [V33    ] (  0,  0   )  struct (64) zero-ref    HFA(simd16)  ld-addr-op "NewObj constructor temp" <System.ValueTuple`4[System.Runtime.Intrinsics.Vector128`1[ubyte],System.Runtime.Intrinsics.Vector128`1[ubyte],System.Runtime.Intrinsics.Vector128`1[ubyte],System.Runtime.Intrinsics.Vector128`1[ubyte]]>
;* V34 tmp13        [V34    ] (  0,  0   )  struct (64) zero-ref    HFA(simd16)  ld-addr-op "NewObj constructor temp" <System.ValueTuple`4[System.Runtime.Intrinsics.Vector128`1[ubyte],System.Runtime.Intrinsics.Vector128`1[ubyte],System.Runtime.Intrinsics.Vector128`1[ubyte],System.Runtime.Intrinsics.Vector128`1[ubyte]]>
;* V35 tmp14        [V35    ] (  0,  0   )  struct (64) zero-ref    HFA(simd16)  ld-addr-op "NewObj constructor temp" <System.ValueTuple`4[System.Runtime.Intrinsics.Vector128`1[ubyte],System.Runtime.Intrinsics.Vector128`1[ubyte],System.Runtime.Intrinsics.Vector128`1[ubyte],System.Runtime.Intrinsics.Vector128`1[ubyte]]>
;* V36 tmp15        [V36    ] (  0,  0   )  struct (16) zero-ref    ld-addr-op "Inlining Arg" <System.ReadOnlySpan`1[ubyte]>
;* V37 tmp16        [V37    ] (  0,  0   )  struct (16) zero-ref    ld-addr-op "Inlining Arg" <System.ReadOnlySpan`1[ubyte]>
;* V38 tmp17        [V38    ] (  0,  0   )  struct (16) zero-ref    ld-addr-op "Inlining Arg" <System.ReadOnlySpan`1[ubyte]>
;* V39 tmp18        [V39    ] (  0,  0   )  struct (16) zero-ref    ld-addr-op "Inlining Arg" <System.ReadOnlySpan`1[ubyte]>
;* V40 tmp19        [V40    ] (  0,  0   )  struct (16) zero-ref    ld-addr-op "Inlining Arg" <System.ReadOnlySpan`1[ubyte]>
;* V41 tmp20        [V41    ] (  0,  0   )  struct (16) zero-ref    ld-addr-op "Inlining Arg" <System.ReadOnlySpan`1[ubyte]>
;* V42 tmp21        [V42    ] (  0,  0   )  struct (16) zero-ref    ld-addr-op "Inlining Arg" <System.ReadOnlySpan`1[ubyte]>
;* V43 tmp22        [V43    ] (  0,  0   )  struct (16) zero-ref    ld-addr-op "Inlining Arg" <System.ReadOnlySpan`1[ubyte]>
;* V44 tmp23        [V44    ] (  0,  0   )   byref  ->  zero-ref    single-def "field V22._reference (fldOffset=0x0)" P-INDEP
;* V45 tmp24        [V45    ] (  0,  0   )     int  ->  zero-ref    single-def "field V22._length (fldOffset=0x8)" P-INDEP
;* V46 tmp25        [V46    ] (  0,  0   )   byref  ->  zero-ref    single-def "field V24._reference (fldOffset=0x0)" P-INDEP
;* V47 tmp26        [V47    ] (  0,  0   )     int  ->  zero-ref    single-def "field V24._length (fldOffset=0x8)" P-INDEP
;* V48 tmp27        [V48    ] (  0,  0   )   byref  ->  zero-ref    single-def "field V26._reference (fldOffset=0x0)" P-INDEP
;* V49 tmp28        [V49    ] (  0,  0   )     int  ->  zero-ref    single-def "field V26._length (fldOffset=0x8)" P-INDEP
;* V50 tmp29        [V50    ] (  0,  0   )   byref  ->  zero-ref    single-def "field V28._reference (fldOffset=0x0)" P-INDEP
;* V51 tmp30        [V51    ] (  0,  0   )     int  ->  zero-ref    single-def "field V28._length (fldOffset=0x8)" P-INDEP
;  V52 tmp31        [V52,T11] (  3, 24   )  simd16  ->  d21         HFA(simd16)  "field V30.Item1 (fldOffset=0x0)" P-INDEP
;  V53 tmp32        [V53,T12] (  3, 24   )  simd16  ->  d22         HFA(simd16)  "field V30.Item2 (fldOffset=0x10)" P-INDEP
;  V54 tmp33        [V54,T13] (  3, 24   )  simd16  ->  d23         HFA(simd16)  "field V30.Item3 (fldOffset=0x20)" P-INDEP
;* V55 tmp34        [V55    ] (  0,  0   )  simd16  ->  zero-ref    HFA(simd16)  "field V31.Item1 (fldOffset=0x0)" P-INDEP
;* V56 tmp35        [V56    ] (  0,  0   )  simd16  ->  zero-ref    HFA(simd16)  "field V31.Item2 (fldOffset=0x10)" P-INDEP
;* V57 tmp36        [V57    ] (  0,  0   )  simd16  ->  zero-ref    HFA(simd16)  "field V31.Item3 (fldOffset=0x20)" P-INDEP
;* V58 tmp37        [V58    ] (  0,  0   )  simd16  ->  zero-ref    HFA(simd16)  "field V31.Item4 (fldOffset=0x30)" P-INDEP
;* V59 tmp38        [V59    ] (  0,  0   )  simd16  ->  zero-ref    HFA(simd16)  "field V32.Item1 (fldOffset=0x0)" P-INDEP
;* V60 tmp39        [V60    ] (  0,  0   )  simd16  ->  zero-ref    HFA(simd16)  "field V32.Item2 (fldOffset=0x10)" P-INDEP
;* V61 tmp40        [V61    ] (  0,  0   )  simd16  ->  zero-ref    HFA(simd16)  "field V32.Item3 (fldOffset=0x20)" P-INDEP
;* V62 tmp41        [V62    ] (  0,  0   )  simd16  ->  zero-ref    HFA(simd16)  "field V32.Item4 (fldOffset=0x30)" P-INDEP
;* V63 tmp42        [V63    ] (  0,  0   )  simd16  ->  zero-ref    HFA(simd16)  "field V33.Item1 (fldOffset=0x0)" P-INDEP
;* V64 tmp43        [V64    ] (  0,  0   )  simd16  ->  zero-ref    HFA(simd16)  "field V33.Item2 (fldOffset=0x10)" P-INDEP
;* V65 tmp44        [V65    ] (  0,  0   )  simd16  ->  zero-ref    HFA(simd16)  "field V33.Item3 (fldOffset=0x20)" P-INDEP
;* V66 tmp45        [V66    ] (  0,  0   )  simd16  ->  zero-ref    HFA(simd16)  "field V33.Item4 (fldOffset=0x30)" P-INDEP
;* V67 tmp46        [V67    ] (  0,  0   )  simd16  ->  zero-ref    HFA(simd16)  "field V34.Item1 (fldOffset=0x0)" P-INDEP
;* V68 tmp47        [V68    ] (  0,  0   )  simd16  ->  zero-ref    HFA(simd16)  "field V34.Item2 (fldOffset=0x10)" P-INDEP
;* V69 tmp48        [V69    ] (  0,  0   )  simd16  ->  zero-ref    HFA(simd16)  "field V34.Item3 (fldOffset=0x20)" P-INDEP
;* V70 tmp49        [V70    ] (  0,  0   )  simd16  ->  zero-ref    HFA(simd16)  "field V34.Item4 (fldOffset=0x30)" P-INDEP
;* V71 tmp50        [V71    ] (  0,  0   )  simd16  ->  zero-ref    HFA(simd16)  "field V35.Item1 (fldOffset=0x0)" P-INDEP
;* V72 tmp51        [V72    ] (  0,  0   )  simd16  ->  zero-ref    HFA(simd16)  "field V35.Item2 (fldOffset=0x10)" P-INDEP
;* V73 tmp52        [V73    ] (  0,  0   )  simd16  ->  zero-ref    HFA(simd16)  "field V35.Item3 (fldOffset=0x20)" P-INDEP
;* V74 tmp53        [V74    ] (  0,  0   )  simd16  ->  zero-ref    HFA(simd16)  "field V35.Item4 (fldOffset=0x30)" P-INDEP
;* V75 tmp54        [V75    ] (  0,  0   )   byref  ->  zero-ref    single-def "field V36._reference (fldOffset=0x0)" P-INDEP
;* V76 tmp55        [V76    ] (  0,  0   )     int  ->  zero-ref    single-def "field V36._length (fldOffset=0x8)" P-INDEP
;* V77 tmp56        [V77    ] (  0,  0   )   byref  ->  zero-ref    single-def "field V37._reference (fldOffset=0x0)" P-INDEP
;* V78 tmp57        [V78    ] (  0,  0   )     int  ->  zero-ref    "field V37._length (fldOffset=0x8)" P-INDEP
;* V79 tmp58        [V79    ] (  0,  0   )   byref  ->  zero-ref    single-def "field V38._reference (fldOffset=0x0)" P-INDEP
;* V80 tmp59        [V80    ] (  0,  0   )     int  ->  zero-ref    single-def "field V38._length (fldOffset=0x8)" P-INDEP
;* V81 tmp60        [V81    ] (  0,  0   )   byref  ->  zero-ref    single-def "field V39._reference (fldOffset=0x0)" P-INDEP
;* V82 tmp61        [V82    ] (  0,  0   )     int  ->  zero-ref    "field V39._length (fldOffset=0x8)" P-INDEP
;* V83 tmp62        [V83    ] (  0,  0   )   byref  ->  zero-ref    single-def "field V40._reference (fldOffset=0x0)" P-INDEP
;* V84 tmp63        [V84    ] (  0,  0   )     int  ->  zero-ref    single-def "field V40._length (fldOffset=0x8)" P-INDEP
;* V85 tmp64        [V85    ] (  0,  0   )   byref  ->  zero-ref    single-def "field V41._reference (fldOffset=0x0)" P-INDEP
;* V86 tmp65        [V86    ] (  0,  0   )     int  ->  zero-ref    "field V41._length (fldOffset=0x8)" P-INDEP
;* V87 tmp66        [V87    ] (  0,  0   )   byref  ->  zero-ref    single-def "field V42._reference (fldOffset=0x0)" P-INDEP
;* V88 tmp67        [V88    ] (  0,  0   )     int  ->  zero-ref    single-def "field V42._length (fldOffset=0x8)" P-INDEP
;* V89 tmp68        [V89    ] (  0,  0   )   byref  ->  zero-ref    single-def "field V43._reference (fldOffset=0x0)" P-INDEP
;* V90 tmp69        [V90    ] (  0,  0   )     int  ->  zero-ref    "field V43._length (fldOffset=0x8)" P-INDEP
;* V91 cse0         [V91,T05] (  0,  0   )    long  ->  zero-ref    "CSE - aggressive"
;  V92 cse1         [V92,T10] (  4, 25   )  simd16  ->  d20         hoist "CSE - aggressive"
;
; Lcl frame size = 0

G_M25339_IG01:  ;; offset=0x0000
            stp     fp, lr, [sp, #-0x10]!
            mov     fp, sp
            ;; size=8 bbWeight=1 PerfScore 1.50
G_M25339_IG02:  ;; offset=0x0008
            ldr     x3, [x0]
            ldr     x4, [x1]
            ldr     q16, [@RWD00]
            ldr     q17, [@RWD16]
            ldr     q18, [@RWD32]
            ldr     q19, [@RWD48]
            movi    v20.16b, #0x3F
            b       G_M25339_IG03
            align   [0 bytes for IG03]
            align   [0 bytes]
            align   [0 bytes]
            align   [0 bytes]
            ;; size=32 bbWeight=1 PerfScore 15.50
G_M25339_IG03:  ;; offset=0x0028
            ld3     {v21.16b, v22.16b, v23.16b}, [x3]
            ushr    v24.16b, v21.16b, #2
            tbl     v24.16b, {v16.16b, v17.16b, v18.16b, v19.16b}, v24.16b
            ushr    v25.16b, v22.16b, #4
            sli     v25.16b, v21.16b, #4
            and     v21.16b, v25.16b, v20.16b
            tbl     v21.16b, {v16.16b, v17.16b, v18.16b, v19.16b}, v21.16b
            ushr    v25.16b, v23.16b, #6
            sli     v25.16b, v22.16b, #2
            and     v22.16b, v25.16b, v20.16b
            tbl     v22.16b, {v16.16b, v17.16b, v18.16b, v19.16b}, v22.16b
            and     v23.16b, v23.16b, v20.16b
            tbl     v23.16b, {v16.16b, v17.16b, v18.16b, v19.16b}, v23.16b
            mov     v25.16b, v21.16b
            mov     v26.16b, v22.16b
            mov     v27.16b, v23.16b
            st4     {v24.16b, v25.16b, v26.16b, v27.16b}, [x4]
            add     x3, x3, #48
            add     x4, x4, #64
            cmp     x3, x2
            bls     G_M25339_IG03
            ;; size=84 bbWeight=8 PerfScore 284.00
G_M25339_IG04:  ;; offset=0x007C
            str     x3, [x0]
            str     x4, [x1]
            ;; size=8 bbWeight=1 PerfScore 2.00
G_M25339_IG05:  ;; offset=0x0084
            ldp     fp, lr, [sp], #0x10
            ret     lr
            ;; size=8 bbWeight=1 PerfScore 2.00

RWD00  dq 4847464544434241h, 504F4E4D4C4B4A49h
RWD16  dq 5857565554535251h, 6665646362615A59h
RWD32  dq 6E6D6C6B6A696867h, 767574737271706Fh
RWD48  dq 333231307A797877h, 2F2B393837363534h

; Total bytes of code 140, prolog size 8, PerfScore 319.00, instruction count 39, allocated bytes for code 140 (MethodHash=7c379d04) for method JIT.HardwareIntrinsics.Arm._AdvSimd.Program:AdvSimdEncode(byref,byref,ulong,int,int,ulong,ulong,ulong) (FullOpts)
; ============================================================
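
As a reading aid for the loop body in IG03, here is a scalar Python model of one iteration: 48 source bytes in, 64 Base64 characters out. It is a sketch of what the instruction sequence appears to compute, not the C# source of the PR; the function name `encode_block48` and the lane-by-lane loop are made up for illustration, and the standard Base64 alphabet is assumed for the lookup table.

```python
import base64

# Standard Base64 alphabet; assumed to match the 64-byte lookup table in v16..v19.
ALPHABET = b"ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/"


def encode_block48(src: bytes) -> bytes:
    """Scalar model of one loop iteration (hypothetical helper, not the PR's code)."""
    if len(src) != 48:
        raise ValueError("this model handles exactly one 48-byte block")

    # ld3: de-interleave 48 bytes into three 16-lane "registers".
    a, b, c = src[0::3], src[1::3], src[2::3]

    out = bytearray(64)
    for lane in range(16):
        i0 = a[lane] >> 2                               # ushr #2
        i1 = ((b[lane] >> 4) | (a[lane] << 4)) & 0x3F   # ushr #4, sli #4, and 0x3F
        i2 = ((c[lane] >> 6) | (b[lane] << 2)) & 0x3F   # ushr #6, sli #2, and 0x3F
        i3 = c[lane] & 0x3F                             # and 0x3F
        # tbl: look each 6-bit index up in the alphabet.
        # st4: interleave the four result "registers" back into memory.
        out[4 * lane:4 * lane + 4] = bytes(ALPHABET[i] for i in (i0, i1, i2, i3))
    return bytes(out)


block = bytes(range(48))
print(encode_block48(block) == base64.b64encode(block))
```

The model makes the 48-to-64 byte ratio behind `add x3, x3, #48` and `add x4, x4, #64` visible: each lane turns three input bytes into four output characters.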

@kunalspathak
Contributor

Thanks @SwapnilGaikwad for sharing the disassembly. I wanted to see the code quality when consecutive registers are involved.
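
The consecutive-register point can be illustrated with a small Python sketch. It assumes that a multi-register store such as `st4` needs its four vector registers to be numbered consecutively, with the numbering wrapping around modulo 32; both helper names are hypothetical, and the register numbers are read off the listing above (`tbl` results in v24, v21, v22, v23, stored from v24..v27).

```python
def is_consecutive(regs, bank_size=32):
    """True if each register number follows the previous one (mod bank_size).

    The wrap-around at 32 is an assumption about the A64 register-list rule.
    """
    return all((regs[i] + 1) % bank_size == regs[i + 1]
               for i in range(len(regs) - 1))


def moves_needed(current, target):
    """Count the values that must be copied to reach the target registers."""
    return sum(1 for cur, tgt in zip(current, target) if cur != tgt)


tbl_results = [24, 21, 22, 23]   # where the four lookup results end up
st4_operands = [24, 25, 26, 27]  # registers used by the st4 in the listing

print(is_consecutive(tbl_results))                 # lookup results are scattered
print(is_consecutive(st4_operands))                # store operands are in sequence
print(moves_needed(tbl_results, st4_operands))     # copies inserted before st4
```

Under these assumptions the count lines up with the three `mov` instructions that precede `st4` in the listing.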

SwapnilGaikwad reacted with thumbs up emoji

Vector128<byte> res4;
Vector128<byte> tbl_enc1 = Vector128.Create("ABCDEFGHIJKLMNOP"u8).AsByte();
Vector128<byte> tbl_enc2 = Vector128.Create("QRSTUVWXYZabcdef"u8).AsByte();
Vector128<byte> tbl_enc3 = Vector128.Create("ghijklmnopqrstuv"u8).AsByte();
Contributor, Author


We can load this encoding table from EncodingMap, from Line #774. This could help reduce the code size, but loading from memory/cache would be slightly slower than using registers. Benchmarks didn't show a significant difference. The from-mem-artifacts toolchain loads the data from EncodingMap.

| Method | Toolchain | NumberOfBytes | Mean | Error | StdDev | Median | Min | Max | Ratio | MannWhitney(2%) |
|--- |--- |--- |---:|---:|---:|---:|---:|---:|---:|--- |
| Base64Encode | /main/artifacts/tests/coreclr/linux.arm64.Release/Tests/Core_Root/corerun | 1000 | 363.8 ns | 0.14 ns | 0.12 ns | 363.8 ns | 363.5 ns | 363.9 ns | 1.00 | Base |
| Base64Encode | /runtime/artifacts/tests/coreclr/linux.arm64.Release/Tests/Core_Root/corerun | 1000 | 196.8 ns | 0.02 ns | 0.02 ns | 196.8 ns | 196.8 ns | 196.9 ns | 0.54 | Faster |
| Base64Encode | /runtime/from-mem-artifacts/tests/coreclr/linux.arm64.Release/Tests/Core_Root/corerun | 1000 | 195.3 ns | 0.08 ns | 0.07 ns | 195.3 ns | 195.2 ns | 195.5 ns | 0.54 | Faster |
| Base64EncodeDestinationTooSmall | /main/artifacts/tests/coreclr/linux.arm64.Release/Tests/Core_Root/corerun | 1000 | 368.2 ns | 0.30 ns | 0.28 ns | 368.2 ns | 367.8 ns | 368.7 ns | 1.00 | Base |
| Base64EncodeDestinationTooSmall | /runtime/artifacts/tests/coreclr/linux.arm64.Release/Tests/Core_Root/corerun | 1000 | 191.5 ns | 0.04 ns | 0.04 ns | 191.5 ns | 191.5 ns | 191.6 ns | 0.52 | Faster |
| Base64EncodeDestinationTooSmall | /runtime/from-mem-artifacts/tests/coreclr/linux.arm64.Release/Tests/Core_Root/corerun | 1000 | 194.0 ns | 0.22 ns | 0.20 ns | 193.9 ns | 193.7 ns | 194.4 ns | 0.53 | Faster |

Would you suggest reading the encoding table from EncodingMap?
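
To show why the two options are interchangeable in terms of content, this Python sketch decodes the four 16-byte constants from the disassembly (RWD00 to RWD48) and checks that, concatenated, they spell the standard 64-character Base64 alphabet. It assumes the `dq` values are little-endian 64-bit words and that a 64-byte encoding map would hold the same alphabet; the helper name `decode_rwd` is hypothetical.

```python
import string
import struct

# The four 16-byte constants from the JIT listing, as (low qword, high qword).
RWD = [
    (0x4847464544434241, 0x504F4E4D4C4B4A49),  # RWD00
    (0x5857565554535251, 0x6665646362615A59),  # RWD16
    (0x6E6D6C6B6A696867, 0x767574737271706F),  # RWD32
    (0x333231307A797877, 0x2F2B393837363534),  # RWD48
]


def decode_rwd(lo: int, hi: int) -> bytes:
    """Lay two 64-bit words out in memory order (little-endian assumed)."""
    return struct.pack("<QQ", lo, hi)


tables = [decode_rwd(lo, hi) for lo, hi in RWD]
alphabet = (string.ascii_uppercase + string.ascii_lowercase
            + string.digits + "+/").encode("ascii")

for table in tables:
    print(table.decode("ascii"))
print(b"".join(tables) == alphabet)
```

Whether the table comes from inline constants or from a shared 64-byte map, the lookup sees the same bytes; the question in this comment is only about code size versus a memory load.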

@kunalspathak
Contributor

/azp run runtime-coreclr libraries-jitstress

@kunalspathak
Contributor

/azp run runtime-coreclr libraries-jitstress2-jitstressregs

@azure-pipelines

Azure Pipelines successfully started running 1 pipeline(s).

@kunalspathak
Contributor

/azp run runtime-coreclr libraries-jitstressregs

@azure-pipelines

Azure Pipelines successfully started running 1 pipeline(s).

@kunalspathak
Contributor

Does this need an entry in their part notices file?

Sorry @danmoseley, I didn't understand which part notices file you're referring to.

I guess he is talking about the reference made in https://github.com/dotnet/runtime/pull/95513/files#diff-b3b9edcf4c0d62e78954d826c44005cffb306b6ccf155f1a9228669229b7e765R496, but I'm not sure where exactly to add this. @danmoseley - can you please confirm?

@teo-tsirpanis added the area-System.Buffers label and removed the needs-area-label label, Dec 31, 2023
@ghost

Tagging subscribers to this area: @dotnet/area-system-buffers
See info in area-owners.md if you want to be subscribed.

Issue Details

This implements the encode to UTF8 algorithm here.

Author: SwapnilGaikwad
Assignees: -
Labels: area-System.Buffers, community-contribution
Milestone: -

@kunalspathak
Contributor

ping @danmoseley

@danmoseley
Member

Oops, yes, that was what caught my eye. Generally if we use significant ideas/code from elsewhere we add a credit in THIRD-PARTY-NOTICES.TXT at the root. Up to you.

@a74nh
Contributor

Oops, yes, that was what caught my eye. Generally if we use significant ideas/code from elsewhere we add a credit in THIRD-PARTY-NOTICES.TXT at the root. Up to you.

Looks like there is already an entry in there, but it's not immediately obvious.

License notice for vectorized base64 encoding / decoding

Contains an extra copy of the license from https://github.com/aklomp/base64/blob/master/LICENSE

danmoseley reacted with thumbs up emoji

Contributor

@kunalspathak left a comment

LGTM

[CompExactlyDependsOn(typeof(AdvSimd.Arm64))]
private static unsafe void AdvSimdEncode(ref byte* srcBytes, ref byte* destBytes, byte* srcEnd, int sourceLength, int destLength, byte* srcStart, byte* destStart)
{
// C# implementatino of https://github.com/aklomp/base64/blob/3a5add8652076612a8407627a42c768736a4263f/lib/arch/neon64/enc_loop.c
Contributor


nit

Suggested change
// C# implementatino of https://github.com/aklomp/base64/blob/3a5add8652076612a8407627a42c768736a4263f/lib/arch/neon64/enc_loop.c
// C# implementation of https://github.com/aklomp/base64/blob/3a5add8652076612a8407627a42c768736a4263f/lib/arch/neon64/enc_loop.c

Contributor


you can do this in a follow-up PR

SwapnilGaikwad reacted with thumbs up emoji
@kunalspathak merged commit fdb03ca into dotnet:main, Jan 10, 2024
@SwapnilGaikwad deleted the github-encode-utf8 branch, January 10, 2024 12:16
kunalspathak added a commit to kunalspathak/runtime that referenced this pull request, Jan 13, 2024
kunalspathak added a commit that referenced this pull request, Jan 16, 2024
* Revert "[libs] Skip AdvSimdEncode on Mono (#96829)". This reverts commit 1a76e37.
* Revert "Use multi-reg load/store for EncodeToUtf8 (#95513)". This reverts commit fdb03ca.
* Wrap load/store vector APIs in '#if false'
* Disable load/store vector tests
* remove the trailing space
tmds pushed a commit to tmds/runtime that referenced this pull request, Jan 23, 2024
* Use multi-reg load/store for EncodeToUtf8
* Use the fixed version of multi-reg store
* Update variable naming
tmds pushed a commit to tmds/runtime that referenced this pull request, Jan 23, 2024
…#96944)
* Revert "[libs] Skip AdvSimdEncode on Mono (dotnet#96829)". This reverts commit 1a76e37.
* Revert "Use multi-reg load/store for EncodeToUtf8 (dotnet#95513)". This reverts commit fdb03ca.
* Wrap load/store vector APIs in '#if false'
* Disable load/store vector tests
* remove the trailing space
@github-actions bot locked and limited conversation to collaborators, Feb 10, 2024
@richlander
Member

I assume this is the API being discussed. If so, it would be good to put it in the initial comment so that it is easy for folks to find.

https://learn.microsoft.com/dotnet/api/system.buffers.text.base64.encodetoutf8


Reviewers

@tannergooding: awaiting requested review
@a74nh: left review comments
@kunalspathak: approved these changes
@MichalPetryka: left review comments

Labels

area-System.Buffers, community-contribution

7 participants

@SwapnilGaikwad, @danmoseley, @kunalspathak, @a74nh, @richlander, @MichalPetryka, @teo-tsirpanis
