This PR optimizes the following String APIs:
- String.Split --> optimizes MakeSeparatorListVectorized
- String.Replace(char oldChar, char newChar) --> optimizes a single iteration only. Perf was measured for this API, but the numbers reflect optimizing one iteration, not all of them.
PERF on ICX
The tables below show the comparison output produced by ResultComparer in the performance repo.
Base = No changes
Diff = With the PR changes
1. Split
A Vector128 code path already exists for this API. We are adding a similar Vector256 and Vector512 code path.
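To make the shape of such a code path concrete, here is a minimal sketch of a Vector256 separator scan. It is not the runtime's `MakeSeparatorListVectorized`: the class `SeparatorScanSketch`, the method `FindSeparators`, the choice of three separators, and the use of a `List<int>` for the results are all assumptions made for illustration. It assumes .NET 7 or later, where the cross-platform `Vector256` helpers are available.

```csharp
using System;
using System.Collections.Generic;
using System.Numerics;
using System.Runtime.InteropServices;
using System.Runtime.Intrinsics;

// Hypothetical helper for illustration only; not the code in this PR.
static class SeparatorScanSketch
{
    public static List<int> FindSeparators(ReadOnlySpan<char> source, char c0, char c1, char c2)
    {
        var indices = new List<int>();

        // Vector256<char> is not supported, so view the chars as ushort.
        ReadOnlySpan<ushort> data = MemoryMarshal.Cast<char, ushort>(source);
        int i = 0;

        if (Vector256.IsHardwareAccelerated && data.Length >= Vector256<ushort>.Count)
        {
            Vector256<ushort> v0 = Vector256.Create((ushort)c0);
            Vector256<ushort> v1 = Vector256.Create((ushort)c1);
            Vector256<ushort> v2 = Vector256.Create((ushort)c2);

            for (; i <= data.Length - Vector256<ushort>.Count; i += Vector256<ushort>.Count)
            {
                Vector256<ushort> chunk = Vector256.Create(data.Slice(i, Vector256<ushort>.Count));

                // One Equals per separator, OR-ed into a single match mask.
                Vector256<ushort> cmp = Vector256.Equals(chunk, v0)
                                      | Vector256.Equals(chunk, v1)
                                      | Vector256.Equals(chunk, v2);

                // One bit per 16-bit element; walk the set bits in ascending order.
                uint mask = cmp.ExtractMostSignificantBits();
                while (mask != 0)
                {
                    indices.Add(i + BitOperations.TrailingZeroCount(mask));
                    mask &= mask - 1; // clear the lowest set bit
                }
            }
        }

        // Scalar tail (and full fallback when Vector256 is not accelerated).
        for (; i < source.Length; i++)
        {
            char ch = source[i];
            if (ch == c0 || ch == c1 || ch == c2)
            {
                indices.Add(i);
            }
        }

        return indices;
    }
}
```

For example, `SeparatorScanSketch.FindSeparators("a,b;c d", ',', ';', ' ')` yields the indices 1, 3 and 5. A Vector512 variant of the same sketch would widen the chunk to 32 chars and use a 64-bit mask; whether that is faster depends on the cost of the extra compares, which is what the tables below measure.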
base = Diff Vector256 code path vs diff = Diff Vector512 code path

| Slower | diff/base | Base Median (ns) | Diff Median (ns) | Modality |
|---|---|---|---|---|
| System.Tests.Perf_String.Split(s: "yfesgj0sg1ijslnjsb3uofdz3tbzf6ysgblu3at20nfab | 1.09 | 25083.88 | 27315.47 | |
| System.Tests.Perf_String.Split(s: "yfesgj0sg1ijslnjsb3uofdz3tbzf6ysgblu3at20nfab | 1.07 | 216.23 | 231.68 | |
| System.Tests.Perf_String.Split(s: "yfesgj0sg1ijslnjsb3uofdz3tbzf6ysgblu3at20nfab | 1.06 | 527.11 | 561.25 | |
| System.Tests.Perf_String.Split(s: "yfesgj0sg1ijslnjsb3uofdz3tbzf6ysgblu3at20nfab | 1.06 | 21021.31 | 22223.51 | |
| System.Tests.Perf_String.Split(s: "yfesgj0sg1ijslnjsb3uofdz3tbzf6ysgblu3at20nfab | 1.04 | 292.20 | 304.67 | |
| System.Tests.Perf_String.Split(s: "yfesgj0sg1ijslnjsb3uofdz3tbzf6ysgblu3at20nfab | 1.04 | 5308.70 | 5499.70 | |
| System.Tests.Perf_String.Split(s: "yfesgj0sg1ijslnjsb3uofdz3tbzf6ysgblu3at20nfab | 1.02 | 663.91 | 678.31 | |

| Faster | base/diff | Base Median (ns) | Diff Median (ns) | Modality |
|---|---|---|---|---|
| System.Tests.Perf_String.Split(s: "yfesgj0sg1ijslnjsb3uofdz3tbzf6ysgblu3at20nfab | 1.43 | 100.27 | 69.98 | |
| System.Tests.Perf_String.Split(s: "yfesgj0sg1ijslnjsb3uofdz3tbzf6ysgblu3at20nfab | 1.37 | 99.78 | 72.70 | |
| System.Tests.Perf_String.Split(s: "yfesgj0sg1ijslnjsb3uofdz3tbzf6ysgblu3at20nfab | 1.06 | 47539.44 | 44884.05 | |
| System.Tests.Perf_String.Split(s: "yfesgj0sg1ijslnjsb3uofdz3tbzf6ysgblu3at20nfab | 1.03 | 2787.30 | 2701.91 | |
| System.Tests.Perf_String.Split(s: "yfesgj0sg1ijslnjsb3uofdz3tbzf6ysgblu3at20nfab | 1.03 | 38497.32 | 37374.38 | |
| System.Tests.Perf_String.Split(s: "yfesgj0sg1ijslnjsb3uofdz3tbzf6ysgblu3at20nfab | 1.02 | 1089.85 | 1073.03 | |

base = Base Vector128 code path vs diff = Diff Vector256 code path

| Faster | base/diff | Base Median (ns) | Diff Median (ns) | Modality |
|---|---|---|---|---|
| System.Tests.Perf_String.Split(s: "yfesgj0sg1ijslnjsb3uofdz3tbzf6ysgblu3at20nfab | 1.77 | 176.76 | 99.78 | |
| System.Tests.Perf_String.Split(s: "yfesgj0sg1ijslnjsb3uofdz3tbzf6ysgblu3at20nfab | 1.76 | 176.33 | 100.27 | |
| System.Tests.Perf_String.Split(s: "yfesgj0sg1ijslnjsb3uofdz3tbzf6ysgblu3at20nfab | 1.47 | 56625.27 | 38497.32 | |
| System.Tests.Perf_String.Split(s: "yfesgj0sg1ijslnjsb3uofdz3tbzf6ysgblu3at20nfab | 1.43 | 67789.59 | 47539.44 | |
| System.Tests.Perf_String.Split(s: "yfesgj0sg1ijslnjsb3uofdz3tbzf6ysgblu3at20nfab | 1.27 | 1151.90 | 908.21 | |
| System.Tests.Perf_String.Split(s: "yfesgj0sg1ijslnjsb3uofdz3tbzf6ysgblu3at20nfab | 1.20 | 6348.06 | 5308.70 | |
| System.Tests.Perf_String.Split(s: "yfesgj0sg1ijslnjsb3uofdz3tbzf6ysgblu3at20nfab | 1.19 | 5407.90 | 4549.70 | |
| System.Tests.Perf_String.Split(s: "yfesgj0sg1ijslnjsb3uofdz3tbzf6ysgblu3at20nfab | 1.19 | 1293.97 | 1089.85 | |
| System.Tests.Perf_String.Split(s: "yfesgj0sg1ijslnjsb3uofdz3tbzf6ysgblu3at20nfab | 1.18 | 2753.15 | 2337.50 | |
| System.Tests.Perf_String.Split(s: "yfesgj0sg1ijslnjsb3uofdz3tbzf6ysgblu3at20nfab | 1.16 | 24393.60 | 21021.31 | |
| System.Tests.Perf_String.Split(s: "yfesgj0sg1ijslnjsb3uofdz3tbzf6ysgblu3at20nfab | 1.16 | 609.65 | 527.11 | |
| System.Tests.Perf_String.Split(s: "yfesgj0sg1ijslnjsb3uofdz3tbzf6ysgblu3at20nfab | 1.16 | 28984.70 | 25083.88 | |
| System.Tests.Perf_String.Split(s: "yfesgj0sg1ijslnjsb3uofdz3tbzf6ysgblu3at20nfab | 1.15 | 3213.50 | 2787.30 | |
| System.Tests.Perf_String.Split(s: "yfesgj0sg1ijslnjsb3uofdz3tbzf6ysgblu3at20nfab | 1.11 | 239.76 | 216.23 | |
| System.Tests.Perf_String.Split(s: "yfesgj0sg1ijslnjsb3uofdz3tbzf6ysgblu3at20nfab | 1.10 | 320.35 | 292.20 | |
| System.Tests.Perf_String.Split(s: "yfesgj0sg1ijslnjsb3uofdz3tbzf6ysgblu3at20nfab | 1.09 | 721.48 | 663.91 | |

This is one of the cases where AVX-512 is not that performant, because of the cost of using multiple `Vector512.Equals()` calls. I ran a couple of iterations using the Stopwatch method; the results are below.

As you can see, for each iteration, `Vector512` is almost the same as `Vector256`. Let me know if there are any suggestions for further optimizing the `Vector512` code path. We have to decide whether this can be merged or not, since there is already a `Vector128` code path for both APIs. That said, the Vector256 and Vector512 code paths provide a significant speedup over the Vector128 code path.

2. Replace_Char
A Vector128 code path already exists for this API. We are just adding a single iteration of Vector512 or Vector256.
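As a rough illustration of what a single vectorized iteration can look like, the sketch below applies one Vector256 step to the first 16 chars and leaves the rest to a scalar loop. Where the single iteration sits in the real implementation is not stated here, so the placement at the start of the buffer is an assumption, as are the names `ReplaceStepSketch` and `ReplaceChar`. It assumes .NET 7 or later.

```csharp
using System;
using System.Runtime.InteropServices;
using System.Runtime.Intrinsics;

// Hypothetical helper for illustration only; not the code in this PR.
static class ReplaceStepSketch
{
    public static string ReplaceChar(string text, char oldChar, char newChar)
    {
        char[] buffer = text.ToCharArray();

        // View the chars as ushort, since Vector256<char> is not supported.
        Span<ushort> data = MemoryMarshal.Cast<char, ushort>(buffer.AsSpan());
        int i = 0;

        // A single Vector256 step (16 chars), not a loop.
        if (Vector256.IsHardwareAccelerated && data.Length >= Vector256<ushort>.Count)
        {
            Vector256<ushort> oldV = Vector256.Create((ushort)oldChar);
            Vector256<ushort> newV = Vector256.Create((ushort)newChar);

            Span<ushort> block = data.Slice(0, Vector256<ushort>.Count);
            Vector256<ushort> chunk = Vector256.Create((ReadOnlySpan<ushort>)block);

            // Lanes equal to oldChar take newChar; all other lanes keep their value.
            Vector256<ushort> mask = Vector256.Equals(chunk, oldV);
            Vector256.ConditionalSelect(mask, newV, chunk).CopyTo(block);

            i = Vector256<ushort>.Count;
        }

        // Scalar loop for everything the single vector step did not cover.
        for (; i < buffer.Length; i++)
        {
            if (buffer[i] == oldChar)
            {
                buffer[i] = newChar;
            }
        }

        return new string(buffer);
    }
}
```

The result should match `string.Replace(oldChar, newChar)` for any input; only the first block is processed with a vector compare and select.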
base = Diff Vector256 code path vs diff = Diff Vector512 code path

| Slower | diff/base | Base Median (ns) | Diff Median (ns) | Modality |
|---|---|---|---|---|
| System.Tests.Perf_String.Replace_Char(text: "yfesgj0sg1ijslnjsb3uofdz3tbzf6ysgbl | 1.12 | 24.74 | 27.63 | |
| System.Tests.Perf_String.Replace_Char(text: "yfesgj0sg1ijslnjsb3uofdz3tbzf6ysgbl | 1.06 | 4003.77 | 4246.22 | |
| System.Tests.Perf_String.Replace_Char(text: "This is a very nice sentence", oldC | 1.03 | 18.05 | 18.56 | |

| Faster | base/diff | Base Median (ns) | Diff Median (ns) | Modality |
|---|---|---|---|---|
| System.Tests.Perf_String.Replace_Char(text: "yfesgj0sg1ijslnjsb3uofdz3tbzf6ysgbl | 1.25 | 173.57 | 139.21 | |
| System.Tests.Perf_String.Replace_Char(text: "yfesgj0sg1ijslnjsb3uofdz3tbzf6ysgbl | 1.15 | 1486.21 | 1292.79 | |
| System.Tests.Perf_String.Replace_Char(text: "yfesgj0sg1ijslnjsb3uofdz3tbzf6ysgbl | 1.15 | 722.97 | 630.87 | |
| System.Tests.Perf_String.Replace_Char(text: "This is a very nice sentence", oldC | 1.09 | 3.90 | 3.57 | |
| System.Tests.Perf_String.Replace_Char(text: "yfesgj0sg1ijslnjsb3uofdz3tbzf6ysgbl | 1.09 | 2216.98 | 2029.28 | |

base = Base Vector128 code path vs diff = Diff Vector256 code path

| Slower | diff/base | Base Median (ns) | Diff Median (ns) | Modality |
|---|---|---|---|---|
| System.Tests.Perf_String.Replace_Char(text: "yfesgj0sg1ijslnjsb3uofdz3tbzf6ysgbl | 1.12 | 24.74 | 27.63 | |
| System.Tests.Perf_String.Replace_Char(text: "yfesgj0sg1ijslnjsb3uofdz3tbzf6ysgbl | 1.06 | 4003.77 | 4246.22 | |
| System.Tests.Perf_String.Replace_Char(text: "This is a very nice sentence", oldC | 1.03 | 18.05 | 18.56 | |

| Faster | base/diff | Base Median (ns) | Diff Median (ns) | Modality |
|---|---|---|---|---|
| System.Tests.Perf_String.Replace_Char(text: "yfesgj0sg1ijslnjsb3uofdz3tbzf6ysgbl | 1.25 | 173.57 | 139.21 | |
| System.Tests.Perf_String.Replace_Char(text: "yfesgj0sg1ijslnjsb3uofdz3tbzf6ysgbl | 1.15 | 1486.21 | 1292.79 | |
| System.Tests.Perf_String.Replace_Char(text: "yfesgj0sg1ijslnjsb3uofdz3tbzf6ysgbl | 1.15 | 722.97 | 630.87 | |
| System.Tests.Perf_String.Replace_Char(text: "This is a very nice sentence", oldC | 1.09 | 3.90 | 3.57 | |
| System.Tests.Perf_String.Replace_Char(text: "yfesgj0sg1ijslnjsb3uofdz3tbzf6ysgbl | 1.09 | 2216.98 | 2029.28 | |