Movatterモバイル変換


[0]ホーム

URL:


Skip to content

Navigation Menu

Sign in
Appearance settings

Search code, repositories, users, issues, pull requests...

Provide feedback

We read every piece of feedback, and take your input very seriously.

Saved searches

Use saved searches to filter your results more quickly

Sign up
Appearance settings

Faster IndexOf for arm64#67811

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to ourterms of service andprivacy statement. We’ll occasionally send you account related emails.

Already on GitHub?Sign in to your account

Merged
EgorBo merged 2 commits intodotnet:mainfromEgorBo:arm-spanhelpers-fast
Apr 12, 2022
Merged

Conversation

EgorBo
Copy link
Member

@EgorBoEgorBo commentedApr 9, 2022
edited
Loading

Use a slightly faster "no matches" check inIndexOf andIndexOfAny.

Benchmark:

publicclassBenchmarks{publicIEnumerable<object[]>TestData(){// Small inputs which are handled via SIMDyieldreturnnewobject[]{"12345678",'1'};yieldreturnnewobject[]{"123456789",'9'};yieldreturnnewobject[]{"1234567812345678",'1'};yieldreturnnewobject[]{"1234567812345678",'0'};yieldreturnnewobject[]{"Console WriteLine Hello World",'o'};yieldreturnnewobject[]{"Console WriteLine Hello World",'d'};// Large inputsyieldreturnnewobject[]{newstring('x',64),'y'};yieldreturnnewobject[]{newstring('x',200),'y'};yieldreturnnewobject[]{newstring('x',1000),'y'};yieldreturnnewobject[]{newstring('x',1000000),'y'};}[Benchmark][ArgumentsSource(nameof(TestData))]publicintIndexOf_byte(stringstr,charc){returnMemoryMarshal.Cast<char,byte>(str.AsSpan()).IndexOf((byte)c);}[Benchmark][ArgumentsSource(nameof(TestData))]publicintIndexOf_char(stringstr,charc){returnstr.AsSpan().IndexOf(c);}}

IndexOf_byte

|  Method |               Toolchain |                  str | c |          Mean |      Error |     StdDev | Ratio ||-------- |------------------------ |--------------------- |-- |--------------:|-----------:|-----------:|------:|| IndexOf |      /Core_Root/corerun |             12345678 | 1 |      1.692 ns |  0.0022 ns |  0.0020 ns |  1.00 || IndexOf | /Core_Root_base/corerun |             12345678 | 1 |      1.686 ns |  0.0017 ns |  0.0013 ns |  1.00 ||         |                         |                      |   |               |            |            |       || IndexOf |      /Core_Root/corerun |     1234567812345678 | 0 |     10.262 ns |  0.0127 ns |  0.0119 ns |  1.78 || IndexOf | /Core_Root_base/corerun |     1234567812345678 | 0 |      5.763 ns |  0.0030 ns |  0.0025 ns |  1.00 ||         |                         |                      |   |               |            |            |       || IndexOf |      /Core_Root/corerun |     1234567812345678 | 1 |      1.676 ns |  0.0032 ns |  0.0030 ns |  1.01 || IndexOf | /Core_Root_base/corerun |     1234567812345678 | 1 |      1.665 ns |  0.0022 ns |  0.0020 ns |  1.00 ||         |                         |                      |   |               |            |            |       || IndexOf |      /Core_Root/corerun |            123456789 | 9 |      4.197 ns |  0.0089 ns |  0.0083 ns |  1.00 || IndexOf | /Core_Root_base/corerun |            123456789 | 9 |      4.193 ns |  0.0032 ns |  0.0029 ns |  1.00 ||         |                         |                      |   |               |            |            |       || IndexOf |      /Core_Root/corerun | Conso(...)World [29] | d |      7.010 ns |  0.0049 ns |  0.0043 ns |  1.00 || IndexOf | /Core_Root_base/corerun | Conso(...)World [29] | d |      7.014 ns |  0.0018 ns |  0.0017 ns |  1.00 ||         |                         |                      |   |               |            |            |       || IndexOf |      /Core_Root/corerun | Conso(...)World [29] | o |      1.679 ns |  0.0006 ns |  0.0006 ns |  1.00 || IndexOf | /Core_Root_base/corerun | Conso(...)World [29] | o |      1.678 ns |  0.0008 ns |  0.0007 ns |  1.00 ||         |                         |                      |   |               |            |            |       || IndexOf |      /Core_Root/corerun | xxxxx(...)xxxxx [64] | y |      7.322 ns |  0.0028 ns |  0.0024 ns |  0.80 || IndexOf | /Core_Root_base/corerun | xxxxx(...)xxxxx [64] | y |      9.204 ns |  0.0032 ns |  0.0029 ns |  1.00 ||         |                         |                      |   |               |            |            |       || IndexOf |      /Core_Root/corerun |  xxxx(...)xxxx [200] | y |     13.269 ns |  0.0024 ns |  0.0022 ns |  0.62 || IndexOf | /Core_Root_base/corerun |  xxxx(...)xxxx [200] | y |     21.317 ns |  0.2285 ns |  0.2138 ns |  1.00 ||         |                         |                      |   |               |            |            |       || IndexOf |      /Core_Root/corerun | xxxx(...)xxxx [1000] | y |     58.107 ns |  0.0458 ns |  0.0429 ns |  0.65 || IndexOf | /Core_Root_base/corerun | xxxx(...)xxxx [1000] | y |     89.510 ns |  0.0272 ns |  0.0241 ns |  1.00 ||         |                         |                      |   |               |            |            |       || IndexOf |      /Core_Root/corerun |  xx(...)xx [1000000] | y | 44,056.624 ns |  9.0990 ns |  8.5112 ns |  0.56 || IndexOf | /Core_Root_base/corerun |  xx(...)xx [1000000] | y | 78,269.573 ns | 26.4061 ns | 22.0503 ns |  1.00 |

IndexOf_char

|  Method |               Toolchain |                  str | c |          Mean |      Error |     StdDev | Ratio ||-------- |------------------------ |--------------------- |-- |--------------:|-----------:|-----------:|------:|| IndexOf |      /Core_Root/corerun |             12345678 | 1 |      1.382 ns |  0.0020 ns |  0.0018 ns |  1.00 || IndexOf | /Core_Root_base/corerun |             12345678 | 1 |      1.382 ns |  0.0023 ns |  0.0021 ns |  1.00 ||         |                         |                      |   |               |            |            |       || IndexOf |      /Core_Root/corerun |     1234567812345678 | 0 |     11.378 ns |  0.0141 ns |  0.0132 ns |  1.07 || IndexOf | /Core_Root_base/corerun |     1234567812345678 | 0 |     10.634 ns |  0.0240 ns |  0.0187 ns |  1.00 ||         |                         |                      |   |               |            |            |       || IndexOf |      /Core_Root/corerun |     1234567812345678 | 1 |      1.375 ns |  0.0016 ns |  0.0014 ns |  1.01 || IndexOf | /Core_Root_base/corerun |     1234567812345678 | 1 |      1.366 ns |  0.0034 ns |  0.0030 ns |  1.00 ||         |                         |                      |   |               |            |            |       || IndexOf |      /Core_Root/corerun |            123456789 | 9 |      2.612 ns |  0.0014 ns |  0.0011 ns |  1.00 || IndexOf | /Core_Root_base/corerun |            123456789 | 9 |      2.615 ns |  0.0029 ns |  0.0027 ns |  1.00 ||         |                         |                      |   |               |            |            |       || IndexOf |      /Core_Root/corerun | Conso(...)World [29] | d |      5.784 ns |  0.0043 ns |  0.0038 ns |  0.91 || IndexOf | /Core_Root_base/corerun | Conso(...)World [29] | d |      6.389 ns |  0.0059 ns |  0.0052 ns |  1.00 ||         |                         |                      |   |               |            |            |       || IndexOf |      /Core_Root/corerun | Conso(...)World [29] | o |      1.689 ns |  0.0016 ns |  0.0014 ns |  1.00 || IndexOf | /Core_Root_base/corerun | Conso(...)World [29] | o |      1.694 ns |  0.0007 ns |  0.0007 ns |  1.00 ||         |                         |                      |   |               |            |            |       || IndexOf |      /Core_Root/corerun | xxxxx(...)xxxxx [64] | y |      7.009 ns |  0.0025 ns |  0.0023 ns |  0.76 || IndexOf | /Core_Root_base/corerun | xxxxx(...)xxxxx [64] | y |      9.204 ns |  0.0025 ns |  0.0023 ns |  1.00 ||         |                         |                      |   |               |            |            |       || IndexOf |      /Core_Root/corerun |  xxxx(...)xxxx [200] | y |     14.192 ns |  0.0070 ns |  0.0062 ns |  0.67 || IndexOf | /Core_Root_base/corerun |  xxxx(...)xxxx [200] | y |     21.267 ns |  0.2695 ns |  0.2521 ns |  1.00 ||         |                         |                      |   |               |            |            |       || IndexOf |      /Core_Root/corerun | xxxx(...)xxxx [1000] | y |     64.790 ns |  0.2031 ns |  0.1800 ns |  0.72 || IndexOf | /Core_Root_base/corerun | xxxx(...)xxxx [1000] | y |     89.485 ns |  0.0196 ns |  0.0174 ns |  1.00 ||         |                         |                      |   |               |            |            |       || IndexOf |      /Core_Root/corerun |  xx(...)xx [1000000] | y | 60,098.282 ns |  3.1361 ns |  2.7801 ns |  0.77 || IndexOf | /Core_Root_base/corerun |  xx(...)xx [1000000] | y | 78,264.715 ns | 20.9694 ns | 19.6148 ns |  1.00 |

up to +40% faster for large inputs can 0.5-1ns regress in the worst case - if input is found in the very first vector.

There are weird regressions

| IndexOf |      /Core_Root/corerun |     1234567812345678 | 0 |     10.262 ns |  0.0127 ns |  0.0119 ns |  1.78 || IndexOf | /Core_Root_base/corerun |     1234567812345678 | 0 |      5.763 ns |  0.0030 ns |  0.0025 ns |  1.00 || IndexOf |      /Core_Root/corerun |     1234567812345678 | 0 |     11.378 ns |  0.0141 ns |  0.0132 ns |  1.07 || IndexOf | /Core_Root_base/corerun |     1234567812345678 | 0 |     10.634 ns |  0.0240 ns |  0.0187 ns |  1.00 |

but it goes away in FullPGO mode (and the PR becomes faster than the base as expected in this case) - it means the loop is not properly aligned by default. So the regression will be fixed once we update *.mibc for corelib (IndexOf definitely participates in the trainings)

adamsitnik reacted with heart emoji
@ghostghost assignedEgorBoApr 9, 2022
@ghostghost added the area-System.Memory labelApr 9, 2022
@ghost
Copy link

Tagging subscribers to this area: @dotnet/area-system-memory
See info inarea-owners.md if you want to be subscribed.

Issue Details

Use a slightly faster "no matches" check inIndexOf andIndexOfAny.

Benchmark:

publicclassBenchmarks{publicIEnumerable<object[]>TestData(){// Small inputs which are handled via SIMDyieldreturnnewobject[]{"12345678",'1'};yieldreturnnewobject[]{"123456789",'9'};yieldreturnnewobject[]{"1234567812345678",'1'};yieldreturnnewobject[]{"1234567812345678",'0'};yieldreturnnewobject[]{"Console WriteLine Hello World",'o'};yieldreturnnewobject[]{"Console WriteLine Hello World",'d'};// Large inputsyieldreturnnewobject[]{newstring('x',64),'y'};yieldreturnnewobject[]{newstring('x',200),'y'};yieldreturnnewobject[]{newstring('x',1000),'y'};yieldreturnnewobject[]{newstring('x',1000000),'y'};}[Benchmark][ArgumentsSource(nameof(TestData))]publicintIndexOf_byte(stringstr,charc){returnMemoryMarshal.Cast<char,byte>(str.AsSpan()).IndexOf((byte)c);}[Benchmark][ArgumentsSource(nameof(TestData))]publicintIndexOf_char(stringstr,charc){returnstr.AsSpan().IndexOf(c);}}

IndexOf_byte

|  Method |               Toolchain |                  str | c |          Mean |      Error |     StdDev | Ratio ||-------- |------------------------ |--------------------- |-- |--------------:|-----------:|-----------:|------:|| IndexOf |      /Core_Root/corerun |             12345678 | 1 |      1.692 ns |  0.0022 ns |  0.0020 ns |  1.00 || IndexOf | /Core_Root_base/corerun |             12345678 | 1 |      1.686 ns |  0.0017 ns |  0.0013 ns |  1.00 ||         |                         |                      |   |               |            |            |       || IndexOf |      /Core_Root/corerun |     1234567812345678 | 0 |     10.262 ns |  0.0127 ns |  0.0119 ns |  1.78 || IndexOf | /Core_Root_base/corerun |     1234567812345678 | 0 |      5.763 ns |  0.0030 ns |  0.0025 ns |  1.00 ||         |                         |                      |   |               |            |            |       || IndexOf |      /Core_Root/corerun |     1234567812345678 | 1 |      1.676 ns |  0.0032 ns |  0.0030 ns |  1.01 || IndexOf | /Core_Root_base/corerun |     1234567812345678 | 1 |      1.665 ns |  0.0022 ns |  0.0020 ns |  1.00 ||         |                         |                      |   |               |            |            |       || IndexOf |      /Core_Root/corerun |            123456789 | 9 |      4.197 ns |  0.0089 ns |  0.0083 ns |  1.00 || IndexOf | /Core_Root_base/corerun |            123456789 | 9 |      4.193 ns |  0.0032 ns |  0.0029 ns |  1.00 ||         |                         |                      |   |               |            |            |       || IndexOf |      /Core_Root/corerun | Conso(...)World [29] | d |      7.010 ns |  0.0049 ns |  0.0043 ns |  1.00 || IndexOf | /Core_Root_base/corerun | Conso(...)World [29] | d |      7.014 ns |  0.0018 ns |  0.0017 ns |  1.00 ||         |                         |                      |   |               |            |            |       || IndexOf |      /Core_Root/corerun | Conso(...)World [29] | o |      1.679 ns |  0.0006 ns |  0.0006 ns |  1.00 || IndexOf | /Core_Root_base/corerun | Conso(...)World [29] | o |      1.678 ns |  0.0008 ns |  0.0007 ns |  1.00 ||         |                         |                      |   |               |            |            |       || IndexOf |      /Core_Root/corerun | xxxxx(...)xxxxx [64] | y |      7.322 ns |  0.0028 ns |  0.0024 ns |  0.80 || IndexOf | /Core_Root_base/corerun | xxxxx(...)xxxxx [64] | y |      9.204 ns |  0.0032 ns |  0.0029 ns |  1.00 ||         |                         |                      |   |               |            |            |       || IndexOf |      /Core_Root/corerun |  xxxx(...)xxxx [200] | y |     13.269 ns |  0.0024 ns |  0.0022 ns |  0.62 || IndexOf | /Core_Root_base/corerun |  xxxx(...)xxxx [200] | y |     21.317 ns |  0.2285 ns |  0.2138 ns |  1.00 ||         |                         |                      |   |               |            |            |       || IndexOf |      /Core_Root/corerun | xxxx(...)xxxx [1000] | y |     58.107 ns |  0.0458 ns |  0.0429 ns |  0.65 || IndexOf | /Core_Root_base/corerun | xxxx(...)xxxx [1000] | y |     89.510 ns |  0.0272 ns |  0.0241 ns |  1.00 ||         |                         |                      |   |               |            |            |       || IndexOf |      /Core_Root/corerun |  xx(...)xx [1000000] | y | 44,056.624 ns |  9.0990 ns |  8.5112 ns |  0.56 || IndexOf | /Core_Root_base/corerun |  xx(...)xx [1000000] | y | 78,269.573 ns | 26.4061 ns | 22.0503 ns |  1.00 |

IndexOf_char

|  Method |               Toolchain |                  str | c |          Mean |      Error |     StdDev | Ratio ||-------- |------------------------ |--------------------- |-- |--------------:|-----------:|-----------:|------:|| IndexOf |      /Core_Root/corerun |             12345678 | 1 |      1.382 ns |  0.0020 ns |  0.0018 ns |  1.00 || IndexOf | /Core_Root_base/corerun |             12345678 | 1 |      1.382 ns |  0.0023 ns |  0.0021 ns |  1.00 ||         |                         |                      |   |               |            |            |       || IndexOf |      /Core_Root/corerun |     1234567812345678 | 0 |     11.378 ns |  0.0141 ns |  0.0132 ns |  1.07 || IndexOf | /Core_Root_base/corerun |     1234567812345678 | 0 |     10.634 ns |  0.0240 ns |  0.0187 ns |  1.00 ||         |                         |                      |   |               |            |            |       || IndexOf |      /Core_Root/corerun |     1234567812345678 | 1 |      1.375 ns |  0.0016 ns |  0.0014 ns |  1.01 || IndexOf | /Core_Root_base/corerun |     1234567812345678 | 1 |      1.366 ns |  0.0034 ns |  0.0030 ns |  1.00 ||         |                         |                      |   |               |            |            |       || IndexOf |      /Core_Root/corerun |            123456789 | 9 |      2.612 ns |  0.0014 ns |  0.0011 ns |  1.00 || IndexOf | /Core_Root_base/corerun |            123456789 | 9 |      2.615 ns |  0.0029 ns |  0.0027 ns |  1.00 ||         |                         |                      |   |               |            |            |       || IndexOf |      /Core_Root/corerun | Conso(...)World [29] | d |      5.784 ns |  0.0043 ns |  0.0038 ns |  0.91 || IndexOf | /Core_Root_base/corerun | Conso(...)World [29] | d |      6.389 ns |  0.0059 ns |  0.0052 ns |  1.00 ||         |                         |                      |   |               |            |            |       || IndexOf |      /Core_Root/corerun | Conso(...)World [29] | o |      1.689 ns |  0.0016 ns |  0.0014 ns |  1.00 || IndexOf | /Core_Root_base/corerun | Conso(...)World [29] | o |      1.694 ns |  0.0007 ns |  0.0007 ns |  1.00 ||         |                         |                      |   |               |            |            |       || IndexOf |      /Core_Root/corerun | xxxxx(...)xxxxx [64] | y |      7.009 ns |  0.0025 ns |  0.0023 ns |  0.76 || IndexOf | /Core_Root_base/corerun | xxxxx(...)xxxxx [64] | y |      9.204 ns |  0.0025 ns |  0.0023 ns |  1.00 ||         |                         |                      |   |               |            |            |       || IndexOf |      /Core_Root/corerun |  xxxx(...)xxxx [200] | y |     14.192 ns |  0.0070 ns |  0.0062 ns |  0.67 || IndexOf | /Core_Root_base/corerun |  xxxx(...)xxxx [200] | y |     21.267 ns |  0.2695 ns |  0.2521 ns |  1.00 ||         |                         |                      |   |               |            |            |       || IndexOf |      /Core_Root/corerun | xxxx(...)xxxx [1000] | y |     64.790 ns |  0.2031 ns |  0.1800 ns |  0.72 || IndexOf | /Core_Root_base/corerun | xxxx(...)xxxx [1000] | y |     89.485 ns |  0.0196 ns |  0.0174 ns |  1.00 ||         |                         |                      |   |               |            |            |       || IndexOf |      /Core_Root/corerun |  xx(...)xx [1000000] | y | 60,098.282 ns |  3.1361 ns |  2.7801 ns |  0.77 || IndexOf | /Core_Root_base/corerun |  xx(...)xx [1000000] | y | 78,264.715 ns | 20.9694 ns | 19.6148 ns |  1.00 |

up +40% faster for large inputs can 0.5-1ns regress in the worst case - if input is found in the very first vector.

Author:EgorBo
Assignees:EgorBo
Labels:

area-System.Memory

Milestone:-

@EgorBoEgorBoforce-pushed thearm-spanhelpers-fast branch from9e59caa to8ab651dCompareApril 9, 2022 20:13
Copy link
Contributor

@kunalspathakkunalspathak left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

LGTM

@EgorBoEgorBo merged commita4503d6 intodotnet:mainApr 12, 2022
@adamsitnikadamsitnik added the tenet-performancePerformance related issue labelApr 14, 2022
@dakersnar
Copy link
Contributor

Performance improvement on Windows arm64:dotnet/perf-autofiling-issues#4732

@ghostghost locked asresolvedand limited conversation to collaboratorsMay 22, 2022
Sign up for freeto subscribe to this conversation on GitHub. Already have an account?Sign in.
Reviewers

@kunalspathakkunalspathakkunalspathak approved these changes

Assignees

@EgorBoEgorBo

Labels
area-System.Memorytenet-performancePerformance related issue
Projects
None yet
Milestone
No milestone
Development

Successfully merging this pull request may close these issues.

4 participants
@EgorBo@dakersnar@kunalspathak@adamsitnik

[8]ページ先頭

©2009-2025 Movatter.jp