Movatterモバイル変換


[0]ホーム

URL:


Skip to content

Navigation Menu

Sign in
Appearance settings

Search code, repositories, users, issues, pull requests...

Provide feedback

We read every piece of feedback, and take your input very seriously.

Saved searches

Use saved searches to filter your results more quickly

Sign up
Appearance settings

Manually optimize a rem 64 instruction to avoid regression on Mono#96203

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to ourterms of service andprivacy statement. We’ll occasionally send you account related emails.

Already on GitHub?Sign in to your account

Conversation

@andrewjsaid
Copy link
Contributor

matouskozak reacted with heart emoji
@ghostghost added area-System.Collections community-contributionIndicates that the PR has been added by a community member labelsDec 19, 2023
@ghost
Copy link

Tagging subscribers to this area: @dotnet/area-system-collections
See info inarea-owners.md if you want to be subscribed.

Issue Details

dotnet/perf-autofiling-issues#26185 (comment)

cc@adamsitnik as requested

Author:andrewjsaid
Assignees:-
Labels:

area-System.Collections,community-contribution

Milestone:-

Copy link
Member

@adamsitnikadamsitnik left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

The changes LGTM, big thanks for such a quick fix@andrewjsaid !

@kotlarmilos@matouskozak could you please verify it and merge the PR if it solves the problem?

FWIW I'll be back to work at the beginning of January so I won't be able to respond until then.

@adamsitnikadamsitnik added tenet-performancePerformance related issue runtime-monospecific to the Mono runtime labelsDec 19, 2023
@kotlarmilos
Copy link
Member

kotlarmilos commentedDec 20, 2023
edited
Loading

The changes LGTM, big thanks for such a quick fix@andrewjsaid !

@kotlarmilos@matouskozak could you please verify it and merge the PR if it solves the problem?

FWIW I'll be back to work at the beginning of January so I won't be able to respond until then.

@andrewjsaid Thank you for taking swift action.I tested the& 0x3F vs% 64 on the Mono Interpreter/AOT x64 and arm64, and I didn't notice any performance difference.@matouskozak Could you please double-check it on your end before proceeding with the next steps?

After using the entire expression, likeprivate static bool CheckLengthQuick(string key) => (_lengthFilter & (1UL << (key.Length % 64))) > 0;, there is a significant performance difference (orders of magnitude).

matouskozak and adamsitnik reacted with thumbs up emoji

@andrewjsaid
Copy link
ContributorAuthor

Glad to hear that,@kotlarmilos ! Apologies for all this in the first place 🙈

kotlarmilos reacted with thumbs up emoji

@kotlarmiloskotlarmilos merged commitbc83100 intodotnet:mainDec 20, 2023
{
if(key.Length<minLength)minLength=key.Length;
if(key.Length>maxLength)maxLength=key.Length;
lengthFilter|=(1UL<<(key.Length%64));
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

Seems like something that should be fixed in mono itself, if it makes such an impactful difference?

tannergooding reacted with thumbs up emoji
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

This doesn't appear to be the primary cause of regressions inc28bec4. We will investigate it further to detect the root cause.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

Was the validation in#96203 (comment) incorrect? Can we revert this then?

kotlarmilos reacted with thumbs up emoji
Copy link
ContributorAuthor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

From my limited knowledge and research I couldn't find the optimization of% 64 ->& 0x3F in mono's code so this optimization might still be valid.

Just looking at the regressions and I want to point out that we see a regression inSystem.Collections.Perf_SubstringFrozenDictionary on monodotnet/perf-autofiling-issues#26221.

This is strange as the original commitc28bec4 is designed to not affectTryGetValue onsubstring strategy subtypes ofOrdinalStringFrozenDictionary . It works that way because each concrete implementation should be getting it's own codegen and in turn be optimized toif(true) as this existing comment sums up
image

Based on that I'd say regressions onSubstringFrozenDictionary tests point towards the method call toCheckLengthFilter is not being inlined or possibly we are even doing virtual method dispatch.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

so this optimization might still be valid.

Even if it is, the changed code is harder to understand / maintain than the original IMO, and the pattern of mod'ing an array/span length is super common; this is just one occurrence of that. If it's impactful here, it'd be impactful in many more places, and I'd prefer we not one-off it. It also sounds like the measurements that suggested this was valuable in this case was flawed, and so we don't actually know in this particular case whether it made a meaningful difference.

andrewjsaid reacted with thumbs up emoji
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

It's also worth noting thatx % 64 becomingx & 0x3F is only a valid optimization forpositive x, while it returns a different result fornegative x, so it's not always a universal option to replace either.

This optimization currently lights up in RyuJIT by virtue ofkey being astring and the runtime having implicit knowledge thatstring.Length (as well asarray.Length andspan.Length) are never negative.

From my limited knowledge and research I couldn't find the optimization of % 64 -> & 0x3F in mono's code so this optimization might still be valid.

Such optimizations generally involve checking that forx % y,x is positive andy is a power of two. It's generally easiest to just check the codegen, but this is really one of those fundamental optimizations around division/remainder that a compiler should recognize as Stephen indicated.

andrewjsaid reacted with thumbs up emoji
Copy link
ContributorAuthor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

Happy to submit a revert PR now and for it to be approved/merged whenever but I might be unavailable for a few weeks at some point soon so I'd rather submit now.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

Was the validation in#96203 (comment) incorrect? Can we revert this then?

The validation was incorrect, sorry for confusion.

Could the regressions be related to inlining, especially sinceCheckLengthQuick was introduced? Additionally, there has been a change from usingif (Equals(item, _items[index])) toif (hashCode == _hashTable.HashCodes[index]). Is it possible thatEquals has been optimized?

Copy link
ContributorAuthor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

Could the regressions be related to inlining, especially since CheckLengthQuick was introduced?

Almost certainly since in CoreCLR, for the benchmarkSubstringFrozenDictionary,if(CheckLengthQuick(key)) is first inlined toif(true) and then just optimized away resulting in no change at all. Whereas there is a regression in mono meaning that at the very least one of these optimizations didn't work.

The inlining is possible because concretesealed implementations of FrozenDictionary all have this line of code

private protected override ref readonly TValue GetValueRefOrNullRefCore(string key) => ref base.GetValueRefOrNullRefCore(key);

which allows the JIT to codegen for each concrete implementation. When doing so, it is able to inline the implementation'sEquals,GetHashCode as it is generating the code for the specific implementation not the base class. in CoreCLR the same is happening forCheckLengthQuick, however as opposed to the existing methods,CheckLengthQuick isvirtual andnot overridden - could either of these be a reason that mono isn't inlining it like it presumably inlinesEquals andGetHashCode?

Additionally, there has been a change from using if (Equals(item, _items[index])) to if (hashCode == _hashTable.HashCodes[index]). Is it possible that Equals has been optimized?

@kotlarmilos I believe that's just the diff viewer. The change is really just adding theif (CheckLengthQuick(key)) and some indenting.

kotlarmilos reacted with thumbs up emoji
@andrewjsaidandrewjsaid deleted the frozen-collections-length-filter-perf-rem64 branchJanuary 2, 2024 14:05
andrewjsaid added a commit to andrewjsaid/dotnet-runtime that referenced this pull requestJan 2, 2024
… runtimes which do not currently optimize it (dotnet#96203)"This reverts commitbc83100.
stephentoub pushed a commit that referenced this pull requestJan 3, 2024
… runtimes which do not currently optimize it (#96203)" (#96415)This reverts commitbc83100.
@github-actionsgithub-actionsbot locked and limited conversation to collaboratorsFeb 3, 2024
Sign up for freeto subscribe to this conversation on GitHub. Already have an account?Sign in.

Reviewers

@stephentoubstephentoubstephentoub left review comments

@tannergoodingtannergoodingtannergooding left review comments

@kotlarmiloskotlarmiloskotlarmilos approved these changes

@adamsitnikadamsitnikadamsitnik approved these changes

Assignees

No one assigned

Labels

area-System.Collectionscommunity-contributionIndicates that the PR has been added by a community memberruntime-monospecific to the Mono runtimetenet-performancePerformance related issue

Projects

None yet

Milestone

No milestone

Development

Successfully merging this pull request may close these issues.

5 participants

@andrewjsaid@kotlarmilos@stephentoub@adamsitnik@tannergooding

[8]ページ先頭

©2009-2025 Movatter.jp