Movatterモバイル変換


[0]ホーム

URL:


Skip to content

Navigation Menu

Sign in
Appearance settings

Search code, repositories, users, issues, pull requests...

Provide feedback

We read every piece of feedback, and take your input very seriously.

Saved searches

Use saved searches to filter your results more quickly

Sign up
Appearance settings

[wasm] Interpreter automatic PGO#92981

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to ourterms of service andprivacy statement. We’ll occasionally send you account related emails.

Already on GitHub?Sign in to your account

Merged
kg merged 3 commits intodotnet:mainfromkg:wasm-interpgo
Nov 1, 2023
Merged

[wasm] Interpreter automatic PGO#92981

kg merged 3 commits intodotnet:mainfromkg:wasm-interpgo
Nov 1, 2023

Conversation

@kg
Copy link
Member

@kgkg commentedOct 4, 2023
edited
Loading

This PR adds automatic PGO for interpreter tiering. The idea is that by maintaining a list of which interpreter methods were tiered during execution, we can skip directly to generating optimized code on future runs. This will get the application into a steady high-performance state faster on future runs, while also reducing the amount of time/memory we spend on code generation (since normally we generate unoptimized code, then generate optimized code).

The infrastructure for this feature is not platform gated, but the only platform that has an implementation for loading/saving the PGO data right now is the browser, and the runtime option controlling it is only default-on for the browser.

Web application code can either useINTERNAL.interpgo_load_data andINTERNAL.interpgo_save_data to manually control the loading/saving of the data, or can use the newwithInterpreterPgo(true, autoSaveDelay) builder method to turn on automatic mode. Automatic mode will attempt to load the data during startup, and will automatically save it after a delay (at which point your application should be in something approximating a steady-state and critical code has already tiered.) The current web implementation repurposes the cache storage code from pavel's memory snapshot feature to store the pgo table in the cache next to the snapshot(s).

This PR also adds basic (controlled by a#define and off by default) timing instrumentation for generate_code, because in my testing the browser's profiler was producing wildly incorrect timing data for interpreter codegen. This should make it easy to evaluate how well the feature is working since you can just look at the output.

One interesting finding: The list of method hashes generated on a first run is bigger than the list generated on future runs - I believe this is because once a full set of methods is tiered, we no longer need the tiered version of some of them because the hottest methods have gotten inlined into their callers. So on a second run the list of hashes stabilizes into a smaller list.

In a couple local test runs of browser-bench with this active and without, the time spent in generate_code was 3600ms without interpgo and 3209ms with interpgo, so the improvement over the course of a full run of an application with lots of generated code should be worthwhile. For smaller applications it's below the noise floor, I haven't been able to measure it successfully at that scale.

pavelsavara reacted with rocket emoji
@ghostghost assignedkgOct 4, 2023
@kgkg added the arch-wasmWebAssembly architecture labelOct 4, 2023
@ghost
Copy link

Tagging subscribers to this area:@BrzVlad,@kotlarmilos
See info inarea-owners.md if you want to be subscribed.

Issue Details

This PR adds automatic PGO for interpreter tiering. The idea is that by maintaining a list of which interpreter methods were tiered during execution, we can skip directly to generating optimized code on future runs. This will get the application into a steady high-performance state faster on future runs, while also reducing the amount of time/memory we spend on code generation (since normally we generate unoptimized code, then generate optimized code).

Application code can either useINTERNAL.interpgo_load_data andINTERNAL.interpgo_save_data to manually control the loading/saving of the data, or can use the newwithInterpreterPgo(true, autoSaveDelay) builder method to turn on automatic mode. Automatic mode will attempt to load the data during startup, and will automatically save it after a delay (at which point your application should be in something approximating a steady-state and critical code has already tiered.)

Author:kg
Assignees:kg
Labels:

area-Codegen-Interpreter-mono

Milestone:-

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

This is not true for generic instances.

Copy link
MemberAuthor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

OK, so I need to hash the generic arguments? Or is it more complex than that?

This comment was marked as duplicate.

Copy link
MemberAuthor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

some generic instances are very hot in the BCL, like the ones used in string/array searches

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

It can hash the full method name for example, but maybe its good enough this way.

Copy link
MemberAuthor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

I think it's probably fine this way, if any instance of the method got tiered we can probably tier all of them. This still won't generate code for other instances until they're actually used.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

If it's a wrapper method, I think the token is either shared with the method that it is wrapping, or it is zero. maybe we don't mind the collisions?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

Can we share the AOT compiler's logic for uniquely identifying methods?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

The aot compiler/runtime has a mono_aot_method_hash () which returns a good hash.

Copy link
MemberAuthor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

mono_aot_method_hash looks comprehensive but also looks really expensive, and i'm not sure how well it will avoid collisions. do you still want me to use it?

This comment was marked as duplicate.

Copy link
Member

@radicalradical left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

WBT changes look good.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

I'm not very familiar with playwrite, should we navigate the browser away before we load the page second time ?

Copy link
MemberAuthor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

Each page is basically a separate tab as I understand it

Checkpoint: Interpgo bloom filter worksWorkaround for 'reach managed cold' hanging foreverRemove loggingCheckpointCheckpointConfigurable interpreter PGOEnable interpgo for browser benchCheckpoint: Refactor interpgo to share cache code with memory snapshotsUse cache storage for Interpreter PGOCode cleanup + repair merge damageRepair merge damageFix some bugs and troubleshoot WBT failureFix buildMove prototypesI love CFix key collision between interpgo and memory snapshotFix storeCacheEntry always writing to the memory snapshot keyCheckpointMove more of interpgo outside of platform guardsFix buildMake the inline murmurhash functions also static to fix linker errorsRemove unused signature argumentCleanupAddress PR feedbackRearrange codeAddress PR feedbackFix assert when loading empty tableCheckpointUpdate dotnet.d.tsMove interp codegen timing to a runtime option + make it thread safeMove interp pgo logging to a runtime optionAdd a WBT test that verifies basic functionality of interpreter PGOAdd missing fileAwait interp_pgo_save_dataApply suggestions from code reviewCo-authored-by: Ankit Jain <radical@gmail.com>Cleanup / address PR feedbackAddress PR feedbackIncrease browser-bench waitFor timeoutFix syntax
@kgkg merged commit8bce5a8 intodotnet:mainNov 1, 2023
@ghostghost locked asresolvedand limited conversation to collaboratorsDec 1, 2023
Sign up for freeto subscribe to this conversation on GitHub. Already have an account?Sign in.

Reviewers

@lambdageeklambdageeklambdageek left review comments

@radicalradicalradical approved these changes

@pavelsavarapavelsavarapavelsavara approved these changes

@BrzVladBrzVladBrzVlad approved these changes

@lewinglewingAwaiting requested review from lewinglewing is a code owner

@kotlarmiloskotlarmilosAwaiting requested review from kotlarmiloskotlarmilos is a code owner

@SamMonoRTSamMonoRTAwaiting requested review from SamMonoRT

+1 more reviewer

@vargazvargazvargaz left review comments

Reviewers whose approvals may not affect merge requirements

Assignees

@kgkg

Labels

Projects

None yet

Milestone

No milestone

Development

Successfully merging this pull request may close these issues.

6 participants

@kg@radical@vargaz@pavelsavara@lambdageek@BrzVlad

[8]ページ先頭

©2009-2025 Movatter.jp