
Optimize plan phase for foreground gcs#45208


Merged

PeterSolMS merged 13 commits into dotnet:master from PeterSolMS:optimize_plan_phase_for_foreground_gcs

Nov 26, 2020



PeterSolMS commented Nov 25, 2020

Background:

We have an optimization for gen 0 and gen 1 collections in plan_phase where we have a list of the marked objects (the "mark list") so we can visit only the surviving objects in plan_phase rather than all objects.

When we execute a gen 0 or gen 1 collection while a background collection is in progress (we call these "foreground collections"), we don't use the mark list because we still have to turn off the background mark bits for the objects that don't survive.

The Optimization:

However, as the background mark bits are not stored in the objects themselves, but in a side table, we can turn off the background mark bits in bulk for the dead objects between surviving objects. That insight enables us to still use the mark list, and save a significant amount of execution time in foreground collections.
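To illustrate the idea, here is a minimal, self-contained sketch. It is not the code from this PR and does not reproduce the runtime's actual mark array layout or `bgc_clear_batch_mark_array_bits`. The names `MarkBitmap`, `Survivor`, and `clear_dead_bits`, and the one-bit-per-granule layout, are assumptions made for illustration only. It walks a sorted list of survivors and clears the side-table bits for each dead gap in one range operation, instead of visiting every dead object.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical side table: one mark bit per heap "granule".
// This layout is an assumption, not the runtime's real mark array.
struct MarkBitmap {
    std::vector<uint32_t> words;
    explicit MarkBitmap(size_t granules) : words((granules + 31) / 32, 0) {}

    void set(size_t i)        { words[i / 32] |= (1u << (i % 32)); }
    bool test(size_t i) const { return (words[i / 32] >> (i % 32)) & 1u; }

    // Clear bits [begin, end): masks at the two edges, whole-word stores between.
    void clear_range(size_t begin, size_t end) {
        assert(begin <= end && end <= words.size() * 32);
        if (begin == end) return;
        size_t first = begin / 32, last = (end - 1) / 32;
        uint32_t head = ~0u << (begin % 32);             // bits at or above begin
        uint32_t tail = ~0u >> (31 - ((end - 1) % 32));  // bits at or below end-1
        if (first == last) { words[first] &= ~(head & tail); return; }
        words[first] &= ~head;
        for (size_t w = first + 1; w < last; ++w) words[w] = 0;  // bulk clear
        words[last] &= ~tail;
    }
};

// Hypothetical mark-list entry: a surviving object, measured in granules.
struct Survivor { size_t start; size_t length; };

// Visit only survivors (sorted by start, non-overlapping) and clear the
// mark bits of every dead gap between them with one range clear per gap.
void clear_dead_bits(MarkBitmap& bm, const std::vector<Survivor>& survivors,
                     size_t heap_end) {
    size_t cursor = 0;
    for (const Survivor& s : survivors) {
        bm.clear_range(cursor, s.start);
        cursor = s.start + s.length;
    }
    bm.clear_range(cursor, heap_end);
}
```

The work in this sketch is proportional to the number of survivors plus the number of bitmap words touched, rather than to the number of dead objects, which is the property the optimization relies on.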

Profile Data

Here's some profile data for our GC benchmark program GCPerfSim.dll executed with parameters "-tc 250 -tagb 5000 -tlgb 2 -lohar 0 -sohsi 50 -sohSizeRange 96-256 -lohsi 0 -pohsi 0 -sohpi 0 -lohpi 0 -pohpi 0 -sohfi 0 -lohfi 0 -pohfi 0 -allocType reference -testKind time" on a 128 core AMD machine (256 virtual processors). This set of parameters causes many more foreground GCs to happen than is typical.

The following file shows the original source code with the CPU samples listed in the left column. Note in particular the high counts for the code sections handling the situation where the mark list is not being used:

plan_phase_source_profile_baseline.txt

For comparison, here's the changed source code - note that the previously expensive sections not using the mark list have become much cheaper, and the new section using bgc_clear_batch_mark_array_bits to turn off background mark bits in bulk is much cheaper than the section it replaces:

plan_phase_source_profile_optimized.txt

Here are charts for the exclusive CPU samples in plan_phase. For both the baseline and the optimization, 3 profile runs were done:

Conclusions are:

Regular gen 0 and gen 1 GC cost in plan_phase is about the same with and without the optimization.
Foreground gen 0 and gen 1 GC cost in plan_phase is much more consistent and typically much lower with the optimization.

PeterSolMS added 12 commits

August 21, 2020 14:57