Revamp caching scheme in PoolingAsyncValueTaskMethodBuilder #55955


Merged

stephentoub merged 2 commits into dotnet:main from stephentoub:tlsprocpool on Jul 20, 2021

Conversation

@stephentoub (Member) commented Jul 19, 2021 (edited)

The current scheme caches one instance per thread in a ThreadStatic and backs that with a locked stack that all threads contend on; to avoid blocking a thread while accessing the cache, locking is done with TryEnter rather than Enter, simply skipping the cache if there is any contention. The locked stack is capped by default at ProcessorCount * 4 objects.
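The scheme just described can be sketched as follows. This is a minimal illustration with hypothetical names (`OldStyleCache`, `Rent`, `Return`); the real cache lives inside the builder's internal state-machine-box plumbing:

```csharp
#nullable enable
using System;
using System.Collections.Generic;
using System.Threading;

// Hypothetical sketch of the *current* scheme: one instance per thread,
// plus a shared lock-guarded stack that is skipped under contention.
public sealed class OldStyleCache<T> where T : class, new()
{
    [ThreadStatic] private static T? t_local;      // per-thread fast path

    private readonly Stack<T> _shared = new();     // cross-thread cache
    private readonly int _maxSize = Environment.ProcessorCount * 4;

    public T Rent()
    {
        T? item = t_local;
        if (item != null) { t_local = null; return item; }

        // TryEnter rather than Enter: if another thread holds the lock,
        // skip the cache entirely instead of blocking.
        if (Monitor.TryEnter(_shared))
        {
            try { if (_shared.Count > 0) return _shared.Pop(); }
            finally { Monitor.Exit(_shared); }
        }
        return new T();                            // cache miss: allocate
    }

    public void Return(T item)
    {
        if (t_local == null) { t_local = item; return; }

        if (Monitor.TryEnter(_shared))
        {
            try { if (_shared.Count < _maxSize) _shared.Push(item); }
            finally { Monitor.Exit(_shared); }
        }
        // If contended or full, drop the object and let the GC reclaim it.
    }
}
```

Under heavy load the TryEnter fast-bail means many rents and returns miss the shared stack entirely, which is the behavior the new scheme is designed to avoid.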

The new scheme is simpler: one instance per thread, one instance per core. This ends up meaning fewer objects may be cached, but it also almost entirely eliminates contention between threads trying to rent/return objects. As a result, under heavy load it can actually do a better job of using pooled objects as it doesn't bail on using the cache in the face of contention. It also reduces concerns about larger machines being more negatively impacted by the caching. Under lighter load, since we don't cache as many objects, it does mean we may end up allocating a bit more, but generally not much more (and the object we do allocate is one reference field smaller).
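A sketch of the per-thread-plus-per-core idea, again with hypothetical names (`NewStyleCache`); the per-core slot uses an atomic exchange on rent, so a losing race just falls through to allocation rather than blocking:

```csharp
#nullable enable
using System;
using System.Threading;

// Hypothetical sketch of the *new* scheme: one instance per thread plus
// one instance per core, with no locked stack to contend on.
public sealed class NewStyleCache<T> where T : class, new()
{
    [ThreadStatic] private static T? t_local;      // per-thread slot

    private readonly T?[] _perCore = new T?[Environment.ProcessorCount];

    public T Rent()
    {
        T? item = t_local;
        if (item != null) { t_local = null; return item; }

        // Atomically take whatever is in the current core's slot. A stale
        // core id is harmless: we just touch a different core's slot.
        int core = Thread.GetCurrentProcessorId() % _perCore.Length;
        item = Interlocked.Exchange(ref _perCore[core], null);
        return item ?? new T();                    // miss on both: allocate
    }

    public void Return(T item)
    {
        if (t_local == null) { t_local = item; return; }

        int core = Thread.GetCurrentProcessorId() % _perCore.Length;
        _perCore[core] = item;  // plain store: losing a race only drops an object
    }
}
```

The worst case for a race is an extra allocation or a dropped object, never a blocked thread, which is why contention all but disappears.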

This is on my 12-logical-core box:

| Method     | Toolchain         | Mean    | Error    | StdDev   | Ratio | Gen 0        | Gen 1       | Allocated     |
|------------|-------------------|---------|----------|----------|-------|--------------|-------------|---------------|
| NonPooling | \main\CoreRun.exe | 4.314 s | 0.0795 s | 0.1005 s | 1.00  | 1933000.0000 | 483000.0000 | 11,800,056 KB |
| NonPooling | \pr\corerun.exe   | 4.284 s | 0.0188 s | 0.0167 s | 0.99  | 1933000.0000 | 483000.0000 | 11,800,063 KB |
| Pooling    | \main\CoreRun.exe | 3.010 s | 0.0452 s | 0.0423 s | 1.00  | -            | -           | 323 KB        |
| Pooling    | \pr\corerun.exe   | 2.874 s | 0.0452 s | 0.0423 s | 0.95  | -            | -           | 203 KB        |
```csharp
using System.Linq;
using System.Threading.Tasks;
using System.Runtime.CompilerServices;
using BenchmarkDotNet.Attributes;
using BenchmarkDotNet.Running;
using BenchmarkDotNet.Diagnosers;

[MemoryDiagnoser]
public class Program
{
    public static void Main(string[] args) =>
        BenchmarkSwitcher.FromAssembly(typeof(Program).Assembly).Run(args);

    private const int Concurrency = 256;
    private const int Iters = 100_000;

    [Benchmark]
    public Task NonPooling()
    {
        return Task.WhenAll(
            from i in Enumerable.Range(0, Concurrency)
            select Task.Run(async delegate
            {
                for (int j = 0; j < Iters; j++)
                    await A().ConfigureAwait(false);
            }));

        static async ValueTask A() => await B().ConfigureAwait(false);
        static async ValueTask B() => await C().ConfigureAwait(false);
        static async ValueTask C() => await D().ConfigureAwait(false);
        static async ValueTask D() => await Task.Yield();
    }

    [Benchmark]
    public Task Pooling()
    {
        return Task.WhenAll(
            from i in Enumerable.Range(0, Concurrency)
            select Task.Run(async delegate
            {
                for (int j = 0; j < Iters; j++)
                    await A().ConfigureAwait(false);
            }));

        [AsyncMethodBuilder(typeof(PoolingAsyncValueTaskMethodBuilder))]
        static async ValueTask A() => await B().ConfigureAwait(false);
        [AsyncMethodBuilder(typeof(PoolingAsyncValueTaskMethodBuilder))]
        static async ValueTask B() => await C().ConfigureAwait(false);
        [AsyncMethodBuilder(typeof(PoolingAsyncValueTaskMethodBuilder))]
        static async ValueTask C() => await D().ConfigureAwait(false);
        [AsyncMethodBuilder(typeof(PoolingAsyncValueTaskMethodBuilder))]
        static async ValueTask D() => await Task.Yield();
    }
}
```

@stephentoub added this to the 6.0.0 milestone Jul 19, 2021
@ghost

Tagging subscribers to this area: @dotnet/area-system-threading-tasks
See info in area-owners.md if you want to be subscribed.

Author: stephentoub
Assignees: -
Labels: area-System.Threading.Tasks, tenet-performance
Milestone: 6.0.0

@adamsitnik (Member) left a comment
LGTM!

> It also reduces concerns about larger machines being more negatively impacted by the caching

To validate that, you could use this template, modify it, and run the benchmarks with and without your changes using the AMD (32 cores), ARM (48 cores), and Mono (56 cores) machines.

@stephentoub merged commit 776053f into dotnet:main Jul 20, 2021
@stephentoub deleted the tlsprocpool branch July 20, 2021 22:06
@ghost locked as resolved and limited conversation to collaborators Aug 19, 2021

Reviewers

@adamsitnik approved these changes
