Improve the throughput of SocketsHttpHandler's HTTP/1.1 connection pool #99364


Merged
MihaZupan merged 5 commits into dotnet:main from MihaZupan:http-h1-contention on Mar 22, 2024

Conversation

@MihaZupan (Member) commented on Mar 6, 2024 (edited)

Closes #70098

The connection pool currently manages the list of available connections and the request queue under a single lock.
As the number of cores and the RPS rise, the speed at which the pool can manage connections becomes a bottleneck.

This PR brings the fast path (there are enough connections available to process all requests) down to a ConcurrentStack.Push + ConcurrentStack.TryPop.
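For intuition, here is a minimal illustrative sketch of that idea (not the actual SocketsHttpHandler code; the type and member names below are made up): on the fast path, renting and returning an idle connection only touch a lock-free ConcurrentStack, and the pool lock is needed only on the slow path (no idle connection, so the request must be queued or a new connection created).

```csharp
using System.Collections.Concurrent;

// Hypothetical stand-in for the pooled HTTP/1.1 connection type.
public sealed class PooledConnection { }

public sealed class Http11PoolSketch
{
    // Idle connections live in a lock-free stack; Push/TryPop form the fast path.
    private readonly ConcurrentStack<PooledConnection> _idle = new();

    public bool TryRentIdleConnection(out PooledConnection? connection) =>
        _idle.TryPop(out connection); // no lock taken when an idle connection exists

    public void ReturnConnection(PooledConnection connection) =>
        _idle.Push(connection); // no lock taken when returning a connection

    // The slow path (queueing the request, injecting a new connection, enforcing
    // MaxConnectionsPerServer) would still run under the pool lock; omitted here.
}
```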


Numbers for ConcurrentQueue

Numbers from #70098 (comment):

crank --config https://raw.githubusercontent.com/aspnet/Benchmarks/main/scenarios/httpclient.benchmarks.yml --scenario httpclient-kestrel-get --profile aspnet-citrine-lin --variable useHttpMessageInvoker=true --variable concurrencyPerHttpClient=256 --variable numberOfHttpClients=1 --server.framework net9.0 --client.framework net9.0 --json 1x256.json
crank --config https://raw.githubusercontent.com/aspnet/Benchmarks/main/scenarios/httpclient.benchmarks.yml --scenario httpclient-kestrel-get --profile aspnet-citrine-lin --variable useHttpMessageInvoker=true --variable concurrencyPerHttpClient=32 --variable numberOfHttpClients=8 --server.framework net9.0 --client.framework net9.0 --json 8x32.json
crank compare .\1x256.json .\8x32.json
| client | 1x256 | 8x32 | |
| --- | --- | --- | --- |
| RPS | 693,873 | 875,814 | +26.22% |
| Patched RPS | 873,571 | 876,394 | +0.32% |

This shows that before this PR, manually splitting the load between multiple HttpClient instances could have a significant impact.
After the change there is no longer any benefit to doing so, as a single pool can efficiently handle the higher load.

YARP's http-http 100 byte scenario:

| load | yarp-base | yarp-patched | |
| --- | --- | --- | --- |
| Latency 50th (ms) | 0.73 | 0.68 | -6.97% |
| Latency 75th (ms) | 0.82 | 0.74 | -9.82% |
| Latency 90th (ms) | 1.03 | 0.89 | -13.39% |
| Latency 95th (ms) | 1.41 | 1.18 | -16.41% |
| Latency 99th (ms) | 2.87 | 2.68 | -6.63% |
| Mean latency (ms) | 0.83 | 0.76 | -8.74% |
| Requests/sec | 306,699 | 335,921 | +9.53% |

In-memory loopback benchmark that stresses the connection pool contention: https://gist.github.com/MihaZupan/27f01d78c71da7b9024b321e743e3d88

Rough RPS numbers with 1-6 threads:

| RPS (1000s) | 1 | 2 | 3 | 4 | 5 | 6 |
| --- | --- | --- | --- | --- | --- | --- |
| main | 2060 | 1900 | 1760 | 1670 | 1570 | 1500 |
| patched | 2150 | 2600 | 3400 | 3700 | 4100 | 4260 |

Breaking change consideration - this is no longer relevant after switching to ConcurrentStack

While I was careful to keep the observable behavior of the pool as close as possible to what we have today, there is one important change I made intentionally:

  • The order in which we dequeue idle connections changes from LIFO to FIFO (from a stack to a queue), because the backing store for available connections is now a ConcurrentQueue (illustrated in the sketch below).
  • Where this distinction may matter is when load drops for a longer period such that we no longer need as many connections. We would previously leave the surplus connections completely idle and eventually remove them via the idle timeout. With this change, we would keep cycling through all connections, potentially keeping more of them alive.
  • A slight benefit of that behavior is that it makes the idle close race condition (the server closing an idle connection just after we've started using it again) less likely.

See #99364 (comment) for ConcurrentStack results (current PR).
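To make the LIFO-vs-FIFO distinction concrete, here is a tiny, hedged illustration (the connection names are invented): with a stack, the most recently returned connection is handed out again first, so surplus connections can sit idle long enough for the idle timeout to reap them; with a queue, the longest-idle connection is reused first, so all connections keep rotating.

```csharp
using System.Collections.Concurrent;

var stack = new ConcurrentStack<string>();
var queue = new ConcurrentQueue<string>();

// Three connections are returned to the pool in the same order.
foreach (var conn in new[] { "conn1", "conn2", "conn3" })
{
    stack.Push(conn);
    queue.Enqueue(conn);
}

// LIFO: the most recently idled connection is reused; conn1/conn2 stay idle.
stack.TryPop(out var fromStack);      // "conn3"

// FIFO: the longest-idle connection is reused, so every connection keeps getting touched.
queue.TryDequeue(out var fromQueue);  // "conn1"

Console.WriteLine($"stack reuses {fromStack}, queue reuses {fromQueue}");
```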

@MihaZupan added this to the 9.0.0 milestone on Mar 6, 2024
@MihaZupan requested a review from a team on Mar 6, 2024 at 16:45
@MihaZupan self-assigned this on Mar 6, 2024
@ghost commented:

Tagging subscribers to this area: @dotnet/ncl
See info in area-owners.md if you want to be subscribed.

Author: MihaZupan
Assignees: MihaZupan
Labels: area-System.Net.Http
Milestone: 9.0.0

@MihaZupan (Member, Author) commented:

/azp run runtime-libraries-coreclr outerloop

@azure-pipelines (Azure Pipelines) commented:

Azure Pipelines successfully started running 1 pipeline(s).

@MihaZupan (Member, Author) commented:

/azp run runtime-libraries stress-http

@azure-pipelines (Azure Pipelines) commented:

Azure Pipelines successfully started running 1 pipeline(s).

@MihaZupan (Member, Author) commented:

Using a stack here is close enough (in benchmarks the collection is going to be close to empty all the time, so contention between the stack and the queue is similar). I'll switch the PR to use that to avoid the behavioral change.
We can revisit this in the future with more idle-eviction heuristics to get the last few % with a queue if needed.

It does mean an extra 32-byte allocation for each enqueue operation, sadly (+1 for #31911).
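A rough, hedged illustration of where that extra allocation comes from (a standalone micro-measurement, not the PR's benchmark): ConcurrentStack.Push wraps every item in a freshly allocated linked-list node, while ConcurrentQueue writes into preallocated ring-buffer segments, so steady-state enqueue/dequeue pairs on the queue typically allocate nothing.

```csharp
using System.Collections.Concurrent;

var stack = new ConcurrentStack<object>();
var queue = new ConcurrentQueue<object>();
var item = new object();

// Warm the queue up so its initial segment is already allocated.
for (int i = 0; i < 64; i++) { queue.Enqueue(item); queue.TryDequeue(out _); }

long MeasureAllocations(Action action)
{
    long before = GC.GetAllocatedBytesForCurrentThread();
    action();
    return GC.GetAllocatedBytesForCurrentThread() - before;
}

const int Iterations = 100_000;

// Each Push allocates a new node object (~32 bytes for a reference-type payload on 64-bit).
long stackBytes = MeasureAllocations(() =>
{
    for (int i = 0; i < Iterations; i++) { stack.Push(item); stack.TryPop(out _); }
});

// Enqueue reuses slots in the existing segment, so steady-state pairs allocate ~nothing.
long queueBytes = MeasureAllocations(() =>
{
    for (int i = 0; i < Iterations; i++) { queue.Enqueue(item); queue.TryDequeue(out _); }
});

Console.WriteLine($"stack: ~{stackBytes / Iterations} B/op, queue: ~{queueBytes / Iterations} B/op");
```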

| load | yarp-main | yarp-stack | | yarp-queue | |
| --- | --- | --- | --- | --- | --- |
| Latency 50th (ms) | 0.73 | 0.69 | -4.95% | 0.68 | -5.91% |
| Latency 75th (ms) | 0.81 | 0.75 | -7.49% | 0.74 | -8.97% |
| Latency 90th (ms) | 0.99 | 0.88 | -10.98% | 0.85 | -14.40% |
| Latency 95th (ms) | 1.31 | 1.13 | -13.99% | 1.05 | -20.00% |
| Latency 99th (ms) | 2.83 | 2.60 | -7.95% | 2.45 | -13.29% |
| Mean latency (ms) | 0.82 | 0.76 | -6.78% | 0.75 | -8.59% |
| Requests/sec | 312,857 | 335,444 | +7.22% | 342,141 | +9.36% |

| client | client-main | client-stack | | client-queue | |
| --- | --- | --- | --- | --- | --- |
| Requests | 80,028,791 | 107,128,778 | +33.86% | 107,868,124 | +34.79% |
| Mean RPS | 666,886 | 892,749 | +33.87% | 898,902 | +34.79% |

| Method | Toolchain | Mean | Error | Ratio | Allocated | Alloc Ratio |
| --- | --- | --- | --- | --- | --- | --- |
| SendAsync | main | 517.0 ns | 4.27 ns | 1.00 | 552 B | 1.00 |
| SendAsync | stack | 482.0 ns | 2.87 ns | 0.93 | 584 B | 1.06 |
| SendAsync | queue | 471.1 ns | 1.37 ns | 0.91 | 552 B | 1.00 |

@MihaZupan (Member, Author) commented:

/azp run runtime-libraries-coreclr outerloop

@azure-pipelines (Azure Pipelines) commented:

Azure Pipelines successfully started running 1 pipeline(s).

@MihaZupan (Member, Author) commented:

/azp run runtime-libraries stress-http

@azure-pipelines (Azure Pipelines) commented:

Azure Pipelines successfully started running 1 pipeline(s).

Reviewers

@stephentoub approved these changes

Assignees

@MihaZupan

Projects
None yet
Milestone
9.0.0
Development

Successfully merging this pull request may close these issues.

HttpConnectionPool contention for HTTP/1.1
2 participants
@MihaZupan, @stephentoub
