
In this post, we are going to have a look at
- What is connection churn
- Common causes for connection churn in Go
- Benefits of utilizing persistent connections
- Methods for identifying connection churn
- A demo that will help us better understand how to fix connection churn in Go
What is connection churn
While searching for a definition of connection churn, I came across the following:
A system is said to have a high connection churn when its rate of newly opened connections is consistently high and its rate of closed connections is consistently high
In other words, connection churn occurs when a service frequently opens and closes network connections instead of efficiently reusing persistent connections. Persistent connections enable multiple HTTP requests and responses to be sent and received over a single TCP connection.
Common causes for connection churn in Go
Not Reading and Closing HTTP body
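One of the most common causes, and an easy one to miss, is not reading the response Body to completion and not closing it once you are done with it.
It may seem counterintuitive to read the body, especially when you don't care about its contents, but if the body is not both read to the end and closed, the client may not be able to reuse the connection.
There is a linter, bodyclose, that can at least catch a body that is never closed.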
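A minimal sketch of the read-then-close pattern, assuming the response contents are not needed. The helper names drainAndClose, fetchStatus and demoFetch are illustrative and not part of net/http, and the throwaway httptest server is only there so the example runs on its own.

```go
package main

import (
	"fmt"
	"io"
	"net/http"
	"net/http/httptest"
)

// drainAndClose reads whatever is left of the body and then closes it,
// so the transport can put the connection back into its idle pool.
// The name is illustrative; it is not part of net/http.
func drainAndClose(body io.ReadCloser) error {
	_, copyErr := io.Copy(io.Discard, body)
	closeErr := body.Close()
	if copyErr != nil {
		return copyErr
	}
	return closeErr
}

// fetchStatus sends a GET request and returns only the status code,
// draining and closing the body even though its contents are ignored.
func fetchStatus(client *http.Client, url string) (int, error) {
	resp, err := client.Get(url)
	if err != nil {
		return 0, err
	}
	defer drainAndClose(resp.Body)
	return resp.StatusCode, nil
}

// demoFetch starts a throwaway local server (an assumption made so the
// example is self-contained) and fetches from it once.
func demoFetch() (int, error) {
	srv := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		fmt.Fprintln(w, "hello")
	}))
	defer srv.Close()
	return fetchStatus(srv.Client(), srv.URL)
}

func main() {
	status, err := demoFetch()
	if err != nil {
		fmt.Println("request failed:", err)
		return
	}
	fmt.Println("status:", status)
}
```

Deferring the drain right after the error check keeps the pattern in one place, so early returns later in the function cannot skip it.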
Using the default HTTP client
Go's HTTP defaults are not great for heavy use.
Internally, the Go HTTP client uses an LRU cache for persistent connection pooling. Instead of closing a socket connection after an HTTP request, if the connection is marked as reusable it is put back into the pool. If you make another request before the idle connection timeout (90 seconds by default), the client reuses the existing connection from the pool rather than creating a new one. This keeps the total number of socket connections low as long as the pool doesn't fill up. When no idle connection is available the client opens a new one, and a connection that cannot be returned to a full pool is closed.
What's the size of the connection pool?
Looking at the DefaultTransport of the HTTP client, MaxIdleConns sets the size of the pool to 100 connections, but there is a caveat: DefaultMaxIdleConnsPerHost is set to 2. This means that even though the pool can hold 100 idle connections, at most two idle connections are kept per host. We will see this in action later with an example.
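The defaults mentioned above can be checked directly from the standard library. This is a small sketch; the function name defaultPoolSettings is illustrative, and it assumes http.DefaultTransport has not been replaced elsewhere in the program.

```go
package main

import (
	"fmt"
	"net/http"
	"time"
)

// defaultPoolSettings reports the idle-pool settings of http.DefaultTransport.
// A zero MaxIdleConnsPerHost on a transport means that
// http.DefaultMaxIdleConnsPerHost is used instead.
func defaultPoolSettings() (maxIdle, maxIdlePerHost int, idleTimeout time.Duration) {
	t, ok := http.DefaultTransport.(*http.Transport)
	if !ok {
		return 0, 0, 0 // DefaultTransport was replaced by something else
	}
	perHost := t.MaxIdleConnsPerHost
	if perHost == 0 {
		perHost = http.DefaultMaxIdleConnsPerHost
	}
	return t.MaxIdleConns, perHost, t.IdleConnTimeout
}

func main() {
	maxIdle, perHost, timeout := defaultPoolSettings()
	fmt.Println("MaxIdleConns:", maxIdle)        // 100
	fmt.Println("MaxIdleConnsPerHost:", perHost) // 2
	fmt.Println("IdleConnTimeout:", timeout)     // 1m30s
}
```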
Benefits of utilizing persistent connections
From Wikipedia, HTTP persistent connection#Advantages:
- Reduced latency in subsequent requests (no handshaking and no slow start).
An example from a production service, after fixing an issue so that connections could be reused.
- Reduced CPU usage and round-trips because of fewer new connections and TLS handshakes.
TLS handshakes can be CPU-intensive by nature, so establishing fewer new connections, and therefore spending less time in TLS handshakes, reduces your service's CPU usage.
- Enables HTTP pipelining of requests and responses.
- Reduced network congestion (fewer TCP connections).
- Errors can be reported without the penalty of closing the TCP connection.
Methods for identifying connection churn
Let's look at some methods that can help us identify that a service has high connection churn.
High CPU usage on TLS
A good indicator that your service might be suffering from connection churn is a high percentage of CPU time spent on TLS handshakes.
The CPU profile below is an example of this phenomenon: ~38% of CPU time is spent on establishing TLS connections, a share comparable to the time spent running our service code.
Monitoring
Monitoring platforms like Datadog provide tools for network monitoring.
The example below is a view from Network Performance Monitoring, where a service has a high number of connections established and closed per second. This clearly shows a service suffering from connection churn.
HTTP Trace
Another method is using httptrace, which provides mechanisms to trace events within HTTP client requests.
An example usage is printing the connection info, which includes whether the connection was Reused or not.
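A minimal sketch of such a trace, using the GotConn hook of httptrace.ClientTrace and the Reused field of httptrace.GotConnInfo. The helper names tracedGet and demoReuse are illustrative, and the local httptest server is an assumption made so the example is self-contained.

```go
package main

import (
	"fmt"
	"io"
	"net/http"
	"net/http/httptest"
	"net/http/httptrace"
)

// tracedGet sends one GET request and reports whether the connection it
// used came from the idle pool.
func tracedGet(client *http.Client, url string) (reused bool, err error) {
	trace := &httptrace.ClientTrace{
		// GotConn fires once a connection has been obtained for the request.
		GotConn: func(info httptrace.GotConnInfo) {
			reused = info.Reused
		},
	}
	req, err := http.NewRequest(http.MethodGet, url, nil)
	if err != nil {
		return false, err
	}
	req = req.WithContext(httptrace.WithClientTrace(req.Context(), trace))

	resp, err := client.Do(req)
	if err != nil {
		return false, err
	}
	defer resp.Body.Close()
	// Drain the body so the connection can be reused by the next request.
	_, err = io.Copy(io.Discard, resp.Body)
	return reused, err
}

// demoReuse sends two sequential requests to a throwaway local server
// and returns the Reused flag seen for each of them.
func demoReuse() ([]bool, error) {
	srv := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		fmt.Fprintln(w, "ok")
	}))
	defer srv.Close()

	var flags []bool
	for i := 0; i < 2; i++ {
		reused, err := tracedGet(srv.Client(), srv.URL)
		if err != nil {
			return nil, err
		}
		flags = append(flags, reused)
	}
	return flags, nil
}

func main() {
	flags, err := demoReuse()
	if err != nil {
		fmt.Println("request failed:", err)
		return
	}
	for i, reused := range flags {
		fmt.Printf("request %d: reused=%t\n", i+1, reused)
	}
}
```

The first request has to dial a new connection, so it should report reused=false; the second should pick the same connection up from the idle pool. Logging this flag in a real service gives a quick signal of how often connections are being reused.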
Netstat
Finally, we can use netstat, a useful tool for inspecting connection statuses. The command below runs netstat every second and counts all TCP connections on port 8080 that are in the time_wait state.
watch -n 1 'netstat -n -p tcp | grep -i 8080 | grep -i time_wait | wc -l'
A high number of time_wait statuses indicates the creation of lots of short-lived TCP connections.
From man netstat:
TIME_WAIT
The socket is waiting after close to handle packets still in the network.
Demo
It's time for the demo with 2 examples in Go that can cause connection churn.
The setup is simple: we are going to run a server that accepts requests and a client that sends requests from a specified number of goroutines (you can find the code below).
To observe the state of our connections, we are going to run netstat in two different shells:
watch -n 1 'netstat -n -p tcp | grep -i 8080 | grep -i time_wait | wc -l'
to count the sockets in the time_wait state, and
watch -n 1 'netstat -n -p tcp | grep -i 8080'
to see the list of connections in more detail.
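The setup described above could be sketched roughly as follows. This is not the original demo code: the names messageHandler, sendRequests and runDemo are hypothetical, the request count per goroutine is bounded so the program terminates, and the server listens on a random local port via httptest rather than on port 8080, so the netstat filters would need adjusting.

```go
package main

import (
	"fmt"
	"io"
	"net/http"
	"net/http/httptest"
	"sync"
	"sync/atomic"
)

// messageHandler stands in for the demo server's /message endpoint.
func messageHandler(w http.ResponseWriter, r *http.Request) {
	fmt.Fprintln(w, "hello from the server")
}

// sendRequests starts `workers` goroutines that each send `perWorker`
// sequential GET requests, and returns how many got a 200 response.
// With drain=false the body is neither read nor closed, which mirrors
// the faulty client of Example-1.
func sendRequests(client *http.Client, url string, workers, perWorker int, drain bool) int64 {
	var ok int64
	var wg sync.WaitGroup
	for n := 0; n < workers; n++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for i := 0; i < perWorker; i++ {
				resp, err := client.Get(url)
				if err != nil {
					fmt.Println("request failed:", err)
					continue
				}
				if resp.StatusCode == http.StatusOK {
					atomic.AddInt64(&ok, 1)
				}
				if drain {
					io.Copy(io.Discard, resp.Body)
					resp.Body.Close()
				}
			}
		}()
	}
	wg.Wait()
	return ok
}

// runDemo wires the server and the client together on a random local port.
func runDemo(workers, perWorker int, drain bool) int64 {
	mux := http.NewServeMux()
	mux.HandleFunc("/message", messageHandler)
	srv := httptest.NewServer(mux)
	defer srv.Close()
	return sendRequests(srv.Client(), srv.URL+"/message", workers, perWorker, drain)
}

func main() {
	fmt.Println("successful requests:", runDemo(2, 100, true))
}
```

The drain flag is the only difference between the faulty and the fixed client in this sketch, which makes it easy to compare the two runs side by side.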
Example-1
In the first example, we run the code as is, without reading or closing the body and with the default Go HTTP client configuration.
After roughly 16k requests we start getting errors:
Get "http://localhost:8080/message": dial tcp [::1]:8080: connect: connection refused
In the shell running netstat we can also see a high number of sockets in the time_wait state.
Every 1.0s: netstat -n -p tcp | grep -i 8080
tcp4 0 0 127.0.0.1.49478 127.0.0.1.8080 TIME_WAIT
tcp4 0 0 127.0.0.1.49477 127.0.0.1.8080 TIME_WAIT
tcp4 0 0 127.0.0.1.49481 127.0.0.1.8080 TIME_WAIT
tcp4 0 0 127.0.0.1.49482 127.0.0.1.8080 TIME_WAIT
tcp4 0 0 127.0.0.1.49486 127.0.0.1.8080 TIME_WAIT
tcp4 0 0 127.0.0.1.49485 127.0.0.1.8080 TIME_WAIT
tcp4 0 0 127.0.0.1.49489 127.0.0.1.8080 TIME_WAIT
tcp4 0 0 127.0.0.1.49490 127.0.0.1.8080 TIME_WAIT
...
What's happening is that, because we are not reading and closing any of the response bodies, we eventually run out of available ephemeral ports and can no longer establish new connections.
We can fix this by reading and closing the response body. After rerunning the code with the fix, the netstat list of TCP connections shows only two connections as ESTABLISHED, and the number of sockets in the time_wait state stays at 0.
All our requests are handled by those two persistent connections successfully.
Every 1.0s: netstat -n -p tcp | grep -i 8080
tcp4 0 0 127.0.0.1.8080 127.0.0.1.61273 ESTABLISHED
tcp4 0 0 127.0.0.1.8080 127.0.0.1.61272 ESTABLISHED
tcp4 0 0 127.0.0.1.61273 127.0.0.1.8080 ESTABLISHED
tcp4 0 0 127.0.0.1.61272 127.0.0.1.8080 ESTABLISHED
Example-2
For the second example, we increase the number of goroutines from 2 to 3, which means we are now making 3 concurrent requests.
Running the updated code, we see a steady increase of sockets in the time_wait state and two established connections as before.
Every 1.0s: netstat -n -p tcp | grep -i 8080
tcp4 0 0 127.0.0.1.8080 127.0.0.1.49213 ESTABLISHED
tcp4 0 102 127.0.0.1.49213 127.0.0.1.8080 ESTABLISHED
tcp4 0 128 127.0.0.1.8080 127.0.0.1.49198 ESTABLISHED
tcp4 0 102 127.0.0.1.49198 127.0.0.1.8080 ESTABLISHED
tcp4 0 0 127.0.0.1.8080 127.0.0.1.61787 TIME_WAIT
tcp4 0 0 127.0.0.1.8080 127.0.0.1.61751 TIME_WAIT
tcp4 0 0 127.0.0.1.61787 127.0.0.1.8080 ...
We are successfully reusing two of the connections but still creating new connections, only to close them shortly after.
The reason this is happening is DefaultMaxIdleConnsPerHost. With the concurrency increased to 3 requests, on every iteration we are able to reuse two of the connections (DefaultMaxIdleConnsPerHost is 2), but the third connection cannot be put into the idle pool due to http: putIdleConn: too many idle connections for host.
So the fix for this example is to set MaxIdleConnsPerHost to something greater than 2. Rerunning the code after the fix, we can now see 3 established connections and zero sockets in the time_wait state.
Every 1.0s: netstat -n -p tcp | grep -i 8080
tcp4 0 128 127.0.0.1.8080 127.0.0.1.52440 ESTABLISHED
tcp4 0 0 127.0.0.1.8080 127.0.0.1.52439 ESTABLISHED
tcp4 0 0 127.0.0.1.8080 127.0.0.1.52438 ESTABLISHED
tcp4 0 102 127.0.0.1.52440 127.0.0.1.8080 ESTABLISHED
tcp4 0 102 127.0.0.1.52439 127.0.0.1.8080 ESTABLISHED
tcp4 0 102 127.0.0.1.52438 127.0.0.1.8080 ESTABLISHED
Appendix
In MaxConnsPerHost and MaxIdleConnsPerHost in GO HTTP package, is "Host" a domain(i.e., yahoo.com) or an IP address? The actual meaning of "Host" will affect my settings for the connection pool.
The graphs are from Datadog.
The demo was inspired by Tuning the Go HTTP Client Settings for Load Testing.
The gopher art is from MariaLetta.