- Notifications
You must be signed in to change notification settings - Fork329
Fix net.Conn deadlines tearing down the connection when no reads or writes are active#350
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to ourterms of service andprivacy statement. We’ll occasionally send you account related emails.
Already on GitHub?Sign in to your account
Conversation
…rites are activeWe were unconditionally canceling the read/write contexts when thedeadline timer fired, even if there were no active reads or writes.We instead only cancel the context if there is an active operation,otherwise we set a flag so that future calls (without the deadline beingreset) will fail.Updatestailscale/tailscale#5921
@nhooyr we've been using your library at Tailscale with great success, but we recently ran into this problem. The way to think about it is that the following sequence of calls should work: c.SetWriteDeadline(5*time.Second)c.Write(x)// immediatetime.Sleep(10*time.Second)c.SetWriteDeadline(5*time.Second)c.Write(x)// immediate But the timer would fire in the middle of the Happy to make tweaks if you have other ideas for how to fix this. |
… a net.ConnWe removed it in#4806 in favor of the built-in functionality from thenhooyr.io/websocket package. However, it has an issue with deadlinesthat has not been fixed yet (seecoder/websocket#350). Temporarilygo back to using a custom wrapper (using the fix from our fork) so thatderpers will stop closing connections too aggressively.Updates#5921Signed-off-by: Mihai Parparita <mihai@tailscale.com>
… a net.ConnWe removed it in#4806 in favor of the built-in functionality from thenhooyr.io/websocket package. However, it has an issue with deadlinesthat has not been fixed yet (seecoder/websocket#350). Temporarilygo back to using a custom wrapper (using the fix from our fork) so thatderpers will stop closing connections too aggressively.Updates#5921Signed-off-by: Mihai Parparita <mihai@tailscale.com>
… a net.ConnWe removed it in#4806 in favor of the built-in functionality from thenhooyr.io/websocket package. However, it has an issue with deadlinesthat has not been fixed yet (seecoder/websocket#350). Temporarilygo back to using a custom wrapper (using the fix from our fork) so thatderpers will stop closing connections too aggressively.Updates#5921Signed-off-by: Mihai Parparita <mihai@tailscale.com>
nhooyr left a comment• edited
Loading Uh oh!
There was an error while loading.Please reload this page.
edited
Uh oh!
There was an error while loading.Please reload this page.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others.Learn more.
Hi@mihaip!
Glad to hear and thank you for sponsoring me on GitHub. Apologies for the delayed response.
I've actually already fixed this in themasterdev branch. See#228
Just haven't tagged a new release. See#256
Your patch doesn't seem correct to me as concurrent writes/reads should be allowed. I would copy the relevant parts from my linked commit instead.
There's quite a bit of debugging required to get a new version out as the tests occassionally fail onmasterdev. I don't want to push a new release without being absolutely certain I didn't introduce a bug somewhere.
I hope to get to it soon. Life's been a little wild and unpredictable in both good and bad ways.
Sorry,
Anmol
c.writeTimer.Stop() | ||
}else { | ||
c.writeTimer.Reset(t.Sub(time.Now())) | ||
c.writeTimer.Reset(time.Until(t)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others.Learn more.
nice, didn't realize we hadtime.Until
.
c.readTimer.Reset(t.Sub(time.Now())) | ||
c.readTimer.Reset(time.Until(t)) | ||
} | ||
c.afterReadDeadline.Store(false) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others.Learn more.
while highly unlikely this can race with the timer routine as it's possible the timer has already set off and the timer goroutine has already setafterReadDeadline
to true before this goroutine hits this line.
c.readMu.Lock() | ||
deferc.readMu.Unlock() | ||
ifswapped:=c.reading.CompareAndSwap(false,true);!swapped { | ||
panic("Concurrent reads not allowed") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others.Learn more.
concurrent reads should be allowed
} | ||
ifswapped:=c.writing.CompareAndSwap(false,true);!swapped { | ||
panic("Concurrent writes not allowed") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others.Learn more.
concurrent writes should be allowed
Sorry, my commit is in dev, not master. See0a61ffe I'm not certain if it can be applied directly to master without conflict but it should. And if not, should be trivial to adapt for your needs for now. |
… a net.ConnWe removed it in#4806 in favor of the built-in functionality from thenhooyr.io/websocket package. However, it has an issue with deadlinesthat has not been fixed yet (seecoder/websocket#350). Temporarilygo back to using a custom wrapper (using the fix from our fork) so thatderpers will stop closing connections too aggressively.Updates#5921Change-Id: I1597644e8ba47b413e33f2201eab935145566c0eSigned-off-by: Mihai Parparita <mihai@tailscale.com>(cherry picked from commit9d04ffc)
… a net.ConnWe removed it in#4806 in favor of the built-in functionality from thenhooyr.io/websocket package. However, it has an issue with deadlinesthat has not been fixed yet (seecoder/websocket#350). Temporarilygo back to using a custom wrapper (using the fix from our fork) so thatderpers will stop closing connections too aggressively.Updates#5921Change-Id: I1597644e8ba47b413e33f2201eab935145566c0eSigned-off-by: Mihai Parparita <mihai@tailscale.com>(cherry picked from commit9d04ffc)
Since I've already fixed this in dev, closing. |
This commit only modifies the jsonrpc package. It doesn't provide a wayto serve websocket connections when Juno is started, which can beimplemented in a future PR.Two design decisions:We use the nhooyr.io/websocket library. The widely usedgorilla/websocket library has been archived and geth is alreadyrunning into issues [1]. The most notable consumer of nhooyr.io/websocketthat I could find is Tailscale [2,3].Unlike geth [4] and cometbft [5], we don't send ping messages to theclient. Pro: saves bandwidth. Con: if the client app hangs, the Websocketconnection will persist. As of 2017, Chromium is doing the same [6].Note that the TCP keep-alives sent every 15 seconds [7] by the HTTPserver will detect when the TCP connection is down, but will not detectwhen the application using the TCP connection is stuck.We can change this in the future, but it will slightly complicate thelogic and a ping loop is difficult to test since the pong replies arehandled by the websocket library.[1]:ethereum/go-ethereum#27261[2]:https://github.com/tailscale/tailscale/blob/0f5090c526c2019fae94695b2991cb561e131788/cmd/derper/websocket.go#L13[3]:coder/websocket#350 (comment)[4]:https://github.com/ethereum/go-ethereum/blob/6d2aeb43d516fbe2cafd9e65df7ccbd885f861d3/rpc/websocket.go#L336-L359[5]:https://github.com/cometbft/cometbft/blob/c138e785c992b3a76cb25fdf2e0820e29c56c1af/rpc/jsonrpc/server/ws_handler.go#L427-L432[6]:https://groups.google.com/a/chromium.org/g/net-dev/c/2RAm-ZYAIYY/m/NMFuHBuHAAAJhttps://groups.google.com/a/chromium.org/g/net-dev/c/2RAm-ZYAIYY/m/NMFuHBuHAAAJ[7]:https://github.com/golang/go/blob/f3bf18117b284b63f4350a5aa61773a30d91a6d5/src/net/dial.go#L15-L17
This commit only modifies the jsonrpc package. It doesn't provide a wayto serve websocket connections when Juno is started, which can beimplemented in a future PR.Two design decisions:We use the nhooyr.io/websocket library. The widely usedgorilla/websocket library has been archived and geth is alreadyrunning into issues [1]. The most notable consumer of nhooyr.io/websocketthat I could find is Tailscale [2,3].Unlike geth [4] and cometbft [5], we don't send ping messages to theclient. Pro: saves bandwidth. Con: if the client app hangs, the Websocketconnection will persist. As of 2017, Chromium is doing the same [6].Note that the TCP keep-alives sent every 15 seconds [7] by the HTTPserver will detect when the TCP connection is down, but will not detectwhen the application using the TCP connection is stuck.We can change this in the future, but it will slightly complicate thelogic and a ping loop is difficult to test since the pong replies arehandled by the websocket library.[1]:ethereum/go-ethereum#27261[2]:https://github.com/tailscale/tailscale/blob/0f5090c526c2019fae94695b2991cb561e131788/cmd/derper/websocket.go#L13[3]:coder/websocket#350 (comment)[4]:https://github.com/ethereum/go-ethereum/blob/6d2aeb43d516fbe2cafd9e65df7ccbd885f861d3/rpc/websocket.go#L336-L359[5]:https://github.com/cometbft/cometbft/blob/c138e785c992b3a76cb25fdf2e0820e29c56c1af/rpc/jsonrpc/server/ws_handler.go#L427-L432[6]:https://groups.google.com/a/chromium.org/g/net-dev/c/2RAm-ZYAIYY/m/NMFuHBuHAAAJhttps://groups.google.com/a/chromium.org/g/net-dev/c/2RAm-ZYAIYY/m/NMFuHBuHAAAJ[7]:https://github.com/golang/go/blob/f3bf18117b284b63f4350a5aa61773a30d91a6d5/src/net/dial.go#L15-L17
This commit only modifies the jsonrpc package. It doesn't provide a wayto serve websocket connections when Juno is started, which can beimplemented in a future PR.Two design decisions:We use the nhooyr.io/websocket library. The widely usedgorilla/websocket library has been archived and geth is alreadyrunning into issues [1]. The most notable consumer of nhooyr.io/websocketthat I could find is Tailscale [2,3].Unlike geth [4] and cometbft [5], we don't send ping messages to theclient. Pro: saves bandwidth. Con: if the client app hangs, the Websocketconnection will persist. As of 2017, Chromium is doing the same [6].Note that the TCP keep-alives sent every 15 seconds [7] by the HTTPserver will detect when the TCP connection is down, but will not detectwhen the application using the TCP connection is stuck.We can change this in the future, but it will slightly complicate thelogic and a ping loop is difficult to test since the pong replies arehandled by the websocket library.[1]:ethereum/go-ethereum#27261[2]:https://github.com/tailscale/tailscale/blob/0f5090c526c2019fae94695b2991cb561e131788/cmd/derper/websocket.go#L13[3]:coder/websocket#350 (comment)[4]:https://github.com/ethereum/go-ethereum/blob/6d2aeb43d516fbe2cafd9e65df7ccbd885f861d3/rpc/websocket.go#L336-L359[5]:https://github.com/cometbft/cometbft/blob/c138e785c992b3a76cb25fdf2e0820e29c56c1af/rpc/jsonrpc/server/ws_handler.go#L427-L432[6]:https://groups.google.com/a/chromium.org/g/net-dev/c/2RAm-ZYAIYY/m/NMFuHBuHAAAJhttps://groups.google.com/a/chromium.org/g/net-dev/c/2RAm-ZYAIYY/m/NMFuHBuHAAAJ[7]:https://github.com/golang/go/blob/f3bf18117b284b63f4350a5aa61773a30d91a6d5/src/net/dial.go#L15-L17
This commit only modifies the jsonrpc package. It doesn't provide a wayto serve websocket connections when Juno is started, which can beimplemented in a future PR.Two design decisions:We use the nhooyr.io/websocket library. The widely usedgorilla/websocket library has been archived and geth is alreadyrunning into issues [1]. The most notable consumer of nhooyr.io/websocketthat I could find is Tailscale [2,3].Unlike geth [4] and cometbft [5], we don't send ping messages to theclient. Pro: saves bandwidth. Con: if the client app hangs, the Websocketconnection will persist. As of 2017, Chromium is doing the same [6].Note that the TCP keep-alives sent every 15 seconds [7] by the HTTPserver will detect when the TCP connection is down, but will not detectwhen the application using the TCP connection is stuck.We can change this in the future, but it will slightly complicate thelogic and a ping loop is difficult to test since the pong replies arehandled by the websocket library.[1]:ethereum/go-ethereum#27261[2]:https://github.com/tailscale/tailscale/blob/0f5090c526c2019fae94695b2991cb561e131788/cmd/derper/websocket.go#L13[3]:coder/websocket#350 (comment)[4]:https://github.com/ethereum/go-ethereum/blob/6d2aeb43d516fbe2cafd9e65df7ccbd885f861d3/rpc/websocket.go#L336-L359[5]:https://github.com/cometbft/cometbft/blob/c138e785c992b3a76cb25fdf2e0820e29c56c1af/rpc/jsonrpc/server/ws_handler.go#L427-L432[6]:https://groups.google.com/a/chromium.org/g/net-dev/c/2RAm-ZYAIYY/m/NMFuHBuHAAAJhttps://groups.google.com/a/chromium.org/g/net-dev/c/2RAm-ZYAIYY/m/NMFuHBuHAAAJ[7]:https://github.com/golang/go/blob/f3bf18117b284b63f4350a5aa61773a30d91a6d5/src/net/dial.go#L15-L17
This commit only modifies the jsonrpc package. It doesn't provide a wayto serve websocket connections when Juno is started, which can beimplemented in a future PR.Two design decisions:We use the nhooyr.io/websocket library. The widely usedgorilla/websocket library has been archived and geth is alreadyrunning into issues [1]. The most notable consumer of nhooyr.io/websocketthat I could find is Tailscale [2] [3].Unlike geth [4] and cometbft [5], we don't send ping messages to theclient. Pro: saves bandwidth. Con: if the client app hangs, the Websocketconnection will persist. As of 2017, Chromium is doing the same [6].Note that the TCP keep-alives sent every 15 seconds [7] by the HTTPserver will detect when the TCP connection is down, but will not detectwhen the application using the TCP connection is stuck.We can change this in the future, but it will slightly complicate thelogic and a ping loop is difficult to test since the pong replies arehandled by the websocket library.[1]:ethereum/go-ethereum#27261[2]:https://github.com/tailscale/tailscale/blob/0f5090c526c2019fae94695b2991cb561e131788/cmd/derper/websocket.go#L13[3]:coder/websocket#350 (comment)[4]:https://github.com/ethereum/go-ethereum/blob/6d2aeb43d516fbe2cafd9e65df7ccbd885f861d3/rpc/websocket.go#L336-L359[5]:https://github.com/cometbft/cometbft/blob/c138e785c992b3a76cb25fdf2e0820e29c56c1af/rpc/jsonrpc/server/ws_handler.go#L427-L432[6]:https://groups.google.com/a/chromium.org/g/net-dev/c/2RAm-ZYAIYY/m/NMFuHBuHAAAJhttps://groups.google.com/a/chromium.org/g/net-dev/c/2RAm-ZYAIYY/m/NMFuHBuHAAAJ[7]:https://github.com/golang/go/blob/f3bf18117b284b63f4350a5aa61773a30d91a6d5/src/net/dial.go#L15-L17
This commit only modifies the jsonrpc package. It doesn't provide a wayto serve websocket connections when Juno is started, which can beimplemented in a future PR.Two design decisions:We use the nhooyr.io/websocket library. The widely usedgorilla/websocket library has been archived and geth is alreadyrunning into issues with it [1]. The most notable consumer ofnhooyr.io/websocket that I could find is Tailscale [2] [3].Unlike geth [4] and cometbft [5], we don't send ping messages to theclient. Pro: saves bandwidth. Con: if the client app hangs, the Websocketconnection will persist. As of 2017, Chromium is doing the same [6].Note that the TCP keep-alives sent every 15 seconds [7] by the HTTPserver will detect when the TCP connection is down, but will not detectwhen the application using the TCP connection is stuck.We can change this in the future, but it will slightly complicate thelogic and a ping loop is difficult to test since the pong replies arehandled by the websocket library.[1]:ethereum/go-ethereum#27261[2]:https://github.com/tailscale/tailscale/blob/0f5090c526c2019fae94695b2991cb561e131788/cmd/derper/websocket.go#L13[3]:coder/websocket#350 (comment)[4]:https://github.com/ethereum/go-ethereum/blob/6d2aeb43d516fbe2cafd9e65df7ccbd885f861d3/rpc/websocket.go#L336-L359[5]:https://github.com/cometbft/cometbft/blob/c138e785c992b3a76cb25fdf2e0820e29c56c1af/rpc/jsonrpc/server/ws_handler.go#L427-L432[6]:https://groups.google.com/a/chromium.org/g/net-dev/c/2RAm-ZYAIYY/m/NMFuHBuHAAAJhttps://groups.google.com/a/chromium.org/g/net-dev/c/2RAm-ZYAIYY/m/NMFuHBuHAAAJ[7]:https://github.com/golang/go/blob/f3bf18117b284b63f4350a5aa61773a30d91a6d5/src/net/dial.go#L15-L17
This commit only modifies the jsonrpc package. It doesn't provide a wayto serve websocket connections when Juno is started, which can beimplemented in a future PR.Two design decisions:We use the nhooyr.io/websocket library. The widely usedgorilla/websocket library has been archived and geth is alreadyrunning into issues with it [1]. The most notable consumer ofnhooyr.io/websocket that I could find is Tailscale [2] [3].Unlike geth [4] and cometbft [5], we don't send ping messages to theclient. Pro: saves bandwidth. Con: if the client app hangs, the Websocketconnection will persist. As of 2017, Chromium is doing the same [6].Note that the TCP keep-alives sent every 15 seconds [7] by the HTTPserver will detect when the TCP connection is down, but will not detectwhen the application using the TCP connection is stuck.We can change this in the future, but it will slightly complicate thelogic and a ping loop is difficult to test since the pong replies arehandled by the websocket library.[1]:ethereum/go-ethereum#27261[2]:https://github.com/tailscale/tailscale/blob/0f5090c526c2019fae94695b2991cb561e131788/cmd/derper/websocket.go#L13[3]:coder/websocket#350 (comment)[4]:https://github.com/ethereum/go-ethereum/blob/6d2aeb43d516fbe2cafd9e65df7ccbd885f861d3/rpc/websocket.go#L336-L359[5]:https://github.com/cometbft/cometbft/blob/c138e785c992b3a76cb25fdf2e0820e29c56c1af/rpc/jsonrpc/server/ws_handler.go#L427-L432[6]:https://groups.google.com/a/chromium.org/g/net-dev/c/2RAm-ZYAIYY/m/NMFuHBuHAAAJhttps://groups.google.com/a/chromium.org/g/net-dev/c/2RAm-ZYAIYY/m/NMFuHBuHAAAJ[7]:https://github.com/golang/go/blob/f3bf18117b284b63f4350a5aa61773a30d91a6d5/src/net/dial.go#L15-L17
This commit only modifies the jsonrpc package. It doesn't provide a wayto serve websocket connections when Juno is started, which can beimplemented in a future PR.Two design decisions:We use the nhooyr.io/websocket library. The widely usedgorilla/websocket library has been archived and geth is alreadyrunning into issues with it [1]. The most notable consumer ofnhooyr.io/websocket that I could find is Tailscale [2] [3].Unlike geth [4] and cometbft [5], we don't send ping messages to theclient. Pro: saves bandwidth. Con: if the client app hangs, the Websocketconnection will persist. As of 2017, Chromium is doing the same [6].Note that the TCP keep-alives sent every 15 seconds [7] by the HTTPserver will detect when the TCP connection is down, but will not detectwhen the application using the TCP connection is stuck.We can change this in the future, but it will slightly complicate thelogic and a ping loop is difficult to test since the pong replies arehandled by the websocket library.[1]:ethereum/go-ethereum#27261[2]:https://github.com/tailscale/tailscale/blob/0f5090c526c2019fae94695b2991cb561e131788/cmd/derper/websocket.go#L13[3]:coder/websocket#350 (comment)[4]:https://github.com/ethereum/go-ethereum/blob/6d2aeb43d516fbe2cafd9e65df7ccbd885f861d3/rpc/websocket.go#L336-L359[5]:https://github.com/cometbft/cometbft/blob/c138e785c992b3a76cb25fdf2e0820e29c56c1af/rpc/jsonrpc/server/ws_handler.go#L427-L432[6]:https://groups.google.com/a/chromium.org/g/net-dev/c/2RAm-ZYAIYY/m/NMFuHBuHAAAJhttps://groups.google.com/a/chromium.org/g/net-dev/c/2RAm-ZYAIYY/m/NMFuHBuHAAAJ[7]:https://github.com/golang/go/blob/f3bf18117b284b63f4350a5aa61773a30d91a6d5/src/net/dial.go#L15-L17
We were unconditionally canceling the read/write contexts when the deadline timer fired, even if there were no active reads or writes. We instead only cancel the context if there is an active operation, otherwise we set a flag so that future calls (without the deadline being reset) will fail.
Updatestailscale/tailscale#5921