- Notifications
You must be signed in to change notification settings - Fork928
fix: close SSH sessions bottom-up if top-down fails#14678
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to ourterms of service andprivacy statement. We’ll occasionally send you account related emails.
Already on GitHub?Sign in to your account
Uh oh!
There was an error while loading.Please reload this page.
Conversation
This stack of pull requests is managed by Graphite.Learn more about stacking. Join@spikecurtis and the rest of your teammates on |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others.Learn more.
While I think this is a valid approach and appreciate the test coverage you've written, I have a slightly different approach in mind that I feel will be more robust. I'd like to hear your thoughts.
Uh oh!
There was an error while loading.Please reload this page.
d9dc47a
to21ee4de
CompareThere was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others.Learn more.
Thanks for taking another look at the implementation and making it more robust, looking great! 👍🏻
Uh oh!
There was an error while loading.Please reload this page.
Uh oh!
There was an error while loading.Please reload this page.
21ee4de
toa3d0c5b
Comparea3d0c5b
to7dd56e0
Compare6ff9a05
intomainUh oh!
There was an error while loading.Please reload this page.
Merge activity
|
Uh oh!
There was an error while loading.Please reload this page.
Fixeshttps://github.com/coder/customers/issues/669
In
coder ssh
we normally attempt to tear stuff down top to bottom, so that SSH stuff like remote-forwards are cleaned up nicely, tailnet Coordination gets a disconnect, etc.But, TCP timeouts can be very long (72 hours currently for SSH), and so if network connectivity down, we can effectively deadlock trying to tear down the remote-forward state, which involves sending an SSH command message and doesn't time out independently of the underlying TCP connection.
This PR introduces a "graceful shutdown" timeout for the upper layers of our SSH stuff to finish closing. If they haven't closed in 5 seconds, we shut down the agent connection, which cascades bottom-up.