Movatterモバイル変換


[0]ホーム

URL:


Skip to content

Navigation Menu

Sign in
Appearance settings

Search code, repositories, users, issues, pull requests...

Provide feedback

We read every piece of feedback, and take your input very seriously.

Saved searches

Use saved searches to filter your results more quickly

Sign up
Appearance settings

fix(agent): ensure SSH server shutdown with process groups#17227

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to ourterms of service andprivacy statement. We’ll occasionally send you account related emails.

Already on GitHub?Sign in to your account

Merged
mafredri merged 4 commits intomainfrommafredri/fix-agent-agentssh-close-block
Apr 3, 2025

Conversation

mafredri
Copy link
Member

@mafredrimafredri commentedApr 2, 2025
edited
Loading

After closer inspection of logs mentioned in#17108, it looks like the "connecting to coderd" loop was a red herring and the final log line from(*agentssh.Server).Close is missing.

PTY's already create a process group, so they seem to have been unaffected. But non-PTY SSH sessions were able to prevent SSH server close from completing when child processes were left running. For good measure, I added a timeout mechanism just so we don't ever block on SSH server shutdown longer than 10s.

This can be verified on currentmain by starting a workspace that has a shutdown script, and running the following command:

ssh coder.testa "/bin/bash -c 'trap \"sleep 6000\" SIGTERM; sleep 6000'"

The stopping the workspace. Shutdown scripts won't run. With this change, they do.

Fixes#17108

@@ -0,0 +1,24 @@
//go:build !windows
Copy link
MemberAuthor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

Review: These are copy-pasta from agentscripts package.

@mafredrimafredri marked this pull request as ready for reviewApril 2, 2025 15:54

func cmdCancel(ctx context.Context, logger slog.Logger, cmd *exec.Cmd) func() error {
return func() error {
logger.Debug(ctx, "cmdCancel: sending interrupt to process", slog.F("pid", cmd.Process.Pid))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

📋🍝 suggestion: check thatcmd.Process != nil as it could have exited.

s.logger.Debug(ctx, "closing server")

// Stop accepting new connections.
s.logger.Debug(ctx, "closing all active listeners")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

suggestion: log the number of active listeners

mafredri reacted with thumbs up emoji
// Close all active sessions to gracefully
// terminate client connections.
s.logger.Debug(ctx, "closing all active sessions")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

suggestion: log the number of active sessions

mafredri reacted with thumbs up emoji
for l := range s.listeners {
_ = l.Close()
}
s.logger.Debug(ctx, "closing all active connections")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

suggestion: log the number of active conns

mafredri reacted with thumbs up emoji
// stop accepting new ones. If all processes have not exited after
// 10 seconds, we just log it and move on as it's more important
// to run the shutdown scripts.
sshShutdownCtx, sshShutdownCancel := context.WithTimeout(a.hardCtx, 10*time.Second)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

I can foresee someone wanting to adjust this timeout. Maybe add a CLI option for it?

Copy link
MemberAuthor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

I think that if we do get requests to make this configurable, it will help drive the implementation. I'm not certain this approach is the end-goal we want and I would like to avoid exposing functionality that may limit future implementations.

When we start solving#6175, changing this behavior may be relevant.

I'll go ahead and reduce this to 5 seconds as well as the default docker grace timeout is 10s I believe.

The original motivation behind closing SSH before running shutdown scripts is to ensure that commands doing work exit before potentially doing backups of state or something along those lines. I think giving those commands a grace of 5 seconds is plenty in most cases.

johnstcn reacted with thumbs up emoji
@@ -1044,6 +1055,11 @@ func (s *Server) trackSession(ss ssh.Session, add bool) (ok bool) {
// Close the server and all active connections. Server can be re-used
// after Close is done.
func (s *Server) Close() error {
return s.close(context.Background())
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

Do we need this extra method with the context argument? I don't see it used elsewhere.

Copy link
MemberAuthor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

Originally I intended to uses.close inShutdown too, but I oopsied that change. I'll revert this though since ctx is only used for logger and we don't even utilize that feature often.

johnstcn reacted with thumbs up emoji
@@ -1082,15 +1103,36 @@ func (s *Server) Close() error {
s.closing = nil
s.mu.Unlock()

s.logger.Debug(ctx, "closing server done")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

suggestion: log elapsed time for closing

@mafredrimafredri merged commitb61f0ab intomainApr 3, 2025
29 checks passed
@mafredrimafredri deleted the mafredri/fix-agent-agentssh-close-block branchApril 3, 2025 13:01
@github-actionsgithub-actionsbot locked and limited conversation to collaboratorsApr 3, 2025
@mafredrimafredri changed the titlefix(agent): improve SSH server shutdown and continue after timeoutfix(agent): ensure SSH server shutdown with process groupsApr 3, 2025
Sign up for freeto subscribe to this conversation on GitHub. Already have an account?Sign in.
Reviewers

@johnstcnjohnstcnjohnstcn approved these changes

@spikecurtisspikecurtisAwaiting requested review from spikecurtis

@aslilacaslilacAwaiting requested review from aslilac

Assignees

@mafredrimafredri

Labels
None yet
Projects
None yet
Milestone
No milestone
Development

Successfully merging this pull request may close these issues.

investigate whyrun_on_stop doesn't work consistently
2 participants
@mafredri@johnstcn

[8]ページ先頭

©2009-2025 Movatter.jp