- Notifications
You must be signed in to change notification settings - Fork928
fix: Add reaper to coder agent#2441
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to ourterms of service andprivacy statement. We’ll occasionally send you account related emails.
Already on GitHub?Sign in to your account
Uh oh!
There was an error while loading.Please reload this page.
Conversation
- The coder agent runs as PID 1 in some of our Docker workspaces. In such cases it is the responsibility of the init process to reap dead processes. Failing to do so can result in an inability to create new processes by running out of PIDs. This PR adds a reaper to our agent that is only spawned if it detects that it is PID1.
@kylecarbs what do you think about skipping the test if it's in CI, I'm a little concerned about having a forkexec in our tests... |
Uh oh!
There was an error while loading.Please reload this page.
Uh oh!
There was an error while loading.Please reload this page.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others.Learn more.
Really nice solution!
Uh oh!
There was an error while loading.Please reload this page.
pattrs := &syscall.ProcAttr{ | ||
Dir: pwd, | ||
// Add our marker for identifying the child process. | ||
Env: append(os.Environ(), fmt.Sprintf("%s=true", agentEnvMark)), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others.Learn more.
Should we filter this out from sub-process envs (i.e. when we launch shells)? It probably doesn't matter for correctness, more of a cleanliness thing.
We could also addCODER_AGENT_PID
as part of the sub-process envs for programmatic use (e.g. to enable pprof,kill -USR1 $CODER_AGENT_PID
). We'd need to do that always though, not only when forking.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others.Learn more.
I agree that it's certainly cleaner to filter it out. The only reason I didn't is because I did not want implementation details of thereaper
package to leak outside. In other words, if you're developing the agent I don't want people to have to think about using a special function that filters out arcane environment variables that you don't understand.
I think if it becomes an issue we devote some time to figuring out something unobtrusive.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others.Learn more.
I already merged this but I think a cleaner solution is to have a--noreap
flag that we append to the forkexec cmd. Then we don't pollute subprocess envs or leak to the world whats happening under the hood. What do you think@mafredri
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others.Learn more.
That makes sense and I agree that's a clean solution@sreya. 👍🏻
@@ -50,6 +52,23 @@ func workspaceAgent() *cobra.Command { | |||
} | |||
defer logWriter.Close() | |||
logger := slog.Make(sloghuman.Sink(cmd.ErrOrStderr()), sloghuman.Sink(logWriter)).Leveled(slog.LevelDebug) | |||
isLinux := runtime.GOOS == "linux" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others.Learn more.
Any reason to limit this tolinux
only? I suppose this should work on Darwin and other BSD flavors too?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others.Learn more.
It was causing things to fail when it ran on macos. I don't think limiting it tolinux
is going to be a big deal because the reaper is primarily going to be used for docker containers and on mac that's going to be run in a linux vm anyway.
Uh oh!
There was an error while loading.Please reload this page.
The coder agent runs as PID 1 in some of our Docker workspaces.
In such cases it is the responsibility of the init process to
reap dead processes. Failing to do so can result in an inability
to create new processes by running out of PIDs.
This PR adds a reaper to our agent that is only spawned if it
detects that it is PID1.
resolves#2398