feat(agent/agentcontainers): implement sub agent injection #18245
This change adds support for sub agent creation and injection into devcontainers.
Closes coder/internal#621
I'm still working on an integration test and the existing mocks are being a PITA (think those are about sorted now though). Promoting this to "ready for review" to get some feedback on the approach @DanielleMaywood @johnstcn. (Also going to break out the "follow-up PR" tasks into new issues before merging this.)
I still have to read some more but adding my comments so far.
err := api.dccli.Exec(agentCtx, dc.WorkspaceFolder, dc.ConfigPath, agentPath, []string{"agent"},
	WithContainerID(container.ID),
	WithRemoteEnv(
		"CODER_AGENT_URL="+api.subAgentURL,
		"CODER_AGENT_TOKEN="+agent.AuthToken.String(),
	),
)
Would it make more sense to background this? If the parent agent ends up crashing and being restarted, we'll lose the sub-agents and have to re-inject them. We can keep track of the expected PID in e.g. /.coder-agent/pid
We could probably background it either on the host or inside the container, but not doing so has some nice properties:
- We immediately discover if a sub agent exits/crashes and we could restart immediately (we don't currently)
- Job control is simpler (simply cancel the context vs looking up processes and verifying against pid)
- With prebuilds, we can exit all sub-agents on claim and re-inject afterwards to ensure a clean slate
For the case where the parent agent crashes, keeping those sub-agents may be a bit hit-and-miss and those dev containers could be affected anyway on agent startup. I'm not aware of agents crashing though so this might not even be a concern we need to be mindful of now?
Fair enough!
if _, err := api.ccli.ExecAs(ctx, container.ID, "root", "setcap", "cap_net_admin+ep", coderPathInsideContainer); err != nil {
	logger.Warn(ctx, "set CAP_NET_ADMIN on agent binary failed", slog.Error(err))
}
This will probably fail unless the container is running as privileged or has the specific CAP_NET_ADMIN privilege set on the container?
As per the comment, this is an optional networking boost. (See regular agent bootstrap script, I'll update the comment to reference it.) Did you have some action in mind?
We could check for both of these things before trying? Not a blocker though.
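One possible way to do such a pre-check, sketched under assumptions: on Linux, /proc/<pid>/status exposes capability sets as hex bitmasks (for example the CapEff line), and CAP_NET_ADMIN is bit 12. The helper `hasCapNetAdmin` below is hypothetical and only parses status text; how that text would be fetched from inside the container, and whether CapEff or CapBnd is the right set to inspect for this case, is left open.

```go
package main

import (
	"bufio"
	"fmt"
	"strconv"
	"strings"
)

// capNetAdmin is the bit index of CAP_NET_ADMIN in the Linux capability masks.
const capNetAdmin = 12

// hasCapNetAdmin reports whether the CapEff mask found in the given
// /proc/<pid>/status content has the CAP_NET_ADMIN bit set.
func hasCapNetAdmin(status string) (bool, error) {
	sc := bufio.NewScanner(strings.NewReader(status))
	for sc.Scan() {
		line := sc.Text()
		if !strings.HasPrefix(line, "CapEff:") {
			continue
		}
		hexMask := strings.TrimSpace(strings.TrimPrefix(line, "CapEff:"))
		mask, err := strconv.ParseUint(hexMask, 16, 64)
		if err != nil {
			return false, fmt.Errorf("parse CapEff %q: %w", hexMask, err)
		}
		return mask&(1<<capNetAdmin) != 0, nil
	}
	return false, fmt.Errorf("CapEff line not found")
}

func main() {
	// Example masks only: the first has bit 12 clear, the second has it set.
	for _, status := range []string{
		"Name:\tsh\nCapEff:\t00000000a80425fb\n",
		"Name:\tsh\nCapEff:\t0000003fffffffff\n",
	} {
		ok, err := hasCapNetAdmin(status)
		fmt.Println(ok, err)
	}
}
```

If the check reports false, the setcap call could be skipped instead of attempted and logged as a warning.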
agent/agentcontainers/api.go Outdated
// Make sure the agent binary is executable so we can run it.
if _, err := api.ccli.ExecAs(ctx, container.ID, "root", "chmod", "+x", coderPathInsideContainer); err != nil {
	return xerrors.Errorf("set agent binary executable: %w", err)
}
Do we also need to chown the binary so that it's readable by the default container user?
Good callout. I didn't consider this, but docker cp seems to follow the permissions of the file on disk. So unless we chown it, the ownership could be nonsense within the container (non-existent user, etc).
It's unlikely that the permissions will be bad for the user (typically 0755), but we could improve it for sure. It might make sense to turn this into a script rather than N docker exec calls.
logger.Info(ctx, "starting subagent in dev container")
err := api.dccli.Exec(agentCtx, dc.WorkspaceFolder, dc.ConfigPath, agentPath, []string{"agent"},
Do we try to execute this as a non-root user?
AFAIK this will get executed as the remote user configured by devcontainer.json (or if unconfigured, container user), which seems like the correct behavior to me.
injected := make(map[uuid.UUID]bool, len(api.injectedSubAgentProcs))
for _, proc := range api.injectedSubAgentProcs {
	injected[proc.agent.ID] = true
}
This could probably be a map[uuid.UUID]struct{} instead, and then below on line 888 just check for _, found := injected[agent.ID]
I don't foresee the memory savings being necessary here (will we have 1000s of sub agents?). The current form reads better and is simpler to use IMO (I always prefer this form for readability where applicable).
for _, agent := range agents {
	if injected[agent.ID] {
		continue
	}
	err := api.subAgentClient.Delete(ctx, agent.ID)
	if err != nil {
		api.logger.Error(ctx, "failed to delete agent",
			slog.Error(err),
			slog.F("agent_id", agent.ID),
			slog.F("agent_name", agent.Name),
		)
	}
}
Should we set an upper bound on deletion attempts and raise if more than say 3 attempts fail?
Are you suggesting silently ignoring failures unless >= 3 fail? Or perhaps adding retry logic?
I'm mainly worried about spamming error logs into the void.
These will be part of the parent agent log 🤔
We can leave it as-is for now, but I think if this does start happening frequently (or all the time) it may be difficult to catch if it just goes into the parent agent log.
This change adds support for sub agent creation and injection into dev containers.
TODO:
- devcontainer.json parsing, follow-up PR).
- customizations.coder.devcontainer.name from docker container label (materialized devcontainer.json on creation, follow-up PR)
Updates coder/internal#621