- Notifications
You must be signed in to change notification settings - Fork1.1k
feat(agent/agentcontainers): implement sub agent injection#18245
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to ourterms of service andprivacy statement. We’ll occasionally send you account related emails.
Already on GitHub?Sign in to your account
Uh oh!
There was an error while loading.Please reload this page.
Conversation
d49f84e to011a8aaCompare91ff08e to3960774Compare011a8aa to63f93bcCompare3960774 to1cf1905Compare63f93bc to0deaab8Compare1cf1905 tof190036Compare0deaab8 to8796ba3Comparedc146ab tod1447f3Compare8796ba3 toadbfd45Compared1447f3 to3547372CompareThis change adds support for sub agent creation and injection into devcontainers.Closescoder/internal#621
3547372 to7358ee0Comparea8e4495 toeb29bbaComparemafredri commentedJun 6, 2025
I'm still working on an integration test and the existing mocks are being a PITA (think those are about sorted now though). Promoting this to "ready for review" to get some feedback on the approach@DanielleMaywood@johnstcn. (Also going to break out the "follow-up PR" tasks into new issues before merging this.) |
johnstcn left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others.Learn more.
I still have to read some more but adding my comments so far.
Uh oh!
There was an error while loading.Please reload this page.
Uh oh!
There was an error while loading.Please reload this page.
Uh oh!
There was an error while loading.Please reload this page.
| err:=api.dccli.Exec(agentCtx,dc.WorkspaceFolder,dc.ConfigPath,agentPath, []string{"agent"}, | ||
| WithContainerID(container.ID), | ||
| WithRemoteEnv( | ||
| "CODER_AGENT_URL="+api.subAgentURL, | ||
| "CODER_AGENT_TOKEN="+agent.AuthToken.String(), | ||
| ), | ||
| ) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others.Learn more.
Would it make more sense to background this? If the parent agent ends up crashing and being restarted, we'll lose the sub-agents and have to re-inject them. We can keep track of the expected PID in e.g./.coder-agent/pid
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others.Learn more.
We could probably background it either on the host or inside the container, but not doing so has some nice properties:
- We immediately discover if a sub agent exits/crashes and we could restart immediately (we don't currently)
- Job control is simpler (simply cancel the context vs looking up processes and verifying against pid)
- With prebuilds, we can exit all sub-agents on claim and re-inject afterwards to ensure a clean slate
For the case where the parent agent crashes, keeping those sub-agents may be a bit hit-and-miss and those dev containers could be affected anyway on agent startup. I'm not aware of agents crashing though so this might not even be a concern we need to be mindful of now?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others.Learn more.
Fair enough!
| if_,err:=api.ccli.ExecAs(ctx,container.ID,"root","setcap","cap_net_admin+ep",coderPathInsideContainer);err!=nil { | ||
| logger.Warn(ctx,"set CAP_NET_ADMIN on agent binary failed",slog.Error(err)) | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others.Learn more.
This will probably fail unless the container is running as privileged or has the specific CAP_NET_ADMIN privilege set on the container?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others.Learn more.
As per the comment, this is an optional networking boost. (See regular agent bootstrap script, I'll update the comment to reference it.) Did you have some action in mind?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others.Learn more.
We could check for both of these things before trying? Not a blocker though.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others.Learn more.
Sure, I don't think it's very high priority but let's create a ticket for future enhancement. 👍🏻
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others.Learn more.
| // Make sure the agent binary is executable so we can run it. | ||
| if_,err:=api.ccli.ExecAs(ctx,container.ID,"root","chmod","+x",coderPathInsideContainer);err!=nil { | ||
| returnxerrors.Errorf("set agent binary executable: %w",err) | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others.Learn more.
Do we also need tochown the binary so that it's readable by the default container user?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others.Learn more.
Good callout. I didn't consider this butdocker cp seems to follow the permissions of the file on disk. So unless wechown it could be nonsense within the container (non-existent user, etc).
It's unlikely that the permissions will be bad for the user (typically 0755), but we could improve it for sure. It might make sense to turn this into a script rather than N amount ofdocker execs.
| logger.Info(ctx,"starting subagent in dev container") | ||
| err:=api.dccli.Exec(agentCtx,dc.WorkspaceFolder,dc.ConfigPath,agentPath, []string{"agent"}, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others.Learn more.
Do we try to execute this as a non-root user?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others.Learn more.
AFAIK this will get executed as the remote user configured bydevcontainer.json (or if unconfigured, container user), which seems like the correct behavior to me.
| injected:=make(map[uuid.UUID]bool,len(api.injectedSubAgentProcs)) | ||
| for_,proc:=rangeapi.injectedSubAgentProcs { | ||
| injected[proc.agent.ID]=true | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others.Learn more.
This could probably be amap[uuid.UUID]struct{} instead, and then below on line 888 just check for_, found := injected[agent.ID]
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others.Learn more.
I don't foresee the memory savings being necessary here (will we have 1000s of sub agents?). The current form reads better and is simpler to use IMO (I always prefer this form for readability where applicable).
| for_,agent:=rangeagents { | ||
| ifinjected[agent.ID] { | ||
| continue | ||
| } | ||
| err:=api.subAgentClient.Delete(ctx,agent.ID) | ||
| iferr!=nil { | ||
| api.logger.Error(ctx,"failed to delete agent", | ||
| slog.Error(err), | ||
| slog.F("agent_id",agent.ID), | ||
| slog.F("agent_name",agent.Name), | ||
| ) | ||
| } | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others.Learn more.
Should we set an upper bound on deletion attempts and raise if more than say 3 attempts fail?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others.Learn more.
Are you suggesting silently ignoring failures unless >= 3 fail? Or perhaps adding retry logic?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others.Learn more.
I'm mainly worried about spamming error logs into the void.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others.Learn more.
These will be part of the parent agent log 🤔
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others.Learn more.
We can leave it as-is for now, but I think if this does start happening frequently (or all the time) it may be difficult to catch if it just goes into the parent agent log.
466bc6b to780483bCompareabe9116 tod5eb3fcComparemafredri commentedJun 9, 2025
@DanielleMaywood@johnstcn I've added I also added |
| token:=os.Getenv("CODER_AGENT_TOKEN") | ||
| ifurl==""||token=="" { | ||
| _,_=fmt.Fprintln(os.Stderr,"CODER_AGENT_URL and CODER_AGENT_TOKEN must be set") | ||
| return10 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others.Learn more.
Can we name these specific status codes as something more meaningful to human eyes?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others.Learn more.
They don't really have a meaning, just something to differentiate the states and started at 10 since I got tired of bumping everything as I added more stuff 😅, the println should hopefully be helpful here.
| } | ||
| deferr.Body.Close() | ||
| t.Logf("Sub-agent request payload received: %+v",payload) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others.Learn more.
suggestion: do we perhaps want to allow the caller to run some function against the paylaod?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others.Learn more.
We send it on the channel and do some verification already 👍🏻
| // The agent will copy "itself", but in the case of this test, the | ||
| // agent is actually this test binary. So we'll tell the test binary | ||
| // to execute the sub-agent main function via this env. | ||
| agentcontainers.WithSubAgentEnv("CODER_TEST_RUN_SUB_AGENT_MAIN=1"), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others.Learn more.
mafredri commentedJun 10, 2025
One last addendum, implemented a quick 'n dirty |
fca9917 intomainUh oh!
There was an error while loading.Please reload this page.

Uh oh!
There was an error while loading.Please reload this page.
This change adds support for sub agent creation and injection into dev
containers.
TODO:
requires on-diskimplemented viadevcontainer.jsonparsing,follow-up PRpwdcheck, we can improve this in the future via "materialized devcontainer.json" viadevcontainer read-configuration).customizations.coder.devcontainer.namefrom docker container label (materializeddevcontainer.jsonon creation,follow-up PR)Updatescoder/internal#621