
feat(agent/agentcontainers): implement sub agent injection #18245


Open

mafredri wants to merge 8 commits into main from mafredri/feat-agent-devcontainer-injection-4


+797 −68

mafredri (Member) commented Jun 5, 2025 (edited):

This change adds support for sub agent creation and injection into dev
containers.

TODO:

Pass the correct access URL to sub agent
Add integration test
Use correct directory for sub agent (requires on-disk devcontainer.json parsing, follow-up PR)
Parse .customizations.coder.devcontainer.name from docker container label (materialized devcontainer.json on creation, follow-up PR)
Add support for downloading agent binaries for different architectures (follow-up PR)
Make sure there are reduced capabilities for sub-agents (e.g. no containers API, follow-up PR)

Updates coder/internal#621

github-actions bot assigned mafredri

Jun 5, 2025

This was referenced Jun 5, 2025

chore(agent): update agent proto client #18242

Merged

feat(agent/agentcontainers): refactor Lister to ContainerCLI and implement new methods #18243

Merged

feat(agent/agentcontainers): add Exec method to devcontainers CLI #18244

Merged

mafredri force-pushed the mafredri/feat-agent-devcontainer-injection-3 branch from d49f84e to 011a8aa

June 5, 2025 12:51

mafredri force-pushed the mafredri/feat-agent-devcontainer-injection-4 branch from 91ff08e to 3960774

June 5, 2025 12:52

mafredri force-pushed the mafredri/feat-agent-devcontainer-injection-3 branch from 011a8aa to 63f93bc

June 5, 2025 13:59

mafredri force-pushed the mafredri/feat-agent-devcontainer-injection-4 branch from 3960774 to 1cf1905

June 5, 2025 13:59

mafredri force-pushed the mafredri/feat-agent-devcontainer-injection-3 branch from 63f93bc to 0deaab8

June 6, 2025 08:44

mafredri force-pushed the mafredri/feat-agent-devcontainer-injection-4 branch from 1cf1905 to f190036

June 6, 2025 08:44

mafredri force-pushed the mafredri/feat-agent-devcontainer-injection-3 branch from 0deaab8 to 8796ba3

June 6, 2025 09:30

mafredri force-pushed the mafredri/feat-agent-devcontainer-injection-4 branch 2 times, most recently from dc146ab to d1447f3

June 6, 2025 09:45

mafredri force-pushed the mafredri/feat-agent-devcontainer-injection-3 branch from 8796ba3 to adbfd45

June 6, 2025 11:20

mafredri force-pushed the mafredri/feat-agent-devcontainer-injection-4 branch from d1447f3 to 3547372

June 6, 2025 11:27

Base automatically changed from mafredri/feat-agent-devcontainer-injection-3 to main

June 6, 2025 11:39

feat(agent/agentcontainers): implement sub agent injection

7358ee0

This change adds support for sub agent creation and injection into devcontainers. Closes coder/internal#621

mafredri force-pushed the mafredri/feat-agent-devcontainer-injection-4 branch from 3547372 to 7358ee0

June 6, 2025 11:39

mafredri added 3 commits

June 6, 2025 12:06

implement sub agent url

34aa574

improve doc on container workspace folder, add todo

7a3c8a3

fix coderd and cli tests

eb29bba

mafredri force-pushed the mafredri/feat-agent-devcontainer-injection-4 branch from a8e4495 to eb29bba

June 6, 2025 15:59

fix

aa42ab8

mafredri (Member, Author) commented Jun 6, 2025:

I'm still working on an integration test and the existing mocks are being a PITA (I think those are about sorted now, though). Promoting this to "ready for review" to get some feedback on the approach, @DanielleMaywood @johnstcn.

(Also going to break out the "follow-up PR" tasks into new issues before merging this.)

mafredri marked this pull request as ready for review

June 6, 2025 16:14

johnstcn (Member) reviewed Jun 6, 2025 and left a comment:

I still have to read some more but adding my comments so far.

agent/agentcontainers/api.go (resolved)

agent/agentcontainers/api.go (outdated, resolved)

agent/agentcontainers/api.go

Comment on lines +1093 to +1099

		err := api.dccli.Exec(agentCtx, dc.WorkspaceFolder, dc.ConfigPath, agentPath, []string{"agent"},
			WithContainerID(container.ID),
			WithRemoteEnv(
				"CODER_AGENT_URL="+api.subAgentURL,
				"CODER_AGENT_TOKEN="+agent.AuthToken.String(),
			),
		)

johnstcn (Member), Jun 6, 2025:

Would it make more sense to background this? If the parent agent ends up crashing and being restarted, we'll lose the sub-agents and have to re-inject them. We can keep track of the expected PID in e.g. /.coder-agent/pid

mafredri (Member, Author), Jun 9, 2025:

We could probably background it either on the host or inside the container, but not doing so has some nice properties:

We immediately discover if a sub agent exits/crashes and we could restart immediately (we don't currently)
Job control is simpler (simply cancel the context vs looking up processes and verifying against pid)
With prebuilds, we can exit all sub-agents on claim and re-inject afterwards to ensure a clean slate

For the case where the parent agent crashes, keeping those sub-agents may be a bit hit-and-miss and those dev containers could be affected anyway on agent startup. I'm not aware of agents crashing though so this might not even be a concern we need to be mindful of now?

johnstcn (Member), Jun 9, 2025:

Fair enough!
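
The job-control argument in this thread (cancel a context instead of looking up and verifying a PID) can be illustrated with a small sketch. Everything here is hypothetical and not taken from the PR: the `supervisor` and `proc` types, the string IDs, and `fakeAgent`, which merely stands in for a blocking call such as `api.dccli.Exec`.

```go
package main

import (
	"context"
	"fmt"
	"sync"
)

// proc tracks one foreground sub agent (hypothetical type for this sketch).
type proc struct {
	cancel context.CancelFunc
	done   chan error // receives the run function's result exactly once
}

// supervisor keeps sub agents attached to the parent process.
type supervisor struct {
	mu    sync.Mutex
	procs map[string]*proc
}

func newSupervisor() *supervisor {
	return &supervisor{procs: make(map[string]*proc)}
}

// start launches run under a cancellable context and records it by id.
func (s *supervisor) start(parent context.Context, id string, run func(context.Context) error) {
	ctx, cancel := context.WithCancel(parent)
	p := &proc{cancel: cancel, done: make(chan error, 1)}
	s.mu.Lock()
	s.procs[id] = p
	s.mu.Unlock()
	go func() { p.done <- run(ctx) }()
}

// stop cancels the sub agent's context and waits for it to return.
// No PID lookup or PID file verification is involved.
func (s *supervisor) stop(id string) error {
	s.mu.Lock()
	p, ok := s.procs[id]
	delete(s.procs, id)
	s.mu.Unlock()
	if !ok {
		return fmt.Errorf("sub agent %q is not running", id)
	}
	p.cancel()
	return <-p.done
}

// fakeAgent stands in for a blocking Exec call; it runs until cancelled.
func fakeAgent(ctx context.Context) error {
	<-ctx.Done()
	return ctx.Err()
}

// demo starts and stops one fake sub agent and reports the exit reason.
func demo() string {
	s := newSupervisor()
	s.start(context.Background(), "dev-1", fakeAgent)
	return s.stop("dev-1").Error()
}

func main() {
	fmt.Println(demo()) // prints "context canceled"
}
```

Because the result arrives on `done` as soon as the run function returns, the same channel could also be watched to notice an unexpected exit and restart the sub agent, which the thread notes is not done currently.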

agent/agentcontainers/api.go

Comment on lines +1009 to +1011

		if _, err := api.ccli.ExecAs(ctx, container.ID, "root", "setcap", "cap_net_admin+ep", coderPathInsideContainer); err != nil {
			logger.Warn(ctx, "set CAP_NET_ADMIN on agent binary failed", slog.Error(err))
		}

johnstcn (Member), Jun 6, 2025:

This will probably fail unless the container is running as privileged or has the specific CAP_NET_ADMIN privilege set on the container?

mafredri (Member, Author), Jun 9, 2025:

As per the comment, this is an optional networking boost. (See regular agent bootstrap script, I'll update the comment to reference it.) Did you have some action in mind?

johnstcn (Member), Jun 9, 2025:

We could check for both of these things before trying? Not a blocker though.
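
One way the suggested pre-check might look is to read a capability mask from `/proc/<pid>/status` inside the container and test the CAP_NET_ADMIN bit (bit 12 on Linux) before attempting `setcap`. This is only a sketch: the `hasCap` helper is hypothetical, the choice of the `CapBnd` field is an assumption, and the two masks in `main` are sample values rather than output captured from a real container.

```go
package main

import (
	"bufio"
	"fmt"
	"strconv"
	"strings"
)

// capNetAdmin is the bit index of CAP_NET_ADMIN in Linux capability masks.
const capNetAdmin = 12

// hasCap reports whether bit is set in the named capability field
// (for example "CapBnd" or "CapEff") of /proc/<pid>/status-style text.
func hasCap(status, field string, bit uint) (bool, error) {
	sc := bufio.NewScanner(strings.NewReader(status))
	for sc.Scan() {
		name, value, ok := strings.Cut(sc.Text(), ":")
		if !ok || name != field {
			continue
		}
		// The kernel prints the mask as a hexadecimal number.
		mask, err := strconv.ParseUint(strings.TrimSpace(value), 16, 64)
		if err != nil {
			return false, fmt.Errorf("parse %s: %w", field, err)
		}
		return mask&(1<<bit) != 0, nil
	}
	return false, fmt.Errorf("field %s not found", field)
}

func main() {
	// Sample masks (assumed for illustration): a restricted set without
	// CAP_NET_ADMIN and a full set that includes it.
	restricted := "Name:\tsh\nCapBnd:\t00000000a80425fb\n"
	full := "Name:\tsh\nCapBnd:\t000001ffffffffff\n"

	for _, status := range []string{restricted, full} {
		ok, err := hasCap(status, "CapBnd", capNetAdmin)
		fmt.Println(ok, err)
	}
}
```

A check like this would only decide whether to skip the `setcap` attempt; the existing warning log would still cover cases the check does not anticipate.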

agent/agentcontainers/api.go Outdated

Comment on lines 1002 to 1005

		// Make sure the agent binary is executable so we can run it.
		if _, err := api.ccli.ExecAs(ctx, container.ID, "root", "chmod", "+x", coderPathInsideContainer); err != nil {
			return xerrors.Errorf("set agent binary executable: %w", err)
		}

johnstcn (Member), Jun 6, 2025:

Do we also need to chown the binary so that it's readable by the default container user?

mafredri (Member, Author), Jun 9, 2025:

Good callout. I didn't consider this, but docker cp seems to follow the permissions of the file on disk. So unless we chown it, the ownership could be nonsense within the container (non-existent user, etc.).

It's unlikely that the permissions will be bad for the user (typically 0755), but we could improve it for sure. It might make sense to turn this into a script rather than N amount of docker execs.
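
The "one script instead of N execs" idea could be sketched as a function that assembles the setup steps into a single shell script. This is an illustration only: `buildSetupScript`, `shellQuote`, the example path, and the owner value are hypothetical, and how the script would be delivered to the container (for instance as an argument to `sh -c`) is not shown.

```go
package main

import (
	"fmt"
	"strings"
)

// shellQuote wraps s in single quotes so a POSIX shell treats it as one word.
func shellQuote(s string) string {
	return "'" + strings.ReplaceAll(s, "'", `'\''`) + "'"
}

// buildSetupScript returns one script that fixes ownership and mode of the
// agent binary and then tries to grant CAP_NET_ADMIN without failing the
// whole script if that last step is not permitted.
func buildSetupScript(binPath, owner string) string {
	q := shellQuote(binPath)
	lines := []string{
		"set -eu",
		"chown " + shellQuote(owner) + " " + q,
		"chmod 0755 " + q,
		// Optional networking boost; ignore failure.
		"setcap cap_net_admin+ep " + q + " || true",
	}
	return strings.Join(lines, "\n")
}

func main() {
	// Path and owner are placeholder values for this sketch.
	fmt.Println(buildSetupScript("/tmp/agent-bin", "1000:1000"))
}
```

A single script keeps the required steps (chown, chmod) fatal through `set -eu` while leaving the capability step best-effort, matching how the thread describes the two kinds of operations.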

agent/agentcontainers/api.go


		logger.Info(ctx, "starting subagent in dev container")

		err := api.dccli.Exec(agentCtx, dc.WorkspaceFolder, dc.ConfigPath, agentPath, []string{"agent"},

johnstcn (Member), Jun 6, 2025:

Do we try to execute this as a non-root user?

mafredri (Member, Author), Jun 9, 2025:

AFAIK this will get executed as the remote user configured by devcontainer.json (or if unconfigured, container user), which seems like the correct behavior to me.

agent/agentcontainers/api.go

Comment on lines +879 to +882

		injected := make(map[uuid.UUID]bool, len(api.injectedSubAgentProcs))
		for _, proc := range api.injectedSubAgentProcs {
			injected[proc.agent.ID] = true
		}

johnstcn (Member), Jun 6, 2025:

This could probably be a map[uuid.UUID]struct{} instead, and then below on line 888 just check for _, found := injected[agent.ID]

mafredri (Member, Author), Jun 9, 2025:

I don't foresee the memory savings being necessary here (will we have 1000s of sub agents?). The current form reads better and is simpler to use IMO (I always prefer this form for readability where applicable).
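
For readers comparing the two forms discussed here, a short side-by-side sketch follows. String keys replace `uuid.UUID` to keep the example dependency-free, and the helper names are hypothetical.

```go
package main

import "fmt"

// boolSet builds the map[K]bool form used in the snippet under review.
func boolSet(ids []string) map[string]bool {
	set := make(map[string]bool, len(ids))
	for _, id := range ids {
		set[id] = true
	}
	return set
}

// emptyStructSet builds the map[K]struct{} form suggested in review.
func emptyStructSet(ids []string) map[string]struct{} {
	set := make(map[string]struct{}, len(ids))
	for _, id := range ids {
		set[id] = struct{}{}
	}
	return set
}

func main() {
	ids := []string{"agent-a", "agent-b"}
	b := boolSet(ids)
	e := emptyStructSet(ids)

	// The bool form allows a direct check; missing keys read as false.
	fmt.Println(b["agent-a"], b["agent-z"]) // prints "true false"

	// The struct{} form needs the two-value lookup.
	_, found := e["agent-a"]
	_, missing := e["agent-z"]
	fmt.Println(found, missing) // prints "true false"
}
```

The `struct{}` values occupy no storage, while the `bool` form lets call sites write `if injected[id]` directly, which is the readability trade-off raised in the reply.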

agent/agentcontainers/api.go

Comment on lines +887 to +899

		for _, agent := range agents {
			if injected[agent.ID] {
				continue
			}
			err := api.subAgentClient.Delete(ctx, agent.ID)
			if err != nil {
				api.logger.Error(ctx, "failed to delete agent",
					slog.Error(err),
					slog.F("agent_id", agent.ID),
					slog.F("agent_name", agent.Name),
				)
			}
		}

johnstcn (Member), Jun 6, 2025:

Should we set an upper bound on deletion attempts and raise if more than say 3 attempts fail?

mafredri (Member, Author), Jun 9, 2025:

Are you suggesting silently ignoring failures unless >= 3 fail? Or perhaps adding retry logic?

johnstcn (Member), Jun 9, 2025:

I'm mainly worried about spamming error logs into the void.

mafredri (Member, Author), Jun 9, 2025:

These will be part of the parent agent log 🤔

johnstcn (Member), Jun 9, 2025:

We can leave it as-is for now, but I think if this does start happening frequently (or all the time) it may be difficult to catch if it just goes into the parent agent log.
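
One possible shape for the bounded-attempts idea raised in this thread is sketched below: each deletion is retried a fixed number of times, and failures are collected into a single joined error so the caller can log or surface them once instead of emitting one log line per agent. The function names, string IDs, and the `del` callback are hypothetical stand-ins for `api.subAgentClient.Delete`; no backoff between attempts is included.

```go
package main

import (
	"errors"
	"fmt"
)

// deleteWithRetry calls del up to maxAttempts times and returns the last
// error, wrapped, if every attempt fails.
func deleteWithRetry(id string, maxAttempts int, del func(string) error) error {
	if maxAttempts < 1 {
		maxAttempts = 1
	}
	var err error
	for attempt := 1; attempt <= maxAttempts; attempt++ {
		if err = del(id); err == nil {
			return nil
		}
	}
	return fmt.Errorf("delete %s: gave up after %d attempts: %w", id, maxAttempts, err)
}

// deleteStale removes every id that is not in injected and returns one
// joined error (nil if all deletions succeeded).
func deleteStale(ids []string, injected map[string]bool, maxAttempts int, del func(string) error) error {
	var errs []error
	for _, id := range ids {
		if injected[id] {
			continue
		}
		if err := deleteWithRetry(id, maxAttempts, del); err != nil {
			errs = append(errs, err)
		}
	}
	return errors.Join(errs...)
}

// demo runs deleteStale against a fake client: "a" is injected and skipped,
// "b" fails twice and then succeeds, "c" always fails.
func demo() (calls int, err error) {
	failures := map[string]int{"b": 2, "c": 100}
	del := func(id string) error {
		calls++
		if failures[id] > 0 {
			failures[id]--
			return errors.New("temporary failure")
		}
		return nil
	}
	err = deleteStale([]string{"a", "b", "c"}, map[string]bool{"a": true}, 3, del)
	return calls, err
}

func main() {
	calls, err := demo()
	fmt.Println(calls) // prints 6
	fmt.Println(err)   // prints "delete c: gave up after 3 attempts: temporary failure"
}
```

Returning the joined error leaves the decision about where the failure is reported (parent agent log or somewhere more visible) to the caller, which is the open question at the end of the thread.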