
johnstcn (Member) left a comment

I still have to read some more but adding my comments so far.

agent/agentcontainers/api.go (resolved)

agent/agentcontainers/api.go (outdated, resolved)

Comment on lines 1093 to 1099

		err := api.dccli.Exec(agentCtx, dc.WorkspaceFolder, dc.ConfigPath, agentPath, []string{"agent"},
			WithContainerID(container.ID),
			WithRemoteEnv(
				"CODER_AGENT_URL="+api.subAgentURL,
				"CODER_AGENT_TOKEN="+agent.AuthToken.String(),
			),
		)

johnstcn (Member), Jun 6, 2025

Would it make more sense to background this? If the parent agent ends up crashing and being restarted, we'll lose the sub-agents and have to re-inject them. We can keep track of the expected PID in e.g. /.coder-agent/pid

mafredri (Member, Author), Jun 9, 2025

We could probably background it either on the host or inside the container, but not doing so has some nice properties:

- We immediately discover if a sub agent exits/crashes and we could restart immediately (we don't currently)
- Job control is simpler (simply cancel the context vs looking up processes and verifying against pid)
- With prebuilds, we can exit all sub-agents on claim and re-inject afterwards to ensure a clean slate

For the case where the parent agent crashes, keeping those sub-agents may be a bit hit-and-miss and those dev containers could be affected anyway on agent startup. I'm not aware of agents crashing though so this might not even be a concern we need to be mindful of now?

johnstcn (Member), Jun 9, 2025

Fair enough!

Comment on lines +1009 to +1011

		if _, err := api.ccli.ExecAs(ctx, container.ID, "root", "setcap", "cap_net_admin+ep", coderPathInsideContainer); err != nil {
			logger.Warn(ctx, "set CAP_NET_ADMIN on agent binary failed", slog.Error(err))
		}

johnstcn (Member), Jun 6, 2025

This will probably fail unless the container is running as privileged or has the specific CAP_NET_ADMIN privilege set on the container?

mafredri (Member, Author), Jun 9, 2025

As per the comment, this is an optional networking boost. (See regular agent bootstrap script, I'll update the comment to reference it.) Did you have some action in mind?

johnstcn (Member), Jun 9, 2025

We could check for both of these things before trying? Not a blocker though.

mafredri (Member, Author), Jun 9, 2025

Sure, I don't think it's very high priority but let's create a ticket for future enhancement. 👍🏻

mafredri (Member, Author), Jun 9, 2025

coder/internal#683
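
One possible shape for the "check before trying" idea in this thread is to read a capability mask from `/proc/<pid>/status` inside the container and test the CAP_NET_ADMIN bit (bit 12 on Linux). The sketch below only parses such a dump; `hasCapability` is a hypothetical helper, how the status text is fetched from the container is left out, and which set to inspect (`CapBnd`, `CapEff`, ...) is an assumption, not something the thread settles.

```go
package main

import (
	"bufio"
	"fmt"
	"strconv"
	"strings"
)

// capNetAdmin is the bit index of CAP_NET_ADMIN in Linux capability sets.
const capNetAdmin = 12

// hasCapability (hypothetical helper) reports whether the given capability bit
// is set in the named set (for example "CapBnd" or "CapEff") of the contents
// of /proc/<pid>/status.
func hasCapability(status, set string, bit uint) (bool, error) {
	sc := bufio.NewScanner(strings.NewReader(status))
	for sc.Scan() {
		key, val, ok := strings.Cut(sc.Text(), ":")
		if !ok || key != set {
			continue
		}
		// The kernel prints each capability set as a hexadecimal bitmask.
		mask, err := strconv.ParseUint(strings.TrimSpace(val), 16, 64)
		if err != nil {
			return false, fmt.Errorf("parse %s: %w", set, err)
		}
		return mask&(1<<bit) != 0, nil
	}
	return false, fmt.Errorf("%s not found in status", set)
}

func main() {
	// Sample input: a mask often seen in containers started with Docker's
	// default capabilities, which do not include CAP_NET_ADMIN.
	status := "Name:\tsh\nCapBnd:\t00000000a80425fb\n"
	ok, err := hasCapability(status, "CapBnd", capNetAdmin)
	fmt.Println(ok, err)
}
```

A caller could skip the `setcap` attempt, or downgrade the log level, when the bit is absent.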

Comment on lines 1002 to 1005

		// Make sure the agent binary is executable so we can run it.
		if _, err := api.ccli.ExecAs(ctx, container.ID, "root", "chmod", "+x", coderPathInsideContainer); err != nil {
			return xerrors.Errorf("set agent binary executable: %w", err)
		}

johnstcn (Member), Jun 6, 2025

Do we also need to chown the binary so that it's readable by the default container user?

mafredri (Member, Author), Jun 9, 2025

Good callout. I didn't consider this, but docker cp seems to follow the permissions of the file on disk. So unless we chown it, the ownership could be nonsense within the container (non-existent user, etc).

It's unlikely that the permissions will be bad for the user (typically 0755), but we could improve it for sure. It might make sense to turn this into a script rather than N docker execs.
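
To make the "script rather than N execs" idea concrete, here is a minimal sketch of building one `sh -c` invocation that does both the chown and the chmod. `permissionFixArgs`, the path, and the owner value are all hypothetical; the PR's actual commands may differ.

```go
package main

import "fmt"

// permissionFixArgs (hypothetical helper) returns the argv for a single
// `sh -c` call that sets the owner of the binary and makes it readable and
// executable by everyone. The path and owner are passed as positional
// parameters ($1, $2) so they never need shell quoting.
func permissionFixArgs(path, owner string) []string {
	const script = `chown "$2" "$1" && chmod 0755 "$1"`
	// The word after the script becomes $0; path and owner become $1 and $2.
	return []string{"sh", "-c", script, "sh", path, owner}
}

func main() {
	// Example values only; both are assumptions for the demo.
	args := permissionFixArgs("/tmp/agent-binary", "0:0")
	fmt.Printf("%q\n", args)
}
```

The resulting slice could then be handed to whatever exec helper runs commands in the container, replacing two round trips with one.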


		logger.Info(ctx, "starting subagent in dev container")

		err := api.dccli.Exec(agentCtx, dc.WorkspaceFolder, dc.ConfigPath, agentPath, []string{"agent"},

johnstcn (Member), Jun 6, 2025

Do we try to execute this as a non-root user?

mafredri (Member, Author), Jun 9, 2025

AFAIK this will get executed as the remote user configured by devcontainer.json (or if unconfigured, container user), which seems like the correct behavior to me.

Comment on lines +879 to +882

		injected := make(map[uuid.UUID]bool, len(api.injectedSubAgentProcs))
		for _, proc := range api.injectedSubAgentProcs {
			injected[proc.agent.ID] = true
		}

johnstcn (Member), Jun 6, 2025

This could probably be a map[uuid.UUID]struct{} instead, and then below on line 888 just check for _, found := injected[agent.ID]

mafredri (Member, Author), Jun 9, 2025

I don't foresee the memory savings being necessary here (will we have 1000s of sub agents?). The current form reads better and is simpler to use IMO (I always prefer this form for readability where applicable).
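
For readers weighing the two forms discussed in this thread, the sketch below puts them side by side. It uses string keys instead of `uuid.UUID` to stay dependency-free, and the helper names are made up for the example.

```go
package main

import "fmt"

// boolSet builds the form used in the PR: lookups read as a plain boolean.
func boolSet(ids []string) map[string]bool {
	set := make(map[string]bool, len(ids))
	for _, id := range ids {
		set[id] = true
	}
	return set
}

// emptyStructSet builds the suggested form: values occupy zero bytes and
// membership is tested with the comma-ok idiom.
func emptyStructSet(ids []string) map[string]struct{} {
	set := make(map[string]struct{}, len(ids))
	for _, id := range ids {
		set[id] = struct{}{}
	}
	return set
}

func main() {
	ids := []string{"a", "b"}
	asBool := boolSet(ids)
	asStruct := emptyStructSet(ids)

	for _, id := range []string{"a", "c"} {
		_, found := asStruct[id]
		fmt.Println(id, asBool[id], found)
	}
}
```

Both answer the membership question identically; the difference is a byte per entry versus a slightly shorter lookup expression.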

Comment on lines +887 to +899

		for _, agent := range agents {
			if injected[agent.ID] {
				continue
			}
			err := api.subAgentClient.Delete(ctx, agent.ID)
			if err != nil {
				api.logger.Error(ctx, "failed to delete agent",
					slog.Error(err),
					slog.F("agent_id", agent.ID),
					slog.F("agent_name", agent.Name),
				)
			}
		}

johnstcn (Member), Jun 6, 2025

Should we set an upper bound on deletion attempts and raise if more than say 3 attempts fail?

mafredri (Member, Author), Jun 9, 2025

Are you suggesting silently ignoring failures unless >= 3 fail? Or perhaps adding retry logic?

johnstcn (Member), Jun 9, 2025

I'm mainly worried about spamming error logs into the void.

mafredri (Member, Author), Jun 9, 2025

These will be part of the parent agent log 🤔

johnstcn (Member), Jun 9, 2025

We can leave it as-is for now, but I think if this does start happening frequently (or all the time) it may be difficult to catch if it just goes into the parent agent log.

mafredri added 3 commits

June 9, 2025 08:49

skip test on win

fb4cdad

fix review comments

cf17cd4

ensure agent binary permissions owner/o+rx

780483b

mafredri force-pushed the mafredri/feat-agent-devcontainer-injection-4 branch from 466bc6b to 780483b

June 9, 2025 09:30

johnstcn approved these changes

Jun 9, 2025

DanielleMaywood approved these changes

Jun 9, 2025

Check dev container (container) properties before attempting to modify CAP_NET_ADMIN coder/internal#683

mafredri added 2 commits

June 9, 2025 11:14

update cap net admin comment

934a222

implement fake agent api sub agent methods

56c7ceb

mafredri mentioned this pull request

Jun 9, 2025

Open

mafredri added 4 commits

June 9, 2025 11:27

do not set workspace folder if container id

591a9bf

add WithContainerLabelIncludeFilter

9afa5ea

add sub agent env and revert container id change

050177b

add sub agent as part of autostart integration test

d5eb3fc

mafredri force-pushed the mafredri/feat-agent-devcontainer-injection-4 branch from abe9116 to d5eb3fc

June 9, 2025 16:04

fixup! add sub agent env and revert container id change

1629bee

mafredri (Member, Author) commented Jun 9, 2025

@DanielleMaywood @johnstcn I've added WithContainerLabelIncludeFilter to filter out injection in tests and prevent them from interfering with non-test dev containers.

I also added WithSubAgentEnv to update the autostart integration test in the agent package. It now verifies that a sub agent is started as well.
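
The two options named in this comment follow Go's functional-options pattern. The sketch below shows how options of that shape are commonly wired up; only the two option names come from the PR, while the `config` type, the `(label, value)` parameters, and `shouldInject` are assumptions made for illustration.

```go
package main

import "fmt"

// config holds settings collected from options. The fields are illustrative
// and not taken from the PR.
type config struct {
	includeLabels map[string]string
	subAgentEnv   []string
}

// Option mutates config; this is the usual functional-options shape.
type Option func(*config)

// WithContainerLabelIncludeFilter limits injection to containers carrying
// label=value. The (label, value) signature is an assumption.
func WithContainerLabelIncludeFilter(label, value string) Option {
	return func(c *config) {
		if c.includeLabels == nil {
			c.includeLabels = make(map[string]string)
		}
		c.includeLabels[label] = value
	}
}

// WithSubAgentEnv appends KEY=VALUE entries intended for the sub agent.
func WithSubAgentEnv(env ...string) Option {
	return func(c *config) { c.subAgentEnv = append(c.subAgentEnv, env...) }
}

func newConfig(opts ...Option) *config {
	c := &config{}
	for _, opt := range opts {
		opt(c)
	}
	return c
}

// shouldInject reports whether a container with these labels passes the filter.
func (c *config) shouldInject(labels map[string]string) bool {
	for k, v := range c.includeLabels {
		if labels[k] != v {
			return false
		}
	}
	return true
}

func main() {
	c := newConfig(
		WithContainerLabelIncludeFilter("example.test", "1"), // hypothetical label
		WithSubAgentEnv("CODER_TEST_RUN_SUB_AGENT_MAIN=1"),
	)
	fmt.Println(c.shouldInject(map[string]string{"example.test": "1"}))
	fmt.Println(c.shouldInject(map[string]string{}))
	fmt.Println(c.subAgentEnv)
}
```

With no filter configured, every container passes, so production behavior would be unchanged while tests opt in to a narrower set.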

mafredri added 2 commits

June 9, 2025 16:27

fixup! add sub agent as part of autostart integration test

67ee0c5

fixup! fixup! add sub agent env and revert container id change

757dc85

DanielleMaywood approved these changes

johnstcn approved these changes

agent/agent_test.go

		token := os.Getenv("CODER_AGENT_TOKEN")
		if url == "" || token == "" {
			_, _ = fmt.Fprintln(os.Stderr, "CODER_AGENT_URL and CODER_AGENT_TOKEN must be set")
			return 10

johnstcn (Member), Jun 10, 2025

Can we name these specific status codes as something more meaningful to human eyes?

mafredri (Member, Author), Jun 10, 2025

They don't really have a meaning; they're just something to differentiate the states. I started at 10 since I got tired of bumping everything as I added more stuff 😅. The println should hopefully be helpful here.

agent/agent_test.go

		}
		defer r.Body.Close()

		t.Logf("Sub-agent request payload received: %+v", payload)

johnstcn (Member), Jun 10, 2025

suggestion: do we perhaps want to allow the caller to run some function against the payload?

mafredri (Member, Author), Jun 10, 2025

We send it on the channel and do some verification already 👍🏻

agent/agent_test.go

Comment on lines +2178 to +2181

		// The agent will copy "itself", but in the case of this test, the
		// agent is actually this test binary. So we'll tell the test binary
		// to execute the sub-agent main function via this env.
		agentcontainers.WithSubAgentEnv("CODER_TEST_RUN_SUB_AGENT_MAIN=1"),

johnstcn (Member), Jun 10, 2025

use the correct directory for the sub agent

dc7f7c3

mafredri (Member, Author) commented Jun 10, 2025

One last addendum: implemented a quick 'n dirty pwd check to get the directory using devcontainer exec. Noticed the hard-coded path wasn't really working out in many cases.

DanielleMaywood approved these changes

mafredri merged commit fca9917 into main

31 checks passed

mafredri deleted the mafredri/feat-agent-devcontainer-injection-4 branch

June 10, 2025 09:37

github-actions bot locked and limited conversation to collaborators