
Try to use pidfd and epoll to wait for init process exit #4517


Open

lifubang wants to merge 2 commits into opencontainers:main from lifubang:kill-via-pidfd

Conversation

@lifubang (Member) commented Nov 9, 2024 (edited)

This PR does some optimizations for runc delete -f.

  1. Nowadays, the Go runtime tries to use unix.PidfdSendSignal to send a signal to a process;
    this helps reduce the risk of pid-reuse attacks. So we should replace unix.Kill with
    os.Process.Signal in runc where possible.
  2. But os.Process.Wait can only wait for a child process; to wait for an unrelated process, we
    should introduce pidfd & epoll to reduce the sleep time when we want to detect whether the init
    process has exited.
  3. For kernels that don't support the pidfd & epoll solution, we fall back to the traditional
    unix.Kill solution, but for stopped containers or containers running on a lightly loaded machine,
    we don't need to wait 100ms before the next check.

Closes: #4512

@lifubang (Member, Author)

@abel-von PTAL

@kolyshkin (Contributor)

It seems that the first commit can be merged now and is definitely an improvement.

For the rest of it, give me a few days to review.

@kolyshkin (Contributor)

Reviewing this reminded me of the next step needed for pidfd support in Go, so I wrote this proposal: golang/go#70352


@lifubang (Member, Author)

> Reviewing this reminded me of the next step needed for pidfd support in Go, so I wrote this proposal: golang/go#70352

Wonderful proposal! I used to think that Go wouldn't support similar interfaces, but I think it's very useful; looking forward to its arrival.

@abel-von

LGTM

@kolyshkin (Contributor)

@lifubang are you going to keep working on it? This looks very good overall

@lifubang (Member, Author)

> @lifubang are you going to keep working on it? This looks very good overall

Thanks, I'll work on it later.

lifubang force-pushed the kill-via-pidfd branch 2 times, most recently from cc64599 to 16ae7fc, December 8, 2024 14:34
@kolyshkin (Contributor) left a comment

Frankly, I'm having a hard time reviewing this because it's hard to wrap my head around all the kill/destroy logic that we already have in place:

  • container.Signal;
  • the terminate method of parentProcess;
  • destroy(c *Container);
  • etc.

All this seems like a bunch of code full of special cases and kludges, and (in my eyes) it cries out to be refactored to be more straightforward and clean. Maybe I'm wrong, but this stands in the way of me reviewing this.

As I can't finish this, here are my WIP review bits and pieces:

  1. The logic added might also be useful from func destroy(c *Container) error.

  2. Using epoll on pidfds can also be useful when there are multiple pids (i.e. from signalAllProcesses, in the current code).

  3. This adds a new public method (container.Kill) when we already have container.Signal. I understand why, but maybe we should call it container.KillSync (or container.EnsureKilled) instead (so it's clear it doesn't just send the SIGKILL but also waits for the container to be killed). A note referencing the new method should be added to container.Signal. It should also be described in libcontainer's README.

@kolyshkin (Contributor)

This would be very nice to have in 1.3 but we might not have enough time to have it ready by 1.3-rc1.

@lifubang (Member, Author)

> 2. Using epoll on pidfds can also be useful when there are multiple pids (i.e. from signalAllProcesses, in the current code).

I think it's not worth doing:

  1. We had no killed-check mechanism for killing shared-pidns containers before;
  2. I think cgroup v1 will be deprecated in the near future, and we have switched to using cgroup.kill in cgroup v2.
    WDYT @kolyshkin

If we don't want to introduce a wait mechanism in signalAllProcesses, I think I can complete this PR shortly.

lifubang force-pushed the kill-via-pidfd branch 2 times, most recently from edc621d to 5f102d8, February 25, 2025 03:46
@kolyshkin (Contributor)

> I think it's not worth doing:
>
>   1. We had no killed-check mechanism for killing shared-pidns containers before;
>   2. I think cgroup v1 will be deprecated in the near future, and we have switched to using cgroup.kill in cgroup v2.
>     WDYT @kolyshkin
>
> If we don't want to introduce a wait mechanism in signalAllProcesses, I think I can complete this PR shortly.

I agree, let's drop the signalAllProcesses part (frankly, I don't remember why I thought it would be useful in there).

@kolyshkin (Contributor)

Will try to review the rest of it tomorrow.

Comment on lines 459 to 460

// We don't need unix.PidfdSendSignal because go runtime will use it if possible.
_ = c.Signal(unix.SIGKILL)
Contributor:

Maybe use c.signal here, as we made sure that we have a private pidns.

Maybe don't ignore the error, because otherwise you might be waiting 10 seconds for something that will never happen.

Or, you can be explicit and use PidfdSendSignal here to be 100% sure that it succeeded.

@kolyshkin (Contributor) left a comment

LGTM, thanks!

@rata PTAL

@rata (Member)

I'm on PTO, will take a look next week if it's not already merged ;)

@rata (Member) left a comment (edited)

I left several comments. I feel like I'm missing something regarding the relation between pidfd and pidns. Sorry if that is the case :-/

Comment on lines 466 to 468

n, err := unix.EpollWait(epollfd, events, 10000)
if err != nil {
    if err == unix.EINTR {
Member:

Can you add a wrapper for unix.EpollWait in libcontainer/internal/linux, using the retryOnEINTR helper, so we don't have to do it here?

Member (Author):

I'll rebase to add this based on #4697.

Comment on lines 503 to 507

err := c.killViaPidfd()
if err == nil {
Member:

We need a linter to automatically suggest using the single-line if here: if err := ...; err == nil { ...

Member (Author):

It can't use a single-line if to check the err, because we have to log the err and continue.

Member:

But you can check that there are no errors, IOW err == nil. It's exactly the same.

What am I missing?

Member (Author):

We need to print this error on the next line, but anyway, I have declared the err as a block-scoped var now.

Member:

That is okay. I think using an else is even simpler. Or something like:

if err := c.Kill...(); err != nil { logrus.Debugf("pidfd....") } else { return nil }

defer unix.Close(epollfd)

event := unix.EpollEvent{
    Events: unix.EPOLLIN,
Member:

I think doing epoll with only one fd is kind of overkill, I wonder if there are simpler solutions. But if, as I mentioned in another comment, we can kill all the processes in the cgroup, then the epoll might be worth it?

return errors.New("container init still running")
}

if n > 0 {
Member:

Is there a case where n < 0? I mean, do we need this if here?

Member (Author):

I'll refactor to cover this condition.

}

func (c *Container) killViaPidfd() error {
    pidfd, err := unix.PidfdOpen(c.initProcess.pid(), 0)
Member:

I know the old code was killing only the init process. But would it make sense to kill all the processes in the cgroup instead?

We can do it as another PR, if that makes sense.

Member (Author):

As you mentioned above, we can consider this order:

  1. Use cgroup.kill if the kernel supports it
  2. Use pidfd if the kernel supports it
  3. Just send a signal otherwise

But I think we still need to consider whether the container has a private pid ns or not. What I think is that it's reasonable to kill only the init process for a container with a private pid ns.

Member:

Taking into account this: #4517 (comment).

It seems simpler to kill pid1 if it has its own pidns. Otherwise, cgroup.kill or, if not supported, send a signal.

Does it make sense?

Member (Author):

I don't think we need to use cgroup.kill to kill all the processes in the cgroup if pid1 has its own private pidns. We only need to kill the exact pid1 process.

Member:

Exactly what I said in the last comment :)


// EnsureKilled kills the container and waits for the kernel to finish killing it.
func (c *Container) EnsureKilled() error {
    // When a container doesn't have a private pidns, we have to kill all processes
Member:

s/doesn't/does/?

Member (Author):

no

Member:

Opening again, sorry, but the if checks that it does have a private PID namespace. The comment says it doesn't. Something seems odd. Am I missing something?


delete.go (outdated)
Comment on lines 61 to 68

// When --force is given, we kill all container processes and
// then destroy the container. This is done even for a stopped
// container, because (in case it does not have its own PID
// namespace) there may be some leftover processes in the
// container's cgroup.
if force {
    return killAndDestroy(container)
}
s, err := container.Status()
if err != nil {
    return err
}
switch s {
case libcontainer.Stopped:
    return container.Destroy()
// For a stopped container, because (in case it does not have
// its own PID namespace) there may be some leftover processes
// in the container's cgroup.
if !container.Config().Namespaces.IsPrivate(configs.NEWPID) {
    if err := container.EnsureKilled(); err != nil {
        return err
    }
}
Member:

I can't wrap my head around this. Before this PR, on force we were using container.Signal(). Now, if a container is stopped and doesn't have a private pidns, we don't do anything here, and at the end we do container.Destroy().

The commit msg doesn't mention anything about this. Am I missing something?

Member (Author):

Yes, it's a mistake, removed now. Thanks.

lifubang marked this pull request as draft, April 8, 2025 10:27
lifubang and others added 2 commits, April 11, 2025 02:59

Signed-off-by: lifubang <lifubang@acmcoder.com>

When using unix.Kill to kill the container, we need a for loop to manually detect whether the init process has exited; currently we sleep 100ms each time, but for stopped containers or containers running on a lightly loaded machine, we don't need to wait that long. This change reduces the delete delay in some situations, especially for pods with many containers in them.
Co-authored-by: Abel Feng <fshb1988@gmail.com>
Signed-off-by: lifubang <lifubang@acmcoder.com>
lifubang marked this pull request as ready for review, April 11, 2025 07:38
    return unix.EpollWait(epfd, events, msec)
})
if err != nil {
    return 0, os.NewSyscallError("epollwait", err)
Member:

epoll_wait returns -1 on error. Let's return -1 here too.



Comment on lines +515 to +521

if err = c.killViaPidfd(); err == nil {
    return nil
}

logrus.Debugf("pidfd & epoll failed, falling back to unix.Signal: %v", err)
}
return c.kill()
Member:

I can't wrap my head around this either. Why don't we use a pidfd in c.Signal() (if it is available; if it's not, we fall back to sending a signal to the pid number) and create a new function to wait for the process to die? Again, if a pidfd is available we wait on that; if it's not, we fall back to waiting as we do now.

Having a (public/exported) Kill() and KillViaPidfd() seems like something that should be abstracted.

What we are doing now seems kind of complex:

  • If it has a pidns, then try to kill pid1 with a pidfd
  • If it doesn't have a pidns, then try to kill the process by sending a signal to the pid number (even if pidfd is supported, why?). The function we call here also handles the case of having a private pidns, which makes this more tricky.

If what I propose doesn't seem okay, I'm open to other ways to simplify this :)

@rata (Member)

@lifubang if you address the comments and want another review, please comment or request a review via GitHub; that greatly helps me know when it is ready for another round. Thanks!


Reviewers

@rata requested changes
@kolyshkin approved these changes
@AkihiroSuda awaiting requested review

Requested changes must be addressed to merge this pull request.


4 participants: @lifubang, @kolyshkin, @abel-von, @rata
