Better errors from runc init #4928

Merged
lifubang merged 1 commit into opencontainers:main from kolyshkin:better-init-errors
Nov 22, 2025

Conversation

@kolyshkin (Contributor) commented Oct 12, 2025 (edited)

This currently includes #4930 (and serves as a test for it). Draft until that one is merged.

This currently includes #4951 and is therefore a draft until #4951 is merged.

Inspired by the discussion in #4905.

If an early stage of runc init (nsenter) fails for some reason, it
logs the error(s) at the FATAL log level, via bail().

The runc init log is read by the parent (runc create/run/exec) and is
logged via the normal logrus mechanism, which is all fine and dandy, except
that when runc init fails, we return the error from the parent (which is
usually not too helpful, for example):

runc run failed: unable to start container process: can't get final child's PID from pipe: EOF

Now, the actual underlying error is from runc init and it was logged
earlier; here's what the full runc output looks like:

FATA[0000] nsexec-1[3247792]: failed to unshare remaining namespaces: No space left on device
FATA[0000] nsexec-0[3247790]: failed to sync with stage-1: next state
ERRO[0000] runc run failed: unable to start container process: can't get final child's PID from pipe: EOF

The problem is, upper-level runtimes tend to ignore everything except
the last line from runc, and thus the error reported by e.g. docker is not
very helpful.

This patch tries to improve the situation by collecting FATAL errors
from runc init and appending those to the error returned (instead of
logging). With it, the above error will look like this:

ERRO[0000] runc run failed: unable to start container process: can't get final child's PID from pipe: EOF; runc init error(s): nsexec-1[141549]: failed to unshare remaining namespaces: No space left on device; nsexec-0[141547]: failed to sync with stage-1: next state

Yes, it is long and ugly, but at least the upper level runtime will report it.

Fixes: #4905

@kolyshkin force-pushed the better-init-errors branch 2 times, most recently from 08fb065 to 0200b76 (October 13, 2025 19:01)
@kolyshkin marked this pull request as draft (October 13, 2025 22:41)
@kolyshkin marked this pull request as ready for review (October 14, 2025 00:05)
@kolyshkin marked this pull request as draft (October 14, 2025 18:47)
@kolyshkin marked this pull request as ready for review (October 15, 2025 23:02)
@kolyshkin requested review from AkihiroSuda, cyphar, lifubang and rata and removed request for cyphar (October 15, 2025 23:02)
@kolyshkin force-pushed the better-init-errors branch 2 times, most recently from abf4958 to ef31851 (October 24, 2025 01:48)

@rata (Member) commented

@kolyshkin The extra path (the one not present in the other mentioned PRs) LGTM. But would that print the libcrypto issue? I mean, is the go panic forwarded?

This panic you posted in this issue, for example: #4916 (comment)

It seems packages.microsoft.com is down now, I can't easily test myself (Yeah, I'm sending some messages, but they are probably aware already :)). If you still have that install handy, it will be great if you can test it :)

@kolyshkin marked this pull request as ready for review (October 29, 2025 16:55)

@kolyshkin (Contributor, Author) commented

> @kolyshkin The extra path (the one not present in the other mentioned PRs) LGTM. But would that print the libcrypto issue? I mean, is the go panic forwarded?

Alas, no. This PR is about the C code of runc init (i.e. libct/nsenter).

You can emulate the libcrypto error by adding a "panic" call into libcontainer.Init, and think of ways to catch that in the parent. I thought about it a bit, and haven't found an easy way to catch that. This is because we redirect runc init stdout/stderr to our own stdout/stderr.

rata reacted with thumbs up emoji

@kolyshkin added the backport/1.4-todo label (A PR in main branch which needs to backported to release-1.4) (Nov 5, 2025)
@kolyshkin force-pushed the better-init-errors branch 2 times, most recently from a2a30fe to b7b4ebc (November 10, 2025 23:44)
@kolyshkin force-pushed the better-init-errors branch 2 times, most recently from 1332bee to 3de1348 (November 12, 2025 05:37)

@cyphar (Member) left a comment

LGTM, aside from the trailing space nit.

@kolyshkin force-pushed the better-init-errors branch 2 times, most recently from f7a66d5 to 4a2b6b3 (November 13, 2025 18:35)
In case early stage of runc init (nsenter) fails for some reason, it
logs error(s) with FATAL log level, via bail().

The runc init log is read by a parent (runc create/run/exec) and is
logged via normal logrus mechanism, which is all fine and dandy, except
when `runc init` fails, we return the error from the parent (which is
usually not too helpful, for example):

runc run failed: unable to start container process: can't get final child's PID from pipe: EOF

Now, the actual underlying error is from runc init and it was logged
earlier; here's how full runc output looks like:

FATA[0000] nsexec-1[3247792]: failed to unshare remaining namespaces: No space left on device
FATA[0000] nsexec-0[3247790]: failed to sync with stage-1: next state
ERRO[0000] runc run failed: unable to start container process: can't get final child's PID from pipe: EOF

The problem is, upper level runtimes tend to ignore everything except
the last line from runc, and thus error reported by e.g. docker is not
very helpful.

This patch tries to improve the situation by collecting FATAL errors
from runc init and appending those to the error returned (instead of
logging). With it, the above error will look like this:

ERRO[0000] runc run failed: unable to start container process: can't get final child's PID from pipe: EOF; runc init error(s): nsexec-1[141549]: failed to unshare remaining namespaces: No space left on device; nsexec-0[141547]: failed to sync with stage-1: next state

Yes, it is long and ugly, but at least the upper level runtime will
report it.

Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
@lifubang merged commit a7a402a into opencontainers:main (Nov 22, 2025)
37 checks passed
@cyphar added the backport/1.4-done label (A PR in main branch which has been backported to release-1.4) and removed the backport/1.4-todo label (A PR in main branch which needs to backported to release-1.4) (Nov 26, 2025)

Reviewers

@lifubang approved these changes

@cyphar approved these changes

@rata: awaiting requested review

@AkihiroSuda: awaiting requested review

Assignees

No one assigned

Labels

backport/1.4-done: A PR in main branch which has been backported to release-1.4

Projects

None yet

Milestone

No milestone

Development

Successfully merging this pull request may close these issues.

Error when starting the containers: "can't get final child's PID from pipe"

4 participants

@kolyshkin, @rata, @lifubang, @cyphar
