Movatterモバイル変換


[0]ホーム

URL:


Skip to content

Navigation Menu

Sign in
Appearance settings

Search code, repositories, users, issues, pull requests...

Provide feedback

We read every piece of feedback, and take your input very seriously.

Saved searches

Use saved searches to filter your results more quickly

Sign up
Appearance settings

Fix probe behavior for non-running containers and sidecars#133055

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to ourterms of service andprivacy statement. We’ll occasionally send you account related emails.

Already on GitHub?Sign in to your account

Closed

Conversation

AadiDev005
Copy link

Fix probe behavior for non-running containers and sidecars
Bug Description
The probe worker in worker.go mishandles liveness and startup probes for non-running containers and sidecar scenarios. This causes test failures in TestDoProbe, particularly for Liveness-4, Startup-0, Startup-4, Startup-5, Startup-6, Startup-7, Startup-8, and Startup-9. The issue seems tied to how doProbe processes container states (e.g., non-running, terminated) and sidecar restart policies.
Changes Made

Updated worker.go to:
Return false for liveness probes on non-running containers (not in Running state).
Set results.Unknown for startup probes on non-running containers, stopping with results.Success when Started=true.
Add sidecar-specific logic to respect restartPolicy: Always (related to#132826).

Adjusted logging to use klog.V(4).Infof for debugging.

Remaining Test Failures
The following tests still fail (see attached test_output.txt):

Liveness-4: Expects continue == true for a non-running, non-terminated container, but gets false. Should liveness probes continue in this case?
Startup-0: Expects Success, but gets UNKNOWN for a non-running container. Should startup probes assume success earlier?
Startup-4: Expects Failure, but gets UNKNOWN for a non-running container. Is this a test expectation mismatch?
Startup-5: Expects continue == false for a terminated container, but gets true. Likely tied to sidecar restart logic.
Startup-6, Startup-7: Expect Success, but get UNKNOWN for non-running containers. Similar to Startup-0.
Startup-8: Expects continue == false for a terminated container, but gets true. Same as Startup-5.
Startup-9: Expects continue == true, but gets false when Started=true. Should startup probes continue after success?

Linker Warning
The build shows: ld: warning: search path '/usr/local/libtorch/lib' not found. It doesn’t seem to affect the tests, but is this safe to ignore, or should I update the build config to remove this path?
Reviewers

@gjkim42
@SergeyKanzhelev

Questions

Is the doProbe logic correct for liveness probes on non-running containers (returning false)?
Should startup probes return Success or Failure instead of UNKNOWN for non-running containers?
For sidecar containers with restartPolicy: Always, should probes continue after termination?
Any suggestions to fix the remaining test failures?

Logs

Attached test_output.txt with debug logs from worker.go.

@k8s-ci-robotk8s-ci-robot added the do-not-merge/work-in-progressIndicates that a PR should not merge because it is a work in progress. labelJul 18, 2025
@k8s-ci-robot
Copy link
Contributor

Adding the "do-not-merge/release-note-label-needed" label because no release-note block was detected, please follow ourrelease note process to remove it.

Instructions for interacting with me using PR comments are availablehere. If you have questions or suggestions related to my behavior, please file an issue against thekubernetes-sigs/prow repository.

@k8s-ci-robotk8s-ci-robot added do-not-merge/release-note-label-neededIndicates that a PR should not merge because it's missing one of the release note labels. size/XXLDenotes a PR that changes 1000+ lines, ignoring generated files. cncf-cla: yesIndicates the PR's author has signed the CNCF CLA. do-not-merge/needs-kindIndicates a PR lacks a `kind/foo` label and requires one. labelsJul 18, 2025
@k8s-ci-robot
Copy link
Contributor

PR needs rebase.

Instructions for interacting with me using PR comments are availablehere. If you have questions or suggestions related to my behavior, please file an issue against thekubernetes-sigs/prow repository.

@k8s-ci-robotk8s-ci-robot added needs-rebaseIndicates a PR cannot be merged because it has merge conflicts with HEAD. do-not-merge/needs-sigIndicates an issue or PR lacks a `sig/foo` label and requires one. labelsJul 18, 2025
@k8s-ci-robot
Copy link
Contributor

This issue is currently awaiting triage.

If a SIG or subproject determines this is a relevant issue, they will accept it by applying thetriage/accepted label and provide further guidance.

Thetriage/accepted label can be added by org members by writing/triage accepted in a comment.

Instructions for interacting with me using PR comments are availablehere. If you have questions or suggestions related to my behavior, please file an issue against thekubernetes-sigs/prow repository.

@k8s-ci-robotk8s-ci-robot added needs-triageIndicates an issue or PR lacks a `triage/foo` label and requires one. needs-ok-to-testIndicates a PR that requires an org member to verify it is safe to test. labelsJul 18, 2025
@k8s-ci-robot
Copy link
Contributor

Hi@AadiDev005. Thanks for your PR.

I'm waiting for akubernetes member to verify that this patch is reasonable to test. If it is, they should reply with/ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors shouldjoin the org to skip this step.

Once the patch is verified, the new status will be reflected by theok-to-test label.

I understand the commands that are listedhere.

Instructions for interacting with me using PR comments are availablehere. If you have questions or suggestions related to my behavior, please file an issue against thekubernetes-sigs/prow repository.

@k8s-ci-robotk8s-ci-robot added the needs-priorityIndicates a PR lacks a `priority/foo` label and requires one. labelJul 18, 2025
@k8s-ci-robot
Copy link
Contributor

[APPROVALNOTIFIER] This PR isNOT APPROVED

This pull-request has been approved by:AadiDev005
Once this PR has been reviewed and has the lgtm label, please assignsoltysh,yujuhong for approval. For more information seethe Code Review Process.

The full list of commands accepted by this bot can be foundhere.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing/approve in a comment
Approvers can cancel approval by writing/approve cancel in a comment

@k8s-ci-robotk8s-ci-robot added area/dependencyIssues or PRs related to dependency changes area/kubelet sig/cliCategorizes an issue or PR as relevant to SIG CLI. sig/nodeCategorizes an issue or PR as relevant to SIG Node. labelsJul 18, 2025
@k8s-ci-robotk8s-ci-robot removed the do-not-merge/needs-sigIndicates an issue or PR lacks a `sig/foo` label and requires one. labelJul 18, 2025
@github-project-automationgithub-project-automationbot moved this toNeeds Triage inSIG CLIJul 18, 2025
@AadiDev005
Copy link
Author

Hi@gjkim42,

I’ve been working on fixing the sidecar startup probe issue (#132826) where the main container starts despite a sidecar failing its startup probe when restartPolicy: Never is set for the pod. I updated worker.go to include the isRestartableInitContainer check and modified the probe logic to continue probing sidecar containers with restartPolicy: Always after restarts. I also adjusted liveness and startup probe handling for non-running/terminated containers to fix test failures.

However, I’m stuck with persistent TestDoProbe failures:

Liveness-4: Expects continue == true for a non-running, non-terminated container, but gets false.
Startup-0, Startup-4, Startup-6, Startup-7: Expect Success or Failure, but get UNKNOWN for non-running containers.
Startup-5, Startup-8: Expect continue == false for terminated containers, but get true.
Startup-9: Expects continue == true when Started=true, but gets false.
I’ve attached the test output (test_output.txt) from running go test -timeout 30s -run ^TestDoProbe$ k8s.io/kubernetes/pkg/kubelet/prober. There’s also a linker warning (ld: warning: search path '/usr/local/libtorch/lib' not found) that I’m unsure how to resolve.

Could you help clarify:

Should liveness probes continue (true) for non-running, non-terminated containers?
Should startup probes return Success or Failure instead of UNKNOWN for non-running containers?
For sidecar containers, is the logic to continue probing after termination correct?
Any advice on fixing the test failures or the linker warning?
Thanks for your guidance!
test_output.txt

Copy link
Member

@gjkim42gjkim42 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

First of all, you just made a PR that changes 273 files, and it is almost impossible to review. You have to make a PR as small as possible.

I'd like the PR to have a small fix (just like#132996) and corresponding tests for it.

@github-project-automationgithub-project-automationbot moved this fromNeeds Triage toIn Progress inSIG CLIJul 18, 2025
@dims
Copy link
Member

please avoid introducing unnecessary new dependencies as well! +1 to what@gjkim42 said!

@github-project-automationgithub-project-automationbot moved this fromIn Progress toClosed inSIG CLIJul 19, 2025
Sign up for freeto join this conversation on GitHub. Already have an account?Sign in to comment
Reviewers

@gjkim42gjkim42gjkim42 requested changes

@brianpursleybrianpursleyAwaiting requested review from brianpursley

@derekwaynecarrderekwaynecarrAwaiting requested review from derekwaynecarr

Assignees
No one assigned
Labels
area/dependencyIssues or PRs related to dependency changesarea/kubeletcncf-cla: yesIndicates the PR's author has signed the CNCF CLA.do-not-merge/needs-kindIndicates a PR lacks a `kind/foo` label and requires one.do-not-merge/release-note-label-neededIndicates that a PR should not merge because it's missing one of the release note labels.do-not-merge/work-in-progressIndicates that a PR should not merge because it is a work in progress.needs-ok-to-testIndicates a PR that requires an org member to verify it is safe to test.needs-priorityIndicates a PR lacks a `priority/foo` label and requires one.needs-rebaseIndicates a PR cannot be merged because it has merge conflicts with HEAD.needs-triageIndicates an issue or PR lacks a `triage/foo` label and requires one.sig/cliCategorizes an issue or PR as relevant to SIG CLI.sig/nodeCategorizes an issue or PR as relevant to SIG Node.size/XXLDenotes a PR that changes 1000+ lines, ignoring generated files.
Projects
Status: Closed
Milestone
No milestone
Development

Successfully merging this pull request may close these issues.

4 participants
@AadiDev005@k8s-ci-robot@dims@gjkim42

[8]ページ先頭

©2009-2025 Movatter.jp