- Notifications
You must be signed in to change notification settings - Fork41k
Fix probe behavior for non-running containers and sidecars#133055
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to ourterms of service andprivacy statement. We’ll occasionally send you account related emails.
Already on GitHub?Sign in to your account
Conversation
Adding the "do-not-merge/release-note-label-needed" label because no release-note block was detected, please follow ourrelease note process to remove it. Instructions for interacting with me using PR comments are availablehere. If you have questions or suggestions related to my behavior, please file an issue against thekubernetes-sigs/prow repository. |
PR needs rebase. Instructions for interacting with me using PR comments are availablehere. If you have questions or suggestions related to my behavior, please file an issue against thekubernetes-sigs/prow repository. |
This issue is currently awaiting triage. If a SIG or subproject determines this is a relevant issue, they will accept it by applying the The Instructions for interacting with me using PR comments are availablehere. If you have questions or suggestions related to my behavior, please file an issue against thekubernetes-sigs/prow repository. |
Hi@AadiDev005. Thanks for your PR. I'm waiting for akubernetes member to verify that this patch is reasonable to test. If it is, they should reply with Once the patch is verified, the new status will be reflected by the I understand the commands that are listedhere. Instructions for interacting with me using PR comments are availablehere. If you have questions or suggestions related to my behavior, please file an issue against thekubernetes-sigs/prow repository. |
[APPROVALNOTIFIER] This PR isNOT APPROVED This pull-request has been approved by:AadiDev005 The full list of commands accepted by this bot can be foundhere. Needs approval from an approver in each of these files: Approvers can indicate their approval by writing |
Hi@gjkim42, I’ve been working on fixing the sidecar startup probe issue (#132826) where the main container starts despite a sidecar failing its startup probe when restartPolicy: Never is set for the pod. I updated worker.go to include the isRestartableInitContainer check and modified the probe logic to continue probing sidecar containers with restartPolicy: Always after restarts. I also adjusted liveness and startup probe handling for non-running/terminated containers to fix test failures. However, I’m stuck with persistent TestDoProbe failures: Liveness-4: Expects continue == true for a non-running, non-terminated container, but gets false. Could you help clarify: Should liveness probes continue (true) for non-running, non-terminated containers? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others.Learn more.
First of all, you just made a PR that changes 273 files, and it is almost impossible to review. You have to make a PR as small as possible.
- https://google.github.io/eng-practices/review/developer/small-cls.html
- https://www.kubernetes.dev/docs/guide/pull-requests/#smaller-is-better-small-commits-small-pull-requests
I'd like the PR to have a small fix (just like#132996) and corresponding tests for it.
please avoid introducing unnecessary new dependencies as well! +1 to what@gjkim42 said! |
Fix probe behavior for non-running containers and sidecars
Bug Description
The probe worker in worker.go mishandles liveness and startup probes for non-running containers and sidecar scenarios. This causes test failures in TestDoProbe, particularly for Liveness-4, Startup-0, Startup-4, Startup-5, Startup-6, Startup-7, Startup-8, and Startup-9. The issue seems tied to how doProbe processes container states (e.g., non-running, terminated) and sidecar restart policies.
Changes Made
Updated worker.go to:
Return false for liveness probes on non-running containers (not in Running state).
Set results.Unknown for startup probes on non-running containers, stopping with results.Success when Started=true.
Add sidecar-specific logic to respect restartPolicy: Always (related to#132826).
Adjusted logging to use klog.V(4).Infof for debugging.
Remaining Test Failures
The following tests still fail (see attached test_output.txt):
Liveness-4: Expects continue == true for a non-running, non-terminated container, but gets false. Should liveness probes continue in this case?
Startup-0: Expects Success, but gets UNKNOWN for a non-running container. Should startup probes assume success earlier?
Startup-4: Expects Failure, but gets UNKNOWN for a non-running container. Is this a test expectation mismatch?
Startup-5: Expects continue == false for a terminated container, but gets true. Likely tied to sidecar restart logic.
Startup-6, Startup-7: Expect Success, but get UNKNOWN for non-running containers. Similar to Startup-0.
Startup-8: Expects continue == false for a terminated container, but gets true. Same as Startup-5.
Startup-9: Expects continue == true, but gets false when Started=true. Should startup probes continue after success?
Linker Warning
The build shows: ld: warning: search path '/usr/local/libtorch/lib' not found. It doesn’t seem to affect the tests, but is this safe to ignore, or should I update the build config to remove this path?
Reviewers
@gjkim42
@SergeyKanzhelev
Questions
Is the doProbe logic correct for liveness probes on non-running containers (returning false)?
Should startup probes return Success or Failure instead of UNKNOWN for non-running containers?
For sidecar containers with restartPolicy: Always, should probes continue after termination?
Any suggestions to fix the remaining test failures?
Logs
Attached test_output.txt with debug logs from worker.go.