feat(kubelet): Add ResourceHealthStatus for DRA pods #133043
Adding the "do-not-merge/release-note-label-needed" label because no release-note block was detected, please follow our release note process to remove it. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.
This issue is currently awaiting triage. If a SIG or subproject determines this is a relevant issue, they will accept it.
Hi @Jpsassine. Thanks for your PR. I'm waiting for a kubernetes member to verify that this patch is reasonable to test. I understand the commands that are listed here.
[APPROVALNOTIFIER] This PR is NOT APPROVED. This pull-request has been approved by: Jpsassine. The full list of commands accepted by this bot can be found here.
This change introduces the ability for the Kubelet to monitor and report the health of devices allocated via Dynamic Resource Allocation (DRA). This addresses a key part of KEP-4680 by providing visibility into device failures, which helps users and controllers diagnose pod failures.

The implementation includes:
- A new `v1alpha1.NodeHealth` gRPC service with a `WatchResources` stream that DRA plugins can optionally implement.
- A health information cache within the Kubelet's DRA manager to track the last known health of each device and handle plugin disconnections.
- An asynchronous update mechanism that triggers a pod sync when a device's health changes.
- A new `allocatedResourcesStatus` field in `v1.ContainerStatus` to expose the device health information to users via the Pod API.

Update vendor
KEP-4680: Fix lint, boilerplate, and codegen issues
Add another e2e test, add TODO for KEP4680 & update test infra helpers
Add Feature Gate e2e test
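To make the health cache and the change-triggered sync in the commit message concrete, here is a minimal Go sketch. It is not the code from this PR: `HealthCache`, `Update`, `MarkDriverUnknown`, the driver name `gpu.example.com`, and the three status strings are all hypothetical names chosen for illustration. It also assumes that a plugin disconnection is handled by marking that plugin's devices as `Unknown`, which the commit message does not spell out.

```go
package main

import (
	"fmt"
	"sort"
	"sync"
)

// HealthStatus is a hypothetical three-state health value for this sketch.
type HealthStatus string

const (
	Healthy   HealthStatus = "Healthy"
	Unhealthy HealthStatus = "Unhealthy"
	Unknown   HealthStatus = "Unknown"
)

// deviceKey identifies one device of one DRA driver.
type deviceKey struct{ driver, device string }

// HealthCache remembers the last known health of each device.
type HealthCache struct {
	mu      sync.Mutex
	devices map[deviceKey]HealthStatus
}

func NewHealthCache() *HealthCache {
	return &HealthCache{devices: make(map[deviceKey]HealthStatus)}
}

// Update records a health report and returns true when the stored value
// changed. The first report for a device always counts as a change. A caller
// would use the return value to decide whether to trigger a pod sync.
func (c *HealthCache) Update(driver, device string, h HealthStatus) bool {
	c.mu.Lock()
	defer c.mu.Unlock()
	k := deviceKey{driver, device}
	if old, ok := c.devices[k]; ok && old == h {
		return false
	}
	c.devices[k] = h
	return true
}

// Get returns the last known health, or Unknown for a device never reported.
func (c *HealthCache) Get(driver, device string) HealthStatus {
	c.mu.Lock()
	defer c.mu.Unlock()
	if h, ok := c.devices[deviceKey{driver, device}]; ok {
		return h
	}
	return Unknown
}

// MarkDriverUnknown models a plugin disconnection (an assumption of this
// sketch): every device of the driver becomes Unknown. It returns the sorted
// names of the devices whose status changed.
func (c *HealthCache) MarkDriverUnknown(driver string) []string {
	c.mu.Lock()
	defer c.mu.Unlock()
	var changed []string
	for k, h := range c.devices {
		if k.driver == driver && h != Unknown {
			c.devices[k] = Unknown
			changed = append(changed, k.device)
		}
	}
	sort.Strings(changed)
	return changed
}

func main() {
	cache := NewHealthCache()
	fmt.Println(cache.Update("gpu.example.com", "gpu-0", Healthy))   // true
	fmt.Println(cache.Update("gpu.example.com", "gpu-0", Healthy))   // false
	fmt.Println(cache.Update("gpu.example.com", "gpu-0", Unhealthy)) // true
	fmt.Println(cache.MarkDriverUnknown("gpu.example.com"))          // [gpu-0]
	fmt.Println(cache.Get("gpu.example.com", "gpu-0"))               // Unknown
}
```

Returning a "changed" flag from `Update` keeps the cache free of any knowledge about pods. In this sketch, the caller that consumes the plugin's health stream would be the one to map a changed device to the pods using it and request a sync.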
Is this PR still needed?
@SergeyKanzhelev this is for my debugging, so that I don't spam my main PR and keep triggering presubmits there.
What type of PR is this?
What this PR does / why we need it:
Which issue(s) this PR is related to:
Special notes for your reviewer:
Does this PR introduce a user-facing change?
Additional documentation e.g., KEPs (Kubernetes Enhancement Proposals), usage docs, etc.: