feat(kubelet): Add ResourceHealthStatus for DRA pods #133043


Draft
Jpsassine wants to merge 1 commit into kubernetes:master from Jpsassine:dra_kep4680_test_gcevm

Conversation

Jpsassine

This change introduces the ability for the Kubelet to monitor and report the health of devices allocated via Dynamic Resource Allocation (DRA). This addresses a key part of KEP-4680 by providing visibility into device failures, which helps users and controllers diagnose pod failures.

The implementation includes:

  • A new `v1alpha1.NodeHealth` gRPC service with a `WatchResources` stream that DRA plugins can optionally implement.
  • A health information cache within the Kubelet's DRA manager to track the last known health of each device and handle plugin disconnections.
  • An asynchronous update mechanism that triggers a pod sync when a device's health changes.
  • A new `allocatedResourcesStatus` field in `v1.ContainerStatus` to expose the device health information to users via the Pod API.
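
To make the cache and the asynchronous update path more concrete, here is a minimal sketch of how such a component could behave. It is not the code from this PR: `HealthCache`, `DeviceKey`, `HealthStatus`, `MarkDriverUnknown`, and the `onChange` callback are hypothetical names chosen for illustration, and the callback merely stands in for whatever the Kubelet does to trigger a pod sync. The sketch assumes that a device with no report, or whose plugin has disconnected, is treated as `Unknown`.

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

// HealthStatus is a hypothetical per-device health value.
type HealthStatus string

const (
	Healthy   HealthStatus = "Healthy"
	Unhealthy HealthStatus = "Unhealthy"
	Unknown   HealthStatus = "Unknown"
)

// DeviceKey is a hypothetical identifier: the DRA driver plus a device name.
type DeviceKey struct {
	Driver string
	Device string
}

type deviceEntry struct {
	status      HealthStatus
	lastUpdated time.Time
}

// HealthCache remembers the last known health of each device and calls
// onChange whenever a device's health actually changes.
type HealthCache struct {
	mu       sync.Mutex
	devices  map[DeviceKey]deviceEntry
	onChange func(DeviceKey, HealthStatus) // stand-in for "trigger a pod sync"
}

func NewHealthCache(onChange func(DeviceKey, HealthStatus)) *HealthCache {
	return &HealthCache{devices: map[DeviceKey]deviceEntry{}, onChange: onChange}
}

// Update records a report from a plugin stream. It returns true when the
// status differs from the cached one (or the device is new).
func (c *HealthCache) Update(key DeviceKey, status HealthStatus) bool {
	c.mu.Lock()
	prev, ok := c.devices[key]
	changed := !ok || prev.status != status
	c.devices[key] = deviceEntry{status: status, lastUpdated: time.Now()}
	c.mu.Unlock()

	// Notify outside the lock so the callback may read the cache safely.
	if changed && c.onChange != nil {
		c.onChange(key, status)
	}
	return changed
}

// MarkDriverUnknown downgrades every device of a driver to Unknown, for
// example after its plugin disconnects. It returns the number of changes.
func (c *HealthCache) MarkDriverUnknown(driver string) int {
	c.mu.Lock()
	var changed []DeviceKey
	for k, e := range c.devices {
		if k.Driver == driver && e.status != Unknown {
			c.devices[k] = deviceEntry{status: Unknown, lastUpdated: time.Now()}
			changed = append(changed, k)
		}
	}
	c.mu.Unlock()

	for _, k := range changed {
		if c.onChange != nil {
			c.onChange(k, Unknown)
		}
	}
	return len(changed)
}

// Get returns the last known health, or Unknown for unseen devices.
func (c *HealthCache) Get(key DeviceKey) HealthStatus {
	c.mu.Lock()
	defer c.mu.Unlock()
	if e, ok := c.devices[key]; ok {
		return e.status
	}
	return Unknown
}

func main() {
	var syncs []string
	cache := NewHealthCache(func(k DeviceKey, s HealthStatus) {
		syncs = append(syncs, fmt.Sprintf("%s/%s=%s", k.Driver, k.Device, s))
	})
	key := DeviceKey{Driver: "gpu.example.com", Device: "gpu-0"}

	cache.Update(key, Healthy)
	cache.Update(key, Healthy) // unchanged: no callback
	cache.Update(key, Unhealthy)
	cache.MarkDriverUnknown("gpu.example.com") // simulated plugin disconnect

	fmt.Println(syncs)
}
```

Invoking the callback only on an actual change keeps repeated identical reports from causing redundant syncs, and running it after the lock is released avoids a deadlock if the callback reads the cache.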

Update vendor

KEP-4680: Fix lint, boilerplate, and codegen issues

Add another e2e test, add TODO for KEP4680 & update test infra helpers

Add Feature Gate e2e test

What type of PR is this?

What this PR does / why we need it:

Which issue(s) this PR is related to:

Special notes for your reviewer:

Does this PR introduce a user-facing change?

Additional documentation e.g., KEPs (Kubernetes Enhancement Proposals), usage docs, etc.:

@k8s-ci-robot
Contributor

Adding the "do-not-merge/release-note-label-needed" label because no release-note block was detected, please follow our release note process to remove it.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@k8s-ci-robot added labels on Jul 17, 2025:

  • do-not-merge/work-in-progress (Indicates that a PR should not merge because it is a work in progress.)
  • size/XXL (Denotes a PR that changes 1000+ lines, ignoring generated files.)
  • do-not-merge/release-note-label-needed (Indicates that a PR should not merge because it's missing one of the release note labels.)
  • cncf-cla: yes (Indicates the PR's author has signed the CNCF CLA.)
  • do-not-merge/needs-kind (Indicates a PR lacks a `kind/foo` label and requires one.)
  • do-not-merge/needs-sig (Indicates an issue or PR lacks a `sig/foo` label and requires one.)
  • needs-triage (Indicates an issue or PR lacks a `triage/foo` label and requires one.)

@k8s-ci-robot
Contributor

This issue is currently awaiting triage.

If a SIG or subproject determines this is a relevant issue, they will accept it by applying the triage/accepted label and provide further guidance.

The triage/accepted label can be added by org members by writing /triage accepted in a comment.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@k8s-ci-robot
Contributor

Hi @Jpsassine. Thanks for your PR.

I'm waiting for a kubernetes member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@k8s-ci-robot added labels on Jul 17, 2025:

  • needs-ok-to-test (Indicates a PR that requires an org member to verify it is safe to test.)
  • needs-priority (Indicates a PR lacks a `priority/foo` label and requires one.)
  • area/dependency (Issues or PRs related to dependency changes)
  • area/kubelet
  • area/test
  • sig/node (Categorizes an issue or PR as relevant to SIG Node.)
  • sig/testing (Categorizes an issue or PR as relevant to SIG Testing.)
  • wg/device-management (Categorizes an issue or PR as relevant to WG Device Management.)

and removed:

  • do-not-merge/needs-sig (Indicates an issue or PR lacks a `sig/foo` label and requires one.)

@k8s-ci-robot
Contributor

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: Jpsassine
Once this PR has been reviewed and has the lgtm label, please assign klueska for approval. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@Jpsassine force-pushed the dra_kep4680_test_gcevm branch 3 times, most recently from 1699243 to eee2a8d, on July 17, 2025 22:02
@Jpsassine force-pushed the dra_kep4680_test_gcevm branch from eee2a8d to 3fb463b on July 17, 2025 22:04
@SergeyKanzhelev
Member

Is this PR still needed?

@SergeyKanzhelev moved this from Triage to PRs Waiting on Author in SIG Node CI/Test Board on Jul 18, 2025
@Jpsassine
Author

@SergeyKanzhelev this is for my debugging, so that I do not spam my main PR and keep triggering presubmits there.

Reviewers

  • andrewsykim (awaiting requested review)
  • bart0sh (awaiting requested review)

Assignees
No one assigned

Labels

  • area/dependency (Issues or PRs related to dependency changes)
  • area/kubelet
  • area/test
  • cncf-cla: yes (Indicates the PR's author has signed the CNCF CLA.)
  • do-not-merge/needs-kind (Indicates a PR lacks a `kind/foo` label and requires one.)
  • do-not-merge/release-note-label-needed (Indicates that a PR should not merge because it's missing one of the release note labels.)
  • do-not-merge/work-in-progress (Indicates that a PR should not merge because it is a work in progress.)
  • needs-ok-to-test (Indicates a PR that requires an org member to verify it is safe to test.)
  • needs-priority (Indicates a PR lacks a `priority/foo` label and requires one.)
  • needs-triage (Indicates an issue or PR lacks a `triage/foo` label and requires one.)
  • sig/node (Categorizes an issue or PR as relevant to SIG Node.)
  • sig/testing (Categorizes an issue or PR as relevant to SIG Testing.)
  • size/XXL (Denotes a PR that changes 1000+ lines, ignoring generated files.)
  • wg/device-management (Categorizes an issue or PR as relevant to WG Device Management.)

Projects
Status: 🆕 New
Status: PRs Waiting on Author

Milestone
No milestone

3 participants
@Jpsassine, @k8s-ci-robot, @SergeyKanzhelev
