Description
Hey team,
We've observed a couple of issues regarding the webhooks (MutatingWebhookConfiguration) installed on a Kubernetes cluster interfering with cluster-critical resources and possibly other resources. The situation we are facing is that these webhooks introduce failures in node operations when the webhooks themselves are failing/unreachable.
Specifically, the kube-apiserver tries calling these webhooks on each request and waits up to the timeout provided in the webhook configuration before completing the request. Under a failure scenario where these webhooks are unreachable for any reason, this introduces `n * timeout` latency to each apiserver request, with `n` being the number of webhooks. In an example deployment, `timeoutSeconds: 10` is set, which means that with 10 webhooks an added 100s of latency to apiserver calls can be observed.
Note that the webhook is served by the model operator pod in the cluster, and requests to it are expected to be routed through the CNI.
The main culprit, in my opinion, is that the filters and rules on these webhooks are too wide. Again from an example k8s cloud, for `juju-model-admission-controller-myk8s` we can see:
```yaml
failurePolicy: Ignore
matchPolicy: Equivalent
name: admission.juju.is
namespaceSelector:
  matchLabels:
    controller.juju.is/id: dbd899ca-70a6-4bb0-8831-2053eec5d5ce
    model.juju.is/name: controller
objectSelector:
  matchExpressions:
  - key: model.juju.is/disable-webhook
    operator: DoesNotExist
reinvocationPolicy: Never
rules:
- apiGroups:
  - '*'
  apiVersions:
  - '*'
  operations:
  - CREATE
  - UPDATE
  resources:
  - '*'
  scope: '*'
sideEffects: None
timeoutSeconds: 10
```
Here we can observe that `CREATE` and `UPDATE` operations on any `apiGroups` and any `resources`, for both `Cluster` and `Namespaced` scopes, will trigger these webhooks. The main filters we seem to have here are the `namespaceSelector` and `objectSelector`.
From the upstream docs:

> If the object is a cluster scoped resource other than a Namespace, `namespaceSelector` has no effect.
Based on this information, these webhooks run for any cluster-scoped resource, so resources such as `TokenRequest` and `TokenReview` (used for authentication by the kubelet), CNI CRs for Cilium, and more get affected by this.
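Since `namespaceSelector` cannot exclude cluster-scoped resources, the only per-object escape hatch in the configuration above appears to be the `objectSelector`: objects carrying the `model.juju.is/disable-webhook` label are skipped. A minimal sketch of exempting a cluster-scoped object this way (the resource kind and label value are illustrative; a `DoesNotExist` expression only checks that the key is present):

```yaml
# Illustrative cluster-scoped object labelled so the Juju webhook skips it.
# The value "true" is arbitrary; the objectSelector only checks key presence.
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: example-cluster-role
  labels:
    model.juju.is/disable-webhook: "true"
rules: []
```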
An example failure scenario: after restarting a control plane machine in a cluster with these webhooks, the restarted node is not able to recover and serve requests properly. The root cause is that these webhooks will fail until pod networking is set up by the CNI (e.g. Cilium), while Cilium itself needs to create/update certain resources (possibly cluster scoped) on startup, and those requests time out/do not complete in time since the webhooks introduce 100s of latency. This gets us into a cyclic dependency where Cilium cannot start because the webhooks are failing, and the webhooks are failing because Cilium is unable to set up pod networking on the node. To break out of this cycle one has to disable the admission webhooks by setting `--disable-admission-plugins=MutatingAdmissionWebhook` on the kube-apiserver, restart Cilium to get the node ready, and then roll back this change to re-enable the webhooks.
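For reference, a rough sketch of that break-glass step, assuming kube-apiserver runs as a static pod with its manifest at `/etc/kubernetes/manifests/kube-apiserver.yaml` (the path and manifest layout are assumptions and vary by distribution):

```yaml
# /etc/kubernetes/manifests/kube-apiserver.yaml (excerpt, assumed layout)
# Temporarily disable mutating admission webhooks, let Cilium come up,
# then remove the flag again to re-enable the Juju webhooks.
spec:
  containers:
  - name: kube-apiserver
    command:
    - kube-apiserver
    - --disable-admission-plugins=MutatingAdmissionWebhook
    # ... remaining flags unchanged ...
```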
Juju version
3.6.11
Cloud
Kubernetes
Expected behaviour
I am not knowledgeable about why and how these webhooks are used, but I believe it is best to narrow down the rules, scopes, and filtering here to make sure these webhooks only apply to specific resources, for example along the lines of the sketch below.
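I don't know the exact set of resources the webhook actually needs to mutate, so the following is illustrative only: restricting the rules to namespaced resources in specific groups, and lowering the per-webhook timeout, would bound both the blast radius and the added latency.

```yaml
# Illustrative only: the concrete apiGroups/resources Juju needs to mutate
# would have to be determined by the Juju team.
rules:
- apiGroups:
  - ""
  - apps
  apiVersions:
  - '*'
  operations:
  - CREATE
  - UPDATE
  resources:
  - pods
  - deployments
  - statefulsets
  scope: Namespaced
# A lower timeout also caps the latency added when the webhook is unreachable.
timeoutSeconds: 5
```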
Reproduce / Test
- Create multiple models on a k8s multi-node cluster
- Restart the control plane VMs
- Observe webhook / timeout failures in the `kube-apiserver` logs
- Observe Cilium failing to start
Notes & References
No response