/juju (Public)

MutatingAdmissionWebhooks on Kubernetes affect cluster critical resources #21052

Open
Labels
kind/bug (indicates a bug in the project), needs-triage (issue is in need of triage)
@berkayoz

Description

Hey team,

We've observed a couple of issues with the webhooks (MutatingWebhookConfiguration) installed on a Kubernetes cluster interfering with cluster critical resources and possibly other resources. The situation we are facing is that these webhooks introduce failures in node operations when the webhooks themselves are failing or unreachable.

Specifically, the kube-apiserver tries calling these webhooks on each request and waits for the timeout set in the webhook configuration before completing the request. Under a failure scenario where these webhooks are unreachable for any reason, this introduces n * timeout latency to each apiserver request, with n being the number of webhooks. In an example deployment, timeoutSeconds: 10 is set, which means that with 10 webhooks, an added 100s of latency on apiserver calls can be observed.
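The arithmetic above can be written as a small sketch. It assumes the worst case described in this issue: every webhook is unreachable, the webhooks are called one after another, and each call waits for its full timeout. The function name is made up for illustration.

```python
def worst_case_added_latency(num_webhooks: int, timeout_seconds: int) -> int:
    """Upper bound on added latency per apiserver request, in seconds.

    Assumes all webhooks are unreachable, are invoked serially, and each
    call runs into its full timeoutSeconds.
    """
    if num_webhooks < 0 or timeout_seconds < 0:
        raise ValueError("num_webhooks and timeout_seconds must be non-negative")
    return num_webhooks * timeout_seconds


# 10 webhooks with timeoutSeconds: 10, as in the example deployment
print(worst_case_added_latency(10, 10))  # 100
```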

To note, the webhook is handled by the model operator pod in the cluster, and requests are expected to be routed through the CNI.

The main culprit here, in my opinion, is that the filters and rules on these webhooks are too wide. Again from an example k8s cloud, for juju-model-admission-controller-myk8s we can see:

    failurePolicy: Ignore
    matchPolicy: Equivalent
    name: admission.juju.is
    namespaceSelector:
      matchLabels:
        controller.juju.is/id: dbd899ca-70a6-4bb0-8831-2053eec5d5ce
        model.juju.is/name: controller
    objectSelector:
      matchExpressions:
        - key: model.juju.is/disable-webhook
          operator: DoesNotExist
    reinvocationPolicy: Never
    rules:
      - apiGroups:
          - '*'
        apiVersions:
          - '*'
        operations:
          - CREATE
          - UPDATE
        resources:
          - '*'
        scope: '*'
    sideEffects: None
    timeoutSeconds: 10

Here we can observe that CREATE and UPDATE operations for any apiGroups and any resources, in both Cluster and Namespaced scopes, will trigger these webhooks. The main filters we seem to have here are the namespaceSelector and objectSelector.

From the upstream docs:

If the object is a cluster scoped resource other than a Namespace, namespaceSelector has no effect.

Based on this information, these webhooks run for any cluster scoped resource. Resources such as TokenRequest and TokenReview (used for authentication by the kubelet), the CNI CRs for Cilium, and more are affected by this.
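To make the effect of the selectors concrete, here is a simplified model of the matching decision for the configuration shown above. It is only a sketch of the behaviour described in the upstream docs, not the actual kube-apiserver logic: the function name and the convention of passing `None` for cluster scoped objects are assumptions made for illustration, and the wildcard rules are modelled by not checking the resource at all.

```python
from typing import Dict, Optional

# Values copied from the webhook configuration in this issue.
NAMESPACE_MATCH_LABELS = {
    "controller.juju.is/id": "dbd899ca-70a6-4bb0-8831-2053eec5d5ce",
    "model.juju.is/name": "controller",
}
OPERATIONS = {"CREATE", "UPDATE"}
DISABLE_LABEL = "model.juju.is/disable-webhook"


def webhook_is_called(
    operation: str,
    namespace_labels: Optional[Dict[str, str]],
    object_labels: Dict[str, str],
) -> bool:
    """Simplified model: would this request be sent to the webhook?

    namespace_labels is None for a cluster scoped object that is not a
    Namespace (illustrative convention, not a Kubernetes API).
    """
    if operation not in OPERATIONS:
        return False
    # objectSelector: the disable label must not exist on the object.
    if DISABLE_LABEL in object_labels:
        return False
    # namespaceSelector has no effect on cluster scoped resources,
    # so only the wildcard rules and the objectSelector apply.
    if namespace_labels is None:
        return True
    return all(
        namespace_labels.get(key) == value
        for key, value in NAMESPACE_MATCH_LABELS.items()
    )


# A cluster scoped object is matched regardless of namespace labels.
print(webhook_is_called("CREATE", None, {}))  # True
# An object in a namespace without the Juju labels is filtered out.
print(webhook_is_called("CREATE", {"kubernetes.io/metadata.name": "kube-system"}, {}))  # False
```

In this model the only way to exclude a cluster scoped object is the objectSelector label, which matches the observation that namespace based filtering does not protect such resources.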

An example failure scenario: after restarting a control plane machine in a cluster with these webhooks, the restarted node is not able to recover and serve requests properly. The root cause is that these webhooks will fail until pod networking is set up by the CNI (e.g. cilium). Cilium itself needs to create/update certain resources (possibly cluster scoped) on startup, and these requests time out or do not complete in time because the webhooks introduce 100s of latency. This gets us into a cyclic dependency where Cilium cannot start because the webhooks are failing, and the webhooks are failing because Cilium is unable to set up pod networking on the node. To break out of this cycle, one has to disable the admission webhooks by setting --disable-admission-plugins=MutatingAdmissionWebhook on the kube-apiserver, restart Cilium to get the node ready, and then roll back this change to re-enable the webhooks.

Juju version

3.6.11

Cloud

Kubernetes

Expected behaviour

I am not knowledgeable about why and how these webhooks are used, but I believe it is best to narrow down the rules, scopes and filtering here to make sure these webhooks only apply to specific resources.
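As a rough illustration of what "too wide" means, the sketch below flags the wildcard fields in a webhook rule. The helper name is hypothetical, and the narrowed rule at the end is only a placeholder: which resources Juju actually needs to mutate is not known from this issue.

```python
from typing import Dict, List


def wide_rule_findings(rule: Dict) -> List[str]:
    """Return a list of reasons why a webhook rule is considered too wide."""
    findings = []
    for field in ("apiGroups", "apiVersions", "resources"):
        if "*" in rule.get(field, []):
            findings.append(f"{field} contains '*'")
    # scope defaults to '*' when unset, which includes cluster scoped resources.
    if rule.get("scope", "*") == "*":
        findings.append("scope '*' includes cluster scoped resources")
    return findings


# The rule from the webhook configuration in this issue.
current_rule = {
    "apiGroups": ["*"],
    "apiVersions": ["*"],
    "operations": ["CREATE", "UPDATE"],
    "resources": ["*"],
    "scope": "*",
}

# Hypothetical narrowed rule, for illustration only.
narrowed_rule = {
    "apiGroups": [""],
    "apiVersions": ["v1"],
    "operations": ["CREATE", "UPDATE"],
    "resources": ["pods"],
    "scope": "Namespaced",
}

print(wide_rule_findings(current_rule))
print(wide_rule_findings(narrowed_rule))  # []
```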

Reproduce / Test

  • Create multiple models on a k8s multi-node cluster
  • Restart the control plane VMs
  • Observe webhook / timeout failures in the kube-apiserver logs
  • Observe Cilium failing to start

Notes & References

No response

