Kube Metrics Adapter
Kube Metrics Adapter is a general purpose metrics adapter for Kubernetes that can collect and serve custom and external metrics for Horizontal Pod Autoscaling.

It supports scaling based on Prometheus metrics, SQS queues and others out of the box.

It discovers Horizontal Pod Autoscaling resources and starts to collect the requested metrics and stores them in memory. It's implemented using the custom-metrics-apiserver library.

Here's an example of a `HorizontalPodAutoscaler` resource configured to get `requests-per-second` metrics from each pod of the deployment `myapp`:
```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: myapp-hpa
  annotations:
    # metric-config.<metricType>.<metricName>.<collectorType>/<configKey>
    metric-config.pods.requests-per-second.json-path/json-key: "$.http_server.rps"
    metric-config.pods.requests-per-second.json-path/path: /metrics
    metric-config.pods.requests-per-second.json-path/port: "9090"
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: myapp
  minReplicas: 1
  maxReplicas: 10
  metrics:
  - type: Pods
    pods:
      metric:
        name: requests-per-second
      target:
        averageValue: 1k
        type: AverageValue
```
The `metric-config.*` annotations are used by the `kube-metrics-adapter` to configure a collector for getting the metrics. In the above example it configures a json-path pod collector.
Like the support policy offered for Kubernetes, this project aims to support the latest three minor releases of Kubernetes.
The default supported API is `autoscaling/v2` (available since `v1.23`). This API MUST be available in the cluster, which it is by default.
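A quick way to verify this is a standard `kubectl` check (not specific to this project):

```sh
# List the autoscaling API versions served by the cluster;
# autoscaling/v2 should appear in the output.
kubectl api-versions | grep autoscaling
```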
This project uses Go modules as introduced in Go 1.11, therefore you need Go >= 1.11 installed in order to build. If using Go 1.11 you also need to activate Module support.

Assuming Go has been set up with module support it can be built simply by running:
```sh
export GO111MODULE=on # needed if the project is checked out in your $GOPATH.
$ make
```
Clone this repository, and run as below:
```sh
$ cd kube-metrics-adapter/docs
$ kubectl apply -f .
```
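Once the adapter is running, one common way to verify that it registered itself with the Kubernetes API aggregation layer is to query the metrics API groups directly (a generic sketch; the exact group versions served may differ depending on the adapter release):

```sh
# The adapter should serve the external and custom metrics API groups.
kubectl get --raw "/apis/external.metrics.k8s.io/v1beta1"
kubectl get --raw "/apis/custom.metrics.k8s.io/v1beta2"
```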
Collectors are different implementations for getting metrics requested by an HPA resource. They are configured based on HPA resources and started on-demand by the `kube-metrics-adapter` to only collect the metrics required for scaling the application.

The collectors are configured either simply based on the metrics defined in an HPA resource, or via additional annotations on the HPA resource.
The pod collector allows collecting metrics from each pod matching the label selector defined in the HPA's `scaleTargetRef`. Currently only `json-path` collection is supported.

The Pod Collector utilizes the `scaleTargetRef` specified in an HPA resource to obtain the label selector from the referenced Kubernetes object. This enables the identification and management of pods associated with that object. Currently, the supported Kubernetes objects for this operation are: `Deployment`, `StatefulSet` and `Rollout`.
| Metric | Description | Type | K8s Versions |
|--------|-------------|------|--------------|
| `custom` | No predefined metrics. Metrics are generated from user defined queries. | Pods | `>=1.12` |
This is an example of using the pod collector to collect metrics from a JSON metrics endpoint of each pod matched by the HPA.
```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: myapp-hpa
  annotations:
    # metric-config.<metricType>.<metricName>.<collectorType>/<configKey>
    metric-config.pods.requests-per-second.json-path/json-key: "$.http_server.rps"
    metric-config.pods.requests-per-second.json-path/json-eval: "ceil($['active processes'] / $['total processes'] * 100)" # cannot use both json-eval and json-key
    metric-config.pods.requests-per-second.json-path/path: /metrics
    metric-config.pods.requests-per-second.json-path/port: "9090"
    metric-config.pods.requests-per-second.json-path/scheme: "https"
    metric-config.pods.requests-per-second.json-path/aggregator: "max"
    metric-config.pods.requests-per-second.json-path/interval: "60s" # optional
    metric-config.pods.requests-per-second.json-path/min-pod-ready-age: "30s" # optional
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: myapp
  minReplicas: 1
  maxReplicas: 10
  metrics:
  - type: Pods
    pods:
      metric:
        name: requests-per-second
      target:
        averageValue: 1k
        type: AverageValue
```
The pod collector is configured through the annotations which specify the collector name `json-path` and a set of configuration options for the collector. `json-key` defines the json-path query for extracting the right metric. This assumes the pod is exposing metrics in JSON format. For the above example the following JSON data would be expected:

```json
{
  "http_server": {
    "rps": 0.5
  }
}
```

The json-path query support depends on the github.com/spyzhov/ajson library. See its README for possible queries. It's expected that the metric you query returns something that can be turned into a `float64`.
The `json-eval` configuration option allows more complex calculations to be performed on the extracted metric. The `json-eval` expression is evaluated using ajson's script engine.
The other configuration options `path`, `port` and `scheme` specify where the metrics endpoint is exposed on the pod. The `path` and `port` options do not have default values so they must be defined. The `scheme` is optional and defaults to `http`.
The `aggregator` configuration option specifies the aggregation function used to aggregate values of JSONPath expressions that evaluate to arrays/slices of numbers. It's optional, but when the expression evaluates to an array/slice, its absence will produce an error. The supported aggregation functions are `avg`, `max`, `min` and `sum`.
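For example, assuming a hypothetical metrics payload like the one below, a `json-key` of `$.http_server.rps_per_route[*]` evaluates to an array, so an aggregator is required; with `aggregator: "max"` the reported metric would be `1.2`:

```json
{
  "http_server": {
    "rps_per_route": [0.5, 1.2, 0.7]
  }
}
```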
The `raw-query` configuration option specifies the query params to send along to the endpoint:

```yaml
metric-config.pods.requests-per-second.json-path/path: /metrics
metric-config.pods.requests-per-second.json-path/port: "9090"
metric-config.pods.requests-per-second.json-path/raw-query: "foo=bar&baz=bop"
```
will create a URL like this:
```
http://<podIP>:9090/metrics?foo=bar&baz=bop
```
There are also configuration options for custom (connect and request) timeouts when querying pods for metrics:
```yaml
metric-config.pods.requests-per-second.json-path/request-timeout: 2s
metric-config.pods.requests-per-second.json-path/connect-timeout: 500ms
```
The default for both of the above values is 15 seconds.
The `min-pod-ready-age` configuration option instructs the service to start collecting metrics from the pods only if they are "older" (time elapsed after pod reached "Ready" state) than the specified amount of time. This is handy when pods need to warm up before HPAs will start tracking their metrics. The default value is 0 seconds.
The Prometheus collector is a generic collector which can map Prometheus queries to metrics that can be used for scaling. This approach is different from how it's done in the k8s-prometheus-adapter, where all available Prometheus metrics are collected and transformed into metrics which the HPA can scale on, and there is no possibility to do custom queries. With the approach implemented here, users can define custom queries and only metrics returned from those queries will be available, reducing the total number of metrics stored.

One downside of this approach is that badly performing queries can slow down or kill Prometheus, so it can be dangerous to allow in a multi-tenant cluster. It's also not possible to restrict the available metrics using something like RBAC since any user would be able to create the metrics based on a custom query.

I still believe custom queries are more useful, but it's good to be aware of the trade-offs between the two approaches.
| Metric | Description | Type | Kind | K8s Versions |
|--------|-------------|------|------|--------------|
| `prometheus-query` | Generic metric which requires a user defined query. | External | | `>=1.12` |
| `custom` | No predefined metrics. Metrics are generated from user defined queries. | Object | any | `>=1.12` |
This is an example of an HPA configured to get metrics based on a Prometheus query. The query is defined in the annotation `metric-config.external.processed-events-per-second.prometheus/query` where `processed-events-per-second` is the query name which will be associated with the result of the query. This allows having multiple Prometheus queries associated with a single HPA.
```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: myapp-hpa
  annotations:
    # This annotation is optional.
    # If specified, then this prometheus server is used,
    # instead of the prometheus server specified as the CLI argument `--prometheus-server`.
    metric-config.external.processed-events-per-second.prometheus/prometheus-server: http://prometheus.my-namespace.svc
    # metric-config.<metricType>.<metricName>.<collectorType>/<configKey>
    metric-config.external.processed-events-per-second.prometheus/query: |
      scalar(sum(rate(event-service_events_count{application="event-service",processed="true"}[1m])))
    metric-config.external.processed-events-per-second.prometheus/interval: "60s" # optional
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: custom-metrics-consumer
  minReplicas: 1
  maxReplicas: 10
  metrics:
  - type: External
    external:
      metric:
        name: processed-events-per-second
        selector:
          matchLabels:
            type: prometheus
      target:
        type: AverageValue
        averageValue: "10"
```
Note: Prometheus Object metrics are deprecated and will most likely be removed in the future. Use the Prometheus External metrics instead as described above.

This is an example of an HPA configured to get metrics based on a Prometheus query. The query is defined in the annotation `metric-config.object.processed-events-per-second.prometheus/query` where `processed-events-per-second` is the metric name which will be associated with the result of the query.

It also specifies an annotation `metric-config.object.processed-events-per-second.prometheus/per-replica` which instructs the collector to treat the results as an average over all pods targeted by the HPA. This makes it possible to mimic the behavior of `targetAverageValue` which is not implemented for metric type `Object` as of Kubernetes v1.10. (It will most likely come in v1.12).
```yaml
apiVersion: autoscaling/v2beta1
kind: HorizontalPodAutoscaler
metadata:
  name: myapp-hpa
  annotations:
    # metric-config.<metricType>.<metricName>.<collectorType>/<configKey>
    metric-config.object.processed-events-per-second.prometheus/query: |
      scalar(sum(rate(event-service_events_count{application="event-service",processed="true"}[1m])))
    metric-config.object.processed-events-per-second.prometheus/per-replica: "true"
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: custom-metrics-consumer
  minReplicas: 1
  maxReplicas: 10
  metrics:
  - type: Object
    object:
      metricName: processed-events-per-second
      target:
        apiVersion: v1
        kind: Pod
        name: dummy-pod
      targetValue: 10 # this will be treated as targetAverageValue
```
Note: The HPA object requires an `Object` to be specified. However when a Prometheus metric is used there is no need for this object. But to satisfy the schema we specify a dummy pod called `dummy-pod`.
The skipper collector is a simple wrapper around the Prometheus collector to make it easy to define an HPA for scaling based on Ingress or RouteGroup metrics when skipper is used as the ingress implementation in your cluster. It assumes you are collecting Prometheus metrics from skipper and it provides the correct Prometheus queries out of the box so users don't have to define those manually.
| Metric | Description | Type | Kind | K8s Versions |
|--------|-------------|------|------|--------------|
| `requests-per-second` | Scale based on requests per second for a certain ingress or routegroup. | Object | `Ingress`, `RouteGroup` | `>=1.19` |
This is an example of an HPA that will scale based on `requests-per-second` for an ingress called `myapp`.
```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: myapp-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: myapp
  minReplicas: 1
  maxReplicas: 10
  metrics:
  - type: Object
    object:
      describedObject:
        apiVersion: networking.k8s.io/v1
        kind: Ingress
        name: myapp
      metric:
        name: requests-per-second
        selector:
          matchLabels:
            backend: backend1 # optional backend
      target:
        averageValue: "10"
        type: AverageValue
```
This is an example of an HPA that will scale based on `requests-per-second` for a routegroup called `myapp`.
```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: myapp-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: myapp
  minReplicas: 1
  maxReplicas: 10
  metrics:
  - type: Object
    object:
      describedObject:
        apiVersion: zalando.org/v1
        kind: RouteGroup
        name: myapp
      metric:
        name: requests-per-second
        selector:
          matchLabels:
            backend: backend1 # optional backend
      target:
        averageValue: "10"
        type: AverageValue
```
Skipper supports sending traffic to different backends based on annotations present on the `Ingress` object, or weights on the RouteGroup backends. By default the number of replicas will be calculated based on the full traffic served by that ingress/routegroup. If however only the traffic being routed to a specific backend should be used, then the backend name can be specified via the `backend` label under `matchLabels` for the metric. The ingress annotation where the backend weights can be obtained can be specified through the flag `--skipper-backends-annotation`.
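For illustration, a minimal sketch of how the pieces fit together; the annotation name `zalando.org/backend-weights` below is an assumption and only matters if it matches what you pass to `--skipper-backends-annotation`:

```yaml
# Hypothetical Ingress carrying backend weights; with the HPA above using
# `backend: backend1`, the metric would then reflect only the traffic share
# routed to that backend.
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: myapp
  annotations:
    zalando.org/backend-weights: '{"backend1": 80, "backend2": 20}'
```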
The External RPS collector, like the Skipper collector, is a simple wrapper around the Prometheus collector to make it easy to define an HPA for scaling based on the RPS measured for a given hostname. When skipper is used as the ingress implementation in your cluster everything should work automatically; in case another reverse proxy is used as ingress, like Nginx for example, it's necessary to configure which Prometheus metric should be used through the `--external-rps-metric-name <metric-name>` flag. Assuming `skipper-ingress` is being used, or the appropriate metric name is passed using the flag mentioned previously, this collector provides the correct Prometheus queries out of the box so users don't have to define those manually.
| Metric | Description | Type | Kind | K8s Versions |
|--------|-------------|------|------|--------------|
| `requests-per-second` | Scale based on requests per second for a certain hostname. | External | | `>=1.12` |
This is an example of an HPA that will scale based on `requests-per-second` for the RPS measured for the hostnames `www.example1.com` and `www.example2.com`, weighted by 42%.
```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: myapp-hpa
  annotations:
    metric-config.external.example-rps.requests-per-second/hostnames: www.example1.com,www.example2.com
    metric-config.external.example-rps.requests-per-second/weight: "42"
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: custom-metrics-consumer
  minReplicas: 1
  maxReplicas: 10
  metrics:
  - type: External
    external:
      metric:
        name: example-rps
        selector:
          matchLabels:
            type: requests-per-second
      target:
        type: AverageValue
        averageValue: "42"
```
This metric supports an n:1 relation between hostnames and metrics: the measured RPS is the sum of the RPS rates of each of the specified hostnames. This value is further modified by the weight parameter explained below.
Some ingress controllers, like skipper-ingress, support sending traffic to different backends based on some kind of configuration; in the case of skipper, annotations present on the `Ingress` object, or weights on the RouteGroup backends. By default the number of replicas will be calculated based on the full traffic served by these components. If however only the traffic being routed to a specific hostname should be used, then the weight for the configured hostname(s) may be specified via the `weight` annotation `metric-config.external.<metric-name>.request-per-second/weight` for the metric being configured.
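As a worked example (assuming the weight is applied as a simple percentage of the summed rate): if `www.example1.com` is serving 100 RPS and `www.example2.com` is serving 60 RPS, the HPA above with `weight: "42"` would see a metric of roughly (100 + 60) × 0.42 = 67.2.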
The InfluxDB collector maps Flux queries to metrics that can be used for scaling.

Note that the collector targets an InfluxDB v2 instance, that's why we only support Flux instead of InfluxQL.
| Metric | Description | Type | Kind | K8s Versions |
|--------|-------------|------|------|--------------|
| `flux-query` | Generic metric which requires a user defined query. | External | | `>=1.10` |
This is an example of an HPA configured to get metrics based on a Flux query. The query is defined in the annotation `metric-config.external.<metricName>.influxdb/query` where `<metricName>` is the query name which will be associated with the result of the query. This allows having multiple Flux queries associated with a single HPA.
```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: myapp-hpa
  annotations:
    # These annotations are optional.
    # If specified, then they are used for setting up the InfluxDB client properly,
    # instead of using the ones specified via CLI. Respectively:
    # - --influxdb-address
    # - --influxdb-token
    # - --influxdb-org
    metric-config.external.queue-depth.influxdb/address: "http://influxdbv2.my-namespace.svc"
    metric-config.external.queue-depth.influxdb/token: "secret-token"
    # This could be either the organization name or the ID.
    metric-config.external.queue-depth.influxdb/org: "deadbeef"
    # metric-config.<metricType>.<metricName>.<collectorType>/<configKey>
    # <configKey> == query-name
    metric-config.external.queue-depth.influxdb/query: |
        from(bucket: "apps")
          |> range(start: -30s)
          |> filter(fn: (r) => r._measurement == "queue_depth")
          |> group()
          |> max()
          // Rename "_value" to "metricvalue" for letting the metrics server properly unmarshal the result.
          |> rename(columns: {_value: "metricvalue"})
          |> keep(columns: ["metricvalue"])
    metric-config.external.queue-depth.influxdb/interval: "60s" # optional
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: queryd-v1
  minReplicas: 1
  maxReplicas: 4
  metrics:
  - type: External
    external:
      metric:
        name: queue-depth
        selector:
          matchLabels:
            type: influxdb
      target:
        type: Value
        value: "1"
```
The AWS collector allows scaling based on external metrics exposed by AWS services, e.g. SQS queue lengths.

To integrate with AWS, the controller needs to run on nodes with access to the AWS API. Additionally the controller has to have a role with the following policy to get all required data from AWS:
```yaml
PolicyDocument:
  Statement:
    - Action: 'sqs:GetQueueUrl'
      Effect: Allow
      Resource: '*'
    - Action: 'sqs:GetQueueAttributes'
      Effect: Allow
      Resource: '*'
    - Action: 'sqs:ListQueues'
      Effect: Allow
      Resource: '*'
    - Action: 'sqs:ListQueueTags'
      Effect: Allow
      Resource: '*'
  Version: 2012-10-17
```
| Metric | Description | Type | K8s Versions |
|--------|-------------|------|--------------|
| `sqs-queue-length` | Scale based on SQS queue length | External | `>=1.12` |
This is an example of an HPA that will scale based on the length of an SQS queue.
```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: myapp-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: custom-metrics-consumer
  minReplicas: 1
  maxReplicas: 10
  metrics:
  - type: External
    external:
      metric:
        name: my-sqs
        selector:
          matchLabels:
            type: sqs-queue-length
            queue-name: foobar
            region: eu-central-1
      target:
        averageValue: "30"
        type: AverageValue
```
The `matchLabels` are used by `kube-metrics-adapter` to configure a collector that will get the queue length for an SQS queue named `foobar` in region `eu-central-1`.

The AWS account of the queue currently depends on how `kube-metrics-adapter` is configured to get AWS credentials. The normal assumption is that you run the adapter in a cluster running in the AWS account where the queue is defined. Please open an issue if you would like support for other use cases.
The ZMON collector allows scaling based on external metrics exposed by ZMON checks.
| Metric | Description | Type | K8s Versions |
|--------|-------------|------|--------------|
| `zmon-check` | Scale based on any ZMON check results | External | `>=1.12` |
This is an example of an HPA that will scale based on the specified value exposed by a ZMON check with id `1234`.
```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: myapp-hpa
  annotations:
    # metric-config.<metricType>.<metricName>.<collectorType>/<configKey>
    metric-config.external.my-zmon-check.zmon/key: "custom.*"
    metric-config.external.my-zmon-check.zmon/tag-application: "my-custom-app-*"
    metric-config.external.my-zmon-check.zmon/interval: "60s" # optional
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: custom-metrics-consumer
  minReplicas: 1
  maxReplicas: 10
  metrics:
  - type: External
    external:
      metric:
        name: my-zmon-check
        selector:
          matchLabels:
            type: zmon
            check-id: "1234" # the ZMON check to query for metrics
            key: "custom.value"
            tag-application: my-custom-app
            aggregators: avg # comma separated list of aggregation functions, default: last
            duration: 5m # default: 10m
      target:
        averageValue: "30"
        type: AverageValue
```
The `check-id` specifies the ZMON check to query for the metrics. `key` specifies the JSON key in the check output to extract the metric value from. E.g. if you have a check which returns the following data:

```json
{
  "custom": {
    "value": 1.0
  },
  "other": {
    "value": 3.0
  }
}
```

Then the value `1.0` would be returned when the key is defined as `custom.value`.
The `tag-<name>` labels define the tags used for the KairosDB query. In a normal ZMON setup the following tags will be available:

- `application`
- `alias` (name of Kubernetes cluster)
- `entity` - full ZMON entity ID
`aggregators` defines the aggregation functions applied to the metrics query. For instance if you define the entity filter `type=kube_pod,application=my-custom-app` you might get three entities back, and then you might want to get an average over the metrics for those three entities. This would be possible by using the `avg` aggregator. The default aggregator is `last`, which returns only the latest metric point from the query. The supported aggregation functions are `avg`, `count`, `last`, `max`, `min`, `sum` and `diff`. See the KairosDB docs for details.
The `duration` defines the duration used for the timeseries query. E.g. if you specify a duration of `5m` then the query will return metric points for the last 5 minutes and apply the specified aggregation with the same duration, e.g. `max(5m)`.
The annotations `metric-config.external.my-zmon-check.zmon/key` and `metric-config.external.my-zmon-check.zmon/tag-<name>` can optionally be used if you need to define a `key` or other `tag` with a "star" query syntax like `values.*`. This hack is in place because it's not allowed to use `*` in the metric label definitions. If both an annotation and the corresponding label are defined, then the annotation takes precedence.
The Nakadi collector allows scaling based on Nakadi Subscription API stats metrics `consumer_lag_seconds` or `unconsumed_events`.
| Metric Type | Description | Type | K8s Versions |
|-------------|-------------|------|--------------|
| `unconsumed-events` | Scale based on number of unconsumed events for a Nakadi subscription | External | `>=1.24` |
| `consumer-lag-seconds` | Scale based on number of max consumer lag seconds for a Nakadi subscription | External | `>=1.24` |
```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: myapp-hpa
  annotations:
    # metric-config.<metricType>.<metricName>.<collectorType>/<configKey>
    metric-config.external.my-nakadi-consumer.nakadi/interval: "60s" # optional
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: custom-metrics-consumer
  minReplicas: 0
  maxReplicas: 8 # should match number of partitions for the event type
  metrics:
  - type: External
    external:
      metric:
        name: my-nakadi-consumer
        selector:
          matchLabels:
            type: nakadi
            subscription-id: "708095f6-cece-4d02-840e-ee488d710b29"
            metric-type: "consumer-lag-seconds|unconsumed-events"
      target:
        # value is compatible with the consumer-lag-seconds metric type.
        # It describes the amount of consumer lag in seconds before scaling
        # additionally up.
        # if an event-type has multiple partitions the value of
        # consumer-lag-seconds is the max of all the partitions.
        value: "600" # 10m
        type: Value
        # averageValue is compatible with unconsumed-events metric type.
        # This means for every 30 unconsumed events a pod is added.
        # unconsumed-events is the sum of unconsumed_events over all
        # partitions.
        averageValue: "30"
        type: AverageValue
```
The `subscription-id` is the Subscription ID of the relevant consumer. The `metric-type` indicates whether to scale on `consumer-lag-seconds` or `unconsumed-events` as outlined below.
`unconsumed-events` - is the total number of unconsumed events over all partitions. When using this `metric-type` you should also use the target `averageValue` which indicates the number of events which can be handled per pod. To best estimate the number of events per pod, you need to understand the average time for processing an event as well as the rate of events.
Example: You have an event type producing 100 events per second between 00:00 and 08:00. Between 08:01 and 23:59 it produces 400 events per second. Let's assume that on average a single pod can consume 100 events per second, then we can define 100 as `averageValue` and the HPA would scale to 1 between 00:00 and 08:00, and scale to 4 between 08:01 and 23:59. If there for some reason is a short spike of 800 events per second, then it would scale to 8 pods to process those events until the rate goes down again.
`consumer-lag-seconds` - describes the age of the oldest unconsumed event for a subscription. If the event type has multiple partitions the lag is defined as the max age over all partitions. When using this `metric-type` you should use the target `value` to indicate the max lag (in seconds) before the HPA should scale.
Example: You have a subscription with a defined SLO of "99.99% of events are consumed within 30 min.". In this case you can define a target `value` of e.g. 20 min. (1200s) (to include a safety buffer) such that the HPA only scales up from 1 to 2 if the target of 20 min. is breached and it needs to work faster with more consumers. For this case you should also account for the average time for processing an event when defining the target.
As an alternative to defining `subscription-id` you can also filter based on `owning_application`, `event-types` and `consumer-group`:
```yaml
metrics:
- type: External
  external:
    metric:
      name: my-nakadi-consumer
      selector:
        matchLabels:
          type: nakadi
          owning-application: "example-app"
          # comma separated list of event types
          event-types: "example-event-type,example-event-type2"
          consumer-group: "abcd1234"
          metric-type: "consumer-lag-seconds|unconsumed-events"
```
This is useful in dynamic environments where the subscription ID might not be known before deployment time (e.g. because it's created by the same deployment).
The http collector allows collecting metrics from an external endpoint specified in the HPA. Currently only `json-path` collection is supported.
| Metric | Description | Type | K8s Versions |
|--------|-------------|------|--------------|
| `custom` | No predefined metrics. Metrics are generated from user defined queries. | Pods | `>=1.12` |
This is an example of using the HTTP collector to collect metrics from a JSON metrics endpoint specified in the annotations.
```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: myapp-hpa
  annotations:
    # metric-config.<metricType>.<metricName>.<collectorType>/<configKey>
    metric-config.external.unique-metric-name.json-path/json-key: "$.some-metric.value"
    metric-config.external.unique-metric-name.json-path/json-eval: ceil($['active processes'] / $['total processes'] * 100) # cannot use both json-eval and json-key
    metric-config.external.unique-metric-name.json-path/endpoint: "http://metric-source.app-namespace:8080/metrics"
    metric-config.external.unique-metric-name.json-path/aggregator: "max"
    metric-config.external.unique-metric-name.json-path/interval: "60s" # optional
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: myapp
  minReplicas: 1
  maxReplicas: 10
  metrics:
  - type: External
    external:
      metric:
        name: unique-metric-name
        selector:
          matchLabels:
            type: json-path
      target:
        averageValue: 1
        type: AverageValue
```
The HTTP collector is similar to the Pod Metrics collector. The following configuration values are supported:

- `json-key` to specify the JSON path of the metric to be queried
- `json-eval` to specify an expression to evaluate on the script engine; cannot be used in conjunction with `json-key`
- `endpoint` the fully formed path to query for the metric. In the above example a Kubernetes Service in the namespace `app-namespace` is called.
- `aggregator` is only required if the metric is an array of values and specifies how the values are aggregated. Currently this option supports the values: `sum`, `max`, `min`, `avg`.
It's possible to configure the scrape interval for each of the metric types via an annotation:

```yaml
metric-config.<metricType>.<metricName>.<collectorType>/interval: "30s"
```

The default is `60s` but can be reduced to let the adapter collect metrics more often.
The `ScalingSchedule` and `ClusterScalingSchedule` collectors allow collecting time-based metrics from the respective CRD objects specified in the HPA.

These collectors are disabled by default; you have to start the server with the `--scaling-schedule` flag to enable them. Remember to deploy the CRDs `ScalingSchedule` and `ClusterScalingSchedule` and allow the service account used by the server to read, watch and list them.
| Metric | Description | Type | K8s Versions |
|--------|-------------|------|--------------|
| ObjectName | The metric is calculated and stored for each `ScalingSchedule` and `ClusterScalingSchedule` referenced in the HPAs | `ScalingSchedule` and `ClusterScalingSchedule` | `>=1.16` |
To avoid abrupt scaling due to time based metrics, the `ScalingSchedule` collector has a feature to ramp the metric up and down over a specific period of time. The duration of the scaling window can be configured individually in the `[Cluster]ScalingSchedule` object, via the optional `scalingWindowDurationMinutes` field, or globally for all scheduled events, and defaults to a globally configured value if not specified. The default for the latter is set to 10 minutes, but can be changed using the `--scaling-schedule-default-scaling-window` flag.
This spreads the scale events around, creating less load on the other components, and helping the rest of the metrics (like the CPU ones) to adjust as well.
The HPA algorithm does not make changes if the metric change is less than the tolerance specified by the `horizontal-pod-autoscaler-tolerance` flag:

> We'll skip scaling if the ratio is sufficiently close to 1.0 (within a globally-configurable tolerance, from the `--horizontal-pod-autoscaler-tolerance` flag, which defaults to 0.1).
With that in mind, the ramp-up and ramp-down feature divides the scaling over the specified period of time into buckets, trying to achieve changes bigger than the configured tolerance. The number of buckets defaults to 10 and can be configured by the `--scaling-schedule-ramp-steps` flag.
Important: note that the ramp-up and ramp-down feature can lead to deployments achieving less than the specified number of pods, due to the HPA 10% change rule and the ceiling function applied to the desired number of pods (check the algorithm details). It varies with the configured metric for `ScalingSchedule` events, the number of pods and the configured `horizontal-pod-autoscaler-tolerance` flag of your Kubernetes installation. This gist contains the code to simulate the situations a deployment with different numbers of pods, with a metric of 10000, can face with 10 buckets (max of 90% of the metric returned) and 5 buckets (max of 80% of the metric returned). The ramp-up and ramp-down feature can be disabled by setting `--scaling-schedule-default-scaling-window` to 0, and abrupt scalings can be handled via scaling policies.
This is an example of using the ScalingSchedule collectors to collect metrics from a deployed object of the CRD kind. First, the schedule object:
```yaml
apiVersion: zalando.org/v1
kind: ClusterScalingSchedule
metadata:
  name: "scheduling-event"
spec:
  schedules:
  - type: OneTime
    date: "2021-10-02T08:08:08+02:00"
    durationMinutes: 30
    value: 100
  - type: Repeating
    durationMinutes: 10
    value: 120
    period:
      startTime: "15:45"
      timezone: "Europe/Berlin"
      days:
      - Mon
      - Wed
      - Fri
```
This resource defines a scheduling event named `scheduling-event` with two schedules of the kind `ClusterScalingSchedule`.

`ClusterScalingSchedule` objects aren't namespaced, which means they can be referenced by any HPA in any namespace in the cluster. `ScalingSchedule` objects have the exact same fields and behavior, but can be referenced only by HPAs in the same namespace. The schedules can have the type `Repeating` or `OneTime`.
This example configuration will generate the following result: at `2021-10-02T08:08:08+02:00`, for 30 minutes, a metric with the value of 100 will be returned. Every Monday, Wednesday and Friday, starting at 15 hours and 45 minutes (Berlin time), a metric with the value of 120 will be returned for 10 minutes. It's not the case in this example, but if multiple schedules collide in time, the biggest value is returned.
Check the CRD definitions (ScalingSchedule, ClusterScalingSchedule) for a better understanding of the possible fields and their behavior.
An HPA can reference the deployed `ClusterScalingSchedule` object as in this example:
```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: "myapp-hpa"
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: myapp
  minReplicas: 1
  maxReplicas: 15
  metrics:
  - type: Object
    object:
      describedObject:
        apiVersion: zalando.org/v1
        kind: ClusterScalingSchedule
        name: "scheduling-event"
      metric:
        name: "scheduling-event"
      target:
        type: AverageValue
        averageValue: "10"
```
The name of the metric is equal to the name of the referenced object. The `target.averageValue` in this example is set to 10. This value will be used by the HPA controller to define the desired number of pods, based on the metric obtained (check the HPA algorithm details for more context). This HPA configuration explicitly says that each pod of this application supports 10 units of the `ClusterScalingSchedule` metric. Multiple applications can share the same `ClusterScalingSchedule` or `ScalingSchedule` event and have a different number of pods based on their `target.averageValue` configuration.
In our specific example, at `2021-10-02T08:08:08+02:00`, as the metric has the value 100, this application will scale to 10 pods (100/10). Every Monday, Wednesday and Friday, starting at 15 hours and 45 minutes (Berlin time), the application will scale to 12 pods (120/10). Both scale-ups will last at least the configured duration of the schedules. After that, regular HPA scale-down behavior applies.

Note that these pod counts consider only these custom metrics; the normal HPA behavior still applies, such as: in case of multiple metrics the biggest number of pods is the utilized one, HPA max and min replica configuration, autoscaling policies, etc.