Scale and autoscale runtime services
You can scale most services running in Kubernetes from the command line or in a configuration override. You can set scaling parameters for Apigee hybrid runtime services in the `overrides.yaml` file.
| Service | Implemented As | Scaling |
|---|---|---|
| Cassandra | ApigeeDatastore (CRD) | See Scaling Cassandra. |
| Ingress/LoadBalancer | Deployment | Manage scaling with the `replicaCountMin` and `replicaCountMax` properties in the overrides file. |
| Logger | DaemonSet | DaemonSets manage replicas of a pod on all nodes, so they scale when you scale the nodes themselves. |
| MART, Apigee Connect, Watcher | ApigeeOrganization (CRD) | To scale via configuration, increase the value of the Deployment's `replicaCountMin` property in the `mart`, `watcher`, and/or `connectAgent` stanzas of the overrides file. For example, set `replicaCountMin: 1` and `replicaCountMax: 2` for each of these stanzas (see the sketch after this table). These Deployments use a Horizontal Pod Autoscaler for autoscaling. Set the Deployment object's `targetCPUUtilizationPercentage` property to the CPU utilization threshold at which scaling up begins. For more information on setting configuration properties, see Manage runtime plane components. |
| Runtime, Synchronizer, UDCA | ApigeeEnvironment (CRD) | To scale via configuration, increase the value of the `replicaCountMin` property for the `udca`, `synchronizer`, and/or `runtime` stanzas in the overrides file. For example, set `replicaCountMin: 1` and `replicaCountMax: 10` for each of these stanzas (see the sketch after this table). Note: These changes apply to ALL environments in the overrides file. If you wish to customize scaling for each environment, see Advanced configurations below. These Deployments use a Horizontal Pod Autoscaler for autoscaling. Set the Deployment object's `targetCPUUtilizationPercentage` property to the CPU utilization threshold at which scaling up begins. For more information on setting configuration properties, see Manage runtime plane components. |
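The per-component settings described in this table can be combined in one overrides file. The following is a minimal sketch using the example values from the table; the minimums and maximums you choose should reflect your own traffic profile:

```yaml
# Sketch: scaling-related stanzas in overrides.yaml, using the example
# values from the table above (illustrative, not recommendations).
mart:
  replicaCountMin: 1
  replicaCountMax: 2
watcher:
  replicaCountMin: 1
  replicaCountMax: 2
connectAgent:
  replicaCountMin: 1
  replicaCountMax: 2
synchronizer:
  replicaCountMin: 1
  replicaCountMax: 10
runtime:
  replicaCountMin: 1
  replicaCountMax: 10
udca:
  replicaCountMin: 1
  replicaCountMax: 10
```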
Advanced configurations
In some scenarios, you may need to use advanced scaling options. Example scenarios include:
- Setting different scaling options for each environment. For example, where env1 has a `minReplica` of 5 and env2 has a `minReplica` of 2 (see the overrides sketch after this list).
- Setting different scaling options for each component within an environment. For example, where the `udca` component has a `maxReplica` of 5 and the `synchronizer` component has a `maxReplica` of 2.
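As a sketch of the first scenario, environment-specific minimums can be expressed with the `envs` stanza described under Environment-based scaling below (the environment names `env1` and `env2` and the `replicaCountMax` values are illustrative):

```yaml
# Sketch: different replicaCountMin values per environment.
# Environment names and maximum values are illustrative.
envs:
  - name: env1
    components:
      runtime:
        replicaCountMin: 5
        replicaCountMax: 20
  - name: env2
    components:
      runtime:
        replicaCountMin: 2
        replicaCountMax: 20
```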
The following example shows how to use the `kubectl patch` command to change the `maxReplicas` property for the `runtime` component:
- Create environment variables to use with the command:
```bash
export ENV=my-environment-name
export NAMESPACE=apigee   # the namespace where apigee is deployed
export COMPONENT=runtime  # can also be udca or synchronizer
export MAX_REPLICAS=2
export MIN_REPLICAS=1
```
- Apply the patch. Note that this example assumes that `kubectl` is in your `PATH`:

```bash
kubectl patch apigeeenvironment -n $NAMESPACE \
  $(kubectl get apigeeenvironments -n $NAMESPACE -o jsonpath='{.items[?(@.spec.name=="'$ENV'")]..metadata.name}') \
  --patch "$(echo -e "spec:\n  components:\n    $COMPONENT:\n      autoScaler:\n        maxReplicas: $MAX_REPLICAS\n        minReplicas: $MIN_REPLICAS")" \
  --type merge
```
- Verify the change:
```bash
kubectl get hpa -n $NAMESPACE
```
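To confirm the values that were written to the environment resource itself, one option is to read back the `autoScaler` block; this is a sketch that reuses the variables and CRD fields from the patch command above:

```bash
# Sketch: read back the autoScaler settings written by the patch.
# Reuses the ENV, NAMESPACE, and COMPONENT variables defined above.
kubectl get apigeeenvironments -n $NAMESPACE \
  -o jsonpath='{.items[?(@.spec.name=="'$ENV'")].spec.components.'$COMPONENT'.autoScaler}'
```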
Environment-based scaling
By default, scaling is described at the organization level. You can override the default settings by specifying environment-specific scaling in the `overrides.yaml` file as shown in the following example:
```yaml
envs:
  # Apigee environment name
  - name: test
    components:
      # Environment-specific scaling override
      # Otherwise, uses scaling defined at the respective root component
      runtime:
        replicaCountMin: 2
        replicaCountMax: 20
```
Metrics-based scaling
With metrics-based scaling, the runtime can use CPU and application metrics to scale the `apigee-runtime` pods. The Kubernetes Horizontal Pod Autoscaler (HPA) API uses the `hpaBehavior` field to configure the scale-up and scale-down behaviors of the target service. Metrics-based scaling is not available for any other components in a hybrid deployment.
Scaling can be adjusted based on the following metrics:
| Metric | Measure | Considerations |
|---|---|---|
| serverNioTaskWaitTime | Average wait time (in picoseconds) of processing queue in runtime instances for proxy requests at the HTTP layer. | This metric measures the impact of the number and payload size of proxy requests and responses. |
| serverMainTaskWaitTime | Average wait time (in picoseconds) of processing queue in runtime instances for proxy requests to process policies. | This metric measures the impact of complexity in the policies attached to the proxy request flow. |
The following example from the `runtime` stanza in the `overrides.yaml` file illustrates the standard parameters (and permitted ranges) for scaling `apigee-runtime` pods in a hybrid implementation:
```yaml
hpaMetrics:
  serverMainTaskWaitTime: 400M (300M to 450M)
  serverNioTaskWaitTime: 400M (300M to 450M)
  targetCPUUtilizationPercentage: 75
hpaBehavior:
  scaleDown:
    percent:
      periodSeconds: 60 (30 - 180)
      value: 20 (5 - 50)
    pods:
      periodSeconds: 60 (30 - 180)
      value: 2 (1 - 15)
    selectPolicy: Min
    stabilizationWindowSeconds: 120 (60 - 300)
  scaleUp:
    percent:
      periodSeconds: 60 (30 - 120)
      value: 20 (5 - 100)
    pods:
      periodSeconds: 60 (30 - 120)
      value: 4 (2 - 15)
    selectPolicy: Max
    stabilizationWindowSeconds: 30 (30 - 120)
```
Configure more aggressive scaling
Increasing the `percent` and `pods` values of the scale-up policy will result in a more aggressive scale-up policy. Similarly, increasing the `percent` and `pods` values in `scaleDown` will result in a more aggressive scale-down policy. For example:
```yaml
hpaMetrics:
  serverMainTaskWaitTime: 400M
  serverNioTaskWaitTime: 400M
  targetCPUUtilizationPercentage: 75
hpaBehavior:
  scaleDown:
    percent:
      periodSeconds: 60
      value: 20
    pods:
      periodSeconds: 60
      value: 4
    selectPolicy: Min
    stabilizationWindowSeconds: 120
  scaleUp:
    percent:
      periodSeconds: 60
      value: 30
    pods:
      periodSeconds: 60
      value: 5
    selectPolicy: Max
    stabilizationWindowSeconds: 30
```
In the above example, the `scaleDown.pods.value` is increased to 4, the `scaleUp.percent.value` is increased to 30, and the `scaleUp.pods.value` is increased to 5.
Note: `periodSeconds` should not go below 30.

Configure less aggressive scaling
The `hpaBehavior` configuration values can also be decreased to implement less aggressive scale-up and scale-down policies. For example:
```yaml
hpaMetrics:
  serverMainTaskWaitTime: 400M
  serverNioTaskWaitTime: 400M
  targetCPUUtilizationPercentage: 75
hpaBehavior:
  scaleDown:
    percent:
      periodSeconds: 60
      value: 10
    pods:
      periodSeconds: 60
      value: 1
    selectPolicy: Min
    stabilizationWindowSeconds: 180
  scaleUp:
    percent:
      periodSeconds: 60
      value: 20
    pods:
      periodSeconds: 60
      value: 4
    selectPolicy: Max
    stabilizationWindowSeconds: 30
```
In the above example, the `scaleDown.percent.value` is decreased to 10, the `scaleDown.pods.value` is decreased to 1, and the `scaleDown.stabilizationWindowSeconds` is increased to 180.
For more information about metrics-based scaling using the `hpaBehavior` field, see Scaling policies.
Disable metrics-based scaling
While metrics-based scaling is enabled by default and cannot be completely disabled, you can set the metric thresholds to a level at which metrics-based scaling will not be triggered. The resulting scaling behavior is the same as CPU-based scaling. For example, you can use the following configuration to prevent triggering metrics-based scaling:
```yaml
hpaMetrics:
  serverMainTaskWaitTime: 4000M
  serverNioTaskWaitTime: 4000M
  targetCPUUtilizationPercentage: 75
hpaBehavior:
  scaleDown:
    percent:
      periodSeconds: 60
      value: 10
    pods:
      periodSeconds: 60
      value: 1
    selectPolicy: Min
    stabilizationWindowSeconds: 180
  scaleUp:
    percent:
      periodSeconds: 60
      value: 20
    pods:
      periodSeconds: 60
      value: 4
    selectPolicy: Max
    stabilizationWindowSeconds: 30
```
Troubleshooting
This section describes troubleshooting methods for common errors you may encounter while configuring scaling and auto-scaling.
HPA shows `unknown` for metrics values
If metrics-based scaling does not work and the HPA shows `unknown` for metrics values, use the following command to check the HPA output:
```bash
kubectl describe hpa HPA_NAME
```
When running the command, replace HPA_NAME with the name of the HPA you wish to view.
The output will show the CPU target and utilization of the service, indicating that CPU scaling will work in the absence of metrics-based scaling. For HPA behavior using multiple parameters, see Scaling on multiple metrics.