Best practices for high availability with OpenShift
This document describes best practices to achieve high availability (HA) with Red Hat OpenShift Container Platform workloads on Compute Engine. It focuses on application-level strategies to help you ensure that your workloads remain highly available when failures occur. These strategies help you eliminate single points of failure and implement mechanisms for automatic failover and recovery.
This document is intended for platform and application architects and assumes that you have some experience in deploying OpenShift. For more information about how to deploy OpenShift, see the Red Hat documentation.
This document is part of a series that focuses on the application-level strategies that ensure your workloads remain highly available and quickly recoverable in the face of failures. The documents in this series are as follows:
- Disaster recovery for OpenShift on Google Cloud
- Best practices for high availability with OpenShift (this page)
- OpenShift on Google Cloud: Disaster recovery strategies for active-passive and active-inactive setups
Spread deployments across multiple zones
We recommend that you deploy OpenShift across multiple zones within a Google Cloud region. This approach helps ensure that if a zone experiences an outage, the cluster's control plane nodes continue to function in the remaining zones. To deploy OpenShift across multiple zones, specify a list of Google Cloud zones from the same region in your install-config.yaml file.
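The following sketch shows what the relevant fields of an install-config.yaml file might look like for a deployment across three zones. The domain, cluster name, project, region, and zone values are placeholders for your own environment:

apiVersion: v1
baseDomain: example.com
metadata:
  name: my-cluster
controlPlane:
  name: master
  replicas: 3
  platform:
    gcp:
      # Spread control plane nodes across three zones in the region.
      zones:
      - us-central1-a
      - us-central1-b
      - us-central1-c
compute:
- name: worker
  replicas: 3
  platform:
    gcp:
      # Spread worker nodes across the same three zones.
      zones:
      - us-central1-a
      - us-central1-b
      - us-central1-c
platform:
  gcp:
    projectID: my-project
    region: us-central1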
For fine-grained control over the locations where nodes are deployed, we recommend defining VM placement policies that ensure the VMs are spread across different failure domains in the same zone. Applying a spread placement policy to your cluster nodes helps reduce the number of nodes that are simultaneously impacted by location-specific disruptions. For more information on how to create a spread policy for existing clusters, see Create and apply spread placement policies to VMs.
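As a sketch, the following gcloud commands create a spread placement policy across three availability domains and attach it to an existing node VM. The policy name, region, zone, and VM name are placeholders, and you might need to stop a VM before you can attach a policy to it:

# Create a spread placement policy with three availability domains.
gcloud compute resource-policies create group-placement my-spread-policy \
    --region=us-central1 \
    --availability-domain-count=3

# Attach the policy to an existing node VM.
gcloud compute instances add-resource-policies my-node-vm \
    --zone=us-central1-a \
    --resource-policies=my-spread-policy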
Similarly, to prevent multiple pods from being scheduled on the same node, we recommend that you use pod anti-affinity rules. These rules spread application replicas across multiple zones. The following example demonstrates how to implement pod anti-affinity rules:
apiVersion: apps/v1
kind: Deployment
metadata:
  name: my-app
  namespace: my-app-namespace
spec:
  replicas: 3
  selector:
    matchLabels:
      app: my-app
  template:
    metadata:
      labels:
        app: my-app
    spec:
      # Pod anti-affinity: require new pods to be scheduled on nodes in different zones.
      affinity:
        podAntiAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
          - labelSelector:
              matchLabels:
                app: my-app
            topologyKey: topology.kubernetes.io/zone
      containers:
      - name: my-app-container
        image: quay.io/myorg/my-app:latest
        ports:
        - containerPort: 8080
For stateless services like web front ends or REST APIs, we recommend that you run multiple pod replicas for each service or route. This approach ensures that traffic is automatically routed to pods in available zones.
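For illustration, the following sketch exposes the my-app Deployment from the earlier example through a Service and an OpenShift Route, so that the HAProxy-based router balances traffic across the healthy replicas; the names are placeholders:

apiVersion: v1
kind: Service
metadata:
  name: my-app
  namespace: my-app-namespace
spec:
  selector:
    app: my-app
  ports:
  - port: 8080
    targetPort: 8080
---
apiVersion: route.openshift.io/v1
kind: Route
metadata:
  name: my-app
  namespace: my-app-namespace
spec:
  # Route traffic to the Service, which load-balances across all replicas.
  to:
    kind: Service
    name: my-app
  port:
    targetPort: 8080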
Proactively manage load to prevent resource over-commitment
We recommend that you proactively manage your application's load to prevent resource over-commitment. Over-commitment can lead to poor service performance under load. You can help prevent over-commitment by setting resource requests and limits; for a more detailed explanation, see Managing resources for your pod. Additionally, you can automatically scale replicas up or down based on CPU, memory, or custom metrics by using the horizontal pod autoscaler.
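As an illustrative sketch, the following HorizontalPodAutoscaler scales the my-app Deployment from the earlier example between 3 and 10 replicas based on average CPU utilization. The utilization target is measured against the CPU requests set on the Deployment's containers, so make sure those requests are defined; the name and thresholds are placeholders:

apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: my-app-hpa
  namespace: my-app-namespace
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: my-app
  minReplicas: 3
  maxReplicas: 10
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        # Add replicas when average CPU usage exceeds 70% of the requested CPU.
        averageUtilization: 70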
We also recommend that you use the following load balancing services:
- OpenShift ingress operator. The Ingress Operator deploys HAProxy-based ingress controllers to handle routing to your pods. Specifically, we recommend that you configure global access for the Ingress Controller, which enables clients in any region within the same VPC network as the load balancer to reach the workloads running on your cluster (see the sketch after this list). Additionally, we recommend that you implement ingress controller health checks to monitor the health of your pods and restart failing pods.
- Google Cloud Load Balancing. Load Balancing distributes traffic across Google Cloud zones. Choose a load balancer that meets your application's needs.
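The following sketch shows one way to enable global access on an ingress controller that uses an internal load balancer; the controller name and domain are placeholders:

apiVersion: operator.openshift.io/v1
kind: IngressController
metadata:
  name: internal-ingress
  namespace: openshift-ingress-operator
spec:
  domain: apps-internal.example.com
  endpointPublishingStrategy:
    type: LoadBalancerService
    loadBalancer:
      # Global access applies to internal load balancers and lets clients
      # from any region in the same VPC network reach the load balancer.
      scope: Internal
      providerParameters:
        type: GCP
        gcp:
          clientAccess: Global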
Define pod disruption budgets
We recommend that you define disruption budgets to specify the minimum number of pods that your application requires to be available during disruptions like maintenance events or updates. The following example shows how to define a disruption budget:
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: my-app-pdb
  namespace: my-app-namespace
spec:
  # Define how many pods need to remain available during a disruption.
  # At least one of "minAvailable" or "maxUnavailable" must be specified.
  minAvailable: 2
  selector:
    matchLabels:
      app: my-app
For more information, see Specifying a Disruption Budget for your Application.
Use storage that supports HA and data replication
For stateful workloads that require persistent data storage outside of containers, we recommend the following best practices.
Disk best practices
If you require disk storage, use one of the following:
- Block storage: Compute Engine regional Persistent Disk with synchronous replication
- Shared file storage: Filestore with snapshots and backups enabled
After you select a storage option, install its driver in your cluster:
The CSI Persistent Disk operator provides a storage class that you can use to create persistent volume claims (PVCs). For Filestore, you must create the Filestore storage class.
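For example, the following sketch defines a storage class backed by regional Persistent Disk (the regionalpd-balanced class that the database example later in this document references) and a persistent volume claim that uses it; the PVC name and size are placeholders:

apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: regionalpd-balanced
provisioner: pd.csi.storage.gke.io
parameters:
  # Use a balanced Persistent Disk that is synchronously replicated
  # across two zones in the region.
  type: pd-balanced
  replication-type: regional-pd
volumeBindingMode: WaitForFirstConsumer
allowVolumeExpansion: true
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: my-app-data
  namespace: my-app-namespace
spec:
  accessModes:
  - ReadWriteOnce
  storageClassName: regionalpd-balanced
  resources:
    requests:
      storage: 10Gi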
Database best practices
If you require a database, use one of the following:
- Fully managed database: We recommend that you use Cloud SQL or AlloyDB for PostgreSQL to manage database HA on your behalf. If you use Cloud SQL, you can use the Cloud SQL Proxy Operator to simplify connection management between your application and the database.
- Self-managed database: We recommend that you use a database that supports HA and that you deploy its operator to enable HA. For more information, see the documentation related to your database operator, such as Redis Enterprise for Kubernetes, MariaDB Operator, or CloudNative PostgreSQL Operator.
After you install your database operator, configure a cluster with multiple instances. The following example shows the configuration for a cluster with the following attributes:
- A PostgreSQL cluster named my-postgres-cluster is created with three instances for high availability.
- The cluster uses the regionalpd-balanced storage class for durable and replicated storage across zones.
- A database named mydatabase is initialized with a user myuser, whose credentials are stored in a Kubernetes secret called my-database-secret.
- Superuser access is disabled for enhanced security.
- Monitoring is enabled for the cluster.
apiVersion: postgresql.cnpg.io/v1
kind: Cluster
metadata:
  name: my-postgres-cluster
  namespace: postgres-namespace
spec:
  instances: 3
  storage:
    size: 10Gi
    storageClass: regionalpd-balanced
  bootstrap:
    initdb:
      database: mydatabase
      owner: myuser
      secret:
        name: my-database-secret
  enableSuperuserAccess: false
  monitoring:
    enablePodMonitor: true
---
apiVersion: v1
kind: Secret
metadata:
  name: my-database-secret
  namespace: postgres-namespace
type: Opaque
data:
  username: bXl1c2Vy # Base64-encoded value of "myuser"
  password: c2VjdXJlcGFzc3dvcmQ= # Base64-encoded value of "securepassword"
Externalize application state
We recommend that you move session state or caching to shared in-memory stores (for example, Redis) or persistent datastores (for example, Postgres, MySQL) that are configured to run in HA mode.
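As a sketch, the following Deployment fragment points the application at a shared Redis service for session storage instead of pod-local memory, so that any replica can serve any session. The SESSION_STORE_URL variable and the Redis service address are hypothetical and depend on your application framework:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: my-app
  namespace: my-app-namespace
spec:
  replicas: 3
  selector:
    matchLabels:
      app: my-app
  template:
    metadata:
      labels:
        app: my-app
    spec:
      containers:
      - name: my-app-container
        image: quay.io/myorg/my-app:latest
        env:
        # Hypothetical variable: store sessions in a shared Redis service
        # that runs in HA mode, rather than in local pod memory.
        - name: SESSION_STORE_URL
          value: redis://my-redis.redis-namespace.svc.cluster.local:6379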
Summary of best practices
In summary, implement the following best practices to achieve high availability with OpenShift:
- Spread deployments across multiple zones
- Proactively manage load to prevent resource over-commitment
- Define pod disruption budgets
- Use storage that supports HA and data replication
- Externalize application state
What's next
- Learn how to install OpenShift on Google Cloud.
- Learn more about Red Hat solutions on Google Cloud.
- Learn about the different architectural options for DR with OpenShift on Google Cloud.