Best practices for high availability with OpenShift

This document describes best practices to achieve high availability (HA) with Red Hat OpenShift Container Platform workloads on Compute Engine. This document focuses on application-level strategies to help you ensure that your workloads remain highly available when failures occur. These strategies help you eliminate single points of failure and implement mechanisms for automatic failover and recovery.

This document is intended for platform and application architects and assumes that you have some experience in deploying OpenShift. For more information about how to deploy OpenShift, see the Red Hat documentation.

This document is part of a series that focuses on the application-level strategies that ensure your workloads remain highly available and quickly recoverable in the face of failures. The documents in this series are as follows:

Spread deployments across multiple zones

We recommend that you deploy OpenShift across multiple zones within a Google Cloud region. This approach helps ensure that if a zone experiences an outage, the cluster's control plane nodes continue to function in the other zones the deployment is spread across. To deploy OpenShift across multiple zones, specify a list of Google Cloud zones from the same region in your install-config.yaml file.
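The zone list described above can be sketched in install-config.yaml as follows. This is a fragment, not a complete installation configuration, and the cluster name, project, region, and zone names are illustrative assumptions:

```yaml
# Fragment of install-config.yaml for a multi-zone GCP deployment.
# Cluster name, project, region, and zones are illustrative assumptions.
apiVersion: v1
metadata:
  name: my-cluster
controlPlane:
  name: master
  replicas: 3
  platform:
    gcp:
      zones:
      - us-central1-a
      - us-central1-b
      - us-central1-c
compute:
- name: worker
  replicas: 3
  platform:
    gcp:
      zones:
      - us-central1-a
      - us-central1-b
      - us-central1-c
platform:
  gcp:
    projectID: my-project
    region: us-central1
```

Listing three zones for both the control plane and the compute machine pools lets the installer distribute machines so that no single-zone outage takes down a majority of nodes.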

For fine-grained control over the locations where nodes are deployed, we recommend defining VM placement policies, which ensure that the VMs are spread across different failure domains in the same zone. Applying a spread placement policy to your cluster nodes helps reduce the number of nodes that are simultaneously impacted by location-specific disruptions. For more information about how to create a spread policy for existing clusters, see Create and apply spread placement policies to VMs.

Similarly, to prevent multiple replicas of the same application from being scheduled in the same failure domain, we recommend that you use pod anti-affinity rules. These rules spread application replicas across multiple zones. The following example demonstrates how to implement pod anti-affinity rules:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: my-app
  namespace: my-app-namespace
spec:
  replicas: 3
  selector:
    matchLabels:
      app: my-app
  template:
    metadata:
      labels:
        app: my-app
    spec:
      # Pod anti-affinity: require new pods to be scheduled in zones that
      # do not already run a pod with the app=my-app label.
      affinity:
        podAntiAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
          - labelSelector:
              matchLabels:
                app: my-app
            topologyKey: topology.kubernetes.io/zone
      containers:
      - name: my-app-container
        image: quay.io/myorg/my-app:latest
        ports:
        - containerPort: 8080

For stateless services like web front ends or REST APIs, we recommend that you run multiple pod replicas for each service or route. This approach ensures that traffic is automatically routed to pods in available zones.
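A stateless front end like the one described above is typically exposed through a Service and an OpenShift Route. The following sketch shows one possible shape; the names, namespace, and ports are assumptions carried over from the earlier Deployment example:

```yaml
# Illustrative Service and Route for a stateless web front end.
# Names, namespace, and ports are assumptions, not from the original text.
apiVersion: v1
kind: Service
metadata:
  name: my-app
  namespace: my-app-namespace
spec:
  selector:
    app: my-app
  ports:
  - port: 80
    targetPort: 8080
---
apiVersion: route.openshift.io/v1
kind: Route
metadata:
  name: my-app
  namespace: my-app-namespace
spec:
  to:
    kind: Service
    name: my-app
  port:
    targetPort: 8080
```

Because the Service selects all healthy replicas regardless of zone, the Route keeps serving traffic as long as at least one zone has a running pod.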

Proactively manage load to prevent resource over-commitment

We recommend that you proactively manage your application's load to prevent resource over-commitment. Over-commitment can lead to poor service performance under load. You can help prevent over-commitment by setting resource requests and limits; for a more detailed explanation, see Managing resources for your pods. Additionally, you can automatically scale replicas up or down based on CPU, memory, or custom metrics by using the horizontal pod autoscaler.
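The autoscaling recommendation above can be sketched as a HorizontalPodAutoscaler. The target Deployment name, replica bounds, and CPU threshold are illustrative assumptions; note that a CPU utilization target only works if the pod template sets CPU resource requests:

```yaml
# Illustrative HorizontalPodAutoscaler scaling an assumed "my-app" Deployment
# between 3 and 10 replicas based on average CPU utilization.
# Requires CPU requests on the pod's containers, for example:
#   resources:
#     requests:
#       cpu: 250m
#       memory: 256Mi
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: my-app-hpa
  namespace: my-app-namespace
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: my-app
  minReplicas: 3
  maxReplicas: 10
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 70
```

Keeping minReplicas at 3 preserves the multi-zone spread from the anti-affinity rules even when load is low.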

We also recommend that you use the following load balancing services:

  • OpenShift Ingress Operator. The Ingress Operator deploys HAProxy-based ingress controllers to handle routing to your pods. Specifically, we recommend that you configure global access for the Ingress Controller, which enables clients in any region within the same VPC network as the load balancer to reach the workloads running on your cluster. Additionally, we recommend that you implement Ingress Controller health checks to monitor the health of your pods and restart failing pods.
  • Google Cloud Load Balancing. Load Balancing distributes traffic across Google Cloud zones. Choose a load balancer that meets your application's needs.
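As a sketch of the global access recommendation above, the IngressController resource can carry GCP provider parameters when it publishes through an internal load balancer. This assumes the operator.openshift.io/v1 API and an internal scope; verify the fields against your OpenShift version:

```yaml
# Illustrative IngressController with global access enabled for an internal
# load balancer on GCP. Field availability depends on the OpenShift version.
apiVersion: operator.openshift.io/v1
kind: IngressController
metadata:
  name: default
  namespace: openshift-ingress-operator
spec:
  endpointPublishingStrategy:
    type: LoadBalancerService
    loadBalancer:
      scope: Internal
      providerParameters:
        type: GCP
        gcp:
          clientAccess: Global
```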

Define pod disruption budgets

We recommend that you define pod disruption budgets to specify the minimum number of pods that your application requires to be available during disruptions like maintenance events or updates. The following example shows how to define a disruption budget:

apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: my-app-pdb
  namespace: my-app-namespace
spec:
  # Define how many pods need to remain available during a disruption.
  # At least one of "minAvailable" or "maxUnavailable" must be specified.
  minAvailable: 2
  selector:
    matchLabels:
      app: my-app

For more information, see Specifying a Disruption Budget for your Application.

Use storage that supports HA and data replication

For stateful workloads that require persistent data storage outside ofcontainers, we recommend the following best practices.

Disk best practices

If you require disk storage, use one of the following:

After you select a storage option, install its driver in your cluster:

The CSI Persistent Disk Operator provides a storage class that you can use to create persistent volume claims (PVCs). For Filestore, you must create the Filestore storage class.
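A claim against such a storage class can be sketched as follows; the PVC name, namespace, storage class name, and size are illustrative assumptions:

```yaml
# Illustrative PVC bound to a CSI-provisioned storage class.
# The class name "standard-csi" and all other names are assumptions.
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: my-app-data
  namespace: my-app-namespace
spec:
  accessModes:
  - ReadWriteOnce
  storageClassName: standard-csi
  resources:
    requests:
      storage: 10Gi
```

Pods then reference the claim by name in a volume, and the CSI driver provisions and attaches the backing disk on demand.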

Database best practices

If you require a database, use one of the following:

After you install your database operator, configure a cluster with multiple instances. The following example shows the configuration for a cluster with the following attributes:

  • A PostgreSQL cluster named my-postgres-cluster is created with three instances for high availability.
  • The cluster uses the regionalpd-balanced storage class for durable and replicated storage across zones.
  • A database named mydatabase is initialized with a user myuser, whose credentials are stored in a Kubernetes secret called my-database-secret.
  • Superuser access is disabled for enhanced security.
  • Monitoring is enabled for the cluster.

apiVersion: postgresql.cnpg.io/v1
kind: Cluster
metadata:
  name: my-postgres-cluster
  namespace: postgres-namespace
spec:
  instances: 3
  storage:
    size: 10Gi
    storageClass: regionalpd-balanced
  bootstrap:
    initdb:
      database: mydatabase
      owner: myuser
      secret:
        name: my-database-secret
  enableSuperuserAccess: false
  monitoring:
    # Enable monitoring by creating a PodMonitor for the cluster.
    enablePodMonitor: true
---
apiVersion: v1
kind: Secret
metadata:
  name: my-database-secret
  namespace: postgres-namespace
type: Opaque
data:
  username: bXl1c2Vy # Base64-encoded value of "myuser"
  password: c2VjdXJlcGFzc3dvcmQ= # Base64-encoded value of "securepassword"

Externalize application state

We recommend that you move session state or caching to shared in-memory stores(for example, Redis) or persistent datastores (for example, Postgres, MySQL)that are configured to run in HA mode.
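As a sketch of externalizing state, an application pod can reference the shared stores through configuration rather than holding state locally. The hostnames, ports, and secret name below are assumptions for illustration only:

```yaml
# Illustrative pod-spec fragment: the application reads session state from an
# external HA Redis and persists data in an external HA Postgres.
# Hostnames, ports, and the secret name are assumptions.
containers:
- name: my-app-container
  image: quay.io/myorg/my-app:latest
  env:
  - name: REDIS_URL
    value: redis://redis-ha.cache.svc.cluster.local:6379
  - name: DATABASE_URL
    valueFrom:
      secretKeyRef:
        name: my-app-db-secret
        key: database-url
```

Because the pod itself holds no session or business data, any replica in any zone can serve any request, and a failed pod can be replaced without data loss.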

Summary of best practices

In summary, implement the following best practices to achieve high availabilitywith OpenShift:

  • Spread deployments across multiple zones
  • Proactively manage load to prevent resource over-commitment
  • Define pod disruption budgets
  • Use storage that supports HA and data replication
  • Externalize application state

What's next


Last updated 2026-02-19 UTC.