Cassandra troubleshooting guide

You are currently viewing version 1.2 of the Apigee hybrid documentation. This version is end of life. You should upgrade to a newer version. For more information, see Supported versions.

This topic discusses steps you can take to troubleshoot and fix problems with the Cassandra datastore. Cassandra is a persistent datastore that runs in the cassandra component of the hybrid runtime architecture. See also Runtime service configuration overview.

Cassandra pods are stuck in the Pending state

Symptom

When starting up, the Cassandra pods remain in the Pending state.

Error message

When you use kubectl to view the pod states, you see that one or more Cassandra pods are stuck in the Pending state. The Pending state indicates that Kubernetes is unable to schedule the pod on a node: the pod cannot be created. For example:

kubectl get pods -n namespace

NAME                                     READY   STATUS      RESTARTS   AGE
adah-resources-install-4762w             0/4     Completed   0          10m
apigee-cassandra-0                       0/1     Pending     0          10m
...

Possible causes

A pod stuck in the Pending state can have multiple causes. For example:

  • Insufficient resources: There is not enough CPU or memory available to create the pod.
  • Volume not created: The pod is waiting for the persistent volume to be created.

Diagnosis

Use kubectl to describe the pod to determine the source of the error:

kubectl -n namespace describe pods pod_name

For example:

kubectl -n apigee describe pods apigee-cassandra-0

The output may show one of these possible problems:

  • If the problem is insufficient resources, you will see a Warning message that indicates insufficient CPU or memory, similar to the example output below.
  • If the error message indicates that the pod has unbound immediate PersistentVolumeClaims (PVC), the pod is not able to create its persistent volume.
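
For reference, the Events section at the end of the kubectl describe output typically ends with a scheduler warning. The following is an illustrative example for the insufficient-resources case; the exact node counts, ages, and wording depend on your cluster:

Events:
  Type     Reason            Age   From               Message
  ----     ------            ----  ----               -------
  Warning  FailedScheduling  2m    default-scheduler  0/4 nodes are available: 4 Insufficient cpu.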

Resolution

Insufficient resources

Modify the Cassandra node pool so that it has sufficient CPU and memory resources. See Resizing a node pool for details.
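
Before resizing, it can help to confirm how much CPU and memory the worker nodes actually have left. The following are standard Kubernetes commands (kubectl top nodes assumes the metrics server is installed in the cluster):

kubectl describe nodes | grep -A 8 "Allocated resources"
kubectl top nodes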

Persistent volume not created

If you determine that the problem is with the persistent volume, describe the PersistentVolumeClaim (PVC) to find out why it is not being created:

  1. List the PVCs in the cluster:
    kubectl -n namespace get pvc

    NAME                                STATUS   VOLUME                                     CAPACITY   ACCESS MODES   STORAGECLASS   AGE
    cassandra-data-apigee-cassandra-0   Bound    pvc-b247faae-0a2b-11ea-867b-42010a80006e   10Gi       RWO            standard       15m
    ...
  2. Describe the PVC for the pod that is failing. For example, the following command describes the PVC bound to the pod apigee-cassandra-0:
    kubectl -n apigee describe pvc cassandra-data-apigee-cassandra-0

    Events:
      Type     Reason              Age                From                         Message
      ----     ------              ----               ----                         -------
      Warning  ProvisioningFailed  3m (x143 over 5h)  persistentvolume-controller  storageclass.storage.k8s.io "apigee-sc" not found

    Note that in this example, the StorageClass named apigee-sc does not exist. To resolve this problem, create the missing StorageClass in the cluster, as explained in Change the default StorageClass. (An example manifest is shown below.)
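
As a minimal sketch, a StorageClass with the expected name could look like the following. This example assumes GKE with the Compute Engine persistent disk provisioner; the provisioner and parameters for your platform will differ, so adapt it to your environment:

apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: apigee-sc
provisioner: kubernetes.io/gce-pd
parameters:
  type: pd-ssd
volumeBindingMode: WaitForFirstConsumer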

See also Debugging Pods.

Cassandra pods are stuck in the CrashLoopBackoff state

Symptom

When starting up, the Cassandra pods remain in the CrashLoopBackoff state.

Error message

When you use kubectl to view the pod states, you see that one or more Cassandra pods are in the CrashLoopBackoff state. This state indicates that the pod's container is repeatedly crashing and being restarted by Kubernetes. For example:

kubectl get pods -n namespace

NAME                                     READY   STATUS             RESTARTS   AGE
adah-resources-install-4762w             0/4     Completed          0          10m
apigee-cassandra-0                       0/1     CrashLoopBackoff   0          10m
...

Possible causes

A pod stuck in the CrashLoopBackoff state can have multiple causes. For example:

  • Data center differs from previous data center: The Cassandra pod has a persistent volume that contains data from a previous cluster, and the new pods are not able to join the old cluster. This usually happens when stale persistent volumes from a previous Cassandra cluster remain on the same Kubernetes node, which can occur if you delete and recreate Cassandra in the cluster.
  • Truststore directory not found: The Cassandra pod is not able to create a TLS connection. This usually happens when the provided keys and certificates are invalid, missing, or have other issues.

Diagnosis

Check the Cassandra error log to determine the cause of the problem.

  1. List the pods to get the ID of the Cassandra pod that is failing:
    kubectl get pods -n namespace
  2. Check the failing pod's log (see also the tip after these steps for retrieving the log of the previous, crashed container):
    kubectl logs pod_id -n namespace
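
If the container has already crashed and been restarted, the current log can be empty or very short. As a general Kubernetes tip, the --previous flag of kubectl logs returns the log of the prior container instance, which usually contains the actual error:

kubectl logs pod_id -n namespace --previous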

Resolution

Look for the following clues in the pod's log:

Data center differs from previous data center

If you see this log message:

Cannot start node if snitch's data center (us-east1) differs from previous data center
  • Check if there are any stale or old PVCs in the cluster and delete them (one way to spot leftover volumes is shown below).
  • If this is a fresh install, delete all the PVCs and retry the setup. For example:
    kubectl -n namespace get pvc
    kubectl -n namespace delete pvc cassandra-data-apigee-cassandra-0
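
As a general Kubernetes check (not specific to Apigee), listing the persistent volumes can help identify leftovers from a previous installation; volumes whose STATUS is Released are no longer bound to any claim and are candidates for cleanup:

kubectl get pv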

Truststore directory not found

If you see this log message:

Caused by: java.io.FileNotFoundException: /apigee/cassandra/ssl/truststore.p12 (No such file or directory)

Verify that the keys and certificates provided in your overrides file are correct and valid. For example:

cassandra:
  sslRootCAPath: path_to_root_ca-file
  sslCertPath: path-to-tls-cert-file
  sslKeyPath: path-to-tls-key-file
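
To sanity-check the files themselves, you can use standard openssl commands. The paths below are the same placeholders used in the overrides example above, and the key check assumes an RSA private key; treat this as a quick consistency test rather than a full validation:

# Confirm the certificate is readable and not expired
openssl x509 -in path-to-tls-cert-file -noout -dates -subject

# The two digests should match if the certificate and key belong together
openssl x509 -in path-to-tls-cert-file -noout -modulus | openssl md5
openssl rsa -in path-to-tls-key-file -noout -modulus | openssl md5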

Node failure

Symptom

When starting up, the Cassandra pods remain in the Pending state. This problem can indicate an underlying node failure.

Diagnosis

  1. Determine which Cassandra pods are not running:
    $ kubectl get pods -n your_namespace

    NAME          READY   STATUS    RESTARTS   AGE
    cassandra-0   0/1     Pending   0          13s
    cassandra-1   1/1     Running   0          8d
    cassandra-2   1/1     Running   0          8d
  2. Check the worker nodes. If one is in the NotReady state, then that is the node that has failed (to see why, describe the node as shown after these steps):
    kubectl get nodes -n your_namespace

    NAME                          STATUS     ROLES    AGE   VERSION
    ip-10-30-1-190.ec2.internal   Ready      <none>   8d    v1.13.2
    ip-10-30-1-22.ec2.internal    Ready      master   8d    v1.13.2
    ip-10-30-1-36.ec2.internal    NotReady   <none>   8d    v1.13.2
    ip-10-30-2-214.ec2.internal   Ready      <none>   8d    v1.13.2
    ip-10-30-2-252.ec2.internal   Ready      <none>   8d    v1.13.2
    ip-10-30-2-47.ec2.internal    Ready      <none>   8d    v1.13.2
    ip-10-30-3-11.ec2.internal    Ready      <none>   8d    v1.13.2
    ip-10-30-3-152.ec2.internal   Ready      <none>   8d    v1.13.2
    ip-10-30-3-5.ec2.internal     Ready      <none>   8d    v1.13.2
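
To see why a node is NotReady, describe it; the Conditions and Events sections of the output usually point to the underlying cause. For example, using the failed node from the listing above:

kubectl describe node ip-10-30-1-36.ec2.internal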

Resolution

  1. Remove the dead Cassandra node from the cluster. Use nodetool status to find the Host ID of the down node (marked DN), then pass it to nodetool removenode:
    $ kubectl exec -it apigee-cassandra-0 -- nodetool status
    $ kubectl exec -it apigee-cassandra-0 -- nodetool removenode deadnode_hostID
  2. Remove the PersistentVolumeClaim for the dead node so that the Cassandra pod does not attempt to come back up on that node because of node affinity:
    kubectl get pvc -n your_namespace
    kubectl delete pvc volumeClaim_name -n your_namespace
  3. Update the volume template and create a PersistentVolume for the newly added node. The following is an example volume template:
    apiVersion: v1
    kind: PersistentVolume
    metadata:
      name: cassandra-data-3
    spec:
      capacity:
        storage: 100Gi
      accessModes:
      - ReadWriteOnce
      persistentVolumeReclaimPolicy: Retain
      storageClassName: local-storage
      local:
        path: /apigee/data
      nodeAffinity:
        required:
          nodeSelectorTerms:
          - matchExpressions:
            - key: kubernetes.io/hostname
              operator: In
              values: ["ip-10-30-1-36.ec2.internal"]
  4. Replace the values with the new hostname/IP and apply the template:
    kubectl apply -f volume-template.yaml
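
After the PersistentVolume is created, the pending Cassandra pod should bind a new claim and start on the replacement node. To confirm recovery, re-run the commands used earlier: all Cassandra pods should report Running, and nodetool status should show every node as UN (up and normal):

kubectl get pods -n your_namespace
kubectl exec -it apigee-cassandra-0 -- nodetool status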
