Troubleshoot IBM Spectrum Symphony connectors
This document helps you resolve common issues with the IBM Spectrum Symphony integration for Google Cloud. Specifically, this document provides troubleshooting guidance for the IBM Spectrum Symphony host factory service, the connectors for the Compute Engine and GKE providers, and the Symphony Operator for Kubernetes.
Symphony host factory service issues
These issues relate to the central Symphony host factory service. You can find the main log file for this service at the following location on Linux:
$EGO_TOP/hostfactory/log/hostfactory.hostname.log

You set the $EGO_TOP environment variable when you load the host factory environment variables. In IBM Spectrum Symphony, $EGO_TOP points to the installation root of the Enterprise Grid Orchestrator (EGO), which is the core resource manager for the cluster. The default installation path for $EGO_TOP on Linux is typically /opt/ibm/spectrumcomputing.
Cluster doesn't add new VMs for pending workloads
This issue occurs when the Symphony queue contains jobs, but the host factory fails to provision new virtual machines (VMs) to manage the load. The host factory log file contains no SCALE-OUT messages.
This issue usually occurs when the Symphony requestor isn't correctly configured or enabled. To resolve the issue, check the status of the configured requestor to verify that it is enabled and that there is a pending workload.
Locate the requestor configuration file. The file is typically located at:
$HF_TOP/conf/requestors/hostRequestors.json

The $HF_TOP environment variable is defined in your environment when you use the source command. The value is the path to the top-level installation directory for the IBM Spectrum Symphony host factory service.

Open the hostRequestors.json file and locate the symAinst entry. In that section, verify that the enabled parameter is set to a value of 1 and that the providers list includes the name of your configured Google Cloud provider instance, as shown in the example check after this list.
- For Compute Engine configurations, the provider list must show the name of the Compute Engine provider that you created in Enable the provider instance during the Compute Engine provider installation.
- For GKE configurations, the provider list must show the name of the GKE provider that you created in Enable the provider instance during the GKE provider installation.
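To spot-check the entry from the shell without opening an editor, you can print the lines around the symAinst entry. This is a minimal sketch; the number of context lines (-A 10) is arbitrary and might need adjusting for your file layout:

grep -A 10 'symAinst' $HF_TOP/conf/requestors/hostRequestors.json

In the output, confirm that enabled is set to 1 and that the providers value includes the name of your Google Cloud provider instance.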
After you confirm that the symAinst requestor is enabled, check if a consumer has a pending workload that requires a scale-out.
View a list of all consumers and their workload status:
egosh consumer list

In the output, look for the consumer associated with your workload and verify that the workload is pending. If the requestor is enabled and a workload is pending, but the host factory service does not initiate scale-out requests, then check the HostFactory service logs for errors.
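For example, the following commands are one way to scan the main host factory log for scale-out activity and for errors. They assume that the hostname portion of the log file name matches the output of the hostname command on the management host; adjust the path if your log file is named differently:

grep -c 'SCALE-OUT' $EGO_TOP/hostfactory/log/hostfactory.$(hostname).log
grep -iE 'error|exception' $EGO_TOP/hostfactory/log/hostfactory.$(hostname).log | tail -n 20

A SCALE-OUT count of 0 while a workload is pending confirms that the host factory is not issuing scale-out requests.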
Host factory service not starting
If the host factory service doesn't run, follow these steps to resolve the issue:
Check the status of the HostFactory service:

egosh service list

In the output, locate the HostFactory service and check that the STATE field shows a status of STARTED.

If the HostFactory service is not started, restart it:

egosh service stop HostFactory
egosh service start HostFactory
Other errors and logging
If you encounter other errors with the host factory service, then increase the log verbosity to get more detailed logs. To do so, complete the following steps:
Open the hostfactoryconf.json file for editing. The file is typically located at:

$EGO_TOP/hostfactory/conf/

For more information about the value of the $EGO_TOP environment variable, see Symphony host factory service issues.

Update the HF_LOGLEVEL value from LOG_INFO to LOG_DEBUG:

{
  ...
  "HF_LOGLEVEL": "LOG_DEBUG",
  ...
}

Save the file after you make the change.

To make the change take effect, restart the HostFactory service:

egosh service stop HostFactory
egosh service start HostFactory
After you restart, the HostFactory service generates more detailed logs, which you can use to troubleshoot complex issues. You can view these logs in the main host factory log file, located at $EGO_TOP/hostfactory/log/hostfactory.hostname.log on Linux.
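To watch the debug output as the service processes requests, you can follow the log file. As before, this sketch assumes the hostname portion of the file name matches the output of the hostname command:

tail -f $EGO_TOP/hostfactory/log/hostfactory.$(hostname).log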
Host factory provider issues
The following issues occur within the host factory provider scripts for Compute Engine or Google Kubernetes Engine.
Check the provider logs (hf-gce.log or hf-gke.log) for detailed error messages. The location of the hf-gce.log and hf-gke.log files is determined by the LOGFILE variable set in the provider's configuration file in Enable the provider instance.
Virtual machine or pod is not provisioned
This issue might occur after the host factory provider logs show a call to the requestMachines.sh script, but the resource doesn't appear in your Google Cloud project.
To resolve this issue, follow these steps:
Check the provider script logs (hf-gce.log or hf-gke.log) for error messages from the Google Cloud API. The location of the hf-gce.log and hf-gke.log files is determined by the LOGFILE variable set in the provider's configuration file in Enable the provider instance.

Verify that the service account has the correct IAM permissions:
- Follow the instructions in View current access.
- Verify that the service account has the Compute Instance Admin (v1) (roles/compute.instanceAdmin.v1) IAM role on the project. For more information about how to grant roles, see Manage access to projects, folders, and organizations. An example command to list the roles granted to the service account appears after this list.
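The following gcloud command is one way to perform this check from the command line. PROJECT_ID and SERVICE_ACCOUNT_EMAIL are placeholders for your project ID and the provider's service account email:

gcloud projects get-iam-policy PROJECT_ID \
    --flatten="bindings[].members" \
    --filter="bindings.members:serviceAccount:SERVICE_ACCOUNT_EMAIL" \
    --format="table(bindings.role)"

The output should include roles/compute.instanceAdmin.v1.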
To ensure that the Compute Engine parameters in your host template are valid, you must verify the following:
The host template parameters must be in the gcpgceinstprov_templates.json file that you created when you set up a provider instance during the Compute Engine provider installation. The most common parameters to validate are gcp_zone and gcp_instance_group.

Verify that the instance group set by the gcp_instance_group parameter exists. To confirm the instance group, follow the instructions in View a MIG's properties, by using the gcp_instance_group and gcp_zone values from the template file. An example command follows this list.
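As a command-line alternative, you can confirm the managed instance group directly. In this sketch, INSTANCE_GROUP_NAME and ZONE stand in for the gcp_instance_group and gcp_zone values from your template file:

gcloud compute instance-groups managed describe INSTANCE_GROUP_NAME --zone=ZONE

If the command returns an error, the template points to an instance group that doesn't exist in that project or zone.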
Pod gets stuck in Pending or Error state on GKE
This issue might occur after the hf-gke.log file shows that the GCPSymphonyResource resource was created, but the corresponding pod in the GKE cluster never reaches a Running state and might show a status like Pending, ImagePullBackOff, or CrashLoopBackOff.
This issue occurs if there is a problem within the Kubernetes cluster, such as an invalid container image name, insufficient CPU or memory resources, or a misconfigured volume or network setting.
To resolve this issue, use kubectl describe to inspect the events for both the custom resource and the pod to identify the root cause:
kubectl describe gcpsymphonyresource RESOURCE_NAME
kubectl describe pod POD_NAME

Replace the following:
- RESOURCE_NAME: the name of the resource.
- POD_NAME: the name of the pod.
Troubleshoot Kubernetes operator issues
The Kubernetes operator manages the lifecycle of a Symphony pod. The following sections can help you troubleshoot common issues you might encounter with the operator and these resources.
Diagnose issues with resource status fields
The Kubernetes operator manages Symphony workloads in GKE with two primary resource types:
- The GCPSymphonyResource (GCPSR) resource manages the lifecycle of compute pods for Symphony workloads.
- The MachineReturnRequest (MRR) resource handles the return and cleanup of compute resources.
Use these status fields to diagnose issues with the GCPSymphonyResource resource:
- phase: The current lifecycle phase of the resource. The options are Pending, Running, WaitingCleanup, or Completed.
- availableMachines: The number of compute pods that are ready.
- conditions: Detailed status conditions with timestamps.
- returnedMachines: A list of returned pods.
Use these status fields to diagnose issues with the MachineReturnRequest resource:
- phase: The current phase of the return request. The options are Pending, InProgress, Completed, Failed, or PartiallyCompleted.
- totalMachines: The total number of machines to return.
- returnedMachines: The number of successfully returned machines.
- failedMachines: The number of machines that failed to return.
- machineEvents: Per-machine status details.
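For a quick overview of these fields across all resources, you can project them into a table with kubectl. This sketch assumes the resources live in the gcp-symphony namespace, as in the examples later in this document:

kubectl get gcpsr -n gcp-symphony -o custom-columns=NAME:.metadata.name,PHASE:.status.phase,AVAILABLE:.status.availableMachines
kubectl get mrr -n gcp-symphony -o custom-columns=NAME:.metadata.name,PHASE:.status.phase,RETURNED:.status.returnedMachines,FAILED:.status.failedMachines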
GCPSymphonyResource resource stuck in the Pending state
This issue occurs when the GCPSymphonyResource resource remains in the Pending state and the value of availableMachines does not increase.
This issue might occur for one of these reasons:
- Insufficient node capacity in your cluster.
- Problems with pulling the container image.
- Resource quota limitations.
To resolve this issue:
Check the status of the pods to identify any issues with image pulls or resource allocation:
kubectl describe pods -n gcp-symphony -l symphony.requestId=REQUEST_ID

Replace REQUEST_ID with your request ID.

Inspect nodes to ensure sufficient capacity:

kubectl get nodes -o wide

Pods might show a Pending status. This issue usually occurs when the Kubernetes cluster needs to scale up and takes longer than expected. Monitor the nodes to ensure the control plane can scale out.
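Cluster events can also reveal why pods stay in a Pending state, such as scheduling failures or image pull errors. The following command is one way to list the most recent events, again assuming the gcp-symphony namespace:

kubectl get events -n gcp-symphony --sort-by=.lastTimestamp | tail -n 20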
Pods are not returned
This issue occurs when you create a MachineReturnRequest (MRR), but the number of returnedMachines does not increase.
This issue can occur for these reasons:
- Pods are stuck in a Terminating state.
- There are node connectivity issues.
To resolve this issue:
Check for pods stuck in the Terminating state:

kubectl get pods -n gcp-symphony | grep Terminating

Describe the MachineReturnRequest to get details about the return process:

kubectl describe mrr MRR_NAME -n gcp-symphony

Replace MRR_NAME with the name of your MachineReturnRequest.

Manually delete the custom resource object. This deletion activates the final cleanup logic:

kubectl delete gcpsymphonyresource RESOURCE_NAME

Replace RESOURCE_NAME with the name of the GCPSymphonyResource resource.
High number of failed machines in a MachineReturnRequest
This issue occurs when the failedMachines count in the MachineReturnRequest status is greater than 0. This issue can occur for these reasons:
- Pod deletion has timed out.
- A node is unavailable.
To resolve this issue:
Check the machineEvents in the MachineReturnRequest status for specific error messages:

kubectl describe mrr MRR_NAME -n gcp-symphony

Look for node failure events or control plane performance issues:

Get the status of all nodes:

kubectl get nodes -o wide

Inspect a specific node:

kubectl describe node NODE_NAME
Pods are not deleted
This issue occurs when deleted pods are stuck in a Terminating or Error state.
This issue can occur for these reasons:
- An overwhelmed control plane or operator, which can cause timeouts or API throttle events.
- The manual deletion of the parent GCPSymphonyResource resource.
To resolve this issue:
Check if the parent GCPSymphonyResource resource is still available and not in the WaitingCleanup state:

kubectl describe gcpsymphonyresource RESOURCE_NAME

If the parent GCPSymphonyResource resource is no longer on the system, manually remove the finalizer from the pod or pods. The finalizer tells Kubernetes to wait for the Symphony operator to complete its cleanup tasks before Kubernetes fully deletes the pod. First, inspect the YAML configuration to find the finalizer:

kubectl get pods -n gcp-symphony -l symphony.requestId=REQUEST_ID -o yaml

Replace REQUEST_ID with the request ID associated with the pods.

In the output, look for the finalizers field within the metadata section. You should see an output similar to this snippet:

metadata:
  ...
  finalizers:
  - symphony-operator/finalizer

To manually remove the finalizer from the pod or pods, use the kubectl patch command:

kubectl patch pod -n gcp-symphony -l symphony.requestId=REQUEST_ID --type json -p '[{"op": "remove", "path": "/metadata/finalizers", "value": "symphony-operator/finalizer"}]'

Replace REQUEST_ID with the request ID associated with the pods.
Old Symphony resources are not automatically deleted from the GKE cluster
After a workload completes and GKE stops its pods, the associated GCPSymphonyResource and MachineReturnRequest objects remain in your GKE cluster for longer than the expected 24-hour cleanup period.
This issue occurs when a GCPSymphonyResource object lacks the required Completed status condition. The operator's automatic cleanup process depends on this status to remove the object. To resolve this issue, complete the following steps:
Review the details of the GCPSymphonyResource resource in question:

kubectl get gcpsr GCPSR_NAME -o yaml

Replace GCPSR_NAME with the name of the GCPSymphonyResource resource with this issue.

Review the conditions for one of type Completed with a status of True:

status:
  availableMachines: 0
  conditions:
  - lastTransitionTime: "2025-04-14T14:22:40.855099+00:00"
    message: GCPSymphonyResource g555dc430-f1a3-46bb-8b69-5c4c481abc25-2pzvc has no pods.
    reason: NoPods
    status: "True"    # This condition will ensure this
    type: Completed   # custom resource is cleaned up by the operator
  phase: WaitingCleanup
  returnedMachines:
  - name: g555dc430-f1a3-46bb-8b69-5c4c481abc25-2pzvc-pod-0
    returnRequestId: 7fd6805f-9a00-41f9-afe9-c38aa35002db
    returnTime: "2025-04-14T14:22:39.373216+00:00"

If this condition is not seen on the GCPSymphonyResource details, but the phase: WaitingCleanup is shown instead, the Completed event has been lost.

Check for pods associated with the GCPSymphonyResource:

kubectl get pods -l symphony.requestId=REQUEST_ID

Replace REQUEST_ID with the request ID.

If no pods exist, safely delete the GCPSymphonyResource resource:

kubectl delete gcpsr GCPSR_NAME

Replace GCPSR_NAME with the name of your GCPSymphonyResource.

If the pods existed before you deleted the GCPSymphonyResource, then you must delete them. If the pods still exist, then follow the steps in the Pods are not deleted section.
Pod does not join the Symphony cluster
This issue happens when a pod runs in GKE, but it doesn't appear as a valid host in the Symphony cluster.
This issue occurs if the Symphony software running inside the pod is unable to connect and register with the Symphony primary host. This issue is often due to network connectivity issues or a misconfiguration of the Symphony client within the container.
To resolve this issue, check the logs of the Symphony services running inside the pod.
Use SSH or exec to access the pod and view the logs:
kubectl exec -it POD_NAME -- /bin/bash

Replace POD_NAME with the name of the pod.

When you have a shell inside the pod, the logs for the EGO and LIM daemons are located in the $EGO_TOP/kernel/log directory. The $EGO_TOP environment variable points to the root of the IBM Spectrum Symphony installation:

cd $EGO_TOP/kernel/log

For more information on the value of the $EGO_TOP environment variable, see Symphony host factory service issues.

Examine the logs for configuration or network errors that block the connection from the GKE pod to the on-premises Symphony primary host.
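For example, from inside the pod you can sweep the log directory for recent errors. This is a generic sketch; the exact log file names depend on the Symphony version and on which daemons run in the pod:

cd $EGO_TOP/kernel/log
grep -iE 'error|fail|cannot connect' *.log* | tail -n 50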
Machine return request fails
This issue might occur during scale-in operations when you create a MachineReturnRequest custom resource, but the object gets stuck, and the operator does not terminate the corresponding Symphony pod.
A failure in the operator's finalizer logic prevents the clean deletion of the pod and its associated custom resource. This problem can lead to orphaned resources and unnecessary costs.
To resolve this issue, manually delete the custom resource, which should activate the operator's cleanup logic:
kubectl delete gcpsymphonyresource RESOURCE_NAME

Replace RESOURCE_NAME with the name of the resource.