Tutorial: Local troubleshooting of a Cloud Run service

This tutorial shows how a service developer can troubleshoot a broken Cloud Run service using Google Cloud Observability tools for discovery and a local development workflow for investigation.

This step-by-step "case study" companion to the troubleshooting guide uses a sample project that results in runtime errors when deployed, which you troubleshoot to find and fix the problem.

Objectives

  • Write, build, and deploy a service to Cloud Run
  • Use Error Reporting and Cloud Logging to identify an error
  • Retrieve the container image from Container Registry for a root cause analysis
  • Fix the "production" service, then improve the service to mitigate future problems

Caution: Container Registry is deprecated. Effective March 18, 2025, Container Registry is shut down, and writing images to Container Registry is unavailable. For details on the deprecation and how to migrate to Artifact Registry, see Container Registry deprecation.

Costs

In this document, you use the following billable components of Google Cloud:

To generate a cost estimate based on your projected usage, use the pricing calculator.

New Google Cloud users might be eligible for a free trial.

Before you begin

  1. Sign in to your Google Cloud account. If you're new to Google Cloud, create an account to evaluate how our products perform in real-world scenarios. New customers also get $300 in free credits to run, test, and deploy workloads.
  2. In the Google Cloud console, on the project selector page, select or create a Google Cloud project.

    Roles required to select or create a project

    • Select a project: Selecting a project doesn't require a specific IAM role—you can select any project that you've been granted a role on.
    • Create a project: To create a project, you need the Project Creator role (roles/resourcemanager.projectCreator), which contains the resourcemanager.projects.create permission. Learn how to grant roles.
    Note: If you don't plan to keep the resources that you create in this procedure, create a project instead of selecting an existing project. After you finish these steps, you can delete the project, removing all resources associated with the project.

    Go to project selector

  3. Verify that billing is enabled for your Google Cloud project.

  4. Enable the Cloud Run Admin API
  5. Install and initialize the gcloud CLI.
  6. Update components:
    gcloud components update
  7. Follow the instructions to install Docker locally

Required roles

To get the permissions that you need to complete the tutorial, ask your administrator to grant you the following IAM roles on your project:

For more information about granting roles, see Manage access to projects, folders, and organizations.

You might also be able to get the required permissions through custom roles or other predefined roles.

Note: IAM basic roles might also contain permissions to complete the tutorial. You shouldn't grant basic roles in a production environment, but you can grant them in a development or test environment.

Setting up gcloud defaults

To configure gcloud with defaults for your Cloud Run service:

  1. Set your default project:

    gcloud config set project PROJECT_ID

    Replace PROJECT_ID with the ID of the project you created for this tutorial.

  2. Configure gcloud for your chosen region:

    gcloud config set run/region REGION

    Replace REGION with the supported Cloud Run region of your choice.

Cloud Run locations

Cloud Run is regional, which means the infrastructure that runs your Cloud Run services is located in a specific region and is managed by Google to be redundantly available across all the zones within that region.

Your latency, availability, and durability requirements are the primary factors for selecting the region where your Cloud Run services run. You can generally select the region nearest to your users, but you should consider the location of the other Google Cloud products that are used by your Cloud Run service. Using Google Cloud products together across multiple locations can affect your service's latency as well as cost.

Cloud Run is available in the following regions:

Subject to Tier 1 pricing

  • asia-east1 (Taiwan)
  • asia-northeast1 (Tokyo)
  • asia-northeast2 (Osaka)
  • asia-south1 (Mumbai, India)
  • asia-southeast3 (Bangkok)
  • europe-north1 (Finland), Low CO2
  • europe-north2 (Stockholm), Low CO2
  • europe-southwest1 (Madrid), Low CO2
  • europe-west1 (Belgium), Low CO2
  • europe-west4 (Netherlands), Low CO2
  • europe-west8 (Milan)
  • europe-west9 (Paris), Low CO2
  • me-west1 (Tel Aviv)
  • northamerica-south1 (Mexico)
  • us-central1 (Iowa), Low CO2
  • us-east1 (South Carolina)
  • us-east4 (Northern Virginia)
  • us-east5 (Columbus)
  • us-south1 (Dallas), Low CO2
  • us-west1 (Oregon), Low CO2

Subject to Tier 2 pricing

  • africa-south1 (Johannesburg)
  • asia-east2 (Hong Kong)
  • asia-northeast3 (Seoul, South Korea)
  • asia-southeast1 (Singapore)
  • asia-southeast2 (Jakarta)
  • asia-south2 (Delhi, India)
  • australia-southeast1 (Sydney)
  • australia-southeast2 (Melbourne)
  • europe-central2 (Warsaw, Poland)
  • europe-west10 (Berlin)
  • europe-west12 (Turin)
  • europe-west2 (London, UK), Low CO2
  • europe-west3 (Frankfurt, Germany)
  • europe-west6 (Zurich, Switzerland), Low CO2
  • me-central1 (Doha)
  • me-central2 (Dammam)
  • northamerica-northeast1 (Montreal), Low CO2
  • northamerica-northeast2 (Toronto), Low CO2
  • southamerica-east1 (Sao Paulo, Brazil), Low CO2
  • southamerica-west1 (Santiago, Chile), Low CO2
  • us-west2 (Los Angeles)
  • us-west3 (Salt Lake City)
  • us-west4 (Las Vegas)

If you already created a Cloud Run service, you can view the region in the Cloud Run dashboard in the Google Cloud console.

Assembling the code

Build a new Cloud Run greeter service step-by-step. As a reminder, this service creates a runtime error on purpose for the troubleshooting exercise.

  1. Create a new project:

    Node.js

    Create a Node.js project by defining the service package, initial dependencies, and some common operations.

    1. Create a new hello-service directory:

      mkdir hello-service
      cd hello-service
    2. Create a new Node.js project by generating a package.json file:

      npm init --yes
      npm install express@4
    3. Open the new package.json file in your editor and configure a start script to run node index.js. When you're done, the file will look like this:

      {
        "name": "hello-broken",
        "description": "Broken Cloud Run service for troubleshooting practice",
        "version": "1.0.0",
        "private": true,
        "main": "index.js",
        "scripts": {
          "start": "node index.js",
          "test": "echo \"Error: no test specified\" && exit 0",
          "system-test": "NAME=Cloud c8 mocha -p -j 2 test/system.test.js --timeout=360000 --exit"
        },
        "engines": {
          "node": ">=16.0.0"
        },
        "author": "Google LLC",
        "license": "Apache-2.0",
        "dependencies": {
          "express": "^4.17.1"
        },
        "devDependencies": {
          "c8": "^10.0.0",
          "google-auth-library": "^9.0.0",
          "got": "^11.5.0",
          "mocha": "^10.0.0"
        }
      }

    If you continue to evolve this service beyond the immediate tutorial, consider filling in the description and author, and evaluating the license. For more details, read the package.json documentation.

    Python

    1. Create a new hello-service directory:

      mkdir hello-service
      cd hello-service
    2. Create a requirements.txt file and copy your dependencies into it:

      Flask==3.0.3
      pytest==8.2.0; python_version > "3.0"
      # pin pytest to 4.6.11 for Python2.
      pytest==4.6.11; python_version < "3.0"
      gunicorn==23.0.0
      Werkzeug==3.0.3

    Go

    1. Create a new hello-service directory:

      mkdir hello-service
      cd hello-service
    2. Create a Go project by initializing a new go module:

      go mod init example.com/hello-service

    You can update the specific name as you wish: you should update the name if the code is published to a web-reachable code repository.

    Java

    1. Create a new Maven project:

      mvn archetype:generate \
        -DgroupId=com.example.cloudrun \
        -DartifactId=hello-service \
        -DarchetypeArtifactId=maven-archetype-quickstart \
        -DinteractiveMode=false
    2. Copy the dependencies into your pom.xml dependency list (between the <dependencies> elements):

      <dependency>
        <groupId>com.sparkjava</groupId>
        <artifactId>spark-core</artifactId>
        <version>2.9.4</version>
      </dependency>
      <dependency>
        <groupId>org.slf4j</groupId>
        <artifactId>slf4j-api</artifactId>
        <version>2.0.12</version>
      </dependency>
      <dependency>
        <groupId>org.slf4j</groupId>
        <artifactId>slf4j-simple</artifactId>
        <version>2.0.12</version>
      </dependency>
    3. Copy the build setting into your pom.xml (under the <dependencies> elements):

      <build>
        <plugins>
          <plugin>
            <groupId>com.google.cloud.tools</groupId>
            <artifactId>jib-maven-plugin</artifactId>
            <version>3.4.0</version>
            <configuration>
              <to>
                <image>gcr.io/PROJECT_ID/hello-service</image>
              </to>
            </configuration>
          </plugin>
        </plugins>
      </build>

  2. Create an HTTP service to handle incoming requests:

    Node.js

    const express = require('express');
    const app = express();

    app.get('/', (req, res) => {
      console.log('hello: received request.');

      const {NAME} = process.env;
      if (!NAME) {
        // Plain error logs do not appear in Stackdriver Error Reporting.
        console.error('Environment validation failed.');
        console.error(new Error('Missing required server parameter'));
        return res.status(500).send('Internal Server Error');
      }
      res.send(`Hello ${NAME}!`);
    });

    const port = parseInt(process.env.PORT) || 8080;
    app.listen(port, () => {
      console.log(`hello: listening on port ${port}`);
    });

    Python

    import json
    import os

    from flask import Flask

    app = Flask(__name__)


    @app.route("/", methods=["GET"])
    def index():
        """Example route for testing local troubleshooting.

        This route may raise an HTTP 5XX error due to missing environment variable.
        """
        print("hello: received request.")

        NAME = os.getenv("NAME")

        if not NAME:
            print("Environment validation failed.")
            raise Exception("Missing required service parameter.")

        return f"Hello {NAME}"


    if __name__ == "__main__":
        PORT = int(os.getenv("PORT")) if os.getenv("PORT") else 8080

        # This is used when running locally. Gunicorn is used to run the
        # application on Cloud Run. See entrypoint in Dockerfile.
        app.run(host="127.0.0.1", port=PORT, debug=True)

    Go

    // Sample hello demonstrates a difficult to troubleshoot service.
    package main

    import (
        "fmt"
        "log"
        "net/http"
        "os"
    )

    func main() {
        log.Print("hello: service started")

        http.HandleFunc("/", helloHandler)

        port := os.Getenv("PORT")
        if port == "" {
            port = "8080"
            log.Printf("Defaulting to port %s", port)
        }

        log.Printf("Listening on port %s", port)
        log.Fatal(http.ListenAndServe(fmt.Sprintf(":%s", port), nil))
    }

    func helloHandler(w http.ResponseWriter, r *http.Request) {
        log.Print("hello: received request")

        name := os.Getenv("NAME")
        if name == "" {
            log.Printf("Missing required server parameter")
            // The panic stack trace appears in Cloud Error Reporting.
            panic("Missing required server parameter")
        }

        fmt.Fprintf(w, "Hello %s!\n", name)
    }

    Java

    import static spark.Spark.get;
    import static spark.Spark.port;

    import org.slf4j.Logger;
    import org.slf4j.LoggerFactory;

    public class App {

      private static final Logger logger = LoggerFactory.getLogger(App.class);

      public static void main(String[] args) {
        int port = Integer.parseInt(System.getenv().getOrDefault("PORT", "8080"));
        port(port);

        get(
            "/",
            (req, res) -> {
              logger.info("Hello: received request.");
              String name = System.getenv("NAME");
              if (name == null) {
                // Standard error logs do not appear in Stackdriver Error Reporting.
                System.err.println("Environment validation failed.");
                String msg = "Missing required server parameter";
                logger.error(msg, new Exception(msg));
                res.status(500);
                return "Internal Server Error";
              }
              res.status(200);
              return String.format("Hello %s!", name);
            });
      }
    }

  3. Create a Dockerfile to define the container image used to deploy the service:

    Node.js

    # Use the official lightweight Node.js image.
    # https://hub.docker.com/_/node
    FROM node:20-slim

    # Create and change to the app directory.
    WORKDIR /usr/src/app

    # Copy application dependency manifests to the container image.
    # A wildcard is used to ensure copying both package.json AND package-lock.json (when available).
    # Copying this first prevents re-running npm install on every code change.
    COPY package*.json ./

    # Install dependencies.
    # if you need a deterministic and repeatable build create a
    # package-lock.json file and use npm ci:
    # RUN npm ci --omit=dev
    # if you need to include development dependencies during development
    # of your application, use:
    # RUN npm install --dev
    RUN npm install --omit=dev

    # Copy local code to the container image.
    COPY . ./

    # Run the web service on container startup.
    CMD ["npm", "start"]

    Python

    # Use the official Python image.
    # https://hub.docker.com/_/python
    FROM python:3.11

    # Allow statements and log messages to immediately appear in the Cloud Run logs
    ENV PYTHONUNBUFFERED True

    # Copy application dependency manifests to the container image.
    # Copying this separately prevents re-running pip install on every code change.
    COPY requirements.txt ./

    # Install production dependencies.
    RUN pip install -r requirements.txt

    # Copy local code to the container image.
    ENV APP_HOME /app
    WORKDIR $APP_HOME
    COPY . ./

    # Run the web service on container startup.
    # Use gunicorn webserver with one worker process and 8 threads.
    # For environments with multiple CPU cores, increase the number of workers
    # to be equal to the cores available.
    # Timeout is set to 0 to disable the timeouts of the workers to allow Cloud Run to handle instance scaling.
    CMD exec gunicorn --bind :$PORT --workers 1 --threads 8 --timeout 0 main:app

    Go

    # Use the official Go image to create a binary.
    # This is based on Debian and sets the GOPATH to /go.
    # https://hub.docker.com/_/golang
    FROM golang:1.24-bookworm as builder

    # Create and change to the app directory.
    WORKDIR /app

    # Retrieve application dependencies.
    # This allows the container build to reuse cached dependencies.
    # Expecting to copy go.mod and if present go.sum.
    COPY go.* ./
    RUN go mod download

    # Copy local code to the container image.
    COPY . ./

    # Build the binary.
    RUN go build -v -o server

    # Use the official Debian slim image for a lean production container.
    # https://hub.docker.com/_/debian
    # https://docs.docker.com/develop/develop-images/multistage-build/#use-multi-stage-builds
    FROM debian:bookworm-slim
    RUN set -x && apt-get update && DEBIAN_FRONTEND=noninteractive apt-get install -y \
        ca-certificates && \
        rm -rf /var/lib/apt/lists/*

    # Copy the binary to the production image from the builder stage.
    COPY --from=builder /app/server /server

    # Run the web service on container startup.
    CMD ["/server"]

    Java

    This sample uses Jib to build Docker images using common Java tools. Jib optimizes container builds without the need for a Dockerfile or having Docker installed. Learn more about building Java containers with Jib.

    <plugin>
      <groupId>com.google.cloud.tools</groupId>
      <artifactId>jib-maven-plugin</artifactId>
      <version>3.4.0</version>
      <configuration>
        <to>
          <image>gcr.io/PROJECT_ID/hello-service</image>
        </to>
      </configuration>
    </plugin>

Shipping the code

Shipping code consists of three steps: building a container image with Cloud Build, uploading the container image to Container Registry, and deploying the container image to Cloud Run.

To ship your code:

  1. Build your container and publish on Container Registry:

    Node.js

    gcloud builds submit --tag gcr.io/PROJECT_ID/hello-service

    Where PROJECT_ID is your Google Cloud project ID. You can check your current project ID with gcloud config get-value project.

    Upon success, you should see a SUCCESS message containing the ID, creation time, and image name. The image is stored in Container Registry and can be re-used if desired.

    Python

    gcloud builds submit --tag gcr.io/PROJECT_ID/hello-service

    Where PROJECT_ID is your Google Cloud project ID. You can check your current project ID with gcloud config get-value project.

    Upon success, you should see a SUCCESS message containing the ID, creation time, and image name. The image is stored in Container Registry and can be re-used if desired.

    Go

    gcloud builds submit --tag gcr.io/PROJECT_ID/hello-service

    Where PROJECT_ID is your Google Cloud project ID. You can check your current project ID with gcloud config get-value project.

    Upon success, you should see a SUCCESS message containing the ID, creation time, and image name. The image is stored in Container Registry and can be re-used if desired.

    Java

    1. Use the gcloud credential helper to authorize Docker to push to your Container Registry.
      gcloud auth configure-docker
    2. Use the Jib Maven Plugin to build and push the container to Container Registry.
      mvn compile jib:build -Dimage=gcr.io/PROJECT_ID/hello-service

    Where PROJECT_ID is your Google Cloud project ID. You can check your current project ID with gcloud config get-value project.

    Upon success, you should see a BUILD SUCCESS message. The image is stored in Container Registry and can be re-used if desired.

  2. Run the following command to deploy your app:

    gcloud run deploy hello-service --image gcr.io/PROJECT_ID/hello-service

    Replace PROJECT_ID with your Google Cloud project ID. hello-service is both the container image name and name of the Cloud Run service. Notice that the container image is deployed to the service and region that you configured previously under Setting up gcloud defaults.

    Respond y, "Yes", to the allow unauthenticated prompt. See Managing Access for more details on IAM-based authentication.

    Wait until the deployment is complete: this can take about half a minute. On success, the command line displays the service URL.

Trying it out

Try out the service to confirm you have successfully deployed it. Requests should fail with an HTTP 500 or 503 error (members of the class 5xx Server errors). The tutorial walks through troubleshooting this error response.

The service is auto-assigned a navigable URL.

  1. Navigate to this URL with your web browser:

    1. Open a web browser

    2. Find the service URL output by the earlier deploy command.

      If the deploy command did not provide a URL then something went wrong. Review the error message and act accordingly: if no actionable guidance is present, review the troubleshooting guide and possibly retry the deployment command.

    3. Navigate to this URL by copying it into your browser's address bar and pressing ENTER.

  2. View the HTTP 500 or HTTP 503 error.

    If you receive an HTTP 403 error, you may have rejected allow unauthenticated invocations at the deployment prompt. Grant public access to the service to fix this:

    gcloud run services add-iam-policy-binding hello-service \
      --member="allUsers" \
      --role="roles/run.invoker"

For more information, read Allowing public (unauthenticated) access.
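
The browser check above can also be scripted. The following is a minimal sketch that uses only the Python standard library; fetch_status and describe_status are hypothetical helper names, not part of the sample service, and the URL you pass in is the service URL printed by the deploy command.

```python
import urllib.error
import urllib.request


def fetch_status(url: str, timeout: float = 10.0) -> int:
    """Return the HTTP status code of a GET request to url."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as err:
        # urllib raises on 4xx/5xx responses; the status code is on the exception.
        return err.code


def describe_status(status: int) -> str:
    """Map a status code to the outcomes described in this section."""
    if 500 <= status <= 599:
        return "server error (expected for the broken sample)"
    if status == 403:
        return "forbidden (grant public access to the service)"
    if 200 <= status <= 299:
        return "success"
    return "unexpected status"
```

Calling describe_status(fetch_status(SERVICE_URL)) against the deployed sample should report a server error until the problem is fixed later in the tutorial.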

Investigating the problem

Imagine that the HTTP 5xx error seen above in Trying it out was encountered as a production runtime error. This tutorial walks through a formal process for handling it. Although production error resolution processes vary widely, this tutorial presents a particular sequence of steps to show the application of useful tools and techniques.

To investigate this problem you will work through these phases:

  • Collect more details on the reported error to support further investigation and set a mitigation strategy.
  • Relieve user impact by deciding to push forward with a fix or roll back to a known-healthy version.
  • Reproduce the error to confirm the correct details have been gathered and that the error is not a one-time glitch.
  • Perform a root cause analysis on the bug to find the code, configuration, or process which created this error.

At the start of the investigation you have a URL, timestamp, and the message "Internal Server Error".

Gathering further details

Gather more information about the problem to understand what happened and determine next steps.

Use available Google Cloud Observability tools to collect more details:

  1. Use the Error Reporting console, which provides a dashboard with details and recurrence tracking for errors with a recognized stack trace.

    Go to Error Reporting console

    Screenshot of the error list including columns 'Resolution Status', Occurrences, Error, and 'Seen in'.
    List of recorded errors. Errors are grouped by message across revisions, services, and platforms.
  2. Click on the error to see the stack trace details, noting the function calls made just prior to the error.

    Screenshot of a single parsed stack trace, demonstrating a common profile of this error.
    The "Stack trace sample" in the error details page shows a single instance of the error. You can review each individual instance.
  3. Use Cloud Logging to review the sequence of operations leading to the problem, including error messages that are not included in the Error Reporting console because of a lack of a recognized error stack trace:

    Go to Cloud Logging console

    Select Cloud Run Revision > hello-service from the first drop-down box. This will filter the log entries to those generated by your service.

Read more about viewing logs in Cloud Run.
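
The drop-down selection corresponds to a text query in the Logging query language. As a rough sketch, a helper like the hypothetical build_log_filter below assembles that query for a given service. The default severity of DEFAULT is an assumption chosen so that plain-text log lines, which may carry no explicit severity, are still included.

```python
def build_log_filter(service_name: str, min_severity: str = "DEFAULT") -> str:
    """Build a Logging query that selects entries from one Cloud Run service."""
    clauses = [
        # Cloud Run services log under the cloud_run_revision resource type.
        'resource.type="cloud_run_revision"',
        f'resource.labels.service_name="{service_name}"',
        # Raise this (for example to ERROR) to narrow the results.
        f"severity>={min_severity}",
    ]
    return " AND ".join(clauses)


print(build_log_filter("hello-service"))
```

The resulting string can be pasted into the query box of the Cloud Logging console or passed as the filter argument to gcloud logging read.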

Roll back to a healthy version

If this is an established service, known to work, there will be a previous revision of the service on Cloud Run. This tutorial uses a new service with no previous versions, so you cannot do a rollback.

However, if you have a service with previous versions you can roll back to, follow Viewing revision details to extract the container name and configuration details necessary to create a new working deployment of your service.

Reproducing the error

Using the details you obtained previously, confirm the problem consistently occurs under test conditions.

Send the same HTTP request by trying it out again, and see if the same error and details are reported. It may take some time for error details to show up.

Because the sample service in this tutorial is read-only and doesn't trigger any complicating side effects, reproducing errors in production is safe. However, for many real services, this won't be the case: you may need to reproduce errors in a test environment or limit this step to local investigation.

Reproducing the error establishes the context for further work. For example, if developers cannot reproduce the error, further investigation may require additional instrumentation of the service.
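
When you repeat a request several times, it helps to decide up front what counts as "consistently occurs". The sketch below is one illustrative way to summarize a list of observed HTTP status codes; classify_reproduction is a hypothetical helper and the three labels are assumptions made for this example.

```python
def classify_reproduction(statuses: list[int]) -> str:
    """Summarize repeated attempts as consistent, intermittent, or not reproduced."""
    if not statuses:
        raise ValueError("at least one attempt is required")
    # Count the attempts that ended in a 5xx server error.
    failures = sum(1 for code in statuses if 500 <= code <= 599)
    if failures == len(statuses):
        return "consistent"
    if failures == 0:
        return "not reproduced"
    return "intermittent"
```

An "intermittent" or "not reproduced" result is the situation described above, where additional instrumentation may be needed before continuing.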

Performing a root cause analysis

Root cause analysis is an important step in effective troubleshooting to ensure you fix the problem instead of a symptom.

Previously in this tutorial, you reproduced the problem on Cloud Run, which confirms the problem is active when the service is hosted on Cloud Run. Now reproduce the problem locally to determine if the problem is isolated to the code or if it only emerges in production hosting.

  1. If you have not used Docker CLI locally with Container Registry, authenticate it with gcloud:

    gcloud auth configure-docker

    For alternative approaches see Container Registry authentication methods.

  2. If the most recently used container image name is not available, the service description has the information of the most recently deployed container image:

    gcloud run services describe hello-service

    Find the container image name inside the spec object. A more targeted command can directly retrieve it:

    gcloud run services describe hello-service \
      --format="value(spec.template.spec.containers.image)"

    This command reveals a container image name such as gcr.io/PROJECT_ID/hello-service.

  3. Pull the container image from the Container Registry to your environment. This step might take several minutes as it downloads the container image:

    docker pull gcr.io/PROJECT_ID/hello-service

    Later updates to the container image that reuse this name can be retrieved with the same command. If you skip this step, the docker run command below pulls a container image if one is not present on the local machine.

  4. Run locally to confirm the problem is not unique to Cloud Run:

    PORT=8080 && docker run --rm -e PORT=$PORT -p 9000:$PORT \
      gcr.io/PROJECT_ID/hello-service

    Breaking down the elements of the command above,

    • The PORT environment variable is used by the service to determine the port to listen on inside the container.
    • The run command starts the container, defaulting to the entrypoint command defined in the Dockerfile or a parent container image.
    • The --rm flag deletes the container instance on exit.
    • The -e flag assigns a value to an environment variable. -e PORT=$PORT is propagating the PORT variable from the local system into the container with the same variable name.
    • The -p flag publishes the container as a service available on localhost at port 9000. Requests to localhost:9000 will be routed to the container on port 8080. This means output from the service about the port number in use will not match how the service is accessed.
    • The final argument gcr.io/PROJECT_ID/hello-service is a container image tag, a human-readable label for a container image's sha256 hash identifier. If not available locally, docker attempts to retrieve the image from a remote registry.

    In your browser, open http://localhost:9000. Check the terminal output for error messages that match those on Google Cloud Observability.

    If the problem is not reproducible locally, it may be unique to the Cloud Run environment. Review the Cloud Run troubleshooting guide for specific areas to investigate.

    In this case the error is reproduced locally.
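
The flag breakdown above can be restated as a small sketch that assembles the same docker run argument list. docker_run_args is a hypothetical helper written for illustration; it only builds the argument list and does not invoke Docker.

```python
import shlex


def docker_run_args(image, container_port=8080, host_port=9000, env=None):
    """Build the docker run argument list used for local reproduction."""
    args = [
        "docker", "run",
        "--rm",                                  # delete the container on exit
        "-e", f"PORT={container_port}",          # port the service listens on
        "-p", f"{host_port}:{container_port}",   # localhost port -> container port
    ]
    # Extra environment variables, one -e flag each.
    for key, value in (env or {}).items():
        args += ["-e", f"{key}={value}"]
    # The image tag is always the final argument.
    args.append(image)
    return args


# shlex.join quotes each argument so the line is safe to paste into a shell.
print(shlex.join(docker_run_args("gcr.io/PROJECT_ID/hello-service")))
```

Keeping the host port (9000) and container port (8080) as separate parameters makes the mapping described in the -p bullet explicit.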

Now that the error is doubly-confirmed as persistent and caused by the service code instead of the hosting platform, it's time to investigate the code more closely.

For purposes of this tutorial it is safe to assume the code inside the container and the code in the local system is identical.

Revisit the error report's stack trace and cross-reference with the code to find the specific lines at fault.

Node.js

Find the source of the error message in the file index.js around the line number called out in the stack trace shown in the logs:

  const {NAME} = process.env;
  if (!NAME) {
    // Plain error logs do not appear in Stackdriver Error Reporting.
    console.error('Environment validation failed.');
    console.error(new Error('Missing required server parameter'));
    return res.status(500).send('Internal Server Error');
  }

Python

Find the source of the error message in the file main.py around the line number called out in the stack trace shown in the logs:

    NAME = os.getenv("NAME")

    if not NAME:
        print("Environment validation failed.")
        raise Exception("Missing required service parameter.")

Go

Find the source of the error message in the file main.go around the line number called out in the stack trace shown in the logs:

    name := os.Getenv("NAME")
    if name == "" {
        log.Printf("Missing required server parameter")
        // The panic stack trace appears in Cloud Error Reporting.
        panic("Missing required server parameter")
    }

Java

Find the source of the error message in the file App.java around the line number called out in the stack trace shown in the logs:

    String name = System.getenv("NAME");
    if (name == null) {
      // Standard error logs do not appear in Stackdriver Error Reporting.
      System.err.println("Environment validation failed.");
      String msg = "Missing required server parameter";
      logger.error(msg, new Exception(msg));
      res.status(500);
      return "Internal Server Error";
    }

Examining this code, the following actions are taken when the NAME environment variable is not set:

  • An error is logged to Google Cloud Observability
  • An HTTP error response is sent

The problem is caused by a missing variable, but the root cause is more specific: the code change adding the hard dependency on an environment variable did not include related changes to deployment scripts and runtime requirements documentation.
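
One way to keep such a dependency visible is to declare required variables in a single place and check them when the process starts. The following is an illustrative sketch only, not how the sample service is written; REQUIRED_ENV_VARS, missing_env_vars, and validate_environment are hypothetical names.

```python
import os

# Hypothetical single source of truth for the service's runtime requirements.
REQUIRED_ENV_VARS = ("NAME",)


def missing_env_vars(required=REQUIRED_ENV_VARS, environ=None):
    """Return the required variables that are unset or empty."""
    source = os.environ if environ is None else environ
    return [name for name in required if not source.get(name)]


def validate_environment(environ=None):
    """Fail fast with a specific message instead of a generic 500 per request."""
    missing = missing_env_vars(environ=environ)
    if missing:
        raise RuntimeError(
            "Missing required environment variables: " + ", ".join(missing)
        )
```

Calling validate_environment() once at startup would surface the misconfiguration at deploy time, and the declared tuple doubles as documentation for deployment scripts.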

Fixing the root cause

Now that we have collected the code and identified the potential root cause, we can take steps to fix it.

  • Check whether the service works locally with the NAME environment variable set:

    1. Run the container locally with the environment variable added:

      PORT=8080 && docker run --rm -e PORT=$PORT -p 9000:$PORT \
        -e NAME="Local World!" \
        gcr.io/PROJECT_ID/hello-service
    2. Navigate your browser to http://localhost:9000

    3. See "Hello Local World!" appear on the page

  • Modify the running Cloud Run service environment to include this variable:

    1. Run the services update command to add an environment variable:

      gcloud run services update hello-service \
        --set-env-vars NAME=Override
    2. Wait a few seconds while Cloud Run creates a new revision based on the previous revision with the new environment variable added.

  • Confirm the service is now fixed:

    1. Navigate your browser to the Cloud Run service URL.
    2. See "Hello Override!" appear on the page.
    3. Verify that no unexpected messages or errors appear in Cloud Logging or Error Reporting.

Improving future troubleshooting speed

In this sample production problem, the error was related to operational configuration. There are code changes that will minimize the impact of this problem in the future.

  • Improve the error log to include more specific details.
  • Instead of returning an error, have the service fall back to a safe default. If using a default represents a change to normal functionality, use a warning message for monitoring purposes.
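
A warning intended for monitoring is easier to find when it is written as a single JSON line with severity and message fields. The helper below is a minimal sketch of that idea; format_log_entry is a hypothetical name, and the extra keyword fields are an assumption added for illustration.

```python
import json


def format_log_entry(severity: str, message: str, **fields) -> str:
    """Serialize one log entry as a single JSON line."""
    entry = {"severity": severity, "message": message}
    # Optional extra context, such as the name of the affected setting.
    entry.update(fields)
    return json.dumps(entry)


print(format_log_entry("WARNING", "NAME not set, default to World", setting="NAME"))
```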

Let's step through removing the NAME environment variable as a hard dependency.

  1. Remove the existing NAME-handling code:

    Node.js

      const {NAME} = process.env;
      if (!NAME) {
        // Plain error logs do not appear in Stackdriver Error Reporting.
        console.error('Environment validation failed.');
        console.error(new Error('Missing required server parameter'));
        return res.status(500).send('Internal Server Error');
      }

    Python

        NAME = os.getenv("NAME")

        if not NAME:
            print("Environment validation failed.")
            raise Exception("Missing required service parameter.")

    Go

        name := os.Getenv("NAME")
        if name == "" {
            log.Printf("Missing required server parameter")
            // The panic stack trace appears in Cloud Error Reporting.
            panic("Missing required server parameter")
        }

    Java

        String name = System.getenv("NAME");
        if (name == null) {
          // Standard error logs do not appear in Stackdriver Error Reporting.
          System.err.println("Environment validation failed.");
          String msg = "Missing required server parameter";
          logger.error(msg, new Exception(msg));
          res.status(500);
          return "Internal Server Error";
        }

  2. Add new code that sets a fallback value:

    Node.js

      const NAME = process.env.NAME || 'World';
      if (!process.env.NAME) {
        console.log(
          JSON.stringify({
            severity: 'WARNING',
            message: `NAME not set, default to '${NAME}'`,
          })
        );
      }

    Python

        NAME = os.getenv("NAME")

        if not NAME:
            NAME = "World"
            error_message = {
                "severity": "WARNING",
                "message": f"NAME not set, default to {NAME}",
            }
            print(json.dumps(error_message))

    Go

        name := os.Getenv("NAME")
        if name == "" {
            name = "World"
            log.Printf("warning: NAME not set, default to %s", name)
        }

    Java

        String name = System.getenv().getOrDefault("NAME", "World");
        if (System.getenv("NAME") == null) {
          logger.warn(String.format("NAME not set, default to %s", name));
        }

  3. Test locally by re-building and running the container through the affected configuration cases:

    Node.js

    docker build --tag gcr.io/PROJECT_ID/hello-service .

    Python

    docker build --tag gcr.io/PROJECT_ID/hello-service .

    Go

    docker build --tag gcr.io/PROJECT_ID/hello-service .

    Java

    mvn compile jib:build

    Confirm the NAME environment variable still works:

    PORT=8080 && docker run --rm -e PORT=$PORT -p 9000:$PORT \
      -e NAME="Robust World" \
      gcr.io/PROJECT_ID/hello-service

    Confirm the service works without the NAME variable:

    PORT=8080 && docker run --rm -e PORT=$PORT -p 9000:$PORT \
      gcr.io/PROJECT_ID/hello-service

    If the service does not return a result, confirm the removal of code in the first step did not remove extra lines, such as those used to write the response.

  4. Deploy this by revisiting the Shipping the code section.

    Each deployment to a service creates a new revision and automatically starts serving traffic when ready.

    To clear the environment variables set earlier:

    gcloud run services update hello-service --clear-env-vars

Add the new functionality for the default value to automated test coverage for the service.
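
As a rough sketch of what that coverage could look like in Python, the test case below exercises a hypothetical resolve_name helper that mirrors the fallback logic. The helper is an assumption made so the example is self-contained; the sample service keeps this logic inline in its request handler.

```python
import unittest


def resolve_name(environ):
    """Hypothetical helper: return NAME from environ, or 'World' if unset or empty."""
    return environ.get("NAME") or "World"


class ResolveNameTest(unittest.TestCase):
    def test_uses_configured_name(self):
        self.assertEqual(resolve_name({"NAME": "Override"}), "Override")

    def test_falls_back_when_missing(self):
        self.assertEqual(resolve_name({}), "World")

    def test_falls_back_when_empty(self):
        self.assertEqual(resolve_name({"NAME": ""}), "World")
```

Saved to a file, the tests can be run with python -m unittest. Passing the environment in as a dictionary keeps the tests independent of the variables set on the machine running them.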

Success: You completed troubleshooting a broken Cloud Run service using Google Cloud Observability tools.

Finding other issues in the logs

You may see other issues in the Log Viewer for this service. For example, an unsupported system call will appear in the logs as a "Container Sandbox Limitation".

For example, the Node.js services sometimes result in this log message:

Container Sandbox Limitation: Unsupported syscall statx(0xffffff9c,0x3e1ba8e86d88,0x0,0xfff,0x3e1ba8e86970,0x3e1ba8e86a90). Please, refer to https://gvisor.dev/c/linux/amd64/statx for more information.

In this case, the lack of support does not impact the hello-service sample service.
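
If these messages appear often, it can be useful to tally which system calls are involved. The sketch below extracts the system call name from a message of the form shown above; unsupported_syscall is a hypothetical helper, and the pattern is based only on that one sample message.

```python
import re

# Matches the system call name that follows "Unsupported syscall".
SYSCALL_PATTERN = re.compile(
    r"Container Sandbox Limitation: Unsupported syscall (\w+)\("
)


def unsupported_syscall(message: str):
    """Return the system call name from a sandbox limitation message, or None."""
    match = SYSCALL_PATTERN.search(message)
    return match.group(1) if match else None
```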

Terraform troubleshooting

For Terraform-related troubleshooting or questions, see Terraform policy validation troubleshooting or contact Terraform support.

Clean up

To avoid additional charges to your Google Cloud account, delete all the resources you deployed with this tutorial.

Delete the project

If you created a new project for this tutorial, delete the project. If you used an existing project and need to keep it without the changes you added in this tutorial, delete resources that you created for the tutorial.

The easiest way to eliminate billing is to delete the project that you created for the tutorial.

To delete the project:

    Caution: Deleting a project has the following effects:
    • Everything in the project is deleted. If you used an existing project for the tasks in this document, when you delete it, you also delete any other work you've done in the project.
    • Custom project IDs are lost. When you created this project, you might have created a custom project ID that you want to use in the future. To preserve the URLs that use the project ID, such as an appspot.com URL, delete selected resources inside the project instead of deleting the whole project.

    If you plan to explore multiple architectures, tutorials, or quickstarts, reusing projects can help you avoid exceeding project quota limits.

  1. In the Google Cloud console, go to the Manage resources page.

    Go to Manage resources

  2. In the project list, select the project that you want to delete, and then click Delete.
  3. In the dialog, type the project ID, and then click Shut down to delete the project.

Delete tutorial resources

  1. Delete the Cloud Run service you deployed in this tutorial. Cloud Run services don't incur costs until they receive requests.

    To delete your Cloud Run service, run the following command:

    gcloud run services delete SERVICE-NAME

    Replace SERVICE-NAME with the name of your service.

    You can also delete Cloud Run services from the Google Cloud console.

  2. Remove the gcloud default region configuration you added during tutorial setup:

    gcloud config unset run/region
  3. Remove the project configuration:

    gcloud config unset project
  4. Delete other Google Cloud resources created in this tutorial:

What's next

Except as otherwise noted, the content of this page is licensed under the Creative Commons Attribution 4.0 License, and code samples are licensed under the Apache 2.0 License. For details, see the Google Developers Site Policies. Java is a registered trademark of Oracle and/or its affiliates.

Last updated 2026-02-18 UTC.