General development tips
This guide provides best practices for designing, implementing, testing, and deploying a Cloud Run service. For more tips, see Migrating an Existing Service.
Write effective services
This section describes general best practices for designing and implementing a Cloud Run service.
Background activity
Background activity is anything that happens after your HTTP response has been delivered. To determine whether there is background activity in your service that is not readily apparent, check your logs for anything that is logged after the entry for the HTTP request.
Configure instance-based billing to use background activities
If you want to support background activities in your Cloud Run service, set your Cloud Run service to instance-based billing so you can run background activities outside of requests and still have CPU access.
Avoid background activities if using request-based billing
If you need to set your service to request-based billing, the instance's access to CPU is disabled or severely limited when the Cloud Run service finishes handling a request. You shouldn't start background threads or routines that run outside the scope of the request handlers if you use this type of billing.
Review your code to make sure all asynchronous operations finish before you deliver your response.
Running background threads with request-based billing enabled can result in unexpected behavior because any subsequent request to the same container instance resumes any suspended background activity.
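The review step described above can be sketched as a handler that explicitly waits for its worker thread before returning, so no work outlives the response. This is a minimal, framework-agnostic sketch; the handler and payload names are hypothetical.

```python
import threading
import time


def handle_request(payload):
    """Hypothetical request handler that joins its worker thread
    before returning, so no background activity outlives the
    response (important with request-based billing, where CPU is
    throttled after the response is delivered)."""
    results = {}

    def process():
        # Simulates asynchronous work kicked off during the request.
        time.sleep(0.05)
        results["status"] = f"processed {payload}"

    worker = threading.Thread(target=process)
    worker.start()
    # Block until the work completes; only then deliver the response.
    worker.join()
    return results["status"]
```

The same principle applies to promises, goroutines, or executor tasks: await or join them before the handler returns.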
Delete temporary files
In the Cloud Run environment, disk storage is an in-memory filesystem. Files written to disk consume memory otherwise available to your service, and can persist between invocations. Failing to delete these files can eventually lead to an out-of-memory error and subsequent slow container startup times.
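One common pattern that follows this advice is to create scratch files with `tempfile` and delete them in a `finally` block, so cleanup happens even on errors. A minimal sketch; the function name and payload are illustrative:

```python
import os
import tempfile


def transform_upload(data: bytes) -> int:
    """Write a scratch file, use it, and always delete it.
    In Cloud Run, temporary files live in an in-memory filesystem,
    so leaked scratch files count against the memory limit."""
    fd, path = tempfile.mkstemp()
    try:
        with os.fdopen(fd, "wb") as f:
            f.write(data)
        # Stand-in for real processing of the scratch file.
        return os.path.getsize(path)
    finally:
        # Delete the file even if processing raised an exception.
        os.remove(path)
```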
Report errors
Handle all exceptions and do not let your service crash on errors. A crash leads to a slow container startup while traffic is queued for a replacement instance.
See the Error reporting guide for information on how to properly report errors.
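The "handle all exceptions" advice can be sketched as a wrapper that converts unhandled exceptions into a 500 response and logs the traceback to stderr, where it can be picked up from the container's logs. This wrapper is illustrative, not part of any Cloud Run API:

```python
import json
import sys
import traceback


def safe_handler(handler):
    """Wrap a request handler so unhandled exceptions return a 500
    response instead of crashing the instance (hypothetical wrapper)."""
    def wrapped(request):
        try:
            return handler(request), 200
        except Exception:
            # Structured log line with the traceback; severity helps
            # log-based error reporting classify it.
            print(json.dumps({"severity": "ERROR",
                              "message": traceback.format_exc()}),
                  file=sys.stderr)
            return "Internal Server Error", 500
    return wrapped


@safe_handler
def handler(request):
    if request is None:
        raise ValueError("missing request")
    return "ok"
```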
Optimize performance
This section describes best practices for optimizing performance.
Start containers quickly
Because instances are scaled as needed, their startup time has an impact on the latency of your service. Cloud Run decouples instance startup and request processing, so in some cases a request must wait for a new instance to start before the request is processed. This commonly happens when a service scales from zero.
The startup routine consists of:
- Downloading the container image (using Cloud Run's container image streaming technology)
- Starting the container by running the entrypoint command.
- Waiting for the container to start listening on the configured port.
Optimizing for container startup speed minimizes the request processing latency.
Use startup CPU boost to reduce startup latency
You can enable startup CPU boost to temporarily increase CPU allocation during instance startup in order to reduce startup latency.
Use minimum instances to reduce container startup times
You can configure minimum instances and concurrency to minimize container startup times. For example, setting minimum instances to 1 means that your service is ready to receive up to the number of concurrent requests configured for your service without needing to start a new instance.
Note that a request waiting for an instance to start is kept pending in a queue as follows:

Requests pend for up to 3.5 times the average startup time of this service's container instances, or 10 seconds, whichever is greater.
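The queueing rule above reduces to a one-line formula, sketched here for clarity; the function name is illustrative:

```python
def max_pending_seconds(avg_startup_seconds: float) -> float:
    """Documented queueing rule: a request waits up to 3.5x the
    service's average instance startup time, with a 10-second floor."""
    return max(3.5 * avg_startup_seconds, 10.0)
```

So a service whose instances start in 2 seconds still gets the 10-second floor, while one that takes 4 seconds gets a 14-second pending window.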
Use dependencies wisely
If you use a dynamic language with dependent libraries, such as importing modules in Node.js, the load time for those modules adds to the startup latency.
Reduce startup latency in these ways:
- Minimize the number and size of dependencies to build a lean service.
- Lazily load code that is infrequently used, if your language supports it.
- Use code-loading optimizations such as PHP's composer autoloader optimization.
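The lazy-loading item above can be sketched in Python by moving an import into the rarely taken code path, so its load time is not paid at instance startup. The handler and request shape are hypothetical:

```python
def handler(request):
    """Hypothetical handler that defers an import until the
    infrequently used code path actually needs it."""
    if request.get("generate_report"):
        # Imported only on this rare path; startup stays lean.
        import csv
        import io
        buf = io.StringIO()
        csv.writer(buf).writerow(["id", "value"])
        return buf.getvalue().strip()
    return "ok"
```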
Use global variables
In Cloud Run, you cannot assume that service state is preserved between requests. However, Cloud Run does reuse individual instances to serve ongoing traffic, so you can declare a variable in global scope to allow its value to be reused in subsequent invocations. Whether any individual request receives the benefit of this reuse cannot be known ahead of time.
You can also cache objects in memory if they are expensive to recreate on each service request. Moving this from the request logic to global scope results in better performance.
Node.js

```javascript
const functions = require('@google-cloud/functions-framework');

// TODO(developer): Define your own computations
const {lightComputation, heavyComputation} = require('./computations');

// Global (instance-wide) scope
// This computation runs once (at instance cold-start)
const instanceVar = heavyComputation();

/**
 * HTTP function that declares a variable.
 *
 * @param {Object} req request context.
 * @param {Object} res response context.
 */
functions.http('scopeDemo', (req, res) => {
  // Per-function scope
  // This computation runs every time this function is called
  const functionVar = lightComputation();

  res.send(`Per instance: ${instanceVar}, per function: ${functionVar}`);
});
```

Python

```python
import time

import functions_framework


# Placeholder
def heavy_computation():
    return time.time()


# Placeholder
def light_computation():
    return time.time()


# Global (instance-wide) scope
# This computation runs at instance cold-start
instance_var = heavy_computation()


@functions_framework.http
def scope_demo(request):
    """
    HTTP Cloud Function that declares a variable.

    Args:
        request (flask.Request): The request object.
        <http://flask.pocoo.org/docs/1.0/api/#flask.Request>
    Returns:
        The response text, or any set of values that can be turned into
        a Response object using `make_response`
        <http://flask.pocoo.org/docs/1.0/api/#flask.Flask.make_response>.
    """
    # Per-function scope
    # This computation runs every time this function is called
    function_var = light_computation()

    return f"Instance: {instance_var}; function: {function_var}"
```

Go

```go
// h is in the global (instance-wide) scope.
var h string

// init runs during package initialization. So, this will only run during
// an instance's cold start.
func init() {
	h = heavyComputation()
	functions.HTTP("ScopeDemo", ScopeDemo)
}

// ScopeDemo is an example of using globally and locally
// scoped variables in a function.
func ScopeDemo(w http.ResponseWriter, r *http.Request) {
	l := lightComputation()
	fmt.Fprintf(w, "Global: %q, Local: %q", h, l)
}
```

Java

```java
import com.google.cloud.functions.HttpFunction;
import com.google.cloud.functions.HttpRequest;
import com.google.cloud.functions.HttpResponse;
import java.io.IOException;
import java.io.PrintWriter;
import java.util.Arrays;

public class Scopes implements HttpFunction {
  // Global (instance-wide) scope
  // This computation runs at instance cold-start.
  // Warning: Class variables used in functions code must be thread-safe.
  private static final int INSTANCE_VAR = heavyComputation();

  @Override
  public void service(HttpRequest request, HttpResponse response) throws IOException {
    // Per-function scope
    // This computation runs every time this function is called
    int functionVar = lightComputation();

    var writer = new PrintWriter(response.getWriter());
    writer.printf("Instance: %s; function: %s", INSTANCE_VAR, functionVar);
  }

  private static int lightComputation() {
    int[] numbers = new int[] {1, 2, 3, 4, 5, 6, 7, 8, 9};
    return Arrays.stream(numbers).sum();
  }

  private static int heavyComputation() {
    int[] numbers = new int[] {1, 2, 3, 4, 5, 6, 7, 8, 9};
    return Arrays.stream(numbers).reduce((t, x) -> t * x).getAsInt();
  }
}
```

Perform lazy initialization of global variables
The initialization of global variables always occurs during startup, whichincreases container startup time. Use lazy initialization for infrequently usedobjects to defer the time cost and decrease container startup times.
One drawback of lazy initialization is an increased latency for first requeststo new instances. This can cause overscaling and dropped requests when youdeploy a new revision of a service that is actively handling many requests.
Node.js

```javascript
const functions = require('@google-cloud/functions-framework');

// Always initialized (at cold-start)
const nonLazyGlobal = fileWideComputation();

// Declared at cold-start, but only initialized if/when the function executes
let lazyGlobal;

/**
 * HTTP function that uses lazy-initialized globals
 *
 * @param {Object} req request context.
 * @param {Object} res response context.
 */
functions.http('lazyGlobals', (req, res) => {
  // This value is initialized only if (and when) the function is called
  lazyGlobal = lazyGlobal || functionSpecificComputation();

  res.send(`Lazy global: ${lazyGlobal}, non-lazy global: ${nonLazyGlobal}`);
});
```

Python

```python
import functions_framework

# Always initialized (at cold-start)
non_lazy_global = file_wide_computation()

# Declared at cold-start, but only initialized if/when the function executes
lazy_global = None


@functions_framework.http
def lazy_globals(request):
    """
    HTTP Cloud Function that uses lazily-initialized globals.

    Args:
        request (flask.Request): The request object.
        <http://flask.pocoo.org/docs/1.0/api/#flask.Request>
    Returns:
        The response text, or any set of values that can be turned into
        a Response object using `make_response`
        <http://flask.pocoo.org/docs/1.0/api/#flask.Flask.make_response>.
    """
    global lazy_global, non_lazy_global  # noqa: F824

    # This value is initialized only if (and when) the function is called
    if not lazy_global:
        lazy_global = function_specific_computation()

    return f"Lazy: {lazy_global}, non-lazy: {non_lazy_global}."
```

Go

```go
// Package tips contains tips for writing Cloud Functions in Go.
package tips

import (
	"context"
	"log"
	"net/http"
	"sync"

	"cloud.google.com/go/storage"
	"github.com/GoogleCloudPlatform/functions-framework-go/functions"
)

// client is lazily initialized by LazyGlobal.
var client *storage.Client
var clientOnce sync.Once

func init() {
	functions.HTTP("LazyGlobal", LazyGlobal)
}

// LazyGlobal is an example of lazily initializing a Google Cloud Storage client.
func LazyGlobal(w http.ResponseWriter, r *http.Request) {
	// You may wish to add different checks to see if the client is needed for
	// this request.
	clientOnce.Do(func() {
		// Pre-declare an err variable to avoid shadowing client.
		var err error
		client, err = storage.NewClient(context.Background())
		if err != nil {
			http.Error(w, "Internal error", http.StatusInternalServerError)
			log.Printf("storage.NewClient: %v", err)
			return
		}
	})
	// Use client.
}
```

Java

```java
import com.google.cloud.functions.HttpFunction;
import com.google.cloud.functions.HttpRequest;
import com.google.cloud.functions.HttpResponse;
import java.io.IOException;
import java.io.PrintWriter;
import java.util.Arrays;

public class LazyFields implements HttpFunction {
  // Always initialized (at cold-start)
  // Warning: Class variables used in Servlet classes must be thread-safe,
  // or else might introduce race conditions in your code.
  private static final int NON_LAZY_GLOBAL = fileWideComputation();

  // Declared at cold-start, but only initialized if/when the function executes
  // Uses the "initialization-on-demand holder" idiom
  // More information: https://en.wikipedia.org/wiki/Initialization-on-demand_holder_idiom
  private static class LazyGlobalHolder {
    // Making the default constructor private prohibits instantiation of this class
    private LazyGlobalHolder() {}

    // This value is initialized only if (and when) the getInstance() function below is called
    private static final Integer INSTANCE = functionSpecificComputation();

    private static Integer getInstance() {
      return LazyGlobalHolder.INSTANCE;
    }
  }

  @Override
  public void service(HttpRequest request, HttpResponse response) throws IOException {
    Integer lazyGlobal = LazyGlobalHolder.getInstance();

    var writer = new PrintWriter(response.getWriter());
    writer.printf("Lazy global: %s; non-lazy global: %s%n", lazyGlobal, NON_LAZY_GLOBAL);
  }

  private static int functionSpecificComputation() {
    int[] numbers = new int[] {1, 2, 3, 4, 5, 6, 7, 8, 9};
    return Arrays.stream(numbers).sum();
  }

  private static int fileWideComputation() {
    int[] numbers = new int[] {1, 2, 3, 4, 5, 6, 7, 8, 9};
    return Arrays.stream(numbers).reduce((t, x) -> t * x).getAsInt();
  }
}
```

Use a different execution environment
You may experience faster startup times by using a different execution environment.
Optimize concurrency
Cloud Run instances can serve multiple requests simultaneously ("concurrently"), up to a configurable maximum concurrency.
Cloud Run automatically adjusts the concurrency up to the configured maximum.
The default maximum concurrency of 80 is a good fit for many container images. However, you should:
- Lower it if your container is not able to process many concurrent requests.
- Increase it if your container is able to handle a large volume of requests.
Tune concurrency for your service
The number of concurrent requests that each instance can serve can be limited by the technology stack and the use of shared resources such as variables and database connections.
To optimize your service for maximum stable concurrency:
- Optimize your service performance.
- Set your expected level of concurrency support in any code-level concurrency configuration. Not all technology stacks require such a setting.
- Deploy your service.
- Set the Cloud Run concurrency for your service equal to or less than any code-level configuration. If there is no code-level configuration, use your expected concurrency.
- Use load testing tools that support configurable concurrency. Confirm that your service remains stable under the expected load and concurrency.
- If the service performs poorly, go back to step 1 to improve the service or to step 2 to reduce the concurrency. If the service performs well, go back to step 2 and increase the concurrency.
Continue iterating until you find the maximum stable concurrency.
Match memory to concurrency
Each request your service handles requires some amount of additional memory. So, when you adjust concurrency up or down, make sure you adjust your memory limit as well.
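A back-of-the-envelope sizing rule that follows from this advice: baseline footprint plus per-request overhead times maximum concurrency, with some headroom. The figures and the 20% headroom are assumptions to replace with measurements from your own service:

```python
import math


def required_memory_mib(base_mib: int, per_request_mib: int,
                        concurrency: int, headroom: float = 1.2) -> int:
    """Estimate a memory limit: baseline plus per-request memory
    times maximum concurrency, multiplied by a safety headroom.
    All inputs are assumptions to be measured, not Cloud Run defaults."""
    return math.ceil((base_mib + per_request_mib * concurrency) * headroom)
```

For example, a service with a 128 MiB baseline and roughly 4 MiB per request at concurrency 80 would want a limit of about 538 MiB under these assumptions.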
Avoid mutable global state
If you want to leverage mutable global state in a concurrent context, take extra steps in your code to ensure this is done safely. Minimize contention by limiting global variables to one-time initialization and reuse as described above under Optimize performance.
If you use mutable global variables in a service that serves multiple requests at the same time, make sure to use locks or mutexes to prevent race conditions.
Throughput versus latency versus cost tradeoffs
Tuning the maximum concurrent requests setting can help balance the tradeoffbetween throughput, latency, and cost for your service.
In general, a lower maximum concurrent requests setting results in lower latency and lower throughput per instance. With a lower maximum concurrent requests setting, fewer requests compete for resources inside each instance, and each request achieves better performance. But because each instance can serve fewer requests at once, the per-instance throughput is lower and the service needs more instances to serve the same traffic.
In the opposite direction, a higher maximum concurrent requests setting generally results in higher latency and higher throughput per instance. Requests might need to wait for access to resources like CPU, GPU, and memory bandwidth inside the instance, which leads to increased latency. But each instance can process more requests at once, so the service needs fewer instances overall to process the same traffic.
Cost considerations
Cloud Run pricing is per instance time. If you set instance-based billing, the instance time is the total lifetime of each instance. If you set request-based billing, the instance time is the time each instance spends processing at least one request.
The impact of maximum concurrent requests on billing depends on your traffic pattern. Lowering maximum concurrent requests can result in a lower bill if the lower setting leads to:
- Decreased latency
- Instances completing their work faster
- Instances shutting down faster even if more total instances are required
But the opposite is also possible: lowering maximum concurrent requests can increase billing if the increase in the number of instances is not outweighed by the reduction in time that each instance is running, due to the improved latency.
The best way to optimize billing is through load testing using different maximum concurrent requests settings to identify the setting that results in the lowest billable instance time, as seen in the container/billable_instance_time monitoring metric.
Container security
Many general-purpose software security practices apply to containerized services. Some practices are either specific to containers or align with the philosophy and architecture of containers.
To improve container security:
Use actively maintained and secure base images such as Google base images or Docker Hub's official images.
Note: As of November 1, 2020, Docker Hub rate limits apply to unauthenticated or authenticated pull requests on the Docker Free plan. To avoid disruptions and have greater control over your software supply chain, you can migrate your dependencies to Artifact Registry.
Apply security updates to your services by regularly rebuilding container images and redeploying your services.
Include in the container only what is necessary to run your service. Extra code, packages, and tools are potential security vulnerabilities. See above for the related performance impact.
Implement a deterministic build process that includes specific software and library versions. This prevents unverified code from being included in your container.
Set your container to run as a user other than root with the Dockerfile USER statement. Some container images may already have a specific user configured.

Prevent the use of Preview features by using custom organization policies.
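A minimal sketch of the USER statement in a Dockerfile; the base image, user name, and UID here are illustrative, not requirements:

```dockerfile
FROM python:3.12-slim
WORKDIR /app
COPY . .
RUN pip install --no-cache-dir -r requirements.txt
# Create an unprivileged user and switch to it; the name and UID
# are illustrative choices.
RUN adduser --system --uid 1001 appuser
USER appuser
CMD ["python", "main.py"]
```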
Automate security scanning
Enable vulnerability scanning for security scanning of container images stored in Artifact Registry.
Build minimal container images
Large container images likely increase security vulnerabilities because theycontain more than what the code needs.
Because of Cloud Run's container image streaming technology, the size of your container image does not affect container startup times or request processing time. The container image size also does not count towards the available memory of your container.
To build a minimal container, consider working from a lean base image such as:
Ubuntu is larger in size, but is a commonly used base image with a more complete out-of-box server environment.
If your service has a tool-heavy build process, consider using multi-stage builds to keep your container light at run time.
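A minimal multi-stage build sketch, assuming a Go service for illustration: build tools live only in the first stage, and only the compiled binary ships in the final image.

```dockerfile
# Build stage: compilers and build tools live only here.
FROM golang:1.22 AS build
WORKDIR /src
COPY . .
RUN CGO_ENABLED=0 go build -o /bin/server .

# Runtime stage: only the static binary ships in the final image.
FROM gcr.io/distroless/static-debian12
COPY --from=build /bin/server /server
CMD ["/server"]
```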
These resources provide further information on creating lean container images:
- Kubernetes best practices: How and why to build small container images
- 7 best practices for building containers
Except as otherwise noted, the content of this page is licensed under the Creative Commons Attribution 4.0 License, and code samples are licensed under the Apache 2.0 License. For details, see the Google Developers Site Policies. Java is a registered trademark of Oracle and/or its affiliates.
Last updated 2025-12-17 UTC.