Optimize Java applications for Cloud Run
This guide describes optimizations for Cloud Run services written in the Java programming language, along with background information to help you understand the tradeoffs involved in some of the optimizations. The information on this page supplements the general optimization tips, which also apply to Java.
Conventional Java web-based applications are designed to serve requests with high concurrency and low latency, and tend to be long-running applications. The JVM itself also optimizes the execution code over time with JIT, so that hot paths are optimized and applications run more efficiently over time.
Many of the best practices and optimizations in these conventional Java web-based applications revolve around:
- Handling concurrent requests (both thread-based and non-blocking I/O)
- Reducing response latency by using connection pooling and by batching non-critical functions, for example, offloading the sending of traces and metrics to background tasks.
While many of these conventional optimizations work well for long-running applications, they may not work as well in a Cloud Run service, which runs only when actively serving requests. This page takes you through a few different optimizations and tradeoffs for Cloud Run that you can use to reduce startup time and memory usage.
Use startup CPU boost to reduce startup latency
You can enable startup CPU boost to temporarily increase CPU allocation during instance startup in order to reduce startup latency.
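For example, you can enable the feature with the gcloud CLI on an existing service (a sketch; SERVICE_NAME and REGION are placeholders for your own values):

```shell
# Enable startup CPU boost on an existing Cloud Run service.
gcloud run services update SERVICE_NAME \
  --region=REGION \
  --cpu-boost
```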
Google's metrics have shown that Java apps benefit if they use startup CPUboost, which can reduce startup times by up to 50%.
Optimize your Java application's container image
By optimizing the container image, you can reduce load and startup times. Youcan optimize the image by:
- Minimizing the container image
- Avoiding use of nested library archive JARs
- Using Jib
Minimize container image
Refer to the general tips page on minimizing containers for more context on this issue. That page recommends reducing container image content to only what's needed. For example, make sure your container image does not contain:
- Source code
- Maven build artifacts
- Build tools
- Git directories
- Unused binaries or utilities
If you are building the code from within a Dockerfile, use a Docker multi-stage build so that the final container image has only the JRE and the application JAR file itself.
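A minimal multi-stage Dockerfile might look like the following sketch (the image tags and JAR name are illustrative assumptions, not recommendations):

```dockerfile
# Build stage: compile and package the application with Maven.
FROM maven:3-eclipse-temurin-17 AS build
COPY . /app
WORKDIR /app
RUN mvn -q package

# Runtime stage: only the JRE and the application JAR.
FROM eclipse-temurin:17-jre
COPY --from=build /app/target/helloworld.jar /helloworld.jar
ENTRYPOINT ["java", "-jar", "/helloworld.jar"]
```

Because the build tools, source code, and Maven artifacts live only in the first stage, none of them end up in the final image.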
Avoid nested library archive JARs
Some popular frameworks, like Spring Boot, create an application archive (JAR) file that contains additional library JAR files (nested JARs). These files need to be unpacked (decompressed) at startup, which can negatively impact startup speed in Cloud Run. So, when possible, create a thin JAR with externalized libraries; this can be automated by using Jib to containerize your application.
Use Jib
Use the Jib plugin to create a minimal container and flatten the application archive automatically. Jib works with both Maven and Gradle, and works with Spring Boot applications out of the box. Some application frameworks may require additional Jib configuration.
JVM optimizations for Cloud Run Java applications
Optimizing the JVM for a Cloud Run service can result in betterperformance and memory usage.
Use container-aware JVM versions
When running on a VM or a physical machine, the JVM determines the CPU and memory it can use from well-known locations, for example, /proc/cpuinfo and /proc/meminfo on Linux. However, when running in a container, CPU and memory constraints are stored in /proc/cgroups/.... Older versions of the JDK continue to look in /proc instead of /proc/cgroups, which can result in more CPU and memory usage than was assigned. This can cause:
- An excessive number of threads, because the thread pool size is configured by Runtime.availableProcessors()
- A default max heap that exceeds the container memory limit. The JVM aggressively uses memory before it garbage collects. This can easily cause the container to exceed the container memory limit and get OOMKilled.
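You can see what the JVM believes it has available with a small program (a sketch; on an older, non-container-aware JDK inside a container, these numbers reflect the host rather than the container limits):

```java
public class JvmResources {
    public static void main(String[] args) {
        Runtime rt = Runtime.getRuntime();
        // Default thread pool sizes are typically derived from this value.
        int cpus = rt.availableProcessors();
        // The default max heap is typically a fraction of the detected memory.
        long maxHeapMb = rt.maxMemory() / (1024 * 1024);
        System.out.println("availableProcessors=" + cpus);
        System.out.println("maxHeapMb=" + maxHeapMb);
    }
}
```

On a container-aware JVM running in Cloud Run, both values track the instance's configured CPU and memory rather than the underlying host's.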
So, use a container-aware JVM version. OpenJDK versions 8u192 and later are container-aware by default.
Understand JVM memory usage
JVM memory usage is composed of native memory usage and heap usage. Your application's working memory is usually in the heap, and the size of the heap is constrained by the Max Heap configuration. With a Cloud Run 256 MB RAM instance, you cannot assign all 256 MB to the Max Heap, because the JVM and the OS also require native memory, for example, for thread stacks, code caches, file handles, and buffers.

If your application is getting OOMKilled and you need to know the JVM memory usage (native memory + heap), turn on Native Memory Tracking to see usage upon a successful application exit. If your application gets OOMKilled, it won't be able to print out the information. In that case, run the application with more memory first so that it can successfully generate the output.
Native Memory Tracking cannot be turned on via the JAVA_TOOL_OPTIONS environment variable. You need to add the Java command line startup arguments to your container image entrypoint, so that your application is started with these arguments:
```shell
java -XX:NativeMemoryTracking=summary \
  -XX:+UnlockDiagnosticVMOptions \
  -XX:+PrintNMTStatistics \
  ...
```
Consider using an open source Java Memory Calculator to estimate memory needs.
Turn off the optimization compiler
By default, the JVM has several phases of JIT compilation. Although these phases improve the efficiency of your application over time, they also add overhead to memory usage and increase startup time.
For short-running, serverless applications (for example, functions), considerturning off the optimization phases to trade long term efficiency for reducedstartup time.
For a Cloud Run service, configure the environment variable:
JAVA_TOOL_OPTIONS="-XX:+TieredCompilation -XX:TieredStopAtLevel=1"
Use application class-data sharing
To further reduce JIT time and memory usage, consider using application class-data sharing (AppCDS) to share the ahead-of-time compiled Java classes as an archive. The AppCDS archive can be re-used when starting another instance of the same Java application. The JVM can re-use the pre-computed data from the archive, which reduces startup time.
The following considerations apply to using AppCDS:
- An AppCDS archive can only be re-used with exactly the same OpenJDK distribution, version, and architecture that originally produced it.
- You must run your application at least once to generate the list of classes to be shared, and then use that list to generate the AppCDS archive.
- The coverage of the classes depends on the codepaths executed during the run of the application. To increase coverage, programmatically trigger more codepaths.
- The application must exit successfully to generate the classes list. Consider implementing an application flag that indicates AppCDS archive generation, so that the application can exit immediately.
- The AppCDS archive can only be re-used if you launch new instances in exactly the same way that the archive was generated.
- The AppCDS archive only works with a regular JAR file package; you can't use nested JARs.
Spring Boot example using a shaded JAR file
Spring Boot applications use a nested uber JAR by default, which won't work for AppCDS. So, if you're using AppCDS, you need to create a shaded JAR. For example, using Maven and the Maven Shade Plugin:
```xml
<build>
  <finalName>helloworld</finalName>
  <plugins>
    <plugin>
      <groupId>org.apache.maven.plugins</groupId>
      <artifactId>maven-shade-plugin</artifactId>
      <configuration>
        <keepDependenciesWithProvidedScope>true</keepDependenciesWithProvidedScope>
        <createDependencyReducedPom>true</createDependencyReducedPom>
        <filters>
          <filter>
            <artifact>*:*</artifact>
            <excludes>
              <exclude>META-INF/*.SF</exclude>
              <exclude>META-INF/*.DSA</exclude>
              <exclude>META-INF/*.RSA</exclude>
            </excludes>
          </filter>
        </filters>
      </configuration>
      <executions>
        <execution>
          <phase>package</phase>
          <goals>
            <goal>shade</goal>
          </goals>
          <configuration>
            <transformers>
              <transformer implementation="org.apache.maven.plugins.shade.resource.AppendingTransformer">
                <resource>META-INF/spring.handlers</resource>
              </transformer>
              <transformer implementation="org.springframework.boot.maven.PropertiesMergingResourceTransformer">
                <resource>META-INF/spring.factories</resource>
              </transformer>
              <transformer implementation="org.apache.maven.plugins.shade.resource.AppendingTransformer">
                <resource>META-INF/spring.schemas</resource>
              </transformer>
              <transformer implementation="org.apache.maven.plugins.shade.resource.ServicesResourceTransformer"/>
              <transformer implementation="org.apache.maven.plugins.shade.resource.ManifestResourceTransformer">
                <mainClass>${mainClass}</mainClass>
              </transformer>
            </transformers>
          </configuration>
        </execution>
      </executions>
    </plugin>
  </plugins>
</build>
```

If your shaded JAR contains all the dependencies, you can produce a simple archive during the container build using a Dockerfile:
```dockerfile
# Use Docker's multi-stage build
FROM eclipse-temurin:11-jre AS APPCDS

COPY target/helloworld.jar /helloworld.jar

# Run the application, but with a custom trigger that exits immediately.
# In this particular example, the application looks for the '--appcds' flag.
# You can implement a similar flag in your own application.
RUN java -XX:DumpLoadedClassList=classes.lst -jar helloworld.jar --appcds=true

# From the captured list of classes (based on execution coverage),
# generate the AppCDS archive file.
RUN java -Xshare:dump \
    -XX:SharedClassListFile=classes.lst \
    -XX:SharedArchiveFile=appcds.jsa \
    --class-path helloworld.jar

FROM eclipse-temurin:11-jre

# Copy both the JAR file and the AppCDS archive file to the runtime container.
COPY --from=APPCDS /helloworld.jar /helloworld.jar
COPY --from=APPCDS /appcds.jsa /appcds.jsa

# Enable Application Class-Data Sharing
ENTRYPOINT java -Xshare:on -XX:SharedArchiveFile=appcds.jsa -jar helloworld.jar
```
Reduce thread stack size
Most Java web applications are thread-per-connection based. Each Java thread consumes native memory (not heap), known as the thread stack, which defaults to 1 MB per thread. If your application handles 80 concurrent requests, it may have at least 80 threads, which translates to 80 MB of thread stack space used, in addition to the heap size. The default may be larger than necessary, so you can reduce the thread stack size.
If you reduce it too much, you will see java.lang.StackOverflowError. You can profile your application to find the optimal thread stack size to configure.
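As a rough illustration of why the stack size matters, the following sketch measures how deep a single thread can recurse before exhausting its stack. Running it with -Xss256k versus the default shows the difference; the exact depth is JVM- and workload-dependent:

```java
public class StackDepth {
    static int depth = 0;

    // Each call consumes a stack frame; recursion ends with StackOverflowError.
    static void recurse() {
        depth++;
        recurse();
    }

    public static void main(String[] args) {
        try {
            recurse();
        } catch (StackOverflowError e) {
            // The reachable depth scales roughly linearly with -Xss.
            System.out.println("max recursion depth: " + depth);
        }
    }
}
```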
For a Cloud Run service, configure the environment variable:
JAVA_TOOL_OPTIONS="-Xss256k"
Thread reduction for Java application performance
You can optimize memory by reducing the number of threads, using non-blocking reactive strategies, and avoiding background activities.
Reduce number of threads
Each Java thread may increase memory usage due to its thread stack. Cloud Run allows a maximum of 1,000 concurrent requests. With a thread-per-connection model, you need at most 1,000 threads to handle all the concurrent requests. Most web servers and frameworks allow you to configure the maximum number of threads and connections. For example, in Spring Boot, you can cap the maximum number of threads in the application.properties file:
server.tomcat.max-threads=80
Write non-blocking reactive code to optimize memory and startup
To truly reduce the number of threads, consider adopting a non-blocking, reactive programming model, so that the number of threads can be significantly reduced while handling more concurrent requests. Application frameworks like Spring Boot with WebFlux, Micronaut, and Quarkus support reactive web applications.
Reactive frameworks such as Spring Boot with WebFlux, Micronaut, and Quarkus also generally have faster startup times.
If you continue to write blocking code in a non-blocking framework, throughput and error rates will be significantly worse in a Cloud Run service. This is because non-blocking frameworks have only a few threads, for example, 2 or 4. If your code is blocking, it can handle very few concurrent requests.
These non-blocking frameworks may also offload blocking code to an unbounded thread pool, meaning that while the framework can accept many concurrent requests, the blocking code executes in new threads. If threads accumulate in an unbounded way, you will exhaust the CPU and start to thrash, severely impacting latency. If you use a non-blocking framework, be sure to understand its thread pool models and bound the pools accordingly.
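As one way to keep blocking work bounded (a framework-agnostic sketch; the pool size of 8 is an illustrative assumption), you can route blocking calls through a fixed-size executor instead of letting them spawn unbounded threads:

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;

public class BoundedBlockingPool {
    public static void main(String[] args) throws Exception {
        // A fixed pool caps concurrent blocking calls; excess tasks queue
        // instead of each spawning a new thread (and a new thread stack).
        ExecutorService blockingPool = Executors.newFixedThreadPool(8);

        Future<String> result = blockingPool.submit(() -> {
            // Simulate a blocking call, such as legacy JDBC access.
            Thread.sleep(10);
            return "done";
        });

        System.out.println(result.get()); // waits for the blocking task

        blockingPool.shutdown();
        blockingPool.awaitTermination(5, TimeUnit.SECONDS);
    }
}
```

The trade-off is that tasks beyond the pool size wait in the queue, so size the pool to match your CPU allocation and the latency you can tolerate.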
Configure instance-based billing if you use background activities
Background activity is anything that happens after your HTTP response has beendelivered. Traditional workloads that have background tasks need specialconsideration when running in Cloud Run.
Configure instance-based billing
If you want to support background activities in your Cloud Run service, set your Cloud Run service to instance-based billing so you can run background activities outside of requests and still have CPU access.
Avoid background activities if using request-based billing
If you need to set your service to request-based billing, be aware of potential issues with background activities. For example, if you collect application metrics and batch them in the background to send periodically, those metrics won't be sent when request-based billing is configured. If your application receives requests constantly, you may see fewer issues. If your application has low QPS, the background task may never execute.
Some well-known patterns that run in the background and need your attention if you choose request-based billing:
- JDBC connection pools: clean-ups and connection checks usually happen in the background.
- Distributed trace senders: distributed traces are usually batched and sent periodically, or when the buffer is full, in the background.
- Metrics senders: metrics are usually batched and sent periodically in the background.
- For Spring Boot, any methods annotated with @Async.
- Timers: any timer-based triggers (for example, ScheduledThreadPoolExecutor, Quartz, or the Spring @Scheduled annotation) may not execute when request-based billing is configured.
- Message receivers: for example, Pub/Sub streaming pull clients, JMS clients, and Kafka clients usually run in background threads without the need for requests. These will not work when your application has no requests. Receiving messages this way is not recommended in Cloud Run.
Application optimizations
In your Cloud Run service code, you can also optimize for fasterstartup times and memory usage.
Reduce startup tasks
During startup, Java web applications often have to handle multiple tasks, like preloading data, warming up caches, and establishing connection pools. When you execute these tasks sequentially, your application might become slow to start. To execute these tasks in parallel, increase the number of CPU cores.
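A sketch of running independent startup tasks in parallel with CompletableFuture (the task bodies here are hypothetical placeholders for your own initialization work):

```java
import java.util.concurrent.CompletableFuture;

public class ParallelStartup {
    public static void main(String[] args) {
        // Run independent startup tasks concurrently instead of sequentially.
        CompletableFuture<Void> warmCache =
            CompletableFuture.runAsync(() -> System.out.println("cache warmed"));
        CompletableFuture<Void> preloadData =
            CompletableFuture.runAsync(() -> System.out.println("data preloaded"));

        // Block until every startup task has completed before serving traffic.
        CompletableFuture.allOf(warmCache, preloadData).join();
        System.out.println("startup complete");
    }
}
```

Note that parallelism only helps if the instance has more than one CPU core available during startup (which startup CPU boost can provide).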
In Cloud Run, a real user request triggers the cold start of a new instance. Users whose requests are assigned to a newly started instance might experience long delays.
For applications with long startup times, consider using a startup probe. A startup probe ensures that Cloud Run sends user requests to an instance only after it has fully initialized and passed the startup health check. For more information, see Configure container health checks for services.
Use connection pooling
If you use connection pools, be aware that they may evict unneeded connections in the background (see Avoid background activities if using request-based billing). If your application has low QPS and can tolerate high latency, consider opening and closing connections per request. If your application has high QPS, background evictions may continue to execute as long as there are active requests.
In both cases, the application's database access is bottlenecked by the maximum number of connections the database allows. Calculate the maximum connections you can establish per Cloud Run instance, and configure the Cloud Run maximum instances so that maximum instances times connections per instance is less than the allowed maximum.
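The sizing arithmetic can be sketched as follows (the numbers are illustrative assumptions, not recommendations):

```java
public class ConnectionBudget {
    public static void main(String[] args) {
        int dbMaxConnections = 200;       // limit imposed by the database
        int connectionsPerInstance = 10;  // pool size per Cloud Run instance

        // Cap max instances so that instances * pool size stays within the
        // database connection limit.
        int maxInstances = dbMaxConnections / connectionsPerInstance;
        System.out.println("max instances: " + maxInstances); // prints 20
    }
}
```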
If you use Spring Boot
Note: Enabling startup CPU boost can result in a 50% startup time reduction.

If you use Spring Boot, consider the following optimizations.
Use Spring Boot version 2.2 or greater
Starting with version 2.2, Spring Boot has been heavily optimized for startup speed. If you are using a Spring Boot version earlier than 2.2, consider upgrading, or apply individual optimizations manually.
Use lazy initialization
A global lazy initialization flag can be turned on in Spring Boot 2.2 and later. This improves startup speed, with the trade-off that the first request may have longer latency because it must wait for components to initialize for the first time.
You can turn on lazy initialization in application.properties:
spring.main.lazy-initialization=true
Or, by using an environment variable:
SPRING_MAIN_LAZY_INITIALIZATION=true
However, if you are using min-instances, then lazy initialization is not goingto help, since initialization should have occurred when the min-instancestarted.
Avoid class scanning
Class scanning causes additional disk reads, and in Cloud Run disk access is generally slower than on a regular machine. Make sure component scanning is limited or avoided completely.
Don't use Spring Boot Developer Tools in production
If you use Spring Boot Developer Tools during development, make sure it is not packaged in the production container image. This may happen if you build the Spring Boot application without the Spring Boot build plugins (for example, using the Shade plugin, or using Jib to containerize).
In these cases, make sure the build tool excludes Spring Boot Developer Tools explicitly, or turn off Spring Boot Developer Tools explicitly.
What's next
For more tips, see the general optimization tips.
Last updated 2026-02-19 UTC.