Python 2.7 has reached end of support and will be deprecated on January 31, 2026. After deprecation, you won't be able to deploy Python 2.7 applications, even if your organization previously used an organization policy to re-enable deployments of legacy runtimes. Your existing Python 2.7 applications will continue to run and receive traffic after their deprecation date. We recommend that you migrate to the latest supported version of Python.

Troubleshoot elevated latency in your app

In many cases, elevated latency in your application eventually results in 5xx server errors. Because the root cause of both the errors and the latency spikes might be the same, apply the following strategies for troubleshooting latency issues:

  1. Scope the latency issue
  2. Identify the cause
  3. Troubleshoot

Scope the latency issue

Define the scope of the issue by asking the following questions:

  • Which applications, services, and versions does this issue impact?
  • Which specific endpoints on the service does this issue impact?
  • Does this impact all clients globally, or a specific subset of clients?
  • What is the start and end time of the incident? Consider specifying the time zone.
  • What are the specific errors?
  • What is the observed latency delta, which is usually specified as an increase at a specific percentile? For example, latency increased by 2 seconds at the 90th percentile.
  • How did you measure the latency? In particular, did you measure it at the client, or is it visible in Cloud Logging or in the Cloud Monitoring latency data that the App Engine serving infrastructure provides?
  • What are the dependencies of your service, and do any of them experience incidents?
  • Did you make any recent code, configuration, or workload changes that triggered this issue?

A service might have its own custom monitoring and logging that you can use to further narrow down the scope of the issue. Defining the scope of the problem will guide you towards the likely root cause and determine your next troubleshooting steps.

Identify the cause

Determine which component in the request path is most likely to be causing the latency or errors. The main components in the request path are as follows:

Client --> Internet --> Google Front End (GFE) --> App Engine serving infrastructure --> Service instance

If the previous information doesn't point you to the source of the failure, apply the following strategies while reviewing your service instance's health and performance:

  1. Monitor the App Engine request logs. If you see HTTP status code errors or elevated latency in those logs, the issue likely lies in the instance that runs your service.

  2. If the number of service instances hasn't scaled up to matchtraffic levels, your instances might be overloaded, resulting in elevatederrors and latency.

  3. If you see elevated errors or latency in Cloud Monitoring, the problem might be upstream of the load balancer, which records the App Engine metrics. In most cases, this points to a problem in the service instances.

  4. If you see elevated latency or errors in monitoring metrics but not in the request logs, it indicates either a load balancing failure or a severe instance failure that prevents the load balancer from routing requests. To distinguish between these cases, look at the request logs before the incident starts. If the request logs show increasing latency before the failure, the application instances were beginning to fail before the load balancer stopped routing requests to them.

Troubleshoot

This section describes troubleshooting strategies for elevated latency issues from the following components in the request path:

  1. Internet
  2. Google Front End (GFE)
  3. App Engine serving infrastructure
  4. Application instance
  5. Application dependencies

Internet

Your application might encounter latency issues due to poor internet connectivity or low bandwidth.

Poor internet connectivity

To determine if the issue is poor internet connectivity, run the following command on your client:

$ curl -s -o /dev/null -w '%{time_connect}\n' <hostname>

The value for time_connect represents the latency of the client's connection to the nearest Google Front End. For slow connections, troubleshoot further using traceroute to determine which hop on the network causes the delay.
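
If curl isn't available on a client, you can approximate the same measurement with a short script. The following is a minimal Python sketch that times the TCP connection to your app's hostname; the hostname is a placeholder, and the value roughly corresponds to curl's time_connect.

import socket
import time

HOSTNAME = 'YOUR_APP_ID.appspot.com'  # placeholder; replace with your app's hostname

start = time.time()
# Resolve the hostname and open a TCP connection to the HTTPS port.
sock = socket.create_connection((HOSTNAME, 443), timeout=10)
elapsed = time.time() - start
sock.close()

print('TCP connect time: %.3f s' % elapsed)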

Run tests from clients in different geographical locations. App Engine automatically routes requests to the closest Google data center, which varies based on the client's location.

Low bandwidth

The application itself might respond quickly, but network bottlenecks can prevent the App Engine serving infrastructure from sending packets across the network quickly, which slows down responses.

Google Front End (GFE)

Your application might encounter latency issues due to incorrect routing, parallel requests sent from HTTP/2 clients, or termination of SSL connections.

Map a client IP to a geographical region

Google resolves the App Engine application's hostname to the closest GFE to the client, based on the client IP address it uses in the DNS lookup. If the client's DNS resolver isn't using the EDNS0 protocol, Google might not route the client requests to the closest GFE.

HTTP/2 head-of-line blocking

HTTP/2 clients sending multiple requests in parallel might see elevated latency due to head-of-line blocking at the GFE. To resolve this issue, the clients must use the QUIC protocol.

SSL termination for custom domains

The GFE might terminate the SSL connection. If you're using a custom domain instead of an appspot.com domain, SSL termination requires an extra hop. This might add latency for applications running in some regions. For more information, see Mapping custom domains.

App Engine serving infrastructure

You might see elevated latency in your application due to service-wide incidents or autoscaling.

Service-wide incidents

Google posts details of severe service-wide incidents in the Service Health dashboard. However, Google rolls out changes gradually, so a service-wide incident is unlikely to affect all your instances at once.

Autoscaling

Elevated latency or errors can result from the following autoscaling scenarios:

  • Scaling up traffic too fast: App Engine autoscaling might not scale your instances as quickly as traffic increases, leading to temporary overloading. Typically, overloading occurs when traffic is generated by a computer program instead of end users. To resolve this issue, throttle the system that generates the traffic (see the rate-limiting sketch at the end of this section).

  • Spikes in traffic: traffic spikes might cause elevated latency when an autoscaled service needs to scale up faster than is possible without affecting latency. End-user traffic doesn't usually cause frequent traffic spikes, so if you see spikes, investigate their cause. If a batch system runs at intervals, you might be able to smooth out the traffic or use different scaling settings.

  • Autoscaler settings: the autoscaler can be configured based on the scaling characteristics of your service. The scaling parameters might become non-optimal in the following scenarios:

    • App Engine standard environment scaling settings might cause latency if set too aggressively. If you see server responses with the status code 500 and the message "Request was aborted after waiting too long to attempt to service your request" in your logs, it means that the request timed out on the pending queue while waiting for an idle instance.

    • You might see increased pending time with manual scaling even when you have provisioned sufficient instances. We recommend that you don't use manual scaling if your application serves end-user traffic. Manual scaling is better for workloads such as task queues.

    • Basic scaling minimizes costs at the expense of latency. We recommend that you don't use basic scaling for latency-sensitive services.

    • App Engine's default scaling settings provide optimal latency for most services. If you still see requests with high pending time, specify a minimum number of instances. If you tune the scaling settings to reduce costs by minimizing idle instances, you run the risk of latency spikes if the load increases suddenly.

We recommend that you benchmark performance with the default scaling settings, and then run a new benchmark after each change to these settings.
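
If an automated traffic generator is ramping up load faster than autoscaling can follow, one way to throttle it is a simple client-side rate limiter. The following is a minimal sketch in Python, not an App Engine API; the target rate and the send_request function are hypothetical placeholders for your own batch client.

import time

MAX_REQUESTS_PER_SECOND = 50  # hypothetical target rate; tune to what your service absorbs comfortably

def send_request(item):
    # Placeholder for the batch client's actual request logic.
    pass

def run_batch(items):
    interval = 1.0 / MAX_REQUESTS_PER_SECOND
    for item in items:
        started = time.time()
        send_request(item)
        # Sleep off any time remaining in this request's slot so the overall
        # rate stays at or below MAX_REQUESTS_PER_SECOND.
        remaining = interval - (time.time() - started)
        if remaining > 0:
            time.sleep(remaining)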

Deployments

Elevated latency shortly after a deployment indicates that you haven't sufficiently scaled up before migrating traffic. Newer instances might not have warmed up local caches, and they serve more slowly than older instances.
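
One way to reduce the cold-cache effect on new instances is to prime caches during App Engine warmup requests. The following is a minimal webapp2 sketch for a Python 2.7 app, assuming warmup requests are enabled in app.yaml; prime_local_caches is a hypothetical function that fills in-instance caches.

import webapp2

def prime_local_caches():
    # Hypothetical: fill in-instance caches before the instance receives user traffic.
    pass

class WarmupHandler(webapp2.RequestHandler):
    # Handles /_ah/warmup so a new instance warms its caches before serving users.

    def get(self):
        prime_local_caches()
        self.response.set_status(200)

app = webapp2.WSGIApplication([('/_ah/warmup', WarmupHandler)])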

To avoid latency spikes, don't deploy an App Engine service using the same version name as an existing version of the service. If you reuse an existing version name, you won't be able to slowly migrate traffic to the new version. Requests might be slower because App Engine restarts every instance within a short period of time. You also have to redeploy if you want to revert to the previous version.

Application instance

This section describes the common strategies you can apply to your application instances and source code to optimize performance and reduce latency.

Application code

Issues in application code can be challenging to debug, particularly ifthey are intermittent or not reproducible.

To resolve issues, do the following:

  • To diagnose issues, we recommend instrumenting your application using logging, monitoring, and tracing (see the logging sketch after this list). You can also use Cloud Profiler.

  • Try to reproduce the issue in a local development environment, which might allow you to run language-specific debugging tools that you can't run within App Engine.

  • To better understand how your application fails and what bottlenecks occur, load test your application until failure. Set a maximum instance count, and then gradually increase load until the application fails.

  • If the latency issue correlates with the deployment of a new version of your application code, roll back to determine whether the new version caused the incident. However, if you deploy continuously, frequent deployments make it hard to determine whether the deployment caused the incident based on the time of onset.

  • Your application might store configuration settings within Datastore or elsewhere. Create a timeline of configuration changes to determine whether any of these line up with the onset of elevated latency.
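
As an example of the instrumentation mentioned in the first item, the following sketch uses Python's standard logging module, whose output App Engine sends to Cloud Logging, to record how long a suspect operation takes. The wrapper and the one-second threshold are hypothetical.

import logging
import time

SLOW_THRESHOLD_SECONDS = 1.0  # hypothetical threshold for flagging slow calls

def timed(name, func, *args, **kwargs):
    # Run func, log its duration, and warn if it exceeds the threshold.
    start = time.time()
    try:
        return func(*args, **kwargs)
    finally:
        elapsed = time.time() - start
        if elapsed > SLOW_THRESHOLD_SECONDS:
            logging.warning('%s took %.3f s', name, elapsed)
        else:
            logging.info('%s took %.3f s', name, elapsed)

# Example usage: result = timed('datastore_query', run_query, filters)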

Workload change

A workload change might cause elevated latency. Some monitoring metrics that indicate workload changes include qps, API usage, and latency. Also check for changes in request and response sizes.

Memory pressure

If monitoring shows either a saw-tooth pattern in memory usage, or a drop in memory usage that correlates to deployments, a memory leak might be the cause of performance issues. A memory leak might also cause frequent garbage collection that leads to higher latency. If you aren't able to trace this issue to a problem in the code, try provisioning larger instances with more memory.

Resource leak

If an instance of your application shows rising latency that correlates with instance age, you might have a resource leak that causes performance issues; in this case, latency drops after a deployment completes. For example, a data structure that gets slower over time due to higher CPU usage might cause any CPU-bound workload to get slower.

Code optimization

To reduce latency on App Engine, optimize code by using the following methods:

  • Offline work: use Cloud Tasks to prevent user requests from blocking while the application waits for work to complete, such as sending mail.

  • Asynchronous API calls: ensure that your code isn't blocked waiting for an API call to complete.

  • Batch API calls: the batch version of API calls is usually faster than sending individual calls (see the sketch after this list).

  • Denormalize data models: reduce the latency of calls made to the data persistence layer by denormalizing your data models.
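
To illustrate the batching item above, the following sketch assumes a Python 2.7 app that uses the ndb client library; ndb.get_multi issues one batched Datastore call instead of one round trip per key. The Product model and the ID list are hypothetical.

from google.appengine.ext import ndb

class Product(ndb.Model):  # hypothetical model
    name = ndb.StringProperty()

def get_products(product_ids):
    keys = [ndb.Key(Product, pid) for pid in product_ids]

    # Slower: one Datastore round trip per key.
    # products = [key.get() for key in keys]

    # Faster: a single batched round trip for all keys.
    return ndb.get_multi(keys)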

Application dependencies

Monitor dependencies of your application to detect if latency spikes correlate with a dependency failure.

A workload change and an increase in traffic might cause a dependency's latency to increase.

Non-scaling dependency

If your application's dependency doesn't scale as the number of App Engine instances scales up, the dependency might become overloaded when traffic increases. An example of a dependency that might not scale is a SQL database. A higher number of application instances leads to a higher number of database connections, which might cause a cascading failure by preventing the database from starting up. To resolve this issue, do the following:

  1. Deploy a new default version that doesn't connect to the database.
  2. Shut down the previous default version.
  3. Deploy a new non-default version that connects to the database.
  4. Slowly migrate traffic to the new version.

As a preventive measure, design your application to drop requests to the dependency using adaptive throttling.
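
The following is a minimal sketch of one common adaptive-throttling approach, in which the client rejects requests probabilistically based on the ratio of attempted requests to requests the dependency accepted. The multiplier value and the lack of a sliding window are simplifying assumptions, not a prescribed App Engine API.

import random

class AdaptiveThrottle(object):
    # Probabilistically drops requests to a dependency that is rejecting work.

    def __init__(self, multiplier=2.0):
        self.multiplier = multiplier  # requests allowed per accepted request
        self.requests = 0             # requests attempted
        self.accepts = 0              # requests the dependency accepted

    def should_reject(self):
        # Reject with probability (requests - multiplier * accepts) / (requests + 1).
        cutoff = (self.requests - self.multiplier * self.accepts) / (self.requests + 1.0)
        return random.random() < max(0.0, cutoff)

    def record(self, accepted):
        self.requests += 1
        if accepted:
            self.accepts += 1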

Caching layer failure

To speed up requests, use multiple caching layers, such as edge caching, Memcache, and in-instance memory. A failure in one of these caching layers might cause a sudden latency increase. For example, a Memcache flush might cause more requests to go to a slower Datastore.
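
As an illustration, the following sketch layers an in-instance dictionary over Memcache over Datastore for a Python 2.7 app. The Profile model, key scheme, and five-minute cache lifetime are hypothetical; a flush of either cache layer simply pushes lookups to the next, slower layer.

from google.appengine.api import memcache
from google.appengine.ext import ndb

_instance_cache = {}  # in-instance memory, lost whenever the instance restarts

class Profile(ndb.Model):  # hypothetical model
    display_name = ndb.StringProperty()

def get_profile(user_id):
    cache_key = 'profile:%s' % user_id

    # 1. In-instance memory: fastest, local to this instance.
    profile = _instance_cache.get(cache_key)
    if profile is not None:
        return profile

    # 2. Memcache: shared across instances, slower than local memory.
    profile = memcache.get(cache_key)
    if profile is None:
        # 3. Datastore: slowest layer and the source of truth.
        profile = Profile.get_by_id(user_id)
        if profile is not None:
            memcache.set(cache_key, profile, time=300)  # hypothetical 5-minute lifetime

    if profile is not None:
        _instance_cache[cache_key] = profile
    return profile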
