How requests are handled

Region ID

The REGION_ID is an abbreviated code that Google assigns based on the region you select when you create your app. The code does not correspond to a country or province, even though some region IDs may appear similar to commonly used country and province codes. For apps created after February 2020, REGION_ID.r is included in App Engine URLs. For existing apps created before this date, the region ID is optional in the URL.

Learn more about region IDs.

This document describes how your App Engine application receives requests and sends responses. For more details, see the Request Headers and Responses reference.

If your application uses services, you can address requests to a specific service or a specific version of that service. For more information about service addressability, see How Requests are Routed.
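As a rough illustration of how the region ID and service addressing show up in a URL, the sketch below builds hostnames from their parts. The `-dot-` separator and the `appspot.com` domain follow the commonly documented App Engine URL pattern, which this page does not spell out, so treat them as assumptions. The function name and the sample project, service, and version values are hypothetical.

```python
def app_url(project_id: str, region_id: str,
            service: str = "default", version: str = "") -> str:
    """Build an App Engine URL (assumed pattern, see the note above).

    With no service or version, the URL targets the default service.
    """
    labels = []
    if version:
        # A version is addressed together with the service it belongs to.
        labels += [version, service]
    elif service != "default":
        labels.append(service)
    labels.append(project_id)
    host = "-dot-".join(labels)
    return f"https://{host}.{region_id}.r.appspot.com"


# Hypothetical sample values.
print(app_url("my-project", "uc"))
print(app_url("my-project", "uc", service="api"))
print(app_url("my-project", "uc", service="api", version="v2"))
```
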

Handling requests

Your application is responsible for starting a web server and handling requests. You can use any web framework that is available for your development language.

App Engine runs multiple instances of your application, and each instance has its own web server for handling requests. Any request can be routed to any instance, so consecutive requests from the same user are not necessarily sent to the same instance. An instance can handle multiple requests concurrently. The number of instances can be adjusted automatically as traffic changes. You can also change the number of concurrent requests an instance can handle by setting the max_concurrent_requests element in your app.yaml file, or in your appengine-web.xml file if you use the App Engine legacy bundled services.

The following example is a Python script that responds to any HTTP request with the message 'Hello World!':

    # Copyright 2018 Google LLC
    #
    # Licensed under the Apache License, Version 2.0 (the "License");
    # you may not use this file except in compliance with the License.
    # You may obtain a copy of the License at
    #
    #     http://www.apache.org/licenses/LICENSE-2.0
    #
    # Unless required by applicable law or agreed to in writing, software
    # distributed under the License is distributed on an "AS IS" BASIS,
    # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    # See the License for the specific language governing permissions and
    # limitations under the License.

    from flask import Flask

    # If `entrypoint` is not defined in app.yaml, App Engine will look for an app
    # called `app` in `main.py`.
    app = Flask(__name__)


    @app.route("/")
    def hello():
        """Return a friendly HTTP greeting.

        Returns:
            A string with the words 'Hello World!'.
        """
        return "Hello World!"


    if __name__ == "__main__":
        # This is used when running locally only. When deploying to Google App
        # Engine, a webserver process such as Gunicorn will serve the app. You
        # can configure startup instructions by adding `entrypoint` to app.yaml.
        app.run(host="127.0.0.1", port=8080, debug=True)

Quotas and limits

App Engine automatically allocates resources to your application as traffic increases. However, this is bound by the following restrictions:

  • App Engine reserves automatic scaling capacity for applications with low latency, where the application responds to requests in less than one second.

  • Applications that are heavily CPU-bound may also incur some additional latency in order to efficiently share resources with other applications on the same servers. Requests for static files are exempt from these latency limits.

Each incoming request to the application counts toward the Requests limit. Data sent in response to a request counts toward the Outgoing Bandwidth (billable) limit.

Both HTTP and HTTPS (secure) requests count toward the Requests, Incoming Bandwidth (billable), and Outgoing Bandwidth (billable) limits. The Google Cloud console Quota Details page also reports Secure Requests, Secure Incoming Bandwidth, and Secure Outgoing Bandwidth as separate values for informational purposes. Only HTTPS requests count toward these values. For more information, see the Quotas page.

The following limits apply specifically to the use of request handlers:

| Limit | Amount |
| --- | --- |
| Request size | 32 megabytes |
| Response size | 32 megabytes |
| Request timeout | Depends on the type of scaling your app uses |
| Maximum total number of files (app files and static files) | 10,000 total; 1,000 per directory |
| Maximum size of an application file | 32 megabytes |
| Maximum size of a static file | 32 megabytes |
| Maximum total size of all application and static files | First 1 gigabyte is free; $0.026 per gigabyte per month after the first 1 gigabyte |
| Pending request timeout | 10 seconds |
| Maximum size of a single request header field | 8 kilobytes for second-generation runtimes in the standard environment. Requests to these runtimes with header fields exceeding 8 kilobytes will return HTTP 400 errors. |
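The file-count and file-size limits in the table can be checked locally before a deployment. The following is a minimal sketch, not an official tool. It assumes that "per directory" counts only the files directly inside a directory and that one megabyte is 1024 * 1024 bytes. The function name `check_deploy_limits` is hypothetical.

```python
import os

MAX_FILES_TOTAL = 10_000
MAX_FILES_PER_DIR = 1_000
MAX_FILE_BYTES = 32 * 1024 * 1024  # assumption: 1 megabyte = 1024 * 1024 bytes


def check_deploy_limits(root: str,
                        max_files_total: int = MAX_FILES_TOTAL,
                        max_files_per_dir: int = MAX_FILES_PER_DIR,
                        max_file_bytes: int = MAX_FILE_BYTES) -> list[str]:
    """Return a list of human-readable problems found under `root`."""
    problems = []
    total = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        total += len(filenames)
        # Assumption: only files directly in this directory count here.
        if len(filenames) > max_files_per_dir:
            problems.append(
                f"{dirpath}: {len(filenames)} files (limit {max_files_per_dir})")
        for name in filenames:
            path = os.path.join(dirpath, name)
            size = os.path.getsize(path)
            if size > max_file_bytes:
                problems.append(f"{path}: {size} bytes (limit {max_file_bytes})")
    if total > max_files_total:
        problems.append(f"{root}: {total} files in total (limit {max_files_total})")
    return problems
```

Calling `check_deploy_limits(".")` from the application directory returns an empty list when none of these three limits is exceeded.
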

Request limits

All HTTP/2 requests will be translated into HTTP/1.1 requests when forwarded to the application server.

Response limits

  • Dynamic responses are limited to 32 MB. If a script handler generates a response larger than this limit, the server sends back an empty response with a 500 Internal Server Error status code. This limitation doesn't apply to responses that serve data from Cloud Storage or the legacy Blobstore API if it is available in your runtime.

  • The response header limit is 8 KB for second-generation runtimes. Response headers that exceed this limit will return HTTP 502 errors, with logs showing "upstream sent too big header while reading response header from upstream".
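A handler can check a response against these limits before returning it, which makes an oversized response easier to diagnose than a 500 or 502 from the serving infrastructure. The sketch below is an approximation under stated assumptions: it treats the 8 KB header limit as applying to the combined size of all header lines, and it takes 1 MB as 1024 * 1024 bytes. The page does not say exactly how either limit is measured. The function names are hypothetical.

```python
MAX_RESPONSE_BYTES = 32 * 1024 * 1024  # assumption: 1 MB = 1024 * 1024 bytes
MAX_HEADER_BYTES = 8 * 1024            # assumption: applies to all headers combined


def header_bytes(headers: dict[str, str]) -> int:
    """Approximate the wire size of headers as 'Name: value\\r\\n' lines."""
    return sum(len(f"{name}: {value}\r\n".encode("utf-8"))
               for name, value in headers.items())


def check_response(headers: dict[str, str], body: bytes) -> list[str]:
    """Return the limits this response would exceed (empty list if none)."""
    problems = []
    if len(body) > MAX_RESPONSE_BYTES:
        problems.append(f"body is {len(body)} bytes (limit {MAX_RESPONSE_BYTES})")
    size = header_bytes(headers)
    if size > MAX_HEADER_BYTES:
        problems.append(f"headers are about {size} bytes (limit {MAX_HEADER_BYTES})")
    return problems
```
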

Request headers

An incoming HTTP request includes the HTTP headers sent by the client. For security purposes, some headers are sanitized or amended by intermediate proxies before they reach the application.

For more information, see the Request headers reference.

Handling request timeouts

App Engine is optimized for applications with short-lived requests, typically those that take a few hundred milliseconds. An efficient app responds quickly for the majority of requests. An app that doesn't respond quickly won't scale well with App Engine's infrastructure. To ensure this level of performance, there is a system-imposed maximum request timeout by which every app must respond.

If your app exceeds this deadline, App Engine interrupts the request handler.
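One way to avoid being interrupted mid-request is to give the handler its own time budget and stop cleanly before the platform deadline. The sketch below is one possible pattern, not an App Engine API. The function name, the `budget_seconds` parameter, and the injectable `clock` are hypothetical, and the right budget depends on the scaling type your app uses.

```python
import time
from typing import Callable, Iterable


def process_until_deadline(items: Iterable, handle: Callable,
                           budget_seconds: float,
                           clock: Callable[[], float] = time.monotonic):
    """Handle items until the time budget is spent.

    Returns (results, remaining) so the caller can respond with partial
    results and hand the remaining items to a later request.
    """
    deadline = clock() + budget_seconds
    results, remaining = [], []
    iterator = iter(items)
    for item in iterator:
        if clock() >= deadline:
            # Out of time: keep this item and everything after it.
            remaining.append(item)
            remaining.extend(iterator)
            break
        results.append(handle(item))
    return results, remaining
```

The time check runs only between items, so a single slow `handle` call can still overrun the budget. Keep each unit of work small relative to the budget.
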

Responses

App Engine calls the handler script with a Request and waits for the script to return; all data written to the standard output stream is sent as the HTTP response.

There are size limits that apply to the response you generate, and the response may be modified before it is returned to the client.

For more information, see the Request responses reference.

Streaming Responses

App Engine doesn't support streaming responses where data is sent in incremental chunks to the client while a request is being processed. All data from your code is collected as described above and sent as a single HTTP response.
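Because the response reaches the client in one piece either way, code that produces output incrementally can collect it explicitly and set an accurate Content-Length. The sketch below shows that idea with a generator. The function names and the CSV-style output are illustrative assumptions.

```python
from typing import Iterable, Iterator


def render_rows(rows: Iterable[list[str]]) -> Iterator[bytes]:
    """Produce the body piece by piece, as streaming code would."""
    for row in rows:
        yield (",".join(row) + "\n").encode("utf-8")


def single_response(chunks: Iterable[bytes]) -> tuple[list[tuple[str, str]], bytes]:
    """Collect all chunks into one body and compute its Content-Length."""
    body = b"".join(chunks)
    headers = [("Content-Length", str(len(body)))]
    return headers, body
```

The collected body is still subject to the response size limit described earlier.
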

Response compression

For responses that are returned by your code, App Engine compresses data in the response if both of the following conditions are true:

  • The request contains the Accept-Encoding header that includes gzip as a value.
  • The response contains text-based data such as HTML, CSS, or JavaScript.
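The two conditions above can be written as a small predicate, which is handy in tests that need to predict whether a response will arrive compressed. This is a sketch of the stated rule, not App Engine's implementation. The list of "text-based" content types is a hypothetical approximation, and quality values such as `gzip;q=0` are ignored.

```python
# Hypothetical approximation of "text-based" content types.
TEXT_TYPE_PREFIXES = ("text/", "application/javascript", "application/json")


def accepts_gzip(accept_encoding: str) -> bool:
    """True if gzip appears as one of the Accept-Encoding values."""
    for part in accept_encoding.split(","):
        # Drop any parameters such as ";q=0.8" (ignored in this sketch).
        token = part.split(";", 1)[0].strip().lower()
        if token == "gzip":
            return True
    return False


def would_compress(accept_encoding: str, content_type: str) -> bool:
    """Apply both conditions: gzip accepted and text-based content."""
    is_text = content_type.strip().lower().startswith(TEXT_TYPE_PREFIXES)
    return accepts_gzip(accept_encoding) and is_text
```
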

For responses that are returned by an App Engine static file or directory handler, response data is compressed if all of the following conditions are true:

  • The request includes Accept-Encoding with gzip as one of its values.
  • The client is capable of receiving the response data in a compressed format. The Google Front End (GFE) maintains a list of clients that are known to have problems with compressed responses. These clients won't receive compressed data from static handlers in your app, even if the request headers contain Accept-Encoding: gzip.
  • The response contains text-based data such as HTML, CSS, or JavaScript.

Note the following:

  • A client can force text-based content types to be compressed by setting both of the Accept-Encoding and User-Agent request headers to gzip.

  • If a request doesn't specify gzip in the Accept-Encoding header, App Engine won't compress the response data.

  • The Google Front End caches responses from App Engine static file and directory handlers. Depending on a variety of factors, such as which type of response data is cached first, which Vary headers you have specified in the response, and which headers are included in the request, a client could request compressed data but receive uncompressed data, and the other way around. For more information, see Response caching.

Response caching

The Google Front End, and potentially the user's browser and other intermediate caching proxy servers, will cache your app's responses as instructed by standard caching headers that you specify in the response. You can specify these response headers either through your framework, directly in your code, or through App Engine static file and directory handlers.

In the Google Front End, the cache key is the full URL of the request.

Caching static content

To ensure that clients always receive updated static content as soon as it is published, we recommend that you serve static content from versioned directories, such as css/v1/styles.css. The Google Front End won't validate the cache (check for updated content) until the cache expires. Even after the cache expires, the cache won't be updated until the content at the request URL changes.

The following response headers that you can set in app.yaml influence how and when the Google Front End caches content:

  • Cache-Control should be set to public for the Google Front End to cache content; it may also be cached by the Google Front End unless you specify a Cache-Control private or no-store directive. If you don't set this header in app.yaml, App Engine automatically adds it for all responses handled by a static file or directory handler. For more information, see Headers added or replaced.

  • Vary: To enable the cache to return different responses for a URL based on headers that are sent in the request, set one or more of the following values in the Vary response header: Accept, Accept-Encoding, Origin, or X-Origin.

    Due to the potential for high cardinality, data won't be cached for other Vary values.

    For example:

    1. You specify the following response header:

      Vary: Accept-Encoding

    2. Your app receives a request that contains the Accept-Encoding: gzip header. App Engine returns a compressed response and the Google Front End caches the gzipped version of the response data. All subsequent requests for this URL that contain the Accept-Encoding: gzip header will receive the gzipped data from the cache until the cache becomes invalidated (due to the content changing after the cache expires).

    3. Your app receives a request that doesn't contain the Accept-Encoding header. App Engine returns an uncompressed response and the Google Front End caches the uncompressed version of the response data. All subsequent requests for this URL that don't contain the Accept-Encoding header receive uncompressed data from the cache until the cache becomes invalidated.

    If you don't specify a Vary response header, the Google Front End creates a single cache entry for the URL and will use it for all requests regardless of the headers in the request. For example:

    1. You don't specify the Vary: Accept-Encoding response header.
    2. A request contains the Accept-Encoding: gzip header, and the gzipped version of the response data will be cached.
    3. A second request doesn't contain the Accept-Encoding: gzip header. However, because the cache contains a gzipped version of the response data, the response will be gzipped even though the client requested uncompressed data.

The headers in the request also influence caching:

  • If the request contains an Authorization header, the content won't be cached by the Google Front End.
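The two Vary scenarios and the Authorization rule can be reproduced with a toy cache. The model below is a deliberately simplified sketch and not the Google Front End's actual behavior: it ignores expiration and Cache-Control, and the class name, the backend callable, and the body strings are hypothetical.

```python
CACHEABLE_VARY = {"accept", "accept-encoding", "origin", "x-origin"}


class FrontEndCacheModel:
    """Toy cache keyed by URL plus the request headers named in Vary."""

    def __init__(self, backend):
        self.backend = backend  # callable(headers) -> (body, vary_names)
        self.vary = {}          # url -> tuple of lower-case header names
        self.entries = {}       # (url, header values) -> cached body

    def get(self, url, request_headers):
        headers = {k.lower(): v for k, v in request_headers.items()}
        if "authorization" in headers:
            # Requests with an Authorization header bypass the cache.
            return self.backend(headers)[0]
        if url in self.vary:
            variant = tuple(headers.get(n, "") for n in self.vary[url])
            if (url, variant) in self.entries:
                return self.entries[(url, variant)]
        body, vary_names = self.backend(headers)
        names = tuple(sorted(n.lower() for n in vary_names))
        # Responses that vary on any other header are not cached.
        if all(n in CACHEABLE_VARY for n in names):
            self.vary[url] = names
            variant = tuple(headers.get(n, "") for n in names)
            self.entries[(url, variant)] = body
        return body


def make_backend(vary_names):
    """Hypothetical app: gzips when asked and reports its Vary header."""
    def backend(headers):
        gzipped = "gzip" in headers.get("accept-encoding", "")
        return ("gzipped body" if gzipped else "plain body", vary_names)
    return backend
```

With `make_backend(["Accept-Encoding"])`, a gzip request and a plain request each get their own cache entry. With `make_backend([])`, whichever variant is cached first is served to every later request for that URL.
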

Cache expiration

By default, the caching headers that App Engine static file and directory handlers add to responses instruct clients and web proxies such as the Google Front End to expire the cache after 10 minutes.

After a file is transmitted with a given expiration time, there is generally no way to clear it out of web-proxy caches, even if the user clears their own browser cache. Re-deploying a new version of the app will not reset any caches. Therefore, if you ever plan to modify a static file, it should have a short (less than one hour) expiration time. In most cases, the default 10-minute expiration time is appropriate.

You can change the default expiration for all static file and directory handlers by specifying the default_expiration element in your app.yaml file. To set specific expiration times for individual handlers, specify the expiration element within the handler element in your app.yaml file.

The time value you specify in the expiration element is used to set the Cache-Control and Expires HTTP response headers.
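To see how a single expiration time maps onto both headers, the sketch below derives a Cache-Control max-age and a matching Expires date from a number of seconds. The exact header values that App Engine emits are not given on this page, so the `public, max-age=...` form is an assumption, and the function name is hypothetical.

```python
from datetime import datetime, timedelta, timezone
from email.utils import format_datetime
from typing import Optional


def expiration_headers(max_age_seconds: int,
                       now: Optional[datetime] = None) -> dict[str, str]:
    """Build Cache-Control and Expires headers for one expiration time."""
    now = now or datetime.now(timezone.utc)
    expires = now + timedelta(seconds=max_age_seconds)
    return {
        # Assumed form; the directives App Engine actually sends may differ.
        "Cache-Control": f"public, max-age={max_age_seconds}",
        # HTTP dates are expressed in GMT; `now` must be a UTC datetime.
        "Expires": format_datetime(expires, usegmt=True),
    }
```

For the default 10-minute expiration, `expiration_headers(600)` yields a max-age of 600 and an Expires date ten minutes in the future.
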

Forcing HTTPS connections

For security reasons, all applications should encourage clients to connect over https. To instruct the browser to prefer https over http for a given page or entire domain, set the Strict-Transport-Security header in your responses. For example:

Strict-Transport-Security: max-age=31536000; includeSubDomains

To set this header for any static content that is served by your app, add the header to your app's static file and directory handlers.

To set this header for responses that are generated from your code, use the flask-talisman library.
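If you are not using Flask, or want to see what such a library does in principle, the header can also be added with a small WSGI middleware. The sketch below is a framework-agnostic alternative using only the standard WSGI calling convention. It is not how flask-talisman is implemented, and the class name is hypothetical.

```python
HSTS_VALUE = "max-age=31536000; includeSubDomains"


class HSTSMiddleware:
    """Wrap a WSGI app so every response carries Strict-Transport-Security."""

    def __init__(self, app, value: str = HSTS_VALUE):
        self.app = app
        self.value = value

    def __call__(self, environ, start_response):
        def start_with_hsts(status, headers, exc_info=None):
            # Replace any existing value so the header appears exactly once.
            headers = [(name, val) for name, val in headers
                       if name.lower() != "strict-transport-security"]
            headers.append(("Strict-Transport-Security", self.value))
            return start_response(status, headers, exc_info)

        return self.app(environ, start_with_hsts)
```

With a Flask application object named `app`, the usual way to install WSGI middleware is `app.wsgi_app = HSTSMiddleware(app.wsgi_app)`.
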

Caution: Clients that have received the header in the past will refuse to connect if https becomes non-functional or is disabled for any reason. To learn more, see this Cheat Sheet on HTTP Strict Transport Security.

Handling asynchronous background work

Background work is any work that your app performs for a request after you have delivered your HTTP response. Avoid performing background work in your app, and review your code to make sure all asynchronous operations finish before you deliver your response.
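One way to make sure asynchronous operations finish before the response is delivered is to wait on every future inside the handler. The sketch below uses a thread pool from the standard library. The handler name, the `side_effects` callables, and the return value are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor


def handle_request(payload, side_effects):
    """Run side effects concurrently, but respond only after all finish."""
    with ThreadPoolExecutor(max_workers=4) as pool:
        futures = [pool.submit(effect, payload) for effect in side_effects]
        for future in futures:
            # result() blocks until the work is done and re-raises any
            # exception, so failures surface before the response is sent.
            future.result()
    return f"processed {payload}"
```

The work still runs in parallel, so the request takes roughly as long as the slowest operation rather than the sum of all of them.
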

For long-running jobs, we recommend using Cloud Tasks. With Cloud Tasks, HTTP requests are long-lived and return a response only after any asynchronous work ends.

Warning: Performing asynchronous background work can result in higher billing. App Engine might scale up additional instances due to high CPU load, even if there are no active requests. Users may also experience increased latency because of requests waiting in the pending queue for available instances.

App Engine pending queue prioritization

During periods of heavy traffic, App Engine might place requests in a pending queue while waiting for an available instance with the following prioritization:

  • App Engine prioritizes other queued requests over pending queued requests from Task queue. Requests from App Engine Cloud Tasks also share this pending queue priority behavior for compatibility reasons.

  • Within the pending queue, App Engine treats requests from HTTP target Cloud Tasks as regular HTTP traffic. The HTTP target requests aren't at a lower priority.

  • When a service receives standard HTTP traffic at high volume while also serving Task queue or Cloud Tasks traffic at much lower volume, there is a disproportionate impact on the latency of the Task queue or the Cloud Tasks traffic. We recommend splitting the traffic types to separate versions or using HTTP target tasks to avoid priority queuing. You should also consider serving latency sensitive requests from Cloud Tasks with a dedicated major version or service.
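The prioritization described above can be pictured with a toy two-level priority queue. This is only a mental model: the page does not describe App Engine's actual scheduler, and the class name, the source labels, and the strict first-in-first-out order within each level are assumptions.

```python
import heapq
import itertools

REGULAR, LOWER = 0, 1
# Sources that share the lower priority, per the list above (labels are hypothetical).
LOWER_PRIORITY_SOURCES = {"task_queue", "app_engine_task"}


class PendingQueueModel:
    """Toy pending queue: regular traffic is dequeued before task traffic."""

    def __init__(self):
        self._heap = []
        self._order = itertools.count()  # preserves arrival order within a level

    def put(self, request_id: str, source: str) -> None:
        priority = LOWER if source in LOWER_PRIORITY_SOURCES else REGULAR
        heapq.heappush(self._heap, (priority, next(self._order), request_id))

    def get(self) -> str:
        return heapq.heappop(self._heap)[2]


queue = PendingQueueModel()
queue.put("t1", "task_queue")
queue.put("h1", "http")
queue.put("c1", "http_target_task")  # treated as regular HTTP traffic
queue.put("a1", "app_engine_task")
print([queue.get() for _ in range(4)])
```

In this model, a steady stream of regular requests keeps the task requests waiting, which is the disproportionate latency impact the last bullet warns about.
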

Except as otherwise noted, the content of this page is licensed under the Creative Commons Attribution 4.0 License, and code samples are licensed under the Apache 2.0 License. For details, see the Google Developers Site Policies. Java is a registered trademark of Oracle and/or its affiliates.

Last updated 2025-12-15 UTC.