Python 2.7 has reached end of support and will be deprecated on January 31, 2026. After deprecation, you won't be able to deploy Python 2.7 applications, even if your organization previously used an organization policy to re-enable deployments of legacy runtimes. Your existing Python 2.7 applications will continue to run and receive traffic after their deprecation date. We recommend that you migrate to the latest supported version of Python.

How Requests are Handled

Region ID

The REGION_ID is an abbreviated code that Google assigns based on the region you select when you create your app. The code does not correspond to a country or province, even though some region IDs may appear similar to commonly used country and province codes. For apps created after February 2020, REGION_ID.r is included in App Engine URLs. For existing apps created before this date, the region ID is optional in the URL.

Learn more about region IDs.

This document describes how your App Engine application receives requests and sends responses.

For more details, see the Request Headers and Responses reference.

If your application uses services, you can address requests to a specific service or a specific version of that service. For more information about service addressability, see How Requests are Routed.

Handling requests

Your application is responsible for starting a web server and handling requests. You can use any web framework that is available for your development language.

App Engine runs multiple instances of your application, and each instance has its own web server for handling requests. Any request can be routed to any instance, so consecutive requests from the same user are not necessarily sent to the same instance. An instance can handle multiple requests concurrently. The number of instances can be adjusted automatically as traffic changes. You can also change the number of concurrent requests an instance can handle by setting the max_concurrent_requests element in your app.yaml file.

When App Engine receives a web request for your application, it calls the handler script that corresponds to the URL, as described in the application's app.yaml configuration file. The Python 2.7 runtime supports the WSGI standard and, for backwards compatibility, the CGI standard. WSGI is preferred, and some features of Python 2.7 do not work without it. The configuration of your application's script handlers determines whether a request is handled using WSGI or CGI.

The following Python script responds to a request with an HTTP header and the message "Hello, World!":

    import webapp2


    class MainPage(webapp2.RequestHandler):
        def get(self):
            self.response.headers["Content-Type"] = "text/plain"
            self.response.write("Hello, World!")


    app = webapp2.WSGIApplication(
        [
            ("/", MainPage),
        ],
        debug=True,
    )
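For comparison, here is a minimal sketch of the same response written as a bare WSGI callable, using only the standard library. The names `application` and `call_app` are illustrative and are not part of App Engine or webapp2; `call_app` exists only to exercise the callable locally, and unlike the webapp2 route above, this sketch answers on every path.

```python
from wsgiref.util import setup_testing_defaults


def application(environ, start_response):
    """Bare WSGI callable that returns the same plain-text greeting."""
    body = b"Hello, World!"
    headers = [
        ("Content-Type", "text/plain"),
        ("Content-Length", str(len(body))),
    ]
    start_response("200 OK", headers)
    return [body]


def call_app(app, path="/"):
    """Invoke a WSGI app once and return (status, headers, body)."""
    environ = {}
    setup_testing_defaults(environ)  # fills in the required WSGI keys
    environ["PATH_INFO"] = path
    captured = {}

    def start_response(status, headers, exc_info=None):
        captured["status"] = status
        captured["headers"] = dict(headers)

    body = b"".join(app(environ, start_response))
    return captured["status"], captured["headers"], body
```

Calling `call_app(application)` returns the status line, a dictionary of headers, and the response body, which makes it easy to check a handler without deploying it.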

To dispatch multiple requests to each web server in parallel, mark your application as threadsafe by adding threadsafe: true to your app.yaml file. Concurrent requests are not available if any script handler uses CGI.

Quotas and limits

App Engine automatically allocates resources to your application as traffic increases. However, this is bound by the following restrictions:

  • App Engine reserves automatic scaling capacity for applications with low latency, where the application responds to requests in less than one second.

  • Applications that are heavily CPU-bound may also incur some additional latency in order to efficiently share resources with other applications on the same servers. Requests for static files are exempt from these latency limits.

Each incoming request to the application counts toward the Requests limit. Data sent in response to a request counts toward the Outgoing Bandwidth (billable) limit.

Both HTTP and HTTPS (secure) requests count toward the Requests, Incoming Bandwidth (billable), and Outgoing Bandwidth (billable) limits. The Google Cloud console Quota Details page also reports Secure Requests, Secure Incoming Bandwidth, and Secure Outgoing Bandwidth as separate values for informational purposes. Only HTTPS requests count toward these values. For more information, see the Quotas page.

The following limits apply specifically to the use of request handlers:

| Limit | Amount |
| --- | --- |
| Request size | 32 megabytes |
| Response size | 32 megabytes |
| Request timeout | Depends on the type of scaling your app uses |
| Maximum total number of files (app files and static files) | 10,000 total; 1,000 per directory |
| Maximum size of an application file | 32 megabytes |
| Maximum size of a static file | 32 megabytes |
| Maximum total size of all application and static files | First 1 gigabyte is free; $0.026 per gigabyte per month after first 1 gigabyte |
| Pending request timeout | 10 seconds |
| Maximum size of a single request header field | 8 kilobytes for second-generation runtimes in the standard environment. Requests to these runtimes with header fields exceeding 8 kilobytes will return HTTP 400 errors. |

Request limits

All HTTP/2 requests will be translated into HTTP/1.1 requests when forwarded to the application server.

Response limits

  • Dynamic responses are limited to 32 MB. If a script handler generates a response larger than this limit, the server sends back an empty response with a 500 Internal Server Error status code. This limitation does not apply to responses that serve data from the legacy Blobstore or Cloud Storage.

  • The response header limit is 8 KB for second-generation runtimes. Response headers that exceed this limit will return HTTP 502 errors, with logs showing upstream sent too big header while reading response header from upstream.
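The two response limits above can be checked before a response is sent. The following is a minimal sketch, not an App Engine API: the helper names are hypothetical, 1 MB is assumed to mean 1024 × 1024 bytes, and the header size is only approximated as the sum of `Name: value` lines, since the exact way the platform measures headers is not stated here. The 8 KB header limit is documented above for second-generation runtimes only.

```python
MAX_RESPONSE_BYTES = 32 * 1024 * 1024  # 32 MB dynamic response limit (assumed binary MB)
MAX_HEADER_BYTES = 8 * 1024            # 8 KB response header limit (assumed binary KB)


def header_block_size(headers):
    """Approximate the size of the headers as 'Name: value\\r\\n' lines."""
    return sum(len(name) + len(value) + 4 for name, value in headers)


def check_response(headers, body):
    """Return a list of problems; an empty list means both limits are met."""
    problems = []
    if len(body) > MAX_RESPONSE_BYTES:
        problems.append("body exceeds 32 MB")
    if header_block_size(headers) > MAX_HEADER_BYTES:
        problems.append("headers exceed 8 KB")
    return problems
```

A handler could call `check_response` just before writing its output and fall back to a smaller response, or to serving the data from Cloud Storage, when the list is not empty.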

Request headers

An incoming HTTP request includes the HTTP headers sent by the client. For security purposes, some headers are sanitized or amended by intermediate proxies before they reach the application.

For more information, see the Request headers reference.

Handling request timeouts

App Engine is optimized for applications with short-lived requests, typically those that take a few hundred milliseconds. An efficient app responds quickly for the majority of requests. An app that doesn't will not scale well with App Engine's infrastructure. To ensure this level of performance, there is a system-imposed maximum request timeout by which every app must respond.

If your app exceeds this deadline, App Engine interrupts the request handler. The Python runtime environment accomplishes this by raising a DeadlineExceededError exception from google.appengine.runtime. If the request handler does not catch this exception, as with all uncaught exceptions, the runtime environment will return an HTTP 500 server error to the client.

The request handler can catch this error to customize the response. The runtime environment gives the request handler a little bit more time (less than a second) after raising the exception to prepare a custom response.

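    import time

    import webapp2


    class TimerHandler(webapp2.RequestHandler):
        def get(self):
            from google.appengine.runtime import DeadlineExceededError

            try:
                # Sleep past the request deadline to trigger the exception.
                time.sleep(70)
                self.response.write("Completed.")
            except DeadlineExceededError:
                # Discard any partial output and send a custom error instead.
                self.response.clear()
                self.response.set_status(500)
                self.response.out.write("The request did not complete in time.")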

If the handler hasn't returned a response or raised an exception by the second deadline, the handler is stopped and a default error response is returned.

Warning: The DeadlineExceededError can potentially be raised from anywhere in your program, including finally blocks, so it could leave your program in an invalid state. This can cause deadlocks or unexpected errors in threaded code (including the built-in threading library), because locks may not be released. Note that (unlike in Java) the runtime may not stop the process, so this could cause problems for future requests to the same instance. To be safe, you should not rely on the DeadlineExceededError, and instead ensure that your requests complete well before the time limit.
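One way to finish well before the time limit is to give the handler its own, smaller time budget and stop work cooperatively when the budget runs out. The sketch below is an illustration under assumptions, not an App Engine facility: `process_within_budget`, its parameters, and the idea of returning the unprocessed items are all hypothetical, and `items` is assumed to be a list.

```python
import time


def process_within_budget(items, handle, budget_seconds, clock=time.time):
    """Process items until done or until the time budget is used up.

    Returns (results, remaining_items) so the caller can respond with
    partial results and defer the rest to a later request.
    """
    deadline = clock() + budget_seconds
    results = []
    for index, item in enumerate(items):
        if clock() >= deadline:
            # Out of time: stop cleanly instead of waiting to be interrupted.
            return results, list(items[index:])
        results.append(handle(item))
    return results, []
```

Because the check happens between items, the handler always stops at a point of its own choosing, which avoids the invalid-state risk described in the warning. The `clock` parameter exists only so the function can be exercised with a fake clock.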

Responses

App Engine calls the handler script with a Request and waits for the script to return; all data written to the standard output stream is sent as the HTTP response.

There are size limits that apply to the response you generate, and the response may be modified before it is returned to the client.

For more information, see the Request responses reference.

Streaming Responses

App Engine does not support streaming responses where data is sent in incremental chunks to the client while a request is being processed. All data from your code is collected as described above and sent as a single HTTP response.
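The effect of this buffering can be modelled with a WSGI application that yields its body in pieces. This is a simplified sketch of the observable behavior only; `chunked_app` and `collect_response` are hypothetical names, and the real collection happens inside the App Engine runtime, not in your code.

```python
def chunked_app(environ, start_response):
    """WSGI app that produces its body in three separate chunks."""
    start_response("200 OK", [("Content-Type", "text/plain")])
    for number in range(3):
        yield "chunk {}\n".format(number).encode("ascii")


def collect_response(app, environ):
    """Drain the app's iterable into one body, as a buffering server would."""
    def start_response(status, headers, exc_info=None):
        return None

    return b"".join(app(environ, start_response))
```

Even though `chunked_app` yields three times, a client behind a buffering server sees one complete body, so yielding early does not reduce the time to first byte.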

Response compression

App Engine does its best to serve compressed (gzipped) content to clients that support it. To determine if content should be compressed, App Engine does the following when it receives a request:

  1. Confirms if the client can reliably receive compressed responses by viewing both the Accept-Encoding and User-Agent headers in the request. This approach avoids some well-known bugs with gzipped content in popular browsers.

  2. Confirms if compressing the content is appropriate by viewing the Content-Type header that you have configured for the response handler. In general, compression is appropriate for text-based content types, and not for binary content types.

Note the following:

  • A client can force text-based content types to be compressed by setting both of the Accept-Encoding and User-Agent request headers to gzip.

  • If a request doesn't specify gzip in the Accept-Encoding header, App Engine will not compress the response data.

  • The Google Frontend caches responses from App Engine static file and directory handlers. Depending on a variety of factors, such as which type of response data is cached first, which Vary headers you have specified in the response, and which headers are included in the request, a client could request compressed data but receive uncompressed data, and the other way around. For more information, see Response caching.
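The two checks described in this section can be approximated in a few lines. This is a deliberately simplified model and not App Engine's actual logic: it ignores the User-Agent check and quality values such as `gzip;q=0`, and the list of text-based content types is an assumption made for illustration.

```python
# Assumed examples of text-based types; the real list is not documented here.
TEXT_TYPE_PREFIXES = ("text/",)
TEXT_TYPES = ("application/json", "application/javascript", "application/xml")


def client_accepts_gzip(accept_encoding):
    """True if 'gzip' appears as a token in an Accept-Encoding value."""
    if not accept_encoding:
        return False
    tokens = [part.split(";")[0].strip().lower()
              for part in accept_encoding.split(",")]
    return "gzip" in tokens


def is_text_based(content_type):
    """True for content types this sketch treats as compressible text."""
    if not content_type:
        return False
    media_type = content_type.split(";")[0].strip().lower()
    return media_type.startswith(TEXT_TYPE_PREFIXES) or media_type in TEXT_TYPES


def should_compress(accept_encoding, content_type):
    """Combine the client check and the content-type check."""
    return client_accepts_gzip(accept_encoding) and is_text_based(content_type)
```

For example, `should_compress("gzip, deflate", "text/html; charset=utf-8")` is true, while a request without an Accept-Encoding header or a response with a binary type such as `image/png` is not compressed.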

Response caching

The Google Frontend, and potentially the user's browser and other intermediate caching proxy servers, will cache your app's responses as instructed by standard caching headers that you specify in the response. You can specify these response headers either through your framework, directly in your code, or through App Engine static file and directory handlers.

In the Google Frontend, the cache key is the full URL of the request.

Caching static content

To ensure that clients always receive updated static content as soon as it is published, we recommend that you serve static content from versioned directories, such as css/v1/styles.css. The Google Frontend will not validate the cache (check for updated content) until the cache expires. Even after the cache expires, the cache will not be updated until the content at the request URL changes.

The following response headers that you can set in app.yaml influence how and when the Google Frontend caches content:

  • Cache-Control should be set to public for the Google Frontend to cache content; it may also be cached by the Google Frontend unless you specify a Cache-Control private or no-store directive. If you don't set this header in app.yaml, App Engine automatically adds it for all responses handled by a static file or directory handler. For more information, see Headers added or replaced.

  • Vary: To enable the cache to return different responses for a URL based on headers that are sent in the request, set one or more of the following values in the Vary response header: Accept, Accept-Encoding, Origin, or X-Origin.

    Due to the potential for high cardinality, data will not be cached for other Vary values.

    For example:

    1. You specify the following response header:

      Vary: Accept-Encoding

    2. Your app receives a request that contains the Accept-Encoding: gzip header. App Engine returns a compressed response and the Google Frontend caches the gzipped version of the response data. All subsequent requests for this URL that contain the Accept-Encoding: gzip header will receive the gzipped data from the cache until the cache becomes invalidated (due to the content changing after the cache expires).

    3. Your app receives a request that does not contain the Accept-Encoding header. App Engine returns an uncompressed response and the Google Frontend caches the uncompressed version of the response data. All subsequent requests for this URL that do not contain the Accept-Encoding header will receive the uncompressed data from the cache until the cache becomes invalidated.

    If you do not specify a Vary response header, the Google Frontend creates a single cache entry for the URL and will use it for all requests regardless of the headers in the request. For example:

    1. You do not specify the Vary: Accept-Encoding response header.
    2. A request contains the Accept-Encoding: gzip header, and the gzipped version of the response data will be cached.
    3. A second request does not contain the Accept-Encoding: gzip header. However, because the cache contains a gzipped version of the response data, the response will be gzipped even though the client requested uncompressed data.
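The difference between the two examples comes down to what goes into the cache key. The following toy model is a sketch of that idea only and is not how the Google Frontend is implemented: `VaryCache` and `render` are hypothetical, header names are matched case-sensitively for brevity, and expiry is ignored.

```python
class VaryCache(object):
    """Toy cache keyed by URL plus the request headers named in Vary."""

    def __init__(self):
        self._entries = {}

    @staticmethod
    def _key(url, request_headers, vary):
        # With an empty Vary list the key is the URL alone.
        varied = tuple((name, request_headers.get(name))
                       for name in sorted(vary))
        return (url, varied)

    def fetch(self, url, request_headers, vary, render):
        """Return a cached body, calling render() only on a cache miss."""
        key = self._key(url, request_headers, vary)
        if key not in self._entries:
            self._entries[key] = render(request_headers)
        return self._entries[key]


def render(request_headers):
    """Stand-in for the app: gzipped only when the client asked for gzip."""
    if "gzip" in (request_headers.get("Accept-Encoding") or ""):
        return "gzipped body"
    return "plain body"
```

With `vary=["Accept-Encoding"]`, a gzip request and a plain request get separate entries. With `vary=[]`, whichever variant is cached first is returned to every later client, which reproduces the mismatch described in the second example.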

The headers in the request also influence caching:

  • If the request contains an Authorization header, the content will not be cached by the Google Frontend.

Cache expiration

By default, the caching headers that App Engine static file and directory handlers add to responses instruct clients and web proxies such as the Google Frontend to expire the cache after 10 minutes.

After a file is transmitted with a given expiration time, there is generally no way to clear it out of web-proxy caches, even if the user clears their own browser cache. Re-deploying a new version of the app will not reset any caches. Therefore, if you ever plan to modify a static file, it should have a short (less than one hour) expiration time. In most cases, the default 10-minute expiration time is appropriate.

You can change the default expiration for all static file and directory handlers by specifying the default_expiration element in your app.yaml file. To set specific expiration times for individual handlers, specify the expiration element within the handler element in your app.yaml file.

The time value you specify in the expiration elements is used to set the Cache-Control and Expires HTTP response headers.
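To make the relationship between an expiration time and the two headers concrete, here is a small sketch that derives both from a lifetime in seconds. The function name is hypothetical, and the exact header values that App Engine emits (for example, whether `public` is included) are an assumption; only the general shape of `Cache-Control: max-age` and an HTTP-date `Expires` value is standard.

```python
from email.utils import formatdate


def expiration_headers(expiration_seconds, now):
    """Build Cache-Control and Expires values for a lifetime in seconds.

    `now` is a Unix timestamp; pass time.time() in real code.
    """
    return {
        "Cache-Control": "public, max-age={}".format(int(expiration_seconds)),
        # usegmt=True produces the 'GMT' suffix required for HTTP dates.
        "Expires": formatdate(now + expiration_seconds, usegmt=True),
    }
```

For the default 10-minute expiration, `expiration_headers(600, time.time())` yields a `max-age` of 600 and an `Expires` date ten minutes in the future.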

App caching

The Python runtime environment caches imported modules between requests on a single web server, similar to how a standalone Python application loads a module only once even if the module is imported by multiple files. Since WSGI handlers are modules, they are cached between requests. CGI handler scripts are only cached if they provide a main() routine; otherwise, the CGI handler script is loaded for every request.

App caching provides a significant benefit in response time. We recommend that all CGI handler scripts use a main() routine, as described below.

Imports are cached

For efficiency, the web server keeps imported modules in memory and does not re-load or re-evaluate them on subsequent requests to the same application on the same server. Most modules do not initialize any global data or have other side effects when they are imported, so caching them does not change the behavior of the application.

If your application imports a module that depends on the module being evaluated for every request, the application must accommodate this caching behavior.

Caching CGI handlers

You can tell App Engine to cache the CGI handler script itself, in addition to imported modules. If the handler script defines a function named main(), then the script and its global environment will be cached like an imported module. The first request for the script on a given web server evaluates the script normally. For subsequent requests, App Engine calls the main() function in the cached environment.

To cache a handler script, App Engine must be able to call main() with no arguments. If the handler script does not define a main() function, or the main() function requires arguments (that don't have defaults), then App Engine loads and evaluates the entire script for every request.

Keeping the parsed Python code in memory saves time and allows for faster responses. Caching the global environment has other potential uses as well:

  • Compiled regular expressions. All regular expressions are parsed and stored in a compiled form. You can store compiled regular expressions in global variables, then use app caching to re-use the compiled objects between requests.

  • GqlQuery objects. The GQL query string is parsed when the GqlQuery object is created. Re-using a GqlQuery object with parameter binding and the bind() method is faster than re-constructing the object each time. You can store a GqlQuery object with parameter binding for the values in a global variable, then re-use it by binding new parameter values for each request.

  • Configuration and data files. If your application loads and parses configuration data from a file, it can retain the parsed data in memory to avoid having to re-load the file with each request.

The handler script should call main() when imported. App Engine expects that importing the script calls main(), so App Engine does not call it when loading the request handler for the first time on a server.

Note: Be careful to not "leak" user-specific information between requests. Avoid global variables unless caching is desired, and always initialize request-specific data inside the main() routine.

App caching with main() provides a significant improvement in your CGI handler's response time. We recommend it for all applications that use CGI.
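A CGI handler script that follows this guidance might be laid out as in the sketch below. It is an illustration under assumptions: the `out` parameter with a default is added only so the function can be exercised outside a CGI environment (it keeps main() callable with no arguments), and `WORD_RE` stands in for any expensive object worth caching in a global.

```python
import re
import sys

# Module-level objects are built once and reused while the script is cached.
WORD_RE = re.compile(r"[A-Za-z']+")


def main(out=None):
    """CGI entry point; callable with no arguments, as caching requires."""
    if out is None:
        out = sys.stdout
    # Request-specific data is created inside main(), never in globals.
    words = WORD_RE.findall("Hello, World!")
    out.write("Content-Type: text/plain\n")
    out.write("\n")
    out.write("Found {} words.\n".format(len(words)))


# Run main() when the script itself is evaluated for the first request.
if __name__ == "__main__":
    main()
```

The compiled pattern lives in a global because it is the same for every request, while `words` is local because it would differ per request in a real handler.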

Logging

The App Engine web server captures everything the handler script writes to the standard output stream for the response to the web request. It also captures everything the handler script writes to the standard error stream, and stores it as log data. Each request is assigned a request_id, a globally unique identifier based on the request's start time. Log data for your application can be viewed in the Google Cloud console using Cloud Logging.

The App Engine Python runtime environment includes special support for the logging module from the Python standard library to understand logging concepts such as log levels ("debug", "info", "warning", "error", "critical").

    import logging

    import webapp2


    class MainPage(webapp2.RequestHandler):
        def get(self):
            logging.debug("This is a debug message")
            logging.info("This is an info message")
            logging.warning("This is a warning message")
            logging.error("This is an error message")
            logging.critical("This is a critical message")

            try:
                raise ValueError("This is a sample value error.")
            except ValueError:
                # logging.exception() records the message with the traceback.
                logging.exception("An example exception log.")

            self.response.out.write("Logging example.")


    app = webapp2.WSGIApplication([("/", MainPage)], debug=True)

The environment

The execution environment automatically sets several environment variables; you can set more in app.yaml. Of the automatically-set variables, some are special to App Engine, while others are part of the WSGI or CGI standards. Python code can access these variables using the os.environ dictionary.

The following environment variables are specific to App Engine:

  • CURRENT_VERSION_ID: The major and minor version of the currently running application, as "X.Y". The major version number ("X") is specified in the app's app.yaml file. The minor version number ("Y") is set automatically when each version of the app is uploaded to App Engine. On the development web server, the minor version is always "1".

  • AUTH_DOMAIN: The domain used for authenticating users with the Users API. Apps hosted on appspot.com have an AUTH_DOMAIN of gmail.com, and accept any Google account. Apps hosted on a custom domain have an AUTH_DOMAIN equal to the custom domain.

  • INSTANCE_ID: Contains the instance ID of the frontend instance handling a request. The ID is a hex string (for example, 00c61b117c7f7fd0ce9e1325a04b8f0df30deaaf). A logged-in admin can use the ID in a URL: https://INSTANCE_ID-dot-VERSION_ID-dot-SERVICE_ID-dot-PROJECT_ID.REGION_ID.r.appspot.com. The request will be routed to that specific frontend instance. If the instance cannot handle the request, it returns an immediate 503.

The following environment variables are part of the WSGI and CGI standards, with special behavior in App Engine:

  • SERVER_SOFTWARE: In the development web server, this value is "Development/X.Y" where "X.Y" is the version of the runtime. When running on App Engine, this value is "Google App Engine/X.Y.Z".

Additional environment variables are set according to the WSGI or CGI standard. For more information on these variables, see the WSGI standard or the CGI standard, as appropriate.
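As an illustration of how these variables might be consumed, the sketch below splits CURRENT_VERSION_ID into its major and minor parts and uses SERVER_SOFTWARE to detect the development web server. The function name and the shape of the returned dictionary are hypothetical, and the sketch assumes the major version contains no dot; the optional `environ` argument exists only so the function can be tried with sample values.

```python
import os


def describe_runtime(environ=None):
    """Summarize App Engine-specific variables from an environ mapping."""
    environ = os.environ if environ is None else environ
    version_id = environ.get("CURRENT_VERSION_ID", "")
    # "X.Y" -> major "X", minor "Y"; both are empty if the variable is unset.
    major, _, minor = version_id.partition(".")
    software = environ.get("SERVER_SOFTWARE", "")
    return {
        "major_version": major,
        "minor_version": minor,
        "is_dev_server": software.startswith("Development/"),
        "instance_id": environ.get("INSTANCE_ID"),
    }
```

Checking `is_dev_server` is a common way to switch off production-only behavior while testing locally.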

You can also set environment variables in the app.yaml file:

    env_variables:
      DJANGO_SETTINGS_MODULE: 'myapp.settings'

The following webapp2 request handler displays every environment variable visible to the application in the browser:

    import os

    import webapp2


    class PrintEnvironmentHandler(webapp2.RequestHandler):
        def get(self):
            self.response.headers["Content-Type"] = "text/plain"
            # iteritems() is the Python 2 way to iterate over key/value pairs.
            for key, value in os.environ.iteritems():
                self.response.out.write("{} = {}\n".format(key, value))

Request IDs

At the time of the request, you can save the request ID, which is unique to that request. The request ID can be used later to look up the logs for that request in Cloud Logging.

The following sample code shows how to get the request ID in the context of a request:

    import os

    import webapp2


    class RequestIdHandler(webapp2.RequestHandler):
        def get(self):
            self.response.headers["Content-Type"] = "text/plain"
            request_id = os.environ.get("REQUEST_LOG_ID")
            self.response.write("REQUEST_LOG_ID={}".format(request_id))
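One possible way to save the request ID is to attach it to every log record, so that application log lines can be matched to a request later. The following is a sketch using only the standard logging module; `RequestIdFilter`, `build_logger`, and the `request_id` record attribute are hypothetical names, and the sketch assumes REQUEST_LOG_ID is readable from os.environ while the request is being handled.

```python
import logging
import os


class RequestIdFilter(logging.Filter):
    """Attach the current request ID to every log record."""

    def filter(self, record):
        # Fall back to "-" outside a request, where the variable is unset.
        record.request_id = os.environ.get("REQUEST_LOG_ID", "-")
        return True


def build_logger(name="app"):
    """Return a logger whose records carry a request_id attribute."""
    logger = logging.getLogger(name)
    logger.addFilter(RequestIdFilter())
    return logger
```

A formatter can then include `%(request_id)s` in its format string to print the ID next to each message.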

Forcing HTTPS connections

For security reasons, all applications should encourage clients to connect over https. To instruct the browser to prefer https over http for a given page or entire domain, set the Strict-Transport-Security header in your responses. For example:

    Strict-Transport-Security: max-age=31536000; includeSubDomains

To set this header for any static content that is served by your app, add the header to your app's static file and directory handlers.

To set this header for responses that are generated from your code, use the flask-talisman library.

Caution: Clients that have received the header in the past will refuse to connect if https becomes non-functional or is disabled for any reason. To learn more, see this Cheat Sheet on HTTP Strict Transport Security.
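For a plain WSGI application that does not use Flask, the header can also be added with a small piece of middleware. The sketch below is a hand-rolled alternative shown for illustration, not the flask-talisman approach; `add_hsts` and `HSTS_VALUE` are hypothetical names, and the header value is the example given above.

```python
HSTS_VALUE = "max-age=31536000; includeSubDomains"


def add_hsts(app, value=HSTS_VALUE):
    """Wrap a WSGI app so every response carries Strict-Transport-Security."""

    def wrapped(environ, start_response):
        def start_with_hsts(status, headers, exc_info=None):
            # Drop any existing copy so the header appears exactly once.
            headers = [(name, val) for name, val in headers
                       if name.lower() != "strict-transport-security"]
            headers.append(("Strict-Transport-Security", value))
            return start_response(status, headers, exc_info)

        return app(environ, start_with_hsts)

    return wrapped
```

Wrapping is a single call, for example `app = add_hsts(app)`, and the caution above applies equally: only enable the header once https is reliably available.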

Handling asynchronous background work

Background work is any work that your app performs for a request after you have delivered your HTTP response. Avoid performing background work in your app, and review your code to make sure all asynchronous operations finish before you deliver your response.
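One way to make sure asynchronous operations finish before the response is delivered is to join every worker thread before returning. The following is a minimal sketch with the standard threading module; `handle_request` and its `tasks` argument (a list of zero-argument callables) are hypothetical and stand in for whatever concurrent work a handler starts.

```python
import threading


def handle_request(tasks):
    """Run tasks on worker threads and return only after all have finished."""
    results = [None] * len(tasks)

    def run(index, task):
        results[index] = task()

    threads = [threading.Thread(target=run, args=(index, task))
               for index, task in enumerate(tasks)]
    for thread in threads:
        thread.start()
    for thread in threads:
        # Block until the work is done so nothing outlives the request.
        thread.join()
    return "done: {}".format(results)
```

Because the function returns only after every `join()` completes, no thread is still running when the response is written.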

For long-running jobs, we recommend using Cloud Tasks. With Cloud Tasks, HTTP requests are long-lived and return a response only after any asynchronous work ends.

Warning: Performing asynchronous background work can result in higher billing. App Engine might scale up additional instances due to high CPU load, even if there are no active requests. Users may also experience increased latency because of requests waiting in the pending queue for available instances.

Except as otherwise noted, the content of this page is licensed under the Creative Commons Attribution 4.0 License, and code samples are licensed under the Apache 2.0 License. For details, see the Google Developers Site Policies. Java is a registered trademark of Oracle and/or its affiliates.

Last updated 2025-12-15 UTC.