How instances are managed
Note: For new projects you create after March 2025, App Engine sets the automatic scaling maximum instances default for standard environment deployments to 20. This change doesn't impact existing apps. To override the default, specify a new max_instances value in the automatic_scaling element of your app.yaml file, and deploy a new version or redeploy over an existing version.
Instances are the basic building blocks of App Engine, providing all the resources needed to successfully host your application. At any given time, your application can be running on one or many instances, with requests being spread across all of them. Each instance includes a security layer to ensure that instances cannot inadvertently affect each other.
App Engine can automatically create and shut down instances as traffic fluctuates, or you can specify a number of instances to run regardless of the amount of traffic. To determine how and when new instances are created, you specify a scaling type for your app. The scaling settings are applied at the App Engine version level as part of the app.yaml file.
Scaling types
App Engine supports the following scaling types, which control how and when instances are created:
- Automatic scaling (default)
- Basic scaling
- Manual scaling
You specify the scaling type in your app's app.yaml file. By default, your app uses automatic scaling, which means App Engine manages the number of idle instances.
This table compares the performance features of the three scaling types:
| Feature | Automatic scaling | Basic scaling | Manual scaling |
|---|---|---|---|
| Request timeout | 10 minutes for HTTP requests and task queue tasks. If your app doesn't return a request within this time limit, App Engine interrupts the request handler and emits an error for your code to handle. For legacy runtimes (Java 8, PHP 5, and Python 2): | 24 hours for HTTP requests and task queue tasks. If your app doesn't return a request within this time limit, App Engine interrupts the request handler and emits an error for your code to handle. A basic-scaled instance can choose to handle | Same as basic scaling. |
| Background threads (Java only) | Not allowed | Allowed | Allowed |
| Residence | Instances are shut down based on usage patterns. | Instances are shut down based on the idle_timeout parameter. If an instance has been idle, for example it has not received a request for more than idle_timeout, then the instance is shut down. | Instances remain in memory and state is preserved across requests. When instances are stopped, an /_ah/stop request appears in the logs. If there is an /_ah/stop handler or a registered shutdown hook (Java, Python), it has 30 seconds to complete before shutdown occurs. |
| Startup and shutdown | Instances are created on demand to handle requests and automatically turned down when idle. | Instances are created on demand to handle requests and automatically shut down when idle, based on the idle_timeout configuration parameter. An instance that is manually stopped has 30 seconds to finish handling requests before it is forcibly terminated. | Instances are sent a start request automatically by App Engine in the form of an empty GET request to /_ah/start. As with basic scaling, an instance that is manually stopped has 30 seconds to finish handling requests before it is forcibly terminated. |
| Instance addressability | Instances are anonymous. | Instance "i" of version "v" of service "s" is addressable at the URL: https://i-dot-v-dot-s-dot-app_id.REGION_ID.r.appspot.com. If you have set up a wildcard subdomain mapping for a custom domain, you can also address a service or any of its instances via a URL of the form https://s.domain.com or https://i.s.domain.com. You can reliably cache state in each instance and retrieve it in subsequent requests. | Same as basic scaling. |
| Scaling | App Engine scales the number of instances automatically in response to processing volume. This scaling factors in the automatic_scaling settings that are provided on a per-version basis in the configuration file. | A service with basic scaling is configured by setting the maximum number of instances in the max_instances parameter of the basic_scaling setting. The number of live instances scales with the processing volume. | You configure the number of instances of each version in that service's configuration file. The number of instances usually corresponds to the size of a dataset being held in memory or the desired throughput for offline work. |
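
The instance URL pattern in the "Instance addressability" row can be sketched as a small helper. This is only an illustration of the naming scheme from the table; the function name and the sample values (`worker`, `my-app`, `uc`) are hypothetical placeholders, not real resources.

```python
def instance_url(instance, version, service, app_id, region_id):
    """Build the URL of one instance of a basic or manual scaling service.

    Follows the pattern from the table:
    https://i-dot-v-dot-s-dot-app_id.REGION_ID.r.appspot.com
    """
    host = (
        f"{instance}-dot-{version}-dot-{service}-dot-{app_id}"
        f".{region_id}.r.appspot.com"
    )
    return f"https://{host}"


# Placeholder values for illustration only.
print(instance_url(0, "v1", "worker", "my-app", "uc"))
# prints https://0-dot-v1-dot-worker-dot-my-app.uc.r.appspot.com
```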
Scaling dynamic instances
App Engine applications that use basic or automatic scaling are powered by any number of dynamic instances at a given time, depending on the volume of incoming requests. As requests for your application increase, the number of dynamic instances may increase as well.
Apps with basic scaling
If you use basic scaling, App Engine attempts to keep your cost low, even though that may result in higher latency as the volume of incoming requests increases.
When none of the existing instances are available to serve an incoming request, App Engine starts a new instance. Even after starting a new instance, some requests may need to be queued until the new instance completes its startup process. If you require the lowest latency possible, consider using automatic scaling, which creates new instances preemptively to minimize latency.
Apps with automatic scaling
If you use automatic scaling, each instance in your app has its own queue for incoming requests. Before the queues become long enough to have a noticeable effect on your app's latency, App Engine automatically creates one or more new instances to handle the increasing load.
Note: If your service is infrequently used, App Engine standard environment scales to zero instances. To reduce the impact of cold-start latency for infrequently used services, you can change the configuration to allow a minimum of one instance to always be active or enable warmup requests.
You can configure the settings for automatic scaling to achieve a trade-off between the performance you want and the cost you can incur. The following table describes these settings.
| Automatic scaling settings | Description |
|---|---|
| Target CPU utilization | Sets the CPU utilization threshold at which more instances will be started to handle traffic. |
| Target throughput utilization | Sets the throughput threshold for the number of concurrent requests after which more instances will be started to handle traffic. |
| Max concurrent requests | Sets the max concurrent requests an instance can accept before the scheduler spawns a new instance. |
Watch the App Engine Scheduler settings video to see the effects of these settings.
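
As a rough sketch of how the settings in the table map onto configuration, the following snippet renders an automatic_scaling stanza of the kind that goes in app.yaml. The keys are the app.yaml names for these settings; the numeric values are illustrative assumptions rather than recommendations, and the helper function is hypothetical.

```python
def render_automatic_scaling(settings):
    """Render a dict of settings as an app.yaml automatic_scaling stanza."""
    lines = ["automatic_scaling:"]
    for key, value in settings.items():
        lines.append(f"  {key}: {value}")
    return "\n".join(lines)


# Illustrative values only; tune them against your own latency and cost goals.
example_settings = {
    "target_cpu_utilization": 0.65,
    "target_throughput_utilization": 0.6,
    "max_concurrent_requests": 50,
    "max_instances": 20,
}

print(render_automatic_scaling(example_settings))
```

Lower utilization targets tend to start instances sooner, which favors latency over cost; higher targets do the opposite.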
Scaling down
When request volumes decrease, App Engine reduces the number of instances. This downward scaling helps ensure that all of your application's current instances are being used for optimal efficiency and cost effectiveness.
When an application is not being used at all, App Engine turns off its associated dynamic instances, but readily reloads them as soon as they are needed. Reloading instances can result in loading requests and additional latency for users.
You can specify a minimum number of idle instances. Setting an appropriate number of idle instances for your application based on request volume allows your application to serve every request with little latency, unless you are experiencing abnormally high request volume.
Scaling down in automatic scaling
If your app uses automatic scaling, it takes approximately 15 minutes of inactivity for the idle instances to start shutting down. To keep one or more idle instances running, set the value of min_idle_instances to 1 or higher.
Scaling and batches of requests
If you are sending batches of requests to your services, for example, to a task queue for processing, a large number of instances will be created quickly. We recommend controlling this by rate limiting the number of requests sent per second, if possible. For example, if you use Cloud Tasks, you can control the rate at which tasks are pushed.
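
When the sender is your own code, one simple way to rate limit a batch is to pace the calls on the client side. The sketch below is a minimal illustration, not an App Engine API: `send_paced` and its parameters are hypothetical names, and `send` stands in for whatever function issues a single request.

```python
import time


def send_paced(items, send, max_per_second, sleep=time.sleep, clock=time.monotonic):
    """Call send(item) for each item, no faster than max_per_second.

    sleep and clock are injectable so the pacing can be tested without waiting.
    Returns the number of items sent.
    """
    if max_per_second <= 0:
        raise ValueError("max_per_second must be positive")
    interval = 1.0 / max_per_second
    next_slot = clock()  # earliest time the next request may be sent
    sent = 0
    for item in items:
        now = clock()
        if now < next_slot:
            sleep(next_slot - now)  # wait for the next free slot
        send(item)
        sent += 1
        next_slot = max(next_slot, now) + interval
    return sent
```

For example, `send_paced(batch, post_request, max_per_second=5)` would spread a batch over time instead of sending it all at once, which gives the scheduler less reason to start many instances simultaneously.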
Instance life cycle
Instance states
An instance of an auto-scaled service is always running. However, an instance of a manual or basic scaled service can be either running or stopped. All instances of the same service and version share the same state. You change the state of your instances by managing your versions. You can:
- Use the Versions page in the Google Cloud console
- Use the gcloud app versions start and gcloud app versions stop commands
Startup
Each service instance is created in response to a start request, which is an empty HTTP GET request to /_ah/start. App Engine sends this request to bring an instance into existence; users cannot send a request to /_ah/start. Manual and basic scaling instances must respond to the start request before they can handle another request. The start request can be used for two purposes:
- To start a program that runs indefinitely, without accepting further requests.
- To initialize an instance before it receives additional traffic.
Manual, basic, and automatically scaling instances start up differently. When you start a manual scaling instance, App Engine immediately sends a /_ah/start request to each instance. When you start an instance of a basic scaling service, App Engine allows it to accept traffic, but the /_ah/start request is not sent to an instance until it receives its first user request. Multiple basic scaling instances are only started as necessary, in order to handle increased traffic. Automatically scaling instances do not receive any /_ah/start request.
When an instance responds to the /_ah/start request with an HTTP status code of 200–299 or 404, it is considered to have successfully started and can handle additional requests. Otherwise, App Engine terminates the instance. Manual scaling instances are restarted immediately, while basic scaling instances are restarted only when needed for serving traffic.
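
The startup behavior described above can be sketched with a bare WSGI application that initializes itself when it receives /_ah/start. This is a minimal illustration under assumptions: `STATE`, `initialize`, and `start_succeeded` are hypothetical names, and a real service would typically use a web framework and a production WSGI server.

```python
# Hypothetical in-memory state; a real app might warm caches or open connections.
STATE = {"initialized": False}


def initialize():
    """Placeholder for one-time setup work done before serving traffic."""
    STATE["initialized"] = True


def start_succeeded(status_code):
    """Mirror the rule above: 200-299 or 404 counts as a successful start."""
    return 200 <= status_code <= 299 or status_code == 404


def app(environ, start_response):
    """WSGI entry point that handles the start request and ordinary requests."""
    path = environ.get("PATH_INFO", "/")
    if path == "/_ah/start":
        initialize()
        body = b"started"
    else:
        body = b"ok"
    headers = [
        ("Content-Type", "text/plain"),
        ("Content-Length", str(len(body))),
    ]
    start_response("200 OK", headers)
    return [body]
```

Because a 404 also counts as a successful start, an app that defines no /_ah/start handler at all can still be started; the handler is only needed when there is initialization work to do.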
Shutdown
The shutdown process might be triggered by a variety of planned and unplanned events, such as:
- There are too many instances and not enough app requests (traffic).
- You manually stop an instance.
- You deploy an updated version to the service.
- The instance exceeds the maximum memory for its configured instance_class.
- Your application runs out of Instance Hours quota.
- Your instance is moved to a different machine, either because the current machine that is running the instance is restarted, or App Engine moved your instance to improve load distribution.
One of the benefits of the App Engine standard environment's "pay for only what you need" platform, as described earlier in Scaling down, is that the system autoscales the number of instances down to zero when there is no traffic. This helps make App Engine a cost-effective solution for small applications that don't receive continuous requests. When an instance needs to be shut down, new incoming requests are routed to other instances (if any) and requests that are currently being processed are given time to complete.
App Engine normally sends a STOP (SIGTERM) signal to the app container. Your app does not need to respond to this event, but it can use this to perform any necessary clean-up actions before the container is shut down. Under normal conditions, the system waits up to 2 seconds for the app to stop and then sends a KILL (SIGKILL) signal. If your app does not catch the SIGTERM signal, the instance is immediately shut down.
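
A minimal sketch of catching SIGTERM with the Python standard library follows. The clean-up step is a placeholder, and the names `shutting_down`, `cleanup_log`, and `install_sigterm_handler` are hypothetical. Because the grace period before SIGKILL is short, the handler should do only brief work.

```python
import signal
import threading

# Set when shutdown begins, so long-running loops can check it and stop.
shutting_down = threading.Event()
cleanup_log = []


def handle_sigterm(signum, frame):
    """Run brief clean-up when the container receives SIGTERM."""
    cleanup_log.append("flushed")  # placeholder for flushing buffers, closing connections
    shutting_down.set()


def install_sigterm_handler():
    """Register the handler; Python only allows this from the main thread."""
    if threading.current_thread() is threading.main_thread():
        signal.signal(signal.SIGTERM, handle_sigterm)
        return True
    return False


install_sigterm_handler()
```

Background loops can then use `while not shutting_down.is_set():` to exit promptly once shutdown begins.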
Some instance shutdown log messages you might see include:
[start] Quitting on terminated signal
[INFO] Handling signal: term
[INFO] Worker exiting (pid: 21)
[INFO] Worker exiting (pid: 24)
[INFO] Shutting down: Master
[start] Start program failed: termination triggered by nginx exit
These log messages do not indicate any error condition but are indications of the normal instance shutdown process. Note that [start] and Start in the logs refer to a platform process named start and have nothing to do with starting an instance or an app.
Loading requests
When App Engine creates a new instance for your application, the instance must first load any libraries and resources required to handle the request. This happens during the first request to the instance, called a Loading Request. During a loading request, your application undergoes initialization, which causes the request to take longer.
The following best practices allow you to reduce the duration of loading requests:
- Load only the code needed for startup.
- Access the disk as little as possible.
- In some cases, loading code from a zip or jar file is faster than loading from many separate files.
Warmup requests
Warmup requests are a specific type of loading request that load application code into an instance ahead of time, before any live requests are made. Manual or basic scaling instances do not receive an /_ah/warmup request.
To learn more about how to use warmup requests, see Configuring warmup requests.
Instance uptime
App Engine attempts to keep manual and basic scaling instances running indefinitely. However, at this time there is no guaranteed uptime for manual and basic scaling instances.
NTP with App Engine standard environment
The App Engine standard environment has Network Time Protocol (NTP) services that use Google NTP servers. However, the NTP service is not editable.
Manage services
Depending on the scaling type of your instance, you can manage services and versions in the Google Cloud console or Google Cloud CLI.
Stop a version
Each version in App Engine runs within one or more instances, depending on how much traffic you configured it to handle.
Console
To stop or disable a version for your service:
Go to the App Engine Versions page in the Google Cloud console:
Select a version from the table, and click Stop.
gcloud
Run the following:
gcloud app versions stop --service=SERVICE VERSION
Replace:
- SERVICE with the name of your service.
- VERSION with the version name of your service.
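
If you script version management, the same command can be assembled programmatically. The helper below is a hypothetical sketch that only builds the argument list for the start and stop commands named on this page; it does not run gcloud, and the sample names are placeholders.

```python
import shlex


def versions_command(action, service, version):
    """Build the argv for `gcloud app versions start` or `stop`."""
    if action not in ("start", "stop"):
        raise ValueError("action must be 'start' or 'stop'")
    return ["gcloud", "app", "versions", action, f"--service={service}", version]


# Placeholder service and version names for illustration.
print(shlex.join(versions_command("stop", "worker", "v1")))
# prints gcloud app versions stop --service=worker v1
```

Passing the list to `subprocess.run(..., check=True)` would execute it, assuming the gcloud CLI is installed and authenticated.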
Delete a service
Each service can be configured to use different runtimes and to operate with different performance settings. You can't delete the default service. Deleting a service also deletes all of its accompanying versions in your project.
Console
To delete a service:
Go to the App Engine Services page in the Google Cloud console:
Select a service from the table, and click Delete.
gcloud
Run the following:
gcloud app services delete SERVICE
Replace:
- SERVICE with the name of your service.
Last updated 2025-12-15 UTC.