Best practices for the Compute Engine API

This document describes the recommended best practices for using the Compute Engine API and is intended for users who are already familiar with it. If you are a beginner, learn about the prerequisites and using the Compute Engine API.

Following these best practices can help you save time, prevent errors, and mitigate the effects of rate quotas.

Use client libraries

Client libraries are the recommended way of programmatically accessing the Compute Engine API. Client libraries provide code that lets you access the API through common programming languages, which can save you time and improve your code's performance.

Learn more about Compute Engine client libraries and Client library best practices.

Generate REST requests by using the Cloud console

When creating a resource, generate the REST request using the resource creation pages or details pages in the Google Cloud console. Using a generated REST request saves time and helps prevent syntax errors.

Learn how to Generate REST requests.

Wait for operations to be done

Don't assume that an operation (any API request that changes a resource) is complete or successful. Instead, use a wait method for the Operation resource to verify that the operation is done. You don't need to verify a request that doesn't modify resources, such as a read request that uses the GET HTTP verb, because the API response already indicates whether the request was successful. Consequently, the Compute Engine API does not return Operation resources for these requests.

Whenever an API request is successfully initiated, it returns an HTTP 200 status code. Although receiving a 200 indicates that the server received your API request successfully, this status code doesn't indicate whether the requested operation has completed successfully. For example, you can receive a 200, but the operation might not be complete yet or the operation might have failed.

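Any request that creates, updates, or deletes a resource starts a long-running operation and returns an Operation resource, which captures the status of that request. An operation is done when the status field of the Operation resource is DONE. To check the status, use the wait method that matches the scope of the returned Operation resource:

  • For zonal operations, use zoneOperations.wait.
  • For regional operations, use regionOperations.wait.
  • For global operations, use globalOperations.wait.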

The wait method returns when the operation is done or when the request is approaching the 2-minute deadline. When using the wait method, avoid short polling, which is when your clients continuously make requests to the server without waiting for a response. Using the wait method in a retry loop with exponential backoff to check the status of your request, instead of using the get method with short polling for the Operation resource, helps preserve your rate quotas and reduces latency.

For more information about and examples of using the wait method, see Handling API responses.

To check the status of a requested operation, see Checking operation status.

While waiting for an operation to complete, account for the operation minimum retention period, as completed operations might be removed from the database after this period.
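The waiting pattern described in this section can be sketched as follows. This is a minimal illustration, not an official implementation: wait_once is a hypothetical callable that stands in for one call to the matching wait method and returns the Operation resource as a dictionary, and the helper name, attempt count, and delay values are assumptions chosen for the example.

```python
import time


def wait_until_done(wait_once, max_attempts=5, base_delay=1.0, sleep=time.sleep):
    """Call wait_once() until the returned operation reports status DONE.

    wait_once is a hypothetical callable standing in for one call to the
    wait method; it returns the Operation resource as a dict.
    """
    for attempt in range(max_attempts):
        operation = wait_once()
        if operation.get("status") == "DONE":
            # DONE only means finished; a populated "error" field means it failed.
            if operation.get("error"):
                raise RuntimeError(f"operation failed: {operation['error']}")
            return operation
        # wait returned before the operation finished: back off, then retry.
        sleep(base_delay * (2 ** attempt))
    raise TimeoutError("operation not DONE after %d attempts" % max_attempts)


# Example with a fake operation that finishes on the third call.
_responses = iter([{"status": "RUNNING"}, {"status": "RUNNING"}, {"status": "DONE"}])
result = wait_until_done(lambda: next(_responses), sleep=lambda seconds: None)
```

The sketch separates "done" from "successful": it returns only when the status is DONE and no error is attached, which mirrors the point that an HTTP 200 alone does not prove success.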

Paginate list results

When using a list method (such as a *.list method, a *.aggregatedList method, or any other method that returns a list), paginate the results whenever possible to ensure that you read the entire response. If you don't paginate, you can only receive up to the first 500 elements as determined by the maxResults query parameter.

For more information about pagination on Google Cloud, see List Pagination. For specific details and examples, see the reference documentation for the list method that you want to use, such as instances.list.

You can also use Cloud Client Libraries to handle pagination.

Use client-side list filters to avoid quota errors

When you use filters with *.list or *.aggregatedList methods, you incur additional quota charges if there are more than 10k filtered resources from the requests. For more information, see filtered_list_cost_overhead in Rate quotas.

If your project exceeds this rate quota, you receive a 403 error with the reason rateLimitExceeded. To avoid this error, use client-side filters for the list requests.

Note: You cannot request a higher limit for the filtered_list_cost_overhead quota.
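A client-side filter can be as simple as listing the resources without a server-side filter and selecting the matches locally. The sketch below assumes the instances have already been retrieved as dictionaries with name and status fields; the helper name and the sample data are hypothetical.

```python
def filter_client_side(instances, predicate):
    """Return the instances that satisfy predicate, evaluated locally."""
    return [instance for instance in instances if predicate(instance)]


# Stand-in for the result of an unfiltered, fully paginated list request.
instances = [
    {"name": "web-1", "status": "RUNNING"},
    {"name": "web-2", "status": "TERMINATED"},
    {"name": "db-1", "status": "RUNNING"},
]
running_web = filter_client_side(
    instances,
    lambda vm: vm["status"] == "RUNNING" and vm["name"].startswith("web-"),
)
```

The trade-off is that the client downloads every resource before filtering, so this approach pairs naturally with pagination.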

Rely on error codes, not error messages

Google APIs must use the canonical error codes defined by google.rpc.Code, but error messages can be subject to change without notice. Error messages are generally intended for developers to read, not programs.

Learn more about API errors.
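To illustrate, the sketch below decides how to react to an error by inspecting its code and reason fields and never its message text. The shape of the error dictionary is an assumption modeled on a typical JSON error response, and is_rate_limited is a hypothetical helper; adapt both to the responses your client actually receives.

```python
RATE_LIMIT_REASONS = {"rateLimitExceeded"}


def is_rate_limited(error_body):
    """Decide from the code and reason fields, never from the message text."""
    error = error_body.get("error", {})
    reasons = {detail.get("reason") for detail in error.get("errors", [])}
    return error.get("code") == 403 and bool(reasons & RATE_LIMIT_REASONS)


example = {
    "error": {
        "code": 403,
        "message": "Wording here can change without notice.",
        "errors": [{"reason": "rateLimitExceeded"}],
    }
}
rate_limited = is_rate_limited(example)
```

Because the message is never read, rewording it on the server side cannot change how the client behaves.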

Minimize client-side retries to preserve rate quotas

Minimize the number of client-side retries for a project to prevent rateLimitExceeded errors and to maximize the utilization of your rate quotas. The following practices can help you preserve the rate quotas for your projects:

  • Avoid short polling.
  • Use bursting sparingly and selectively.
  • Always make your calls in a retry loop with exponential backoff.
  • Use a client-side rate limiter.
  • Split your applications across multiple projects.

Avoid short polling

Avoid short polling, where your clients continuously make requests to the server without waiting for a response. If you short poll, it is harder to catch bad requests, and those requests count against your quota even when they return no useful data.

Instead of short polling, you should wait for operations to be done.

Use bursting sparingly and selectively

Use bursting sparingly and selectively. Bursting is the act of allowing a specific client to make many API requests in a short time. Usually, bursting is done in response to exceptional scenarios, such as cases where your application needs to handle more traffic than usual. Bursting burns through your rate quota quickly, so use it only when necessary.

When bursting is required, use dedicated batch APIs when possible, such as the bulk instance API or managed instance groups.

Learn more about batching requests.

Always make your calls in a retry loop with exponential backoff

Use exponential backoff to progressively space out requests when they time out or whenever you reach your rate quota.

Any retry loop should have an exponential backoff that ensures frequent retries don't overload your application or exceed your rate quotas. Otherwise, you risk negatively impacting all other systems in the same project.

If you need a retry loop for an operation that failed because you have reached the rate quota, your exponential backoff strategy should allow enough time between retries for the quota bucket to be refilled (usually every minute).

Alternatively, if you need a retry loop because waiting for an operation timed out, the maximum interval of your exponential backoff strategy shouldn't exceed the operation minimum retention period. Otherwise, you might receive an operation Not Found error.

For an example of implementing exponential backoff, see the exponential backoff algorithm for the Identity and Access Management API.
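As a rough sketch of such a retry loop, the code below doubles the delay after each rate-limited attempt, caps it at 60 seconds to reflect the roughly one-minute quota refill mentioned above, and adds jitter. RateLimitError is a hypothetical exception that your own code would raise on a rateLimitExceeded response, and the base, factor, cap, and attempt values are illustrative assumptions rather than recommended settings.

```python
import random
import time


class RateLimitError(Exception):
    """Hypothetical exception raised by the caller's code on rateLimitExceeded."""


def backoff_delays(base=1.0, factor=2.0, cap=60.0, attempts=6):
    """Return the un-jittered delay before each retry: base * factor**n, capped."""
    return [min(cap, base * factor ** n) for n in range(attempts)]


def call_with_backoff(call, delays=None, sleep=time.sleep, rng=random.random):
    """Run call(); on RateLimitError sleep for a jittered delay and retry."""
    delays = backoff_delays() if delays is None else delays
    for delay in delays:
        try:
            return call()
        except RateLimitError:
            # Jitter in [delay/2, delay] so that clients do not retry in lockstep.
            sleep(delay * (0.5 + 0.5 * rng()))
    return call()  # final attempt; any error propagates to the caller
```

With the defaults, the un-jittered delays are 1, 2, 4, 8, 16, and 32 seconds. The sleep and rng parameters exist only so that the loop can be exercised in tests without real waiting.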

Use a client-side rate limiter

Use a client-side rate limiter. A client-side rate limiter sets an artificial limit so that the client in question can only use a certain amount of quota, which prevents any one client from consuming all your quota.
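One common way to build such a limiter is a token bucket. The source does not prescribe an algorithm, so the sketch below is only one possible approach; the class name, the rate and capacity parameters, and the injectable clock are all assumptions made for illustration.

```python
import time


class TokenBucket:
    """Allow at most `rate` requests per second, with bursts up to `capacity`."""

    def __init__(self, rate, capacity, clock=time.monotonic):
        self.rate = rate
        self.capacity = capacity
        self.clock = clock
        self.tokens = float(capacity)
        self.updated = clock()

    def try_acquire(self):
        """Take one token if available; return False when the caller must wait."""
        now = self.clock()
        # Refill in proportion to elapsed time, never beyond capacity.
        self.tokens = min(self.capacity, self.tokens + (now - self.updated) * self.rate)
        self.updated = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False
```

A client would call try_acquire() before each API request and delay the request when it returns False, which keeps that client's usage below the artificial limit regardless of how much project quota remains.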

Split up your applications across multiple projects

Splitting up your applications across multiple projects can help minimize the number of requests for your quota buckets. Since quotas are applied on a per-project level, you can split up your applications so each application has its own dedicated quota bucket.

Checklist summary

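The following checklist summarizes the best practices for using the Compute Engine API.

  • Use client libraries.
  • Generate REST requests by using the Cloud console.
  • Wait for operations to be done.
  • Paginate list results.
  • Use client-side list filters to avoid quota errors.
  • Rely on error codes, not error messages.
  • Minimize client-side retries to preserve rate quotas.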

Except as otherwise noted, the content of this page is licensed under the Creative Commons Attribution 4.0 License, and code samples are licensed under the Apache 2.0 License. For details, see the Google Developers Site Policies. Java is a registered trademark of Oracle and/or its affiliates.

Last updated 2025-12-15 UTC.