Errors
Number: 193
Permalink: google.aip.dev/193
State: Approved
Created: 2019-07-26
Updated: 2019-07-26

AIP-193

Errors

Effective error communication is an important part of designing simple and intuitive APIs. Services returning standardized error responses enable API clients to construct centralized common error handling logic. This common logic simplifies API client applications and eliminates the need for cumbersome custom error handling code.

Guidance

Services must return a google.rpc.Status message when an API error occurs, and must use the canonical error codes defined in google.rpc.Code. More information about the particular codes is available in the gRPC status code documentation.

Error messages should help a reasonably technical user understand and resolve the issue, and should not assume that the user is an expert in your particular API. Additionally, error messages must not assume that the user will know anything about its underlying implementation.

Error messages should be brief but actionable. Any extra information should be provided in the details field. If even more information is necessary, you should provide a link where a reader can get more information or ask questions to help resolve the issue. It is also important to set the right tone when writing messages.

The following sections describe the fields of google.rpc.Status.

Status.message

The message field is a developer-facing, human-readable "debug message" which should be in English. (Localized messages are expressed using a LocalizedMessage within the details field. See LocalizedMessage for more details.) Any dynamic aspects of the message must be included as metadata within the ErrorInfo that appears in details.

The message is considered a problem description. It is intended for developers to understand the problem and is more detailed than ErrorInfo.reason, discussed later.

Messages should use simple descriptive language that is easy to understand (without technical jargon) to clearly state the problem that results in an error, and offer an actionable resolution to it.

For pre-existing (brownfield) APIs which have previously returned errors without machine-readable identifiers, the value of message must remain the same for any given error. For more information, see Changing Error Messages.

Status.code

The code field is the status code, which must be the numeric value of one of the elements of the google.rpc.Code enum.

For example, the value 5 is the numeric value of the NOT_FOUND enum element.
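
As a rough illustration of the numeric mapping, the sketch below mirrors the google.rpc.Code values in a Python IntEnum. The class name `Code` and the helper `code_name` are local stand-ins introduced for this example; they are not the generated protobuf types.

```python
from enum import IntEnum


class Code(IntEnum):
    """Local stand-in mirroring the numeric values of google.rpc.Code."""

    OK = 0
    CANCELLED = 1
    UNKNOWN = 2
    INVALID_ARGUMENT = 3
    DEADLINE_EXCEEDED = 4
    NOT_FOUND = 5
    ALREADY_EXISTS = 6
    PERMISSION_DENIED = 7
    RESOURCE_EXHAUSTED = 8
    FAILED_PRECONDITION = 9
    ABORTED = 10
    OUT_OF_RANGE = 11
    UNIMPLEMENTED = 12
    INTERNAL = 13
    UNAVAILABLE = 14
    DATA_LOSS = 15
    UNAUTHENTICATED = 16


def code_name(value: int) -> str:
    """Return the enum element name for a numeric Status.code value."""
    return Code(value).name
```

A value outside the enum raises ValueError, which is one simple way to reject a Status.code that is not a canonical code.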

Status.details

The details field allows messages with additional error information to be included in the error response, each packed in a google.protobuf.Any message.

Google defines a set of standard detail payloads for error details, which cover most common needs for API errors. Services should use these standard detail payloads when feasible.

Each type of detail payload must be included at most once. For example, there must not be more than one BadRequest message in the details, but there may be a BadRequest and a PreconditionFailure.

All error responses must include an ErrorInfo within details. This provides machine-readable identifiers so that users can write code against specific aspects of the error.
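
The two rules above (at most one payload per type, and ErrorInfo always present) lend themselves to a mechanical check. The following is a minimal sketch, assuming the details are available in their JSON form as a list of dicts carrying an "@type" key; the function name `check_details` is hypothetical.

```python
from collections import Counter

ERROR_INFO_TYPE = "type.googleapis.com/google.rpc.ErrorInfo"


def check_details(details: list[dict]) -> list[str]:
    """Return a list of rule violations for a Status.details list."""
    problems = []
    # Count how many payloads of each type are present.
    counts = Counter(d.get("@type", "") for d in details)
    for type_url, n in sorted(counts.items()):
        if n > 1:
            problems.append(f"{type_url} appears {n} times; at most once is allowed")
    # Counter returns 0 for a type that never appeared.
    if counts[ERROR_INFO_TYPE] == 0:
        problems.append("ErrorInfo is missing from details")
    return problems
```

A check like this could run in a service's test suite so that violations are caught before an error response reaches clients.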

The following sections describe the most common standard detail payloads.

ErrorInfo

The ErrorInfo message is the primary way to send a machine-readable identifier. Contextual information should be included in metadata in ErrorInfo and must be included if it appears within an error message.

The reason field is a short snake_case description of the cause of the error. Error reasons are unique within a particular domain of errors. The reason must be at most 63 characters and match a regular expression of [A-Z][A-Z0-9_]+[A-Z0-9]. (This is UPPER_SNAKE_CASE, without leading or trailing underscores, and without leading digits.)

The reason should be terse, but meaningful enough for a human reader to understand what the reason refers to.

Good examples:

  • CPU_AVAILABILITY
  • NO_STOCK
  • CHECKED_OUT
  • AVAILABILITY_ERROR

Bad examples:

  • THE_BOOK_YOU_WANT_IS_NOT_AVAILABLE (overly verbose)
  • ERROR (too general)
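
The syntactic constraints on reason (length and pattern) can be checked in code. The sketch below uses the regular expression and length limit stated above; the function name `is_valid_reason` is an assumption made for illustration. It only checks syntax, so a value such as ERROR passes even though it is too general to be a good reason.

```python
import re

# Pattern and limit as stated in the guidance for ErrorInfo.reason.
REASON_PATTERN = re.compile(r"[A-Z][A-Z0-9_]+[A-Z0-9]")
MAX_REASON_LENGTH = 63


def is_valid_reason(reason: str) -> bool:
    """Check the syntactic rules for ErrorInfo.reason (not its quality)."""
    if len(reason) > MAX_REASON_LENGTH:
        return False
    # fullmatch ensures the whole string conforms, not just a prefix.
    return REASON_PATTERN.fullmatch(reason) is not None
```

Whether a reason is terse and meaningful still needs human review.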

The domain field is the logical grouping to which the reason belongs. The domain must be a globally unique value, and is typically the name of the service that generated the error, e.g. pubsub.googleapis.com.

The (reason, domain) pair forms a machine-readable way of identifying a particular error. Services must use the same (reason, domain) pair for the same error, and must not use the same (reason, domain) pair for logically different errors. The decision about whether two errors are "the same" or not is not always clear, but should generally be considered in terms of the expected action a client might take to resolve them.

The metadata field is a map of key/value pairs providing additional dynamic information as context. Each key within metadata must be at most 64 characters long, and conform to the regular expression [a-z][a-zA-Z0-9-_]+.
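
The key constraints can be checked in the same mechanical way. This is a minimal sketch; the function name `invalid_metadata_keys` is hypothetical, and the hyphen is moved to the end of the character class so that it cannot be misread as a range, which leaves the accepted set of characters unchanged.

```python
import re

# Same character set as [a-z][a-zA-Z0-9-_]+, with the hyphen placed last.
KEY_PATTERN = re.compile(r"[a-z][a-zA-Z0-9_-]+")
MAX_KEY_LENGTH = 64


def invalid_metadata_keys(metadata: dict[str, str]) -> list[str]:
    """Return the keys that break the length or pattern rule, in input order."""
    return [
        key
        for key in metadata
        if len(key) > MAX_KEY_LENGTH or KEY_PATTERN.fullmatch(key) is None
    ]
```

Note that the pattern requires at least two characters, so a single-letter key is reported as invalid.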

Any request-specific information which contributes to the Status.message or LocalizedMessage.message messages must be represented within metadata. This practice is critical so that machine actors do not need to parse error messages to extract information.

For example, consider the following message:

An <e2-medium> VM instance with <local-ssd=3,nvidia-t4=2> is currently unavailable in the <us-east1-a> zone. Consider trying your request in the <us-central1-f,us-central1-c> zone(s), which currently has/have capacity to accommodate your request. Alternatively, you can try your request again with a different VM hardware configuration or at a later time. For more information, see the troubleshooting documentation.

The ErrorInfo.metadata map for the same error could be:

  • "zone": "us-east1-a"
  • "vmType": "e2-medium"
  • "attachment": "local-ssd=3,nvidia-t4=2"
  • "zonesWithCapacity": "us-central1-f,us-central1-c"

Additional contextual information that does not appear in an error message may also be included in metadata to allow programmatic use by the client.
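
One way to keep the message and the metadata in step is to render the message from the metadata itself. The sketch below assumes a service that builds messages from a template with named placeholders; the template text is a shortened form of the example message, and the names `TEMPLATE` and `render_message` are hypothetical.

```python
from string import Formatter

# Shortened form of the example message, with metadata keys as placeholders.
TEMPLATE = (
    "An <{vmType}> VM instance with <{attachment}> is currently unavailable "
    "in the <{zone}> zone. Consider trying your request in the "
    "<{zonesWithCapacity}> zone(s)."
)


def render_message(template: str, metadata: dict[str, str]) -> str:
    """Render a message, refusing to use values that are not in metadata."""
    # Collect every named placeholder that the template refers to.
    fields = {name for _, name, _, _ in Formatter().parse(template) if name}
    missing = fields - set(metadata)
    if missing:
        raise KeyError(f"metadata is missing keys: {sorted(missing)}")
    # Extra metadata keys are allowed; they are simply not rendered.
    return template.format(**metadata)
```

Because every dynamic value comes from the metadata map, a client never needs to parse the rendered text to recover it.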

The metadata included for any given (reason, domain) pair can evolve over time:

  • New keys may be included
  • All keys that have been included must continue to be included (but may have empty values)

In other words, once a user has observed a given key for a (reason, domain) pair, the service must allow them to rely on it continuing to be present in the future.
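
The evolution rule amounts to a set comparison between the keys a (reason, domain) pair used to emit and the keys it emits now. A minimal sketch, with the hypothetical helper names `removed_keys` and `is_compatible_evolution`:

```python
def removed_keys(previous_keys: set[str], current_keys: set[str]) -> set[str]:
    """Keys that clients may rely on but that are no longer emitted."""
    return previous_keys - current_keys


def is_compatible_evolution(previous_keys: set[str], current_keys: set[str]) -> bool:
    """True when keys were only added, never removed."""
    return not removed_keys(previous_keys, current_keys)
```

A key with an empty value still counts as present, so only the key sets are compared here.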

The set of keys provided in each (reason, domain) pair is independent from other pairs, but services should aim for consistent key naming. For example, two error reasons within the same domain should not use metadata keys of vmType and virtualMachineType.

LocalizedMessage

google.rpc.LocalizedMessage is used to provide an error message which should be localized to a user-specified locale where possible.

If the Status.message field has a sub-optimal value which cannot be changed due to the constraints in the Changing Error Messages section, LocalizedMessage may be used to provide a better error message even when no user-specified locale is available.

Regardless of how the locale for the message was determined, both the locale and message fields must be populated.

The locale field specifies the locale of the message, following IETF BCP 47 (Tags for Identifying Languages). Example values: "en-US", "fr-CH", "es-MX".

The message field contains the localized text itself. This should include a brief description of the error and a call to action to resolve the error. The message should include contextual information to make the message as specific as possible. Any contextual information in the message must be included in ErrorInfo.metadata. See ErrorInfo for more details of how contextual information may be included in a message and the corresponding metadata.

The LocalizedMessage payload should contain the complete resolution to the error. If more information is needed than can reasonably fit in this payload, then additional resolution information must be provided in a Help payload. See the Help section for guidance.

Help

When other textual error messages (in Status.message or LocalizedMessage.message) don't provide the user sufficient context or actionable next steps, or if there are multiple points of failure that need to be considered in troubleshooting, a link to supplemental troubleshooting documentation must be provided in the Help payload.

Provide this information in addition to a clear problem definition and actionable resolution, not as an alternative to them. The linked documentation must clearly relate to the error. If a single page contains information about multiple errors, the ErrorInfo.reason value must be used to narrow down the relevant information.

The description field is a textual description of the linked information. This must be suitable to display to a user as text for a hyperlink. This must be plain text (not HTML, Markdown etc).

Example description value: "Troubleshooting documentation for STOCKOUT errors"

The url field is the URL to link to. This must be an absolute URL, including scheme.

Example url value: "https://cloud.google.com/compute/docs/resource-error"

For publicly-documented services, even those with access controls on actual usage, the linked content must be accessible without authentication.

For privately-documented services, the linked content may require authentication.
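
The requirement for an absolute URL with a scheme can be checked with the Python standard library. The sketch below assumes a Help link in its JSON form, as a dict with "description" and "url" keys; the function name `is_valid_help_link` is hypothetical. It does not attempt to verify that the description is plain text or that the page is reachable.

```python
from urllib.parse import urlsplit


def is_valid_help_link(link: dict[str, str]) -> bool:
    """Check that a Help link has a description and an absolute URL."""
    description = link.get("description", "")
    parts = urlsplit(link.get("url", ""))
    # An absolute URL needs both a scheme (e.g. https) and a host.
    return bool(description.strip()) and bool(parts.scheme) and bool(parts.netloc)
```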

Error messages

Textual error messages can be present in both Status.message and LocalizedMessage.message fields. Messages should be succinct but actionable, with request-specific information (such as a resource name or region) providing precise details where appropriate. Any request-specific details must be present in ErrorInfo.metadata.

Changing error messages

Changing the content of Status.message over time must be done carefully, to avoid breaking clients who have previously had to rely on the message for all information. See the rationale section for more details.

For a given RPC:

  • If the RPC has always returned ErrorInfo with machine-readable information, the content of Status.message may change over time. (For example, the API producer may provide a clearer explanation, or more request-specific information.)
  • Otherwise, the content of Status.message must be stable, providing the same text with the same request-specific information. Instead of changing Status.message, the API should include a LocalizedMessage within Status.details.

Even if an RPC has always returned ErrorInfo, the API may keep the existing Status.message stable and add a LocalizedMessage within Status.details.

The content of LocalizedMessage.message may change over time.

Partial errors

APIs should not support partial errors. Partial errors add significant complexity for users, because they usually sidestep the use of error codes, or move those error codes into the response message, where the user must write specialized error handling logic to address the problem.

However, occasionally partial errors are necessary, particularly in bulk operations where it would be hostile to users to fail an entire large request because of a problem with a single entry.

Methods that require partial errors should use long-running operations, and the method should put partial failure information in the metadata message. The errors themselves must still be represented with a google.rpc.Status object.

Permission Denied

If the user does not have permission to access the resource or parent, regardless of whether or not it exists, the service must error with PERMISSION_DENIED (HTTP 403). Permission must be checked prior to checking if the resource or parent exists.

If the user does have proper permission, but the requested resource or parent does not exist, the service must error with NOT_FOUND (HTTP 404).
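
The ordering of the two checks is the important part. The following sketch models it with in-memory dictionaries; `ApiError`, `get_resource`, the `acl` map and the `store` map are all hypothetical names for illustration, not part of any real service.

```python
class ApiError(Exception):
    """Illustrative error carrying a canonical code name and HTTP status."""

    def __init__(self, code: str, http_status: int):
        super().__init__(code)
        self.code = code
        self.http_status = http_status


def get_resource(
    name: str, user: str, acl: dict[str, set[str]], store: dict[str, str]
) -> str:
    # The permission check comes first and never consults `store`, so the
    # response is the same whether or not the resource exists.
    if user not in acl.get(name, set()):
        raise ApiError("PERMISSION_DENIED", 403)
    # Only a permitted caller learns that the resource is absent.
    if name not in store:
        raise ApiError("NOT_FOUND", 404)
    return store[name]
```

Checking in this order means an unauthorized caller cannot use the error code to discover which resources exist.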

HTTP/1.1+JSON representation

When clients use HTTP/1.1 as per AIP-127, the error information is returned in the body of the response, as a JSON object. For backward compatibility reasons, this does not map precisely to google.rpc.Status, but contains the same core information. The schema is defined in the following proto:

    message Error {
      message Status {
        // The HTTP status code that corresponds to `google.rpc.Status.code`.
        int32 code = 1;
        // This corresponds to `google.rpc.Status.message`.
        string message = 2;
        // This is the enum version for `google.rpc.Status.code`.
        google.rpc.Code status = 4;
        // This corresponds to `google.rpc.Status.details`.
        repeated google.protobuf.Any details = 5;
      }

      Status error = 1;
    }

The most important difference is that the code field in the JSON is an HTTP status code, not the direct value of google.rpc.Status.code. For example, a google.rpc.Status message with a code value of 5 would be mapped to an object including the following code-related fields (as well as the message, details etc):

    {
      "error": {
        "code": 404, // The HTTP status code for "not found"
        "status": "NOT_FOUND" // The name in google.rpc.Code for value 5
      }
    }
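
The mapping from a google.rpc.Status to this JSON shape can be sketched as a lookup followed by a reshaping step. The table below follows the HTTP mappings documented alongside google.rpc.Code; the names `HTTP_STATUS_BY_CODE` and `to_http_json` are hypothetical, and details are assumed to already be in their JSON form.

```python
# Numeric google.rpc.Code value -> (enum name, HTTP status code).
HTTP_STATUS_BY_CODE = {
    0: ("OK", 200),
    1: ("CANCELLED", 499),
    2: ("UNKNOWN", 500),
    3: ("INVALID_ARGUMENT", 400),
    4: ("DEADLINE_EXCEEDED", 504),
    5: ("NOT_FOUND", 404),
    6: ("ALREADY_EXISTS", 409),
    7: ("PERMISSION_DENIED", 403),
    8: ("RESOURCE_EXHAUSTED", 429),
    9: ("FAILED_PRECONDITION", 400),
    10: ("ABORTED", 409),
    11: ("OUT_OF_RANGE", 400),
    12: ("UNIMPLEMENTED", 501),
    13: ("INTERNAL", 500),
    14: ("UNAVAILABLE", 503),
    15: ("DATA_LOSS", 500),
    16: ("UNAUTHENTICATED", 401),
}


def to_http_json(code: int, message: str, details: list[dict]) -> dict:
    """Build the HTTP/1.1+JSON error body for a google.rpc.Status."""
    name, http_status = HTTP_STATUS_BY_CODE[code]
    return {
        "error": {
            "code": http_status,  # HTTP status, not the numeric rpc code
            "message": message,
            "status": name,  # enum name from google.rpc.Code
            "details": details,
        }
    }
```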

The following JSON shows a fully populated HTTP/1.1+JSON representation of an error response.

    {
      "error": {
        "code": 429,
        "message": "The zone 'us-east1-a' does not have enough resources available to fulfill the request. Try a different zone, or try again later.",
        "status": "RESOURCE_EXHAUSTED",
        "details": [
          {
            "@type": "type.googleapis.com/google.rpc.ErrorInfo",
            "reason": "RESOURCE_AVAILABILITY",
            "domain": "compute.googleapis.com",
            "metadata": {
              "zone": "us-east1-a",
              "vmType": "e2-medium",
              "attachment": "local-ssd=3,nvidia-t4=2",
              "zonesWithCapacity": "us-central1-f,us-central1-c"
            }
          },
          {
            "@type": "type.googleapis.com/google.rpc.LocalizedMessage",
            "locale": "en-US",
            "message": "An <e2-medium> VM instance with <local-ssd=3,nvidia-t4=2> is currently unavailable in the <us-east1-a> zone. Consider trying your request in the <us-central1-f,us-central1-c> zone(s), which currently has/have capacity to accommodate your request. Alternatively, you can try your request again with a different VM hardware configuration or at a later time. For more information, see the troubleshooting documentation."
          },
          {
            "@type": "type.googleapis.com/google.rpc.Help",
            "links": [
              {
                "description": "Additional information on this error",
                "url": "https://cloud.google.com/compute/docs/resource-error"
              }
            ]
          }
        ]
      }
    }
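
On the client side, a response like this can be handled without parsing any message text. The sketch below is one possible approach, assuming the response body is available as a JSON string; the function names `find_error_info` and `suggested_zones` are hypothetical.

```python
import json
from typing import Optional

ERROR_INFO_TYPE = "type.googleapis.com/google.rpc.ErrorInfo"


def find_error_info(body: str) -> Optional[dict]:
    """Return the ErrorInfo payload from an HTTP/1.1+JSON error body, if any."""
    error = json.loads(body).get("error", {})
    for detail in error.get("details", []):
        if detail.get("@type") == ERROR_INFO_TYPE:
            return detail
    return None


def suggested_zones(body: str) -> list[str]:
    """Read alternative zones from metadata for one specific error."""
    info = find_error_info(body)
    if info is None:
        return []
    # Branch on the machine-readable (reason, domain) pair, not the message.
    key = (info.get("reason"), info.get("domain"))
    if key != ("RESOURCE_AVAILABILITY", "compute.googleapis.com"):
        return []
    zones = info.get("metadata", {}).get("zonesWithCapacity", "")
    return [zone for zone in zones.split(",") if zone]
```

For the example response, a client could retry the request in one of the returned zones.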

Rationale

Requiring ErrorInfo

ErrorInfo is required because it further identifies an error. With only approximately twenty available values for Status.status, it is difficult to disambiguate one error from another across an entire API Service.

Also, error messages often contain dynamic segments that express variable information, so there needs to be a machine-readable component of every error response that enables clients to use such information programmatically.

Including LocalizedMessage

LocalizedMessage was selected as the location to present alternate error messages. While LocalizedMessage may use a locale specified in the request, a service may provide a LocalizedMessage even without a user-specified locale, typically to provide a better error message in situations where Status.message cannot be changed. Where the locale is not specified by the user, it should be en-US (US English).

A service may include LocalizedMessage even when the same message is provided in Status.message and when localization into a user-specified locale is not supported. Reasons for this include:

  • An intention to support user-specified localization in the near future, allowing clients to consistently use LocalizedMessage and not change their error-reporting code when the functionality is introduced.
  • Consistency across all RPCs within a service: if some RPCs include LocalizedMessage and some only use Status.message for error messages, clients have to be aware of which RPCs will do what, or implement a fall-back mechanism. Providing LocalizedMessage on all RPCs allows simple and consistent client code to be written.

Updating Status.message

If a client has ever observed an error with Status.message populated (which it always will be) but without ErrorInfo, the developer of that client may well have had to resort to parsing Status.message in order to find out information beyond just what Status.code conveys. That information may be found by matching specific text (e.g. "Connection closed with unknown cause") or by parsing the message to find out metadata values (e.g. a region with insufficient resources). At that point, Status.message is implicitly part of the API contract, so it must not be updated; that would be a breaking change. This is one reason for introducing LocalizedMessage into the Status.details.

RPCs which have always included ErrorInfo are in a better position: the contract is then more about the stability of ErrorInfo for any given error. The reason and domain need to be consistent over time, and the metadata provided for any given (reason, domain) can only be expanded. It's still possible that clients could be parsing Status.message instead of using ErrorInfo, but they will always have had a more robust option available to them.

Further reading

  • For which error codes to retry, see AIP-194.
  • For how to retry errors in client libraries, see AIP-4221.

Changelog

  • 2024-10-18: Rewrite/restructure for clarity.
  • 2024-01-10: Incorporate guidance for writing effective messages.
  • 2023-05-17: Change the recommended language for Status.message to be the service's native language rather than English.
  • 2023-05-17: Specify requirements for changing error messages.
  • 2023-05-10: Require ErrorInfo for all error responses.
  • 2023-05-04: Require uniqueness by message type for error details.
  • 2022-11-04: Added guidance around PERMISSION_DENIED errors previously found in other AIPs.
  • 2022-08-12: Reworded/Simplified intro to add clarity to the intent.
  • 2020-01-22: Added a reference to the ErrorInfo message.
  • 2019-10-14: Added guidance restricting error message mutability to if there is a machine-readable identifier present.
  • 2019-09-23: Added guidance about error message strings being able to change.