Language Guide (proto 3)

Covers how to use the proto3 revision of the Protocol Buffers language in your project.

This guide describes how to use the protocol buffer language to structure your protocol buffer data, including .proto file syntax and how to generate data access classes from your .proto files. It covers the proto3 revision of the protocol buffers language.

For information on editions syntax, see the Protobuf Editions Language Guide.

For information on the proto2 syntax, see the Proto2 Language Guide.

This is a reference guide. For a step-by-step example that uses many of the features described in this document, see the tutorial for your chosen language.

Defining A Message Type

First let’s look at a very simple example. Let’s say you want to define a search request message format, where each search request has a query string, the particular page of results you are interested in, and a number of results per page. Here’s the .proto file you use to define the message type.

syntax = "proto3";

message SearchRequest {
  string query = 1;
  int32 page_number = 2;
  int32 results_per_page = 3;
}

  • The first line of the file specifies that you’re using the proto3 revision of the protobuf language spec.

    • The edition (or syntax for proto2/proto3) must be the first non-empty, non-comment line of the file.
    • If no edition or syntax is specified, the protocol buffer compiler will assume you are using proto2.
  • The SearchRequest message definition specifies three fields (name/value pairs), one for each piece of data that you want to include in this type of message. Each field has a name and a type.

Specifying Field Types

In the earlier example, all the fields are scalar types: two integers (page_number and results_per_page) and a string (query). You can also specify enumerations and composite types like other message types for your field.

Assigning Field Numbers

You must give each field in your message definition a number between 1 and 536,870,911 with the following restrictions:

  • The given number must be unique among all fields for that message.
  • Field numbers 19,000 to 19,999 are reserved for the Protocol Buffers implementation. The protocol buffer compiler will complain if you use one of these reserved field numbers in your message.
  • You cannot use any previously reserved field numbers or any field numbers that have been allocated to extensions.

This number cannot be changed once your message type is in use because it identifies the field in the message wire format. “Changing” a field number is equivalent to deleting that field and creating a new field with the same type but a new number. See Deleting Fields for how to do this properly.

Field numbers should never be reused. Never take a field number out of the reserved list for reuse with a new field definition. See Consequences of Reusing Field Numbers.

You should use the field numbers 1 through 15 for the most-frequently-set fields. Lower field number values take less space in the wire format. For example, field numbers in the range 1 through 15 take one byte to encode. Field numbers in the range 16 through 2047 take two bytes. You can find out more about this in Protocol Buffer Encoding.
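The one-byte and two-byte ranges follow from how a field's tag is written on the wire: the field number is shifted left by three bits, combined with a 3-bit wire type, and stored as a base-128 varint. The following is a minimal Python sketch of that arithmetic; the helper names varint_size and tag_size are illustrative assumptions and are not part of any protobuf library.

```python
def varint_size(value: int) -> int:
    """Return how many bytes a non-negative integer needs as a base-128 varint."""
    size = 1
    while value >= 0x80:  # each byte carries 7 payload bits
        value >>= 7
        size += 1
    return size


def tag_size(field_number: int, wire_type: int = 0) -> int:
    """Size in bytes of a field's tag (field number plus 3-bit wire type)."""
    tag = (field_number << 3) | wire_type
    return varint_size(tag)


for number in (1, 15, 16, 2047, 2048):
    print(number, tag_size(number))
```

Running the loop shows the tag growing from one byte to two at field number 16 and from two bytes to three at 2048, which matches the ranges quoted above.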

Consequences of Reusing Field Numbers

Reusing a field number makes decoding wire-format messages ambiguous.

The protobuf wire format is lean and doesn’t provide a way to detect fields encoded using one definition and decoded using another.

Encoding a field using one definition and then decoding that same field with a different definition can lead to:

  • Developer time lost to debugging
  • A parse/merge error (best case scenario)
  • Leaked PII/SPII
  • Data corruption

Common causes of field number reuse:

  • renumbering fields (sometimes done to achieve a more aesthetically pleasing number order for fields). Renumbering effectively deletes and re-adds all the fields involved in the renumbering, resulting in incompatible wire-format changes.
  • deleting a field and not reserving the number to prevent future reuse.

The field number is limited to 29 bits rather than 32 bits because three bits are used to specify the field’s wire format. For more on this, see the Encoding topic.
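To make the 29-bit limit concrete, here is a small Python sketch that builds a tag and splits it back into field number and wire type. The names make_tag, split_tag, and MAX_FIELD_NUMBER are hypothetical and chosen only for this illustration.

```python
WIRE_TYPE_BITS = 3
# 32 bits minus the 3 wire-type bits leaves 29 bits for the field number.
MAX_FIELD_NUMBER = (1 << (32 - WIRE_TYPE_BITS)) - 1


def make_tag(field_number: int, wire_type: int) -> int:
    """Combine a field number and a wire type into a single tag value."""
    if not 1 <= field_number <= MAX_FIELD_NUMBER:
        raise ValueError(f"field number out of range: {field_number}")
    return (field_number << WIRE_TYPE_BITS) | wire_type


def split_tag(tag: int) -> tuple:
    """Split a tag back into (field_number, wire_type)."""
    return tag >> WIRE_TYPE_BITS, tag & 0b111


print(MAX_FIELD_NUMBER)           # 536870911
print(split_tag(make_tag(2, 0)))  # (2, 0)
```

The largest field number combined with the largest 3-bit wire type still fits in 32 bits, which is where the upper bound of 536,870,911 comes from.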

Specifying Field Cardinality

Message fields can be one of the following:

  • Singular:

    In proto3, there are two types of singular fields:

    • optional: (recommended) An optional field is in one of two possible states:

      • the field is set, and contains a value that was explicitly set or parsed from the wire. It will be serialized to the wire.
      • the field is unset, and will return the default value. It will not be serialized to the wire.

      You can check to see if the value was explicitly set.

      optional is recommended over implicit fields for maximum compatibility with protobuf editions and proto2.

    • implicit: (not recommended) An implicit field has no explicit cardinality label and behaves as follows:

      • if the field is a message type, it behaves just like an optional field.

      • if the field is not a message, it has two states:

        • the field is set to a non-default (non-zero) value that was explicitly set or parsed from the wire. It will be serialized to the wire.
        • the field is set to the default (zero) value. It will not be serialized to the wire. In fact, you cannot determine whether the default (zero) value was set or parsed from the wire or not provided at all. For more on this subject, see Field Presence.
  • repeated: this field type can be repeated zero or more times in a well-formed message. The order of the repeated values will be preserved.

  • map: this is a paired key/value field type. See Maps for more on this field type.
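The difference between optional and implicit singular fields shows up in what reaches the wire. The sketch below models only the presence rules described above for a non-negative int32 field; encode_int32_field and its explicit_presence parameter are assumptions made for illustration and do not reflect how any real protobuf runtime is structured.

```python
def encode_varint(value: int) -> bytes:
    """Encode a non-negative integer as a base-128 varint."""
    out = bytearray()
    while True:
        byte = value & 0x7F
        value >>= 7
        if value:
            out.append(byte | 0x80)  # more bytes follow
        else:
            out.append(byte)
            return bytes(out)


def encode_int32_field(field_number, value, explicit_presence):
    """Encode one non-negative int32 field, or return b"" if it is skipped.

    value=None models an unset field.
    """
    if value is None:
        return b""  # unset fields are never written
    if not explicit_presence and value == 0:
        return b""  # implicit presence: the default (zero) is not written
    tag = (field_number << 3) | 0  # wire type 0 = varint
    return encode_varint(tag) + encode_varint(value)


print(encode_int32_field(1, 0, explicit_presence=True).hex())   # 0800
print(encode_int32_field(1, 0, explicit_presence=False).hex())  # prints an empty line
print(encode_int32_field(1, 7, explicit_presence=False).hex())  # 0807
```

An optional field explicitly set to zero still produces bytes, so a reader can tell it was set. An implicit field set to zero produces nothing, which is why the reader cannot distinguish "set to zero" from "not provided".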

Repeated Fields are Packed by Default

In proto3, repeated fields of scalar numeric types use packed encoding by default.

You can find out more about packed encoding in Protocol Buffer Encoding.
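As a rough illustration of what packed encoding saves, the following Python sketch encodes the same repeated varint values both ways. It handles non-negative values only, and the function names are hypothetical helpers rather than protobuf APIs.

```python
def encode_varint(value: int) -> bytes:
    """Encode a non-negative integer as a base-128 varint."""
    out = bytearray()
    while True:
        byte = value & 0x7F
        value >>= 7
        if value:
            out.append(byte | 0x80)
        else:
            out.append(byte)
            return bytes(out)


def encode_unpacked(field_number: int, values) -> bytes:
    """One tag/value pair per element (wire type 0)."""
    tag = encode_varint((field_number << 3) | 0)
    return b"".join(tag + encode_varint(v) for v in values)


def encode_packed(field_number: int, values) -> bytes:
    """A single length-delimited record (wire type 2) holding all elements."""
    payload = b"".join(encode_varint(v) for v in values)
    tag = encode_varint((field_number << 3) | 2)
    return tag + encode_varint(len(payload)) + payload


values = [3, 270, 86942]
print(encode_packed(6, values).hex())    # 3206038e029ea705
print(encode_unpacked(6, values).hex())  # 3003308e02309ea705
```

The packed form writes the tag once followed by a byte length, so it is shorter whenever the field holds more than a couple of elements.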

Message Type Fields Always Have Field Presence

In proto3, message-type fields already have field presence. Because of this, adding the optional modifier doesn’t change the field presence for the field.

The definitions for Message2 and Message3 in the following code sample generate the same code for all languages, and there is no difference in representation in binary, JSON, and TextFormat:

syntax = "proto3";

package foo.bar;

message Message1 {}

message Message2 {
  Message1 foo = 1;
}

message Message3 {
  optional Message1 bar = 1;
}

Well-formed Messages

The term “well-formed,” when applied to protobuf messages, refers to the bytes serialized/deserialized. The protoc parser validates that a given proto definition file is parseable.

Singular fields can appear more than once in wire-format bytes. The parser will accept the input, but only the last instance of that field will be accessible through the generated bindings. See Last One Wins for more on this topic.
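The "last one wins" behavior can be sketched with a toy parser. The Python below handles only varint fields and is an illustrative assumption about how such a parser could be written, not the actual protobuf implementation; read_varint and parse_singular_varints are hypothetical names.

```python
def read_varint(data: bytes, pos: int):
    """Decode one varint starting at pos; return (value, next_pos)."""
    result = 0
    shift = 0
    while True:
        byte = data[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return result, pos
        shift += 7


def parse_singular_varints(data: bytes) -> dict:
    """Parse a message made only of varint fields into {field_number: value}."""
    fields = {}
    pos = 0
    while pos < len(data):
        tag, pos = read_varint(data, pos)
        field_number, wire_type = tag >> 3, tag & 0x7
        if wire_type != 0:
            raise ValueError("this sketch only handles varint fields")
        value, pos = read_varint(data, pos)
        fields[field_number] = value  # a later occurrence overwrites an earlier one
    return fields


# Field 1 appears twice: first with value 1, then with value 150.
wire = bytes.fromhex("0801" "089601")
print(parse_singular_varints(wire))  # {1: 150}
```

Both occurrences are accepted without error, but only the second value remains visible after parsing.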

Adding More Message Types

Multiple message types can be defined in a single .proto file. This is useful if you are defining multiple related messages. For example, if you wanted to define the reply message format that corresponds to your SearchRequest message type, you could add a SearchResponse message to the same .proto:

message SearchRequest {
  string query = 1;
  int32 page_number = 2;
  int32 results_per_page = 3;
}

message SearchResponse {
  ...
}

Combining Messages leads to bloat: While multiple message types (such as message, enum, and service) can be defined in a single .proto file, it can also lead to dependency bloat when large numbers of messages with varying dependencies are defined in a single file. It’s recommended to include as few message types per .proto file as possible.

Adding Comments

To add comments to your .proto files:

  • Prefer C/C++/Java line-end-style comments ‘//’ on the line before the .proto code element

  • C-style inline/multi-line comments /* ... */ are also accepted.

    • When using multi-line comments, a margin line of ‘*’ is preferred.

/**
 * SearchRequest represents a search query, with pagination options to
 * indicate which results to include in the response.
 */
message SearchRequest {
  string query = 1;

  // Which page number do we want?
  int32 page_number = 2;

  // Number of results to return per page.
  int32 results_per_page = 3;
}

Deleting Fields

Deleting fields can cause serious problems if not done properly.

When you no longer need a field and all references have been deleted from client code, you may delete the field definition from the message. However, you must reserve the deleted field number. If you do not reserve the field number, it is possible for a developer to reuse that number in the future.

You should also reserve the field name to allow JSON and TextFormat encodings of your message to continue to parse.

Reserved Field Numbers

If you update a message type by entirely deleting a field, or commenting it out, future developers can reuse the field number when making their own updates to the type. This can cause severe issues, as described in Consequences of Reusing Field Numbers. To make sure this doesn’t happen, add your deleted field number to the reserved list.

The protoc compiler will generate error messages if any future developers try to use these reserved field numbers.

message Foo {
  reserved 2, 15, 9 to 11;
}

Reserved field number ranges are inclusive (9 to 11 is the same as 9, 10, 11).

Reserved Field Names

Reusing an old field name later is generally safe, except when using TextProto or JSON encodings where the field name is serialized. To avoid this risk, you can add the deleted field name to the reserved list.

Reserved names affect only the protoc compiler behavior and not runtime behavior, with one exception: TextProto implementations may discard unknown fields (without raising an error like with other unknown fields) with reserved names at parse time (only the C++ and Go implementations do so today). Runtime JSON parsing is not affected by reserved names.

message Foo {
  reserved 2, 15, 9 to 11;
  reserved "foo", "bar";
}

Note that you can’t mix field names and field numbers in the same reserved statement.

What’s Generated from Your .proto?

When you run the protocol buffer compiler on a .proto, the compiler generates the code in your chosen language you’ll need to work with the message types you’ve described in the file, including getting and setting field values, serializing your messages to an output stream, and parsing your messages from an input stream.

  • For C++, the compiler generates a .h and .cc file from each .proto, with a class for each message type described in your file.
  • For Java, the compiler generates a .java file with a class for each message type, as well as a special Builder class for creating message class instances.
  • For Kotlin, in addition to the Java generated code, the compiler generates a .kt file for each message type with an improved Kotlin API. This includes a DSL that simplifies creating message instances, a nullable field accessor, and a copy function.
  • Python is a little different: the Python compiler generates a module with a static descriptor of each message type in your .proto, which is then used with a metaclass to create the necessary Python data access class at runtime.
  • For Go, the compiler generates a .pb.go file with a type for each message type in your file.
  • For Ruby, the compiler generates a .rb file with a Ruby module containing your message types.
  • For Objective-C, the compiler generates a pbobjc.h and pbobjc.m file from each .proto, with a class for each message type described in your file.
  • For C#, the compiler generates a .cs file from each .proto, with a class for each message type described in your file.
  • For PHP, the compiler generates a .php message file for each message type described in your file, and a .php metadata file for each .proto file you compile. The metadata file is used to load the valid message types into the descriptor pool.
  • For Dart, the compiler generates a .pb.dart file with a class for each message type in your file.

You can find out more about using the APIs for each language by following the tutorial for your chosen language. For even more API details, see the relevant API reference.

Scalar Value Types

A scalar message field can have one of the following types. The table shows the type specified in the .proto file, and the corresponding type in the automatically generated class:

Proto Type | Notes
---------- | -----
double |
float |
int32 | Uses variable-length encoding. Inefficient for encoding negative numbers; if your field is likely to have negative values, use sint32 instead.
int64 | Uses variable-length encoding. Inefficient for encoding negative numbers; if your field is likely to have negative values, use sint64 instead.
uint32 | Uses variable-length encoding.
uint64 | Uses variable-length encoding.
sint32 | Uses variable-length encoding. Signed int value. These more efficiently encode negative numbers than regular int32s.
sint64 | Uses variable-length encoding. Signed int value. These more efficiently encode negative numbers than regular int64s.
fixed32 | Always four bytes. More efficient than uint32 if values are often greater than 2^28.
fixed64 | Always eight bytes. More efficient than uint64 if values are often greater than 2^56.
sfixed32 | Always four bytes.
sfixed64 | Always eight bytes.
bool |
string | A string must always contain UTF-8 encoded or 7-bit ASCII text, and cannot be longer than 2^32.
bytes | May contain any arbitrary sequence of bytes no longer than 2^32.

Proto Type | C++ Type | Java/Kotlin Type[1] | Python Type[3] | Go Type | Ruby Type | C# Type | PHP Type | Dart Type | Rust Type
---------- | -------- | ------------------- | -------------- | ------- | --------- | ------- | -------- | --------- | ---------
double | double | double | float | float64 | Float | double | float | double | f64
float | float | float | float | float32 | Float | float | float | double | f32
int32 | int32_t | int | int | int32 | Fixnum or Bignum (as required) | int | integer | int | i32
int64 | int64_t | long | int/long[4] | int64 | Bignum | long | integer/string[6] | Int64 | i64
uint32 | uint32_t | int[2] | int/long[4] | uint32 | Fixnum or Bignum (as required) | uint | integer | int | u32
uint64 | uint64_t | long[2] | int/long[4] | uint64 | Bignum | ulong | integer/string[6] | Int64 | u64
sint32 | int32_t | int | int | int32 | Fixnum or Bignum (as required) | int | integer | int | i32
sint64 | int64_t | long | int/long[4] | int64 | Bignum | long | integer/string[6] | Int64 | i64
fixed32 | uint32_t | int[2] | int/long[4] | uint32 | Fixnum or Bignum (as required) | uint | integer | int | u32
fixed64 | uint64_t | long[2] | int/long[4] | uint64 | Bignum | ulong | integer/string[6] | Int64 | u64
sfixed32 | int32_t | int | int | int32 | Fixnum or Bignum (as required) | int | integer | int | i32
sfixed64 | int64_t | long | int/long[4] | int64 | Bignum | long | integer/string[6] | Int64 | i64
bool | bool | boolean | bool | bool | TrueClass/FalseClass | bool | boolean | bool | bool
string | std::string | String | str/unicode[5] | string | String (UTF-8) | string | string | String | ProtoString
bytes | std::string | ByteString | str (Python 2), bytes (Python 3) | []byte | String (ASCII-8BIT) | ByteString | string | List | ProtoBytes

[1] Kotlin uses the corresponding types from Java, even for unsigned types, to ensure compatibility in mixed Java/Kotlin codebases.

[2] In Java, unsigned 32-bit and 64-bit integers are represented using their signed counterparts, with the top bit simply being stored in the sign bit.

[3] In all cases, setting values to a field will perform type checking to make sure it is valid.

[4] 64-bit or unsigned 32-bit integers are always represented as long when decoded, but can be an int if an int is given when setting the field. In all cases, the value must fit in the type represented when set. See [2].

[5] Python strings are represented as unicode on decode but can be str if an ASCII string is given (this is subject to change).

[6] Integer is used on 64-bit machines and string is used on 32-bit machines.

You can find out more about how these types are encoded when you serialize your message in Protocol Buffer Encoding.
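The efficiency notes for int32, sint32, and fixed32 can be checked with a little arithmetic. This Python sketch assumes the standard varint and ZigZag schemes described in the encoding guide; the helper names are made up for the illustration.

```python
def varint_size(value: int) -> int:
    """Bytes needed to store a non-negative integer as a base-128 varint."""
    size = 1
    while value >= 0x80:
        value >>= 7
        size += 1
    return size


def int32_size(value: int) -> int:
    """Bytes used by an int32 value: negatives are sign-extended to 64 bits."""
    return varint_size(value & 0xFFFFFFFFFFFFFFFF)


def sint32_size(value: int) -> int:
    """Bytes used by a sint32 value after ZigZag encoding."""
    zigzag = ((value << 1) ^ (value >> 31)) & 0xFFFFFFFF
    return varint_size(zigzag)


print(int32_size(-1), sint32_size(-1))             # 10 1
print(varint_size(2**28 - 1), varint_size(2**28))  # 4 5
```

A negative int32 always costs ten bytes, while the same value as a sint32 costs one to five. From 2^28 upward a uint32 varint needs five bytes, which is the point where the fixed four bytes of fixed32 become the smaller choice.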

Default Field Values

When a message is parsed, if the encoded message bytes do not contain a particular field, accessing that field in the parsed object returns the default value for that field. The default values are type-specific:

  • For strings, the default value is the empty string.
  • For bytes, the default value is empty bytes.
  • For bools, the default value is false.
  • For numeric types, the default value is zero.
  • For message fields, the field is not set. Its exact value is language-dependent. See the generated code guide for details.
  • For enums, the default value is the first defined enum value, which must be 0. See Enum Default Value.

The default value for repeated fields is empty (generally an empty list in the appropriate language).

The default value for map fields is empty (generally an empty map in the appropriate language).

Note that for implicit-presence scalar fields, once a message is parsed there’s no way of telling whether that field was explicitly set to the default value (for example whether a boolean was set to false) or just not set at all: you should bear this in mind when defining your message types. For example, don’t have a boolean that switches on some behavior when set to false if you don’t want that behavior to also happen by default. Also note that if a scalar message field is set to its default, the value will not be serialized on the wire. If a float or double value is set to +0 it will not be serialized, but -0 is considered distinct and will be serialized.
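One way to see why -0 is treated as distinct from +0 is to look at the IEEE 754 bit patterns. The Python sketch below uses only the standard struct module; double_bits is a hypothetical helper, and the snippet illustrates the floating-point representation rather than any protobuf runtime code.

```python
import struct


def double_bits(value: float) -> str:
    """Return the IEEE 754 bit pattern of a double as 16 hex digits."""
    return struct.pack(">d", value).hex()


print(double_bits(0.0))   # 0000000000000000
print(double_bits(-0.0))  # 8000000000000000
print(0.0 == -0.0)        # True
```

The two values compare equal, but -0 has its sign bit set, so its stored bytes are not all zero.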

See the generated code guide for your chosen language for more details about how defaults work in generated code.

Enumerations

When you’re defining a message type, you might want one of its fields to only have one of a predefined list of values. For example, let’s say you want to add a corpus field for each SearchRequest, where the corpus can be UNIVERSAL, WEB, IMAGES, LOCAL, NEWS, PRODUCTS or VIDEO. You can do this very simply by adding an enum to your message definition with a constant for each possible value.

In the following example we’ve added an enum called Corpus with all the possible values, and a field of type Corpus:

enum Corpus {
  CORPUS_UNSPECIFIED = 0;
  CORPUS_UNIVERSAL = 1;
  CORPUS_WEB = 2;
  CORPUS_IMAGES = 3;
  CORPUS_LOCAL = 4;
  CORPUS_NEWS = 5;
  CORPUS_PRODUCTS = 6;
  CORPUS_VIDEO = 7;
}

message SearchRequest {
  string query = 1;
  int32 page_number = 2;
  int32 results_per_page = 3;
  Corpus corpus = 4;
}

Enum Default Value

The default value for the SearchRequest.corpus field is CORPUS_UNSPECIFIED because that is the first value defined in the enum.

In proto3, the first value defined in an enum definition must have the value zero and should have the name ENUM_TYPE_NAME_UNSPECIFIED or ENUM_TYPE_NAME_UNKNOWN. This is because:

  • There must be a zero value, so that we can use 0 as a numeric default value.
  • The zero value needs to be the first element, for compatibility with the proto2 semantics where the first enum value is the default unless a different value is explicitly specified.

It is also recommended that this first, default value have no semantic meaning other than “this value was unspecified”.

Enum Value Aliases

You can define aliases by assigning the same value to different enum constants. To do this you need to set the allow_alias option to true. Otherwise, the protocol buffer compiler generates a warning message when aliases are found. Though all alias values are valid for serialization, only the first value is used when deserializing.

enum EnumAllowingAlias {
  option allow_alias = true;
  EAA_UNSPECIFIED = 0;
  EAA_STARTED = 1;
  EAA_RUNNING = 1;
  EAA_FINISHED = 2;
}

enum EnumNotAllowingAlias {
  ENAA_UNSPECIFIED = 0;
  ENAA_STARTED = 1;
  // ENAA_RUNNING = 1;  // Uncommenting this line will cause a warning message.
  ENAA_FINISHED = 2;
}
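As a loose analogy for "only the first value is used when deserializing", Python's standard enum module also treats a repeated value as an alias of the first member defined with it. The snippet below uses plain Python enums, not generated protobuf code, so it illustrates the idea only.

```python
import enum


class EnumAllowingAlias(enum.IntEnum):
    EAA_UNSPECIFIED = 0
    EAA_STARTED = 1
    EAA_RUNNING = 1  # same value, so this becomes an alias of EAA_STARTED
    EAA_FINISHED = 2


# Looking up the number 1 yields the first name defined for it.
print(EnumAllowingAlias(1).name)  # EAA_STARTED
print(EnumAllowingAlias.EAA_RUNNING is EnumAllowingAlias.EAA_STARTED)  # True
```

Either name can be used when writing a value, but reading the number back always resolves to the first name.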

Enumerator constants must be in the range of a 32-bit integer. Since enum values use varint encoding on the wire, negative values are inefficient and thus not recommended. You can define enums within a message definition, as in the earlier example, or outside; these enums can be reused in any message definition in your .proto file. You can also use an enum type declared in one message as the type of a field in a different message, using the syntax _MessageType_._EnumType_.

When you run the protocol buffer compiler on a .proto that uses an enum, the generated code will have a corresponding enum for Java, Kotlin, or C++, or a special EnumDescriptor class for Python that’s used to create a set of symbolic constants with integer values in the runtime-generated class.

Important

The generated code may be subject to language-specific limitations on the number of enumerators (low thousands for one language). Review the limitations for the languages you plan to use.

During deserialization, unrecognized enum values will be preserved in the message, though how this is represented when the message is deserialized is language-dependent. In languages that support open enum types with values outside the range of specified symbols, such as C++ and Go, the unknown enum value is simply stored as its underlying integer representation. In languages with closed enum types such as Java, a case in the enum is used to represent an unrecognized value, and the underlying integer can be accessed with special accessors. In either case, if the message is serialized the unrecognized value will still be serialized with the message.

Important

For information on how enums should work contrasted with how they currently work in different languages, see Enum Behavior.

For more information about how to work with message enums in your applications, see the generated code guide for your chosen language.

Reserved Values

If you update an enum type by entirely removing an enum entry, or commenting it out, future users can reuse the numeric value when making their own updates to the type. This can cause severe issues if they later load old instances of the same .proto, including data corruption, privacy bugs, and so on. One way to make sure this doesn’t happen is to specify that the numeric values (and/or names, which can also cause issues for JSON serialization) of your deleted entries are reserved. The protocol buffer compiler will complain if any future users try to use these identifiers. You can specify that your reserved numeric value range goes up to the maximum possible value using the max keyword.

enum Foo {
  reserved 2, 15, 9 to 11, 40 to max;
  reserved "FOO", "BAR";
}

Note that you can’t mix field names and numeric values in the same reserved statement.

Using Other Message Types

You can use other message types as field types. For example, let’s say you wanted to include Result messages in each SearchResponse message. To do this, you can define a Result message type in the same .proto and then specify a field of type Result in SearchResponse:

message SearchResponse {
  repeated Result results = 1;
}

message Result {
  string url = 1;
  string title = 2;
  repeated string snippets = 3;
}

Importing Definitions

In the earlier example, the Result message type is defined in the same file as SearchResponse. What if the message type you want to use as a field type is already defined in another .proto file?

You can use definitions from other .proto files by importing them. To import another .proto’s definitions, you add an import statement to the top of your file:

import "myproject/other_protos.proto";

By default, you can use definitions only from directly imported .proto files. However, sometimes you may need to move a .proto file to a new location. Instead of moving the .proto file directly and updating all the call sites in a single change, you can put a placeholder .proto file in the old location to forward all the imports to the new location using the import public notion.

Note: The public import functionality available in Java is most effective when moving an entire .proto file or when using java_multiple_files = true. In these cases, generated names remain stable, avoiding the need to update references in your code. While technically functional when moving a subset of a .proto file without java_multiple_files = true, doing so requires simultaneous updates to many references, thus might not significantly ease migration. The functionality is not available in Kotlin, TypeScript, JavaScript, GCL, or with C++ targets that use protobuf static reflection.

import public dependencies can be transitively relied upon by any code importing the proto containing the import public statement. For example:

// new.proto
// All definitions are moved here

// old.proto
// This is the proto that all clients are importing.
import public "new.proto";
import "other.proto";

// client.proto
import "old.proto";
// You use definitions from old.proto and new.proto, but not other.proto

The protocol compiler searches for imported files in a set of directories specified on the protocol compiler command line using the -I/--proto_path flag. If no flag was given, it looks in the directory in which the compiler was invoked. In general you should set the --proto_path flag to the root of your project and use fully qualified names for all imports.

Using proto2 Message Types

It’s possible to import proto2 message types and use them in your proto3 messages, and vice versa. However, proto2 enums cannot be used directly in proto3 syntax (it’s okay if an imported proto2 message uses them).

Nested Types

You can define and use message types inside other message types, as in the following example. Here the Result message is defined inside the SearchResponse message:

message SearchResponse {
  message Result {
    string url = 1;
    string title = 2;
    repeated string snippets = 3;
  }
  repeated Result results = 1;
}

If you want to reuse this message type outside its parent message type, you refer to it as _Parent_._Type_:

message SomeOtherMessage {
  SearchResponse.Result result = 1;
}

You can nest messages as deeply as you like. In the example below, note that the two nested types named Inner are entirely independent, since they are defined within different messages:

message Outer {       // Level 0
  message MiddleAA {  // Level 1
    message Inner {   // Level 2
      int64 ival = 1;
      bool booly = 2;
    }
  }
  message MiddleBB {  // Level 1
    message Inner {   // Level 2
      int32 ival = 1;
      bool booly = 2;
    }
  }
}

Updating A Message Type

If an existing message type no longer meets all your needs (for example, you’d like the message format to have an extra field) but you’d still like to use code created with the old format, don’t worry! It’s very simple to update message types without breaking any of your existing code when you use the binary wire format.

Note

If you use ProtoJSON or proto text format to store your protocol buffer messages, the changes that you can make in your proto definition are different. The ProtoJSON wire format safe changes are described here.

Check Proto Best Practices and the following rules:

Binary Wire-unsafe Changes

Wire-unsafe changes are schema changes that will break if you parse data that was serialized using the old schema with a parser that is using the new schema (or vice versa). Only make wire-unsafe changes if you know that all serializers and deserializers of the data are using the new schema.

  • Changing field numbers for any existing field is not safe.
    • Changing the field number is equivalent to deleting the field and adding a new field with the same type. If you want to renumber a field, see the instructions for deleting a field.
  • Moving fields into an existing oneof is not safe.

Binary Wire-safe Changes

Wire-safe changes are ones where it is fully safe to evolve the schema in this way without risk of data loss or new parse failures.

Note that any wire-safe changes may be a breaking change to application code in a given language. For example, adding a value to a preexisting enum would be a compilation break for any code with an exhaustive switch on that enum. For that reason, Google may avoid making some of these types of changes on public messages: the AIPs contain guidance for which of these changes are safe to make there.

  • Adding new fields is safe.
    • If you add new fields, any messages serialized by code using your “old” message format can still be parsed by your new generated code. You should keep in mind the default values for these elements so that new code can properly interact with messages generated by old code. Similarly, messages created by your new code can be parsed by your old code: old binaries simply ignore the new field when parsing. See the Unknown Fields section for details.
  • Removing fields is safe.
    • The same field number must not be used again in your updated message type. You may want to rename the field instead, perhaps adding the prefix “OBSOLETE_”, or make the field number reserved, so that future users of your .proto can’t accidentally reuse the number.
  • Adding additional values to an enum is safe.
  • Changing a single explicit presence field or extension into a member of a new oneof is safe.
  • Changing a oneof which contains only one field to an explicit presence field is safe.
  • Changing a field into an extension of same number and type is safe.

Binary Wire-compatible Changes (Conditionally Safe)

Unlike wire-safe changes, wire-compatible means that the same data can be parsed both before and after a given change. However, a parse of the data may be lossy under this shape of change. For example, changing an int32 to an int64 is a compatible change, but if a value larger than INT32_MAX is written, a client that reads it as an int32 will discard the high order bits of the number.

You can make compatible changes to your schema only if you manage the roll out to your system carefully. For example, you may change an int32 to an int64 but ensure you continue to only write legal int32 values until the new schema is deployed to all endpoints, and then subsequently start writing larger values after that.

If your schema is published outside of your organization, you should generally not make wire-compatible changes, as you cannot manage the deployment of the new schema to know when the different range of values may be safe to use.

  • int32, uint32, int64, uint64, and bool are all compatible.
    • If a number is parsed from the wire which doesn’t fit in the corresponding type, you will get the same effect as if you had cast the number to that type in C++ (for example, if a 64-bit number is read as an int32, it will be truncated to 32 bits).
  • sint32 and sint64 are compatible with each other but are not compatible with the other integer types.
    • If the value written was between INT_MIN and INT_MAX inclusive it will parse as the same value with either type. If an sint64 value was written outside of that range and parsed as an sint32, the varint is truncated to 32 bits and then zigzag decoding occurs (which will cause a different value to be observed).
  • string and bytes are compatible as long as the bytes are valid UTF-8.
  • Embedded messages are compatible with bytes if the bytes contain an encoded instance of the message.
  • fixed32 is compatible with sfixed32, and fixed64 with sfixed64.
  • For string, bytes, and message fields, singular is compatible with repeated.
    • Given serialized data of a repeated field as input, clients that expect this field to be singular will take the last input value if it’s a primitive type field or merge all input elements if it’s a message type field. Note that this is not generally safe for numeric types, including bools and enums. Repeated fields of numeric types are serialized in the packed format by default, which will not be parsed correctly when a singular field is expected.
  • enum is compatible with int32, uint32, int64, and uint64.
    • Be aware that client code may treat them differently when the message is deserialized: for example, unrecognized proto3 enum values will be preserved in the message, but how this is represented when the message is deserialized is language-dependent.
  • Changing a field between a map<K, V> and the corresponding repeated message field is binary compatible (see Maps, below, for the message layout and other restrictions).
    • However, the safety of the change is application-dependent: when deserializing and reserializing a message, clients using the repeated field definition will produce a semantically identical result; however, clients using the map field definition may reorder entries and drop entries with duplicate keys.
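The truncation rules in the list above can be simulated with integer arithmetic. The Python sketch below mimics a C++ style cast to int32 and the "truncate, then ZigZag-decode" behavior described for sint64 read as sint32. All function names are hypothetical, and the code models the arithmetic only, not a real parser.

```python
def as_int32(value: int) -> int:
    """Reinterpret the low 32 bits as a signed int32, like a C++ cast."""
    value &= 0xFFFFFFFF
    return value - (1 << 32) if value & 0x80000000 else value


def zigzag_encode64(value: int) -> int:
    """ZigZag-encode a signed 64-bit value into an unsigned 64-bit value."""
    return ((value << 1) ^ (value >> 63)) & 0xFFFFFFFFFFFFFFFF


def zigzag_decode(encoded: int) -> int:
    """Invert ZigZag encoding."""
    return (encoded >> 1) ^ -(encoded & 1)


def read_sint64_as_sint32(value: int) -> int:
    """Truncate the sint64 varint to 32 bits, then ZigZag-decode it."""
    truncated = zigzag_encode64(value) & 0xFFFFFFFF
    return zigzag_decode(truncated)


print(as_int32(2**32 + 5))           # 5
print(as_int32(2**31))               # -2147483648
print(read_sint64_as_sint32(-7))     # -7
print(read_sint64_as_sint32(2**31))  # 0
```

Values inside the 32-bit range survive the round trip, while values outside it come back as unrelated numbers, which is why these changes are only conditionally safe.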

Unknown Fields

Unknown fields are well-formed protocol buffer serialized data representing fields that the parser does not recognize. For example, when an old binary parses data sent by a new binary with new fields, those new fields become unknown fields in the old binary.

Proto3 messages preserve unknown fields and include them during parsing and in the serialized output, which matches proto2 behavior.
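To illustrate what preserving unknown fields means, here is a toy Python parser that keeps the raw bytes of any field it does not recognize and writes them back out unchanged. It is a sketch under simplifying assumptions (no groups, no validation), and split_known_unknown is a hypothetical name, not a protobuf API.

```python
def read_varint(data: bytes, pos: int):
    """Decode one varint starting at pos; return (value, next_pos)."""
    result = 0
    shift = 0
    while True:
        byte = data[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return result, pos
        shift += 7


def split_known_unknown(data: bytes, known_fields: set):
    """Return ({field_number: raw_record_bytes}, unknown_bytes).

    Unknown fields are kept as the exact bytes they arrived in.
    """
    known, unknown = {}, bytearray()
    pos = 0
    while pos < len(data):
        start = pos
        tag, pos = read_varint(data, pos)
        field_number, wire_type = tag >> 3, tag & 0x7
        if wire_type == 0:    # varint
            _, pos = read_varint(data, pos)
        elif wire_type == 1:  # 64-bit
            pos += 8
        elif wire_type == 2:  # length-delimited
            length, pos = read_varint(data, pos)
            pos += length
        elif wire_type == 5:  # 32-bit
            pos += 4
        else:
            raise ValueError(f"unsupported wire type {wire_type}")
        record = data[start:pos]
        if field_number in known_fields:
            known[field_number] = record
        else:
            unknown += record
    return known, bytes(unknown)


# A "new" writer sends field 1 (varint 150) and field 9 (string "hi").
wire = bytes.fromhex("089601") + bytes.fromhex("4a026869")
known, unknown = split_known_unknown(wire, known_fields={1})
reserialized = b"".join(known.values()) + unknown
print(unknown.hex())         # 4a026869
print(reserialized == wire)  # True
```

The "old" reader only understands field 1, yet the bytes for field 9 survive the parse and are emitted again on output.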

Retaining Unknown Fields

Some actions can cause unknown fields to be lost. For example, if you do one ofthe following, unknown fields are lost:

  • Serialize a proto to JSON.
  • Iterate over all of the fields in a message to populate a new message.

To avoid losing unknown fields, do the following:

  • Use binary; avoid using text formats for data exchange.
  • Use message-oriented APIs, such as CopyFrom() and MergeFrom(), to copy data rather than copying field-by-field.

TextFormat is a bit of a special case. Serializing to TextFormat prints unknown fields using their field numbers. But parsing TextFormat data back into a binary proto fails if there are entries that use field numbers.
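
To illustrate why a field-by-field copy loses data, here is a rough Python sketch that works on raw wire bytes rather than on generated classes. Everything in it is an assumption made for illustration: the helper names, the choice of fields 1 and 2 as the "known" fields of an older schema, and the restriction to varint and length-delimited records.

```python
def read_varint(data: bytes, pos: int) -> tuple[int, int]:
    """Decode a varint starting at pos; return (value, next position)."""
    result = shift = 0
    while True:
        byte = data[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return result, pos
        shift += 7


def split_fields(data: bytes) -> list[tuple[int, bytes]]:
    """Split a serialized message into (field_number, raw_record) pairs.

    Only wire types 0 (varint) and 2 (length-delimited) are handled.
    """
    fields = []
    pos = 0
    while pos < len(data):
        start = pos
        key, pos = read_varint(data, pos)
        field_number, wire_type = key >> 3, key & 0x7
        if wire_type == 0:
            _, pos = read_varint(data, pos)
        elif wire_type == 2:
            length, pos = read_varint(data, pos)
            pos += length
        else:
            raise ValueError(f"wire type {wire_type} not handled in this sketch")
        fields.append((field_number, data[start:pos]))
    return fields


KNOWN_FIELDS = {1, 2}  # hypothetical: the fields an older schema declares

# Field 1 = varint 150, field 2 = "hi", field 3 (unknown to the old schema) = 7.
wire = bytes.fromhex("089601" "12026869" "1807")

# Message-oriented copy: every record is carried over, known or not.
whole_copy = b"".join(raw for _, raw in split_fields(wire))

# Field-by-field copy: only fields the old schema can name are carried over.
field_copy = b"".join(raw for n, raw in split_fields(wire) if n in KNOWN_FIELDS)

print(whole_copy == wire)  # True
print(field_copy.hex())    # 08960112026869
```

The record for field 3 survives the whole-message copy but is absent from the field-by-field copy, because the copying code had no name by which to reach it.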

Any

The Any message type lets you use messages as embedded types without having their .proto definition. An Any contains an arbitrary serialized message as bytes, along with a URL that acts as a globally unique identifier for and resolves to that message’s type. To use the Any type, you need to import google/protobuf/any.proto.

import "google/protobuf/any.proto";

message ErrorStatus {
  string message = 1;
  repeated google.protobuf.Any details = 2;
}

The default type URL for a given message type is type.googleapis.com/_packagename_._messagename_.
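
As a small illustration of that default URL shape, the sketch below builds a type URL from a package and message name and recovers the fully qualified name from it. The function names and the example.errors package are hypothetical, and real code would normally leave this to the runtime's pack and unpack helpers.

```python
DEFAULT_PREFIX = "type.googleapis.com"


def make_type_url(package: str, message: str, prefix: str = DEFAULT_PREFIX) -> str:
    """Build a type URL of the form prefix/package.message."""
    full_name = f"{package}.{message}" if package else message
    return f"{prefix}/{full_name}"


def type_name_from_url(type_url: str) -> str:
    """The fully qualified type name is the text after the last '/'."""
    return type_url.rsplit("/", 1)[-1]


url = make_type_url("example.errors", "NetworkErrorDetails")
print(url)                      # type.googleapis.com/example.errors.NetworkErrorDetails
print(type_name_from_url(url))  # example.errors.NetworkErrorDetails
```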

Different language implementations will support runtime library helpers to pack and unpack Any values in a typesafe manner – for example, in Java, the Any type will have special pack() and unpack() accessors, while in C++ there are PackFrom() and UnpackTo() methods:

// Storing an arbitrary message type in Any.
NetworkErrorDetails details = ...;
ErrorStatus status;
status.add_details()->PackFrom(details);

// Reading an arbitrary message from Any.
ErrorStatus status = ...;
for (const google::protobuf::Any& detail : status.details()) {
  if (detail.Is<NetworkErrorDetails>()) {
    NetworkErrorDetails network_error;
    detail.UnpackTo(&network_error);
    ... processing network_error ...
  }
}

Oneof

If you have a message with many singular fields and where at most one field will be set at the same time, you can enforce this behavior and save memory by using the oneof feature.

Oneof fields are like optional fields except all the fields in a oneof share memory, and at most one field can be set at the same time. Setting any member of the oneof automatically clears all the other members. You can check which value in a oneof is set (if any) using a special case() or WhichOneof() method, depending on your chosen language.

Note that if multiple values are set, the last set value as determined by the order in the proto will overwrite all previous ones.

Field numbers for oneof fields must be unique within the enclosing message.

Using Oneof

To define a oneof in your .proto you use the oneof keyword followed by your oneof name, in this case test_oneof:

message SampleMessage {
  oneof test_oneof {
    string name = 4;
    SubMessage sub_message = 9;
  }
}

You then add your oneof fields to the oneof definition. You can add fields of any type, except map fields and repeated fields. If you need to add a repeated field to a oneof, you can use a message containing the repeated field.

In your generated code, oneof fields have the same getters and setters as regular fields. You also get a special method for checking which value (if any) in the oneof is set. You can find out more about the oneof API for your chosen language in the relevant API reference.

Oneof Features

  • Setting a oneof field will automatically clear all other members of the oneof. So if you set several oneof fields, only the last field you set will still have a value.

    SampleMessage message;
    message.set_name("name");
    CHECK_EQ(message.name(), "name");
    // Calling mutable_sub_message() will clear the name field and will set
    // sub_message to a new instance of SubMessage with none of its fields set.
    message.mutable_sub_message();
    CHECK(message.name().empty());
  • If the parser encounters multiple members of the same oneof on the wire, only the last member seen is used in the parsed message. When parsing data on the wire, starting at the beginning of the bytes, evaluate the next value, and apply the following parsing rules:

    • First, check if a different field in the same oneof is currently set, and if so clear it.

    • Then apply the contents as though the field was not in a oneof:

      • A primitive will overwrite any value already set
      • A message will merge into any value already set
  • A oneof cannot be repeated.

  • Reflection APIs work for oneof fields.

  • If you set a oneof field to the default value (such as setting an int32 oneof field to 0), the “case” of that oneof field will be set, and the value will be serialized on the wire.

  • If you’re using C++, make sure your code doesn’t cause memory crashes. The following sample code will crash because sub_message was already deleted by calling the set_name() method.

    SampleMessage message;
    SubMessage* sub_message = message.mutable_sub_message();
    message.set_name("name");      // Will delete sub_message
    sub_message->set_...           // Crashes here
  • Again in C++, if you Swap() two messages with oneofs, each message will end up with the other’s oneof case: in the example below, msg1 will have a sub_message and msg2 will have a name.

    SampleMessage msg1;
    msg1.set_name("name");
    SampleMessage msg2;
    msg2.mutable_sub_message();
    msg1.Swap(&msg2);
    CHECK(msg1.has_sub_message());
    CHECK_EQ(msg2.name(), "name");
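
The wire parsing rules listed above (clear a different member first, then overwrite primitives or merge messages) can be modeled in a few lines of Python. This is only a sketch under simplifying assumptions: records are already decoded into (member name, value) pairs, messages are represented as plain dicts, and merging is a shallow dict update rather than a full recursive message merge.

```python
from typing import Any, Optional

OneofState = Optional[tuple[str, Any]]  # (member name, value), or None if unset


def apply_record(state: OneofState, member: str, value: Any) -> OneofState:
    """Apply one parsed record for a oneof member."""
    # Rule 1: if a different member is currently set, clear it first.
    if state is not None and state[0] != member:
        state = None
    # Rule 2: then behave as a regular field would.
    if state is not None and isinstance(value, dict) and isinstance(state[1], dict):
        return member, {**state[1], **value}  # message: merge into existing value
    return member, value  # primitive, or first occurrence: overwrite


def parse_oneof(records: list[tuple[str, Any]]) -> OneofState:
    """Fold the records in wire order into the final oneof state."""
    state: OneofState = None
    for member, value in records:
        state = apply_record(state, member, value)
    return state


print(parse_oneof([("name", "a"), ("sub_message", {"x": 1}), ("sub_message", {"y": 2})]))
# ('sub_message', {'x': 1, 'y': 2})
print(parse_oneof([("sub_message", {"x": 1}), ("name", "a"), ("name", "b")]))
# ('name', 'b')
```

In both runs only the last member seen remains set, and a repeated message member accumulates fields while a repeated primitive member keeps only its final value.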

Backwards-compatibility issues

Be careful when adding or removing oneof fields. If checking the value of a oneof returns None/NOT_SET, it could mean that the oneof has not been set or it has been set to a field in a different version of the oneof. There is no way to tell the difference, since there’s no way to know if an unknown field on the wire is a member of the oneof.

Tag Reuse Issues

  • Move singular fields into or out of a oneof: You may lose some of your information (some fields will be cleared) after the message is serialized and parsed. However, you can safely move a single field into a new oneof and may be able to move multiple fields if it is known that only one is ever set. See Updating A Message Type for further details.
  • Delete a oneof field and add it back: This may clear your currently set oneof field after the message is serialized and parsed.
  • Split or merge oneof: This has similar issues to moving singular fields.

Maps

If you want to create an associative map as part of your data definition, protocol buffers provides a handy shortcut syntax:

map<key_type, value_type> map_field = N;

…where the key_type can be any integral or string type (so, any scalar type except for floating point types and bytes). Note that neither enum nor proto messages are valid for key_type. The value_type can be any type except another map.

So, for example, if you wanted to create a map of projects where each Project message is associated with a string key, you could define it like this:

map<string, Project> projects = 3;

Maps Features

  • Map fields cannot be repeated.
  • Wire format ordering and map iteration ordering of map values is undefined, so you cannot rely on your map items being in a particular order.
  • When generating text format for a .proto, maps are sorted by key. Numeric keys are sorted numerically.
  • When parsing from the wire or when merging, if there are duplicate map keys the last key seen is used. When parsing a map from text format, parsing may fail if there are duplicate keys.
  • If you provide a key but no value for a map field, the behavior when the field is serialized is language-dependent. In C++, Java, Kotlin, and Python the default value for the type is serialized, while in other languages nothing is serialized.
  • No symbol FooEntry can exist in the same scope as a map foo, because FooEntry is already used by the implementation of the map.

The generated map API is currently available for all supported languages. You can find out more about the map API for your chosen language in the relevant API reference.

Backwards Compatibility

The map syntax is equivalent to the following on the wire, so protocol buffers implementations that do not support maps can still handle your data:

message MapFieldEntry {
  key_type key = 1;
  value_type value = 2;
}

repeated MapFieldEntry map_field = N;

Any protocol buffers implementation that supports maps must both produce and accept data that can be accepted by the earlier definition.
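
To make the equivalence concrete, the following Python sketch hand-encodes a map<string, int32> as a sequence of entry records (key in field 1, value in field 2) and decodes it again, keeping the last value seen for a duplicate key. It is an illustration only: the function names are invented, field number 3 is arbitrary, values must be non-negative, and a real application would use the generated map API instead.

```python
def encode_varint(value: int) -> bytes:
    """Encode a non-negative integer as a base-128 varint."""
    out = bytearray()
    while True:
        byte = value & 0x7F
        value >>= 7
        if value:
            out.append(byte | 0x80)
        else:
            out.append(byte)
            return bytes(out)


def read_varint(data: bytes, pos: int) -> tuple[int, int]:
    """Decode a varint starting at pos; return (value, next position)."""
    result = shift = 0
    while True:
        byte = data[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return result, pos
        shift += 7


KEY_TAG = 0x0A    # entry field 1, length-delimited (string key)
VALUE_TAG = 0x10  # entry field 2, varint (int32 value)


def encode_map(field_number: int, items: list[tuple[str, int]]) -> bytes:
    """Emit one length-delimited entry record per (key, value) pair."""
    out = bytearray()
    for key, value in items:
        raw_key = key.encode("utf-8")
        entry = (bytes([KEY_TAG]) + encode_varint(len(raw_key)) + raw_key
                 + bytes([VALUE_TAG]) + encode_varint(value))
        out += encode_varint((field_number << 3) | 2) + encode_varint(len(entry)) + entry
    return bytes(out)


def decode_map(field_number: int, data: bytes) -> dict[str, int]:
    """Read the entry records back; a repeated key overwrites the earlier entry."""
    result: dict[str, int] = {}
    pos = 0
    while pos < len(data):
        tag, pos = read_varint(data, pos)
        if tag != (field_number << 3) | 2:
            raise ValueError("this sketch only handles the one map field")
        length, pos = read_varint(data, pos)
        entry, pos = data[pos:pos + length], pos + length
        key, value, epos = "", 0, 0  # a missing key or value falls back to the default
        while epos < len(entry):
            etag, epos = read_varint(entry, epos)
            if etag == KEY_TAG:
                klen, epos = read_varint(entry, epos)
                key, epos = entry[epos:epos + klen].decode("utf-8"), epos + klen
            elif etag == VALUE_TAG:
                value, epos = read_varint(entry, epos)
            else:
                raise ValueError("unexpected field inside map entry")
        result[key] = value
    return result


wire = encode_map(3, [("a", 1), ("b", 2), ("a", 9)])
decoded = decode_map(3, wire)
print(decoded)  # {'a': 9, 'b': 2}
```

A reader that treated the same bytes as a repeated entry message would see all three entries, including both records for the key "a"; the map reader collapses them.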

Packages

You can add an optional package specifier to a .proto file to prevent name clashes between protocol message types.

package foo.bar;
message Open { ... }

You can then use the package specifier when defining fields of your message type:

message Foo {
  ...
  foo.bar.Open open = 1;
  ...
}

The way a package specifier affects the generated code depends on your chosen language:

  • In C++ the generated classes are wrapped inside a C++ namespace. For example, Open would be in the namespace foo::bar.
  • In Java and Kotlin, the package is used as the Java package, unless you explicitly provide an option java_package in your .proto file.
  • In Python, the package directive is ignored, since Python modules are organized according to their location in the file system.
  • In Go, the package directive is ignored, and the generated .pb.go file is in the package named after the corresponding go_proto_library Bazel rule. For open source projects, you must provide either a go_package option or set the Bazel -M flag.
  • In Ruby, the generated classes are wrapped inside nested Ruby namespaces, converted to the required Ruby capitalization style (first letter capitalized; if the first character is not a letter, PB_ is prepended). For example, Open would be in the namespace Foo::Bar.
  • In PHP the package is used as the namespace after converting to PascalCase, unless you explicitly provide an option php_namespace in your .proto file. For example, Open would be in the namespace Foo\Bar.
  • In C# the package is used as the namespace after converting to PascalCase, unless you explicitly provide an option csharp_namespace in your .proto file. For example, Open would be in the namespace Foo.Bar.

Note that even when the package directive does not directly affect the generated code, for example in Python, it is still strongly recommended to specify the package for the .proto file, as otherwise it may lead to naming conflicts in descriptors and make the proto not portable for other languages.

Packages and Name Resolution

Type name resolution in the protocol buffer language works like C++: first the innermost scope is searched, then the next-innermost, and so on, with each package considered to be “inner” to its parent package. A leading ‘.’ (for example, .foo.bar.Baz) means to start from the outermost scope instead.

The protocol buffer compiler resolves all type names by parsing the imported .proto files. The code generator for each language knows how to refer to each type in that language, even if it has different scoping rules.
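
The innermost-to-outermost search described above can be sketched as follows. This is a simplified model, not the compiler's implementation: the resolve function and the symbol table are hypothetical, and the sketch looks up the whole relative name at each scope, whereas the real compiler applies additional rules to dotted names.

```python
from typing import Optional


def resolve(name: str, scope: str, symbols: set[str]) -> Optional[str]:
    """Resolve a type name as seen from `scope`, searching outward."""
    if name.startswith("."):
        # A leading dot means: start from the outermost scope.
        full = name[1:]
        return full if full in symbols else None
    parts = scope.split(".") if scope else []
    while True:
        candidate = ".".join(parts + [name])
        if candidate in symbols:
            return candidate
        if not parts:
            return None
        parts.pop()  # step out to the enclosing scope


# Hypothetical set of fully qualified names known to the compiler.
symbols = {"Baz", "foo.Baz", "foo.bar.Baz", "foo.bar.Outer", "foo.bar.Outer.Inner"}

print(resolve("Baz", "foo.bar.Outer", symbols))    # foo.bar.Baz
print(resolve(".Baz", "foo.bar.Outer", symbols))   # Baz
print(resolve("Inner", "foo.bar.Outer", symbols))  # foo.bar.Outer.Inner
```

The first call finds foo.bar.Baz before foo.Baz or the top-level Baz because the nearer scope is searched first; the leading dot in the second call skips that search entirely.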

Defining Services

If you want to use your message types with an RPC (Remote Procedure Call) system, you can define an RPC service interface in a .proto file and the protocol buffer compiler will generate service interface code and stubs in your chosen language. So, for example, if you want to define an RPC service with a method that takes your SearchRequest and returns a SearchResponse, you can define it in your .proto file as follows:

service SearchService {
  rpc Search(SearchRequest) returns (SearchResponse);
}

The most straightforward RPC system to use with protocol buffers is gRPC: a language- and platform-neutral open source RPC system developed at Google. gRPC works particularly well with protocol buffers and lets you generate the relevant RPC code directly from your .proto files using a special protocol buffer compiler plugin.

If you don’t want to use gRPC, it’s also possible to use protocol buffers with your own RPC implementation. You can find out more about this in the Proto2 Language Guide.

There are also a number of ongoing third-party projects to develop RPC implementations for Protocol Buffers. For a list of links to projects we know about, see the third-party add-ons wiki page.

JSON Mapping

The standard protobuf binary wire format is the preferred serialization format for communication between two systems that use protobufs. For communicating with systems that use JSON rather than protobuf wire format, Protobuf supports a canonical encoding in JSON.

Options

Individual declarations in a .proto file can be annotated with a number of options. Options do not change the overall meaning of a declaration, but may affect the way it is handled in a particular context. The complete list of available options is defined in /google/protobuf/descriptor.proto.

Some options are file-level options, meaning they should be written at the top-level scope, not inside any message, enum, or service definition. Some options are message-level options, meaning they should be written inside message definitions. Some options are field-level options, meaning they should be written inside field definitions. Options can also be written on enum types, enum values, oneof fields, service types, and service methods; however, no useful options currently exist for any of these.

Here are a few of the most commonly used options:

  • java_package (file option): The package you want to use for your generated Java/Kotlin classes. If no explicit java_package option is given in the .proto file, then by default the proto package (specified using the “package” keyword in the .proto file) will be used. However, proto packages generally do not make good Java packages since proto packages are not expected to start with reverse domain names. If not generating Java or Kotlin code, this option has no effect.

    option java_package = "com.example.foo";
  • java_outer_classname (file option): The class name (and hence the file name) for the wrapper Java class you want to generate. If no explicit java_outer_classname is specified in the .proto file, the class name will be constructed by converting the .proto file name to camel-case (so foo_bar.proto becomes FooBar.java). If the java_multiple_files option is disabled, then all other classes/enums/etc. generated for the .proto file will be generated within this outer wrapper Java class as nested classes/enums/etc. If not generating Java code, this option has no effect.

    option java_outer_classname = "Ponycopter";
  • java_multiple_files (file option): If false, only a single .java file will be generated for this .proto file, and all the Java classes/enums/etc. generated for the top-level messages, services, and enumerations will be nested inside of an outer class (see java_outer_classname). If true, separate .java files will be generated for each of the Java classes/enums/etc. generated for the top-level messages, services, and enumerations, and the wrapper Java class generated for this .proto file won’t contain any nested classes/enums/etc. This is a Boolean option which defaults to false. If not generating Java code, this option has no effect.

    option java_multiple_files = true;
  • optimize_for (file option): Can be set to SPEED, CODE_SIZE, or LITE_RUNTIME. This affects the C++ and Java code generators (and possibly third-party generators) in the following ways:

    • SPEED (default): The protocol buffer compiler will generate code for serializing, parsing, and performing other common operations on your message types. This code is highly optimized.
    • CODE_SIZE: The protocol buffer compiler will generate minimal classes and will rely on shared, reflection-based code to implement serialization, parsing, and various other operations. The generated code will thus be much smaller than with SPEED, but operations will be slower. Classes will still implement exactly the same public API as they do in SPEED mode. This mode is most useful in apps that contain a very large number of .proto files and do not need all of them to be blindingly fast.
    • LITE_RUNTIME: The protocol buffer compiler will generate classes that depend only on the “lite” runtime library (libprotobuf-lite instead of libprotobuf). The lite runtime is much smaller than the full library (around an order of magnitude smaller) but omits certain features like descriptors and reflection. This is particularly useful for apps running on constrained platforms like mobile phones. The compiler will still generate fast implementations of all methods as it does in SPEED mode. Generated classes will only implement the MessageLite interface in each language, which provides only a subset of the methods of the full Message interface.

    option optimize_for = CODE_SIZE;
  • cc_generic_services, java_generic_services, py_generic_services (file options): Generic services are deprecated. Whether or not the protocol buffer compiler should generate abstract service code based on services definitions in C++, Java, and Python, respectively. For legacy reasons, these default to true. However, as of version 2.3.0 (January 2010), it is considered preferable for RPC implementations to provide code generator plugins to generate code more specific to each system, rather than rely on the “abstract” services.

    // This file relies on plugins to generate service code.
    option cc_generic_services = false;
    option java_generic_services = false;
    option py_generic_services = false;
  • cc_enable_arenas (file option): Enables arena allocation for C++ generated code.

  • objc_class_prefix (file option): Sets the Objective-C class prefix which is prepended to all Objective-C generated classes and enums from this .proto. There is no default. You should use prefixes that are between 3-5 uppercase characters as recommended by Apple. Note that all 2 letter prefixes are reserved by Apple.

  • packed (field option): Defaults to true on a repeated field of a basic numeric type, causing a more compact encoding to be used. To use unpacked wire format, it can be set to false. This provides compatibility with parsers prior to version 2.3.0 (rarely needed) as shown in the following example:

    repeated int32 samples = 4 [packed = false];
  • deprecated (field option): If set to true, indicates that the field is deprecated and should not be used by new code. In most languages this has no actual effect. In Java, this becomes a @Deprecated annotation. For C++, clang-tidy will generate warnings whenever deprecated fields are used. In the future, other language-specific code generators may generate deprecation annotations on the field’s accessors, which will in turn cause a warning to be emitted when compiling code which attempts to use the field. If the field is not used by anyone and you want to prevent new users from using it, consider replacing the field declaration with a reserved statement.

    int32 old_field = 6 [deprecated = true];

Enum Value Options

Enum value options are supported. You can use the deprecated option to indicate that a value shouldn’t be used anymore. You can also create custom options using extensions.

The following example shows the syntax for adding these options:

import "google/protobuf/descriptor.proto";

extend google.protobuf.EnumValueOptions {
  optional string string_name = 123456789;
}

enum Data {
  DATA_UNSPECIFIED = 0;
  DATA_SEARCH = 1 [deprecated = true];
  DATA_DISPLAY = 2 [(string_name) = "display_value"];
}

The C++ code to read the string_name option might look something like this:

const absl::string_view foo = google::protobuf::GetEnumDescriptor<Data>()
    ->FindValueByName("DATA_DISPLAY")->options().GetExtension(string_name);

See Custom Options to see how to apply custom options to enum values and to fields.

Custom Options

Protocol Buffers also allows you to define and use your own options. Note that this is an advanced feature which most people don’t need. If you do think you need to create your own options, see the Proto2 Language Guide for details. Note that creating custom options uses extensions, which are permitted only for custom options in proto3.

Option Retention

Options have a notion of retention, which controls whether an option is retained in the generated code. Options have runtime retention by default, meaning that they are retained in the generated code and are thus visible at runtime in the generated descriptor pool. However, you can set retention = RETENTION_SOURCE to specify that an option (or field within an option) must not be retained at runtime. This is called source retention.

Option retention is an advanced feature that most users should not need to worry about, but it can be useful if you would like to use certain options without paying the code size cost of retaining them in your binaries. Options with source retention are still visible to protoc and protoc plugins, so code generators can use them to customize their behavior.

Retention can be set directly on an option, like this:

extend google.protobuf.FileOptions {
  optional int32 source_retention_option = 1234
      [retention = RETENTION_SOURCE];
}

It can also be set on a plain field, in which case it takes effect only when that field appears inside an option:

message OptionsMessage {
  int32 source_retention_field = 1 [retention = RETENTION_SOURCE];
}

You can set retention = RETENTION_RUNTIME if you like, but this has no effect since it is the default behavior. When a message field is marked RETENTION_SOURCE, its entire contents are dropped; fields inside it cannot override that by trying to set RETENTION_RUNTIME.

Note

As of Protocol Buffers 22.0, support for option retention is still in progress and only C++ and Java are supported. Go has support starting from 1.29.0. Python support is complete but has not made it into a release yet.

Option Targets

Fields have a targets option which controls the types of entities that the field may apply to when used as an option. For example, if a field has targets = TARGET_TYPE_MESSAGE then that field cannot be set in a custom option on an enum (or any other non-message entity). Protoc enforces this and will raise an error if there is a violation of the target constraints.

At first glance, this feature may seem unnecessary given that every custom option is an extension of the options message for a specific entity, which already constrains the option to that one entity. However, option targets are useful in the case where you have a shared options message applied to multiple entity types and you want to control the usage of individual fields in that message. For example:

message MyOptions {
  string file_only_option = 1 [targets = TARGET_TYPE_FILE];
  int32 message_and_enum_option = 2 [targets = TARGET_TYPE_MESSAGE,
                                     targets = TARGET_TYPE_ENUM];
}

extend google.protobuf.FileOptions {
  optional MyOptions file_options = 50000;
}

extend google.protobuf.MessageOptions {
  optional MyOptions message_options = 50000;
}

extend google.protobuf.EnumOptions {
  optional MyOptions enum_options = 50000;
}

// OK: this field is allowed on file options
option (file_options).file_only_option = "abc";

message MyMessage {
  // OK: this field is allowed on both message and enum options
  option (message_options).message_and_enum_option = 42;
}

enum MyEnum {
  MY_ENUM_UNSPECIFIED = 0;
  // Error: file_only_option cannot be set on an enum.
  option (enum_options).file_only_option = "xyz";
}
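
The constraint in the example above amounts to a membership test per field. The Python sketch below models it for the two MyOptions fields. It is not how protoc is implemented: the TARGETS table, the short entity names, and check_target are all made up for illustration, and a field absent from the table stands for one with no targets set.

```python
from typing import Optional

# Allowed targets for each field of the MyOptions message in the example.
# A field with no entry here has no targets constraint.
TARGETS = {
    "file_only_option": {"FILE"},
    "message_and_enum_option": {"MESSAGE", "ENUM"},
}


def check_target(field: str, entity: str) -> Optional[str]:
    """Return an error string if `field` may not be set on `entity`, else None."""
    allowed = TARGETS.get(field)
    if allowed is None or entity in allowed:
        return None
    return f"{field} cannot be set on {entity}; allowed: {sorted(allowed)}"


print(check_target("file_only_option", "FILE"))            # None
print(check_target("message_and_enum_option", "MESSAGE"))  # None
print(check_target("file_only_option", "ENUM"))
# file_only_option cannot be set on ENUM; allowed: ['FILE']
```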

Generating Your Classes

To generate the Java, Kotlin, Python, C++, Go, Ruby, Objective-C, or C# code that you need to work with the message types defined in a .proto file, you need to run the protocol buffer compiler protoc on the .proto file. If you haven’t installed the compiler, download the package and follow the instructions in the README. For Go, you also need to install a special code generator plugin for the compiler; you can find this and installation instructions in the golang/protobuf repository on GitHub.

The Protocol Compiler is invoked as follows:

protoc --proto_path=IMPORT_PATH --cpp_out=DST_DIR --java_out=DST_DIR --python_out=DST_DIR --go_out=DST_DIR --ruby_out=DST_DIR --objc_out=DST_DIR --csharp_out=DST_DIR path/to/file.proto
  • IMPORT_PATH specifies a directory in which to look for .proto files when resolving import directives. If omitted, the current directory is used. Multiple import directories can be specified by passing the --proto_path option multiple times. -I=_IMPORT_PATH_ can be used as a short form of --proto_path.

Note: File paths relative to their proto_path must be globally unique in a given binary. For example, if you have proto/lib1/data.proto and proto/lib2/data.proto, those two files cannot be used together with -I=proto/lib1 -I=proto/lib2 because it would be ambiguous which file import "data.proto" will mean. Instead -Iproto/ should be used and the global names will be lib1/data.proto and lib2/data.proto.

If you are publishing a library and other users may use your messages directly, you should include a unique library name in the path that they are expected to be used under to avoid file name collisions. If you have multiple directories in one project, it is best practice to prefer setting one -I to a top level directory of the project.

  • You can provide one or more output directives:

    As an extra convenience, if the DST_DIR ends in .zip or .jar, the compiler will write the output to a single ZIP-format archive file with the given name. .jar outputs will also be given a manifest file as required by the Java JAR specification. Note that if the output archive already exists, it will be overwritten.

  • You must provide one or more .proto files as input. Multiple .proto files can be specified at once. Although the files are named relative to the current directory, each file must reside in one of the IMPORT_PATHs so that the compiler can determine its canonical name.
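
The uniqueness requirement from the note above can be checked mechanically. The Python sketch below computes each file's name relative to the first import path that contains it and reports names claimed by more than one file. The function names are hypothetical, and treating the first matching import path as the one that determines the name is an assumption of this sketch.

```python
from pathlib import PurePosixPath


def canonical_names(files: list[str], import_paths: list[str]) -> dict[str, list[str]]:
    """Map each import-relative name to the files that would claim it."""
    names: dict[str, list[str]] = {}
    for file in files:
        path = PurePosixPath(file)
        for root in import_paths:
            try:
                relative = path.relative_to(root)
            except ValueError:
                continue  # this file is not under this import path
            names.setdefault(str(relative), []).append(file)
            break
    return names


def collisions(names: dict[str, list[str]]) -> dict[str, list[str]]:
    """Keep only the names claimed by more than one file."""
    return {name: owners for name, owners in names.items() if len(owners) > 1}


files = ["proto/lib1/data.proto", "proto/lib2/data.proto"]

# Equivalent of -I=proto/lib1 -I=proto/lib2: both files become "data.proto".
print(collisions(canonical_names(files, ["proto/lib1", "proto/lib2"])))
# {'data.proto': ['proto/lib1/data.proto', 'proto/lib2/data.proto']}

# Equivalent of -Iproto/: the names stay distinct.
print(sorted(canonical_names(files, ["proto"])))
# ['lib1/data.proto', 'lib2/data.proto']
```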

File location

Prefer not to put .proto files in the same directory as other language sources. Consider creating a subpackage proto for .proto files, under the root package for your project.

Location Should be Language-agnostic

When working with Java code, it’s handy to put related .proto files in the same directory as the Java source. However, if any non-Java code ever uses the same protos, the path prefix will no longer make sense. So in general, put the protos in a related language-agnostic directory such as //myteam/mypackage.

The exception to this rule is when it’s clear that the protos will be used only in a Java context, such as for testing.

Supported Platforms

For information about: