Overview

Protocol Buffers are a language-neutral, platform-neutral extensible mechanism for serializing structured data.

It’s like JSON, except it’s smaller and faster, and it generates native language bindings. You define how you want your data to be structured once, then you can use special generated source code to easily write and read your structured data to and from a variety of data streams and using a variety of languages.

Protocol buffers are a combination of the definition language (created in .proto files), the code that the proto compiler generates to interface with data, language-specific runtime libraries, the serialization format for data that is written to a file (or sent across a network connection), and the serialized data.

What Problems do Protocol Buffers Solve?

Protocol buffers provide a serialization format for packets of typed, structured data that are up to a few megabytes in size. The format is suitable for both ephemeral network traffic and long-term data storage. Protocol buffers can be extended with new information without invalidating existing data or requiring code to be updated.

Protocol buffers are the most commonly used data format at Google. They are used extensively in inter-server communications as well as for archival storage of data on disk. Protocol buffer messages and services are described by engineer-authored .proto files. The following shows an example message:

    edition = "2023";

    message Person {
      string name = 1;
      int32 id = 2;
      string email = 3;
    }

The proto compiler is invoked at build time on .proto files to generate code in various programming languages (covered in Cross-language Compatibility later in this topic) to manipulate the corresponding protocol buffer. Each generated class contains simple accessors for each field and methods to serialize and parse the whole structure to and from raw bytes. The following shows you an example that uses those generated methods:

    Person john = Person.newBuilder()
        .setId(1234)
        .setName("John Doe")
        .setEmail("jdoe@example.com")
        .build();
    output = new FileOutputStream(args[0]);
    john.writeTo(output);

Because protocol buffers are used extensively across all manner of services at Google and data within them may persist for some time, maintaining backwards compatibility is crucial. Protocol buffers allow for the seamless support of changes, including the addition of new fields and the deletion of existing fields, to any protocol buffer without breaking existing services. For more on this topic, see Updating Proto Definitions Without Updating Code, later in this topic.

What are the Benefits of Using Protocol Buffers?

Protocol buffers are ideal for any situation in which you need to serialize structured, record-like, typed data in a language-neutral, platform-neutral, extensible manner. They are most often used for defining communications protocols (together with gRPC) and for data storage.

Some of the advantages of using protocol buffers include:

  • Compact data storage
  • Fast parsing
  • Availability in many programming languages
  • Optimized functionality through automatically-generated classes
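
The compact-storage point in the list above can be made concrete with a small sketch. The class below is hand-written for illustration and is not the generated Person class or the protobuf runtime; the name PersonWireSketch and its helper methods are hypothetical. It assumes the documented binary wire format (a key of field number and wire type, followed by a varint or a length-prefixed payload) and encodes the Person example by hand so its size can be compared with an equivalent JSON text.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;

// Illustrative only: real code would call the generated Person class instead.
final class PersonWireSketch {
    // Base-128 varint: 7 payload bits per byte, high bit set on every byte but the last.
    static void writeVarint(ByteArrayOutputStream out, long value) {
        while ((value & ~0x7FL) != 0) {
            out.write((int) ((value & 0x7F) | 0x80));
            value >>>= 7;
        }
        out.write((int) value);
    }

    // A field key is (field_number << 3) | wire_type.
    static void writeTag(ByteArrayOutputStream out, int fieldNumber, int wireType) {
        writeVarint(out, ((long) fieldNumber << 3) | wireType);
    }

    // Wire type 2 (length-delimited): key, byte length, then the UTF-8 bytes.
    static void writeString(ByteArrayOutputStream out, int fieldNumber, String value) {
        byte[] utf8 = value.getBytes(StandardCharsets.UTF_8);
        writeTag(out, fieldNumber, 2);
        writeVarint(out, utf8.length);
        out.write(utf8, 0, utf8.length);
    }

    // Wire type 0 (varint): key, then the value as a varint.
    static void writeInt32(ByteArrayOutputStream out, int fieldNumber, int value) {
        writeTag(out, fieldNumber, 0);
        writeVarint(out, value); // widening to long sign-extends negative values
    }

    static byte[] encodePerson(String name, int id, String email) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        writeString(out, 1, name);   // string name = 1;
        writeInt32(out, 2, id);      // int32 id = 2;
        writeString(out, 3, email);  // string email = 3;
        return out.toByteArray();
    }

    public static void main(String[] args) {
        byte[] wire = encodePerson("John Doe", 1234, "jdoe@example.com");
        String json = "{\"name\":\"John Doe\",\"id\":1234,\"email\":\"jdoe@example.com\"}";
        System.out.println("binary: " + wire.length + " bytes, JSON text: "
            + json.getBytes(StandardCharsets.UTF_8).length + " bytes");
    }
}
```

Field names never appear in the binary form, only field numbers, which accounts for most of the size difference in this small case.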

Cross-language Compatibility

The same messages can be read by code written in any supported programming language. You can have a Java program on one platform capture data from one software system, serialize it based on a .proto definition, and then extract specific values from that serialized data in a separate Python application running on another platform.

The following languages are supported directly in the protocol buffers compiler, protoc:

The following languages are supported by Google, but the projects’ source code resides in GitHub repositories. The protoc compiler uses plugins for these languages:

Additional languages are not directly supported by Google, but rather by other GitHub projects. These languages are covered in Third-Party Add-ons for Protocol Buffers.

Cross-project Support

You can use protocol buffers across projects by defining message types in .proto files that reside outside of a specific project’s code base. If you’re defining message types or enums that you anticipate will be widely used outside of your immediate team, you can put them in their own file with no dependencies.

A couple of examples of proto definitions widely used within Google are timestamp.proto and status.proto.

Updating Proto Definitions Without Updating Code

It’s standard for software products to be backward compatible, but it is less common for them to be forward compatible. As long as you follow some simple practices when updating .proto definitions, old code will read new messages without issues, ignoring any newly added fields. To the old code, fields that were deleted will have their default value, and deleted repeated fields will be empty. For information on what “repeated” fields are, see Protocol Buffers Definition Syntax later in this topic.

New code will also transparently read old messages. New fields will not be present in old messages; in these cases protocol buffers provide a reasonable default value.
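
The following sketch illustrates both directions of compatibility described above. It is a hand-written reader, not the protobuf runtime: the class name OldReaderSketch is hypothetical, and it assumes the documented wire types (0 for varint, 1 for 64-bit, 2 for length-delimited, 5 for 32-bit). The reader knows only the three Person fields, skips any field number it does not recognize, and leaves defaults in place for fields that are missing from the input.

```java
import java.nio.charset.StandardCharsets;

// Illustrative reader for the Person message; unknown fields are skipped, not rejected.
final class OldReaderSketch {
    String name = "";      // defaults stand in for fields absent from the input
    int id = 0;
    String email = "";
    int unknownFields = 0; // count of fields this reader skipped

    private final byte[] buf;
    private int pos = 0;

    private OldReaderSketch(byte[] buf) { this.buf = buf; }

    private long readVarint() {
        long result = 0;
        for (int shift = 0; ; shift += 7) {
            byte b = buf[pos++];
            result |= (long) (b & 0x7F) << shift;
            if ((b & 0x80) == 0) return result;
        }
    }

    private String readString() {
        int len = (int) readVarint();
        String s = new String(buf, pos, len, StandardCharsets.UTF_8);
        pos += len;
        return s;
    }

    // The wire type alone says how many bytes to skip, so no schema is needed here.
    private void skip(int wireType) {
        switch (wireType) {
            case 0: readVarint(); break;
            case 1: pos += 8; break;
            case 2: { int len = (int) readVarint(); pos += len; break; }
            case 5: pos += 4; break;
            default: throw new IllegalArgumentException("unsupported wire type " + wireType);
        }
        unknownFields++;
    }

    static OldReaderSketch parse(byte[] data) {
        OldReaderSketch r = new OldReaderSketch(data);
        while (r.pos < data.length) {
            long tag = r.readVarint();
            int field = (int) (tag >>> 3);
            int wireType = (int) (tag & 7);
            if (field == 1 && wireType == 2) r.name = r.readString();
            else if (field == 2 && wireType == 0) r.id = (int) r.readVarint();
            else if (field == 3 && wireType == 2) r.email = r.readString();
            else r.skip(wireType);
        }
        return r;
    }

    public static void main(String[] args) {
        // name = "Al", id = 7, plus a hypothetical newer field 4 (a string) this reader never heard of.
        byte[] newer = {0x0A, 2, 'A', 'l', 0x10, 7, 0x22, 3, '5', '5', '5'};
        OldReaderSketch r = parse(newer);
        System.out.println(r.name + " " + r.id + " email='" + r.email + "' skipped=" + r.unknownFields);
    }
}
```

Skipping works because every field carries its wire type in the key, so a reader can step over a value without knowing what it means.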

When are Protocol Buffers not a Good Fit?

Protocol buffers do not fit all data. In particular:

  • Protocol buffers tend to assume that entire messages can be loaded into memory at once and are not larger than an object graph. For data that exceeds a few megabytes, consider a different solution; when working with larger data, you may effectively end up with several copies of the data due to serialized copies, which can cause surprising spikes in memory usage.
  • When protocol buffers are serialized, the same data can have many different binary serializations. You cannot compare two messages for equality without fully parsing them.
  • Messages are not compressed. While messages can be zipped or gzipped like any other file, special-purpose compression algorithms like the ones used by JPEG and PNG will produce much smaller files for data of the appropriate type.
  • Protocol buffer messages are less than maximally efficient in both size and speed for many scientific and engineering uses that involve large, multi-dimensional arrays of floating point numbers. For these applications, FITS and similar formats have less overhead.
  • Protocol buffers are not well supported in non-object-oriented languages popular in scientific computing, such as Fortran and IDL.
  • Protocol buffer messages don’t inherently self-describe their data, but they have a fully reflective schema that you can use to implement self-description. That is, you cannot fully interpret one without access to its corresponding .proto file.
  • Protocol buffers are not a formal standard of any organization. This makes them unsuitable for use in environments with legal or other requirements to build on top of standards.
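
The second point in the list above, that one message can have several binary serializations, can be sketched as follows. This is a hand-written decoder for illustration, not the protobuf runtime; the class name FieldOrderSketch is hypothetical, and it handles only varint and length-delimited fields. It relies on the wire format allowing fields to appear in any order, so two different byte sequences decode to the same logical content.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.Map;
import java.util.TreeMap;

// Illustrative only: decodes wire types 0 (varint) and 2 (length-delimited) into a map.
final class FieldOrderSketch {
    static long readVarint(byte[] buf, int[] pos) {
        long result = 0;
        for (int shift = 0; ; shift += 7) {
            byte b = buf[pos[0]++];
            result |= (long) (b & 0x7F) << shift;
            if ((b & 0x80) == 0) return result;
        }
    }

    // Returns field number -> decoded value (Long for varints, String for byte payloads).
    static Map<Integer, Object> decode(byte[] buf) {
        Map<Integer, Object> fields = new TreeMap<>();
        int[] pos = {0};
        while (pos[0] < buf.length) {
            long tag = readVarint(buf, pos);
            int field = (int) (tag >>> 3);
            int wireType = (int) (tag & 7);
            if (wireType == 0) {
                fields.put(field, readVarint(buf, pos));
            } else if (wireType == 2) {
                int len = (int) readVarint(buf, pos);
                fields.put(field, new String(buf, pos[0], len, StandardCharsets.UTF_8));
                pos[0] += len;
            } else {
                throw new IllegalArgumentException("unsupported wire type " + wireType);
            }
        }
        return fields;
    }

    public static void main(String[] args) {
        byte[] nameFirst = {0x0A, 2, 'A', 'l', 0x10, 7}; // name = "Al", then id = 7
        byte[] idFirst = {0x10, 7, 0x0A, 2, 'A', 'l'};   // same fields, opposite order
        System.out.println("bytes equal:   " + Arrays.equals(nameFirst, idFirst));
        System.out.println("content equal: " + decode(nameFirst).equals(decode(idFirst)));
    }
}
```

A byte-for-byte comparison of the two arrays reports a difference even though the decoded content matches, which is why equality checks need a full parse.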

Who Uses Protocol Buffers?

Many projects use protocol buffers, including the following:

How do Protocol Buffers Work?

The following diagram shows how you use protocol buffers to work with your data.

[Figure 1. Protocol buffers workflow: compilation workflow showing the creation of a proto file, generated code, and compiled classes.]

The code generated by protocol buffers provides utility methods to retrieve data from files and streams, extract individual values from the data, check if data exists, serialize data back to a file or stream, and other useful functions.

The following code samples show you an example of this flow in Java. As shown earlier, this is a .proto definition:

    message Person {
      string name = 1;
      int32 id = 2;
      string email = 3;
    }

Compiling this .proto file creates a Builder class that you can use to create new instances, as in the following Java code:

    Person john = Person.newBuilder()
        .setId(1234)
        .setName("John Doe")
        .setEmail("jdoe@example.com")
        .build();
    output = new FileOutputStream(args[0]);
    john.writeTo(output);

You can then deserialize data using the methods protocol buffers creates in other languages, like C++:

    Person john;
    fstream input(argv[1], ios::in | ios::binary);
    john.ParseFromIstream(&input);
    int id = john.id();
    std::string name = john.name();
    std::string email = john.email();

Protocol Buffers Definition Syntax

When defining .proto files, you can specify cardinality (singular or repeated). In proto2 and proto3, you can also specify if the field is optional. In proto3, setting a field to optional changes it from implicit presence to explicit presence.

After setting the cardinality of a field, you specify the data type. Protocol buffers support the usual primitive data types, such as integers, booleans, and floats. For the full list, see Scalar Value Types.

A field can also be of:

  • A message type, so that you can nest parts of the definition, such as for repeating sets of data.
  • An enum type, so you can specify a set of values to choose from.
  • A oneof type, which you can use when a message has many optional fields and at most one field will be set at the same time.
  • A map type, to add key-value pairs to your definition.

Messages can allow extensions to define fields outside of the message itself. For example, the protobuf library’s internal message schema allows extensions for custom, usage-specific options.

For more information about the options available, see the language guide for proto2, proto3, or edition 2023.

After setting cardinality and data type, you choose a name for the field. There are some things to keep in mind when setting field names:

  • It can sometimes be difficult, or even impossible, to change field names after they’ve been used in production.
  • Field names cannot contain dashes. For more on field name syntax, see Message and Field Names.
  • Use pluralized names for repeated fields.

After assigning a name to the field, you assign a field number. Field numbers cannot be repurposed or reused. If you delete a field, you should reserve its field number to prevent someone from accidentally reusing the number.
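
A short sketch can show why a reused field number is hazardous. On the wire a field is identified only by its number and wire type, never by its name, so bytes written under one meaning of a number are silently read under whatever meaning the reader's schema assigns. The class FieldNumberSketch, the schema maps, and the field name nickname are hypothetical and exist only for this illustration.

```java
import java.util.HashMap;
import java.util.Map;

// Illustrative only: models a schema as a map from field number to field name.
final class FieldNumberSketch {
    // The key written before each value: (field_number << 3) | wire_type.
    static int tag(int fieldNumber, int wireType) {
        return (fieldNumber << 3) | wireType;
    }

    static int fieldNumberOf(int tag) {
        return tag >>> 3;
    }

    // Resolves a key from the wire against whatever schema the reader was built with.
    static String nameFor(Map<Integer, String> schema, int tag) {
        return schema.getOrDefault(fieldNumberOf(tag), "<unknown, skipped>");
    }

    public static void main(String[] args) {
        Map<Integer, String> original = new HashMap<>();
        original.put(1, "name");
        original.put(2, "id");
        original.put(3, "email");

        // Hypothetical later schema: email was deleted and number 3 was reused.
        Map<Integer, String> reused = new HashMap<>(original);
        reused.put(3, "nickname");

        // Hypothetical later schema: email was deleted and number 3 stays reserved.
        Map<Integer, String> reserved = new HashMap<>(original);
        reserved.remove(3);

        int emailKey = tag(3, 2); // key stored in data written under the original schema
        System.out.println("original reader: " + nameFor(original, emailKey));
        System.out.println("reused number:   " + nameFor(reused, emailKey));
        System.out.println("reserved number: " + nameFor(reserved, emailKey));
    }
}
```

With the number reserved, old email data is treated as an unknown field; with the number reused, it would be presented as a nickname.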

Additional Data Type Support

Protocol buffers support many scalar value types, including integers that use both variable-length encoding and fixed sizes. You can also create your own composite data types by defining messages that are, themselves, data types that you can assign to a field. In addition to the simple and composite value types, several common types are published.
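
The difference between variable-length and fixed-size integers can be sketched by counting bytes. The helper below is hand-written for illustration (the class name VarintSizeSketch is hypothetical) and assumes the base-128 varint scheme, in which each byte carries 7 bits of the value; a fixed 32-bit field always occupies 4 bytes regardless of the value.

```java
// Illustrative only: computes how many bytes a value needs as a base-128 varint.
final class VarintSizeSketch {
    static final int FIXED32_SIZE = 4; // fixed-width 32-bit fields always use 4 bytes

    // Each varint byte carries 7 bits of the value, so small numbers stay small.
    static int varintSize(long value) {
        int size = 1;
        while ((value & ~0x7FL) != 0) {
            size++;
            value >>>= 7;
        }
        return size;
    }

    public static void main(String[] args) {
        long[] samples = {1, 300, 1234, 0xFFFFFFFFL, -1};
        for (long v : samples) {
            System.out.println(v + ": varint " + varintSize(v)
                + " bytes, fixed32 " + FIXED32_SIZE + " bytes");
        }
    }
}
```

Small values are cheaper as varints, while values that routinely use most of their 32 bits, or negative numbers stored as plain varints, can cost more than a fixed-size field.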

History

To read about the history of the protocol buffers project, see History of Protocol Buffers.

Protocol Buffers Open Source Philosophy

Protocol buffers were open sourced in 2008 as a way to provide developers outside of Google with the same benefits that we derive from them internally. We support the open source community through regular updates to the language as we make those changes to support our internal requirements. While we accept select pull requests from external developers, we cannot always prioritize feature requests and bug fixes that don’t conform to Google’s specific needs.

Developer Community

To be alerted to upcoming changes in Protocol Buffers and to connect with protobuf developers and users, join the Google Group.

Additional Resources