Export data as Protobuf columns
This document describes how you can export BigQuery data as Protocol Buffers (Protobuf) columns by using BigQuery user-defined functions (UDFs).
When to use Protobuf columns
BigQuery offers a number of built-in functions to format selected data. One option is to merge multiple column values into a single Protobuf value, which has the following benefits:
- Object type safety.
- Improved compression, data transfer time, and cost as compared with JSON.
- Flexibility, as most programming languages have libraries to handle Protobuf.
- Less overhead when reading from multiple columns and building a single object.
While other column types can also provide type safety, using Protobuf columns provides a fully typed object, which can reduce the amount of work that needs to be done on the application layer or on another part of the pipeline.
However, there are limitations to exporting BigQuery data as Protobuf columns:
- Protobuf columns are not well indexed or filtered. Searching by the content of the Protobuf columns can be less effective.
- Sorting data in Protobuf format can be difficult.
If these limitations apply to the export workflow, you might consider other methods of exporting BigQuery data:
- Use scheduled queries with `EXPORT DATA` statements to sort the exported BigQuery data by date or time, and to schedule exports on a recurring basis. BigQuery supports exporting data into Avro, CSV, JSON, and Parquet formats (see the sketch after this list).
- Use Dataflow to export BigQuery data in either the Avro or CSV file format.
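As an illustrative sketch, an `EXPORT DATA` statement that writes query results to Cloud Storage as Avro files might look like the following. The bucket path and the query are placeholders rather than values from this guide:

```sql
-- Illustrative sketch: export query results as Avro files to Cloud Storage.
-- Replace the bucket path and the query with your own values.
EXPORT DATA
  OPTIONS (
    uri = 'gs://BUCKET_NAME/exports/shakespeare-*.avro',
    format = 'AVRO',
    overwrite = true)
AS (
  SELECT word, word_count
  FROM `bigquery-public-data.samples.shakespeare`
  ORDER BY word_count DESC
);
```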
Required roles
To get the permissions that you need to export BigQuery data as Protobuf columns, ask your administrator to grant you the following IAM roles on your project:
- Create a user-defined function: BigQuery Data Editor (`roles/bigquery.dataEditor`)
- Export data from a BigQuery table: BigQuery Data Viewer (`roles/bigquery.dataViewer`)
- Read and upload files to Cloud Storage: Storage Object Creator (`roles/storage.objectCreator`)
For more information about granting roles, see Manage access to projects, folders, and organizations.
You might also be able to get the required permissions through custom roles or other predefined roles.
Create a UDF
Create a UDF that converts a BigQuery `STRUCT` data type into a Protobuf column:
1. In a command line, clone the `bigquery-utils.git` repository:

   ```
   git clone https://github.com/GoogleCloudPlatform/bigquery-utils.git
   ```

2. Navigate to the Protobuf export folder:

   ```
   cd bigquery-utils/tools/protobuf_export
   ```

3. Use the `cp` command or your operating system's file browser to copy your proto file to the `./protos` child folder. There is a sample proto file named `dummy.proto` already in the `./protos` folder.

4. Install the necessary packages from the GitHub repository:

   ```
   npm install
   ```

5. Bundle the package by using webpack:

   ```
   npx webpack --config webpack.config.js --stats-error-details
   ```

6. Locate the `pbwrapper.js` file in the `./dist` child folder, and then upload the file to a Cloud Storage bucket.

7. Go to the BigQuery page.
8. Using the query editor, create a UDF named `toMyProtoMessage` that builds a Protobuf column from existing BigQuery table columns:

   ```sql
   CREATE FUNCTION
     DATASET_ID.toMyProtoMessage(input STRUCT<INPUT_FIELDS>)
     RETURNS BYTES
       LANGUAGE js OPTIONS ( library=["gs://BUCKET_NAME/pbwrapper.js"]
   ) AS r"""
   let message = pbwrapper.setup("PROTO_PACKAGE.PROTO_MESSAGE")
   return pbwrapper.parse(message, input)
     """;
   ```

   Replace the following:

   - DATASET_ID: the ID of the dataset to contain the UDF.
   - INPUT_FIELDS: the fields used in the proto message type for the proto file, in the format `field_name_1 field_type_1 [, field_name_2 field_type_2, ...]`. You must translate any message type fields that use underscores to use camel case instead. For example, if the message type looks like the following, then the input fields value must be `itemId int64, itemDescription string`:

     ```
     message ThisMessage {
       int64 item_id = 1;
       string item_description = 2;
     }
     ```

   - BUCKET_NAME: the name of the Cloud Storage bucket that contains the `pbwrapper.js` file.
   - PROTO_PACKAGE: the package for the proto file.
   - PROTO_MESSAGE: the message type for the proto file.
For example, if you use the provided `dummy.proto` file, the `CREATE FUNCTION` statement looks as follows:

```sql
CREATE OR REPLACE FUNCTION
  mydataset.toMyProtoMessage(input STRUCT<dummyField STRING>)
  RETURNS BYTES
    LANGUAGE js OPTIONS ( library=["gs://mybucket/pbwrapper.js"]
) AS r"""
let message = pbwrapper.setup("dummypackage.DummyMessage")
return pbwrapper.parse(message, input)
  """;
```
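The `pbwrapper.setup` call loads the message type by its fully qualified name, so the proto file you copy into `./protos` must define that package and message. As an illustrative sketch (check `./protos/dummy.proto` in the repository for the authoritative definition), a proto file matching the preceding example could look like this:

```proto
// Illustrative sketch only; see ./protos/dummy.proto in the bigquery-utils
// repository for the actual sample definition.
syntax = "proto3";

package dummypackage;

message DummyMessage {
  // Exposed to the UDF as the camel-case field dummyField in the input STRUCT.
  string dummy_field = 1;
}
```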
Format columns as Protobuf values
Run the `toMyProtoMessage` UDF to format BigQuery table columns as Protobuf values:
```sql
SELECT
  UDF_DATASET_ID.toMyProtoMessage(STRUCT(INPUT_COLUMNS)) AS protoResult
FROM
  `PROJECT_ID.DATASET_ID.TABLE_NAME`
LIMIT
  100;
```

Replace the following:

- UDF_DATASET_ID: the ID of the dataset that contains the UDF.
- INPUT_COLUMNS: the names of the columns to format as a Protobuf value, in the format `column_name_1 [, column_name_2, ...]`. Columns can be of any supported scalar value type or non-scalar type, including `ARRAY` and `STRUCT`. Input columns must match the type and number of the proto message type fields.
- PROJECT_ID: the ID of the project that contains the table. You can skip identifying the project if the dataset is in your current project.
- DATASET_ID: the ID of the dataset that contains the table.
- TABLE_NAME: the name of the table that contains the columns to format.
For example, if you use a `toMyProtoMessage` UDF based on `dummy.proto`, the following `SELECT` statement works:

```sql
SELECT
  mydataset.toMyProtoMessage(STRUCT(word)) AS protoResult
FROM
  `bigquery-public-data.samples.shakespeare`
LIMIT
  100;
```

Work with Protobuf values
With the BigQuery data exported in the Protobuf format, you can now work with the data as a fully typed object or struct.
The following code samples provide several examples of ways that you can process or work with the exported data:
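Each sample assumes that you have already generated language bindings for your proto file, for example by running `protoc`. The output paths below are illustrative, and the Go output additionally requires the `protoc-gen-go` plugin:

```
# Illustrative only: generate Python, Java, and Go bindings for a proto file.
protoc --proto_path=protos \
  --python_out=generated/python \
  --java_out=generated/java \
  --go_out=generated/go \
  protos/dummy.proto
```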
Go
```go
// package Main queries Google BigQuery.
package main

import (
	"context"
	"fmt"
	"io"
	"log"
	"os"

	"cloud.google.com/go/bigquery"
	"google.golang.org/api/iterator"
	"google.golang.org/protobuf/proto"

	pb "path/to/proto/file_proto"
)

const (
	projectID = "your-project-id"
)

// Row contains returned row data from bigquery.
type Row struct {
	RowKey string `bigquery:"RowKey"`
	Proto  []byte `bigquery:"ProtoResult"`
}

func main() {
	ctx := context.Background()

	client, err := bigquery.NewClient(ctx, projectID)
	if err != nil {
		log.Fatalf("bigquery.NewClient: %v", err)
	}
	defer client.Close()

	rows, err := query(ctx, client)
	if err != nil {
		log.Fatal(err)
	}
	if err := printResults(os.Stdout, rows); err != nil {
		log.Fatal(err)
	}
}

// query returns a row iterator suitable for reading query results.
func query(ctx context.Context, client *bigquery.Client) (*bigquery.RowIterator, error) {
	query := client.Query(`
SELECT
	concat(word, ":", corpus) as RowKey,
	<dataset-id>.toMyProtoMessage(
		STRUCT(
			word,
			CAST(word_count AS BIGNUMERIC)
		)
	) AS ProtoResult
FROM
	` + "`bigquery-public-data.samples.shakespeare`" + `
LIMIT 100;
`)
	return query.Read(ctx)
}

// printResults prints results from a query.
func printResults(w io.Writer, iter *bigquery.RowIterator) error {
	for {
		var row Row
		err := iter.Next(&row)
		if err == iterator.Done {
			return nil
		}
		if err != nil {
			return fmt.Errorf("error iterating through results: %w", err)
		}

		message := &pb.TestMessage{}
		if err = proto.Unmarshal(row.Proto, message); err != nil {
			return err
		}
		fmt.Fprintf(w, "rowKey: %s, message: %v\n", row.RowKey, message)
	}
}
```

Java
```java
package proto;

import com.google.cloud.bigquery.BigQuery;
import com.google.cloud.bigquery.BigQueryOptions;
import com.google.cloud.bigquery.FieldValueList;
import com.google.cloud.bigquery.Job;
import com.google.cloud.bigquery.JobId;
import com.google.cloud.bigquery.JobInfo;
import com.google.cloud.bigquery.QueryJobConfiguration;
import com.google.cloud.bigquery.TableResult;
import path.to.proto.TestMessage;
import java.util.UUID;

/** Queries Google BigQuery */
public final class Main {
  public static void main(String[] args) throws Exception {
    String projectId = "your-project-id";
    BigQuery bigquery =
        BigQueryOptions.newBuilder().setProjectId(projectId).build().getService();

    QueryJobConfiguration queryConfig =
        QueryJobConfiguration.newBuilder(
                " SELECT "
                    + "concat(word , \":\",corpus) as RowKey,"
                    + "<dataset-id>.toMyProtoMessage(STRUCT(word, "
                    + "CAST(word_count AS BIGNUMERIC))) AS ProtoResult "
                    + "FROM "
                    + "`bigquery-public-data.samples.shakespeare` "
                    + "ORDER BY word_count DESC "
                    + "LIMIT 20")
            .setUseLegacySql(false)
            .build();

    // Create a job ID so that we can safely retry.
    JobId jobId = JobId.of(UUID.randomUUID().toString());
    Job queryJob = bigquery.create(JobInfo.newBuilder(queryConfig).setJobId(jobId).build());

    // Wait for the query to complete.
    queryJob = queryJob.waitFor();

    // Check for errors
    if (queryJob == null) {
      throw new RuntimeException("Job no longer exists");
    } else if (queryJob.getStatus().getError() != null) {
      // You can also look at queryJob.getStatus().getExecutionErrors() for all
      // errors, not just the latest one.
      throw new RuntimeException(queryJob.getStatus().getError().toString());
    }

    // Get the results.
    TableResult result = queryJob.getQueryResults();

    // Print all pages of the results.
    for (FieldValueList row : result.iterateAll()) {
      String key = row.get("RowKey").getStringValue();
      byte[] message = row.get("ProtoResult").getBytesValue();
      TestMessage testMessage = TestMessage.parseFrom(message);
      System.out.printf("rowKey: %s, message: %s\n", key, testMessage);
    }
  }
}
```

Python
"""Queries Google BigQuery."""fromgoogle.cloudimportbigqueryfrompath.to.protoimportawesome_pb2defmain():project_id="your-project-id"client=bigquery.Client(project=project_id)query_job=client.query(query=""" SELECTconcat(word , ":",corpus) as RowKey,<dataset-id>.toMyProtoMessage( STRUCT( word, CAST(word_count AS BIGNUMERIC) ) ) AS ProtoResultFROM `bigquery-public-data.samples.shakespeare`ORDER BY word_count DESCLIMIT 20 """)rows=query_job.result()forrowinrows:message=awesome_pb2.TestMessage()message.ParseFromString(row.get("ProtoResult"))print("rowKey:{}, message:{}".format(row.get("RowKey"),message))Except as otherwise noted, the content of this page is licensed under theCreative Commons Attribution 4.0 License, and code samples are licensed under theApache 2.0 License. For details, see theGoogle Developers Site Policies. Java is a registered trademark of Oracle and/or its affiliates.