Export data as Protobuf columns
This document describes how you can export BigQuery data as Protocol Buffers (Protobuf) columns by using BigQuery user-defined functions (UDFs).
When to use Protobuf columns
BigQuery offers a number of built-in functions to format selected data. One option is to merge multiple column values into a single Protobuf value, which has the following benefits:
- Object type safety.
- Improved compression, data transfer time, and cost as compared with JSON.
- Flexibility, as most programming languages have libraries to handle Protobuf.
- Less overhead when reading from multiple columns and building a single object.
While other column types can also provide type safety, using Protobuf columns provides a fully typed object, which can reduce the amount of work that needs to be done on the application layer or on another part of the pipeline.
However, there are limitations to exporting BigQuery data as Protobuf columns:
- Protobuf columns are not well indexed or filtered. Searching by the content of the Protobuf columns can be less effective.
- Sorting data in Protobuf format can be difficult.
If these limitations apply to the export workflow, you might consider other methods of exporting BigQuery data:
- Use scheduled queries with `EXPORT DATA` statements to sort the exported BigQuery data by date or time, and to schedule exports on a recurring basis. BigQuery supports exporting data into Avro, CSV, JSON, and Parquet formats (see the sketch after this list).
- Use Dataflow to export BigQuery data in either the Avro or CSV file format.
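As an illustrative sketch, an `EXPORT DATA` statement that writes query results to Cloud Storage as Avro files might look like the following. The bucket path and the query are placeholders rather than values from this guide:

```sql
-- Illustrative sketch: export query results as Avro files to Cloud Storage.
-- Replace the bucket path and the query with your own values.
EXPORT DATA
  OPTIONS (
    uri = 'gs://BUCKET_NAME/exports/shakespeare-*.avro',
    format = 'AVRO',
    overwrite = true)
AS (
  SELECT word, word_count
  FROM `bigquery-public-data.samples.shakespeare`
  ORDER BY word_count DESC
);
```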
Required roles
To get the permissions that you need to export BigQuery data as Protobuf columns, ask your administrator to grant you the following IAM roles on your project:
- Create a user-defined function: BigQuery Data Editor (`roles/bigquery.dataEditor`)
- Export data from a BigQuery table: BigQuery Data Viewer (`roles/bigquery.dataViewer`)
- Read and upload files to Cloud Storage: Storage Object Creator (`roles/storage.objectCreator`)
For more information about granting roles, see Manage access to projects, folders, and organizations.
You might also be able to get the required permissions through custom roles or other predefined roles.
Create a UDF
Create a UDF that converts a BigQuery `STRUCT` data type into a Protobuf column:
1. In a command line, clone the `bigquery-utils.git` repository:

   ```
   git clone https://github.com/GoogleCloudPlatform/bigquery-utils.git
   ```

2. Navigate to the Protobuf export folder:

   ```
   cd bigquery-utils/tools/protobuf_export
   ```

3. Use the `cp` command or your operating system's file browser to copy your proto file to the `./protos` child folder. There is a sample proto file named `dummy.proto` already in the `./protos` folder.

4. Install the necessary packages from the GitHub repository:

   ```
   npm install
   ```

5. Bundle the package by using webpack:

   ```
   npx webpack --config webpack.config.js --stats-error-details
   ```

6. Locate the `pbwrapper.js` file in the `./dist` child folder, and then upload the file to a Cloud Storage bucket.

7. Go to the BigQuery page.
8. Using the query editor, create a UDF named `toMyProtoMessage` that builds a Protobuf column from existing BigQuery table columns:

   ```sql
   CREATE FUNCTION
     DATASET_ID.toMyProtoMessage(input STRUCT<INPUT_FIELDS>)
     RETURNS BYTES
       LANGUAGE js OPTIONS ( library=["gs://BUCKET_NAME/pbwrapper.js"]
   ) AS r"""
   let message = pbwrapper.setup("PROTO_PACKAGE.PROTO_MESSAGE")
   return pbwrapper.parse(message, input)
     """;
   ```

   Replace the following:

   - DATASET_ID: the ID of the dataset to contain the UDF.
   - INPUT_FIELDS: the fields used in the proto message type for the proto file, in the format `field_name_1 field_type_1 [, field_name_2 field_type_2, ...]`. You must translate any message type fields that use underscores to use camel case instead. For example, if the message type looks like the following, then the input fields value must be `itemId int64, itemDescription string`:

     ```
     message ThisMessage {
       int64 item_id = 1;
       string item_description = 2;
     }
     ```

   - BUCKET_NAME: the name of the Cloud Storage bucket that contains the `pbwrapper.js` file.
   - PROTO_PACKAGE: the package for the proto file.
   - PROTO_MESSAGE: the message type for the proto file.
For example, if you use the provided `dummy.proto` file, the `CREATE FUNCTION` statement looks as follows:

```sql
CREATE OR REPLACE FUNCTION
  mydataset.toMyProtoMessage(input STRUCT<dummyField STRING>)
  RETURNS BYTES
    LANGUAGE js OPTIONS ( library=["gs://mybucket/pbwrapper.js"]
) AS r"""
let message = pbwrapper.setup("dummypackage.DummyMessage")
return pbwrapper.parse(message, input)
  """;
```
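The `pbwrapper.setup` call loads the message type by its fully qualified name, so the proto file you copy into `./protos` must define that package and message. As an illustrative sketch (check `./protos/dummy.proto` in the repository for the authoritative definition), a proto file matching the preceding example could look like this:

```proto
// Illustrative sketch only; see ./protos/dummy.proto in the bigquery-utils
// repository for the actual sample definition.
syntax = "proto3";

package dummypackage;

message DummyMessage {
  // Exposed to the UDF as the camel-case field dummyField in the input STRUCT.
  string dummy_field = 1;
}
```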
Format columns as Protobuf values
Run the `toMyProtoMessage` UDF to format BigQuery table columns as Protobuf values:
```sql
SELECT
  UDF_DATASET_ID.toMyProtoMessage(STRUCT(INPUT_COLUMNS)) AS protoResult
FROM
  `PROJECT_ID.DATASET_ID.TABLE_NAME`
LIMIT
  100;
```

Replace the following:

- UDF_DATASET_ID: the ID of the dataset that contains the UDF.
- INPUT_COLUMNS: the names of the columns to format as a Protobuf value, in the format `column_name_1 [, column_name_2, ...]`. Columns can be of any supported scalar value type or non-scalar type, including `ARRAY` and `STRUCT`. Input columns must match the type and number of the proto message type fields.
- PROJECT_ID: the ID of the project that contains the table. You can skip identifying the project if the dataset is in your current project.
- DATASET_ID: the ID of the dataset that contains the table.
- TABLE_NAME: the name of the table that contains the columns to format.
For example, if you use a `toMyProtoMessage` UDF based on `dummy.proto`, the following `SELECT` statement works:

```sql
SELECT
  mydataset.toMyProtoMessage(STRUCT(word)) AS protoResult
FROM
  `bigquery-public-data.samples.shakespeare`
LIMIT
  100;
```

Work with Protobuf values
With the BigQuery data exported in the Protobuf format, you can now work with the data as a fully typed object or struct.
The following code samples provide several examples of ways that you can process or work with the exported data:
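Each sample assumes that you have already generated language bindings for your proto file, for example by running `protoc`. The output paths below are illustrative, and the Go output additionally requires the `protoc-gen-go` plugin:

```
# Illustrative only: generate Python, Java, and Go bindings for a proto file.
protoc --proto_path=protos \
  --python_out=generated/python \
  --java_out=generated/java \
  --go_out=generated/go \
  protos/dummy.proto
```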
Go
```go
// package Main queries Google BigQuery.
package main

import (
	"context"
	"fmt"
	"io"
	"log"
	"os"

	"cloud.google.com/go/bigquery"
	"google.golang.org/api/iterator"
	"google.golang.org/protobuf/proto"

	pb "path/to/proto/file_proto"
)

const (
	projectID = "your-project-id"
)

// Row contains returned row data from bigquery.
type Row struct {
	RowKey string `bigquery:"RowKey"`
	Proto  []byte `bigquery:"ProtoResult"`
}

func main() {
	ctx := context.Background()

	client, err := bigquery.NewClient(ctx, projectID)
	if err != nil {
		log.Fatalf("bigquery.NewClient: %v", err)
	}
	defer client.Close()

	rows, err := query(ctx, client)
	if err != nil {
		log.Fatal(err)
	}
	if err := printResults(os.Stdout, rows); err != nil {
		log.Fatal(err)
	}
}

// query returns a row iterator suitable for reading query results.
func query(ctx context.Context, client *bigquery.Client) (*bigquery.RowIterator, error) {
	query := client.Query(`
SELECT
	concat(word, ":", corpus) as RowKey,
	<dataset-id>.toMyProtoMessage(
		STRUCT(
			word,
			CAST(word_count AS BIGNUMERIC)
		)
	) AS ProtoResult
FROM
	` + "`bigquery-public-data.samples.shakespeare`" + `
LIMIT 100;
`)
	return query.Read(ctx)
}

// printResults prints results from a query.
func printResults(w io.Writer, iter *bigquery.RowIterator) error {
	for {
		var row Row
		err := iter.Next(&row)
		if err == iterator.Done {
			return nil
		}
		if err != nil {
			return fmt.Errorf("error iterating through results: %w", err)
		}

		message := &pb.TestMessage{}
		if err = proto.Unmarshal(row.Proto, message); err != nil {
			return err
		}
		fmt.Fprintf(w, "rowKey: %s, message: %v\n", row.RowKey, message)
	}
}
```

Java
```java
package proto;

import com.google.cloud.bigquery.BigQuery;
import com.google.cloud.bigquery.BigQueryOptions;
import com.google.cloud.bigquery.FieldValueList;
import com.google.cloud.bigquery.Job;
import com.google.cloud.bigquery.JobId;
import com.google.cloud.bigquery.JobInfo;
import com.google.cloud.bigquery.QueryJobConfiguration;
import com.google.cloud.bigquery.TableResult;
import path.to.proto.TestMessage;
import java.util.UUID;

/** Queries Google BigQuery */
public final class Main {
  public static void main(String[] args) throws Exception {
    String projectId = "your-project-id";
    BigQuery bigquery =
        BigQueryOptions.newBuilder().setProjectId(projectId).build().getService();

    QueryJobConfiguration queryConfig =
        QueryJobConfiguration.newBuilder(
                " SELECT "
                    + "concat(word , \":\",corpus) as RowKey,"
                    + "<dataset-id>.toMyProtoMessage(STRUCT(word, "
                    + "CAST(word_count AS BIGNUMERIC))) AS ProtoResult "
                    + "FROM "
                    + "`bigquery-public-data.samples.shakespeare` "
                    + "ORDER BY word_count DESC "
                    + "LIMIT 20")
            .setUseLegacySql(false)
            .build();

    // Create a job ID so that we can safely retry.
    JobId jobId = JobId.of(UUID.randomUUID().toString());
    Job queryJob = bigquery.create(JobInfo.newBuilder(queryConfig).setJobId(jobId).build());

    // Wait for the query to complete.
    queryJob = queryJob.waitFor();

    // Check for errors
    if (queryJob == null) {
      throw new RuntimeException("Job no longer exists");
    } else if (queryJob.getStatus().getError() != null) {
      // You can also look at queryJob.getStatus().getExecutionErrors() for all
      // errors, not just the latest one.
      throw new RuntimeException(queryJob.getStatus().getError().toString());
    }

    // Get the results.
    TableResult result = queryJob.getQueryResults();

    // Print all pages of the results.
    for (FieldValueList row : result.iterateAll()) {
      String key = row.get("RowKey").getStringValue();
      byte[] message = row.get("ProtoResult").getBytesValue();
      TestMessage testMessage = TestMessage.parseFrom(message);
      System.out.printf("rowKey: %s, message: %s\n", key, testMessage);
    }
  }
}
```

Python
"""Queries Google BigQuery."""fromgoogle.cloudimportbigqueryfrompath.to.protoimportawesome_pb2defmain():project_id="your-project-id"client=bigquery.Client(project=project_id)query_job=client.query(query=""" SELECTconcat(word , ":",corpus) as RowKey,<dataset-id>.toMyProtoMessage( STRUCT( word, CAST(word_count AS BIGNUMERIC) ) ) AS ProtoResultFROM `bigquery-public-data.samples.shakespeare`ORDER BY word_count DESCLIMIT 20 """)rows=query_job.result()forrowinrows:message=awesome_pb2.TestMessage()message.ParseFromString(row.get("ProtoResult"))print("rowKey:{}, message:{}".format(row.get("RowKey"),message))Except as otherwise noted, the content of this page is licensed under theCreative Commons Attribution 4.0 License, and code samples are licensed under theApache 2.0 License. For details, see theGoogle Developers Site Policies. Java is a registered trademark of Oracle and/or its affiliates.