BigQuery

Use the BigQuery connector to perform insert, delete, update, and read operations on Google BigQuery data. You can also execute custom SQL queries against BigQuery data. You can use the BigQuery connector to integrate data from other Google Cloud services, such as Cloud Storage, or from third-party services such as Amazon S3.

Before you begin

In your Google Cloud project, do the following tasks:

  • Ensure that network connectivity is set up. For information about network patterns, see Network connectivity.
  • Grant the roles/connectors.admin IAM role to the user configuring the connector.
  • Grant the roles/bigquery.dataEditor IAM role to the service account that you want to use for the connector. If you don't have a service account, you must create one. The connector and the service account must belong to the same project.
  • Enable the following services:
    • secretmanager.googleapis.com (Secret Manager API)
    • connectors.googleapis.com (Connectors API)

    To understand how to enable services, see Enabling services. If these services or permissions were not previously enabled for your project, you are prompted to enable them when you configure the connector.

Create a BigQuery connection

A connection is specific to a data source. This means that if you have many data sources, you must create a separate connection for each data source. To create a connection, do the following:

  1. In the Cloud console, go to the Integration Connectors > Connections page and then select or create a Google Cloud project.

    Go to the Connections page

  2. Click + CREATE NEW to open the Create Connection page.
  3. In the Location section, select a location from the Region list and then click NEXT.

    For the list of all the supported regions, see Locations.

  4. In the Connection Details section, do the following:
    1. Select BigQuery from the Connector list.
    2. Select a connector version from the Connector version list.
    3. In the Connection Name field, enter a name for the connection instance. The connection name can contain lowercase letters, numbers, or hyphens. The name must begin with a letter, end with a letter or number, and not exceed 49 characters.
    4. Optionally, enable Cloud logging, and then select a log level. By default, the log level is set to Error.
    5. Service Account: Select a service account that has the required roles.
    6. (Optional) Configure the Connection node settings.
      • Minimum number of nodes: Enter the minimum number of connection nodes.
      • Maximum number of nodes: Enter the maximum number of connection nodes.

      A node is a unit (or replica) of a connection that processes transactions. More nodes are required to process more transactions for a connection; conversely, fewer nodes are required to process fewer transactions. To understand how the nodes affect your connector pricing, see Pricing for connection nodes. If you don't enter any values, the minimum number of nodes is set to 2 (for better availability) and the maximum is set to 50 by default.

      Note: You can customize the connection node values only if you are a Pay-as-you-go customer.
    7. Project ID: The ID of the Google Cloud project where the data resides.
    8. Dataset ID: The ID of the BigQuery Dataset.
    9. To support the BigQuery array data type, select Support Native Data Types. The following array types are supported: Varchar, Int64, Float64, Long, Double, Bool, and Timestamp. Nested arrays are not supported.
    10. (Optional) To configure a proxy server for the connection, select Use proxy and enter the proxy details.
    11. Click NEXT.
  5. In the Authentication section, enter the authentication details.
    1. Select whether to authenticate with OAuth 2.0 - Authorization code or to proceed without authentication.

      To understand how to configure authentication, see Configure authentication.

    2. Click NEXT.
  6. Review your connection and authentication details, and then click Create.

Configure authentication

Enter the details based on the authentication you want to use.

  • No Authentication: Select this option if you don't require authentication.
  • OAuth 2.0 - Authorization code: Select this option to authenticate using a web-based user login flow. Specify the following details:
    • Client ID: The client ID required to connect to your backend Google service.
    • Scopes: A comma-separated list of the desired scopes. To view all the supported OAuth 2.0 scopes for your required Google service, see the relevant section in the OAuth 2.0 Scopes for Google APIs page.
    • Client secret: Select the Secret Manager secret. You must have created the Secret Manager secret before configuring this authorization.
    • Secret version: The Secret Manager secret version for the client secret.

    For the Authorization code authentication type, after creating the connection, you must authorize the connection.

Authorize the connection

If you use OAuth 2.0 - authorization code to authenticate the connection, complete the following tasks after you create the connection.

  1. In the Connections page, locate the newly created connection.

    Notice that the Status for the new connection will be Authorization required.

  2. Click Authorization required.

    This displays the Edit authorization pane.

  3. Copy the Redirect URI value to your external application.
  4. Verify the authorization details.
  5. Click Authorize.

    If the authorization is successful, the connection status will be set to Active in the Connections page.

Re-authorization for authorization code

If you are using the Authorization code authentication type and have made any configuration changes in BigQuery, you must re-authorize your BigQuery connection. To re-authorize a connection, perform the following steps:

  1. Click the required connection in the Connections page.

    This opens the connection details page.

  2. Click Edit to edit the connection details.
  3. Verify the OAuth 2.0 - Authorization code details in the Authentication section.

    If required, make the necessary changes.

  4. Click Save. This takes you to the connection details page.
  5. Click Edit authorization in the Authentication section. This displays the Authorize pane.
  6. Click Authorize.

    If the authorization is successful, the connection status will be set to Active in the Connections page.

Use the BigQuery connection in an integration

After you create the connection, it becomes available in both Apigee Integration and Application Integration. You can use the connection in an integration through the Connectors task.

  • To understand how to create and use the Connectors task in Apigee Integration, see Connectors task.
  • To understand how to create and use the Connectors task in Application Integration, see Connectors task.

Actions

This section describes the actions available in the BigQuery connector.

The results of all entity operations and actions are available as a JSON response in the Connectors task's connectorOutputPayload response parameter after you run your integration.

Note: All entities and actions have a schema associated with them. For example, an action schema contains parameter details such as the parameter names and their corresponding data types. The schema (metadata) for entities and actions is fetched by the connection at runtime from your backend. Any updates to the schema aren't automatically reflected in your existing connections; you must manually refresh the schema. To refresh the schema for a connection, open the Connection details page of the connection, and then click Refresh connection schema.

CancelJob action

This action lets you cancel a running BigQuery job.

The following list describes the input parameters of the CancelJob action.

  • JobId (String): The ID of the job that you want to cancel. This is a mandatory field.
  • Region (String): The region where the job is currently running. This isn't required if the job runs in the US or EU multi-region.
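
For example, a connectorInputPayload for the CancelJob action might look like the following minimal sketch. The job ID and region are placeholder values; replace them with the ID of the job that you want to cancel and the region where it runs.

  {
    "JobId": "bquxjob_1234abcd_0000018e5d6f",
    "Region": "asia-south1"
  }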

GetJob action

This action lets you retrieve the configuration information and execution state of an existing job.

The following list describes the input parameters of the GetJob action.

  • JobId (String): The ID of the job for which you want to retrieve the configuration. This is a mandatory field.
  • Region (String): The region where the job is currently running. This isn't required if the job runs in the US or EU multi-region.
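
For example, because the Region parameter can be omitted for jobs that run in the US or EU multi-region, a minimal sketch of the GetJob input might contain only the job ID (a placeholder value here):

  {
    "JobId": "bquxjob_1234abcd_0000018e5d6f"
  }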

InsertJob action

This action lets you insert a BigQuery job, which can then be selected later to retrieve the query results.

The following list describes the input parameters of the InsertJob action.

  • Query (String): The query to submit to BigQuery. This is a mandatory field.
  • IsDML (String): Set to true if the query is a DML statement, or false otherwise. The default value is false.
  • DestinationTable (String): The destination table for the query, in the DestProjectId:DestDatasetId.DestTable format.
  • WriteDisposition (String): Specifies how to write data to the destination table, such as truncate existing results, append existing results, or write only when the table is empty. The following values are supported:
    • WRITE_TRUNCATE
    • WRITE_APPEND
    • WRITE_EMPTY
    The default value is WRITE_TRUNCATE.
  • DryRun (String): Specifies whether the job execution is a dry run.
  • MaximumBytesBilled (String): Specifies the maximum number of bytes that the job can process. BigQuery cancels the job if it attempts to process more bytes than the specified value.
  • Region (String): Specifies the region where the job should execute.
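
For example, the following is a minimal sketch of an InsertJob input that runs a query and writes the results to a destination table. The project, dataset, table, and region values are placeholders; because the action's parameters are typed as String, the boolean values are also passed as strings.

  {
    "Query": "SELECT FirstName, LastName FROM MyDataset.Employees WHERE Department = 'Sales'",
    "IsDML": "false",
    "DestinationTable": "MyProject:MyDataset.SalesEmployees",
    "WriteDisposition": "WRITE_TRUNCATE",
    "DryRun": "false",
    "Region": "us-central1"
  }

If the response in connectorOutputPayload includes a job ID, you can pass it to the GetJob or CancelJob actions described earlier.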

InsertLoadJob action

This action lets you insert a BigQuery load job, which adds data from Google Cloud Storage into an existing table.

The following list describes the input parameters of the InsertLoadJob action.

  • SourceURIs (String): A space-separated list of Google Cloud Storage URIs.
  • SourceFormat (String): The source format of the files. The following values are supported:
    • AVRO
    • NEWLINE_DELIMITED_JSON
    • DATASTORE_BACKUP
    • PARQUET
    • ORC
    • CSV
  • DestinationTable (String): The destination table for the loaded data, in the DestProjectId.DestDatasetId.DestTable format.
  • DestinationTableProperties (String): A JSON object specifying the table friendly name, description, and list of labels.
  • DestinationTableSchema (String): A JSON list specifying the table schema, in the "DestinationTableSchema": "\"fields\":[{\"name\":\"id\",\"type\":\"INTEGER\"},{\"name\":\"name\",\"type\":\"STRING\"}]" format.
  • DestinationEncryptionConfiguration (String): A JSON object specifying the KMS encryption settings for the table.
  • SchemaUpdateOptions (String): A JSON list specifying the options to apply when updating the destination table schema.
  • TimePartitioning (String): A JSON object specifying the time partitioning type and field.
  • RangePartitioning (String): A JSON object specifying the range partitioning field and buckets.
  • Clustering (String): A JSON object specifying the fields to be used for clustering.
  • Autodetect (String): Specifies whether the options and schema should be determined automatically for JSON and CSV files.
  • CreateDisposition (String): Specifies whether the destination table must be created if it doesn't already exist. The following values are supported:
    • CREATE_IF_NEEDED
    • CREATE_NEVER
    The default value is CREATE_IF_NEEDED.
  • WriteDisposition (String): Specifies how to write data to the destination table, such as truncate existing results, append existing results, or write only when the table is empty. The following values are supported:
    • WRITE_TRUNCATE
    • WRITE_APPEND
    • WRITE_EMPTY
    The default value is WRITE_APPEND.
  • Region (String): Specifies the region where the job should execute. Both the Google Cloud Storage resources and the BigQuery dataset must be in the same region.
  • DryRun (String): Specifies whether the job execution is a dry run. The default value is false.
  • MaximumBadRecords (String): Specifies the number of records that can be invalid before the entire job is canceled. By default, all records must be valid. The default value is 0.
  • IgnoreUnknownValues (String): Specifies whether unknown fields in the input file should be ignored or treated as errors. By default, they are treated as errors. The default value is false.
  • AvroUseLogicalTypes (String): Specifies whether AVRO logical types must be used to convert AVRO data to BigQuery types. The default value is true.
  • CSVSkipLeadingRows (String): Specifies how many rows to skip at the start of CSV files. This is usually used to skip header rows.
  • CSVEncoding (String): The encoding type of the CSV files. The following values are supported:
    • ISO-8859-1
    • UTF-8
    The default value is UTF-8.
  • CSVNullMarker (String): If provided, this string is used for NULL values within CSV files. By default, CSV files cannot use NULL.
  • CSVFieldDelimiter (String): The character used to separate columns within CSV files. The default value is a comma (,).
  • CSVQuote (String): The character used for quoted fields in CSV files. Can be set to empty to disable quoting. The default value is a double quote (").
  • CSVAllowQuotedNewlines (String): Specifies whether the CSV files can contain newlines within quoted fields. The default value is false.
  • CSVAllowJaggedRows (String): Specifies whether the CSV files can contain missing fields. The default value is false.
  • DSBackupProjectionFields (String): A JSON list of fields to load from a Cloud Datastore backup.
  • ParquetOptions (String): A JSON object specifying the Parquet-specific import options.
  • DecimalTargetTypes (String): A JSON list giving the preference order applied to numeric types.
  • HivePartitioningOptions (String): A JSON object specifying the source-side partitioning options.
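
For example, the following is a minimal sketch of an InsertLoadJob input that loads two CSV files from Google Cloud Storage into a table, skipping a header row. The bucket, project, dataset, and table names are placeholders, and the schema string follows the escaped format shown in the DestinationTableSchema description above.

  {
    "SourceURIs": "gs://my-bucket/employees_1.csv gs://my-bucket/employees_2.csv",
    "SourceFormat": "CSV",
    "DestinationTable": "MyProject.MyDataset.Employees",
    "DestinationTableSchema": "\"fields\":[{\"name\":\"id\",\"type\":\"INTEGER\"},{\"name\":\"name\",\"type\":\"STRING\"}]",
    "CreateDisposition": "CREATE_IF_NEEDED",
    "WriteDisposition": "WRITE_APPEND",
    "CSVSkipLeadingRows": "1",
    "Region": "us-central1"
  }

Remember that the Cloud Storage bucket and the BigQuery dataset must be in the same region as the Region value.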

Execute custom SQL query

To create a custom query, follow these steps:

  1. Follow the detailed instructions to add a Connectors task.
  2. When you configure the Connectors task, for the type of action that you want to perform, select Actions.
  3. In the Action list, select Execute custom query, and then click Done.

  4. Expand the Task input section, and then do the following:
    1. In the Timeout after field, enter the number of seconds to wait for the query to execute.

      Default value: 180 seconds.

    2. In the Maximum number of rows field, enter the maximum number of rows to be returned from the database.

      Default value: 25.

    3. To update the custom query, click Edit Custom Script. The Script editor dialog opens.

    4. In the Script editor dialog, enter the SQL query and click Save.

      You can use a question mark (?) in a SQL statement to represent a single parameter that must be specified in the query parameters list. For example, the following SQL query selects all rows from the Employees table that match the value specified for the LastName column:

      SELECT * FROM Employees WHERE LastName=?

      Note: Data manipulation language (DML) and data definition language (DDL) statements are supported.
    5. If you've used question marks in your SQL query, you must add a parameter by clicking + Add Parameter Name for each question mark. While executing the integration, these parameters replace the question marks (?) in the SQL query sequentially. For example, if you have added three question marks (?), then you must add three parameters in sequence.

      To add query parameters, do the following:

      1. From the Type list, select the data type of the parameter.
      2. In the Value field, enter the value of the parameter.
      3. To add multiple parameters, click + Add Query Parameter.
    6. The Execute custom query action does not support array variables.

Use Terraform to create connections

You can use the Terraform resource to create a new connection.

To learn how to apply or remove a Terraform configuration, see Basic Terraform commands.

To view a sample Terraform template for connection creation, see the sample template.

When creating this connection by using Terraform, you must set the following variables in your Terraform configuration file:

  • project_id (STRING, required): The ID of the project that contains the BigQuery dataset. For example, myproject.
  • dataset_id (STRING, optional): The dataset ID of the BigQuery dataset, without the project name. For example, mydataset.
  • proxy_enabled (BOOLEAN, optional): Set this to true to configure a proxy server for the connection.
  • proxy_auth_scheme (ENUM, optional): The authentication type to use to authenticate to the ProxyServer proxy. Supported values are BASIC, DIGEST, and NONE.
  • proxy_user (STRING, optional): The user name to use to authenticate to the ProxyServer proxy.
  • proxy_password (SECRET, optional): The password to use to authenticate to the ProxyServer proxy.
  • proxy_ssltype (ENUM, optional): The SSL type to use when connecting to the ProxyServer proxy. Supported values are AUTO, ALWAYS, NEVER, and TUNNEL.

System limitations

The BigQuery connector can process a maximum of 8 transactions per second, per node, and throttles any transactions beyond this limit. By default, Integration Connectors allocates 2 nodes (for better availability) for a connection.

For information on the limits applicable to Integration Connectors, see Limits.

Note: The number of Integration Connectors nodes autoscales dynamically based on your usage. However, if you want to reserve capacity for large volumes without waiting for autoscaling, you can adjust the minimum node value for a connection. More nodes are required to process more transactions for a connection. Conversely, fewer nodes are required if a connection processes fewer transactions. To configure the node values, do the following:
  • If you are a pay-as-you-go customer, configure the minimum and maximum node values in the edit connection page.
  • If you are a subscription-based customer, contact support.

The maximum number of transactions that a node can handle depends on various factors. Before adjusting the minimum nodes for better throughput, it is recommended that you check whether your backend systems are set up optimally to handle the required traffic.

Supported data types

The following are the supported data types for this connector:

  • ARRAY
  • BIGINT
  • BINARY
  • BIT
  • BOOLEAN
  • CHAR
  • DATE
  • DECIMAL
  • DOUBLE
  • FLOAT
  • INTEGER
  • LONGN VARCHAR
  • LONG VARCHAR
  • NCHAR
  • NUMERIC
  • NVARCHAR
  • REAL
  • SMALL INT
  • TIME
  • TIMESTAMP
  • TINY INT
  • VARBINARY
  • VARCHAR

Known limitations

  • The BigQuery connector doesn't support primary keys in BigQuery tables. This means that you can't perform the Get, Update, and Delete entity operations by using an entityId. Alternatively, you can use the filter clause to filter records based on an ID.

  • When you fetch data for the first time, you might experience an initial latency of around 6 seconds. Because the results are cached, subsequent requests don't incur this latency; however, it can recur when the cache expires.

Get help from the Google Cloud community

You can post your questions and discuss this connector in the Google Cloud community at Cloud Forums.
