Teradata to BigQuery migration: Overview

This document provides more information to help you understand the decisions you need to make when using the BigQuery Data Transfer Service to migrate schema and data from Teradata to BigQuery. For an introduction to the Teradata migration process, see Introduction to a Teradata to BigQuery migration.

Migrating schema and data is typically one of several steps needed to move a data warehouse from a different platform to BigQuery. For a description of a general migration process, see Overview: Migrate data warehouses to BigQuery.

You can also use batch SQL translation to migrate your SQL scripts in bulk, or interactive SQL translation to translate ad hoc queries. Teradata SQL is fully supported by both SQL translation services.

Overview

You can use the BigQuery Data Transfer Service in combination with a special migration agent to copy your schema and data from Teradata to BigQuery. The migration agent connects to your local data warehouse and communicates with the BigQuery Data Transfer Service to copy tables from your data warehouse to BigQuery.

The following steps describe the workflow for the migration process:

  1. Download the migration agent.
  2. Configure a transfer in the BigQuery Data Transfer Service.
  3. Run the transfer job to copy table schema and data from your data warehouse to BigQuery (an agent command sketch follows this list).
  4. Optional. Monitor transfer jobs by using the Google Cloud console.
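
The following is a minimal sketch of the agent-side commands that accompany these steps, assuming the agent JAR (mirroring-agent.jar) and the Teradata JDBC driver (terajdbc4.jar) are in the working directory. The class name, flags, and file names are typical values rather than guaranteed ones; confirm them against the transfer setup guide for your agent version.

# Hedged sketch: class name, flags, and file names are assumptions based on a
# typical agent setup; verify them in the transfer setup guide.

# Initialize the agent; this interactive step produces a configuration file
# (assumed here to be saved as config.json).
java -cp terajdbc4.jar:mirroring-agent.jar \
  com.google.cloud.bigquery.dms.Agent \
  --initialize

# Run the agent so it can pick up the transfer configured in the
# BigQuery Data Transfer Service and start extracting and uploading data.
java -cp terajdbc4.jar:mirroring-agent.jar \
  com.google.cloud.bigquery.dms.Agent \
  --configuration-file=config.json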

Transfer job configuration

You can configure a transfer job to best suit your needs. Before setting up a data transfer from Teradata to BigQuery, consider the configuration options described in the following sections and decide what settings to use. Depending on the settings you choose, you might need to complete some prerequisites prior to starting the transfer job.

For most systems, especially those with large tables, you can get the best performance by following these steps:

  1. Partition your Teradata tables.
  2. Use Teradata Parallel Transporter (TPT) for extraction.
  3. Create a custom schema file and configure your target BigQuery clustering and partitioning columns.

This enables the migration agent to perform partition-by-partition extraction, which is the most efficient extraction method.
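
If you configure the transfer with the bq command-line tool, the staging bucket and the custom schema file from step 3 are passed as transfer parameters. The following is a hedged sketch: the on_premises data source ID, the parameter names, and every value shown are assumptions or placeholders to verify against the transfer setup documentation.

# Hedged sketch: the data source ID and parameter names are assumptions;
# all project, dataset, bucket, and path values are placeholders.
bq mk --transfer_config \
  --project_id=my-project \
  --target_dataset=my_dataset \
  --display_name="Teradata migration" \
  --data_source=on_premises \
  --params='{
    "bucket": "my-staging-bucket",
    "database_type": "Teradata",
    "database_name": "tpch",
    "table_name_patterns": ".*",
    "agent_service_account": "agent-sa@my-project.iam.gserviceaccount.com",
    "schema_file_path": "gs://my-staging-bucket/schema/custom-schema.json"
  }'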

Extraction method

The BigQuery Data Transfer Service supports two extraction methods for transferring data from Teradata to BigQuery:

  • Use the Teradata Parallel Transporter (TPT) tbuild utility. This is the recommended approach. Using TPT typically results in faster data extraction.

    In this mode, the migration agent attempts to calculate extraction batches using rows distributed by partitions. For each batch, the agent emits and executes a TPT extraction script, producing a set of pipe-delimited files. It then uploads these files to a Cloud Storage bucket, where they are used by the transfer job. Once the files are uploaded to Cloud Storage, the migration agent deletes them from the local file system.

    When you use TPT extraction without a partitioning column, your whole table is extracted. When you use TPT extraction with a partitioning column, the agent extracts sets of partitions.

    In this mode, the migration agent doesn't limit the amount of space that the extracted files take up on the local file system. Make sure the local file system has more space than the size of your largest partition or your largest table, depending on whether you are specifying a partitioning column or not (see the space check sketch after this list).

    Use the access module for Cloud Storage. This approach eliminates the need for intermediate storage on your local file system, which provides better performance and lower resource utilization of the VM running the agent. This approach directly exports the data to Cloud Storage using the Teradata access module for Cloud Storage. To use this feature, the Teradata tools running on your VM must be newer than version 17.20. Teradata tools can be upgraded independently, without any changes to the Teradata instance version.

  • Extraction using a JDBC driver with FastExport connection. If there are constraints on the local storage space available for extracted files, or if there is some reason you can't use TPT, then use this extraction method.

    In this mode, the migration agent extracts tables into a collection of Avro files on the local file system. It then uploads these files to a Cloud Storage bucket, where they are used by the transfer job. Once the files are uploaded to Cloud Storage, the migration agent deletes them from the local file system.

    In this mode, you can limit the amount of space used by the Avro files on the local file system. If this limit is exceeded, extraction is paused until space is freed up by the migration agent uploading and deleting existing Avro files.
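
For either extraction mode, it helps to compare the size of the largest source tables with the free space available to the agent before a run. The following is a hedged sketch, assuming the bteq client is installed, the standard DBC.TableSizeV dictionary view is readable, and the logon values and staging path are placeholders.

# List the largest tables in the Teradata instance (logon values are placeholders).
bteq <<'EOF'
.logon teradata-host/dbc_user,dbc_password
SELECT DatabaseName, TableName, SUM(CurrentPerm) AS TotalBytes
FROM DBC.TableSizeV
GROUP BY 1, 2
ORDER BY 3 DESC;
.logoff
.quit
EOF

# Free space in the directory the agent uses for extracted files (placeholder path).
df -h /var/migration/extracted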

Schema identification

You can define the schema in several ways. The BigQuery Data Transfer Service provides automatic schema detection and data type mapping during a data transfer from Teradata to BigQuery. You can also use the translation engine to get the data type mapping, or you can choose to specify a custom schema file instead.

Default schema detection

Preview

This feature is subject to the "Pre-GA Offerings Terms" in the General Service Terms section of the Service Specific Terms. Pre-GA features are available "as is" and might have limited support. For more information, see the launch stage descriptions.

Note: To get support or provide feedback for this feature, contact bq-dts-migration@google.com.

If you don't specify any schema configuration, the BigQuery Data Transfer Service automatically detects the schema of your Teradata source tables and performs data type mapping to the corresponding BigQuery data types during the data transfer. For more information on the default data type mapping, see Data types.

Using translation engine output for schema

Preview

This feature is subject to the "Pre-GA Offerings Terms" in the General Service Terms section of the Service Specific Terms. Pre-GA features are available "as is" and might have limited support. For more information, see the launch stage descriptions.

Note: To get support or provide feedback for this feature, contact bq-dts-migration@google.com.

The BigQuery Data Transfer Service uses the output of the BigQuery translation engine for schema mapping during the migration of Teradata tables to BigQuery. To use this option, ensure the following prerequisites are met:

  1. Generate metadata for translation. Execute the dumper tool to generate metadata for translation, adhering to the Teradata source guidelines. For more information, see Generate metadata for translation and assessment.
  2. Upload the generated metadata file (for example, metadata.zip) to a Cloud Storage bucket. This bucket serves as the input location for the translation engine.
  3. Initiate a batch translation job to create the BigQuery Data Transfer Service mapping, which defines the schema for the target BigQuery tables. For information on how to do this, see Create a batch translation. The following example generates the BigQuery Data Transfer Service mapping by specifying target_types = "dts_mapping":

    curl -d "{
      \"name\": \"teradata_2_bq_translation\",
      \"displayName\": \"Teradata to BigQuery Translation\",
      \"tasks\": {
          string: {
            \"type\": \"Teradata2BigQuery_Translation\",
            \"translation_details\": {
                \"target_base_uri\": \"gs://your_translation_output_bucket/output\",
                \"source_target_mapping\": {
                  \"source_spec\": {
                      \"base_uri\": \"gs://your_metadata_bucket/input\"
                  }
                },
                \"target_types\": \"dts_mapping\",
            }
          }
      },
    }" \
    -H "Content-Type:application/json" \
    -H "Authorization: Bearer YOUR_ACCESS_TOKEN" \
    -X POST https://bigquerymigration.googleapis.com/v2alpha/projects/your_project_id/locations/your_location/workflows

    You can check the status of the batch translation job in the Google Cloud console by navigating to BigQuery -> SQL Translation. Once complete, the mapping file is stored in a Cloud Storage location specified in the target_base_uri flag.

    To generate a token, use the gcloud auth print-access-token command or the OAuth 2.0 Playground with the scope https://www.googleapis.com/auth/cloud-platform (see the example after this list).

  4. In your Teradata data transfer configuration, specify the path to the Cloud Storage folder where the mapping file from the previous step is stored. The BigQuery Data Transfer Service uses this mapping to define the schema of your target BigQuery tables.
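
For example, the token can be generated and passed to the request from step 3 as follows. Here, request.json is a hypothetical file holding the request body shown earlier, and the project ID and location are placeholders.

# Generate a short-lived access token for the Authorization header.
ACCESS_TOKEN="$(gcloud auth print-access-token)"

# Re-issue the batch translation request from step 3 with the generated token
# (the request body is kept in request.json for readability).
curl -H "Content-Type: application/json" \
     -H "Authorization: Bearer ${ACCESS_TOKEN}" \
     -X POST \
     -d @request.json \
     "https://bigquerymigration.googleapis.com/v2alpha/projects/your_project_id/locations/your_location/workflows"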

Custom schema file

We recommend specifying a custom schema file in the following situations:

  • If you need to capture important information about a table, like partitioning, that would otherwise be lost in the migration.

    For example, incremental transfers should have a schema file specified so that data from subsequent transfers can be properly partitioned when loaded into BigQuery. Without a schema file, every time a transfer runs, the BigQuery Data Transfer Service automatically applies a table schema by using the source data being transferred, and all information about partitioning, clustering, primary keys, and change tracking is lost.

  • If you need to change column names or data types during the data transfer.

A custom schema file is a JSON file that describes database objects. The schema contains a set of databases, each containing a set of tables, each of which contains a set of columns. Each object has an originalName field that indicates the object name in Teradata, and a name field that indicates the target name for the object in BigQuery.

Columns have the following fields:

  • originalType: indicates the column data type in Teradata.
  • type: indicates the target data type for the column in BigQuery.
  • usageType: information about the way the column is used by the system. The following usage types are supported:

    • DEFAULT: You can annotate multiple columns in one target table with this usage type. This usageType indicates that the column has no special use in the source system. This is the default value.
    • CLUSTERING: You can annotate up to four columns in each target table with this usage type. The column order for clustering is determined based on the order in which they appear in the custom schema. The columns you select must meet the constraints for clustering in BigQuery. If a PARTITIONING field is specified for the same table, BigQuery uses these columns to create a clustered table.
    • PARTITIONING: You can annotate only one column in each target table with this usage type. This column is used in the partitioned table definition for the containing tables object. You can only use this usage type with a column that has a TIMESTAMP or DATE data type.
    • COMMIT_TIMESTAMP: You can annotate only one column in each target table with this usage type. Use this usageType to identify an update timestamp column for incremental updates. This column is used to extract rows that have been created or updated since the last transfer run. You can only use this usage type with columns that have a TIMESTAMP or DATE data type.
    • PRIMARY_KEY: You can annotate columns in each target table with this usage type. Use this usage type to identify just one column as the primary key, or in the case of a composite key, use the same usage type on multiple columns to identify the unique entities of a table. These columns work together with COMMIT_TIMESTAMP to extract rows created or updated since the last transfer run.

You can create a custom schema file manually, as seen in the following example, or you can have the migration agent generate one for you when you initialize the agent.

In this example, a user is migrating a Teradata table called orders in the tpch database with the following table definition:

  CREATE SET TABLE TPCH.orders ,FALLBACK ,
       NO BEFORE JOURNAL,
       NO AFTER JOURNAL,
       CHECKSUM = DEFAULT,
       DEFAULT MERGEBLOCKRATIO,
       MAP = TD_MAP1
       (
         O_ORDERKEY INTEGER NOT NULL,
         O_CUSTKEY INTEGER NOT NULL,
         O_ORDERSTATUS CHAR(1) CHARACTER SET LATIN CASESPECIFIC NOT NULL,
         O_TOTALPRICE DECIMAL(15,2) NOT NULL,
         O_ORDERDATE DATE FORMAT 'yyyy-mm-dd' NOT NULL,
         O_ORDERPRIORITY CHAR(15) CHARACTER SET LATIN CASESPECIFIC NOT NULL,
         O_CLERK CHAR(15) CHARACTER SET LATIN CASESPECIFIC NOT NULL,
         O_SHIPPRIORITY INTEGER NOT NULL,
         O_COMMENT VARCHAR(79) CHARACTER SET LATIN CASESPECIFIC NOT NULL)
  UNIQUE PRIMARY INDEX ( O_ORDERKEY );

While migrating to BigQuery, the user wants to configure the schema with the following changes:

  • Rename the O_CUSTKEY column to O_CUSTOMERKEY
  • Identify O_ORDERDATE as the partitioning column

The following example is a custom schema to configure these settings:

{
  "databases": [
    {
      "name": "tpch",
      "originalName": "tpch",
      "tables": [
        {
          "name": "orders",
          "originalName": "orders",
          "columns": [
            { "name": "O_ORDERKEY", "originalName": "O_ORDERKEY", "type": "INT64", "originalType": "integer", "usageType": ["DEFAULT"], "isRequired": true, "originalColumnLength": 4 },
            { "name": "O_CUSTOMERKEY", "originalName": "O_CUSTKEY", "type": "INT64", "originalType": "integer", "usageType": ["DEFAULT"], "isRequired": true, "originalColumnLength": 4 },
            { "name": "O_ORDERSTATUS", "originalName": "O_ORDERSTATUS", "type": "STRING", "originalType": "character", "usageType": ["DEFAULT"], "isRequired": true, "originalColumnLength": 1 },
            { "name": "O_TOTALPRICE", "originalName": "O_TOTALPRICE", "type": "NUMERIC", "originalType": "decimal", "usageType": ["DEFAULT"], "isRequired": true, "originalColumnLength": 8 },
            { "name": "O_ORDERDATE", "originalName": "O_ORDERDATE", "type": "DATE", "originalType": "date", "usageType": ["PARTITIONING"], "isRequired": true, "originalColumnLength": 4 },
            { "name": "O_ORDERPRIORITY", "originalName": "O_ORDERPRIORITY", "type": "STRING", "originalType": "character", "usageType": ["DEFAULT"], "isRequired": true, "originalColumnLength": 15 },
            { "name": "O_CLERK", "originalName": "O_CLERK", "type": "STRING", "originalType": "character", "usageType": ["DEFAULT"], "isRequired": true, "originalColumnLength": 15 },
            { "name": "O_SHIPPRIORITY", "originalName": "O_SHIPPRIORITY", "type": "INT64", "originalType": "integer", "usageType": ["DEFAULT"], "isRequired": true, "originalColumnLength": 4 },
            { "name": "O_COMMENT", "originalName": "O_COMMENT", "type": "STRING", "originalType": "varchar", "usageType": ["DEFAULT"], "isRequired": true, "originalColumnLength": 79 }
          ]
        }
      ]
    }
  ]
}
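
Because the custom schema file is plain JSON, it is worth checking its syntax before referencing it in a transfer configuration. A minimal check, assuming the file is saved as custom-schema.json (a placeholder name):

# Fails with a parse error if the file is not well-formed JSON.
python3 -m json.tool custom-schema.json > /dev/null && echo "custom-schema.json parses as JSON"

# Equivalent check with jq, if installed.
jq empty custom-schema.json && echo "custom-schema.json parses as JSON"

# (These checks verify only JSON syntax, not the field names or usage types.)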

On-demand or incremental transfers

When migrating data from a Teradata database instance to BigQuery, the BigQuery Data Transfer Service supports both full transfers (on-demand transfers) and recurring transfers (incremental transfers). You designate the transfer as on-demand or incremental in the scheduling options when setting up a transfer.

  • On-demand transfer: Use this mode to perform the full snapshot migration of schema and data from Teradata to BigQuery.

  • Scheduled transfer: Use this mode to perform the full snapshot and regularly migrate new and modified data (incremental data) from Teradata to BigQuery. Incremental transfers require customizing your schema to annotate columns with either of the following use cases:

    • Annotate columns with only the COMMIT_TIMESTAMP usage type: In this transfer, new or modified rows in Teradata are appended to the data in BigQuery. Updated rows in BigQuery tables might have duplicate rows with old and new values.
    • Annotate columns with both the COMMIT_TIMESTAMP and PRIMARY_KEY usage types: In this transfer, new rows are appended and modified rows are applied as updates to the corresponding rows in BigQuery. The column defined as PRIMARY_KEY is used to maintain the uniqueness of the data in BigQuery.
    • The PRIMARY_KEY column defined in the schema does not have to be the primary key in the Teradata table. It can be any column, but it must contain unique data.

Incremental transfers

In incremental transfers, the first transfer always creates a table snapshot in BigQuery. All subsequent incremental transfers adhere to the annotations defined in the custom schema file, as explained below.

For each transfer run, a timestamp of the transfer run is saved. For each subsequent transfer run, the agent receives the timestamp of the previous transfer run (T1) and the timestamp of when the current transfer run started (T2).

For transfers after the initial run, the migration agent extracts data using the following per-table logic (a conceptual sketch follows the note below):

  • If a table object in a schema file does not have a column with a usage type of COMMIT_TIMESTAMP, then the table is skipped.
  • If a table has a column with the usage type of COMMIT_TIMESTAMP, then all rows with a timestamp between T1 and T2 are extracted and appended to the existing table in BigQuery.
  • If a table has a column with the usage type of COMMIT_TIMESTAMP and a column with the usage type of PRIMARY_KEY, then all rows with a timestamp between T1 and T2 are extracted. Any new rows are appended and modified rows are updated in the existing table in BigQuery.
Note: Incremental migration from Teradata does not support the syncing of deleted rows with BigQuery.
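
The following conceptual sketch illustrates the T1/T2 window for a table whose update timestamp column is annotated with COMMIT_TIMESTAMP. It is not what the agent literally runs: the table, column, logon values, and timestamps are placeholders, and whether the window boundaries are inclusive or exclusive is an assumption.

# Conceptual only: the agent builds its own extraction queries internally.
T1="2024-05-01 00:00:00"   # timestamp saved from the previous transfer run (placeholder)
T2="2024-05-02 00:00:00"   # timestamp at which the current run started (placeholder)

bteq <<EOF
.logon teradata-host/user,password
SELECT *
FROM my_db.my_table
WHERE update_ts >  TIMESTAMP '${T1}'
  AND update_ts <= TIMESTAMP '${T2}';
.logoff
.quit
EOF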

The following are example schema files for incremental transfers.

Schema with only COMMIT_TIMESTAMP

{
  "databases": [
    {
      "name": "abc_db",
      "originalName": "abc_db",
      "tables": [
        {
          "name": "abc_table",
          "originalName": "abc_table",
          "columns": [
            { "name": "Id", "originalName": "Id", "type": "INT64", "originalType": "integer", "originalColumnLength": 4, "usageType": ["DEFAULT"], "isRequired": true },
            { "name": "timestamp", "originalName": "timestamp", "type": "TIMESTAMP", "originalType": "timestamp", "originalColumnLength": 26, "usageType": ["COMMIT_TIMESTAMP"], "isRequired": false }
          ]
        }
      ]
    }
  ]
}

Schema with COMMIT_TIMESTAMP and one column (Id) as PRIMARY_KEY

{
  "databases": [
    {
      "name": "abc_db",
      "originalName": "abc_db",
      "tables": [
        {
          "name": "abc_table",
          "originalName": "abc_table",
          "columns": [
            { "name": "Id", "originalName": "Id", "type": "INT64", "originalType": "integer", "originalColumnLength": 4, "usageType": ["PRIMARY_KEY"], "isRequired": true },
            { "name": "timestamp", "originalName": "timestamp", "type": "TIMESTAMP", "originalType": "timestamp", "originalColumnLength": 26, "usageType": ["COMMIT_TIMESTAMP"], "isRequired": false }
          ]
        }
      ]
    }
  ]
}

Schema with COMMIT_TIMESTAMP and a composite key (Id + Name) as PRIMARY_KEY

{
  "databases": [
    {
      "name": "abc_db",
      "originalName": "abc_db",
      "tables": [
        {
          "name": "abc_table",
          "originalName": "abc_table",
          "columns": [
            { "name": "Id", "originalName": "Id", "type": "INT64", "originalType": "integer", "originalColumnLength": 4, "usageType": ["PRIMARY_KEY"], "isRequired": true },
            { "name": "Name", "originalName": "Name", "type": "STRING", "originalType": "character", "originalColumnLength": 30, "usageType": ["PRIMARY_KEY"], "isRequired": false },
            { "name": "timestamp", "originalName": "timestamp", "type": "TIMESTAMP", "originalType": "timestamp", "originalColumnLength": 26, "usageType": ["COMMIT_TIMESTAMP"], "isRequired": false }
          ]
        }
      ]
    }
  ]
}

The following table describes how the migration agent handles data definition language (DDL) and data manipulation language (DML) operations in incremental transfers.

Teradata operation | Type | Teradata-to-BigQuery support
CREATE | DDL | A new full snapshot for the table is created in BigQuery.
DROP | DDL | Not supported.
ALTER (RENAME) | DDL | A new full snapshot for the renamed table is created in BigQuery. The previous snapshot is not deleted from BigQuery. The user is not notified of the renamed table.
INSERT | DML | New rows are added to the BigQuery table.
UPDATE | DML | Rows are appended to the BigQuery table as new, similar to an INSERT operation, if only COMMIT_TIMESTAMP is used. Rows are updated, similar to an UPDATE operation, if both COMMIT_TIMESTAMP and PRIMARY_KEY are used.
MERGE | DML | Not supported. See instead INSERT, UPDATE, and DELETE.
DELETE | DML | Not supported.

Location considerations

Your Cloud Storage bucket must be in a region or multi-region that is compatible with the region or multi-region of the destination dataset in BigQuery.

  • If your BigQuery dataset is in a multi-region, the Cloud Storage bucket containing the data you're transferring must be in the same multi-region or in a location that is contained within the multi-region. For example, if your BigQuery dataset is in the EU multi-region, the Cloud Storage bucket can be located in the europe-west1 Belgium region, which is within the EU.
  • If your dataset is in a single region, your Cloud Storage bucket must be in the same region. For example, if your dataset is in the asia-northeast1 Tokyo region, your Cloud Storage bucket cannot be in the ASIA multi-region.

For detailed information about transfers and regions, see Dataset locations and transfers.
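
A quick way to confirm both locations before configuring the transfer, assuming the bq and gcloud command-line tools are installed and authenticated; the project, dataset, and bucket names are placeholders:

# Location of the destination BigQuery dataset.
bq show --format=prettyjson my-project:my_dataset | grep '"location"'

# Location of the Cloud Storage staging bucket.
gcloud storage buckets describe gs://my-staging-bucket --format="value(location)"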

Pricing

The data transfer with BigQuery is free of charge. However, costs can be incurred outside of Google by using this service, such as platform outbound data transfer charges.

  • Extraction, uploading to a Cloud Storage bucket, and loading data into BigQuery are free.
  • Data is not automatically deleted from your Cloud Storage bucket after it is uploaded to BigQuery. Consider deleting the data from your Cloud Storage bucket to avoid additional storage costs; a cleanup sketch follows this list. See Cloud Storage pricing.
  • Standard BigQuery Quotas & limits on load jobs apply.
  • Standard DML BigQuery Quotas & limits on incremental ingestion upserts apply.
  • After data is transferred to BigQuery, standard BigQuery storage and compute pricing applies.
  • See the transfers Pricing page for details.
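
The following is a minimal cleanup sketch; the bucket name and prefix are placeholders, and lifecycle.json is a hypothetical policy file containing a Delete rule.

# Delete extracted files from the staging bucket once they are loaded into BigQuery.
gcloud storage rm --recursive gs://my-staging-bucket/teradata-staging/

# Or let Cloud Storage expire staged objects automatically with a lifecycle policy.
gcloud storage buckets update gs://my-staging-bucket --lifecycle-file=lifecycle.json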

Limitations

  • One-time, on-demand transfers are fully supported. DDL/DML operations in incremental transfers are partially supported.
  • During data transfer, data is extracted to a directory on the local file system. Make sure there is adequate free space.
    • When using the FastExport mode of extraction, you can set the maximum storage space to be used, and the limit is strongly enforced by the migration agent. Set the max-local-storage setting in the migration agent's configuration file when setting up a transfer from Teradata to BigQuery (a hedged configuration fragment follows this list).
    • When using the TPT extraction method, make sure the file system has enough free space, larger than the largest table partition in the Teradata instance.
  • The BigQuery Data Transfer Service converts schema automatically (if you don't supply a custom schema file) and transfers Teradata data to BigQuery. Data is mapped from Teradata to BigQuery types.
  • Files are not automatically deleted from your Cloud Storage bucket after being loaded into BigQuery. Consider deleting the data from your Cloud Storage bucket after loading it into BigQuery, to avoid additional storage costs. See Pricing.
  • The speed of the extraction is bounded by your JDBC connection.
  • The data extracted from Teradata is not encrypted. Take appropriate steps to restrict access to the extracted files in the local file system, and ensure the Cloud Storage bucket is properly secured.
  • Other database resources, such as stored procedures, saved queries, views, and user-defined functions, are not transferred and are not in the scope of this service.
  • Incremental transfers do not support hard deletes. Incremental transfers do not sync any deleted rows in Teradata with BigQuery.
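
The following fragment shows where the max-local-storage limit mentioned in the limitations above is set. Only the max-local-storage key comes from this document; the enclosing teradata-config section and the file name are assumptions to verify against the migration agent's configuration reference.

# Hedged fragment of the agent configuration file (not a complete file).
cat <<'EOF' > config-fragment.json
{
  "teradata-config": {
    "max-local-storage": "50GB"
  }
}
EOF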

What's next
