Teradata to BigQuery migration: Overview
This document provides more information to help you understand the decisions you need to make when using the BigQuery Data Transfer Service to migrate schema and data from Teradata to BigQuery. For an introduction to the Teradata migration process, see Introduction to a Teradata to BigQuery migration.
Migrating schema and data is typically one of several steps needed to move a data warehouse from a different platform to BigQuery. For a description of a general migration process, see Overview: Migrate data warehouses to BigQuery.
You can also use batch SQL translation to migrate your SQL scripts in bulk, or interactive SQL translation to translate ad hoc queries. Teradata SQL is fully supported by both SQL translation services.
Overview
You can use the BigQuery Data Transfer Service in combination with a special migration agent to copy your schema and data from Teradata to BigQuery. The migration agent connects to your local data warehouse and communicates with the BigQuery Data Transfer Service to copy tables from your data warehouse to BigQuery.
The following steps describe the workflow for the migration process:
- Download the migration agent.
- Configure a transfer in the BigQuery Data Transfer Service.
- Run the transfer job to copy table schema and data from your data warehouse to BigQuery.
- Optional: Monitor transfer jobs by using the Google Cloud console.
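
As a rough sketch, running the downloaded migration agent from the command line might look like the following. The JAR name (mirroring-agent.jar), main class, and flags are assumptions based on typical agent usage; confirm the exact commands in the step-by-step migration guide linked under What's next.

```sh
# Hypothetical sketch: initialize the migration agent to generate a
# configuration file, then run it against an existing transfer configuration.
# The JAR name, class name, and flag names are assumptions; verify them
# against the migration agent documentation before use.

# Initialize the agent (prompts for Teradata and Google Cloud details).
java -cp terajdbc4.jar:mirroring-agent.jar \
    com.google.cloud.bigquery.dms.Agent \
    --initialize

# Run the agent with the generated configuration to start copying data.
java -cp terajdbc4.jar:mirroring-agent.jar \
    com.google.cloud.bigquery.dms.Agent \
    --configuration-file=config.json
```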
Transfer job configuration
You can configure a transfer job to best suit your needs. Before setting up a data transfer from Teradata to BigQuery, consider the configuration options described in the following sections and decide what settings to use. Depending on the settings you choose, you might need to complete some prerequisites prior to starting the transfer job.
For most systems, especially those with large tables, you can get the best performance by following these steps:
- Partition your Teradata tables.
- Use Teradata Parallel Transporter (TPT) for extraction.
- Create a custom schema file and configure your target BigQuery clustering and partitioning columns.
This enables the migration agent to perform partition-by-partition extraction, which is the most efficient approach.
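
For illustration, the following Teradata DDL is a minimal sketch of a table with a partitioned primary index on a DATE column, which is what enables partition-by-partition extraction. The database, table, and column names are hypothetical; adapt the RANGE_N bounds and interval to your data.

```sql
-- Hypothetical example: a Teradata table partitioned by order date so the
-- migration agent can extract one partition at a time.
CREATE TABLE sales_db.orders_by_day
     (
      order_id   INTEGER NOT NULL,
      order_date DATE FORMAT 'yyyy-mm-dd' NOT NULL,
      amount     DECIMAL(15,2) NOT NULL
     )
PRIMARY INDEX ( order_id )
PARTITION BY RANGE_N(order_date BETWEEN DATE '2020-01-01' AND DATE '2025-12-31'
                     EACH INTERVAL '1' MONTH);
```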
Extraction method
The BigQuery Data Transfer Service supports two extraction methods for transferring data from Teradata to BigQuery:
Use the Teradata Parallel Transporter (TPT) tbuild utility. This is the recommended approach. Using TPT typically results in faster data extraction.
In this mode, the migration agent attempts to calculate extraction batches using rows distributed by partitions. For each batch, the agent emits and executes a TPT extraction script, producing a set of pipe-delimited files. It then uploads these files to a Cloud Storage bucket, where they are used by the transfer job. After the files are uploaded to Cloud Storage, the migration agent deletes them from the local file system.
When you use TPT extraction without a partitioning column, your whole table is extracted. When you use TPT extraction with a partitioning column, the agent extracts sets of partitions.
In this mode, the migration agent doesn't limit the amount of space that the extracted files take up on the local file system. Make sure the local file system has more space than the size of your largest partition or your largest table, depending on whether you are specifying a partitioning column or not.
Use the access module for Cloud Storage. This approach eliminates the need for intermediate storage on your local file system, which provides better performance and lower resource utilization of the VM running the agent. This approach exports the data directly to Cloud Storage by using the Teradata access module for Cloud Storage. To use this feature, the Teradata tools running on your VM must be newer than version 17.20. The Teradata tools can be upgraded independently, without any changes to the Teradata instance version.
Extraction using a JDBC driver with a FastExport connection. If there are constraints on the local storage space available for extracted files, or if there is some reason you can't use TPT, then use this extraction method.
In this mode, the migration agent extracts tables into a collection of Avro files on the local file system. It then uploads these files to a Cloud Storage bucket, where they are used by the transfer job. After the files are uploaded to Cloud Storage, the migration agent deletes them from the local file system.
In this mode, you can limit the amount of space used by the Avro files on the local file system. If this limit is exceeded, extraction is paused until space is freed up by the migration agent uploading and deleting existing Avro files.
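
The choice of extraction method and the local storage limit are controlled through the migration agent's configuration file. The following is a minimal, hypothetical sketch written from a shell; the field names (use-tpt, max-local-storage, local-processing-space) are assumptions to verify against the migration agent's configuration reference.

```sh
# Hypothetical sketch of the extraction-related portion of the migration
# agent's configuration file. Field names are assumptions; check the agent's
# configuration reference for the authoritative names and defaults.
cat > config.json <<'EOF'
{
  "teradata-config": {
    "use-tpt": true,
    "max-local-storage": "50GB",
    "local-processing-space": "/var/tmp/extracted"
  }
}
EOF
```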
Schema identification
You can define the schema in several ways. The BigQuery Data Transfer Service provides automatic schema detection and data type mapping during a data transfer from Teradata to BigQuery. You can also use the translation engine to get the data type mapping, or you can choose to specify a custom schema file instead.
Default schema detection
Preview
This feature is subject to the "Pre-GA Offerings Terms" in the General Service Terms section of the Service Specific Terms. Pre-GA features are available "as is" and might have limited support. For more information, see the launch stage descriptions.
Note: To get support or provide feedback for this feature, contact bq-dts-migration@google.com.
If you don't specify any schema configuration, the BigQuery Data Transfer Service automatically detects the schema of your Teradata source tables and performs data type mapping to the corresponding BigQuery data types during the data transfer. For more information on the default data type mapping, see Data types.
Using translation engine output for schema
Preview
This feature is subject to the "Pre-GA Offerings Terms" in the General Service Terms section of the Service Specific Terms. Pre-GA features are available "as is" and might have limited support. For more information, see the launch stage descriptions.
Note: To get support or provide feedback for this feature, contact bq-dts-migration@google.com.
The BigQuery Data Transfer Service uses the output of the BigQuery translation engine for schema mapping during the migration of Teradata tables to BigQuery. To use this option, ensure the following prerequisites are met:
- Generate metadata for translation. Execute the dumper tool to generate metadata for translation, adhering to the Teradata source guidelines. For more information, see Generate metadata for translation and assessment.
- Upload the generated metadata file (for example, metadata.zip) to a Cloud Storage bucket. This bucket serves as the input location for the translation engine.
- Initiate a batch translation job to create the BigQuery Data Transfer Service mapping, which defines the schema for the target BigQuery tables. For information on how to do this, see Create a batch translation. The following example generates the BigQuery Data Transfer Service mapping by specifying target_types = "dts_mapping":

  ```
  curl -d "{
    \"name\": \"teradata_2_bq_translation\",
    \"displayName\": \"Teradata to BigQuery Translation\",
    \"tasks\": {
        string: {
          \"type\": \"Teradata2BigQuery_Translation\",
          \"translation_details\": {
              \"target_base_uri\": \"gs://your_translation_output_bucket/output\",
              \"source_target_mapping\": {
                \"source_spec\": {
                    \"base_uri\": \"gs://your_metadata_bucket/input\"
                }
              },
              \"target_types\": \"metadata\",
          }
        }
    },
    }" \
    -H "Content-Type:application/json" \
    -H "Authorization: Bearer YOUR_ACCESS_TOKEN" \
    -X POST https://bigquerymigration.googleapis.com/v2alpha/projects/your_project_id/locations/your_location/workflows
  ```

  You can check the status of the batch translation job in the Google Cloud console by navigating to BigQuery -> SQL Translation. Once complete, the mapping file is stored in the Cloud Storage location specified in the target_base_uri flag.

  To generate a token, use the gcloud auth print-access-token command or the OAuth 2.0 Playground with the scope https://www.googleapis.com/auth/cloud-platform (see the example after this list).
- In your Teradata data transfer configuration, specify the path to the Cloud Storage folder where the mapping file from the previous step is stored. The BigQuery Data Transfer Service uses this mapping to define the schema of your target BigQuery tables.
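
For the Authorization header in the curl example above, you can generate an access token with the gcloud CLI, for example:

```sh
# Generate an OAuth 2.0 access token for the Authorization header.
# Requires the gcloud CLI to be installed and authenticated.
ACCESS_TOKEN=$(gcloud auth print-access-token)
echo "Authorization: Bearer ${ACCESS_TOKEN}"
```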
Custom schema file
We recommend specifying a custom schema file in the following situations:
- If you need to capture important information about a table, like partitioning, that would otherwise be lost in the migration.
  For example, incremental transfers should have a schema file specified so that data from subsequent transfers can be properly partitioned when loaded into BigQuery. Without a schema file, every time a transfer runs, the BigQuery Data Transfer Service automatically applies a table schema by using the source data being transferred, and all information about partitioning, clustering, primary keys, and change tracking is lost.
- If you need to change column names or data types during the data transfer.
A custom schema file is a JSON file that describes database objects. The schema contains a set of databases, each containing a set of tables, each of which contains a set of columns. Each object has an originalName field that indicates the object name in Teradata, and a name field that indicates the target name for the object in BigQuery.
Columns have the following fields:
- originalType: Indicates the column data type in Teradata.
- type: Indicates the target data type for the column in BigQuery.
- usageType: Information about the way the column is used by the system. The following usage types are supported:
  - DEFAULT: You can annotate multiple columns in one target table with this usage type. This usageType indicates that the column has no special use in the source system. This is the default value.
  - CLUSTERING: You can annotate up to four columns in each target table with this usage type. The column order for clustering is determined based on the order in which they appear in the custom schema. The columns you select must meet the constraints for clustering in BigQuery. If a PARTITIONING field is specified for the same table, BigQuery uses these columns to create a clustered table.
  - PARTITIONING: You can annotate only one column in each target table with this usage type. This column is used in the partitioned table definition for the containing tables object. You can only use this usage type with a column that has a TIMESTAMP or DATE data type.
  - COMMIT_TIMESTAMP: You can annotate only one column in each target table with this usage type. Use this usageType to identify an update timestamp column for incremental updates. This column is used to extract rows that have been created or updated since the last transfer run. You can only use this usage type with columns that have a TIMESTAMP or DATE data type.
  - PRIMARY_KEY: You can annotate columns in each target table with this usage type. Use this usage type to identify just one column as the primary key, or in the case of a composite key, use the same usage type on multiple columns to identify the unique entities of a table. These columns work together with COMMIT_TIMESTAMP to extract rows created or updated since the last transfer run.
You can create a custom schema file manually, as seen in the following example,or you can have the migration agent generate one for you when you initialize the agent.
In this example, a user is migrating a Teradata table called orders in the tpch database with the following table definition:

```
CREATE SET TABLE TPCH.orders ,FALLBACK ,
     NO BEFORE JOURNAL,
     NO AFTER JOURNAL,
     CHECKSUM = DEFAULT,
     DEFAULT MERGEBLOCKRATIO,
     MAP = TD_MAP1
     (
      O_ORDERKEY INTEGER NOT NULL,
      O_CUSTKEY INTEGER NOT NULL,
      O_ORDERSTATUS CHAR(1) CHARACTER SET LATIN CASESPECIFIC NOT NULL,
      O_TOTALPRICE DECIMAL(15,2) NOT NULL,
      O_ORDERDATE DATE FORMAT 'yyyy-mm-dd' NOT NULL,
      O_ORDERPRIORITY CHAR(15) CHARACTER SET LATIN CASESPECIFIC NOT NULL,
      O_CLERK CHAR(15) CHARACTER SET LATIN CASESPECIFIC NOT NULL,
      O_SHIPPRIORITY INTEGER NOT NULL,
      O_COMMENT VARCHAR(79) CHARACTER SET LATIN CASESPECIFIC NOT NULL)
UNIQUE PRIMARY INDEX ( O_ORDERKEY );
```
While migrating to BigQuery, the user wants to configure the schema with the following changes:
- Rename the O_CUSTKEY column to O_CUSTOMERKEY
- Identify O_ORDERDATE as the partitioning column
The following example is a custom schema to configure these settings:
{"databases":[{"name":"tpch","originalName":"e2e_db","tables":[{"name":"orders","originalName":"orders","columns":[{"name":"O_ORDERKEY","originalName":"O_ORDERKEY","type":"INT64","originalType":"integer","usageType":["DEFAULT"],"isRequired":true,"originalColumnLength":4},{"name":"O_CUSTOMERKEY","originalName":"O_CUSTKEY","type":"INT64","originalType":"integer","usageType":["DEFAULT"],"isRequired":true,"originalColumnLength":4},{"name":"O_ORDERSTATUS","originalName":"O_ORDERSTATUS","type":"STRING","originalType":"character","usageType":["DEFAULT"],"isRequired":true,"originalColumnLength":1},{"name":"O_TOTALPRICE","originalName":"O_TOTALPRICE","type":"NUMERIC","originalType":"decimal","usageType":["DEFAULT"],"isRequired":true,"originalColumnLength":8},{"name":"O_ORDERDATE","originalName":"O_ORDERDATE","type":"DATE","originalType":"date","usageType":["PARTITIONING"],"isRequired":true,"originalColumnLength":4},{"name":"O_ORDERPRIORITY","originalName":"O_ORDERPRIORITY","type":"STRING","originalType":"character","usageType":["DEFAULT"],"isRequired":true,"originalColumnLength":15},{"name":"O_CLERK","originalName":"O_CLERK","type":"STRING","originalType":"character","usageType":["DEFAULT"],"isRequired":true,"originalColumnLength":15},{"name":"O_SHIPPRIORITY","originalName":"O_SHIPPRIORITY","type":"INT64","originalType":"integer","usageType":["DEFAULT"],"isRequired":true,"originalColumnLength":4},{"name":"O_COMMENT","originalName":"O_COMMENT","type":"STRING","originalType":"varchar","usageType":["DEFAULT"],"isRequired":true,"originalColumnLength":79}]}]}]}On-demand or Incremental transfers
When migrating data from a Teradata database instance to BigQuery, the BigQuery Data Transfer Service supports both full transfers (on-demand transfer) and recurring transfers (incremental transfers). You designate the transfer as on-demand or incremental in the scheduling options when setting up a transfer.
On-demand transfer: Use this mode to perform the full snapshot migration of schema and data from Teradata to BigQuery.
Scheduled transfer: Use this mode to perform the full snapshot and regularly migrate new and modified data (incremental data) from Teradata to BigQuery. Incremental transfers require customizing your schema to annotate columns with either of the following use cases:
- Annotate columns with only the COMMIT_TIMESTAMP usage type: In this transfer, new or modified rows in Teradata are appended to data in BigQuery. Updated rows in BigQuery tables might potentially have duplicate rows with old and new values.
- Annotate columns with both the COMMIT_TIMESTAMP and PRIMARY_KEY usage types: In this transfer, new rows are appended, and modified rows are updated in the corresponding rows in BigQuery. The column defined as PRIMARY_KEY is used to maintain uniqueness of the data in BigQuery.
  - The PRIMARY_KEY column defined in the schema does not have to be the PRIMARY_KEY in the Teradata table. It can be any column, but it must contain unique data.
Incremental transfers
In incremental transfers, the first transfer always creates a table snapshot in BigQuery. All subsequent incremental transfers will adhere to the annotations defined in the custom schema file explained below.
For each transfer run, a timestamp of the transfer run is saved. For each subsequent transfer run, the agent receives the timestamp of the previous transfer run (T1) and the timestamp of when the current transfer run started (T2).
For transfers after the initial run, the migration agent extracts data using the following per-table logic:
- If a table object in a schema file does not have a column with a usage type of COMMIT_TIMESTAMP, then the table is skipped.
- If a table has a column with the usage type of COMMIT_TIMESTAMP, then all rows with a timestamp between T1 and T2 are extracted and appended to the existing table in BigQuery.
- If a table has a column with the usage type of COMMIT_TIMESTAMP and a column with the usage type of PRIMARY_KEY, then all rows with a timestamp between T1 and T2 are extracted. Any new rows are appended and modified rows are updated in the existing table in BigQuery.
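
Conceptually, the time-window filter in the last two cases resembles the following query, shown here against the example table from the schema files below. This is an illustrative sketch only; the migration agent generates the actual extraction queries, and the exact handling of the T1 and T2 boundaries is an implementation detail.

```sql
-- Illustrative only: rows created or updated since the previous run (T1)
-- up to the start of the current run (T2). Table and column names match the
-- abc_db example schemas shown below.
SELECT *
FROM abc_db.abc_table
WHERE "timestamp" >  TIMESTAMP '2024-06-01 00:00:00'  -- T1: previous transfer run
  AND "timestamp" <= TIMESTAMP '2024-06-02 00:00:00'; -- T2: start of current run
```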
The following are example schema files for incremental transfers.
Schema with only COMMIT_TIMESTAMP

```
{
  "databases": [
    {
      "name": "abc_db",
      "originalName": "abc_db",
      "tables": [
        {
          "name": "abc_table",
          "originalName": "abc_table",
          "columns": [
            { "name": "Id", "originalName": "Id", "type": "INT64", "originalType": "integer", "originalColumnLength": 4, "usageType": ["DEFAULT"], "isRequired": true },
            { "name": "timestamp", "originalName": "timestamp", "type": "TIMESTAMP", "originalType": "timestamp", "originalColumnLength": 26, "usageType": ["COMMIT_TIMESTAMP"], "isRequired": false }
          ]
        }
      ]
    }
  ]
}
```

Schema with COMMIT_TIMESTAMP and one column (Id) as PRIMARY_KEY

```
{
  "databases": [
    {
      "name": "abc_db",
      "originalName": "abc_db",
      "tables": [
        {
          "name": "abc_table",
          "originalName": "abc_table",
          "columns": [
            { "name": "Id", "originalName": "Id", "type": "INT64", "originalType": "integer", "originalColumnLength": 4, "usageType": ["PRIMARY_KEY"], "isRequired": true },
            { "name": "timestamp", "originalName": "timestamp", "type": "TIMESTAMP", "originalType": "timestamp", "originalColumnLength": 26, "usageType": ["COMMIT_TIMESTAMP"], "isRequired": false }
          ]
        }
      ]
    }
  ]
}
```

Schema with COMMIT_TIMESTAMP and a composite key (Id + Name) as PRIMARY_KEY

```
{
  "databases": [
    {
      "name": "abc_db",
      "originalName": "abc_db",
      "tables": [
        {
          "name": "abc_table",
          "originalName": "abc_table",
          "columns": [
            { "name": "Id", "originalName": "Id", "type": "INT64", "originalType": "integer", "originalColumnLength": 4, "usageType": ["PRIMARY_KEY"], "isRequired": true },
            { "name": "Name", "originalName": "Name", "type": "STRING", "originalType": "character", "originalColumnLength": 30, "usageType": ["PRIMARY_KEY"], "isRequired": false },
            { "name": "timestamp", "originalName": "timestamp", "type": "TIMESTAMP", "originalType": "timestamp", "originalColumnLength": 26, "usageType": ["COMMIT_TIMESTAMP"], "isRequired": false }
          ]
        }
      ]
    }
  ]
}
```

The following table describes how the migration agent handles data definition language (DDL) and data manipulation language (DML) operations in incremental transfers.
| Teradata operation | Type | Teradata-to-BigQuery support |
|---|---|---|
| CREATE | DDL | A new full snapshot for the table is created in BigQuery. |
| DROP | DDL | Not supported |
| ALTER (RENAME) | DDL | A new full snapshot for the renamed table is created in BigQuery. The previous snapshot is not deleted from BigQuery. The user is not notified of the renamed table. |
| INSERT | DML | New rows are added to the BigQuery table. |
| UPDATE | DML | If only COMMIT_TIMESTAMP is used, rows are appended to the BigQuery table as new rows, similar to an INSERT operation. If both COMMIT_TIMESTAMP and PRIMARY_KEY are used, rows are updated in place, similar to an UPDATE operation. |
| MERGE | DML | Not supported. See instead INSERT, UPDATE, and DELETE. |
| DELETE | DML | Not supported |
Location considerations
Your Cloud Storage bucket must be in a region or multi-region that is compatible with the region or multi-region of the destination dataset in BigQuery.
- If your BigQuery dataset is in a multi-region, the Cloud Storage bucket containing the data you're transferring must be in the same multi-region or in a location that is contained within the multi-region. For example, if your BigQuery dataset is in the EU multi-region, the Cloud Storage bucket can be located in the europe-west1 Belgium region, which is within the EU.
- If your dataset is in a single region, your Cloud Storage bucket must be in the same region. For example, if your dataset is in the asia-northeast1 Tokyo region, your Cloud Storage bucket cannot be in the ASIA multi-region.
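
For example, a minimal sketch of creating a bucket that is colocated with a dataset in the EU multi-region (the bucket name is a placeholder):

```sh
# Create a Cloud Storage bucket in the EU multi-region so it is compatible
# with a BigQuery dataset located in the EU multi-region.
# Replace MY_MIGRATION_BUCKET with your own bucket name.
gcloud storage buckets create gs://MY_MIGRATION_BUCKET --location=EU
```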
For detailed information about transfers and regions, see Dataset locations and transfers.
Pricing
The data transfer with BigQuery is free of charge. However, costs can be incurred outside of Google by using this service, such as platform outbound data transfer charges.
- Extraction, uploading to a Cloud Storage bucket, and loading data into BigQuery is free.
- Data is not automatically deleted from your Cloud Storage bucket after it is uploaded to BigQuery. Consider deleting the data from your Cloud Storage bucket to avoid additional storage costs (see the example after this list). See Cloud Storage pricing.
- Standard BigQuery Quotas & limits on load jobs apply.
- Standard DML BigQuery Quotas & limits on incremental ingestion upserts apply.
- After data is transferred to BigQuery, standard BigQuery storage and compute pricing applies.
- See our transfers Pricing page for details.
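
As noted in the list above, one way to remove the staged files after a successful transfer is with the gcloud CLI; the bucket and prefix below are placeholders:

```sh
# Remove the staged extraction files from the Cloud Storage bucket after the
# transfer completes, to avoid ongoing storage charges.
# Replace the bucket and prefix with the location used by your transfer.
gcloud storage rm --recursive gs://MY_MIGRATION_BUCKET/teradata_staging/
```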
Limitations
- One-time, on-demand transfers are fully supported. DDL/DML operations in incremental transfers are partially supported.
- During data transfer, data is extracted to a directory on the local filesystem. Make sure there is adequate free space.
  - When using the FastExport mode of extraction, you can set the maximum storage space to be used, and the limit is strongly enforced by the migration agent. Set the max-local-storage setting in the migration agent's configuration file when setting up a transfer from Teradata to BigQuery.
  - When using the TPT extraction method, make sure the file system has more free space than the largest table partition in the Teradata instance.
- The BigQuery Data Transfer Service converts schema automatically (if you don't supply a custom schema file) and transfers Teradata data to BigQuery. Data is mapped from Teradata to BigQuery types.
- Files are not automatically deleted from your Cloud Storage bucket after being loaded into BigQuery. Consider deleting the data from your Cloud Storage bucket after loading it into BigQuery, to avoid additional storage costs. See Pricing.
- The speed of the extraction is bounded by your JDBC connection.
- The data extracted from Teradata is not encrypted. Take appropriate steps to restrict access to the extracted files in the local file system, and ensure that the Cloud Storage bucket is properly secured.
- Other database resources, such as stored procedures, saved queries, views, and user-defined functions, are not transferred and are not in the scope of this service.
- Incremental transfers do not support hard deletes. Incremental transfers do not sync any deleted rows in Teradata with BigQuery.
What's next
- Get step-by-step instructions to Migrate Teradata to BigQuery.
- Try a test migration of Teradata to BigQuery.