Migrating Teradata to BigQuery tutorial
This document describes how to migrate from Teradata to BigQuery using sample data. It provides a proof of concept that walks you through the process of transferring both schema and data from a Teradata data warehouse to BigQuery.
Objectives
- Generate synthetic data and upload it to Teradata.
- Migrate the schema and data to BigQuery, using the BigQuery Data Transfer Service.
- Verify that queries return the same results on Teradata and BigQuery.
Costs
This quickstart uses the following billable components of Google Cloud:
- BigQuery: This tutorial stores close to 1 GB of data in BigQuery and processes under 2 GB when executing the queries once. As part of the Google Cloud Free Tier, BigQuery offers some resources free of charge up to a specific limit. These free usage limits are available during and after the free trial period. If you go over these usage limits and are no longer in the free trial period, you are charged according to the pricing on the BigQuery pricing page.
You can use the pricing calculator to generate a cost estimate based on your projected usage.
Prerequisites
- Make sure that you have write and execute permissions on a machine that has access to the internet, so that you can download the data generation tool and run it.
- Make sure that you can connect to a Teradata database.
- Make sure that the machine has the Teradata BTEQ and FastLoad client tools installed. You can get the Teradata client tools from the Teradata website. If you need help installing these tools, ask your system administrator for details on installing, configuring, and running them. As an alternative, or in addition to BTEQ, you can do the following:
- Install a tool with a graphical interface such as DBeaver.
- Install the Teradata SQL Driver for Python for scripting interactions with Teradata Database.
- Make sure that the machine has network connectivity with Google Cloud, so that the BigQuery Data Transfer Service agent can communicate with BigQuery and transfer the schema and data.
Introduction
This quickstart guides you through a migration proof of concept. During the quickstart, you generate synthetic data and load it into Teradata. Then you use the BigQuery Data Transfer Service to move the schema and data to BigQuery. Finally, you run queries on both sides to compare results. The end state is that the schema and data from Teradata are mapped one-for-one into BigQuery.
This quickstart is intended for data warehouse administrators, developers, anddata practitioners in general who are interested in a hands-on experience with aschema and data migration using the BigQuery Data Transfer Service.
Generating the data
The Transaction Processing Performance Council (TPC) is a non-profit organization that publishes benchmarking specifications. These specifications have become de facto industry standards for running data-related benchmarks.
The TPC-H specification is a benchmark that's focused on decision support. In this quickstart, you use parts of this specification to create the tables and generate synthetic data as a model of a real-life data warehouse. Although the specification was created for benchmarking, in this quickstart you use this model as part of the migration proof of concept, not for benchmarking tasks.
- On the computer where you will connect to Teradata, use a web browser to download the latest available version of the TPC-H tools from the TPC website.
- Open a command terminal and change to the directory where you downloaded the tools.
- Extract the downloaded zip file. Replace file-name with the name of the file you downloaded:

      unzip file-name.zip

  A directory whose name includes the tools version number is extracted. This directory includes the TPC source code for the DBGEN data generation tool and the TPC-H specification itself.
- Go to the dbgen subdirectory. Use the parent directory name corresponding to your version, as in the following example:

      cd 2.18.0_rc2/dbgen

- Create a makefile using the provided template:

      cp makefile.suite makefile

- Edit the makefile with a text editor. For example, use vi to edit the file:

      vi makefile

- In the makefile, change the values for the following variables:

      CC       = gcc
      # TDAT -> TERADATA
      DATABASE = TDAT
      MACHINE  = LINUX
      WORKLOAD = TPCH

  Depending on your environment, the C compiler (CC) or MACHINE values might be different. If needed, ask your system administrator.
- Save the changes and close the file.
- Process the makefile:

      make

- Generate the TPC-H data using the dbgen tool:

      dbgen -v

  The data generation takes a couple of minutes. The -v (verbose) flag causes the command to report on its progress. When data generation is done, you find 8 ASCII files with the .tbl extension in the current folder. They contain pipe-delimited synthetic data to be loaded into each of the TPC-H tables.
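Before loading, you can spot-check the generated files. The row below is a made-up sample standing in for a line of lineitem.tbl (it is not actual DBGEN output); a real check would run the same awk command over the file itself. This sketch counts the pipe-delimited fields:

```shell
# Sketch: verify the field count of a pipe-delimited TPC-H row.
# DBGEN rows end with a trailing pipe, so a lineitem row with 16 fields
# splits into 17 awk fields; subtract 1 for the trailing delimiter.
# The row below is a hypothetical sample, not real DBGEN output.
row='1|155190|7706|1|17|21168.23|0.04|0.02|N|O|1996-03-13|1996-02-12|1996-03-22|DELIVER IN PERSON|TRUCK|to the quickly regular|'
echo "$row" | awk -F'|' '{print NF - 1}'
# prints 16
```

To check every row of a real file, you might run `awk -F'|' '{print NF-1}' lineitem.tbl | sort -u`, which should print a single value if all rows are well formed.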
Uploading sample data to Teradata
In this section, you upload the generated data into your Teradata database.
Create the TPC-H database
The Teradata client, called Basic Teradata Query (BTEQ), is used to communicate with one or more Teradata database servers and to run SQL queries on those systems. In this section you use BTEQ to create a new database for the TPC-H tables.
- Open the Teradata BTEQ client:

      bteq

- Log in to Teradata. Replace teradata-ip and teradata-user with the corresponding values for your environment:

      .LOGON teradata-ip/teradata-user

- Create a database named tpch with 2 GB of allocated space:

      CREATE DATABASE tpch
      AS PERM = 2e+09;

- Exit BTEQ:

      .QUIT
Load the generated data
In this section, you create a FastLoad script to create and load the sample tables. The table definitions are described in section 1.4 of the TPC-H specification. Section 1.2 contains an entity-relationship diagram of the whole database schema.
The following procedure shows how to create the lineitem table, which is the largest and most complex of the TPC-H tables. When you finish with the lineitem table, you repeat this procedure for the remaining tables.
- Using a text editor, create a new file named fastload_lineitem.fl:

      vi fastload_lineitem.fl

- Copy the following script into the file. It connects to the Teradata database and creates a table named lineitem. In the logon command, replace teradata-ip, teradata-user, and teradata-pwd with your connection details.

      logon teradata-ip/teradata-user,teradata-pwd;

      drop table tpch.lineitem;
      drop table tpch.error_1;
      drop table tpch.error_2;

      CREATE multiset TABLE tpch.lineitem,
          NO FALLBACK,
          NO BEFORE JOURNAL,
          NO AFTER JOURNAL,
          CHECKSUM = DEFAULT,
          DEFAULT MERGEBLOCKRATIO
          (
            L_ORDERKEY INTEGER NOT NULL,
            L_PARTKEY INTEGER NOT NULL,
            L_SUPPKEY INTEGER NOT NULL,
            L_LINENUMBER INTEGER NOT NULL,
            L_QUANTITY DECIMAL(15,2) NOT NULL,
            L_EXTENDEDPRICE DECIMAL(15,2) NOT NULL,
            L_DISCOUNT DECIMAL(15,2) NOT NULL,
            L_TAX DECIMAL(15,2) NOT NULL,
            L_RETURNFLAG CHAR(1) CHARACTER SET LATIN CASESPECIFIC NOT NULL,
            L_LINESTATUS CHAR(1) CHARACTER SET LATIN CASESPECIFIC NOT NULL,
            L_SHIPDATE DATE FORMAT 'yyyy-mm-dd' NOT NULL,
            L_COMMITDATE DATE FORMAT 'yyyy-mm-dd' NOT NULL,
            L_RECEIPTDATE DATE FORMAT 'yyyy-mm-dd' NOT NULL,
            L_SHIPINSTRUCT CHAR(25) CHARACTER SET LATIN CASESPECIFIC NOT NULL,
            L_SHIPMODE CHAR(10) CHARACTER SET LATIN CASESPECIFIC NOT NULL,
            L_COMMENT VARCHAR(44) CHARACTER SET LATIN CASESPECIFIC NOT NULL)
      PRIMARY INDEX ( L_ORDERKEY )
      PARTITION BY RANGE_N(L_COMMITDATE BETWEEN DATE '1992-01-01'
          AND DATE '1998-12-31' EACH INTERVAL '1' DAY);

  The script first makes sure that the lineitem table and temporary error tables do not exist, and then creates the lineitem table.

- In the same file, add the following code, which loads the data into the newly created table. Complete all of the table fields in the three blocks (define, insert, and values), making sure that you use varchar as their load data type.

      begin loading tpch.lineitem
      errorfiles tpch.error_1, tpch.error_2;
      set record vartext;

      define
          in_ORDERKEY(varchar(50)),
          in_PARTKEY(varchar(50)),
          in_SUPPKEY(varchar(50)),
          in_LINENUMBER(varchar(50)),
          in_QUANTITY(varchar(50)),
          in_EXTENDEDPRICE(varchar(50)),
          in_DISCOUNT(varchar(50)),
          in_TAX(varchar(50)),
          in_RETURNFLAG(varchar(50)),
          in_LINESTATUS(varchar(50)),
          in_SHIPDATE(varchar(50)),
          in_COMMITDATE(varchar(50)),
          in_RECEIPTDATE(varchar(50)),
          in_SHIPINSTRUCT(varchar(50)),
          in_SHIPMODE(varchar(50)),
          in_COMMENT(varchar(50))
          file = lineitem.tbl;

      insert into tpch.lineitem (
          L_ORDERKEY,
          L_PARTKEY,
          L_SUPPKEY,
          L_LINENUMBER,
          L_QUANTITY,
          L_EXTENDEDPRICE,
          L_DISCOUNT,
          L_TAX,
          L_RETURNFLAG,
          L_LINESTATUS,
          L_SHIPDATE,
          L_COMMITDATE,
          L_RECEIPTDATE,
          L_SHIPINSTRUCT,
          L_SHIPMODE,
          L_COMMENT)
      values (
          :in_ORDERKEY,
          :in_PARTKEY,
          :in_SUPPKEY,
          :in_LINENUMBER,
          :in_QUANTITY,
          :in_EXTENDEDPRICE,
          :in_DISCOUNT,
          :in_TAX,
          :in_RETURNFLAG,
          :in_LINESTATUS,
          :in_SHIPDATE,
          :in_COMMITDATE,
          :in_RECEIPTDATE,
          :in_SHIPINSTRUCT,
          :in_SHIPMODE,
          :in_COMMENT);

      end loading;
      logoff;

  The FastLoad script loads the data from a file in the same directory called lineitem.tbl, which you generated in the previous section.

- Save the changes and close the file.
- Run the FastLoad script:

      fastload < fastload_lineitem.fl

- Repeat this procedure for the rest of the TPC-H tables listed in section 1.4 of the TPC-H specification. Make sure that you adjust the steps for each table.
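The repeat step can be sketched as a small loop, assuming you create one FastLoad script per table named fastload_<table>.fl (a hypothetical naming convention, not part of the original procedure). It's shown as a dry run that only prints the commands:

```shell
# Hypothetical loop over the remaining TPC-H tables, assuming each has a
# FastLoad script named fastload_<table>.fl in the current directory.
# Remove the leading 'echo' to actually run FastLoad.
for table in part supplier partsupp customer orders nation region; do
  echo fastload "< fastload_${table}.fl"
done
```

Loading the tables sequentially like this keeps the load order explicit; FastLoad locks one target table at a time anyway.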
Migrating the schema and data to BigQuery
The instructions for how to migrate the schema and data to BigQuery are in a separate tutorial: Migrate data from Teradata. We've included details in this section on how to proceed with certain steps of that tutorial. When you've finished the steps in the other tutorial, return to this document and continue with the next section, Verifying query results.
Note: You run all of the commands in this section in Cloud Shell.

Create the BigQuery dataset
During the initial Google Cloud configuration steps, you're asked to create a dataset in BigQuery to hold the tables after they're migrated. Name the dataset tpch. The queries at the end of this quickstart assume this name and don't require any modifications.
    # Use the bq utility to create the dataset
    bq mk --location=US tpch

Create a service account
Also as part of the Google Cloud configuration steps, you must create an Identity and Access Management (IAM) service account. This service account is used to write the data into BigQuery and to store temporary data in Cloud Storage.
    # Set the PROJECT variable
    export PROJECT=$(gcloud config get-value project)

    # Create a service account
    gcloud iam service-accounts create tpch-transfer

Grant permissions to the service account that let it administer BigQuery datasets and the staging area in Cloud Storage:
    # Set TPCH_SVC_ACCOUNT = service account email
    export TPCH_SVC_ACCOUNT=tpch-transfer@${PROJECT}.iam.gserviceaccount.com

    # Bind the service account to the BigQuery Admin role
    gcloud projects add-iam-policy-binding ${PROJECT} \
        --member serviceAccount:${TPCH_SVC_ACCOUNT} \
        --role roles/bigquery.admin

    # Bind the service account to the Storage Admin role
    gcloud projects add-iam-policy-binding ${PROJECT} \
        --member serviceAccount:${TPCH_SVC_ACCOUNT} \
        --role roles/storage.admin

Create the staging Cloud Storage bucket
One additional task in the Google Cloud configuration is to create a Cloud Storage bucket. This bucket is used by the BigQuery Data Transfer Service as a staging area for data files to be ingested into BigQuery.
    # Use gcloud storage to create the bucket
    gcloud storage buckets create gs://${PROJECT}-tpch --location=us-central1

Specify the table name patterns
During the configuration of a new transfer in the BigQuery Data Transfer Service, you're asked to specify an expression that indicates which tables to include in the transfer. In this quickstart, you include all of the tables from the tpch database.
The format of the expression is database.table, and the table name can be replaced by a wildcard. The wildcard is a Java regular expression in which .* matches any sequence of characters, so the expression to transfer all of the tables from the tpch database is as follows:

    tpch..*

Notice that there are two dots: the first separates the database name from the table pattern, and the second is the start of the .* wildcard.
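To see what the pattern matches, you can exercise it as the regular expression it is. This is a minimal sketch using grep's extended regex syntax, with an illustrative (made-up) list of table names; the separator dot is escaped here for precision:

```shell
# The first dot is the literal database/table separator (escaped here);
# '.*' is the wildcard that matches any table name.
pattern='^tpch\..*$'
printf '%s\n' tpch.lineitem tpch.orders sales.customer | grep -E "$pattern"
# prints:
# tpch.lineitem
# tpch.orders
```

Only the names under the tpch database match; tables in other databases (like the hypothetical sales.customer) are excluded.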
Verifying query results
At this point you've created sample data, uploaded the data to Teradata, and then migrated it to BigQuery using the BigQuery Data Transfer Service, as explained in the separate tutorial. In this section, you run three of the TPC-H standard queries to verify that the results are the same in Teradata and in BigQuery.
Run the pricing summary report query
The first query is the pricing summary report query (section 2.4.1 of the TPC-H specification). This query reports the number of items that were billed, shipped, and returned as of a given date.
The following listing shows the complete query:
    SELECT
      l_returnflag,
      l_linestatus,
      SUM(l_quantity) AS sum_qty,
      SUM(l_extendedprice) AS sum_base_price,
      SUM(l_extendedprice * (1 - l_discount)) AS sum_disc_price,
      SUM(l_extendedprice * (1 - l_discount) * (1 + l_tax)) AS sum_charge,
      AVG(l_quantity) AS avg_qty,
      AVG(l_extendedprice) AS avg_price,
      AVG(l_discount) AS avg_disc,
      COUNT(*) AS count_order
    FROM tpch.lineitem
    WHERE l_shipdate BETWEEN '1996-01-01' AND '1996-01-10'
    GROUP BY l_returnflag, l_linestatus
    ORDER BY l_returnflag, l_linestatus;

Run the query in Teradata:
- Run BTEQ and connect to Teradata. For details, see Create the TPC-H database earlier in this document.
- Change the output display width to 500 characters:

      .set width 500

- Copy the query and paste it at the BTEQ prompt.
The result looks similar to the following:
    L_RETURNFLAG  L_LINESTATUS    sum_qty  sum_base_price  sum_disc_price        sum_charge  avg_qty  avg_price  avg_disc  count_order
    ------------  ------------  ---------  --------------  --------------  ----------------  -------  ---------  --------  -----------
    N             O             629900.00    943154565.63  896323924.4600  932337245.114003    25.45   38113.41       .05        24746
Run the same query in BigQuery:
- Go to the BigQuery console.
- Copy the query into the query editor.
- Make sure that the dataset name in the FROM line is correct.
- Click Run.
The result is the same as the result from Teradata.
Optionally, you can choose wider time intervals in the query to make sure allof the rows in the table are scanned.
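When comparing results by eye gets tedious, you can export each result to a text file and diff the files after normalizing whitespace. This is a sketch with hypothetical file names and made-up sample rows standing in for real exports:

```shell
# Hypothetical exports of the same query from Teradata and BigQuery;
# the column spacing differs but the data is identical.
printf 'N O 629900.00\nR F 3773034.00\n' > teradata_result.txt
printf 'N  O  629900.00\nR  F  3773034.00\n' > bq_result.txt

# Squeeze runs of spaces so formatting differences don't mask data differences.
tr -s ' ' < teradata_result.txt > td.norm
tr -s ' ' < bq_result.txt > bq.norm
diff td.norm bq.norm && echo "results match"
# prints "results match"
```

For real exports you would also need to normalize number formats (for example, Teradata's scientific notation) before diffing.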
Run the local supplier volume query
The second example query is the local supplier volume query (section 2.4.5 of the TPC-H specification). For each nation in a region, this query returns the revenue that was produced by each line item in which the customer and the supplier were in that nation. These results are useful for purposes like planning where to put distribution centers.
The following listing shows the complete query:
    SELECT
      n_name AS nation,
      SUM(l_extendedprice * (1 - l_discount) / 1000) AS revenue
    FROM
      tpch.customer,
      tpch.orders,
      tpch.lineitem,
      tpch.supplier,
      tpch.nation,
      tpch.region
    WHERE c_custkey = o_custkey
      AND l_orderkey = o_orderkey
      AND l_suppkey = s_suppkey
      AND c_nationkey = s_nationkey
      AND s_nationkey = n_nationkey
      AND n_regionkey = r_regionkey
      AND r_name = 'EUROPE'
      AND o_orderdate >= '1996-01-01'
      AND o_orderdate < '1997-01-01'
    GROUP BY n_name
    ORDER BY revenue DESC;

Run the query in Teradata BTEQ and in the BigQuery console as described in the previous section.
[Screenshot: result returned by Teradata, revenue per nation]

[Screenshot: result returned by BigQuery, revenue per nation]
Both Teradata and BigQuery return the same results.
Run the product type profit measure query
The final test to verify the migration is the last example query, the product type profit measure query (section 2.4.9 in the TPC-H specification). For each nation and each year, this query finds the profit for all parts ordered in that year. It filters the results by a substring in the part names and by a specific supplier.
The following listing shows the complete query:
    SELECT
      nation,
      o_year,
      SUM(amount) AS sum_profit
    FROM (
      SELECT
        n_name AS nation,
        EXTRACT(YEAR FROM o_orderdate) AS o_year,
        (l_extendedprice * (1 - l_discount) - ps_supplycost * l_quantity) / 1e+3 AS amount
      FROM
        tpch.part,
        tpch.supplier,
        tpch.lineitem,
        tpch.partsupp,
        tpch.orders,
        tpch.nation
      WHERE s_suppkey = l_suppkey
        AND ps_suppkey = l_suppkey
        AND ps_partkey = l_partkey
        AND p_partkey = l_partkey
        AND o_orderkey = l_orderkey
        AND s_nationkey = n_nationkey
        AND p_name like '%blue%') AS profit
    GROUP BY nation, o_year
    ORDER BY nation, o_year DESC;

Run the query in Teradata BTEQ and in the BigQuery console as described in the previous section.
[Screenshot: result returned by Teradata, profit by nation and year]

[Screenshot: result returned by BigQuery, profit by nation and year]
Both Teradata and BigQuery return the same results, althoughTeradata uses scientific notation for the sum.
Additional queries
Optionally, you can run the rest of the TPC-H queries that are defined insection 2.4 of the TPC-H specification.
You can also generate queries following the TPC-H standard by using the QGEN tool, which is in the same directory as the DBGEN tool. QGEN is built using the same makefile as DBGEN, so when you run make to compile dbgen, you also produce the qgen executable.
For more information on both tools and on their command-line options, see theREADME file for each tool.
Cleanup
To avoid incurring charges to your Google Cloud account for the resources usedin this tutorial, remove them.
Delete the project
The simplest way to stop billing charges is to delete the project you createdfor this tutorial.
What's next
- Get step-by-step instructions to Migrate Teradata to BigQuery.