Migrating Teradata to BigQuery tutorial

This document describes how to migrate from Teradata to BigQuery using sample data. It provides a proof of concept that walks you through the process of transferring both schema and data from a Teradata data warehouse to BigQuery.

Objectives

  • Generate synthetic data and upload it to Teradata.
  • Migrate the schema and data to BigQuery, using the BigQuery Data Transfer Service (BQDT).
  • Verify that queries return the same results on Teradata and BigQuery.

Costs

This quickstart uses the following billable components of Google Cloud:

  • BigQuery: This tutorial stores close to 1 GB of data in BigQuery and processes under 2 GB when executing the queries once. As part of the Google Cloud Free Tier, BigQuery offers some resources free of charge up to a specific limit. These free usage limits are available during and after the free trial period. If you go over these usage limits and are no longer in the free trial period, you are charged according to the pricing on the BigQuery pricing page.

You can use the pricing calculator to generate a cost estimate based on your projected usage.

Prerequisites

  • Make sure you have write and execute permissions on a machine that has access to the internet, so that you can download the data generation tool and run it.
  • Make sure that you can connect to a Teradata database.
  • Make sure that the machine has the Teradata BTEQ and FastLoad client tools installed. You can get the Teradata client tools from the Teradata website. If you need help installing these tools, ask your system administrator for details on installing, configuring, and running them. As an alternative, or in addition to BTEQ, you can use other Teradata client tools that support running SQL queries.

  • Make sure that the machine has network connectivity with Google Cloud so that the BigQuery Data Transfer Service agent can communicate with BigQuery and transfer the schema and data.

Note: The instructions to migrate the schema and data are covered in a separate tutorial that is linked from this one, as explained later.

Introduction

This quickstart guides you through a migration proof of concept. During the quickstart, you generate synthetic data and load it into Teradata. Then you use the BigQuery Data Transfer Service to move the schema and data to BigQuery. Finally, you run queries on both sides to compare results. The end state is that the schema and data from Teradata are mapped one-for-one into BigQuery.

This quickstart is intended for data warehouse administrators, developers, and data practitioners in general who are interested in a hands-on experience with a schema and data migration using the BigQuery Data Transfer Service.

Generating the data

The Transaction Processing Performance Council (TPC) is a non-profit organization that publishes benchmarking specifications. These specifications have become de facto industry standards for running data-related benchmarks.

The TPC-H specification is a benchmark that's focused on decision support. In this quickstart, you use parts of this specification to create the tables and generate synthetic data as a model of a real-life data warehouse. Although the specification was created for benchmarking, in this quickstart you use this model as part of the migration proof of concept, not for benchmarking tasks.

  1. On the computer where you will connect to Teradata, use a web browser to download the latest available version of the TPC-H tools from the TPC website.
  2. Open a command terminal and change to the directory where you downloaded the tools.
  3. Extract the downloaded zip file. Replace file-name with the name of the file you downloaded:

    unzip file-name.zip

    A directory whose name includes the tools version number is extracted. This directory includes the TPC source code for the DBGEN data generation tool and the TPC-H specification itself.

  4. Go to the dbgen subdirectory. Use the parent directory name corresponding to your version, as in the following example:

    cd 2.18.0_rc2/dbgen
  5. Create a makefile using the provided template:

    cp makefile.suite makefile
  6. Edit the makefile with a text editor. For example, use vi to edit the file:

    vi makefile
  7. In the makefile, change the values for the following variables:

    CC       = gcc
    # TDAT -> TERADATA
    DATABASE = TDAT
    MACHINE  = LINUX
    WORKLOAD = TPCH

    Depending on your environment, the C compiler (CC) or MACHINE values might be different. If needed, ask your system administrator.

  8. Save the changes and close the file.

  9. Process the makefile:

    make
  10. Generate the TPC-H data using thedbgen tool:

    dbgen -v

    The data generation takes a couple of minutes. The -v (verbose) flag causes the command to report on the progress. When data generation is done, you find 8 ASCII files with the .tbl extension in the current folder. They contain pipe-delimited synthetic data to be loaded into each one of the TPC-H tables.
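Before loading the generated files, you can do a quick sanity check on their format by counting the fields in a record. The following is a minimal sketch; the sample record is illustrative, and it assumes dbgen's convention of terminating each record with a trailing pipe:

```shell
# One illustrative lineitem record in dbgen's pipe-delimited format.
# dbgen terminates each record with a trailing pipe.
sample='1|155190|7706|1|17|21168.23|0.04|0.02|N|O|1996-03-13|1996-02-12|1996-03-22|DELIVER IN PERSON|TRUCK|egular courts above the|'

# Count the fields; NF-1 accounts for the empty field after the
# trailing delimiter. The lineitem table has 16 columns.
printf '%s\n' "$sample" | awk -F'|' '{print NF-1}'
```

To check a real file, replace the sample with the first line of lineitem.tbl, for example `head -n 1 lineitem.tbl`.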

Uploading sample data to Teradata

In this section, you upload the generated data into your Teradata database.

Create the TPC-H database

The Teradata client, called Basic Teradata Query (BTEQ), is used to communicate with one or more Teradata database servers and to run SQL queries on those systems. In this section, you use BTEQ to create a new database for the TPC-H tables.

  1. Open the Teradata BTEQ client:

    bteq
  2. Log in to Teradata. Replace teradata-ip and teradata-user with the corresponding values for your environment.

    .LOGON teradata-ip/teradata-user
  3. Create a database named tpch with 2 GB of allocated space:

    CREATE DATABASE tpch AS PERM = 2e+09;
  4. Exit BTEQ:

    .QUIT

Load the generated data

In this section, you create a FastLoad script to create and load the sample tables. The table definitions are described in section 1.4 of the TPC-H specification. Section 1.2 contains an entity-relationship diagram of the whole database schema.

The following procedure shows how to create the lineitem table, which is the largest and most complex of the TPC-H tables. When you finish with the lineitem table, you repeat this procedure for the remaining tables.

  1. Using a text editor, create a new file named fastload_lineitem.fl:

    vi fastload_lineitem.fl
  2. Copy the following script into the file. The script connects to the Teradata database and creates a table named lineitem.

    In the logon command, replace teradata-ip, teradata-user, and teradata-pwd with your connection details.

    logon teradata-ip/teradata-user,teradata-pwd;

    drop table tpch.lineitem;
    drop table tpch.error_1;
    drop table tpch.error_2;

    CREATE multiset TABLE tpch.lineitem,
        NO FALLBACK,
        NO BEFORE JOURNAL,
        NO AFTER JOURNAL,
        CHECKSUM = DEFAULT,
        DEFAULT MERGEBLOCKRATIO
        (
         L_ORDERKEY INTEGER NOT NULL,
         L_PARTKEY INTEGER NOT NULL,
         L_SUPPKEY INTEGER NOT NULL,
         L_LINENUMBER INTEGER NOT NULL,
         L_QUANTITY DECIMAL(15,2) NOT NULL,
         L_EXTENDEDPRICE DECIMAL(15,2) NOT NULL,
         L_DISCOUNT DECIMAL(15,2) NOT NULL,
         L_TAX DECIMAL(15,2) NOT NULL,
         L_RETURNFLAG CHAR(1) CHARACTER SET LATIN CASESPECIFIC NOT NULL,
         L_LINESTATUS CHAR(1) CHARACTER SET LATIN CASESPECIFIC NOT NULL,
         L_SHIPDATE DATE FORMAT 'yyyy-mm-dd' NOT NULL,
         L_COMMITDATE DATE FORMAT 'yyyy-mm-dd' NOT NULL,
         L_RECEIPTDATE DATE FORMAT 'yyyy-mm-dd' NOT NULL,
         L_SHIPINSTRUCT CHAR(25) CHARACTER SET LATIN CASESPECIFIC NOT NULL,
         L_SHIPMODE CHAR(10) CHARACTER SET LATIN CASESPECIFIC NOT NULL,
         L_COMMENT VARCHAR(44) CHARACTER SET LATIN CASESPECIFIC NOT NULL)
    PRIMARY INDEX ( L_ORDERKEY )
    PARTITION BY RANGE_N(L_COMMITDATE BETWEEN DATE '1992-01-01'
                                     AND     DATE '1998-12-31'
                   EACH INTERVAL '1' DAY);

    The script first makes sure that the lineitem table and the temporary error tables do not exist, and then creates the lineitem table.

  3. In the same file, add the following code, which loads the data into the newly created table. Complete all of the table fields in the three blocks (define, insert, and values), making sure you use varchar as their load data type.

    begin loading tpch.lineitem
    errorfiles tpch.error_1, tpch.error_2;
     set record vartext;
    define
     in_ORDERKEY(varchar(50)),
     in_PARTKEY(varchar(50)),
     in_SUPPKEY(varchar(50)),
     in_LINENUMBER(varchar(50)),
     in_QUANTITY(varchar(50)),
     in_EXTENDEDPRICE(varchar(50)),
     in_DISCOUNT(varchar(50)),
     in_TAX(varchar(50)),
     in_RETURNFLAG(varchar(50)),
     in_LINESTATUS(varchar(50)),
     in_SHIPDATE(varchar(50)),
     in_COMMITDATE(varchar(50)),
     in_RECEIPTDATE(varchar(50)),
     in_SHIPINSTRUCT(varchar(50)),
     in_SHIPMODE(varchar(50)),
     in_COMMENT(varchar(50))
     file = lineitem.tbl;
    insert into tpch.lineitem (
      L_ORDERKEY,
      L_PARTKEY,
      L_SUPPKEY,
      L_LINENUMBER,
      L_QUANTITY,
      L_EXTENDEDPRICE,
      L_DISCOUNT,
      L_TAX,
      L_RETURNFLAG,
      L_LINESTATUS,
      L_SHIPDATE,
      L_COMMITDATE,
      L_RECEIPTDATE,
      L_SHIPINSTRUCT,
      L_SHIPMODE,
      L_COMMENT
    ) values (
      :in_ORDERKEY,
      :in_PARTKEY,
      :in_SUPPKEY,
      :in_LINENUMBER,
      :in_QUANTITY,
      :in_EXTENDEDPRICE,
      :in_DISCOUNT,
      :in_TAX,
      :in_RETURNFLAG,
      :in_LINESTATUS,
      :in_SHIPDATE,
      :in_COMMITDATE,
      :in_RECEIPTDATE,
      :in_SHIPINSTRUCT,
      :in_SHIPMODE,
      :in_COMMENT
    );
    end loading;
    logoff;

    The FastLoad script loads the data from a file in the same directory called lineitem.tbl, which you generated in the previous section.

  4. Save the changes and close the file.

  5. Run the FastLoad script:

    fastload < fastload_lineitem.fl
  6. Repeat this procedure for the rest of the TPC-H tables listed in section 1.4 of the TPC-H specification. Make sure that you adjust the steps for each table.
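As an illustration of adapting the procedure, the following is an unverified sketch of a FastLoad script for the much smaller region table, assuming its three columns as defined in section 1.4 of the specification (R_REGIONKEY, R_NAME, R_COMMENT). Check the column definitions and table options against the specification before using it:

```sql
logon teradata-ip/teradata-user,teradata-pwd;

drop table tpch.region;
drop table tpch.error_1;
drop table tpch.error_2;

-- Column types follow section 1.4 of the TPC-H specification;
-- verify before use.
CREATE multiset TABLE tpch.region,
    NO FALLBACK,
    NO BEFORE JOURNAL,
    NO AFTER JOURNAL
    (
     R_REGIONKEY INTEGER NOT NULL,
     R_NAME CHAR(25) CHARACTER SET LATIN CASESPECIFIC NOT NULL,
     R_COMMENT VARCHAR(152) CHARACTER SET LATIN CASESPECIFIC NOT NULL)
PRIMARY INDEX ( R_REGIONKEY );

begin loading tpch.region
errorfiles tpch.error_1, tpch.error_2;
 set record vartext;
define
 in_REGIONKEY(varchar(50)),
 in_NAME(varchar(50)),
 in_COMMENT(varchar(160))
 file = region.tbl;
insert into tpch.region (
  R_REGIONKEY, R_NAME, R_COMMENT
) values (
  :in_REGIONKEY, :in_NAME, :in_COMMENT);
end loading;
logoff;
```

As in the lineitem script, all fields are loaded as varchar and converted by Teradata on insert.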

Migrating the schema and data to BigQuery

The instructions for how to migrate the schema and data to BigQuery are in a separate tutorial: Migrate data from Teradata. We've included details in this section on how to proceed with certain steps of that tutorial. When you've finished the steps in the other tutorial, return to this document and continue with the next section, Verifying query results.

Note: You run all of the commands in this section in Cloud Shell.

Create the BigQuery dataset

During the initial Google Cloud configuration steps, you're asked to create a dataset in BigQuery to hold the tables after they're migrated. Name the dataset tpch. The queries at the end of this quickstart assume this name and don't require any modifications.

# Use the bq utility to create the dataset
bq mk --location=US tpch

Create a service account

Also as part of the Google Cloud configuration steps, you must create an Identity and Access Management (IAM) service account. This service account is used to write the data into BigQuery and to store temporary data in Cloud Storage.

# Set the PROJECT variable
export PROJECT=$(gcloud config get-value project)

# Create a service account
gcloud iam service-accounts create tpch-transfer

Grant permissions to the service account that let it administer BigQuery datasets and the staging area in Cloud Storage:

# Set TPCH_SVC_ACCOUNT = service account email
export TPCH_SVC_ACCOUNT=tpch-transfer@${PROJECT}.iam.gserviceaccount.com

# Bind the service account to the BigQuery Admin role
gcloud projects add-iam-policy-binding ${PROJECT} \
    --member serviceAccount:${TPCH_SVC_ACCOUNT} \
    --role roles/bigquery.admin

# Bind the service account to the Storage Admin role
gcloud projects add-iam-policy-binding ${PROJECT} \
    --member serviceAccount:${TPCH_SVC_ACCOUNT} \
    --role roles/storage.admin

Create the staging Cloud Storage bucket

One additional task in the Google Cloud configuration is to create a Cloud Storage bucket. This bucket is used by the BigQuery Data Transfer Service as a staging area for data files to be ingested into BigQuery.

# Use gcloud storage to create the bucket
gcloud storage buckets create gs://${PROJECT}-tpch --location=us-central1

Specify the table name patterns

During the configuration of a new transfer in the BigQuery Data Transfer Service, you're asked to specify an expression that indicates which tables to include in the transfer. In this quickstart, you include all of the tables from the tpch database.

The format of the expression is database.table, and the table name can be replaced by a wildcard. Because wildcards in Java start with two dots, the expression to transfer all of the tables from the tpch database is as follows:

tpch..*

Notice that there are two dots.
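If you want to check locally which table names the pattern matches, you can treat it as a regular expression. The following sketch uses bash's extended-regex matching as a stand-in; the transfer service itself uses Java regular expressions, so this is an approximation:

```shell
# The transfer expression, anchored as a regular expression.
# The first dot is intended as the database/table separator; the
# second dot plus "*" matches any table name.
pattern='^tpch..*$'

for name in tpch.lineitem tpch.orders other.lineitem; do
  if [[ "$name" =~ $pattern ]]; then
    echo "$name: included"
  else
    echo "$name: excluded"
  fi
done
```

Only the names in the tpch database are reported as included.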

Verifying query results

At this point, you've created sample data, uploaded the data to Teradata, and then migrated it to BigQuery using the BigQuery Data Transfer Service, as explained in the separate tutorial. In this section, you run three of the TPC-H standard queries to verify that the results are the same in Teradata and in BigQuery.

Run the pricing summary report query

The first query is the pricing summary report query (section 2.4.1 of the TPC-H specification). This query reports the number of items that were billed, shipped, and returned as of a given date.

The following listing shows the complete query:

SELECT
 l_returnflag,
 l_linestatus,
 SUM(l_quantity) AS sum_qty,
 SUM(l_extendedprice) AS sum_base_price,
 SUM(l_extendedprice * (1 - l_discount)) AS sum_disc_price,
 SUM(l_extendedprice * (1 - l_discount) * (1 + l_tax)) AS sum_charge,
 AVG(l_quantity) AS avg_qty,
 AVG(l_extendedprice) AS avg_price,
 AVG(l_discount) AS avg_disc,
 COUNT(*) AS count_order
FROM tpch.lineitem
WHERE l_shipdate BETWEEN '1996-01-01' AND '1996-01-10'
GROUP BY l_returnflag, l_linestatus
ORDER BY l_returnflag, l_linestatus;

Run the query in Teradata:

  1. Run BTEQ and connect to Teradata. For details, see Create the TPC-H database earlier in this document.
  2. Change the output display width to 500 characters:

    .set width 500
  3. Copy the query and paste it at the BTEQ prompt.

    The result looks similar to the following:

    L_RETURNFLAG  L_LINESTATUS            sum_qty     sum_base_price     sum_disc_price         sum_charge            avg_qty          avg_price           avg_disc  count_order
    ------------  ------------  -----------------  -----------------  -----------------  -----------------  -----------------  -----------------  -----------------  -----------
    N             O                     629900.00       943154565.63     896323924.4600   932337245.114003              25.45           38113.41                .05        24746

Run the same query in BigQuery:

  1. Go to the BigQuery console:

    Go to BigQuery

  2. Copy the query into the query editor.

  3. Make sure that the dataset name in the FROM line is correct.

  4. Click Run.

    The result is the same as the result from Teradata.

Optionally, you can choose wider time intervals in the query to make sure all of the rows in the table are scanned.
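If the result sets are too large to compare by eye, you can export both to files and compare them mechanically. The following sketch uses two small inline files as stand-ins for hypothetical teradata.csv and bigquery.csv exports of the same query:

```shell
# Stand-in exports; in practice you would save each system's query
# output to these files.
printf 'N,O,629900.00,24746\n' > teradata.csv
printf 'N,O,629900.00,24746\n' > bigquery.csv

# Sort before diffing so that row order doesn't cause false mismatches.
if diff <(sort teradata.csv) <(sort bigquery.csv) > /dev/null; then
  echo "results match"
else
  echo "results differ"
fi
```

Note that a textual comparison like this is sensitive to formatting differences, such as Teradata's scientific notation for some sums, even when the underlying values agree.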

Run the local supplier volume query

The second example query is the local supplier volume query (section 2.4.5 of the TPC-H specification). For each nation in a region, this query returns the revenue that was produced by each line item in which the customer and the supplier were in that nation. These results are useful for tasks like planning where to put distribution centers.

The following listing shows the complete query:

SELECT
 n_name AS nation,
 SUM(l_extendedprice * (1 - l_discount) / 1000) AS revenue
FROM
 tpch.customer,
 tpch.orders,
 tpch.lineitem,
 tpch.supplier,
 tpch.nation,
 tpch.region
WHERE c_custkey = o_custkey
 AND l_orderkey = o_orderkey
 AND l_suppkey = s_suppkey
 AND c_nationkey = s_nationkey
 AND s_nationkey = n_nationkey
 AND n_regionkey = r_regionkey
 AND r_name = 'EUROPE'
 AND o_orderdate >= '1996-01-01'
 AND o_orderdate < '1997-01-01'
GROUP BY n_name
ORDER BY revenue DESC;

Run the query in Teradata BTEQ and in the BigQuery console as described in the previous section.

This is the result returned by Teradata:

Teradata results for the local supplier volume results query.

This is the result returned by BigQuery:

BigQuery results for the local supplier volume results query.

Both Teradata and BigQuery return the same results.

Run the product type profit measure query

The final test to verify the migration is the product type profit measure query (section 2.4.9 in the TPC-H specification). For each nation and each year, this query finds the profit for all parts ordered in that year. It filters the results by a substring in the part names.

The following listing shows the complete query:

SELECT
 nation,
 o_year,
 SUM(amount) AS sum_profit
FROM (
 SELECT
  n_name AS nation,
  EXTRACT(YEAR FROM o_orderdate) AS o_year,
  (l_extendedprice * (1 - l_discount) - ps_supplycost * l_quantity) / 1e+3 AS amount
 FROM
  tpch.part,
  tpch.supplier,
  tpch.lineitem,
  tpch.partsupp,
  tpch.orders,
  tpch.nation
 WHERE s_suppkey = l_suppkey
  AND ps_suppkey = l_suppkey
  AND ps_partkey = l_partkey
  AND p_partkey = l_partkey
  AND o_orderkey = l_orderkey
  AND s_nationkey = n_nationkey
  AND p_name like '%blue%') AS profit
GROUP BY nation, o_year
ORDER BY nation, o_year DESC;

Run the query in Teradata BTEQ and in the BigQuery console as described in the previous section.

This is the result returned by Teradata:

Teradata results for the product type profit measure query.

This is the result returned by BigQuery:

BigQuery results for the product type profit measure query.

Both Teradata and BigQuery return the same results, although Teradata uses scientific notation for the sum.

Additional queries

Optionally, you can run the rest of the TPC-H queries that are defined in section 2.4 of the TPC-H specification.

You can also generate queries following the TPC-H standard by using the QGEN tool, which is in the same directory as the DBGEN tool. QGEN is built using the same makefile as DBGEN, so when you run make to compile dbgen, you also produce the qgen executable.

For more information on both tools and on their command-line options, see theREADME file for each tool.

Cleanup

To avoid incurring charges to your Google Cloud account for the resources used in this tutorial, remove them.

Delete the project

The simplest way to stop billing charges is to delete the project you created for this tutorial.

    Caution: Deleting a project has the following effects:
    • Everything in the project is deleted. If you used an existing project for the tasks in this document, when you delete it, you also delete any other work you've done in the project.
    • Custom project IDs are lost. When you created this project, you might have created a custom project ID that you want to use in the future. To preserve the URLs that use the project ID, such as an appspot.com URL, delete selected resources inside the project instead of deleting the whole project.
  1. In the Google Cloud console, go to the Manage resources page.

    Go to Manage resources

  2. In the project list, select the project that you want to delete, and then click Delete.
  3. In the dialog, type the project ID, and then click Shut down to delete the project.


Last updated 2026-02-19 UTC.