Generate metadata for translation and assessment

Preview

This product is subject to the "Pre-GA Offerings Terms" in the General Service Terms section of the Service Specific Terms. Pre-GA products are available "as is" and might have limited support. For more information, see the launch stage descriptions.

This document describes how to create metadata and query log files by using the dwh-migration-dumper command-line extraction tool. The metadata files describe the SQL objects in your source system.

BigQuery Migration Service uses this information to improve the translation of your SQL scripts from your source system dialect to GoogleSQL.

The BigQuery migration assessment uses metadata files and query log files to analyze your existing data warehouse and help assess the effort of moving your data warehouse to BigQuery.

Overview

You can use the dwh-migration-dumper tool to extract metadata information from the database platform that you are migrating to BigQuery. While using the extraction tool isn't required for translation, it is required for BigQuery migration assessment, and we strongly recommend using it for all migration tasks.

For more information, see Create metadata files.

You can use the dwh-migration-dumper tool to extract metadata from the following database platforms:

  • Teradata
  • Amazon Redshift
  • Apache Hive
  • Apache Impala
  • Apache Spark
  • Azure Synapse
  • Greenplum
  • SQL Server
  • IBM Netezza
  • Oracle
  • PostgreSQL
  • Snowflake
  • Trino or PrestoSQL
  • Vertica
  • BigQuery

For most of these databases you can also extract query logs.

The dwh-migration-dumper tool queries system tables to gather data definition language (DDL) statements related to user and system databases. It does not query the contents of user databases. The tool saves the metadata information from the system tables as CSV files and then zips these files into a single package. You then upload this zip file to Cloud Storage when you upload your source files for translation or assessment.

When using the query logs option, the dwh-migration-dumper tool queries system tables for DDL statements and query logs related to user and system databases. These are saved in CSV or YAML format to a subdirectory and then packed into a zip package. The contents of user databases are never queried. Currently, the BigQuery migration assessment requires individual CSV, YAML, and text files for query logs, so you should unzip all of these files from the query logs zip file and upload them for assessment.
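For example, a minimal sketch of unzipping an extracted query logs package before uploading the individual files (the zip filename is illustrative and depends on your source platform):

unzip dwh-migration-teradata-logs.zip -d query_logs/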

The dwh-migration-dumper tool can run on Windows, macOS, and Linux.

The dwh-migration-dumper tool is available under the Apache 2 license.

If you choose not to use the dwh-migration-dumper tool for translation, you can manually provide metadata files by collecting the data definition language (DDL) statements for the SQL objects in your source system into separate text files.

Providing metadata and query logs extracted with the tool is required for the BigQuery migration assessment.

Compliance requirements

We provide the compiled dwh-migration-dumper tool binary for ease of use. If you need to audit the tool to ensure that it meets compliance requirements, you can review the source code from the dwh-migration-dumper tool GitHub repository, and compile your own binary.
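For example, a minimal sketch of building your own binary from source, assuming you have Git and a JDK installed and that the repository builds with its included Gradle wrapper:

git clone https://github.com/google/dwh-migration-tools.git
cd dwh-migration-tools
./gradlew build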

Prerequisites

Install Java

The server on which you plan to run the dwh-migration-dumper tool must have Java 8 or higher installed. If it doesn't, download Java from the Java downloads page and install it.
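To check which Java version is installed on the server, you can run the following command:

java -version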

Required permissions

The user account that you specify for connecting the dwh-migration-dumper tool to the source system must have permissions to read metadata from that system. Confirm that this account has appropriate role membership to query the metadata resources available for your platform. For example, INFORMATION_SCHEMA is a metadata resource that is common across several platforms.

Install the dwh-migration-dumper tool

To install the dwh-migration-dumper tool, follow these steps:

  1. On the machine where you want to run the dwh-migration-dumper tool, download the zip file from the dwh-migration-dumper tool GitHub repository.
  2. To validate the dwh-migration-dumper tool zip file, download the SHA256SUMS.txt file and run the following command:

    Bash

    sha256sum --check SHA256SUMS.txt

    If verification fails, see Troubleshooting.

    Windows PowerShell

    (Get-FileHash RELEASE_ZIP_FILENAME).Hash -eq ((Get-Content SHA256SUMS.txt) -Split " ")[0]

    Replace the RELEASE_ZIP_FILENAME with the downloaded zip filename of the dwh-migration-dumper command-line extraction tool release, for example, dwh-migration-tools-v1.0.52.zip.

    The True result confirms successful checksum verification.

    The False result indicates a verification error. Make sure the checksum and zip files are downloaded from the same release version and placed in the same directory.

  3. Extract the zip file. The extraction tool binary is in the /bin subdirectory of the folder created by extracting the zip file.

  4. Update the PATH environment variable to include the installation path for the extraction tool.
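    For example, a minimal sketch on Linux or macOS, assuming the release was extracted to /opt/dwh-migration-tools (adjust the path to match your installation):

    export PATH="$PATH:/opt/dwh-migration-tools/bin"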

Run the dwh-migration-dumper tool

The dwh-migration-dumper tool uses the following format:

dwh-migration-dumper [FLAGS]

Running the dwh-migration-dumper tool creates an output file named dwh-migration-<source platform>-metadata.zip, for example, dwh-migration-teradata-metadata.zip, in your working directory.

Tip: When using Windows PowerShell, surround flags with double quotes, for example: dwh-migration-dumper "-Dteradata-logs.utility-logs-table=historicdb.ArchivedUtilityLogs".

Use the following instructions to learn how to run the dwh-migration-dumper tool for your source platform.

Teradata

To allow the dwh-migration-dumper tool to connect to Teradata, download their JDBC driver from Teradata's download page.

The following table describes the commonly used flags for extracting Teradata metadata and query logs by using the extraction tool. For information about all supported flags, see global flags.

Name | Default value | Description | Required
--assessment

Turns on assessment mode when generating database logs or extracting metadata. The dwh-migration-dumper tool generates required metadata statistics for BigQuery migration assessment when used for metadata extraction. When used for query logs, it extracts additional columns for BigQuery migration assessment.

Required when running an assessment; not required for translation.
--connector | The name of the connector to use, in this case teradata for metadata or teradata-logs for query logs. | Yes
--database

A list of the databases to extract, separated by commas. The database names might be case-sensitive, depending on the Teradata server configuration.

If this flag is used in combination with the teradata connector, then the dwh-migration-dumper tool filters the metadata tables and views by the provided list of databases. The exceptions are the DatabasesV and RoleMembersV views - the dwh-migration-dumper tool extracts the databases and users from these views without filtering by the database name.

This flag cannot be used in combination with the teradata-logs connector. Query logs are always extracted for all the databases.

No
--driver | The absolute or relative path to the driver JAR file to use for this connection. You can specify multiple driver JAR files, separating them by commas. | Yes
--host | localhost | The hostname or IP address of the database server. | No
--password | The password to use for the database connection. | If not specified, the extraction tool uses a secure prompt to request it.
--port | 1025 | The port of the database server. | No
--user

The username to use for the database connection.

Yes
--query-log-alternates

For the teradata-logs connector only.

To extract the query logs from an alternative location, we recommend that you use the -Dteradata-logs.query-logs-table and -Dteradata-logs.sql-logs-table flags instead.

By default, the query logs are extracted from the tables dbc.DBQLogTbl and dbc.DBQLSQLTbl. If you use the --assessment flag, then the query logs are extracted from the view dbc.QryLogV and from the table dbc.DBQLSQLTbl. If you need to extract the query logs from an alternative location, you can specify the fully-qualified names of the tables or views by using the --query-log-alternates flag. The first parameter references the alternative to the dbc.DBQLogTbl table, and the second parameter references the alternative to the dbc.DBQLSQLTbl table. Both parameters are required.
The -Dteradata-logs.log-date-column flag can be used to improve extraction performance when both tables have an indexed column of type DATE.

Example: --query-log-alternates historicdb.ArchivedQryLogV,historicdb.ArchivedDBQLSqlTbl

No
-Dteradata.tmode

The transaction mode for the connection. The following values are supported:

  • ANSI: ANSI mode. This is the default mode (if the flag is not specified)
  • TERA: Teradata transaction mode (BTET)
  • DEFAULT: use the default transaction mode configured on the database server
  • NONE: no mode is set for the connection

Example (Bash):
-Dteradata.tmode=TERA

Example (Windows PowerShell):
"-Dteradata.tmode=TERA"

No
-Dteradata-logs.log-date-column

For the teradata-logs connector only.

To improve performance of joining tables that are specified by the -Dteradata-logs.query-logs-table and -Dteradata-logs.sql-logs-table flags, you can include an additional column of type DATE in the JOIN condition. This column must be defined in both tables and it must be part of the Partitioned Primary Index.

Example (Bash):
-Dteradata-logs.log-date-column=ArchiveLogDate

Example (Windows PowerShell):
"-Dteradata-logs.log-date-column=ArchiveLogDate"

No
-Dteradata-logs.query-logs-table

For the teradata-logs connector only.

By default, the query logs are extracted from the dbc.DBQLogTbl table. If you use the --assessment flag, then the query logs are extracted from the view dbc.QryLogV. If you need to extract the query logs from an alternative location, you can specify the fully-qualified name of the table or view by using this flag.
See the -Dteradata-logs.log-date-column flag to improve extraction performance.

Example (Bash):
-Dteradata-logs.query-logs-table=historicdb.ArchivedQryLogV

Example (Windows PowerShell):
"-Dteradata-logs.query-logs-table=historicdb.ArchivedQryLogV"

No
-Dteradata-logs.sql-logs-table

For the teradata-logs connector only.

By default, the query logs containing SQL text are extracted from the dbc.DBQLSqlTbl table. If you need to extract them from an alternative location, you can specify the fully-qualified name of the table or view by using this flag.
See the -Dteradata-logs.log-date-column flag to improve extraction performance.

Example (Bash):
-Dteradata-logs.sql-logs-table=historicdb.ArchivedDBQLSqlTbl

Example (Windows PowerShell):
"-Dteradata-logs.sql-logs-table=historicdb.ArchivedDBQLSqlTbl"

No
-Dteradata-logs.utility-logs-table

For the teradata-logs connector only.

By default, the utility logs are extracted from the table dbc.DBQLUtilityTbl. If you need to extract the utility logs from an alternative location, you can specify the fully-qualified name of the table by using the -Dteradata-logs.utility-logs-table flag.

Example (Bash):
-Dteradata-logs.utility-logs-table=historicdb.ArchivedUtilityLogs

Example (Windows PowerShell):
"-Dteradata-logs.utility-logs-table=historicdb.ArchivedUtilityLogs"

No
-Dteradata-logs.res-usage-scpu-table

For the teradata-logs connector only.

By default, the SCPU resource usage logs are extracted from the table dbc.ResUsageScpu. If you need to extract these from an alternative location, you can specify the fully-qualified name of the table by using the -Dteradata-logs.res-usage-scpu-table flag.

Example (Bash):
-Dteradata-logs.res-usage-scpu-table=historicdb.ArchivedResUsageScpu

Example (Windows PowerShell):
"-Dteradata-logs.res-usage-scpu-table=historicdb.ArchivedResUsageScpu"

No
-Dteradata-logs.res-usage-spma-table

For the teradata-logs connector only.

By default, the SPMA resource usage logs are extracted from the table dbc.ResUsageSpma. If you need to extract these logs from an alternative location, you can specify the fully-qualified name of the table by using the -Dteradata-logs.res-usage-spma-table flag.

Example (Bash):
-Dteradata-logs.res-usage-spma-table=historicdb.ArchivedResUsageSpma

Example (Windows PowerShell):
"-Dteradata-logs.res-usage-spma-table=historicdb.ArchivedResUsageSpma"

No
--query-log-start

The start time (inclusive) for query logs to extract. The value is truncated to the hour. This flag is only available for the teradata-logs connector.

Example: --query-log-start "2023-01-01 14:00:00"

No
--query-log-end

The end time (exclusive) for query logs to extract. The value is truncated to the hour. This flag is only available for the teradata-logs connector.

Example: --query-log-end "2023-01-15 22:00:00"

No
-Dteradata.metadata.tablesizev.max-rows

For the teradata connector only.

Limit the number of rows extracted from the view TableSizeV. The rows are grouped by the columns DatabaseName, AccountName, and TableName, and then sorted in descending order by the size of the permanent space (the expression SUM(CurrentPerm)). Then, the specified number of rows are extracted.

Example (Bash):
-Dteradata.metadata.tablesizev.max-rows=100000

Example (Windows PowerShell):
"-Dteradata.metadata.tablesizev.max-rows=100000"

No
-Dteradata.metadata.diskspacev.max-rows

For the teradata connector only.

Limit the number of rows extracted from the view DiskSpaceV. The rows are sorted in descending order by the size of the permanent space (column CurrentPerm), and then the specified number of rows are extracted.

Example (Bash):
-Dteradata.metadata.diskspacev.max-rows=100000

Example (Windows PowerShell):
"-Dteradata.metadata.diskspacev.max-rows=100000"

No
-Dteradata.metadata.databasesv.users.max-rows

For the teradata connector only.

Limit the number of rows that represent users (DBKind='U') that are extracted from the view DatabasesV. The rows are sorted in descending order by the column PermSpace, and then the specified number of rows are extracted.

Example (Bash):
-Dteradata.metadata.databasesv.users.max-rows=100000

Example (Windows PowerShell):
"-Dteradata.metadata.databasesv.users.max-rows=100000"

No
-Dteradata.metadata.databasesv.dbs.max-rows

For the teradata connector only.

Limit the number of rows that represent databases (DBKind='D') that are extracted from the view DatabasesV. The rows are sorted in descending order by the column PermSpace, and then the specified number of rows are extracted.

Example (Bash):
-Dteradata.metadata.databasesv.dbs.max-rows=100000

Example (Windows PowerShell):
"-Dteradata.metadata.databasesv.dbs.max-rows=100000"

No
-Dteradata.metadata.max-text-length

For the teradata connector only.

Maximum length of the text column when extracting the data from the TableTextV view. Text longer than the defined limit will be split into multiple rows. Allowed range: between 5000 and 32000 (inclusive).

Example (Bash):
-Dteradata.metadata.max-text-length=10000

Example (Windows PowerShell):
"-Dteradata.metadata.max-text-length=10000"

No
-Dteradata-logs.max-sql-length

For the teradata-logs connector only.

Maximum length of the DBQLSqlTbl.SqlTextInfo column. Query text longer than the defined limit will be split into multiple rows. Allowed range: between 5000 and 31000 (inclusive).

Example (Bash):
-Dteradata-logs.max-sql-length=10000

Example (Windows PowerShell):
"-Dteradata-logs.max-sql-length=10000"

No

Examples

The following example shows how to extract metadata for two Teradata databases on the local host:

dwh-migration-dumper \
  --connector teradata \
  --user user \
  --password password \
  --database database1,database2 \
  --driver path/terajdbc4.jar

The following example shows how to extract query logs for Assessment on the local host:

dwh-migration-dumper \
  --connector teradata-logs \
  --assessment \
  --user user \
  --password password \
  --driver path/terajdbc4.jar

Tables and views extracted by the dwh-migration-dumper tool

The following tables and views are extracted when you use the teradata connector:

  • DBC.ColumnsV
  • DBC.DatabasesV
  • DBC.DBCInfo
  • DBC.FunctionsV
  • DBC.IndicesV
  • DBC.PartitioningConstraintsV
  • DBC.TablesV
  • DBC.TableTextV

The following additional tables and views are extracted when you use the teradata connector with the --assessment flag:

  • DBC.All_RI_ChildrenV
  • DBC.All_RI_ParentsV
  • DBC.AllTempTablesVX
  • DBC.DiskSpaceV
  • DBC.RoleMembersV
  • DBC.StatsV
  • DBC.TableSizeV

The following tables and views are extracted when you use the teradata-logs connector:

  • DBC.DBQLogTbl (changes to DBC.QryLogV if the --assessment flag is used)
  • DBC.DBQLSqlTbl

The following additional tables and views are extracted when you use the teradata-logs connector with the --assessment flag:

  • DBC.DBQLUtilityTbl
  • DBC.ResUsageScpu
  • DBC.ResUsageSpma

Redshift

You can use any of the following Amazon Redshift authentication and authorization mechanisms with the extraction tool:

  • A username and password.
  • An AWS Identity and Access Management (IAM) access key ID and secret key.
  • An AWS IAM profile name.

To authenticate with the username and password, use the Amazon Redshift default PostgreSQL JDBC driver. To authenticate with AWS IAM, use the Amazon Redshift JDBC driver, which you can download from their download page.

The following table describes the commonly used flags for extracting Amazon Redshift metadata and query logs by using the dwh-migration-dumper tool. For information about all supported flags, see global flags.

Name | Default value | Description | Required
--assessment

Turns on assessment mode when generating database logs or extracting metadata. It generates required metadata statistics for BigQuery migration assessment when used for metadata extraction. When used for query log extraction, it generates query metrics statistics for BigQuery migration assessment.

Required when running in assessment mode; not required for translation.
--connector | The name of the connector to use, in this case redshift for metadata or redshift-raw-logs for query logs. | Yes
--database | If not specified, Amazon Redshift uses the --user value as the default database name.

The name of the database to connect to.

No
--driver | If not specified, Amazon Redshift uses the default PostgreSQL JDBC driver. | The absolute or relative path to the driver JAR file to use for this connection. You can specify multiple driver JAR files, separating them by commas. | No
--host | localhost | The hostname or IP address of the database server. | No
--iam-accesskeyid

The AWS IAM access key ID to use for authentication. The access key is a string of characters, something like AKIAIOSFODNN7EXAMPLE.

Use in conjunction with the --iam-secretaccesskey flag. Do not use this flag when specifying the --iam-profile or --password flags.

Not explicitly, but you must provide authentication information through one of the following methods:

  • Using this flag in conjunction with the --iam-secretaccesskey flag.
  • Using the --iam-profile flag.
  • Using the --password flag in conjunction with the --user flag.
--iam-profile

The AWS IAM profile to use for authentication. You can retrieve a profile value to use by examining the $HOME/.aws/credentials file or by running aws configure list-profiles.

Do not use this flag with the --iam-accesskeyid, --iam-secretaccesskey, or --password flags.

Not explicitly, but you must provide authentication information through one of the following methods:

  • Using this flag.
  • Using the --iam-accesskeyid flag in conjunction with the --iam-secretaccesskey flag.
  • Using the --password flag in conjunction with the --user flag.
--iam-secretaccesskey

The AWS IAM secret access key to use for authentication. The secret access key is a string of characters, something like wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY.

Use in conjunction with the --iam-accesskeyid flag. Do not use this flag with the --iam-profile or --password flags.

Not explicitly, but you must provide authentication information through one of the following methods:

  • Using this flag in conjunction with the --iam-accesskeyid flag.
  • Using the --iam-profile flag.
  • Using the --password flag in conjunction with the --user flag.
--password | The password to use for the database connection.

Do not use this flag with the --iam-accesskeyid, --iam-secretaccesskey, or --iam-profile flags.

Not explicitly, but you must provide authentication information through one of the following methods:

  • Using this flag in conjunction with the --user flag.
  • Using the --iam-accesskeyid flag in conjunction with the --iam-secretaccesskey flag.
  • Using the --iam-profile flag.
--port | 5439 | The port of the database server. | No
--user | The username to use for the database connection. | Yes
--query-log-start

The start time (inclusive) for query logs to extract. The value is truncated to the hour. This flag is only available for the redshift-raw-logs connector.

Example: --query-log-start "2023-01-01 14:00:00"

No
--query-log-end

The end time (exclusive) for query logs to extract. The value is truncated to the hour. This flag is only available for the redshift-raw-logs connector.

Example: --query-log-end "2023-01-15 22:00:00"

No

Examples

The following example shows how to extract metadata from an Amazon Redshift database on a specified host, using AWS IAM keys for authentication:

dwh-migration-dumper \
  --connector redshift \
  --database database \
  --driver path/redshift-jdbc42-version.jar \
  --host host.region.redshift.amazonaws.com \
  --iam-accesskeyid access_key_ID \
  --iam-secretaccesskey secret_access-key \
  --user user

The following example shows how to extract metadata from an Amazon Redshift database on the default host, using the username and password for authentication:

dwh-migration-dumper \
  --connector redshift \
  --database database \
  --password password \
  --user user

The following example shows how to extract metadata from an Amazon Redshift database on a specified host, using an AWS IAM profile for authentication:

dwh-migration-dumper \
  --connector redshift \
  --database database \
  --driver path/redshift-jdbc42-version.jar \
  --host host.region.redshift.amazonaws.com \
  --iam-profile profile \
  --user user \
  --assessment

The following example shows how to extract query logs for Assessment from an Amazon Redshift database on a specified host, using an AWS IAM profile for authentication:

dwh-migration-dumper \
  --connector redshift-raw-logs \
  --database database \
  --driver path/redshift-jdbc42-version.jar \
  --host 123.456.789.012 \
  --iam-profile profile \
  --user user \
  --assessment

Tables and views extracted by the dwh-migration-dumper tool

The following tables and views are extracted when you use the redshift connector:

  • SVV_COLUMNS
  • SVV_EXTERNAL_COLUMNS
  • SVV_EXTERNAL_DATABASES
  • SVV_EXTERNAL_PARTITIONS
  • SVV_EXTERNAL_SCHEMAS
  • SVV_EXTERNAL_TABLES
  • SVV_TABLES
  • SVV_TABLE_INFO
  • INFORMATION_SCHEMA.COLUMNS
  • PG_CAST
  • PG_DATABASE
  • PG_LANGUAGE
  • PG_LIBRARY
  • PG_NAMESPACE
  • PG_OPERATOR
  • PG_PROC
  • PG_TABLE_DEF
  • PG_TABLES
  • PG_TYPE
  • PG_VIEWS

The following additional tables and views are extracted when you use the redshift connector with the --assessment flag:

  • SVV_DISKUSAGE
  • STV_MV_INFO
  • STV_WLM_SERVICE_CLASS_CONFIG
  • STV_WLM_SERVICE_CLASS_STATE

The following tables and views are extracted when you use the redshift-raw-logs connector:

  • STL_DDLTEXT
  • STL_QUERY
  • STL_QUERYTEXT
  • PG_USER

The following additional tables and views are extracted when you use the redshift-raw-logs connector with the --assessment flag:

  • STL_QUERY_METRICS
  • SVL_QUERY_QUEUE_INFO
  • STL_WLM_QUERY

For information about the system views and tables in Redshift, see Redshift system views and Redshift system catalog tables.

Hive/Impala/Spark or Trino/PrestoSQL

The dwh-migration-dumper tool supports authentication to the Apache Hive metastore only through Kerberos. The --user and --password flags aren't used; instead, use the --hive-kerberos-url flag to supply the Kerberos authentication details.

The following table describes the commonly used flags for extracting Apache Hive, Impala, Spark, Presto, or Trino metadata by using the extraction tool. For information about all supported flags, see global flags.

Name | Default value | Description | Required
--assessment

Turns on assessment mode when extracting metadata. The dwh-migration-dumper tool generates required metadata statistics for BigQuery migration assessment when used for metadata extraction.

Required for assessment. Not required for translation.
--connector | The name of the connector to use, in this case hiveql. | Yes
--hive-metastore-dump-partition-metadata | true

Causes the dwh-migration-dumper tool to extract partition metadata. You might want to set this flag to false for a production metastore with a significant number of partitions, due to Thrift client performance implications. This improves the extraction tool performance, but causes some loss of partition optimization on the BigQuery side.

Don't use this flag with the --assessment flag, as it will have no effect.

No
--hive-metastore-version | 2.3.6

When you run the dwh-migration-dumper tool, it selects the appropriate Thrift specification to use for communicating with your Apache Hive server, based on the value of this flag. If the extraction tool doesn't have an appropriate Thrift specification, it uses the 2.3.6 client and emits a warning to stdout. If this occurs, please contact Support and provide the Apache Hive version number you requested.

No
--host | localhost | The hostname or IP address of the database server. | No
--port | 9083 | The port of the database server. | No
--hive-kerberos-url | The Kerberos principal and host to use for authentication. | Required for clusters with enabled Kerberos authentication.
-Dhiveql.rpc.protection

The RPC protection configuration level. This determines the Quality of Protection (QOP) of the Simple Authentication and Security Layer (SASL) connection between the cluster and the dwh-migration-dumper tool.

Must be equal to the value of the hadoop.rpc.protection parameter inside the /etc/hadoop/conf/core-site.xml file on the cluster, with one of the following values:

  • authentication
  • integrity
  • privacy

Example (Bash):
-Dhiveql.rpc.protection=privacy

Example (Windows PowerShell):
"-Dhiveql.rpc.protection=privacy"

Required for clusters with enabled Kerberos authentication.

Examples

The following example shows how to extract metadata for a Hive 2.3.7 database on a specified host, without authentication and using an alternate port for connection:

dwh-migration-dumper \
  --connector hiveql \
  --hive-metastore-version 2.3.7 \
  --host host \
  --port port

To use Kerberos authentication, sign in as a user that has read permissions to the Hive metastore and generate a Kerberos ticket. Then, generate the metadata zip file with the following command:

JAVA_OPTS="-Djavax.security.auth.useSubjectCredsOnly=false" \
  dwh-migration-dumper \
  --connector hiveql \
  --host host \
  --port port \
  --hive-kerberos-url principal/kerberos_host
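Generating the Kerberos ticket before running this command is typically done with kinit; a minimal sketch, where the principal shown is illustrative:

kinit principal@REALM.EXAMPLE.COM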

Azure Synapse or Microsoft SQL Server

To allow the dwh-migration-dumper tool to connect to Azure Synapse or Microsoft SQL Server, download their JDBC driver from Microsoft's download page.

The following table describes the commonly used flags for extracting Azure Synapse or Microsoft SQL Server metadata by using the extraction tool. For information about all supported flags, see global flags.

Name | Default value | Description | Required
--connector | The name of the connector to use, in this case sqlserver. | Yes
--database

The name of the database to connect to.

Yes
--driver | The absolute or relative path to the driver JAR file to use for this connection. You can specify multiple driver JAR files, separating them by commas. | Yes
--host | localhost | The hostname or IP address of the database server. | No
--password | The password to use for the database connection. | Yes
--port | 1433 | The port of the database server. | No
--user | The username to use for the database connection. | Yes

Examples

The following example shows how to extract metadata from an Azure Synapse database on a specified host:

dwh-migration-dumper \
  --connector sqlserver \
  --database database \
  --driver path/mssql-jdbc.jar \
  --host server_name.sql.azuresynapse.net \
  --password password \
  --user user

Greenplum

To allow the dwh-migration-dumper tool to connect to Greenplum, download their JDBC driver from VMware Greenplum's download page.

The following table describes the commonly used flags for extracting Greenplum metadata by using the extraction tool. For information about all supported flags, see global flags.

Name | Default value | Description | Required
--connector | The name of the connector to use, in this case greenplum. | Yes
--database

The name of the database to connect to.

Yes
--driver | The absolute or relative path to the driver JAR file to use for this connection. You can specify multiple driver JAR files, separating them by commas. | Yes
--host | localhost | The hostname or IP address of the database server. | No
--password | The password to use for the database connection. | If not specified, the extraction tool uses a secure prompt to request it.
--port | 5432 | The port of the database server. | No
--user | The username to use for the database connection. | Yes

Examples

The following example shows how to extract metadata for a Greenplum database on a specified host:

dwh-migration-dumper \
  --connector greenplum \
  --database database \
  --driver path/greenplum.jar \
  --host host \
  --password password \
  --user user

Netezza

To allow the dwh-migration-dumper tool to connect to IBM Netezza, you must get their JDBC driver. You can usually get the driver from the /nz/kit/sbin directory on your IBM Netezza appliance host. If you can't locate it there, ask your system administrator for help, or read Installing and Configuring JDBC in the IBM Netezza documentation.

The following table describes the commonly used flags for extracting IBM Netezza metadata by using the extraction tool. For information about all supported flags, see global flags.

Name | Default value | Description | Required
--connector | The name of the connector to use, in this case netezza. | Yes
--database

A list of the databases to extract, separated by commas.

Yes
--driver | The absolute or relative path to the driver JAR file to use for this connection. You can specify multiple driver JAR files, separating them by commas. | Yes
--host | localhost | The hostname or IP address of the database server. | No
--password | The password to use for the database connection. | Yes
--port | 5480 | The port of the database server. | No
--user | The username to use for the database connection. | Yes

Examples

The following example shows how to extract metadata for two IBM Netezza databases on a specified host:

dwh-migration-dumper \
  --connector netezza \
  --database database1,database2 \
  --driver path/nzjdbc.jar \
  --host host \
  --password password \
  --user user

PostgreSQL

To allow the dwh-migration-dumper tool to connect to PostgreSQL, download their JDBC driver from PostgreSQL's download page.

The following table describes the commonly used flags for extracting PostgreSQL metadata by using the extraction tool. For information about all supported flags, see global flags.

Name | Default value | Description | Required
--connector | The name of the connector to use, in this case postgresql. | Yes
--database

The name of the database to connect to.

Yes
--driver | The absolute or relative path to the driver JAR file to use for this connection. You can specify multiple driver JAR files, separating them by commas. | Yes
--host | localhost | The hostname or IP address of the database server. | No
--password | The password to use for the database connection. | If not specified, the extraction tool uses a secure prompt to request it.
--port | 5432 | The port of the database server. | No
--user | The username to use for the database connection. | Yes

Examples

The following example shows how to extract metadata for a PostgreSQL database on a specified host:

dwh-migration-dumper \
  --connector postgresql \
  --database database \
  --driver path/postgresql-version.jar \
  --host host \
  --password password \
  --user user

Oracle

To allow the dwh-migration-dumper tool to connect to Oracle, download their JDBC driver from Oracle's download page.

The following table describes the commonly used flags for extracting Oracle metadata by using the extraction tool. For information about all supported flags, see global flags.

Name | Default value | Description | Required
--connector | The name of the connector to use, in this case oracle. | Yes
--driver | The absolute or relative path to the driver JAR file to use for this connection. You can specify multiple driver JAR files, separating them by commas. | Yes
--host | localhost | The hostname or IP address of the database server. | No
--oracle-service

The Oracle service name to use for the connection.

Not explicitly, but you must specify either this flag or the --oracle-sid flag.
--oracle-sid

The Oracle system identifier (SID) to use for the connection.

Not explicitly, but you must specify either this flag or the --oracle-service flag.
--password | The password to use for the database connection. | If not specified, the extraction tool uses a secure prompt to request it.
--port | 1521 | The port of the database server. | No
--user

The username to use for the database connection.

The user you specify must have the role SELECT_CATALOG_ROLE in order to extract metadata. To see whether the user has the required role, run the query select granted_role from user_role_privs; against the Oracle database.

Yes

Examples

The following example shows how to extract metadata for an Oracle database on a specified host, using the Oracle service for the connection:

dwh-migration-dumper \
  --connector oracle \
  --driver path/ojdbc8.jar \
  --host host \
  --oracle-service service_name \
  --password password \
  --user user

Snowflake

The following table describes the commonly used flags for extracting Snowflake metadata by using the dwh-migration-dumper tool. For information about all supported flags, see global flags.

Name | Default value | Description | Required
--assessment

Turns on assessment mode when generating database logs or extracting metadata. The dwh-migration-dumper tool generates required metadata statistics for BigQuery migration assessment when used for metadata extraction. When used for query logs, the tool extracts additional columns for BigQuery migration assessment.

Only for assessment.
--connector | The name of the connector to use, in this case snowflake. | Yes
--database

The name of the database to extract.

You can only extract from one database at a time from Snowflake. This flag is not allowed in assessment mode.

Only for translation.
--host | localhost | The hostname or IP address of the database server. | No
--private-key-file

The path to the RSA private key used for authentication. We recommend using a SERVICE user with key-pair based authentication. This provides a secure method for accessing the Snowflake data platform without the need to generate MFA tokens.

No. If not provided, the extraction tool uses password-based authentication.
--private-key-password

The password that was used when creating the RSA private key.

No, it is required only if the private key is encrypted.
--password | The password to use for the database connection. If not specified, the extraction tool uses a secure prompt to request it. However, we recommend using key-pair based authentication instead.
--query-log-start

The start time (inclusive) for query logs to extract. The value is truncated to the hour. This flag is only available for the snowflake-logs connector.

Example: --query-log-start "2023-01-01 14:00:00"

No
--query-log-end

The end time (exclusive) for query logs to extract. The value is truncated to the hour. This flag is only available for the snowflake-logs connector.

Example: --query-log-end "2023-01-15 22:00:00"

No
--role | The Snowflake role to use for authorization. You only need to specify this for large installations where you need to get metadata from the SNOWFLAKE.ACCOUNT_USAGE schema instead of INFORMATION_SCHEMA. For more information, see Working with large Snowflake instances. | No
--user

The username to use for the database connection.

Yes
--warehouse

The Snowflake warehouse to use for processing metadata queries.

Yes

Examples

The following example shows how to extract metadata for assessment:

dwh-migration-dumper \
  --connector snowflake \
  --assessment \
  --host "account.snowflakecomputing.com" \
  --role role \
  --user user \
  --private-key-file private-key-file \
  --private-key-password private-key-password \
  --warehouse warehouse

The following example shows how to extract metadata for a typically sized Snowflake database on the local host:

dwh-migration-dumper \
  --connector snowflake \
  --database database \
  --user user \
  --private-key-file private-key-file \
  --private-key-password private-key-password \
  --warehouse warehouse

The following example shows how to extract metadata for a large Snowflake database on a specified host:

dwh-migration-dumper \
  --connector snowflake \
  --database database \
  --host "account.snowflakecomputing.com" \
  --role role \
  --user user \
  --private-key-file private-key-file \
  --private-key-password private-key-password \
  --warehouse warehouse

Alternatively, you can use the following example to extract metadata using password-based authentication:

dwh-migration-dumper \
  --connector snowflake \
  --database database \
  --host "account.snowflakecomputing.com" \
  --password password \
  --user user \
  --warehouse warehouse
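The key-pair examples above assume that an RSA private key already exists and that the matching public key is registered with the Snowflake user. A minimal sketch of generating such a key pair with OpenSSL (the filenames are illustrative, and registering the public key with your Snowflake user is a separate step):

openssl genrsa 2048 | openssl pkcs8 -topk8 -inform PEM -v2 aes256 -out rsa_key.p8
openssl rsa -in rsa_key.p8 -pubout -out rsa_key.pub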

Working with large Snowflake instances

The dwh-migration-dumper tool reads metadata from the Snowflake INFORMATION_SCHEMA. However, there is a limit to the amount of data you can retrieve from INFORMATION_SCHEMA. If you run the extraction tool and receive the error SnowflakeSQLException: Information schema query returned too much data, you must take the following steps so that you can read metadata from the SNOWFLAKE.ACCOUNT_USAGE schema instead:

  1. Open the Shares option in the Snowflake web interface.
  2. Create a database from the SNOWFLAKE.ACCOUNT_USAGE share:

    CREATE DATABASE database FROM SHARE SNOWFLAKE.ACCOUNT_USAGE;
  3. Create a role:

    CREATE ROLE role;
  4. Grant IMPORTED privileges on the new database to the role:

    GRANT IMPORTED PRIVILEGES ON DATABASE database TO ROLE role;
  5. Grant the role to the user you intend to use to run the dwh-migration-dumper tool:

    GRANT ROLE role TO USER user;

Vertica

To allow the dwh-migration-dumper tool to connect to Vertica, download their JDBC driver from their download page.

The following table describes the commonly used flags for extracting Vertica metadata by using the extraction tool. For information about all supported flags, see global flags.

Name | Default value | Description | Required
--connector | The name of the connector to use, in this case vertica. | Yes
--database

The name of the database to connect to.

Yes
--driver | The absolute or relative path to the driver JAR file to use for this connection. You can specify multiple driver JAR files, separating them by commas. | Yes
--host | localhost | The hostname or IP address of the database server. | No
--password | The password to use for the database connection. | Yes
--port | 5433 | The port of the database server. | No
--user | The username to use for the database connection. | Yes

Examples

The following example shows how to extract metadata from a Vertica database on the local host:

dwh-migration-dumper \
  --driver path/vertica-jdbc.jar \
  --connector vertica \
  --database database \
  --user user \
  --password password

BigQuery

The following table describes the commonly used flags for extracting BigQuery metadata by using the extraction tool. For information about all supported flags, see global flags.

Name | Default value | Description | Required
--connector | The name of the connector to use, in this case bigquery. | Yes
--database

The list of projects to extract metadata and query logs from, separated by commas.

Yes
--schema

The list of datasets to extract metadata and query logs from, separated by commas.

Yes

Examples

The following example shows how to extract metadata for two BigQuery projects and two datasets:

dwh-migration-dumper \
  --connector bigquery \
  --database PROJECT1,PROJECT2 \
  --schema DATASET1,DATASET2

Global flags

The following table describes the flags that can be used with any of the supported source platforms.

Name | Description
--connector | The connector name for the source system.
--database | Usage varies by source system.
--driver | The absolute or relative path to the driver JAR file to use when connecting to the source system. You can specify multiple driver JAR files, separating them by commas.
--dry-run or -n | Shows what actions the extraction tool would take without executing them.
--help | Displays command-line help.
--host | The hostname or IP address of the database server to connect to.
--jdbcDriverClass | Optionally overrides the vendor-specified JDBC driver class name. Use this if you have a custom JDBC client.
--output | The path of the output zip file. For example, dir1/dir2/teradata-metadata.zip. If you don't specify a path, the output file is created in your working directory. If you specify the path to a directory, the default zip filename is created in the specified directory. If the directory does not exist, it is created.

To use Cloud Storage, use the following format:
gs://<BUCKET>/<PATH>.

To authenticate using Google Cloud credentials, see Authenticate for using client libraries.

--password | The password to use for the database connection.
--port | The port of the database server.
--save-response-file | Saves your command-line flags in a JSON file for easy reuse. The file is named dumper-response-file.json and is created in the working directory. To use the response file, provide the path to it prefixed by @ when you run the extraction tool, for example dwh-migration-dumper @path/to/dumper-response-file.json. See the example after this table.
--schema

A list of the schemas to extract, separated by commas.

Oracle doesn't differentiate between a schema and the database user who created the schema, so you can use either schema names or user names with the --schema flag. For example, --schema schema1,user2,schema3.

--thread-pool-size

Sets the thread pool size, which affects the connection pool size. The default size of the thread pool is the number of cores on the server running the dwh-migration-dumper tool.

If the extraction tool seems slow or otherwise in need of more resources, you can raise the number of threads used. If there are indications that other processes on the server require more bandwidth, you can lower the number of threads used.

--url

The URL to use for the database connection, instead of the URI generated by the JDBC driver.

The generated URI should be sufficient in most cases. Only override the generated URI when you need to use a JDBC connection setting that is specific to the source platform and is not already set by one of the flags listed in this table.

--user | The username to use for the database connection.
--version | Displays the product version.
--telemetry

Collects insights into the performance characteristics of runs, such as duration, run counts, and resource usage. This is enabled by default. To disable telemetry, set this flag to false.
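For example, a minimal sketch of saving the flags from a run to a response file with --save-response-file, and then reusing that file on a later run (the connector and flag values shown are illustrative):

dwh-migration-dumper \
  --connector teradata \
  --user user \
  --driver path/terajdbc4.jar \
  --save-response-file

dwh-migration-dumper @dumper-response-file.json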

Troubleshooting

This section explains some common issues and troubleshooting techniques for the dwh-migration-dumper tool.

Out of memory error

The java.lang.OutOfMemoryError error in the dwh-migration-dumper tool terminal output is often related to insufficient memory for processing retrieved data. To address this issue, increase available memory or reduce the number of processing threads.

You can increase maximum memory by exporting the JAVA_OPTS environment variable:

Linux

export JAVA_OPTS="-Xmx4G"

Windows

set JAVA_OPTS="-Xmx4G"

You can reduce the number of processing threads (default is 32) by including the --thread-pool-size flag. This option is supported for hiveql and redshift* connectors only.

dwh-migration-dumper --thread-pool-size=1

Handling a WARN...Task failed error

You might sometimes see a WARN [main] o.c.a.d.MetadataDumper [MetadataDumper.java:107] Task failed: … error in the dwh-migration-dumper tool terminal output. The extraction tool submits multiple queries to the source system, and the output of each query is written to its own file. Seeing this issue indicates that one of these queries failed. However, failure of one query doesn't prevent the execution of the other queries. If you see more than a couple of WARN errors, review the issue details and see if there is anything that you need to correct in order for the query to run appropriately. For example, if the database user you specified when running the extraction tool lacks permissions to read all metadata, try again with a user with the correct permissions.

Corrupted ZIP file

To validate the dwh-migration-dumper tool zip file, download the SHA256SUMS.txt file and run the following command:

Bash

sha256sum --check SHA256SUMS.txt

The OK result confirms successful checksum verification. Any other message indicates a verification error:

  • FAILED: computed checksum did NOT match: the zip file is corrupted and has to be downloaded again.
  • FAILED: listed file could not be read: the zip file version can't be located. Make sure the checksum and zip files are downloaded from the same release version and placed in the same directory.

Windows PowerShell

(Get-FileHash RELEASE_ZIP_FILENAME).Hash -eq ((Get-Content SHA256SUMS.txt) -Split " ")[0]

Replace the RELEASE_ZIP_FILENAME with the downloaded zip filename of the dwh-migration-dumper command-line extraction tool release, for example, dwh-migration-tools-v1.0.52.zip.

The True result confirms successful checksum verification.

The False result indicates a verification error. Make sure the checksum and zip files are downloaded from the same release version and placed in the same directory.

Teradata query logs extraction is slow

To improve performance of joining tables that are specified by the -Dteradata-logs.query-logs-table and -Dteradata-logs.sql-logs-table flags, you can include an additional column of type DATE in the JOIN condition. This column must be defined in both tables and it must be part of the Partitioned Primary Index. To include this column, use the -Dteradata-logs.log-date-column flag.

Example:

Bash

dwh-migration-dumper \
  -Dteradata-logs.query-logs-table=historicdb.ArchivedQryLogV \
  -Dteradata-logs.sql-logs-table=historicdb.ArchivedDBQLSqlTbl \
  -Dteradata-logs.log-date-column=ArchiveLogDate

Windows PowerShell

dwh-migration-dumper `
  "-Dteradata-logs.query-logs-table=historicdb.ArchivedQryLogV" `
  "-Dteradata-logs.sql-logs-table=historicdb.ArchivedDBQLSqlTbl" `
  "-Dteradata-logs.log-date-column=ArchiveLogDate"

Teradata row size limit exceeded

Teradata 15 has a 64kB row size limit. If the limit is exceeded, the dumper fails with the following message: [Error 9804] [SQLState HY000] Response Row size or Constant Row size overflow

To resolve this error, either extend the row limit to 1MB or split the rows into multiple rows:

  • Install and enable the 1MB Perm and Response Rows feature and current TTU software. For more information, see Teradata Database Message 9804.
  • Split the long query text into multiple rows by using the -Dteradata.metadata.max-text-length and -Dteradata-logs.max-sql-length flags.

The following command shows the usage of the -Dteradata.metadata.max-text-length flag to split the long query text into multiple rows of at most 10000 characters each:

Bash

dwh-migration-dumper \
  --connector teradata \
  -Dteradata.metadata.max-text-length=10000

Windows PowerShell

dwh-migration-dumper `
  --connector teradata `
  "-Dteradata.metadata.max-text-length=10000"

The following command shows the usage of the -Dteradata-logs.max-sql-length flag to split the long query text into multiple rows of at most 10000 characters each:

Bash

dwh-migration-dumper \
  --connector teradata-logs \
  -Dteradata-logs.max-sql-length=10000

Windows PowerShell

dwh-migration-dumper `
  --connector teradata-logs `
  "-Dteradata-logs.max-sql-length=10000"

Oracle connection issue

In common cases like an invalid password or hostname, the dwh-migration-dumper tool prints a meaningful error message describing the root issue. However, in some cases, the error message returned by the Oracle server may be generic and difficult to investigate.

One of these issues is IO Error: Got minus one from a read call. This error indicates that the connection to the Oracle server has been established, but the server did not accept the client and closed the connection. This issue typically occurs when the server accepts TCPS connections only. By default, the dwh-migration-dumper tool uses the TCP protocol. To solve this issue, you must override the Oracle JDBC connection URL.

Instead of providing the --oracle-service, --host, and --port flags, you can resolve this issue by providing the --url flag in the following format: jdbc:oracle:thin:@tcps://{HOST_NAME}:{PORT}/{ORACLE_SERVICE}. Typically, the TCPS port number used by the Oracle server is 2484.

Example dumper command:

dwh-migration-dumper \
  --connector oracle-stats \
  --url "jdbc:oracle:thin:@tcps://host:port/oracle_service" \
  --assessment \
  --driver "jdbc_driver_path" \
  --user "user" \
  --password

In addition to changing the connection protocol to TCPS, you might need to provide the trustStore SSL configuration that is required to verify the Oracle server certificate. A missing SSL configuration results in an Unable to find valid certification path error message. To resolve this, set the JAVA_OPTS environment variable:

set JAVA_OPTS=-Djavax.net.ssl.trustStore="jks_file_location" -Djavax.net.ssl.trustStoreType=JKS -Djavax.net.ssl.trustStorePassword="password"

Depending on your Oracle server configuration, you might also need to provide the keyStore configuration. See SSL With Oracle JDBC Driver for more information about configuration options.
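If you don't already have a JKS truststore that contains the Oracle server certificate, a minimal sketch of creating one with the JDK keytool utility (the certificate and truststore filenames are illustrative):

keytool -importcert -alias oracle-server -file oracle_server_cert.pem -keystore truststore.jks -storepass password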

What's next

After you run the dwh-migration-dumper tool, upload the output to Cloud Storage along with the source files for translation.
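For example, a minimal sketch of uploading the generated zip file with the gcloud CLI (the bucket, path, and filename are illustrative):

gcloud storage cp dwh-migration-teradata-metadata.zip gs://BUCKET/PATH/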
