Migration assessment

The BigQuery migration assessment lets you plan and review the migration of your existing data warehouse into BigQuery. You can run the BigQuery migration assessment to generate a report that assesses the cost to store your data in BigQuery, shows how BigQuery can optimize your existing workload for cost savings, and helps you prepare a migration plan that outlines the time and effort required to complete your data warehouse migration to BigQuery.

This document describes how to use the BigQuery migration assessment and the different ways you can review the assessment results. This document is intended for users who are familiar with the Google Cloud console and the batch SQL translator.

Before you begin

To prepare and run a BigQuery migration assessment, follow these steps:

  1. Create a Cloud Storage bucket.

    Note: Use the --pap flag to prevent your Cloud Storage bucket data from being publicly accessible (see the example after these steps).
  2. Extract metadata and query logs from your data warehouse using the dwh-migration-dumper tool.

  3. Upload your metadata and query logs to your Cloud Storage bucket.

  4. Run the migration assessment.

  5. Review the Looker Studio report.

  6. Optional: Query the assessment results to find detailed or specific assessment information.
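For example, here's a minimal sketch of steps 1 and 3 using the gcloud CLI. The bucket name, location, and file names are placeholders, and flag spellings can vary by gcloud version, so check the output of gcloud storage buckets create --help before running it:

# Create a bucket with public access prevention enforced.
gcloud storage buckets create gs://BUCKET_NAME \
  --location=us-central1 \
  --public-access-prevention

# Upload the extracted metadata and query log zip files.
gcloud storage cp dwh-migration-*.zip gs://BUCKET_NAME/assessment/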

Extract metadata and query logs from your data warehouse

Both metadata and query logs are needed for preparing the assessment with recommendations.

To extract the metadata and query logs necessary to run the assessment, select your data warehouse:

Teradata

Requirements

  • A machine connected to your source Teradata data warehouse (Teradata 15 and later are supported)
  • A Google Cloud account with a Cloud Storage bucket to store the data
  • An empty BigQuery dataset to store the results
  • Read permissions on the dataset to view the results
  • Recommended: Administrator-level access rights to the source database when using the extraction tool to access system tables

Requirement: Enable logging

The dwh-migration-dumper tool extracts three types of logs: query logs, utility logs, and resource usage logs. To view more thorough insights, you need to enable logging for each of these log types.

Run the dwh-migration-dumper tool

Download the dwh-migration-dumper tool.

Download the SHA256SUMS.txt file and run the following command to verify zip correctness:

Bash

sha256sum --check SHA256SUMS.txt

Windows PowerShell

(Get-FileHash RELEASE_ZIP_FILENAME).Hash -eq ((Get-Content SHA256SUMS.txt) -Split " ")[0]

Replace RELEASE_ZIP_FILENAME with the downloaded zip filename of the dwh-migration-dumper command-line extraction tool release—for example, dwh-migration-tools-v1.0.52.zip.

The True result confirms successful checksum verification.

The False result indicates a verification error. Make sure that the checksum and zip files are downloaded from the same release version and placed in the same directory.

For details about how to set up and use the extraction tool, see Generate metadata for translation and assessment.

Use the extraction tool to extract logs and metadata from your Teradata data warehouse as two zip files. Run the following commands on a machine with access to the source data warehouse to generate the files.

Generate the metadata zip file:

dwh-migration-dumper \
  --connector teradata \
  --database DATABASES \
  --driver path/terajdbc4.jar \
  --host HOST \
  --assessment \
  --user USER \
  --password PASSWORD

Note: The --database flag is optional for the teradata connector. If omitted, the metadata for all of the databases is extracted. This flag is only valid for the teradata connector and can't be used with teradata-logs.

Generate the zip file containing query logs:

dwh-migration-dumper \
  --connector teradata-logs \
  --driver path/terajdbc4.jar \
  --host HOST \
  --assessment \
  --user USER \
  --password PASSWORD

Note: The --database flag isn't used when extracting query logs with the teradata-logs connector. Query logs are always extracted for all databases.

Replace the following:

  • PATH: the absolute or relative path to the driver JAR file to use for this connection
  • VERSION: the version of your driver
  • HOST: the host address
  • USER: the username to use for the database connection
  • DATABASES: (Optional) the comma-separated list of database names to extract. If not provided, all databases are extracted.
  • PASSWORD: (Optional) the password to use for the database connection. If left empty, the user is prompted for their password.

By default, the query logs are extracted from the view dbc.QryLogV and from the table dbc.DBQLSqlTbl. If you need to extract the query logs from an alternative location, you can specify the names of the tables or views by using the -Dteradata-logs.query-logs-table and -Dteradata-logs.sql-logs-table flags.

Tip: To improve performance of joining tables that are specified by the -Dteradata-logs.query-logs-table and -Dteradata-logs.sql-logs-table flags, you can include an additional column of type DATE in the JOIN condition. This column must be defined in both tables and it must be part of the Partitioned Primary Index. To include this column, use the -Dteradata-logs.log-date-column flag.

By default, the utility logs are extracted from the table dbc.DBQLUtilityTbl. If you need to extract the utility logs from an alternative location, you can specify the name of the table using the -Dteradata-logs.utility-logs-table flag.

By default, the resource usage logs are extracted from the tables dbc.ResUsageScpu and dbc.ResUsageSpma. If you need to extract the resource usage logs from an alternative location, you can specify the names of the tables using the -Dteradata-logs.res-usage-scpu-table and -Dteradata-logs.res-usage-spma-table flags.

For example:

Bash

dwh-migration-dumper \
  --connector teradata-logs \
  --driver path/terajdbc4.jar \
  --host HOST \
  --assessment \
  --user USER \
  --password PASSWORD \
  -Dteradata-logs.query-logs-table=pdcrdata.QryLogV_hst \
  -Dteradata-logs.sql-logs-table=pdcrdata.DBQLSqlTbl_hst \
  -Dteradata-logs.log-date-column=LogDate \
  -Dteradata-logs.utility-logs-table=pdcrdata.DBQLUtilityTbl_hst \
  -Dteradata-logs.res-usage-scpu-table=pdcrdata.ResUsageScpu_hst \
  -Dteradata-logs.res-usage-spma-table=pdcrdata.ResUsageSpma_hst

Windows PowerShell

dwh-migration-dumper `
  --connector teradata-logs `
  --driver path\terajdbc4.jar `
  --host HOST `
  --assessment `
  --user USER `
  --password PASSWORD `
  "-Dteradata-logs.query-logs-table=pdcrdata.QryLogV_hst" `
  "-Dteradata-logs.sql-logs-table=pdcrdata.DBQLSqlTbl_hst" `
  "-Dteradata-logs.log-date-column=LogDate" `
  "-Dteradata-logs.utility-logs-table=pdcrdata.DBQLUtilityTbl_hst" `
  "-Dteradata-logs.res-usage-scpu-table=pdcrdata.ResUsageScpu_hst" `
  "-Dteradata-logs.res-usage-spma-table=pdcrdata.ResUsageSpma_hst"

By default, the dwh-migration-dumper tool extracts the last seven days of query logs. Google recommends that you provide at least two weeks of query logs to be able to view more thorough insights. You can specify a custom time range by using the --query-log-start and --query-log-end flags. For example:

dwh-migration-dumper \
  --connector teradata-logs \
  --driver path/terajdbc4.jar \
  --host HOST \
  --assessment \
  --user USER \
  --password PASSWORD \
  --query-log-start "2023-01-01 00:00:00" \
  --query-log-end "2023-01-15 00:00:00"

You can also generate multiple zip files containing query logs covering different periods and provide all of them for assessment.

Redshift

Requirements

  • A machine connected to your source Amazon Redshift data warehouse
  • A Google Cloud account with a Cloud Storage bucket to store the data
  • An empty BigQuery dataset to store the results
  • Read permissions on the dataset to view the results
  • Recommended: Super user access to the database when using the extraction tool to access system tables

Run the dwh-migration-dumper tool

Download the dwh-migration-dumper command-line extraction tool.

Download the SHA256SUMS.txt file and run the following command to verify zip correctness:

Bash

sha256sum --check SHA256SUMS.txt

Windows PowerShell

(Get-FileHash RELEASE_ZIP_FILENAME).Hash -eq ((Get-Content SHA256SUMS.txt) -Split " ")[0]

Replace RELEASE_ZIP_FILENAME with the downloaded zip filename of the dwh-migration-dumper command-line extraction tool release—for example, dwh-migration-tools-v1.0.52.zip.

The True result confirms successful checksum verification.

The False result indicates a verification error. Make sure that the checksum and zip files are downloaded from the same release version and placed in the same directory.

For details about how to use the dwh-migration-dumper tool, see the generate metadata page.

Use the dwh-migration-dumper tool to extract logs and metadata from your Amazon Redshift data warehouse as two zip files. Run the following commands on a machine with access to the source data warehouse to generate the files.

Generate the metadata zip file:

dwh-migration-dumper \
  --connector redshift \
  --database DATABASE \
  --driver PATH/redshift-jdbc42-VERSION.jar \
  --host host.region.redshift.amazonaws.com \
  --assessment \
  --user USER \
  --iam-profile IAM_PROFILE_NAME

Generate the zip file containing query logs:

dwh-migration-dumper \
  --connector redshift-raw-logs \
  --database DATABASE \
  --driver PATH/redshift-jdbc42-VERSION.jar \
  --host host.region.redshift.amazonaws.com \
  --assessment \
  --user USER \
  --iam-profile IAM_PROFILE_NAME

Replace the following:

  • DATABASE: the name of the database to connect to
  • PATH: the absolute or relative path to the driver JAR file to use for this connection
  • VERSION: the version of your driver
  • USER: the username to use for the database connection
  • IAM_PROFILE_NAME: the Amazon Redshift IAM profile name. Required for Amazon Redshift authentication and for AWS API access. To get the description of Amazon Redshift clusters, use the AWS API.

By default, Amazon Redshift stores three to five days of query logs.

By default, the dwh-migration-dumper tool extracts the last seven days of query logs.

Google recommends that you provide at least two weeks of query logs to be able to view more thorough insights. You might need to run the extraction tool a few times over the course of two weeks to get the best results. You can specify a custom range by using the --query-log-start and --query-log-end flags. For example:

dwh-migration-dumper \
  --connector redshift-raw-logs \
  --database DATABASE \
  --driver PATH/redshift-jdbc42-VERSION.jar \
  --host host.region.redshift.amazonaws.com \
  --assessment \
  --user USER \
  --iam-profile IAM_PROFILE_NAME \
  --query-log-start "2023-01-01 00:00:00" \
  --query-log-end "2023-01-02 00:00:00"

You can also generate multiple zip files containing query logs covering different periods and provide all of them for assessment.

Note: Earlier versions of the dwh-migration-dumper tool supported the --password option instead of --iam-profile. This option still works, but it is deprecated and might lead to some gaps in the report.

Redshift Serverless

Preview: BigQuery Migration Assessment for Amazon Redshift Serverless is in Preview. To use this feature, you must be added to the allowlist. To request access, fill out the application form or send an email to bq-edw-migration-support@google.com.

Requirements

  • A machine connected to your source Amazon Redshift Serverless data warehouse
  • A Google Cloud account with a Cloud Storage bucket to store the data
  • An empty BigQuery dataset to store the results
  • Read permissions on the dataset to view the results
  • Recommended: Super user access to the database when using the extraction tool to access system tables

Run the dwh-migration-dumper tool

Download the dwh-migration-dumper command-line extraction tool.

For details about how to use the dwh-migration-dumper tool, see the Generate metadata page.

Use the dwh-migration-dumper tool to extract usage logs and metadata from your Amazon Redshift Serverless namespace as two zip files. Run the following commands on a machine with access to the source data warehouse to generate the files.

Generate the metadata zip file:

dwh-migration-dumper \
  --connector redshift \
  --database DATABASE \
  --driver PATH/redshift-jdbc42-VERSION.jar \
  --host host.region.redshift-serverless.amazonaws.com \
  --assessment \
  --user USER \
  --iam-profile IAM_PROFILE_NAME

Generate the zip file containing query logs:

dwh-migration-dumper \
  --connector redshift-serverless-logs \
  --database DATABASE \
  --driver PATH/redshift-jdbc42-VERSION.jar \
  --host host.region.redshift-serverless.amazonaws.com \
  --assessment \
  --user USER \
  --iam-profile IAM_PROFILE_NAME

Replace the following:

  • DATABASE: the name of the database to connect to
  • PATH: the absolute or relative path to the driver JAR file to use for this connection
  • VERSION: the version of your driver
  • USER: the username to use for the database connection
  • IAM_PROFILE_NAME: the Amazon Redshift IAM profile name. Required for Amazon Redshift authentication and for AWS API access. To get the description of Amazon Redshift clusters, use the AWS API.

Amazon Redshift Serverless stores usage logs for seven days. If a wider range is required, Google recommends extracting data multiple times over a longer period.
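For example, here's a sketch of repeated weekly extractions. This assumes the redshift-serverless-logs connector accepts the same --query-log-start and --query-log-end flags as the redshift-raw-logs connector; check the extraction tool's help output to confirm before relying on it.

# First window
dwh-migration-dumper \
  --connector redshift-serverless-logs \
  --database DATABASE \
  --driver PATH/redshift-jdbc42-VERSION.jar \
  --host host.region.redshift-serverless.amazonaws.com \
  --assessment \
  --user USER \
  --iam-profile IAM_PROFILE_NAME \
  --query-log-start "2023-01-01 00:00:00" \
  --query-log-end "2023-01-08 00:00:00"

# Run again a week later with the next window, for example
# --query-log-start "2023-01-08 00:00:00" --query-log-end "2023-01-15 00:00:00",
# and keep all of the resulting zip files for the assessment.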

Snowflake

Preview

This feature is subject to the "Pre-GA Offerings Terms" in the General Service Terms section of the Service Specific Terms. Pre-GA features are available "as is" and might have limited support. For more information, see the launch stage descriptions.

Requirements

You must meet the following requirements in order to extract metadata and query logs from Snowflake:

  • A machine that can connect to your Snowflake instance(s).
  • A Google Cloud account with a Cloud Storage bucket to store the data.
  • An empty BigQuery dataset to store the results. Alternatively, you can create a BigQuery dataset when you create the assessment job using the Google Cloud console UI.
  • A Snowflake user with IMPORTED PRIVILEGES access on the database Snowflake. We recommend creating a SERVICE user with key-pair based authentication. This provides a secure method for accessing the Snowflake data platform without the need to generate MFA tokens.
    • To create a new service user, follow the official Snowflake guide. You have to generate an RSA key pair and assign the public key to the Snowflake user (see the sketch after this list).
    • The service user should have the ACCOUNTADMIN role, or be granted a role with the IMPORTED PRIVILEGES privilege on the database Snowflake by an account administrator.
    • As an alternative to key-pair authentication, you can use password-based authentication. However, starting in August 2025, Snowflake enforces MFA for all password-based users, which requires you to approve the MFA push notification when using the extraction tool.
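The following is a minimal sketch of creating an encrypted RSA key pair with OpenSSL, as described in the Snowflake key-pair authentication guide; the file names are placeholders:

# Generate an encrypted PKCS#8 private key (you are prompted for a passphrase).
openssl genrsa 2048 | openssl pkcs8 -topk8 -v2 aes256 -inform PEM -out rsa_key.p8

# Extract the matching public key.
openssl rsa -in rsa_key.p8 -pubout -out rsa_key.pub

You then assign the contents of rsa_key.pub (without the PEM header and footer lines) to the service user's RSA_PUBLIC_KEY property in Snowflake, and pass rsa_key.p8 to the extraction tool with the --private-key-file flag.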

Run the dwh-migration-dumper tool

Download the dwh-migration-dumper command-line extraction tool.

Download the SHA256SUMS.txt file and run the following command to verify zip correctness:

Bash

sha256sum --check SHA256SUMS.txt

Windows PowerShell

(Get-FileHash RELEASE_ZIP_FILENAME).Hash -eq ((Get-Content SHA256SUMS.txt) -Split " ")[0]

Replace RELEASE_ZIP_FILENAME with the downloaded zip filename of the dwh-migration-dumper command-line extraction tool release—for example, dwh-migration-tools-v1.0.52.zip.

The True result confirms successful checksum verification.

The False result indicates a verification error. Make sure that the checksum and zip files are downloaded from the same release version and placed in the same directory.

For details about how to use the dwh-migration-dumper tool, see the generate metadata page.

Use the dwh-migration-dumper tool to extract logs and metadata from your Snowflake data warehouse as two zip files. Run the following commands on a machine with access to the source data warehouse to generate the files.

Generate the metadata zip file:

dwh-migration-dumper \
  --connector snowflake \
  --host HOST_NAME \
  --user USER_NAME \
  --role ROLE_NAME \
  --warehouse WAREHOUSE \
  --assessment \
  --private-key-file PRIVATE_KEY_PATH \
  --private-key-password PRIVATE_KEY_PASSWORD

Generate the zip file containing query logs:

dwh-migration-dumper \
  --connector snowflake-logs \
  --host HOST_NAME \
  --user USER_NAME \
  --role ROLE_NAME \
  --warehouse WAREHOUSE \
  --query-log-start STARTING_DATE \
  --query-log-end ENDING_DATE \
  --assessment \
  --private-key-file PRIVATE_KEY_PATH \
  --private-key-password PRIVATE_KEY_PASSWORD

Replace the following:

  • HOST_NAME: the hostname of your Snowflake instance.
  • USER_NAME: the username to use for the database connection, where the user must have the access permissions as detailed in the requirements section.
  • PRIVATE_KEY_PATH: the path to the RSA private key used for authentication.
  • PRIVATE_KEY_PASSWORD: (Optional) the password that was used when creating the RSA private key. It is required only if the private key is encrypted.
  • ROLE_NAME: (Optional) the user role when running the dwh-migration-dumper tool—for example, ACCOUNTADMIN.
  • WAREHOUSE: the warehouse used to execute the dumping operations. If you have multiple virtual warehouses, you can specify any warehouse to execute this query. Running this query with the access permissions detailed in the requirements section extracts all warehouse artifacts in this account.
  • STARTING_DATE: (Optional) used to indicate the start date in a date range of query logs, written in the format YYYY-MM-DD.
  • ENDING_DATE: (Optional) used to indicate the end date in a date range of query logs, written in the format YYYY-MM-DD.

You can also generate multiple zip files containing query logs covering non-overlapping periods and provide all of them for assessment.
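For example, here's a sketch of two runs that produce query log archives for consecutive periods, using the placeholders defined in the preceding list (optional flags such as --role and --private-key-password are omitted). Adjust the boundary dates depending on whether the end date is inclusive in your tool version:

dwh-migration-dumper \
  --connector snowflake-logs \
  --host HOST_NAME \
  --user USER_NAME \
  --warehouse WAREHOUSE \
  --assessment \
  --private-key-file PRIVATE_KEY_PATH \
  --query-log-start 2023-01-01 \
  --query-log-end 2023-01-15

dwh-migration-dumper \
  --connector snowflake-logs \
  --host HOST_NAME \
  --user USER_NAME \
  --warehouse WAREHOUSE \
  --assessment \
  --private-key-file PRIVATE_KEY_PATH \
  --query-log-start 2023-01-15 \
  --query-log-end 2023-01-29

Upload both resulting zip files to the same Cloud Storage folder for the assessment.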

Oracle

Preview

This feature is subject to the "Pre-GA Offerings Terms" in the General Service Terms section of the Service Specific Terms. Pre-GA features are available "as is" and might have limited support. For more information, see the launch stage descriptions.

To request feedback or support for this feature, send an email to bq-edw-migration-support@google.com.

Requirements

You must meet the following requirements in order to extract metadata and query logs from Oracle:

  • Your Oracle database must be version 11g R1 or higher.
  • A machine that can connect to your Oracle instance(s).
  • Java 8 or higher.
  • A Google Cloud account with a Cloud Storage bucket to store the data.
  • An empty BigQuery dataset to store the results. Alternatively, you can create a BigQuery dataset when you create the assessment job using the Google Cloud console UI.
  • An Oracle common user with SYSDBA privileges.

Run the dwh-migration-dumper tool

Download the dwh-migration-dumper command-line extraction tool.

Download the SHA256SUMS.txt file and run the following command to verify zip correctness:

sha256sum --check SHA256SUMS.txt

For details about how to use the dwh-migration-dumper tool, see the generate metadata page.

Use the dwh-migration-dumper tool to extract metadata and performance statistics to the zip file. By default, statistics are extracted from the Oracle AWR, which requires the Oracle Tuning and Diagnostics Pack. If this data is not available, dwh-migration-dumper uses STATSPACK instead.

For multitenant databases, the dwh-migration-dumper tool must be executed in the root container. Running it in one of the pluggable databases results in missing performance statistics and metadata about other pluggable databases.

Generate the metadata zip file:

dwh-migration-dumper \
  --connector oracle-stats \
  --host HOST_NAME \
  --port PORT \
  --oracle-service SERVICE_NAME \
  --assessment \
  --driver JDBC_DRIVER_PATH \
  --user USER_NAME \
  --password

Replace the following:

  • HOST_NAME: the hostname of your Oracle instance.
  • PORT: the connection port number. The default value is 1521.
  • SERVICE_NAME: the Oracle service name to use for the connection.
  • JDBC_DRIVER_PATH: the absolute or relative path to the driver JAR file. You can download this file from the Oracle JDBC driver downloads page. You should select the driver version that is compatible with your database version.
  • USER_NAME: name of the user used to connect to your Oracle instance. The user must have the access permissions as detailed in the requirements section.

Hadoop / Cloudera

Preview

This feature is subject to the "Pre-GA Offerings Terms" in the General Service Terms section of the Service Specific Terms. Pre-GA features are available "as is" and might have limited support. For more information, see the launch stage descriptions.

To request feedback or support for this feature, send an email to bq-edw-migration-support@google.com.

Requirements

You must have the following to extract metadata from Cloudera:

  • A machine that can connect to the Cloudera Manager API.
  • A Google Cloud account with a Cloud Storage bucket to store the data.
  • An empty BigQuery dataset to store the results. Alternatively, you can create a BigQuery dataset when you create the assessment job.

Run the dwh-migration-dumper tool

  1. Download the dwh-migration-dumper command-line extraction tool.

  2. Download the SHA256SUMS.txt file.

  3. In your command-line environment, verify zip correctness:

    sha256sum --check SHA256SUMS.txt

    For details about how to use the dwh-migration-dumper tool, see Generate metadata for translation and assessment.

  4. Use the dwh-migration-dumper tool to extract metadata and performance statistics to the zip file:

    dwh-migration-dumper \
      --connector cloudera-manager \
      --user USER_NAME \
      --password PASSWORD \
      --url URL_PATH \
      --yarn-application-types "APP_TYPES" \
      --pagination-page-size PAGE_SIZE \
      --start-date START_DATE \
      --end-date END_DATE \
      --assessment

    Replace the following:

    • USER_NAME: the name of the user to connect to your Cloudera Manager instance.
    • PASSWORD: the password for your Cloudera Manager instance.
    • URL_PATH: the URL path to the Cloudera Manager API, for example, https://localhost:7183/api/v55/.
    • APP_TYPES (optional): the comma-separated YARN application types that are dumped from the cluster. The default value is MAPREDUCE,SPARK,Oozie Launcher.
    • PAGE_SIZE (optional): the number of records per Cloudera response. The default value is 1000.
    • START_DATE (optional): the start date for your history dump in ISO 8601 format, for example 2025-05-29. The default value is 90 days before the current date.
    • END_DATE (optional): the end date for your history dump in ISO 8601 format, for example 2025-05-30. The default value is the current date.

Use Oozie in your Cloudera cluster

If you use Oozie in your Cloudera cluster, you can dump Oozie job history with the Oozie connector. You can use Oozie with Kerberos authentication or basic authentication.

For Kerberos authentication, run the following:

kinit
dwh-migration-dumper \
  --connector oozie \
  --url URL_PATH \
  --assessment

Replace the following:

  • URL_PATH (optional): the Oozie server URL path. If you don't specify the URL path, it's taken from the OOZIE_URL environment variable.

For basic authentication, run the following:

dwh-migration-dumper \
  --connector oozie \
  --user USER_NAME \
  --password PASSWORD \
  --url URL_PATH \
  --assessment

Replace the following:

  • USER_NAME: the name of the Oozie user.
  • PASSWORD: the user password.
  • URL_PATH (optional): the Oozie server URL path. If you don't specify the URL path, it's taken from the OOZIE_URL environment variable.

Use Airflow in your Cloudera cluster

If you use Airflow in your Cloudera cluster, you can dump DAG history with the Airflow connector:

dwh-migration-dumper \
  --connector airflow \
  --user USER_NAME \
  --password PASSWORD \
  --url URL \
  --driver "DRIVER_PATH" \
  --start-date START_DATE \
  --end-date END_DATE \
  --assessment

Replace the following:

  • USER_NAME: the name of the Airflow user
  • PASSWORD: the user password
  • URL: the JDBC string to the Airflow database
  • DRIVER_PATH: the path to the JDBC driver
  • START_DATE (optional): the start date for your history dump in ISO 8601 format
  • END_DATE (optional): the end date for your history dump in ISO 8601 format

Use Hive in your Cloudera cluster

To use the Hive connector, see the Apache Hive tab.

Apache Hive

Preview

This feature is subject to the "Pre-GA Offerings Terms" in the General Service Terms section of the Service Specific Terms. Pre-GA features are available "as is" and might have limited support. For more information, see the launch stage descriptions.

Requirements

  • A machine connected to your source Apache Hive data warehouse (BigQuery migration assessment supports Hive on Tez and MapReduce, and supports Apache Hive versions between 2.2 and 3.1, inclusive)
  • A Google Cloud account with a Cloud Storage bucket to store the data
  • An empty BigQuery dataset to store the results
  • Read permissions on the dataset to view the results
  • Access to your source Apache Hive data warehouse to configure query log extraction
  • Up-to-date table, partition, and column statistics

The BigQuery migration assessment uses table, partition, and column statistics to understand your Apache Hive data warehouse better and provide thorough insights. If the hive.stats.autogather configuration setting is set to false in your source Apache Hive data warehouse, Google recommends enabling it or updating statistics manually before running the dwh-migration-dumper tool.

Run the dwh-migration-dumper tool

Download the dwh-migration-dumper command-line extraction tool.

Download the SHA256SUMS.txt file and run the following command to verify zip correctness:

Bash

sha256sum --check SHA256SUMS.txt

Windows PowerShell

(Get-FileHash RELEASE_ZIP_FILENAME).Hash -eq ((Get-Content SHA256SUMS.txt) -Split " ")[0]

Replace RELEASE_ZIP_FILENAME with the downloaded zip filename of the dwh-migration-dumper command-line extraction tool release—for example, dwh-migration-tools-v1.0.52.zip.

The True result confirms successful checksum verification.

The False result indicates a verification error. Make sure that the checksum and zip files are downloaded from the same release version and placed in the same directory.

For details about how to use the dwh-migration-dumper tool, see Generate metadata for translation and assessment.

Use the dwh-migration-dumper tool to generate metadata from your Hive data warehouse as a zip file.

Without Authentication

To generate the metadata zip file, run the following command on a machine that has access to the source data warehouse:

dwh-migration-dumper \
  --connector hiveql \
  --database DATABASES \
  --host hive.cluster.host \
  --port 9083 \
  --assessment

With Kerberos Authentication

To authenticate to the metastore, sign in as a user that has access to the Apache Hive metastore and generate a Kerberos ticket. Then, generate the metadata zip file with the following command:

JAVA_OPTS="-Djavax.security.auth.useSubjectCredsOnly=false" \
  dwh-migration-dumper \
  --connector hiveql \
  --database DATABASES \
  --host hive.cluster.host \
  --port 9083 \
  --hive-kerberos-url PRINCIPAL/HOST \
  -Dhiveql.rpc.protection=hadoop.rpc.protection \
  --assessment

Replace the following:

  • DATABASES: the comma-separated list of database names to extract. If not provided, all databases are extracted.
  • PRINCIPAL: the Kerberos principal that the ticket is issued to
  • HOST: the Kerberos hostname that the ticket is issued to
  • hadoop.rpc.protection: the Quality of Protection (QOP) of the Simple Authentication and Security Layer (SASL) configuration level, equal to the value of the hadoop.rpc.protection parameter inside the /etc/hadoop/conf/core-site.xml file (see the lookup example after this list), with one of the following values:
    • authentication
    • integrity
    • privacy
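For example, here's a minimal sketch of looking up the configured value on a cluster node; the file path is the one mentioned in the list above:

grep -A1 'hadoop.rpc.protection' /etc/hadoop/conf/core-site.xml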

Extract query logs with the hadoop-migration-assessment logging hook

To extract query logs, follow these steps:

  1. Upload the hadoop-migration-assessment logging hook.
  2. Configure the logging hook properties.
  3. Verify the logging hook.

Upload the hadoop-migration-assessment logging hook

  1. Download the hadoop-migration-assessment query logs extraction logging hook that contains the Hive logging hook JAR file.

  2. Extract the JAR file.

    If you need to audit the tool to ensure that it meets compliance requirements, review the source code from the hadoop-migration-assessment logging hook GitHub repository, and compile your own binary.

  3. Copy the JAR file into the auxiliary library folder on all clusters where you plan to enable the query logging. Depending on your vendor, you need to locate the auxiliary library folder in cluster settings and transfer the JAR file to the auxiliary library folder on the Hive cluster.

  4. Set up the configuration properties for the hadoop-migration-assessment logging hook. Depending on your Hadoop vendor, use the UI console to edit cluster settings: modify the /etc/hive/conf/hive-site.xml file or apply the configuration with the configuration manager.

Configure properties

To set up the hadoop-migration-assessment logging hook, the following configuration settings are required (see the example after this list). If you already have other values for these configuration keys, append the new values using a comma (,).

  • hive.exec.failure.hooks: com.google.cloud.bigquery.dwhassessment.hooks.MigrationAssessmentLoggingHook
  • hive.exec.post.hooks: com.google.cloud.bigquery.dwhassessment.hooks.MigrationAssessmentLoggingHook
  • hive.exec.pre.hooks: com.google.cloud.bigquery.dwhassessment.hooks.MigrationAssessmentLoggingHook
  • hive.aux.jars.path: include the path to the logging hook JAR file, for example file:///HiveMigrationAssessmentQueryLogsHooks_deploy.jar.
  • dwhassessment.hook.base-directory: path to the query logs output folder. For example, hdfs://tmp/logs/.
  • You can also set the following optional configurations:

    • dwhassessment.hook.queue.capacity: the queue capacity for the query events logging threads. The default value is 64.
    • dwhassessment.hook.rollover-interval: the frequency at which the file rollover must be performed. For example, 600s. The default value is 3600 seconds (1 hour).
    • dwhassessment.hook.rollover-eligibility-check-interval: the frequency at which the file rollover eligibility check is triggered in the background. For example, 600s. The default value is 600 seconds (10 minutes).
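For example, here's a minimal sketch of the corresponding hive-site.xml entries. The JAR location under /usr/lib/hive/aux/ and the hdfs://tmp/logs/ output folder are placeholders; adjust them to your cluster:

<!-- Hypothetical hive-site.xml fragment for the logging hook. -->
<property>
  <name>hive.exec.pre.hooks</name>
  <value>com.google.cloud.bigquery.dwhassessment.hooks.MigrationAssessmentLoggingHook</value>
</property>
<property>
  <name>hive.exec.post.hooks</name>
  <value>com.google.cloud.bigquery.dwhassessment.hooks.MigrationAssessmentLoggingHook</value>
</property>
<property>
  <name>hive.exec.failure.hooks</name>
  <value>com.google.cloud.bigquery.dwhassessment.hooks.MigrationAssessmentLoggingHook</value>
</property>
<property>
  <name>hive.aux.jars.path</name>
  <value>file:///usr/lib/hive/aux/HiveMigrationAssessmentQueryLogsHooks_deploy.jar</value>
</property>
<property>
  <name>dwhassessment.hook.base-directory</name>
  <value>hdfs://tmp/logs/</value>
</property>

If a key already has a value, append the hook class to the existing value with a comma rather than replacing it.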
Important: To apply the configuration changes, you must restart Hive services.

Verify the logging hook

After you restart the hive-server2 process, run a test query and analyze your debug logs. You should see a message similar to the following:

Logger successfully started, waiting for query events. Log directory is '[dwhassessment.hook.base-directory value]'; rollover interval is '60' minutes; rollover eligibility check is '10' minutes

The logging hook creates a date-partitioned subfolder in the configured folder. The Avro file with query events appears in that folder after the dwhassessment.hook.rollover-interval interval or hive-server2 process termination. You can look for similar messages in your debug logs to see the status of the rollover operation:

Updated rollover time for logger ID 'my_logger_id' to '2023-12-25T10:15:30'
Performed rollover check for logger ID 'my_logger_id'. Expected rollover time is '2023-12-25T10:15:30'

Rollover happens at the specified intervals or when the day changes. When the date changes, the logging hook also creates a new subfolder for that date.
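For example, here's a minimal sketch of listing the generated Avro files, where BASE_DIRECTORY is the value you set for dwhassessment.hook.base-directory:

hdfs dfs -ls -R BASE_DIRECTORY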

Google recommends that you provide at least two weeks of query logs to be able to view more thorough insights.

You can also generate folders containing query logs from different Hive clusters and provide all of them for a single assessment.

Informatica

Preview

This feature is subject to the "Pre-GA Offerings Terms" in the General Service Terms section of the Service Specific Terms. Pre-GA features are available "as is" and might have limited support. For more information, see the launch stage descriptions.

To request feedback or support for this feature, send an email to bq-edw-migration-support@google.com.

Requirements

  • Access to Informatica PowerCenter Repository Manager client
  • A Google Cloud account with a Cloud Storage bucket to store the data.
  • An empty BigQuery dataset to store the results. Alternatively, you can create a BigQuery dataset when you create the assessment job using the Google Cloud console.

Requirement: Export object files

You can use the Informatica PowerCenter Repository Manager GUI to export your object files. For information, see Steps to Export Objects.

Alternatively, you can run the pmrep command to export your object files with the following steps:

  1. Run the pmrep connect command to connect to the repository:
pmrep connect -r REPOSITORY_NAME -d DOMAIN_NAME -n USERNAME -x PASSWORD

Replace the following:

  • REPOSITORY_NAME: name of the repository you want to connect to
  • DOMAIN_NAME: name of the domain for the repository
  • USERNAME: username to connect to the repository
  • PASSWORD: password of the username
  2. Once connected to the repository, use the pmrep objectexport command to export the required objects:
pmrep objectexport -n OBJECT_NAME -o OBJECT_TYPE -f FOLDER_NAME -u OUTPUT_FILE_NAME.xml

Replace the following:

  • OBJECT_NAME: name of a specific object to export
  • OBJECT_TYPE: object type of the specified object
  • FOLDER_NAME: name of the folder containing the object to export
  • OUTPUT_FILE_NAME: name of the XML file to contain the object information

Upload metadata and query logs to Cloud Storage

Once you have extracted the metadata and query logs from your data warehouse, you can upload the files to a Cloud Storage bucket to proceed with the migration assessment.

Teradata

Upload the metadata and one or more zip files containing query logs to your Cloud Storage bucket. For more information about creating buckets and uploading files to Cloud Storage, see Create buckets and Upload objects from a filesystem. The limit for the total uncompressed size of all the files inside the metadata zip file is 50 GB.

The entries in all the zip files containing query logs are divided into the following:

  • Query history files with the query_history_ prefix.
  • Time series files with the utility_logs_, dbc.ResUsageScpu_, and dbc.ResUsageSpma_ prefixes.

The limit for the total uncompressed size of all the query history files is 5 TB. The limit for the total uncompressed size of all the time series files is 1 TB.

In case the query logs are archived in a different database, see the description of the -Dteradata-logs.query-logs-table and -Dteradata-logs.sql-logs-table flags earlier in this section, which explains how to provide an alternative location for the query logs.

Redshift

Upload the metadata and one or more zip files containing query logs to your Cloud Storage bucket. For more information about creating buckets and uploading files to Cloud Storage, see Create buckets and Upload objects from a filesystem. The limit for the total uncompressed size of all the files inside the metadata zip file is 50 GB.

The entries in all the zip files containing query logs are divided into the following:

  • Query history files with the querytext_ and ddltext_ prefixes.
  • Time series files with the query_queue_info_, wlm_query_, and querymetrics_ prefixes.

The limit for the total uncompressed size of all the query history files is 5 TB. The limit for the total uncompressed size of all the time series files is 1 TB.

Redshift Serverless

Preview: BigQuery Migration Assessment for Amazon Redshift Serverless is in Preview. To use this feature, you must be added to the allowlist. To request access, fill out the application form or send an email to bq-edw-migration-support@google.com.

Upload the metadata and one or more zip files containing query logs to your Cloud Storage bucket. For more information about creating buckets and uploading files to Cloud Storage, see Create buckets and Upload objects from a filesystem.

Snowflake

Preview

This feature is subject to the "Pre-GA Offerings Terms" in the General Service Terms section of the Service Specific Terms. Pre-GA features are available "as is" and might have limited support. For more information, see the launch stage descriptions.

Upload the metadata and the zip files containing query logs and usage histories to your Cloud Storage bucket. When uploading these files to Cloud Storage, the following requirements must be met:

  • The total uncompressed size of all the files inside the metadata zip file must be less than 50 GB.
  • The metadata zip file and the zip file containing query logs must be uploaded to a Cloud Storage folder. If you have multiple zip files containing non-overlapping query logs, you can upload all of them.
  • You must upload all the files to the same Cloud Storage folder.
  • You must upload all of the metadata and query logs zip files exactly as they are output by the dwh-migration-dumper tool. Don't extract, combine, or otherwise modify them.
  • The total uncompressed size of all the query history files must be less than 5 TB.

For more information about creating buckets and uploading files to Cloud Storage, see Create buckets and Upload objects from a filesystem.

Oracle

Preview

This feature is subject to the "Pre-GA Offerings Terms" in the General Service Terms section of the Service Specific Terms. Pre-GA features are available "as is" and might have limited support. For more information, see the launch stage descriptions.

To request feedback or support for this feature, send an email to bq-edw-migration-support@google.com.

Upload the zip file containing metadata and performance statistics to a Cloud Storage bucket. By default, the filename for the zip file is dwh-migration-oracle-stats.zip, but you can customize this by specifying it in the --output flag. The limit for the total uncompressed size of all the files inside the zip file is 50 GB.

For more information about creating buckets and uploading files to Cloud Storage, see Create buckets and Upload objects from a filesystem.

Hadoop / Cloudera

Preview

This feature is subject to the "Pre-GA Offerings Terms" in the General Service Terms section of the Service Specific Terms. Pre-GA features are available "as is" and might have limited support. For more information, see the launch stage descriptions.

To request feedback or support for this feature, send an email to bq-edw-migration-support@google.com.

Upload the zip file containing metadata and performance statistics to a Cloud Storage bucket. By default, the filename for the zip file is dwh-migration-cloudera-manager-RUN_DATE.zip (for example, dwh-migration-cloudera-manager-20250312T145808.zip), but you can customize it with the --output flag. The limit for the total uncompressed size of all files inside the zip file is 50 GB.

For more information about creating buckets and uploading files to Cloud Storage, see Create a bucket and Upload objects from a file system.

Apache Hive

Preview

This feature is subject to the "Pre-GA Offerings Terms" in the General Service Terms section of the Service Specific Terms. Pre-GA features are available "as is" and might have limited support. For more information, see the launch stage descriptions.

Upload the metadata and folders containing query logs from one or multiple Hive clusters to your Cloud Storage bucket. For more information about creating buckets and uploading files to Cloud Storage, see Create buckets and Upload objects from a filesystem.

The limit for the total uncompressed size of all the files inside the metadata zip file is 50 GB.

You can use the Cloud Storage connector to copy query logs directly to the Cloud Storage folder. The folders containing subfolders with query logs must be uploaded to the same Cloud Storage folder where the metadata zip file is uploaded.

Query logs folders have query history files with the dwhassessment_ prefix. The limit for the total uncompressed size of all the query history files is 5 TB.
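For example, here's a minimal sketch of copying the query log folders with hadoop distcp on a cluster that has the Cloud Storage connector installed; BASE_DIRECTORY and the bucket path are placeholders:

hadoop distcp BASE_DIRECTORY gs://BUCKET_NAME/assessment/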

Informatica

Preview

This feature is subject to the "Pre-GA Offerings Terms" in the General Service Terms section of the Service Specific Terms. Pre-GA features are available "as is" and might have limited support. For more information, see the launch stage descriptions.

To request feedback or support for this feature, send an email to bq-edw-migration-support@google.com.

Upload a zip file containing your Informatica XML repository objects to a Cloud Storage bucket. This zip file must also include a compilerworks-metadata.yaml file that contains the following:

product:
  arguments: "ConnectorArguments{connector=informatica, assessment=true}"

The limit for the total uncompressed size of all files inside the zip file is 50 GB.
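For example, here's a minimal sketch of packaging the exported XML files together with the metadata file; the file names are placeholders:

cat > compilerworks-metadata.yaml <<'EOF'
product:
  arguments: "ConnectorArguments{connector=informatica, assessment=true}"
EOF

zip informatica-objects.zip *.xml compilerworks-metadata.yaml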

For more information about creating buckets and uploading files to Cloud Storage, see Create buckets and Upload objects from a filesystem.

Run a BigQuery migration assessment

Follow these steps to run the BigQuery migration assessment. These steps assume you have uploaded the metadata files into a Cloud Storage bucket, as described in the previous section.

Required permissions

To enable the BigQuery Migration Service, you need the following Identity and Access Management (IAM) permissions:

  • resourcemanager.projects.get
  • resourcemanager.projects.update
  • serviceusage.services.enable
  • serviceusage.services.get

To access and use the BigQuery Migration Service, you need the following permissions on the project:

  • bigquerymigration.workflows.create
  • bigquerymigration.workflows.get
  • bigquerymigration.workflows.list
  • bigquerymigration.workflows.delete
  • bigquerymigration.subtasks.get
  • bigquerymigration.subtasks.list
Note: You can only set the permissions and roles with the bigquerymigration.* prefix using the Google Cloud CLI. For information on how to set up and use the Google Cloud CLI, see the gcloud CLI tool overview.

To run the BigQuery Migration Service, you need the following additional permissions.

  • Permission to access the Cloud Storage buckets for input and output files:

    • storage.objects.get on the source Cloud Storage bucket
    • storage.objects.list on the source Cloud Storage bucket
    • storage.objects.create on the destination Cloud Storage bucket
    • storage.objects.delete on the destination Cloud Storage bucket
    • storage.objects.update on the destination Cloud Storage bucket
    • storage.buckets.get
    • storage.buckets.list
  • Permission to read and update the BigQuery dataset where the BigQuery Migration Service writes the results:

    • bigquery.datasets.update
    • bigquery.datasets.get
    • bigquery.datasets.create
    • bigquery.datasets.delete
    • bigquery.jobs.create
    • bigquery.jobs.delete
    • bigquery.jobs.list
    • bigquery.jobs.update
    • bigquery.tables.create
    • bigquery.tables.get
    • bigquery.tables.getData
    • bigquery.tables.list
    • bigquery.tables.updateData

To share the Looker Studio report with a user, you need to grant the following roles:

  • roles/bigquery.dataViewer
  • roles/bigquery.jobUser

The following example shows you how to grant the required roles to a user that you want to share the report with:

gcloud projects add-iam-policy-binding PROJECT \
  --member=user:REPORT_VIEWER_EMAIL \
  --role=roles/bigquery.dataViewer

gcloud projects add-iam-policy-binding PROJECT \
  --member=user:REPORT_VIEWER_EMAIL \
  --role=roles/bigquery.jobUser

Replace the following:

  • PROJECT: the project that the user is in
  • REPORT_VIEWER_EMAIL: the email of the user that you want to share the report with

Create a project for the assessment

We recommend that you create and set up a new project to run your migration assessment. You can use the following script to create a new Google Cloud project with all the necessary permissions and role assignments to run the assessment:

#!/bin/bash

# --- Configuration ---
# Replace with your desired project ID, the email of the user that runs
# the assessment, and your organization ID.
export PROJECT_ID="PROJECT_ID"
export ASSESSMENT_RUNNER_EMAIL="RUNNER_EMAIL"
export ORGANIZATION_ID="ORGANIZATION_ID"

# --- Project Creation ---
echo "Creating project: $PROJECT_ID"
gcloud projects create $PROJECT_ID --organization=$ORGANIZATION_ID

# Set the new project as the default for subsequent gcloud commands
gcloud config set project $PROJECT_ID

# --- IAM Role Creation ---
echo "Creating custom role 'BQMSrole' in project $PROJECT_ID"
gcloud iam roles create BQMSrole \
  --project=$PROJECT_ID \
  --title=BQMSrole \
  --permissions=bigquerymigration.subtasks.get,bigquerymigration.subtasks.list,bigquerymigration.workflows.create,bigquerymigration.workflows.get,bigquerymigration.workflows.list,bigquerymigration.workflows.delete,resourcemanager.projects.update,resourcemanager.projects.get,serviceusage.services.enable,serviceusage.services.get,storage.objects.get,storage.objects.list,storage.objects.create,storage.objects.delete,storage.objects.update,bigquery.datasets.get,bigquery.datasets.update,bigquery.datasets.create,bigquery.datasets.delete,bigquery.tables.get,bigquery.tables.create,bigquery.tables.updateData,bigquery.tables.getData,bigquery.tables.list,bigquery.jobs.create,bigquery.jobs.update,bigquery.jobs.list,bigquery.jobs.delete,storage.buckets.list,storage.buckets.get

# --- IAM Policy Binding for Assessment Runner ---
echo "Granting IAM roles to the assessment runner: $ASSESSMENT_RUNNER_EMAIL"

# Grant the custom BQMSrole to the assessment runner user
gcloud projects add-iam-policy-binding \
  $PROJECT_ID \
  --member=user:$ASSESSMENT_RUNNER_EMAIL \
  --role=projects/$PROJECT_ID/roles/BQMSrole

# Grant the BigQuery Data Viewer role to the assessment runner user
gcloud projects add-iam-policy-binding \
  $PROJECT_ID \
  --member=user:$ASSESSMENT_RUNNER_EMAIL \
  --role=roles/bigquery.dataViewer

# Grant the BigQuery Job User role to the assessment runner user
gcloud projects add-iam-policy-binding \
  $PROJECT_ID \
  --member=user:$ASSESSMENT_RUNNER_EMAIL \
  --role=roles/bigquery.jobUser

echo "Project $PROJECT_ID created and configured for BigQuery Migration Assessment."
echo "Assessment Runner: $ASSESSMENT_RUNNER_EMAIL"

Replace the following:

  • PROJECT_ID: the ID of the new project
  • RUNNER_EMAIL: the email of the user running the migration assessment
  • ORGANIZATION_ID: the organization ID. For example, 123456789012

Supported locations

The BigQuery migration assessment feature is supported in all BigQuery locations. For a list of BigQuery locations, see Supported locations.

Before you begin

Before you run the assessment, you must enable the BigQuery Migration API and create a BigQuery dataset to store the results of the assessment.

Enable the BigQuery Migration API

Enable the BigQuery Migration API as follows:

  1. In the Google Cloud console, go to the BigQuery Migration API page.

    Go to BigQuery Migration API

  2. Click Enable.

Create a dataset for the assessment results

The BigQuery migration assessment writes the assessment results to tables in BigQuery. Before you begin, create a dataset to hold these tables. When you share the Looker Studio report, you must also give users permission to read this dataset. For more information, see Make the report available to users.

Note: The dataset should be in the same region as the Cloud Storage bucket containing the metadata and log files extracted from the source database. However, if the Cloud Storage bucket is located in a multi-region, the dataset must be in one of the regions inside this multi-region.
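For example, here's a minimal sketch of creating an empty dataset with the bq command-line tool, assuming a Cloud Storage bucket in the US multi-region; the project and dataset names are placeholders:

bq mk --location=US --dataset PROJECT_ID:assessment_results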

Run the migration assessment

Console

  1. In the Google Cloud console, go to the BigQuery page.

    Go to BigQuery

  2. In the navigation menu under Migration, click Services.

  3. Click Start Assessment.

  4. Fill in the assessment configuration dialog.

    1. For Display name, enter a name, which can contain letters, numbers, or underscores. This name is only for display purposes and does not have to be unique.
    2. For Assessment data source, choose your data warehouse.
    3. For Path to input files, enter the path to the Cloud Storage bucket that contains your extracted files.
    4. To choose how your assessment results are stored, do one of the following:

      • Keep the Automatically create the new BigQuery dataset checkbox selected to have the BigQuery dataset created automatically. The name of the dataset is generated automatically.
      • Clear the Automatically create the new BigQuery dataset checkbox and either choose an existing empty BigQuery dataset using the format projectId.datasetId, or create a new dataset name. With this option, you can choose the BigQuery dataset name.
      Important: The Cloud Storage bucket location and the BigQuery dataset location must be in the same multi-region, or the BigQuery dataset must be in a location inside this multi-region. For more information on location constraints, see Location considerations.

    Option 1 - automatic BigQuery dataset generation (default): Assessment configuration dialog.

    Option 2 - manual BigQuery dataset creation: Assessment configuration dialog with manual dataset creation.

  5. Click Create. You can see the status of the job in the assessment jobs list.

    While the assessment is running, you can check its progress and estimated time to complete in the tooltip of the status icon.

    Assessment progress in the tooltip.

  6. While the assessment is running, you can click the View report link in the assessment jobs list to view the assessment report with partial data in Looker Studio. The View report link might take some time to appear while the assessment is running. The report opens in a new tab.

    The report is updated with new data as it is processed. Refresh the tab with the report or click View report again to see the updated report.

  7. After the assessment is complete, click View report to view the complete assessment report in Looker Studio. The report opens in a new tab.

API

Call the create method with a defined workflow.

Then call the start method to start the assessment workflow.

The assessment creates tables in the BigQuery dataset you created earlier. You can query these for information about the tables and queries used in your existing data warehouse. For information about the output files of the translation, see Batch SQL translator.

Shareable aggregated assessment result

Preview

This feature is subject to the "Pre-GA Offerings Terms" in the General Service Terms section of the Service Specific Terms. Pre-GA features are available "as is" and might have limited support. For more information, see the launch stage descriptions.

For Amazon Redshift, Teradata, and Snowflake assessments, in addition to the previously created BigQuery dataset, the workflow creates another lightweight dataset with the same name, plus the _shareableRedactedAggregate suffix. This dataset contains highly aggregated data that is derived from the output dataset, and contains no personally identifiable information (PII).

To find, inspect, and securely share the dataset with other users, see Query the migration assessment output tables.

The feature is on by default, but you can opt out using the public API.

Assessment details

To view the Assessment details page, click the display name in the assessment jobs list.

Assessment list page.

The assessment details page contains the Configuration tab, where you can view more information about an assessment job, and the Errors tab, where you can review any errors that happened during the assessment processing.

View the Configuration tab to see the properties of the assessment.

Assessment details page - configuration tab.

View the Errors tab to see the errors that happened during assessment processing.

Assessment details page - errors tab.

Review and share the Looker Studio report

After the assessment task completes, you can create and share a Looker Studio report of the results.

Review the report

Click the View report link listed next to your individual assessment task. The Looker Studio report opens in a new tab, in preview mode. You can use preview mode to review the content of the report before sharing it further.

The report looks similar to the following screenshot:

Assessment report.

To see which views are contained in the report, select your data warehouse:

Teradata

The report is a three-part narrative that's prefaced by a summary highlights page. That page includes the following sections:

  • Existing system. This section is a snapshot of the existing Teradata system and usage, including the number of databases, schemas, tables, and total size in TB. It also lists the schemas by size and points to potential sub-optimal resource utilization (tables with no writes or few reads).
  • BigQuery steady state transformations (suggestions). This section shows what the system will look like on BigQuery after migration. It includes suggestions for optimizing workloads on BigQuery (and avoiding wastage).
  • Migration plan. This section provides information about the migration effort itself—for example, getting from the existing system to the BigQuery steady state. This section includes the count of queries that were automatically translated and the expected time to move each table into BigQuery.

The details of each section include the following:

Existing system

  • Compute & Queries
    • CPU utilization:
      • Heatmap of hourly average CPU utilization (overall system resource utilization view)
      • Queries by hour and day with CPU utilization
      • Queries by type (read/write) with CPU utilization
      • Applications with CPU utilization
      • Overlay of the hourly CPU utilization with average hourly query performance and average hourly application performance
    • Queries histogram by type and query durations
    • Applications details view (app, user, unique queries, reporting versus ETL breakdown)
  • Storage Overview
    • Databases by volume, views, and access rates
    • Tables with access rates by users, queries, writes, and temporary table creations
  • Applications: Access rates and IP addresses

BigQuery steady state transformations (suggestions)

  • Join indexes converted to materialized views
  • Clustering and partitioning candidates based on metadata and usage
  • Low latency queries identified as candidates for BigQuery BI Engine
  • Columns configured with default values that use the column description feature to store default values
  • Unique indexes in Teradata (to prevent rows with non-unique keys in a table) use staging tables and a MERGE statement to insert only unique records into the target tables and then discard duplicates
  • Remaining queries and schema translated as-is

Migration plan

  • Detailed view with automatically translated queries
    • Count of total queries with ability to filter by user, application, affected tables, queried tables, and query type
    • Buckets of queries with similar patterns grouped and shown together so that the user is able to see the translation philosophy by query types
  • Queries requiring human intervention
    • Queries with BigQuery lexical structure violations
    • User-defined functions and procedures
    • BigQuery reserved keywords
  • Table schedules by writes and reads (to group them for moving)
  • Data migration with the BigQuery Data Transfer Service: Estimated time to migrate by table

The Existing System section contains the following views:

System Overview
The System Overview view provides the high-level volume metrics of the key components in the existing system for a specified time period. The timeline that is evaluated depends on the logs that were analyzed by the BigQuery migration assessment. This view gives you quick insight into the source data warehouse utilization, which you can use for migration planning.
Table Volume
The Table Volume view provides statistics on the largest tables and databases found by the BigQuery migration assessment. Because large tables may take longer to extract from the source data warehouse system, this view can be helpful in migration planning and sequencing.
Table Usage
The Table Usage view provides statistics on which tables are heavily used within the source data warehouse system. Heavily used tables can help you to understand which tables might have many dependencies and require additional planning during the migration process.
Applications
The Applications Usage view and the Applications Patterns view provide statistics on applications found during processing of logs. These views let users understand usage of specific applications over time and the impact on resource usage. During a migration, it's important to visualize the ingestion and consumption of data to gain a better understanding of the dependencies of the data warehouse, and to analyze the impact of moving various dependent applications together. The IP Address table can be useful for pinpointing the exact application using the data warehouse over JDBC connections.
Queries
The Queries view gives a breakdown of the types of SQL statements executed and statistics of their usage. You can use the histogram of Query Type and Time to identify low periods of system utilization and optimal times of day to transfer data. You can also use this view to identify frequently executed queries and the users invoking those executions.
Databases
The Databases view provides metrics on the size, tables, views, and procedures defined in the source data warehouse system. This view can give insight into the volume of objects that you need to migrate.
Database Coupling
The Database Coupling view provides a high-level view of databases and tables that are accessed together in a single query. This view can show which tables and databases are referenced often, which you can use for migration planning.

The BigQuery steady state section contains the following views:

Tables With No Usage
The Tables With No Usage view displays tables in which the BigQuery migration assessment couldn't find any usage during the logs period that was analyzed. A lack of usage might indicate that you don't need to transfer that table to BigQuery during migration or that the costs of storing data in BigQuery could be lower. You should validate the list of unused tables because they could have usage outside of the logs period, such as a table that is only used once every three or six months.
Tables With No Writes
The Tables With No Writes view displays tables in which the BigQuery migration assessment couldn't find any updates during the logs period that was analyzed. A lack of writes can indicate where you might lower your storage costs in BigQuery.
Low-Latency Queries
The Low-Latency Queries view displays a distribution of query runtimes based on the log data analyzed. If the query duration distribution chart displays a large number of queries with < 1 second in runtime, consider enabling BigQuery BI Engine to accelerate BI and other low-latency workloads.
Materialized Views
The Materialized Views view provides further optimization suggestions to boost performance on BigQuery.
Clustering and Partitioning

The Partitioning and Clustering view displays tables that would benefit from partitioning, clustering, or both.

The Metadata suggestions are achieved by analyzing the source data warehouse schema (like Partitioning and Primary Key in the source table) and finding the closest BigQuery equivalent to achieve similar optimization characteristics.

The Workload suggestions are achieved by analyzing the source query logs. The recommendation is determined by analyzing the workloads, especially WHERE or JOIN clauses in the analyzed query logs.
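As an illustration, the following minimal sketch shows what a table that adopts both kinds of recommendations could look like in BigQuery DDL; the table and column names are hypothetical and not taken from your assessment output:

CREATE TABLE `your_project.your_dataset.sales` (
  sale_date DATE,
  customer_id INT64,
  amount NUMERIC
)
-- Partition by the date column that most queries filter on.
PARTITION BY sale_date
-- Cluster by a column that frequently appears in WHERE or JOIN clauses.
CLUSTER BY customer_id;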

Clustering Recommendation

The Partitioning view displays tables which might have greater than 10,000 partitions, based on their partitioning constraint definition. These tables tend to be good candidates for BigQuery clustering, which enables fine-grained table partitions.

Unique Constraints

The Unique Constraints view displays both SET tables and unique indexes defined within the source data warehouse. In BigQuery, it's recommended to use staging tables and a MERGE statement to insert only unique records into a target table. Use the contents of this view to help determine which tables you might need to adjust ETL for during the migration.
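As a hedged sketch of that pattern, the following MERGE statement assumes a hypothetical staging table and target table that share the same schema and an id key column, with a load_time column used to pick the latest record; adjust these names to match your own ETL:

MERGE `your_project.your_dataset.target_table` AS target
USING (
  -- Keep only the latest row per id from the staging table.
  SELECT * EXCEPT (row_num)
  FROM (
    SELECT
      *,
      ROW_NUMBER() OVER (PARTITION BY id ORDER BY load_time DESC) AS row_num
    FROM `your_project.your_dataset.staging_table`
  )
  WHERE row_num = 1
) AS source
ON target.id = source.id
-- Insert only records that don't already exist in the target table.
WHEN NOT MATCHED THEN
  INSERT ROW;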

Default Values / Check Constraints

This view shows tables that use check constraints to set default column values. To achieve similar behavior in BigQuery, see Specify default column values.
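For reference, here is a minimal sketch of a table definition that replaces such check constraints with BigQuery default column values; the table, columns, and default expressions are hypothetical:

CREATE TABLE `your_project.your_dataset.orders` (
  order_id INT64,
  -- Applied when an INSERT statement doesn't supply this column.
  order_status STRING DEFAULT 'PENDING',
  created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP()
);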

The Migration path section of the report contains the following views:

SQL Translation
The SQL Translation view lists the count and details of queries that were automatically converted by BigQuery migration assessment and don't need manual intervention. Automated SQL Translation typically achieves high translation rates if metadata is provided. This view is interactive and allows analysis of common queries and how these are translated.
Offline Effort
The Offline Effort view captures the areas that need manual intervention, including specific UDFs and potential lexical structure and syntax violations for tables or columns.
BigQuery Reserved Keywords
The BigQuery Reserved Keywords view displays detected usage of keywords that have special meaning in the GoogleSQL language, and cannot be used as identifiers unless enclosed by backtick (`) characters (see the example after this list of views).
Table Updates Schedule
The Table Updates Schedule view shows when and how frequently tables are updated to help you plan how and when to move them.
Data Migration to BigQuery
The Data Migration to BigQuery view outlines the migration path with the expected time to migrate your data using the BigQuery Data Transfer Service. For more information, see the BigQuery Data Transfer Service for Teradata guide.
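The following minimal sketch illustrates the reserved keyword case mentioned in this list of views: a hypothetical column named select collides with a GoogleSQL reserved keyword, so it must be quoted with backticks when it's referenced:

SELECT
  -- The column name matches a reserved keyword, so it needs backticks.
  `select` AS selected_value
FROM
  `your_project.your_dataset.your_table`;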

The Appendix section contains the following views:

Case Sensitivity
The Case Sensitivity view shows tables in the source data warehouse that are configured to perform case-insensitive comparisons. By default, string comparisons in BigQuery are case-sensitive. For more information, see Collation.

Redshift

Migration Highlights
The Migration Highlights view provides an executive summary of the three sections of the report:
  1. The Existing System panel provides information on the number of databases, schemas, tables, and the total size of the existing Redshift system. It also lists the schemas by size and potential sub-optimal resource utilization. You can use this information to optimize your data by removing, partitioning, or clustering your tables.
  2. The BigQuery Steady State panel provides information on what your data will look like post-migration on BigQuery, including the number of queries that can be automatically translated using BigQuery Migration Service. This section also shows the costs of storing your data in BigQuery based on your annual data ingestion rate, along with optimization suggestions for tables, provisioning, and space.
  3. The Migration Path panel provides information on the migration effort itself. For each table, it shows the expected time to migrate, the number of rows in the table, and its size.

The Existing System section contains the following views:

Queries by Type and Schedule
The Queries by Type and Schedule view categorizes your queries into ETL/Write and Reporting/Aggregation. Seeing your query mix over time helps you understand your existing usage patterns, and identify burstiness and potential over-provisioning that can impact cost and performance.
Query Queuing
The Query Queuing view provides additional details on system load including query volume, mix, and any performance impacts due to queuing, such as insufficient resources.
Queries and WLM Scaling
The Queries and WLM Scaling view identifies concurrency scaling as an added cost and configuration complexity. It shows how your Redshift system routes queries based on the rules you specified, and performance impacts due to queuing, concurrency scaling, and evicted queries.
Queuing and Waiting
The Queuing and Waiting view is a deeper look into queue and wait times for queries over time.
WLM Classes and Performance
The WLM Classes and Performance view provides an optional way to map your rules to BigQuery. However, we recommend you let BigQuery automatically route your queries.
Query & Table volume insights
The Query & Table volume insights view lists queries by size, frequency, and top users. This helps you categorize the sources of load on the system and plan how to migrate your workloads.
Databases and Schemas
The Databases and Schemas view provides metrics on the size, tables, views, and procedures defined in the source data warehouse system. This provides insight into the volume of objects which need to be migrated.
Table Volume
The Table Volume view provides statistics on the largest tables and databases, showing how they are accessed. Because large tables may take longer to extract from the source data warehouse system, this view helps you with migration planning and sequencing.
Table Usage
The Table Usage view provides statistics on which tables are heavily used within the source data warehouse system. Heavily used tables can be leveraged to understand tables which might have many dependencies and warrant additional planning during the migration process.
Importers & Exporters
The Importers & Exporters view provides information on data and users involved in data import (using COPY queries) and data export (using UNLOAD queries). This view helps to identify the staging layer and processes related to ingestion and exports.
Cluster Utilization
The Cluster Utilization view provides general information about all available clusters and displays CPU utilization for each cluster. This view can help you understand system capacity reserve.

The BigQuery steady state section contains the following views:

Clustering & Partitioning

The Partitioning and Clustering view displays tables that would benefit from partitioning, clustering, or both.

The Metadata suggestions are achieved by analyzing the source data warehouse schema (like Sort Key and Dist Key in the source table) and finding the closest BigQuery equivalent to achieve similar optimization characteristics.

The Workload suggestions are achieved by analyzing the source query logs. The recommendation is determined by analyzing the workloads, especially WHERE or JOIN clauses in the analyzed query logs.

At the bottom of the page, there is a translated create table statement with all optimizations provided. All translated DDL statements can also be extracted from the dataset. Translated DDL statements are stored in the SchemaConversion table, in the CreateTableDDL column.

The recommendations in the report are provided only for tables larger than 1 GB because small tables won't benefit from clustering and partitioning. However, the DDL for all tables (including tables smaller than 1 GB) is available in the SchemaConversion table.
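For example, the following minimal sketch retrieves the translated DDL statements from the dataset; it assumes only the SchemaConversion table and CreateTableDDL column named above, with your_project and your_dataset as placeholders for your assessment output dataset:

SELECT
  CreateTableDDL
FROM
  `your_project.your_dataset.SchemaConversion`;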

Tables With No Usage

The Tables With No Usage view displays tables where the BigQuery migration assessment did not identify any usage during the analyzed logs period. A lack of usage might indicate that you don't need to transfer that table to BigQuery during migration or that the costs of storing data in BigQuery could be lower (billed as Long-term storage). We recommend validating the list of unused tables because they could have usage outside of the logs period, such as a table that is only used once every three or six months.

Tables With No Writes

The Tables With No Writes view displays tables where the BigQuery migration assessment did not identify any updates during the analyzed logs period. A lack of writes can indicate where you might lower your storage costs in BigQuery (billed as Long-term storage).

BigQuery BI Engine and Materialized Views

The BigQuery BI Engine and Materialized Views view provides further optimization suggestions to boost performance on BigQuery.

The Migration path section contains the following views:

SQL Translation
The SQL Translation view lists the count and details of queries that were automatically converted by BigQuery migration assessment and don't need manual intervention. Automated SQL Translation typically achieves high translation rates if metadata is provided.
SQL Translation Offline Effort
The SQL Translation Offline Effort view captures the areas that need manual intervention, including specific UDFs and queries with potential translation ambiguities.
Alter Table Append Support
The Alter Table Append Support view shows details about common Redshift SQL constructs that don't have a direct BigQuery counterpart.
Copy Command Support
The Copy Command Support view shows details about common Redshift SQL constructs that don't have a direct BigQuery counterpart.
SQL Warnings
The SQL Warnings view captures areas that are successfully translated, but require a review.
Lexical Structure & Syntax Violations
The Lexical Structure & Syntax Violations view displays names of columns, tables, functions, and procedures that violate BigQuery syntax.
BigQuery Reserved Keywords
The BigQuery Reserved Keywords view displays detected usage of keywords that have special meaning in the GoogleSQL language, and cannot be used as identifiers unless enclosed by backtick (`) characters.
Schema Coupling
The Schema Coupling view provides a high-level view on databases, schemas, and tables that are accessed together in a single query. This view can show what tables, schemas, and databases are referenced often and what you can use for migration planning.
Table Updates Schedule
The Table Updates Schedule view shows when and how frequently tables are updated to help you plan how and when to move them.
Table Scale
The Table Scale view lists your tables with the most columns.
Data Migration to BigQuery
The Data Migration to BigQuery view outlines the migration path with the expected time to migrate your data using the BigQuery Data Transfer Service. For more information, see the BigQuery Data Transfer Service for Redshift guide.
Assessment execution summary

The Assessment execution summary contains the report completeness, the progress of the ongoing assessment, and the status of processed files and errors.

Report completeness represents the percentage of successfully processed data that is recommended to display meaningful insights in the assessment report. If the data for a particular section of the report is missing, this information is listed in the Assessment Modules table under the Report Completeness indicator.

The progress metric indicates the percentage of the data processed so far, along with the estimate of the remaining time to process all of the data. After the processing is complete, the progress metric is not displayed.


Redshift Serverless

Preview: BigQuery Migration Assessment for Amazon Redshift Serverless is in Preview. To use this feature, you must be added to the allowlist. To request access, fill out the application form or send an email to bq-edw-migration-support@google.com.
Migration Highlights
This report page shows the summary of existing Amazon Redshift Serverless databases, including the size and number of tables. Additionally, it provides a high-level estimate of the Annual Contract Value (ACV), which is the cost of compute and storage in BigQuery. The Migration Highlights view provides an executive summary of the three sections of the report.

The Existing System section has the following views:

Databases and Schemas
Provides a breakdown of total storage size in GB for each database, schema, or table.
External Databases and Schemas
Provides a breakdown of total storage size in GB for each external database, schema, or table.
System Utilization
Provides general information about the historical system utilization. This view displays the historical usage of RPU (Amazon Redshift Processing Units) and daily storage consumption. This view can help you understand the system capacity reserve.

The BigQuery Steady State section provides information on what your data will look like post-migration on BigQuery, including the number of queries that can be automatically translated using BigQuery Migration Service. This section also shows the costs of storing your data in BigQuery based on your annual data ingestion rate, along with optimization suggestions for tables, provisioning, and space. The Steady State section has the following views:

Amazon Redshift Serverless versus BigQuery pricing
Provides a comparison of Amazon Redshift Serverless and BigQuery pricing models to help you understand the benefits and potential cost savings after you migrate to BigQuery.
BigQuery Compute Cost (TCO)
Lets you estimate the cost of compute in BigQuery. There are four manual inputs in the calculator: BigQuery Edition, Region, Commitment period, and Baseline. By default, the calculator provides optimal, cost-effective baseline commitments that you can manually override.
Total Cost of Ownership
Lets you estimate the Annual Contract Value (ACV), which is the cost of compute and storage in BigQuery. The calculator also lets you calculate the storage cost, which varies for active storage and long-term storage, depending on the table modifications during the analyzed time period. For more information, see Storage pricing.

The Appendix section contains this view:

Assessment Execution Summary
Provides the assessment execution details including the list of processed files, errors, and report completeness. You can use this page to investigate missing data in the report and to better understand the report's completeness.

Snowflake

Preview

This feature is subject to the "Pre-GA Offerings Terms" in the General Service Terms section of the Service Specific Terms. Pre-GA features are available "as is" and might have limited support. For more information, see the launch stage descriptions.

The report consists of different sections that can be used either separately or together. The following diagram organizes these sections into three common user goals to help you assess your migration needs:

Migration assessment report flowchart for Snowflake

Migration Highlights views

The Migration Highlights section contains the following views:

Snowflake versus BigQuery Pricing Models
Lists the pricing for different tiers and editions. Also includes an illustration of how BigQuery autoscaling can help save more cost compared to Snowflake.
Total Cost of Ownership
Interactive table, allowing the user to define: BigQuery Edition, commitment, baseline slot commitment, percentage of active storage, and percentage of data loaded or changed. Helps better estimate the cost for custom cases.
Automatic Translation Highlights
Aggregated translation ratio, grouped by either user or database, ordered ascending or descending. Also includes the most common error message for failed auto translation.

Existing System views

The Existing System section contains the following views:

System Overview
The System Overview view provides the high-level volume metrics of the key components in the existing system for a specified time period. The timeline that is evaluated depends on the logs that were analyzed by the BigQuery migration assessment. This view gives you quick insight into the source data warehouse utilization, which you can use for migration planning.
Virtual Warehouses Overview
Shows the Snowflake cost by warehouse, as well as the node-based rescaling over the period.
Table Volume
The Table Volume view provides statistics on the largest tables and databases found by the BigQuery migration assessment. Because large tables may take longer to extract from the source data warehouse system, this view can be helpful in migration planning and sequencing.
Table Usage
The Table Usage view provides statistics on which tables are heavily used within the source data warehouse system. Heavily used tables can help you to understand which tables might have many dependencies and require additional planning during the migration process.
Queries
The Queries view gives a breakdown of the types of SQL statements executed and statistics of their usage. You can use the histogram of Query Type and Time to identify low periods of system utilization and optimal times of day to transfer data. You can also use this view to identify frequently executed queries and the users invoking those executions.
Databases
The Databases view provides metrics on the size, tables, views, and procedures defined in the source data warehouse system. This view provides insight into the volume of objects that you need to migrate.

BigQuery steady state views

The BigQuery steady state section contains the following views:

Tables With No Usage
The Tables With No Usage view displays tables in which the BigQuery migration assessment couldn't find any usage during the logs period that was analyzed. This can indicate which tables might not need to be transferred to BigQuery during migration or that the costs of storing data in BigQuery could be lower. You must validate the list of unused tables since they could have usage outside of the logs period analyzed, such as a table which is only used once per quarter or half-year.
Tables With No Writes
The Tables With No Writes view displays tables in which the BigQuery migration assessment couldn't find any updates during the logs period that was analyzed. This can indicate that the costs of storing data in BigQuery could be lower.

Migration Plan views

The Migration Plan section of the report contains the following views:

SQL Translation
The SQL Translation view lists the count and details of queries that were automatically converted by BigQuery migration assessment and don't need manual intervention. Automated SQL Translation typically achieves high translation rates if metadata is provided. This view is interactive and allows analysis of common queries and how these are translated.
SQL Translation Offline Effort
The Offline Effort view captures the areas that need manual intervention, including specific UDFs and potential lexical structure and syntax violations for tables or columns.
SQL Warnings - To Review
The Warnings To Review view captures the areas that are mostly translated but require some human inspection.
BigQuery Reserved Keywords
The BigQuery Reserved Keywords view displays detected usage of keywords that have special meaning in the GoogleSQL language, and cannot be used as identifiers unless enclosed by backtick (`) characters.
Database and Table Coupling
The Database Coupling view provides a high-level view on databases and tables that are accessed together in a single query. This view can show what tables and databases are often referenced and what can be used for migration planning.
Table Updates Schedule
The Table Updates Schedule view shows when and how frequently tables are updated to help you plan how and when to move them.

Proof of Concept views

The PoC (proof of concept) section contains the following views:

PoC for demonstrating steady state BigQuery savings
Includes the most frequent queries, the queries reading the most data, the slowest queries, and the tables impacted by those queries.
PoC for demonstrating BigQuery migration plan
Showcases how BigQuery translates the most complex queries and the tables they impact.

Oracle

Preview

This feature is subject to the "Pre-GA Offerings Terms" in the General Service Terms section of the Service Specific Terms. Pre-GA features are available "as is" and might have limited support. For more information, see the launch stage descriptions.

To request feedback or support for this feature, send an email to bq-edw-migration-support@google.com.

Migration Highlights

The Migration Highlights section contains the following views:

  • Existing system: a snapshot of the existing Oracle system and usage, including the number of databases, schemas, tables, and total size in GB. It also provides the workload classification summary for each database to help you decide if BigQuery is the right migration target.
  • Compatibility: provides information about the migration effort itself. For each analyzed database, it shows the expected time to migrate and the number of database objects that can be migrated automatically with Google-provided tools.
  • BigQuery steady state: contains information on what your data will look like post-migration on BigQuery, including the costs of storing your data in BigQuery based on your annual data ingestion rate and the compute cost estimation. In addition, it provides insights into any underutilized tables.

Existing System

The Existing System section contains the following views:

  • Workloads Characteristic: describes the workload type for each database based on the analyzed performance metrics. Each database is classified as OLAP, Mixed, or OLTP. This information can help you to make a decision on which databases can be migrated to BigQuery.
  • Databases and Schemas: provides a breakdown of total storage size in GB for each database, schema, or table. In addition, you can use this view to identify materialized views and external tables.
  • Database Features and Links: shows the list of Oracle features used in your database, together with the BigQuery equivalent features or services that can be used after the migration. In addition, you can explore the Database Links to better understand connections between the databases.
  • Database Connections: provides insight into the database sessions started by the user or application. Analyzing this data can help you identify external applications that may require additional effort during the migration.
  • Query Types: provides a breakdown of the types of SQL statements executed and statistics of their usage. You can use the hourly histogram of Query Executions or Query CPU Time to identify low periods of system utilization and optimal times of day to transfer data.
  • PL/SQL Source Code: provides insight into the PL/SQL objects, like functions or procedures, and their size for each database and schema. In addition, the hourly executions histogram can be used to identify peak hours with the most PL/SQL executions.
  • System Utilization: provides general information about the historical system utilization. This view displays the hourly usage of CPU and daily storage consumption. This view can help to understand the system capacity reserve.

BigQuery Steady State

The BigQuery Steady State section contains the following views:

  • Exadata versus BigQuery pricing: provides the general comparison of Exadata and BigQuery pricing models to help you understand the benefits and potential cost savings after the migration to BigQuery.
  • BigQuery Database Read/Writes: provides insights about the database's physical disk operations. Analyzing this data can help you find the best time to perform data migration from Oracle to BigQuery.
  • BigQuery Compute Cost: lets you estimate the cost of compute in BigQuery. There are four manual inputs in the calculator: BigQuery Edition, Region, Commitment period, and Baseline. By default, the calculator provides an optimal, cost-effective baseline commitment that you can manually override. The Annual Autoscaling Slot Hours value provides the number of slot hours used outside of the commitment. This value is calculated using system utilization. A visual explanation of the relationships between the baseline, autoscaling, and utilization is provided at the end of the page. Each estimation shows the probable number and an estimation range.
  • Total Cost of Ownership (TCO): lets you estimate the Annual Contract Value (ACV), which is the cost of compute and storage in BigQuery. The calculator also lets you calculate the storage cost, which varies for active storage and long-term storage, depending on the table modifications during the analyzed time period. For more information about storage pricing, see Storage pricing.
  • Underutilized Tables: provides information about unused and read-only tables based on the usage metrics from the analyzed time period. A lack of usage might indicate that you don't need to transfer the table to BigQuery during a migration or that the costs of storing data in BigQuery could be lower (billed as long-term storage). We recommend that you validate the list of unused tables in case they have usage outside of the analyzed time period.

Migration Hints

The Migration Hints section contains the following views:

  • Database Objects Compatibility: provides an overview of database objects compatibility with BigQuery, including the number of objects that can be automatically migrated with Google-provided tools or require manual action. This information is shown for each database, schema, and database object type.
  • Database Objects Migration Effort: shows the estimate of migration effort in hours for each database, schema, or database object type. In addition, it shows the percentage of small, medium, and large objects based on the migration effort.
  • Database Schema Migration Effort: provides the list of all detected database object types, their number, compatibility with BigQuery, and the estimated migration effort in hours.
  • Database Schema Migration Effort Detailed: provides a deeper insight into the database schema migration effort, including the information for each individual object.

Proof of Concept views

The Proof of Concept views section contains the following views:

  • Proof of concept migration: shows the suggested list of databases with the lowest migration effort that are good candidates for initial migration. In addition, it shows the top queries that can help to demonstrate the time and cost savings, and value of BigQuery using a proof of concept.

Appendix

The Appendix section contains the following views:

  • Assessment Execution Summary: provides the assessment execution details including the list of processed files, errors, and report completeness. You can use this page to investigate missing data in the report and better understand the overall report completeness.

Apache Hive

The report consists of a three-part narrative prefaced by a summary highlights page that includes the following sections:

  • Existing System - Apache Hive. This section consists of a snapshot of the existing Apache Hive system and usage including the number of databases, tables, their total size in GB, and the number of query logs processed. This section also lists the databases by size and points to potential sub-optimal resource utilization (tables with no writes or few reads) and provisioning. The details of this section include the following:

    • Compute and queries
      • CPU utilization:
        • Queries by hour and day with CPU utilization
        • Queries by type (read/write)
        • Queues and applications
        • Overlay of the hourly CPU utilization with average hourly query performance and average hourly application performance
      • Queries histogram by type and query durations
      • Queueing and waiting page
      • Queues detailed view (queue, user, unique queries, reporting versus ETL breakdown, by metrics)
    • Storage overview
      • Databases by volume, views, and access rates
      • Tables with access rates by users, queries, writes, and temporary table creations
    • Queues and applications: Access rates and client IP addresses
  • BigQuery Steady State. This section shows what the system will look like on BigQuery after migration. It includes suggestions for optimizing workloads on BigQuery (and avoiding wastage). The details of this section include the following:

    • Tables identified as candidates for materialized views.
    • Clustering and partitioning candidates based on metadata and usage.
    • Low-latency queries identified as candidates for BigQuery BI Engine.
    • Tables without read or write usage.
    • Partitioned tables with data skew.
  • Migration Plan. This section provides information about the migration effort itself, for example, how to get from the existing system to the BigQuery steady state. This section contains identified storage targets for each table, tables identified as significant for migration, and the count of queries that were automatically translated. The details of this section include the following:

    • Detailed view with automatically translated queries
      • Count of total queries with ability to filter by user, application, affected tables, queried tables, and query type.
      • Query buckets with similar patterns grouped together, enabling users to see the translation philosophy by query types.
    • Queries requiring human intervention
      • Queries with BigQuery lexical structure violations
      • User-defined functions and procedures
      • BigQuery reserved keywords
    • Queries requiring review
    • Table update schedules by writes and reads (to group them for moving)
    • Identified storage target for external and managed tables

The Existing System - Hive section contains the following views:

System Overview
This view provides the high-level volume metrics of the key components in the existing system for a specified time period. The timeline that is evaluated depends on the logs that were analyzed by the BigQuery migration assessment. This view gives you quick insight into the source data warehouse utilization, which you can use for migration planning.
Table Volume
This view provides statistics on the largest tables and databases found by the BigQuery migration assessment. Because large tables may take longer to extract from the source data warehouse system, this view can be helpful in migration planning and sequencing.
Table Usage
This view provides statistics on which tables are heavily used within the source data warehouse system. Heavily used tables can help you to understand which tables might have many dependencies and require additional planning during the migration process.
Queues Utilization
This view provides statistics on YARN queues usage found during processing of logs. These views let users understand usage of specific queues and applications over time and the impact on resource usage. These views also help identify and prioritize workloads for migration. During a migration, it's important to visualize the ingestion and consumption of data to gain a better understanding of the dependencies of the data warehouse, and to analyze the impact of moving various dependent applications together. The IP address table can be useful for pinpointing the exact application using the data warehouse over JDBC connections.
Queues Metrics
This view provides a breakdown of the different metrics on YARN queues found during processing of logs. This view lets users understand patterns of usage in specific queues and the impact on migration. You can also use this view to identify connections between tables accessed in queries and queues where the query was executed.
Queuing and Waiting
This view provides insight into query queuing time in the source data warehouse. Queuing times indicate performance degradation due to underprovisioning, and additional provisioning requires increased hardware and maintenance costs.
Queries
This view gives a breakdown of the types of SQL statements executed and statistics of their usage. You can use the histogram of Query Type and Time to identify low periods of system utilization and optimal times of day to transfer data. You can also use this view to identify most-used Hive execution engines and frequently executed queries along with the user details.
Databases
This view provides metrics on the size, tables, views, and procedures defined in the source data warehouse system. This view can give insight into the volume of objects that you need to migrate.
Database & Table Coupling
This view provides a high-level view on databases and tables that are accessed together in a single query. This view can show which tables and databases are referenced often and what you can use for migration planning.

The BigQuery Steady State section contains the following views:

Tables With No Usage
The Tables With No Usage view displays tables in which the BigQuery migration assessment couldn't find any usage during the logs period that was analyzed. A lack of usage might indicate that you don't need to transfer that table to BigQuery during migration or that the costs of storing data in BigQuery could be lower. You must validate the list of unused tables because they could have usage outside of the logs period, such as a table that is only used once every three or six months.
Tables With No Writes
The Tables With No Writes view displays tables in which the BigQuery migration assessment couldn't find any updates during the logs period that was analyzed. A lack of writes can indicate where you might lower your storage costs in BigQuery.
Clustering and Partitioning Recommendations

This view displays tables that would benefit from partitioning, clustering, or both.

The Metadata suggestions are achieved by analyzing the source data warehouse schema (like Partitioning and Primary Key in the source table) and finding the closest BigQuery equivalent to achieve similar optimization characteristics.

The Workload suggestions are achieved by analyzing the source query logs. The recommendation is determined by analyzing the workloads, especially WHERE or JOIN clauses in the analyzed query logs.

Partitions converted to Clusters

This view displays tables that have more than 10,000 partitions, based on their partitioning constraint definition. These tables tend to be good candidates for BigQuery clustering, which enables fine-grained table partitions.

Skewed partitions

The Skewed Partitions view displays tables that, based on the metadata analysis, have data skew on one or several partitions. These tables are good candidates for a schema change, because queries on skewed partitions might not perform well.

BI Engine and Materialized Views

The Low-Latency Queries and Materialized Views view displays a distribution of query runtimes based on the log data analyzed, along with further optimization suggestions to boost performance on BigQuery. If the query duration distribution chart displays a large number of queries with a runtime of less than 1 second, consider enabling BI Engine to accelerate BI and other low-latency workloads.

The Migration Plan section of the report contains the following views:

SQL Translation
The SQL Translation view lists the count and details of queries that were automatically converted by BigQuery migration assessment and don't need manual intervention. Automated SQL Translation typically achieves high translation rates if metadata is provided. This view is interactive and allows analysis of common queries and how these are translated.
SQL Translation Offline Effort
The Offline Effort view captures the areas that need manual intervention, including specific UDFs and potential lexical structure and syntax violations for tables or columns.
SQL Warnings
The SQL Warnings view captures areas that are successfully translated, but require a review.
BigQuery Reserved Keywords
The BigQuery Reserved Keywords view displays detected usage of keywords that have special meaning in the GoogleSQL language. These keywords can't be used as identifiers unless enclosed by backtick (`) characters.
Table Updates Schedule
The Table Updates Schedule view shows when and how frequently tables are updated to help you plan how and when to move them.
BigLake External Tables
The BigLake External Tables view outlines tables that are identified as targets for migration to BigLake instead of BigQuery.

The Appendix section of the report contains the following views:

Detailed SQL Translation Offline Effort Analysis
The Detailed Offline Effort Analysis view provides additional insight into the SQL areas that need manual intervention.
Detailed SQL Warnings Analysis
The Detailed Warnings Analysis view provides additional insight into the SQL areas that are successfully translated, but require a review.

Share the report

The Looker Studio report is a frontend dashboard for the migration assessment. It relies on the underlying dataset access permissions. To share the report, the recipient must have access to both the Looker Studio report itself and the BigQuery dataset that contains the assessment results.

When you open the report from the Google Cloud console, you are viewing the report in preview mode. To create and share the report with other users, perform the following steps:

  1. Click Edit and share. Looker Studio prompts you to attach newly created Looker Studio connectors to the new report.
  2. Click Add to report. The report receives an individual report ID, which you can use to access the report.
  3. To share the Looker Studio report with other users, follow the steps given in Share reports with viewers and editors.
  4. Grant the users permission to view the BigQuery dataset that was used to run the assessment task. For more information, see Granting access to a dataset.

Query the migration assessment output tables

Although the Looker Studio reports are the most convenient way to view the assessment results, you can also view and query the underlying data in the BigQuery dataset.

Example query

The following example gets the total number of unique queries, the number of queries that failed translation, and the percentage of unique queries that failed translation.

SELECT
  QueryCount.v AS QueryCount,
  ErrorCount.v AS ErrorCount,
  (ErrorCount.v * 100) / QueryCount.v AS FailurePercentage
FROM
  (SELECT COUNT(*) AS v FROM `your_project.your_dataset.TranslationErrors` WHERE Severity = "ERROR") AS ErrorCount,
  (SELECT COUNT(DISTINCT QueryHash) AS v FROM `your_project.your_dataset.Queries`) AS QueryCount;
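As a follow-up, the following minimal sketch uses the same TranslationErrors table to count translation issues grouped by severity, which can help you gauge how much of the translation output consists of warnings versus errors:

SELECT
  Severity,
  COUNT(*) AS IssueCount
FROM
  `your_project.your_dataset.TranslationErrors`
GROUP BY
  Severity
ORDER BY
  IssueCount DESC;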

Share your dataset with users in other projects

After inspecting the dataset, if you would like to share it with a user that is not in your project, you can do so by using the publisher workflow of BigQuery sharing (formerly Analytics Hub).

Note: There is no additional cost for managing data exchanges or listings in sharing.
  1. In the Google Cloud console, go to the BigQuery page.

    Go to BigQuery

  2. Click the dataset to view its details.

  3. Click Sharing > Publish as listing.

  4. In the dialog that opens, create a listing as prompted.

    If you already have a data exchange, skip step 5.

  5. Create a data exchange and set permissions. To allow a user to view your listings in this exchange, add them to the Subscribers list.

  6. Enter the listing details.

    Display name is the name of this listing and is required; other fields are optional.

  7. Click Publish.

    A private listing is created.

  8. For your listing, select More actions under Actions.

  9. Click Copy share link.

    You can share the link with users that have subscription access to your exchange or listing.

Troubleshooting

This section explains some common issues and troubleshooting techniques for migrating your data warehouse to BigQuery.

dwh-migration-dumper tool errors

To troubleshoot errors and warnings in the dwh-migration-dumper tool terminal output that occurred during metadata or query logs extraction, see generate metadata troubleshooting.

Hive migration errors

This section describes common issues that you might run into when you plan to migrate your data warehouse from Hive to BigQuery.

The logging hook writes debug log messages in your hive-server2 logs. If you run into any issues, review the logging hook debug logs, which contain the MigrationAssessmentLoggingHook string.

Handle the ClassNotFoundException error

This error might be caused by a misplaced logging hook JAR file. Ensure that you added the JAR file to the auxlib folder on the Hive cluster. Alternatively, you can specify the full path to the JAR file in the hive.aux.jars.path property, for example, file:///HiveMigrationAssessmentQueryLogsHooks_deploy.jar.

Subfolders don't appear in the configured folder

This issue might be caused by a misconfiguration or by problems during logging hook initialization.

Search your hive-server2 debug logs for the following logging hook messages:

Unable to initialize logger, logging disabled
Log dir configuration key 'dwhassessment.hook.base-directory' is not set, logging disabled.
Error while trying to set permission

Review the issue details and see if there is anything that you need to correct to fix the problem.

Files don't appear in the folder

This issue might be caused by problems encountered during event processing or while writing to a file.

Search in your hive-server2 debug logs for the following logging hook messages:

Failed to close writer for file
Got exception while processing event
Error writing record for query

Review the issue details and see if there is anything that you need to correct to fix the problem.

Some query events are missed

This issue might be caused by the logging hook thread queue overflow.

Search in your hive-server2 debug logs for the following logging hook message:

Writer queue is full. Ignoring event

If there are such messages, consider increasing the dwhassessment.hook.queue.capacity parameter.

What's next

For more information about the dwh-migration-dumper tool, see dwh-migration-tools.

You can also learn more about the following steps in data warehouse migration:
