Apache Hive SQL translation guide

This document details the similarities and differences in SQL syntax between Apache Hive and BigQuery to help you plan your migration. To migrate your SQL scripts in bulk, use batch SQL translation. To translate ad hoc queries, use interactive SQL translation.

In some cases, there's no direct mapping between a SQL element in Hive and BigQuery. However, in most cases, BigQuery offers an alternative element to Hive to help you achieve the same functionality, as shown in the examples in this document.

The intended audience for this document is enterprise architects, database administrators, application developers, and IT security specialists. It assumes that you're familiar with Hive.

Data types

Hive and BigQuery have different data type systems. In most cases, you can map data types in Hive to BigQuery data types, with a few exceptions, such as MAP and UNION. Hive supports more implicit type casting than BigQuery. As a result, the batch SQL translator inserts many explicit casts, as shown in the example after the following table.

Hive          BigQuery
TINYINT       INT64
SMALLINT      INT64
INT           INT64
BIGINT        INT64
DECIMAL       NUMERIC
FLOAT         FLOAT64
DOUBLE        FLOAT64
BOOLEAN       BOOL
STRING        STRING
VARCHAR       STRING
CHAR          STRING
BINARY        BYTES
DATE          DATE
-             DATETIME
-             TIME
TIMESTAMP     DATETIME/TIMESTAMP
INTERVAL      -
ARRAY         ARRAY
STRUCT        STRUCT
MAP           STRUCT with key values (REPEATED field)
UNION         STRUCT with different types
-             GEOGRAPHY
-             JSON
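For example, Hive implicitly casts a STRING value when it's compared to a number, but BigQuery requires an explicit cast, which the batch SQL translator adds. A minimal sketch, assuming a hypothetical STRING column named id_string:

-- Hive (implicit cast): SELECT * FROM t WHERE id_string = 100;
-- BigQuery equivalent with an explicit cast:
SELECT *
FROM (SELECT '100' AS id_string) AS t
WHERE CAST(id_string AS INT64) = 100;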

Query syntax

This section addresses differences in query syntax between Hive and BigQuery.

SELECT statement

Most Hive SELECT statements are compatible with BigQuery. The following table contains a list of minor differences:

Case: Subquery

Hive:
SELECT * FROM (
  SELECT 10 as col1, "test" as col2, "test" as col3
) tmp_table;

BigQuery:
SELECT * FROM (
  SELECT 10 as col1, "test" as col2, "test" as col3
);

Case: Column filtering

Hive:
SET hive.support.quoted.identifiers=none;
SELECT `(col2|col3)?+.+` FROM (
  SELECT 10 as col1, "test" as col2, "test" as col3
) tmp_table;

BigQuery:
SELECT * EXCEPT(col2, col3) FROM (
  SELECT 10 as col1, "test" as col2, "test" as col3
);

Case: Exploding an array

Hive:
SELECT tmp_table.pageid, adid FROM (
  SELECT 'test_value' pageid, Array(1,2,3) ad_id) tmp_table
LATERAL VIEW
explode(tmp_table.ad_id) adTable AS adid;

BigQuery:
SELECT tmp_table.pageid, ad_id FROM (
  SELECT 'test_value' pageid, [1,2,3] ad_id) tmp_table,
UNNEST(tmp_table.ad_id) ad_id;

FROM clause

The FROM clause in a query lists the table references from which data is selected. In Hive, possible table references include tables, views, and subqueries. BigQuery also supports all these table references.

You can reference BigQuery tables in the FROM clause by using the following (an example follows the list):

  • [project_id].[dataset_id].[table_name]
  • [dataset_id].[table_name]
  • [table_name]
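For example, the following references are equivalent when the default project and dataset are configured (a minimal sketch; myproject, mydataset, and mytable are placeholder names):

SELECT * FROM `myproject.mydataset.mytable`;  -- fully qualified
SELECT * FROM mydataset.mytable;              -- relies on the default project
SELECT * FROM mytable;                        -- relies on a default dataset, for example one set in the bq CLI or a session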

BigQuery also supports additional table references.

Comparison operators

The following table provides details about converting operators from Hive to BigQuery; an example follows the table:

Function or operator: - Unary minus, * Multiplication, / Division, + Addition, - Subtraction
Hive: All number types.
BigQuery: All number types. To prevent errors during the divide operation, consider using SAFE_DIVIDE or IEEE_DIVIDE.

Function or operator: ~ Bitwise not, | Bitwise OR, & Bitwise AND, ^ Bitwise XOR
Hive: Integer types.
BigQuery: Integer or BYTES types.

Function or operator: Left shift
Hive: shiftleft(TINYINT|SMALLINT|INT a, INT b), shiftleft(BIGINT a, INT b)
BigQuery: << with integer or BYTES operands. In A << B, B must be the same type as A.

Function or operator: Right shift
Hive: shiftright(TINYINT|SMALLINT|INT a, INT b), shiftright(BIGINT a, INT b)
BigQuery: >> with integer or BYTES operands. In A >> B, B must be the same type as A.

Function or operator: Modulus (remainder)
Hive: X % Y, all number types.
BigQuery: MOD(X, Y)

Function or operator: Integer division
Hive: A DIV B and A/B for detailed precision, all number types.
BigQuery: All number types. To prevent errors during the divide operation, consider using SAFE_DIVIDE or IEEE_DIVIDE.

Function or operator: Unary negation
Hive: !, NOT
BigQuery: NOT

Function or operator: Types supporting equality comparisons
Hive: All primitive types.
BigQuery: All comparable types and STRUCT.

Hive: a <=> b
BigQuery: Not supported. Translate to the following:
(a = b AND b IS NOT NULL OR a IS NULL)

Hive: a <> b
BigQuery: Not supported. Translate to the following:
NOT (a = b AND b IS NOT NULL OR a IS NULL)

Function or operator: Relational operators (=, ==, !=, <, >, >=)
Hive: All primitive types.
BigQuery: All comparable types.

Function or operator: String comparison
Hive: RLIKE, REGEXP
BigQuery: REGEXP_CONTAINS built-in function. Uses BigQuery regex syntax for string functions for the regular expression patterns.

Function or operator: [NOT] LIKE, [NOT] BETWEEN, IS [NOT] NULL
Hive: A [NOT] BETWEEN B AND C, A IS [NOT] (TRUE|FALSE), A [NOT] LIKE B
BigQuery: Same as Hive. In addition, BigQuery also supports the IN operator.
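For example, the following queries show SAFE_DIVIDE and the suggested translation of the Hive null-safe comparison a <=> b (a minimal sketch using literal values):

SELECT SAFE_DIVIDE(10, 0) AS quotient;  -- returns NULL instead of a division-by-zero error

-- Hive: a <=> b
SELECT (a = b AND b IS NOT NULL OR a IS NULL) AS null_safe_equal
FROM (SELECT CAST(NULL AS INT64) AS a, CAST(NULL AS INT64) AS b);  -- returns TRUE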

JOIN conditions

Both Hive and BigQuery support the following types of joins:

  • [INNER] JOIN

  • LEFT [OUTER] JOIN

  • RIGHT [OUTER] JOIN

  • FULL [OUTER] JOIN

  • CROSS JOIN and the equivalent implicit comma cross join

For more information, see Join operation and Hive joins.

Type conversion and casting

The following table provides details about converting functions from Hive to BigQuery; an example follows the table:

Function or operator: Type casting
Hive: When a cast fails, NULL is returned.
BigQuery: Same syntax as Hive. For more information about BigQuery type conversion rules, see Conversion rules. If a cast fails, you see an error. To get the same behavior as Hive, use SAFE_CAST instead.

Function or operator: SAFE function calls
BigQuery: If you prefix function calls with SAFE., the function returns NULL instead of reporting failure. For example, SAFE.SUBSTR('foo', 0, -2) AS safe_output; returns NULL.

Note: When casting safely without errors, use SAFE_CAST.
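For example, casting an invalid value raises an error in BigQuery, while SAFE_CAST reproduces the Hive behavior of returning NULL (a minimal sketch):

SELECT SAFE_CAST('apple' AS INT64) AS safe_result;  -- returns NULL
-- SELECT CAST('apple' AS INT64);                   -- raises an error in BigQuery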

Implicit conversion types

When migrating to BigQuery, you need to convert most of your Hive implicit conversions to BigQuery explicit conversions, except for the following data types, which BigQuery implicitly converts.

From BigQuery type    To BigQuery type
INT64                 FLOAT64, NUMERIC, BIGNUMERIC
BIGNUMERIC            FLOAT64
NUMERIC               BIGNUMERIC, FLOAT64

BigQuery also performs implicit conversions for the following literals (an example follows the table):

From BigQuery type                                    To BigQuery type
STRING literal (for example, "2008-12-25")            DATE
STRING literal (for example, "2008-12-25 15:30:00")   TIMESTAMP
STRING literal (for example, "2008-12-25T07:30:00")   DATETIME
STRING literal (for example, "15:30:00")              TIME
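For example, a STRING literal compared to a DATE value is implicitly converted, so the translator doesn't need to add a cast here (a minimal sketch):

SELECT d
FROM (SELECT DATE '2008-12-25' AS d)
WHERE d = "2008-12-25";  -- the STRING literal is coerced to DATE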

Explicit conversion types

If you want to convert Hive data types that BigQuery doesn't implicitly convert, use the BigQuery CAST(expression AS type) function.

Functions

This section covers common functions used in Hive and BigQuery.

Aggregate functions

The following table shows mappings between common Hive aggregate, statistical aggregate, and approximate aggregate functions and their BigQuery equivalents; an example follows the table:

Hive                                 BigQuery
count(DISTINCT expr[, expr...])      count(DISTINCT expr[, expr...])
percentile_approx(DOUBLE col, array(p1 [, p2]...) [, B]) WITHIN GROUP (ORDER BY expression)
                                     APPROX_QUANTILES(expression, 100)[OFFSET(CAST(TRUNC(percentile * 100) as INT64))]
                                     (BigQuery doesn't support the rest of the arguments that Hive defines.)
AVG                                  AVG
X | Y                                BIT_OR / X | Y
X ^ Y                                BIT_XOR / X ^ Y
X & Y                                BIT_AND / X & Y
COUNT                                COUNT
COLLECT_SET(col), COLLECT_LIST(col)  ARRAY_AGG(col)
MAX                                  MAX
MIN                                  MIN
REGR_AVGX                            AVG(IF(dep_var_expr is NULL OR ind_var_expr is NULL, NULL, ind_var_expr))
REGR_AVGY                            AVG(IF(dep_var_expr is NULL OR ind_var_expr is NULL, NULL, dep_var_expr))
REGR_COUNT                           SUM(IF(dep_var_expr is NULL OR ind_var_expr is NULL, NULL, 1))
REGR_INTERCEPT                       AVG(dep_var_expr) - AVG(ind_var_expr) * (COVAR_SAMP(ind_var_expr, dep_var_expr) / VARIANCE(ind_var_expr))
REGR_R2                              (COUNT(dep_var_expr) * SUM(ind_var_expr * dep_var_expr) - SUM(dep_var_expr) * SUM(ind_var_expr)) / SQRT((COUNT(ind_var_expr) * SUM(POWER(ind_var_expr, 2)) * POWER(SUM(ind_var_expr), 2)) * (COUNT(dep_var_expr) * SUM(POWER(dep_var_expr, 2)) * POWER(SUM(dep_var_expr), 2)))
REGR_SLOPE                           COVAR_SAMP(ind_var_expr, dep_var_expr) / VARIANCE(ind_var_expr)
REGR_SXX                             SUM(POWER(ind_var_expr, 2)) - COUNT(ind_var_expr) * POWER(AVG(ind_var_expr), 2)
REGR_SXY                             SUM(ind_var_expr * dep_var_expr) - COUNT(ind_var_expr) * AVG(ind_var_expr) * AVG(dep_var_expr)
REGR_SYY                             SUM(POWER(dep_var_expr, 2)) - COUNT(dep_var_expr) * POWER(AVG(dep_var_expr), 2)
ROLLUP                               ROLLUP
STDDEV_POP                           STDDEV_POP
STDDEV_SAMP                          STDDEV_SAMP, STDDEV
SUM                                  SUM
VAR_POP                              VAR_POP
VAR_SAMP                             VAR_SAMP, VARIANCE
CONCAT_WS                            STRING_AGG
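For example, the Hive expression percentile_approx(x, 0.5) maps to the following BigQuery query, where OFFSET(50) corresponds to CAST(TRUNC(0.5 * 100) AS INT64) (a minimal sketch over literal data):

SELECT APPROX_QUANTILES(x, 100)[OFFSET(50)] AS approx_median
FROM UNNEST([1, 2, 3, 4, 5, 6, 7, 8, 9, 10]) AS x;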

Analytical functions

The following table shows mappings between common Hive analytical functions and their BigQuery equivalents:

Hive                        BigQuery
AVG                         AVG
COUNT                       COUNT
COVAR_POP                   COVAR_POP
COVAR_SAMP                  COVAR_SAMP
CUME_DIST                   CUME_DIST
DENSE_RANK                  DENSE_RANK
FIRST_VALUE                 FIRST_VALUE
LAST_VALUE                  LAST_VALUE
LAG                         LAG
LEAD                        LEAD
COLLECT_LIST, COLLECT_SET   ARRAY_AGG, ARRAY_CONCAT_AGG
MAX                         MAX
MIN                         MIN
NTILE                       NTILE(constant_integer_expression)
PERCENT_RANK                PERCENT_RANK
RANK ()                     RANK
ROW_NUMBER                  ROW_NUMBER
STDDEV_POP                  STDDEV_POP
STDDEV_SAMP                 STDDEV_SAMP, STDDEV
SUM                         SUM
VAR_POP                     VAR_POP
VAR_SAMP                    VAR_SAMP, VARIANCE
VARIANCE                    VARIANCE ()
WIDTH_BUCKET                A user-defined function (UDF) can be used.

Date and time functions

The following table shows mappings between common Hive date and time functions and their BigQuery equivalents; an example follows the table:

Hive                BigQuery
DATE_ADD            DATE_ADD(date_expression, INTERVAL int64_expression date_part)
DATE_SUB            DATE_SUB(date_expression, INTERVAL int64_expression date_part)
CURRENT_DATE        CURRENT_DATE
CURRENT_TIME        CURRENT_TIME
CURRENT_TIMESTAMP   CURRENT_DATETIME is recommended, as this value is timezone-free and synonymous with CURRENT_TIMESTAMP in Hive.
EXTRACT(field FROM source)   EXTRACT(part FROM datetime_expression)
LAST_DAY            DATE_SUB(DATE_TRUNC(DATE_ADD(date_expression, INTERVAL 1 MONTH), MONTH), INTERVAL 1 DAY)
MONTHS_BETWEEN      DATE_DIFF(date_expression, date_expression, MONTH)
NEXT_DAY            DATE_ADD(DATE_TRUNC(date_expression, WEEK(day_value)), INTERVAL 1 WEEK)
TO_DATE             PARSE_DATE
FROM_UNIXTIME       UNIX_SECONDS
FROM_UNIXTIMESTAMP  FORMAT_TIMESTAMP
YEAR / QUARTER / MONTH / HOUR / MINUTE / SECOND / WEEKOFYEAR   EXTRACT
DATEDIFF            DATE_DIFF
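For example, the Hive calls months_between('2024-06-15', '2024-01-15') and year('2024-06-15') translate as follows (a minimal sketch; note that Hive returns a fractional value for months_between, while DATE_DIFF returns an integer):

SELECT
  DATE_DIFF(DATE '2024-06-15', DATE '2024-01-15', MONTH) AS months_between_result,  -- 5
  EXTRACT(YEAR FROM DATE '2024-06-15') AS year_result;                              -- 2024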

BigQuery also offers additional date and time functions.

String functions

The following table shows mappings between Hive string functions and their BigQuery equivalents:

Hive                 BigQuery
ASCII                TO_CODE_POINTS(string_expr)[OFFSET(0)]
HEX                  TO_HEX
LENGTH               CHAR_LENGTH
LENGTH               CHARACTER_LENGTH
CHR                  CODE_POINTS_TO_STRING
CONCAT               CONCAT
LOWER                LOWER
LPAD                 LPAD
LTRIM                LTRIM
REGEXP_EXTRACT       REGEXP_EXTRACT
REGEXP_REPLACE       REGEXP_REPLACE
REPLACE              REPLACE
REVERSE              REVERSE
RPAD                 RPAD
RTRIM                RTRIM
SOUNDEX              SOUNDEX
SPLIT                SPLIT(instring, delimiter)[ORDINAL(tokennum)]
SUBSTR, SUBSTRING    SUBSTR
TRANSLATE            TRANSLATE
TRIM                 TRIM
UPPER                UPPER

BigQuery also offers additional string functions.

Math functions

The following table shows mappings between Hive math functions and their BigQuery equivalents:

Hive                BigQuery
ABS                 ABS
ACOS                ACOS
ASIN                ASIN
ATAN                ATAN
CEIL                CEIL
CEILING             CEILING
COS                 COS
FLOOR               FLOOR
GREATEST            GREATEST
LEAST               LEAST
LN                  LN
LNNVL               Use with ISNULL.
LOG                 LOG
MOD (% operator)    MOD
POWER               POWER, POW
RAND                RAND
ROUND               ROUND
SIGN                SIGN
SIN                 SIN
SQRT                SQRT
HASH                FARM_FINGERPRINT, MD5, SHA1, SHA256, SHA512
STDDEV_POP          STDDEV_POP
STDDEV_SAMP         STDDEV_SAMP
TAN                 TAN
TRUNC               TRUNC
NVL                 IFNULL(expr, 0), COALESCE(exp, 0)

BigQuery also offers additional math functions.

Logical and conditional functions

The following table shows mappings between Hive logical and conditional functions and their BigQuery equivalents:

Hive        BigQuery
CASE        CASE
COALESCE    COALESCE
NVL         IFNULL(expr, 0), COALESCE(exp, 0)
NULLIF      NULLIF
IF          IF(expr, true_result, else_result)
ISNULL      IS NULL
ISNOTNULL   IS NOT NULL

UDFs and UDAFs

Apache Hive supports writing user-defined functions (UDFs) in Java. You can load UDFs into Hive to use them in regular queries. BigQuery UDFs must be written in GoogleSQL or JavaScript. Converting Hive UDFs to SQL UDFs is recommended because SQL UDFs perform better. If you need to use JavaScript, read Best Practices for JavaScript UDFs. For other languages, BigQuery supports remote functions, which let you invoke your functions in Cloud Run functions or Cloud Run from GoogleSQL queries.
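For example, a simple Hive Java UDF that trims and lowercases a string can usually be rewritten as a SQL UDF (a minimal sketch; normalize_name is a hypothetical function name):

CREATE TEMP FUNCTION normalize_name(s STRING)
RETURNS STRING
AS (
  LOWER(TRIM(s))  -- same logic the hypothetical Java UDF would implement
);

SELECT normalize_name('  Hive User ') AS normalized;  -- returns "hive user"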

BigQuery doesn't support user-defined aggregate functions (UDAFs).

DML syntax

This section addresses differences in data manipulation language (DML) syntax between Hive and BigQuery.

INSERT statement

Most Hive INSERT statements are compatible with BigQuery. The following table shows exceptions:

Hive: INSERT INTO TABLE tablename [PARTITION (partcol1[=val1], partcol2[=val2] ...)] VALUES values_row [, values_row ...]

BigQuery: INSERT INTO table (...) VALUES (...);

Note: In BigQuery, omitting column names in the INSERT statement only works if values for all columns in the target table are included in ascending order based on their ordinal positions.

Hive:
INSERT OVERWRITE [LOCAL] DIRECTORY directory1
[ROW FORMAT row_format] [STORED AS file_format] (Note: Only available starting with Hive 0.11.0)
SELECT ... FROM ...

BigQuery: BigQuery doesn't support insert-overwrite operations. This Hive syntax can be migrated to TRUNCATE and INSERT statements.

BigQuery imposes DML quotas that restrict the number of DML statements that you can execute daily. To make the best use of your quota, consider the following approaches:

  • Combine multiple rows in a single INSERT statement, instead of one row for each INSERT operation (see the example after this list).

  • Combine multiple DML statements (including INSERT) by using a MERGE statement.

  • Use CREATE TABLE ... AS SELECT to create and populate new tables.
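For example, the following statement inserts several rows but counts as a single DML statement against the quota (a minimal sketch; mydataset.events and its columns are hypothetical):

INSERT INTO mydataset.events (event_id, event_name)
VALUES
  (1, 'signup'),
  (2, 'login'),
  (3, 'purchase');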

UPDATE statement

Most Hive UPDATE statements are compatible with BigQuery. The following table shows exceptions:

Hive: UPDATE tablename SET column = value [, column = value ...] [WHERE expression]

BigQuery:
UPDATE table
SET column = expression [,...]
[FROM ...]
WHERE TRUE

Note: All UPDATE statements in BigQuery require a WHERE keyword, followed by a condition.

DELETE and TRUNCATE statements

You can use DELETE or TRUNCATE statements to remove rows from a table without affecting the table schema or indexes.

In BigQuery, the DELETE statement must have a WHERE clause. For more information about DELETE in BigQuery, see DELETE examples.

Hive: DELETE FROM tablename [WHERE expression]

BigQuery: DELETE FROM table_name WHERE TRUE
BigQuery DELETE statements require a WHERE clause.

Hive: TRUNCATE [TABLE] table_name [PARTITION partition_spec];

BigQuery: TRUNCATE TABLE [[project_name.]dataset_name.]table_name

MERGE statement

The MERGE statement can combine INSERT, UPDATE, and DELETE operations into a single upsert statement and perform the operations. The MERGE operation must match at most one source row for each target row.

Hive:
MERGE INTO target_table AS T USING source_expression AS S
ON boolean_expression
WHEN MATCHED [AND boolean_expression] THEN UPDATE SET set_clause_list
WHEN MATCHED [AND boolean_expression] THEN DELETE
WHEN NOT MATCHED [AND boolean_expression] THEN INSERT VALUES value_list

BigQuery:
MERGE target
USING source
ON target.key = source.key
WHEN MATCHED AND source.filter = 'filter_exp' THEN
  UPDATE SET
    target.col1 = source.col1,
    target.col2 = source.col2,
    ...

Note: You must list all columns that need to be updated.

ALTER statement

The following table provides details about converting ALTER statements from Hive to BigQuery:

Function: Rename table
Hive: ALTER TABLE table_name RENAME TO new_table_name;
BigQuery: Not supported. A workaround is to use a copy job with the name that you want as the destination table, and then delete the old one:
bq cp project.dataset.old_table project.dataset.new_table
bq rm --table project.dataset.old_table

Function: Table properties
Hive:
ALTER TABLE table_name SET TBLPROPERTIES table_properties;
table_properties:
  : (property_name = property_value, property_name = property_value, ... )
Table comment: ALTER TABLE table_name SET TBLPROPERTIES ('comment' = new_comment);
BigQuery:
{ALTER TABLE | ALTER TABLE IF EXISTS}
table_name
SET OPTIONS(table_set_options_list)

Function: SerDe properties (serialize and deserialize)
Hive:
ALTER TABLE table_name [PARTITION partition_spec] SET SERDE serde_class_name [WITH SERDEPROPERTIES serde_properties];
ALTER TABLE table_name [PARTITION partition_spec] SET SERDEPROPERTIES serde_properties;
serde_properties:
  : (property_name = property_value, property_name = property_value, ... )
BigQuery: Serialization and deserialization is managed by the BigQuery service and isn't user configurable. To learn how to let BigQuery read data from CSV, JSON, AVRO, PARQUET, or ORC files, see Create Cloud Storage external tables. BigQuery supports CSV, JSON, AVRO, and PARQUET export formats. For more information, see Export formats and compression types.

Function: Table storage properties
Hive: ALTER TABLE table_name CLUSTERED BY (col_name, col_name, ...) [SORTED BY (col_name, ...)] INTO num_buckets BUCKETS;
BigQuery: Not supported for the ALTER statements.

Function: Skewed table
Hive:
Skewed: ALTER TABLE table_name SKEWED BY (col_name1, col_name2, ...) ON ([(col_name1_value, col_name2_value, ...) [, (col_name1_value, col_name2_value), ...] [STORED AS DIRECTORIES];
Not skewed: ALTER TABLE table_name NOT SKEWED;
Not stored as directories: ALTER TABLE table_name NOT STORED AS DIRECTORIES;
Skewed location: ALTER TABLE table_name SET SKEWED LOCATION (col_name1="location1" [, col_name2="location2", ...] );
BigQuery: Balancing storage for performance queries is managed by the BigQuery service and isn't configurable.

Function: Table constraints
Hive:
ALTER TABLE table_name ADD CONSTRAINT constraint_name PRIMARY KEY (column, ...) DISABLE NOVALIDATE;
ALTER TABLE table_name ADD CONSTRAINT constraint_name FOREIGN KEY (column, ...) REFERENCES table_name(column, ...) DISABLE NOVALIDATE RELY;
ALTER TABLE table_name DROP CONSTRAINT constraint_name;
BigQuery:
ALTER TABLE [[project_name.]dataset_name.]table_name
ADD [CONSTRAINT [IF NOT EXISTS] [constraint_name]] constraint NOT ENFORCED;
ALTER TABLE [[project_name.]dataset_name.]table_name
ADD PRIMARY KEY(column_list) NOT ENFORCED;
For more information, see the ALTER TABLE ADD PRIMARY KEY statement.

Function: Add partition
Hive:
ALTER TABLE table_name ADD [IF NOT EXISTS] PARTITION partition_spec [LOCATION 'location'][, PARTITION partition_spec [LOCATION 'location'], ...];
partition_spec:
  : (partition_column = partition_col_value, partition_column = partition_col_value, ...)
BigQuery: Not supported. Additional partitions are added as needed when data with new values in the partition columns is loaded. For more information, see Managing partitioned tables.

Function: Rename partition
Hive: ALTER TABLE table_name PARTITION partition_spec RENAME TO PARTITION partition_spec;
BigQuery: Not supported.

Function: Exchange partition
Hive:
-- Move partition from table_name_1 to table_name_2
ALTER TABLE table_name_2 EXCHANGE PARTITION (partition_spec) WITH TABLE table_name_1;
-- multiple partitions
ALTER TABLE table_name_2 EXCHANGE PARTITION (partition_spec, partition_spec2, ...) WITH TABLE table_name_1;
BigQuery: Not supported.

Function: Recover partition
Hive: MSCK [REPAIR] TABLE table_name [ADD/DROP/SYNC PARTITIONS];
BigQuery: Not supported.

Function: Drop partition
Hive: ALTER TABLE table_name DROP [IF EXISTS] PARTITION partition_spec[, PARTITION partition_spec, ...] [IGNORE PROTECTION] [PURGE];
BigQuery: Supported using the following methods:
  • bq rm 'mydataset.table_name$partition_id'
  • DELETE FROM table_name$partition_id WHERE 1=1
For more information, see Delete a partition.

Function: (Un)Archive partition
Hive:
ALTER TABLE table_name ARCHIVE PARTITION partition_spec;
ALTER TABLE table_name UNARCHIVE PARTITION partition_spec;
BigQuery: Not supported.

Function: Table and partition file format
Hive: ALTER TABLE table_name [PARTITION partition_spec] SET FILEFORMAT file_format;
BigQuery: Not supported.

Function: Table and partition location
Hive: ALTER TABLE table_name [PARTITION partition_spec] SET LOCATION "new location";
BigQuery: Not supported.

Function: Table and partition touch
Hive: ALTER TABLE table_name TOUCH [PARTITION partition_spec];
BigQuery: Not supported.

Function: Table and partition protection
Hive:
ALTER TABLE table_name [PARTITION partition_spec] ENABLE|DISABLE NO_DROP [CASCADE];
ALTER TABLE table_name [PARTITION partition_spec] ENABLE|DISABLE OFFLINE;
BigQuery: Not supported.

Function: Table and partition compact
Hive:
ALTER TABLE table_name [PARTITION (partition_key = 'partition_value' [, ...])] COMPACT 'compaction_type' [AND WAIT]
[WITH OVERWRITE TBLPROPERTIES ("property"="value" [, ...])];
BigQuery: Not supported.

Function: Table and partition concatenate
Hive: ALTER TABLE table_name [PARTITION (partition_key = 'partition_value' [, ...])] CONCATENATE;
BigQuery: Not supported.

Function: Table and partition columns
Hive: ALTER TABLE table_name [PARTITION (partition_key = 'partition_value' [, ...])] UPDATE COLUMNS;
BigQuery: Not supported for the ALTER TABLE statements.

Function: Column name, type, position, and comment
Hive: ALTER TABLE table_name [PARTITION partition_spec] CHANGE [COLUMN] col_old_name col_new_name column_type [COMMENT col_comment] [FIRST|AFTER column_name] [CASCADE|RESTRICT];
BigQuery: Not supported.

DDL syntax

This section addresses differences in data definition language (DDL) syntax between Hive and BigQuery.

CREATE TABLE and DROP TABLE statements

The following table provides details about converting CREATE TABLE statements from Hive to BigQuery:

Type: Managed tables

Hive:
create table table_name (
  id int,
  dtDontQuery string,
  name string
)

BigQuery:
CREATE TABLE `myproject`.mydataset.table_name (
  id INT64,
  dtDontQuery STRING,
  name STRING
)

Type: Partitioned tables

Hive:
create table table_name (
  id int,
  dt string,
  name string
)
partitioned by (date string)

BigQuery:
CREATE TABLE `myproject`.mydataset.table_name (
  id INT64,
  dt DATE,
  name STRING
)
PARTITION BY dt
OPTIONS(
  partition_expiration_days=3,
  description="a table partitioned by date_col"
)

Type: Create table as select (CTAS)

Hive:
CREATE TABLE new_key_value_store
ROW FORMAT SERDE "org.apache.hadoop.hive.serde2.columnar.ColumnarSerDe"
STORED AS RCFile
AS
SELECT (key % 1024) new_key, concat(key, value) key_value_pair, dt
FROM key_value_store
SORT BY new_key, key_value_pair;

BigQuery:
CREATE TABLE `myproject`.mydataset.new_key_value_store
-- When partitioning by date, uncomment the following:
-- PARTITION BY dt
OPTIONS(
  description="Table Description"
  -- When partitioning by date, uncomment the following. It's recommended to use
  -- require_partition_filter when the table is partitioned:
  -- , require_partition_filter=TRUE
) AS
SELECT (key % 1024) new_key, concat(key, value) key_value_pair, dt
FROM key_value_store
ORDER BY new_key, key_value_pair

Type: Create table like
The LIKE form of CREATE TABLE lets you copy an existing table definition exactly.

Hive:
CREATE TABLE empty_key_value_store
LIKE key_value_store [TBLPROPERTIES (property_name=property_value, ...)];

BigQuery: Not supported.
Type: Bucketed sorted tables (clustered in BigQuery terminology)

Hive:
CREATE TABLE page_view(
  viewTime INT,
  userid BIGINT,
  page_url STRING,
  referrer_url STRING,
  ip STRING COMMENT 'IP Address of the User'
)
COMMENT 'This is the page view table'
PARTITIONED BY(dt STRING, country STRING)
CLUSTERED BY(userid) SORTED BY(viewTime) INTO 32 BUCKETS
ROW FORMAT DELIMITED
  FIELDS TERMINATED BY '\001'
  COLLECTION ITEMS TERMINATED BY '\002'
  MAP KEYS TERMINATED BY '\003'
STORED AS SEQUENCEFILE;

BigQuery:
CREATE TABLE `myproject`.mydataset.page_view (
  viewTime INT,
  dt DATE,
  userId BIGINT,
  page_url STRING,
  referrer_url STRING,
  ip STRING OPTIONS (description="IP Address of the User")
)
PARTITION BY dt
CLUSTER BY userId
OPTIONS (
  partition_expiration_days=3,
  description="This is the page view table",
  require_partition_filter=TRUE
)

For more information, see Create and use clustered tables.

Type: Skewed tables (tables where one or more columns have skewed values)

Hive:
CREATE TABLE list_bucket_multiple (col1 STRING, col2 int, col3 STRING)
SKEWED BY (col1, col2) ON (('s1',1), ('s3',3), ('s13',13), ('s78',78)) [STORED AS DIRECTORIES];

BigQuery: Not supported.
Type: Temporary tables

Hive:
CREATE TEMPORARY TABLE list_bucket_multiple (
  col1 STRING,
  col2 int,
  col3 STRING);

BigQuery: You can achieve this using expiration time as follows:
CREATE TABLE mydataset.newtable
(
  col1 STRING OPTIONS(description="An optional INTEGER field"),
  col2 INT64,
  col3 STRING
)
PARTITION BY DATE(_PARTITIONTIME)
OPTIONS(
  expiration_timestamp=TIMESTAMP "2020-01-01 00:00:00 UTC",
  partition_expiration_days=1,
  description="a table that expires in 2020, with each partition living for 24 hours",
  labels=[("org_unit", "development")]
)

Type: Transactional tables

Hive: CREATE TRANSACTIONAL TABLE transactional_table_test(key string, value string) PARTITIONED BY(ds string) STORED AS ORC;

BigQuery: All table modifications in BigQuery are ACID (atomicity, consistency, isolation, durability) compliant.

Type: Drop table

Hive: DROP TABLE [IF EXISTS] table_name [PURGE];

BigQuery:
{DROP TABLE | DROP TABLE IF EXISTS}
table_name

Type: Truncate table

Hive:
TRUNCATE TABLE table_name [PARTITION partition_spec];
partition_spec:
  : (partition_column = partition_col_value, partition_column = partition_col_value, ...)

BigQuery: Not supported. The following workarounds are available:
  • Drop and create the table again with the same schema.
  • Set the write disposition for the table to WRITE_TRUNCATE if the truncate operation is a common use case for the given table.
  • Use the CREATE OR REPLACE TABLE statement.
  • Use the DELETE FROM table_name WHERE 1=1 statement.

Note: Specific partitions can also be truncated. For more information, see Delete a partition.

CREATE EXTERNAL TABLE and DROP EXTERNAL TABLE statements

For external table support in BigQuery, see Introduction to external data sources.

CREATE VIEW and DROP VIEW statements

The following table provides details about converting CREATE VIEW statements from Hive to BigQuery:

Hive:
CREATE VIEW [IF NOT EXISTS] [db_name.]view_name [(column_name [COMMENT column_comment], ...) ]
[COMMENT view_comment]
[TBLPROPERTIES (property_name = property_value, ...)]
AS SELECT ...;

BigQuery:
{CREATE VIEW | CREATE VIEW IF NOT EXISTS | CREATE OR REPLACE VIEW}
view_name
[OPTIONS(view_option_list)]
AS query_expression

Hive:
CREATE MATERIALIZED VIEW [IF NOT EXISTS] [db_name.]materialized_view_name
[DISABLE REWRITE]
[COMMENT materialized_view_comment]
[PARTITIONED ON (col_name, ...)]
[
  [ROW FORMAT row_format]
  [STORED AS file_format]
  | STORED BY 'storage.handler.class.name' [WITH SERDEPROPERTIES (...)]
]
[LOCATION hdfs_path]
[TBLPROPERTIES (property_name=property_value, ...)]
AS SELECT ...;

BigQuery:
CREATE MATERIALIZED VIEW [IF NOT EXISTS] [project_id].[dataset_id].materialized_view_name
-- cannot disable rewrites in BigQuery
[OPTIONS(
  [description="materialized_view_comment",]
  [other materialized_view_option_list]
)]
[PARTITION BY (col_name)]  -- same as source table
AS query_expression

CREATE FUNCTION and DROP FUNCTION statements

The following table provides details about converting user-defined functions from Hive to BigQuery:

Hive: CREATE TEMPORARY FUNCTION function_name AS class_name;

BigQuery:
CREATE { TEMPORARY | TEMP } FUNCTION function_name ([named_parameter[, ...]])
[RETURNS data_type]
AS (sql_expression)

named_parameter:
  param_name param_type

Hive: DROP TEMPORARY FUNCTION [IF EXISTS] function_name;

BigQuery: Not supported.

Hive:
CREATE FUNCTION [db_name.]function_name AS class_name
[USING JAR|FILE|ARCHIVE 'file_uri' [, JAR|FILE|ARCHIVE 'file_uri'] ];

BigQuery: Supported for allowlisted projects as an alpha feature.
CREATE { FUNCTION | FUNCTION IF NOT EXISTS | OR REPLACE FUNCTION }
function_name ([named_parameter[, ...]])
[RETURNS data_type]
AS (expression);

named_parameter:
  param_name param_type

Hive: DROP FUNCTION [IF EXISTS] function_name;

BigQuery: DROP FUNCTION [ IF EXISTS ] function_name

Hive: RELOAD FUNCTION;

BigQuery: Not supported.

CREATE MACRO and DROP MACRO statements

The following table provides details about converting the procedural SQL statements used to create macros from Hive to BigQuery:

Hive: CREATE TEMPORARY MACRO macro_name([col_name col_type, ...]) expression;

BigQuery: Not supported. In some cases, this can be substituted with a UDF.

Hive: DROP TEMPORARY MACRO [IF EXISTS] macro_name;

BigQuery: Not supported.

Error codes and messages

Hive error codes and BigQuery error codes are different. If your application logic catches errors, eliminate the source of the error, because BigQuery doesn't return the same error codes.

In BigQuery, it's common to use the INFORMATION_SCHEMA views or audit logging to examine errors.
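For example, a query like the following lists recent jobs that ended with an error (a minimal sketch; it assumes the us region and that you have permission to query the INFORMATION_SCHEMA.JOBS_BY_PROJECT view):

SELECT
  job_id,
  error_result.reason AS error_reason,
  error_result.message AS error_message
FROM `region-us`.INFORMATION_SCHEMA.JOBS_BY_PROJECT
WHERE error_result IS NOT NULL
  AND creation_time > TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 1 DAY)
ORDER BY creation_time DESC;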

Consistency guarantees and transaction isolation

Both Hive and BigQuery support transactions with ACID semantics. Transactions are enabled by default in Hive 3.

ACID semantics

Hive supports snapshot isolation. When you execute a query, the query is provided with a consistent snapshot of the database, which it uses until the end of its execution. Hive provides full ACID semantics at the row level, letting one application add rows while another application reads from the same partition, without interfering with each other.

BigQuery provides optimistic concurrency control (first to commit wins) with snapshot isolation, in which a query reads the last committed data before the query starts. This approach guarantees the same level of consistency for each row and mutation, and across rows within the same DML statement, while avoiding deadlocks. For multiple DML updates to the same table, BigQuery switches to pessimistic concurrency control. Load jobs can run independently and append to tables; however, BigQuery doesn't provide an explicit transaction boundary or session.

Transactions

Hive doesn't support multi-statement transactions. It doesn't support BEGIN, COMMIT, and ROLLBACK statements. In Hive, all language operations are auto-committed.

BigQuery supports multi-statement transactions inside a single query, or across multiple queries when you use sessions. A multi-statement transaction lets you perform mutating operations, such as inserting or deleting rows from one or more tables, and either commit or roll back the changes. For more information, see Multi-statement transactions.
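For example, the following multi-statement transaction either commits both updates or rolls both back (a minimal sketch; mydataset.accounts is a hypothetical table):

BEGIN TRANSACTION;

UPDATE mydataset.accounts SET balance = balance - 100 WHERE account_id = 1;
UPDATE mydataset.accounts SET balance = balance + 100 WHERE account_id = 2;

COMMIT TRANSACTION;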

