Hive🔗
Iceberg supports reading and writing Iceberg tables through Hive by using a StorageHandler.
Feature support🔗
The following feature matrix illustrates the support for different features across Hive releases for Iceberg tables:
Feature support | Hive 2 / 3 | Hive 4 |
---|---|---|
SQL create table | ✔️ | ✔️ |
SQL create table as select (CTAS) | ✔️ | ✔️ |
SQL create table like table (CTLT) | ✔️ | ✔️ |
SQL drop table | ✔️ | ✔️ |
SQL insert into | ✔️ | ✔️ |
SQL insert overwrite | ✔️ | ✔️ |
SQL delete from | | ✔️ |
SQL update | | ✔️ |
SQL merge into | | ✔️ |
Branches and tags | | ✔️ |
Iceberg compatibility with Hive 2.x and Hive 3.1.2/3 supports the following features:
- Creating a table
- Dropping a table
- Reading a table
- Inserting into a table (INSERT INTO)
Warning
DML operations work only with MapReduce execution engine.
Hive supports the following additional features with Hive version 4.0.0 and above:
- Creating an Iceberg identity-partitioned table
- Creating an Iceberg table with any partition spec, including the various transforms supported by Iceberg
- Creating a table from an existing table (CTAS table)
- Altering a table while keeping Iceberg and Hive schemas in sync
- Altering the partition schema (updating columns)
- Altering the partition schema by specifying partition transforms
- Truncating a table / partition, dropping a partition.
- Migrating tables in Avro, Parquet, or ORC (Non-ACID) format to Iceberg
- Reading the schema of a table.
- Querying Iceberg metadata tables.
- Time travel applications.
- Inserting into a table / partition (INSERT INTO).
- Inserting data overwriting existing data (INSERT OVERWRITE) in a table / partition.
- Copy-on-write support for delete, update and merge queries, CRUD support for Iceberg V1 tables.
- Altering a table with expiring snapshots.
- Create a table like an existing table (CTLT table)
- Support for adding Parquet compression type via table properties (see Compression types)
- Altering a table metadata location.
- Supporting table rollback.
- Honors sort orders on existing tables when writing a table (see Sort orders specification)
- Creating, writing to and dropping an Iceberg branch / tag.
- Allowing expiring snapshots by snapshot ID, by time range, by retention of the last N snapshots, and using table properties.
- Set current snapshot using snapshot ID for an Iceberg table.
- Support for renaming an Iceberg table.
- Altering a table to convert to an Iceberg table.
- Fast forwarding, cherry-picking commit to an Iceberg branch.
- Creating a branch from an Iceberg tag.
- Set current snapshot using branch/tag for an Iceberg table.
- Delete orphan files for an Iceberg table.
- Allow full table compaction of Iceberg tables.
- Support of showing partition information for Iceberg tables (SHOW PARTITIONS).
Warning
DML operations work only with Tez execution engine.
Enabling Iceberg support in Hive🔗
Hive 4 comes with hive-iceberg that ships Iceberg, so no additional downloads or jars are needed. For older versions of Hive a runtime jar has to be added.
Hive 4.0.0🔗
Hive 4.0.0 comes with Iceberg 1.4.3 included.
Hive 4.0.0-beta-1🔗
Hive 4.0.0-beta-1 comes with Iceberg 1.3.0 included.
Hive 4.0.0-alpha-2🔗
Hive 4.0.0-alpha-2 comes with Iceberg 0.14.1 included.
Hive 4.0.0-alpha-1🔗
Hive 4.0.0-alpha-1 comes with Iceberg 0.13.1 included.
Hive 2.3.x, Hive 3.1.x🔗
In order to use Hive 2.3.x or Hive 3.1.x, you must load the Iceberg-Hive runtime jar and enable Iceberg support, either globally or for an individual table using a table property.
Loading runtime jar🔗
To enable Iceberg support in Hive, the HiveIcebergStorageHandler and supporting classes need to be made available on Hive's classpath. These are provided by the iceberg-hive-runtime jar file. For example, if using the Hive shell, this can be achieved by issuing a statement like so:
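A minimal sketch, assuming the runtime jar has been downloaded to /path/to/iceberg-hive-runtime.jar (adjust the path and version for your environment):

ADD JAR /path/to/iceberg-hive-runtime.jar;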
There are many other ways to achieve this, including adding the jar file to Hive's auxiliary classpath so it is available by default. Please refer to Hive's documentation for more information.
Enabling support🔗
If the Iceberg storage handler is not in Hive's classpath, then Hive cannot load or update the metadata for an Iceberg table when the storage handler is set. To avoid the appearance of broken tables in Hive, Iceberg will not add the storage handler to a table unless Hive support is enabled. The storage handler is kept in sync (added or removed) every time Hive engine support for the table is updated, i.e. turned on or off in the table properties. There are two ways to enable Hive support: globally in Hadoop Configuration and per-table using a table property.
Hadoop configuration🔗
To enable Hive support globally for an application, set iceberg.engine.hive.enabled=true in its Hadoop configuration. For example, setting this in the hive-site.xml loaded by Spark will enable the storage handler for all tables created by Spark.
Danger
Starting with Apache Iceberg 0.11.0, when using Hive with Tez you also have to disable vectorization (hive.vectorized.execution.enabled=false).
Table property configuration🔗
Alternatively, the property engine.hive.enabled can be set to true and added to the table properties when creating the Iceberg table. Here is an example of doing it programmatically:
Catalog catalog = ...;
Map<String, String> tableProperties = Maps.newHashMap();
tableProperties.put(TableProperties.ENGINE_HIVE_ENABLED, "true"); // engine.hive.enabled=true
catalog.createTable(tableId, schema, spec, tableProperties);
The table level configuration overrides the global Hadoop configuration.
Hive on Tez configuration🔗
To use the Tez engine on Hive 3.1.2 or later, Tez needs to be upgraded to >= 0.10.1 which contains a necessary fix TEZ-4248.
To use the Tez engine on Hive 2.3.x, you will need to manually build Tez from the branch-0.9 branch due to a backwards incompatibility issue with Tez 0.10.1.
In both cases, you will also need to set the following property in the tez-site.xml configuration file: tez.mrreader.config.update.properties=hive.io.file.readcolumn.names,hive.io.file.readcolumn.ids.
Catalog Management🔗
Global Hive catalog🔗
From the Hive engine's perspective, there is only one global data catalog that is defined in the Hadoop configuration in the runtime environment. In contrast, Iceberg supports multiple different data catalog types such as Hive, Hadoop, AWS Glue, or custom catalog implementations. Iceberg also allows loading a table directly based on its path in the file system. Those tables do not belong to any catalog. Users might want to read these cross-catalog and path-based tables through the Hive engine for use cases like join.
To support this, a table in the Hive metastore can represent three different ways of loading an Iceberg table, depending on the table's iceberg.catalog property:

- The table will be loaded using a HiveCatalog that corresponds to the metastore configured in the Hive environment if no iceberg.catalog is set
- The table will be loaded using a custom catalog if iceberg.catalog is set to a catalog name (see below)
- The table can be loaded directly using the table's root location if iceberg.catalog is set to location_based_table
For cases 2 and 3 above, users can create an overlay of an Iceberg table in the Hive metastore, so that different table types can work together in the same Hive environment. See CREATE EXTERNAL TABLE and CREATE TABLE for more details.
Custom Iceberg catalogs🔗
To globally register different catalogs, set the following Hadoop configurations:
Config Key | Description |
---|---|
iceberg.catalog.<catalog_name>.type | type of catalog: hive, hadoop, or left unset if using a custom catalog |
iceberg.catalog.<catalog_name>.catalog-impl | catalog implementation, must not be null if type is empty |
iceberg.catalog.<catalog_name>.<key> | any config key and value pairs for the catalog |
Here are some examples using Hive CLI:
Register a HiveCatalog called another_hive:

SET iceberg.catalog.another_hive.type=hive;
SET iceberg.catalog.another_hive.uri=thrift://example.com:9083;
SET iceberg.catalog.another_hive.clients=10;
SET iceberg.catalog.another_hive.warehouse=hdfs://example.com:8020/warehouse;
Register a HadoopCatalog called hadoop:

SET iceberg.catalog.hadoop.type=hadoop;
SET iceberg.catalog.hadoop.warehouse=hdfs://example.com:8020/warehouse;
Register an AWS GlueCatalog called glue:

SET iceberg.catalog.glue.type=glue;
SET iceberg.catalog.glue.warehouse=s3://my-bucket/my/key/prefix;
SET iceberg.catalog.glue.lock.table=myGlueLockTable;
DDL Commands🔗
Not all the features below are supported with Hive 2.3.x and Hive 3.1.x. Please refer to the Feature support paragraph for further details.
One generally applicable difference is that Hive 4.0.0-alpha-1 provides the possibility to use STORED BY ICEBERG instead of the old STORED BY 'org.apache.iceberg.mr.hive.HiveIcebergStorageHandler'.
CREATE TABLE🔗
Non partitioned tables🔗
The Hive CREATE EXTERNAL TABLE command creates an Iceberg table when you specify the storage handler as follows:
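A minimal sketch (the table name x and column i are placeholders):

CREATE EXTERNAL TABLE x (i int) STORED BY 'org.apache.iceberg.mr.hive.HiveIcebergStorageHandler';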
If you want to create external tables using CREATE TABLE, configure the MetaStoreMetadataTransformer on the cluster, and CREATE TABLE commands are transformed to create external tables. For example:
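A minimal sketch, again with placeholder names and assuming the Hive 4 STORED BY ICEBERG shorthand:

CREATE TABLE x (i int) STORED BY ICEBERG;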
You can specify the default file format (Avro, Parquet, ORC) at the time of the table creation. The default is Parquet:
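For example, a sketch selecting ORC (the table name is a placeholder):

CREATE TABLE x (i int) STORED BY ICEBERG STORED AS ORC;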
Partitioned tables🔗
You can create Iceberg partitioned tables using a command familiar to those who create non-Iceberg tables:
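For example, an identity-partitioned table (placeholder names):

CREATE TABLE x (i int) PARTITIONED BY (j int) STORED BY ICEBERG;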
Info
The resulting table does not create partitions in HMS, but instead, converts partition data into Iceberg identity partitions.
Use the DESCRIBE command to get information about the Iceberg identity partitions:
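For example, assuming the table created above:

DESCRIBE x;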
The result is:

col_name | data_type | comment |
---|---|---|
i | int | |
j | int | |
 | NULL | NULL |
# Partition Transform Information | NULL | NULL |
# col_name | transform_type | NULL |
j | IDENTITY | NULL |
You can create Iceberg partitions using the following Iceberg partition specification syntax (supported only from Hive 4.0.0-alpha-1):
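A sketch matching the DESCRIBE output below (the table name and columns i and ts are placeholders):

CREATE TABLE x (i int, ts timestamp) PARTITIONED BY SPEC (month(ts), bucket(2, i)) STORED BY ICEBERG;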
The result is:

col_name | data_type | comment |
---|---|---|
i | int | |
ts | timestamp | |
 | NULL | NULL |
# Partition Transform Information | NULL | NULL |
# col_name | transform_type | NULL |
ts | MONTH | NULL |
i | BUCKET[2] | NULL |
The supported transformations for Hive are the same as for Spark:

- years(ts): partition by year
- months(ts): partition by month
- days(ts) or date(ts): equivalent to dateint partitioning
- hours(ts) or date_hour(ts): equivalent to dateint and hour partitioning
- bucket(N, col): partition by hashed value mod N buckets
- truncate(L, col): partition by value truncated to L
  - Strings are truncated to the given length
  - Integers and longs truncate to bins: truncate(10, i) produces partitions 0, 10, 20, 30, ...
Info
The resulting table does not create partitions in HMS, but instead, converts partition data into Iceberg partitions.
CREATE TABLE AS SELECT🔗
The CREATE TABLE AS SELECT operation resembles the native Hive operation with a single important difference. The Iceberg table and the corresponding Hive table are created at the beginning of the query execution. The data is inserted / committed when the query finishes. So for a transient period the table already exists but contains no data.
CREATE TABLE target PARTITIONED BY SPEC (year(year_field), identity_field) STORED BY ICEBERG AS
SELECT * FROM source;
CREATE TABLE LIKE TABLE🔗
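A minimal sketch of the CTLT form, assuming an existing table named source:

CREATE TABLE target LIKE source STORED BY ICEBERG;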
CREATE EXTERNAL TABLE overlaying an existing Iceberg table🔗
The CREATE EXTERNAL TABLE command is used to overlay a Hive table "on top of" an existing Iceberg table. Iceberg tables are created using either a Catalog, or an implementation of the Tables interface, and Hive needs to be configured accordingly to operate on these different types of table.
Hive catalog tables🔗
As described before, tables created by the HiveCatalog with the Hive engine feature enabled are directly visible to the Hive engine, so there is no need to create an overlay.
Custom catalog tables🔗
For a table in a registered catalog, specify the catalog name in the statement using the table property iceberg.catalog. For example, the SQL below creates an overlay for a table in a hadoop type catalog named hadoop_cat:
SET iceberg.catalog.hadoop_cat.type=hadoop;
SET iceberg.catalog.hadoop_cat.warehouse=hdfs://example.com:8020/hadoop_cat;

CREATE EXTERNAL TABLE database_a.table_a
STORED BY 'org.apache.iceberg.mr.hive.HiveIcebergStorageHandler'
TBLPROPERTIES ('iceberg.catalog'='hadoop_cat');
When iceberg.catalog is missing from both the table properties and the global Hadoop configuration, HiveCatalog will be used as the default.
Path-based Hadoop tables🔗
Iceberg tables created using HadoopTables are stored entirely in a directory in a filesystem like HDFS. These tables are considered to have no catalog. To indicate that, set the iceberg.catalog property to location_based_table. For example:
CREATE EXTERNAL TABLE table_a
STORED BY 'org.apache.iceberg.mr.hive.HiveIcebergStorageHandler'
LOCATION 'hdfs://some_bucket/some_path/table_a'
TBLPROPERTIES ('iceberg.catalog'='location_based_table');
CREATE TABLE overlaying an existing Iceberg table🔗
You can also create a new table that is managed by a custom catalog. For example, the following code creates a table in a custom Hadoop catalog:
SET iceberg.catalog.hadoop_cat.type=hadoop;
SET iceberg.catalog.hadoop_cat.warehouse=hdfs://example.com:8020/hadoop_cat;

CREATE TABLE database_a.table_a (
  id bigint,
  name string
) PARTITIONED BY (dept string)
STORED BY 'org.apache.iceberg.mr.hive.HiveIcebergStorageHandler'
TBLPROPERTIES ('iceberg.catalog'='hadoop_cat');
Danger
If the table to create already exists in the custom catalog, this will create a managed overlay table. This means technically you can omit the EXTERNAL keyword when creating an overlay table. However, this is not recommended because creating managed overlay tables could pose a risk to the shared data files in case of accidental drop table commands from the Hive side, which would unintentionally remove all the data in the table.
ALTER TABLE🔗
Table properties🔗
For HiveCatalog tables the Iceberg table properties and the Hive table properties stored in HMS are kept in sync.
Info
IMPORTANT: This feature is not available for other Catalog implementations.
Schema evolution🔗
The Hive table schema is kept in sync with the Iceberg table. If an outside source (Impala/Spark/Java API/etc.) changes the schema, the Hive table immediately reflects the changes. You alter the table schema using Hive commands (see the examples after the note below):
- Rename a table
- Add a column
- Rename a column
- Reorder columns
- Change a column type - only if Iceberg defines the column type change as safe
- Drop a column by using REPLACE COLUMNS to remove the old column
Info
Note that dropping columns is the only thing REPLACE COLUMNS can be used for, i.e. if columns are specified out-of-order an error will be thrown signalling this limitation.
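A few sketches of these commands (the orders table and all column names are placeholders):

ALTER TABLE orders RENAME TO renamed_orders;
ALTER TABLE orders ADD COLUMNS (nickname string);
ALTER TABLE orders CHANGE COLUMN item fruit string;
ALTER TABLE orders REPLACE COLUMNS (remaining string);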
Partition evolution🔗
You change the partitioning schema using the following commands (examples below):
- Change the partitioning schema to new identity partitions
- Alternatively, provide a partition specification
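A sketch of both forms, assuming a customers table with a last_name column and an orders table with a timestamp column ts:

ALTER TABLE customers SET PARTITION SPEC (last_name);
ALTER TABLE orders SET PARTITION SPEC (month(ts));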
Table migration🔗
You can migrate Avro / Parquet / ORC external tables to Iceberg tables using the following command:
ALTER TABLE t SET TBLPROPERTIES ('storage_handler'='org.apache.iceberg.mr.hive.HiveIcebergStorageHandler');
Drop partitions🔗
You can drop partitions based on a single / multiple partition specification using the following commands:
ALTER TABLE orders DROP PARTITION (buy_date == '2023-01-01', market_price > 1000), PARTITION (buy_date == '2024-01-01', market_price <= 2000);
Branches and tags🔗
ALTER TABLE ... CREATE BRANCH
Branches can be created via the CREATE BRANCH statement with the following options:
- Create a branch using default properties.
- Create a branch at a specific snapshot ID.
- Create a branch using system time.
- Create a branch with a specified number of snapshot retentions.
- Create a branch using specific tag.
-- CREATE branch1 with default properties.
ALTER TABLE test CREATE BRANCH branch1;
-- CREATE branch1 at a specific snapshot ID.
ALTER TABLE test CREATE BRANCH branch1 FOR SYSTEM_VERSION AS OF 3369973735913135680;
-- CREATE branch1 using system time.
ALTER TABLE test CREATE BRANCH branch1 FOR SYSTEM_TIME AS OF '2023-09-16 09:46:38.939 Etc/UTC';
-- CREATE branch1 with a specified number of snapshot retentions.
ALTER TABLE test CREATE BRANCH branch1 FOR SYSTEM_VERSION AS OF 3369973735913135680 WITH SNAPSHOT RETENTION 5 SNAPSHOTS;
-- CREATE branch1 using a specific tag.
ALTER TABLE test CREATE BRANCH branch1 FOR TAG AS OF tag1;
ALTER TABLE ... CREATE TAG
Tags can be created via the CREATE TAG statement with the following options:
- Create a tag using default properties.
- Create a tag at a specific snapshot ID.
- Create a tag using system time.
-- CREATE tag1 with default properties.
ALTER TABLE test CREATE TAG tag1;
-- CREATE tag1 at a specific snapshot ID.
ALTER TABLE test CREATE TAG tag1 FOR SYSTEM_VERSION AS OF 3369973735913135680;
-- CREATE tag1 using system time.
ALTER TABLE test CREATE TAG tag1 FOR SYSTEM_TIME AS OF '2023-09-16 09:46:38.939 Etc/UTC';
ALTER TABLE ... DROP BRANCH
Branches can be dropped via the DROP BRANCH statement with the following options:
- Do not fail if the branch does not exist with IF EXISTS
-- DROP branch1
ALTER TABLE test DROP BRANCH branch1;
-- DROP branch1 IF EXISTS
ALTER TABLE test DROP BRANCH IF EXISTS branch1;
ALTER TABLE ... DROP TAG
Tags can be dropped via the DROP TAG statement with the following options:
- Do not fail if the tag does not exist with IF EXISTS
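A sketch mirroring the DROP BRANCH syntax above (assuming a tag named tag1 on table test):

-- DROP tag1
ALTER TABLE test DROP TAG tag1;
-- DROP tag1 IF EXISTS
ALTER TABLE test DROP TAG IF EXISTS tag1;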
ALTER TABLE ... EXECUTE FAST-FORWARD
An Iceberg branch which is an ancestor of another branch can be fast-forwarded to the state of the other branch.
-- This fast-forwards branch1 to the state of the main branch of the Iceberg table.
ALTER TABLE test EXECUTE FAST-FORWARD 'branch1' 'main';
-- This fast-forwards branch1 to the state of branch2.
ALTER TABLE test EXECUTE FAST-FORWARD 'branch1' 'branch2';
ALTER TABLE ... EXECUTE CHERRY-PICK
Cherry-pick of a snapshot requires the ID of the snapshot. Cherry-pick of snapshots as of now is supported only on the main branch of an Iceberg table.
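For example, a sketch with a placeholder snapshot ID:

ALTER TABLE test EXECUTE CHERRY-PICK 8602659039622823857;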
TRUNCATE TABLE🔗
The following command truncates the Iceberg table:
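For example (the table name is a placeholder):

TRUNCATE TABLE t;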
TRUNCATE TABLE ... PARTITION🔗
The following command truncates the partition in an Iceberg table:
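For example, a sketch with placeholder identity-partition columns:

TRUNCATE TABLE orders PARTITION (customer_id = 1, first_name = 'John');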
The partition specification supports only identity-partition columns. Transform columns in the partition specification are not supported.

DROP TABLE🔗
Tables can be dropped using the DROP TABLE command:
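A sketch of the general form (table_name is a placeholder; IF EXISTS and PURGE are optional):

DROP TABLE [IF EXISTS] table_name [PURGE];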
METADATA LOCATION🔗
The metadata location (snapshot location) can only be changed if the new path contains the exact same metadata JSON. It can be done only after migrating the table to Iceberg; the two operations cannot be done in one step.
ALTER TABLE t SET TBLPROPERTIES ('metadata_location'='<path>/hivemetadata/00003-a1ada2b8-fc86-4b5b-8c91-400b6b46d0f2.metadata.json');
DML Commands🔗
SELECT🔗
Select statements work the same on Iceberg tables in Hive. You will see the Iceberg benefits over Hive in compilation and execution:
- No file system listings - especially important on blob stores, like S3
- No partition listing from the Metastore
- Advanced partition filtering - the partition keys are not needed in the queries when they could be calculated
- Can handle a higher number of partitions than normal Hive tables
Here are the features highlights for Iceberg Hive read support:
- Predicate pushdown: Pushdown of the Hive SQL WHERE clause has been implemented so that these filters are used at the Iceberg TableScan level as well as by the Parquet and ORC readers.
- Column projection: Columns from the Hive SQL SELECT clause are projected down to the Iceberg readers to reduce the number of columns read.
- Hive query engines:
  - With Hive 2.3.x, 3.1.x both the MapReduce and Tez query execution engines are supported.
  - With Hive 4.0.0-alpha-1 the Tez query execution engine is supported.
Some of the advanced / little used optimizations are not yet implemented for Iceberg tables, so you should check your individual queries. Also, currently the statistics stored in the MetaStore are used for query planning. This is something we are planning to improve in the future.
Hive 4 supports select operations on branches, which work similarly to table-level select operations. However, the branch must be referenced as follows:
-- Branches should be specified as <database_name>.<table_name>.branch_<branch_name>
SELECT * FROM default.test.branch_branch1;
INSERT INTO🔗
Hive supports the standard single-table INSERT INTO operation:
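For example (placeholder table name and values; the second statement's source query is elided):

INSERT INTO table_a VALUES ('a', 1);
INSERT INTO table_a SELECT ...;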
Multi-table insert is also supported, but it will not be atomic. Commits occur one table at a time. Partial changes will be visible during the commit process and failures can leave partial changes committed. Changes within a single table will remain atomic.
Insert-into operations on branches also work similarly to table-level insert operations. However, the branch must be referenced as follows:
-- Branches should be specified as <database_name>.<table_name>.branch_<branch_name>
INSERT INTO default.test.branch_branch1 VALUES ('a', 1);
INSERT INTO default.test.branch_branch1 SELECT ...;
Here is an example of inserting into multiple tables at once in Hive SQL:
FROM customers
INSERT INTO target1 SELECT customer_id, first_name
INSERT INTO target2 SELECT last_name, customer_id;
INSERT INTO ... PARTITION🔗
Hive 4 supports partition-level INSERT INTO operation:
INSERT INTO table_a PARTITION (customer_id = 1, first_name = 'John') VALUES (1, 2);
INSERT INTO table_a PARTITION (customer_id = 1, first_name = 'John') SELECT ...;
INSERT OVERWRITE🔗
INSERT OVERWRITE can replace data in the table with the result of a query. Overwrites are atomic operations for Iceberg tables. For nonpartitioned tables the content of the table is always removed. For partitioned tables the partitions that have rows produced by the SELECT query will be replaced.
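For example (placeholder table names):

INSERT OVERWRITE TABLE target SELECT * FROM source;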
INSERT OVERWRITE ... PARTITION🔗
Hive 4 supports partition-level INSERT OVERWRITE operation:
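For example, a sketch assuming an identity partition column dept (all names are placeholders):

INSERT OVERWRITE TABLE target PARTITION (dept = 'hr') SELECT * FROM source;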
The partition specification supports only identity-partition columns. Transform columns in the partition specification are not supported.

DELETE FROM🔗
Hive 4 supports DELETE FROM queries to remove data from tables.
Delete queries accept a filter to match rows to delete.
DELETE FROM target WHERE id > 1 AND id < 10;
DELETE FROM target WHERE id IN (SELECT id FROM source);
DELETE FROM target WHERE id IN (SELECT min(customer_id) FROM source);
UPDATE🔗
Hive 4 supports UPDATE queries which accept a filter to match rows to update.
UPDATE target SET first_name = 'Raj' WHERE id > 1 AND id < 10;
UPDATE target SET first_name = 'Raj' WHERE id IN (SELECT id FROM source);
UPDATE target SET first_name = 'Raj' WHERE id IN (SELECT min(customer_id) FROM source);
MERGE INTO🔗
Hive 4 added support for MERGE INTO queries that can express row-level updates.
MERGE INTO updates a table, called the target table, using a set of updates from another query, called the source. The update for a row in the target table is found using the ON clause that is like a join condition.
MERGE INTO target AS t        -- a target table
USING source s                -- the source updates
ON t.id = s.id                -- condition to find updates for target rows
WHEN ...                      -- updates
Updates to rows in the target table are listed using WHEN MATCHED ... THEN .... Multiple MATCHED clauses can be added with conditions that determine when each match should be applied. The first matching expression is used.
WHEN MATCHED AND s.op = 'delete' THEN DELETE
WHEN MATCHED AND t.count IS NULL AND s.op = 'increment' THEN UPDATE SET t.count = 0
WHEN MATCHED AND s.op = 'increment' THEN UPDATE SET t.count = t.count + 1
Source rows (updates) that do not match can be inserted:
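A sketch of the insert clause (the column references are placeholders matching the target schema):

WHEN NOT MATCHED THEN INSERT VALUES (s.a, s.b, s.c)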
Only one record in the source data can update any given row of the target table, or else an error will be thrown.

QUERYING METADATA TABLES🔗
Hive supports querying of the Iceberg metadata tables. The tables could be used as normal Hive tables, so it is possible to use projections / joins / filters / etc. To reference a metadata table the full name of the table should be used, like:
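For example (database and table names are placeholders):

SELECT * FROM default.table_a.files;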
Currently the following metadata tables are available in Hive:
- all_data_files
- all_delete_files
- all_entries
- all_files
- all_manifests
- data_files
- delete_files
- entries
- files
- manifests
- metadata_log_entries
- partitions
- refs
- snapshots
TIMETRAVEL🔗
Hive supports snapshot ID based and time based time travel queries. For these views it is possible to use projections / joins / filters / etc. The function is available with the following syntax:
SELECT * FROM table_a FOR SYSTEM_TIME AS OF '2021-08-09 10:35:57';
SELECT * FROM table_a FOR SYSTEM_VERSION AS OF 1234567;
You can expire snapshots of an Iceberg table using an ALTER TABLE query from Hive. You should periodically expire snapshots to delete data files that are no longer needed, and to reduce the size of table metadata.
Each write to an Iceberg table from Hive creates a new snapshot, or version, of a table. Snapshots can be used for time-travel queries, or the table can be rolled back to any valid snapshot. Snapshots accumulate until they are expired by the expire_snapshots operation. Enter a query to expire snapshots having the following timestamp: 2021-12-09 05:39:18.689000000
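A sketch of such a query (the table name is a placeholder):

ALTER TABLE test_table EXECUTE expire_snapshots('2021-12-09 05:39:18.689000000');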
Type compatibility🔗
Hive and Iceberg support different sets of types. Iceberg can perform type conversion automatically, but not for all combinations, so you may want to understand the type conversion in Iceberg before designing the types of columns in your tables. You can enable auto-conversion through Hadoop configuration (not enabled by default):
Config key | Default | Description |
---|---|---|
iceberg.mr.schema.auto.conversion | false | if Hive should perform type auto-conversion |
Hive type to Iceberg type🔗
This type conversion table describes how Hive types are converted to the Iceberg types. The conversion applies both when creating an Iceberg table and when writing to an Iceberg table via Hive.
Hive | Iceberg | Notes |
---|---|---|
boolean | boolean | |
short | integer | auto-conversion |
byte | integer | auto-conversion |
integer | integer | |
long | long | |
float | float | |
double | double | |
date | date | |
timestamp | timestamp without timezone | |
timestamplocaltz | timestamp with timezone | Hive 3 only |
interval_year_month | not supported | |
interval_day_time | not supported | |
char | string | auto-conversion |
varchar | string | auto-conversion |
string | string | |
binary | binary | |
decimal | decimal | |
struct | struct | |
list | list | |
map | map | |
union | not supported |
Table rollback🔗
Rolling back an Iceberg table's data to the state at an older table snapshot.
Rollback to the last snapshot before a specific timestamp
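For example (table name and timestamp are placeholders):

ALTER TABLE ice_t EXECUTE ROLLBACK('2022-05-12 00:00:00');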
Rollback to a specific snapshot ID
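For example (table name and snapshot ID are placeholders):

ALTER TABLE ice_t EXECUTE ROLLBACK(1111);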
Compaction🔗
Hive 4 supports full table compaction of Iceberg tables using the following commands (examples below):
- Using the ALTER TABLE ... COMPACT syntax
- Using the OPTIMIZE TABLE ... REWRITE DATA syntax
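A sketch of both forms (the table name is a placeholder):

ALTER TABLE t COMPACT 'major';
OPTIMIZE TABLE t REWRITE DATA;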