
Use 20-bit base2 algorithm and `write.object-storage.partitioned-paths` property in Iceberg ObjectStoreLocationProvider #27633


Open

ebyhr wants to merge 2 commits into trinodb:master from ebyhr:ebi/iceberg-object-location

+128 −16

Conversation

Member

ebyhr commented Dec 12, 2025 (edited)

Description

apache/iceberg#11112 changed the hash scheme from 32-bit base64 to 20-bit base2 in `ObjectStoreLocationProvider`. The first commit follows the upstream change.

The second commit adds support for the `write.object-storage.partitioned-paths` table property.
When the property is disabled, partition values are not included in the file location.
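
A minimal sketch of what a 20-bit base2 entropy prefix could look like, for readers unfamiliar with the scheme. The 20-bit width comes from the description above and `ENTROPY_DIR_DEPTH = 3` from the diff in this PR; the 4-character directory length is an assumption, and `CRC32` is only a standard-library stand-in for the murmur3 hash used by the real code. The class and method names are hypothetical.

```java
import java.nio.charset.StandardCharsets;
import java.util.zip.CRC32;

// Hypothetical illustration, not the Trino or Iceberg implementation.
class EntropyPathSketch {
    static final int HASH_BITS = 20;          // from the PR description
    static final int ENTROPY_DIR_DEPTH = 3;   // from the diff in this PR
    static final int ENTROPY_DIR_LENGTH = 4;  // assumption

    // Returns the low 20 bits of the hash as a string of '0' and '1'.
    static String binaryHash(String name) {
        CRC32 crc = new CRC32(); // stand-in for murmur3_32_fixed()
        crc.update(name.getBytes(StandardCharsets.UTF_8));
        int hash = (int) crc.getValue();
        // Setting the top bit guarantees a full 32-character binary string.
        String bits = Integer.toBinaryString(hash | Integer.MIN_VALUE);
        return bits.substring(bits.length() - HASH_BITS);
    }

    // Splits the bits into ENTROPY_DIR_DEPTH short directories plus the remainder.
    static String toDirs(String bits) {
        StringBuilder dirs = new StringBuilder();
        int pos = 0;
        for (int i = 0; i < ENTROPY_DIR_DEPTH; i++) {
            dirs.append(bits, pos, pos + ENTROPY_DIR_LENGTH).append('/');
            pos += ENTROPY_DIR_LENGTH;
        }
        return dirs.append(bits.substring(pos)).toString();
    }

    public static void main(String[] args) {
        // Prints a prefix shaped like bbbb/bbbb/bbbb/bbbbbbbb
        System.out.println(toDirs(binaryHash("example-file.parquet")));
    }
}
```

Under these assumptions, 20 bits split as 4 + 4 + 4 + 8 give three shallow directory levels followed by one longer segment.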

Release notes

## Iceberg
* Add support for the `write.object-storage.partitioned-paths` Iceberg table property. ({issue}`27633`)

cla-bot (bot) added the cla-signed label

Dec 12, 2025

github-actions (bot) added the iceberg (Iceberg connector) label

Dec 12, 2025

Use 20-bit base2 algorithm in Iceberg ObjectStoreLocationProvider

c417e04

ebyhr force-pushed the ebi/iceberg-object-location branch from 854a0a0 to b3eb612

December 12, 2025 07:55

ebyhr requested review from chenjian2664, findepi, findinpath and raunaqmorarka

December 12, 2025 08:58

raunaqmorarka approved these changes

Dec 12, 2025


Member

findepi commented Dec 12, 2025

> apache/iceberg#11112 changed the hash scheme from 32-bit base64 to 20-bit base2 in `ObjectStoreLocationProvider`. The first commit follows the upstream change.

the hash is derived from partitions, or is effectively a random number?

> The second commit adds support for `write.object-storage.partitioned-paths` table property.
> The property avoids including partition values in the location if it's disabled.

when the partitioning is not disabled, the hash part is written in the path before or after the partitions string?

Member

findepi commented Dec 12, 2025

does this change require any docs update?

Member, Author

Can we also link the `LocationProviders` code snippet https://github.com/apache/iceberg/blob/849f218d8d9a6783ba04d2d4eafc07754ec3885c/core/src/main/java/org/apache/iceberg/LocationProviders.java#L131-L136 here?
These values may seem rather mystical to a naive reader of the trino-iceberg code.

Member, Author

We already have a link here to the class. This is sufficient to me:

trino/plugin/trino-iceberg/src/main/java/io/trino/plugin/iceberg/util/ObjectStoreLocationProvider.java

Line 30 in 400e081

// based on org.apache.iceberg.LocationProviders.ObjectStoreLocationProvider

Contributor

findinpath commented Dec 12, 2025

> does this change require any docs update?

> this was mostly an implementation detail.

The functionality stays unchanged. The only benefit is a potentially improved object store layout for AWS S3.
I think that using an object layout scheme consistent with what apache/iceberg uses is worth calling out in the release notes.

findinpath reviewed

Dec 12, 2025


...in/trino-iceberg/src/main/java/io/trino/plugin/iceberg/util/ObjectStoreLocationProvider.java

		this.storageLocation = stripTrailingSlash(dataLocation(properties, tableLocation));
		// if the storage location is within the table prefix, don't add table and database name context
		this.context = storageLocation.startsWith(stripTrailingSlash(tableLocation)) ? null : pathContext(tableLocation);
		this.includePartitionPaths = propertyAsBoolean(properties, WRITE_OBJECT_STORE_PARTITIONED_PATHS, WRITE_OBJECT_STORE_PARTITIONED_PATHS_DEFAULT);

Contributor

findinpath Dec 12, 2025

nit: skimming through the code, I see that the previous maintainers opted for the `TableProperties.`-prefixed approach. Consider a consistent referencing scheme within the class for the constants from the `TableProperties` class.
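
For readers following the constructor line above that reads `WRITE_OBJECT_STORE_PARTITIONED_PATHS`, here is a rough sketch of how such a boolean table property might steer the file location. The property name is taken from this PR; the default of `true`, the placement of the hash before the partition path, and the `-` separator used when partition paths are disabled are all assumptions that this thread does not confirm. `PartitionedPathsSketch` and its `newDataLocation` signature are hypothetical.

```java
import java.util.Map;

// Hypothetical illustration, not the Trino or Iceberg implementation.
class PartitionedPathsSketch {
    static final String WRITE_OBJECT_STORE_PARTITIONED_PATHS = "write.object-storage.partitioned-paths";
    static final boolean WRITE_OBJECT_STORE_PARTITIONED_PATHS_DEFAULT = true; // assumption

    // Simplified stand-in for a propertyAsBoolean helper: missing key means default.
    static boolean propertyAsBoolean(Map<String, String> properties, String key, boolean defaultValue) {
        String value = properties.get(key);
        return value == null ? defaultValue : Boolean.parseBoolean(value);
    }

    // hashDirs is an already computed entropy prefix such as "0101/0110/1001/10110010".
    static String newDataLocation(
            Map<String, String> properties,
            String storageLocation,
            String hashDirs,
            String partitionPath,
            String fileName) {
        boolean includePartitionPaths = propertyAsBoolean(
                properties,
                WRITE_OBJECT_STORE_PARTITIONED_PATHS,
                WRITE_OBJECT_STORE_PARTITIONED_PATHS_DEFAULT);
        if (!includePartitionPaths) {
            // Assumed layout: partition values are omitted entirely.
            return storageLocation + "/" + hashDirs + "-" + fileName;
        }
        if (partitionPath.isEmpty()) {
            return storageLocation + "/" + hashDirs + "/" + fileName;
        }
        // Assumed layout: entropy prefix first, then the partition path.
        return storageLocation + "/" + hashDirs + "/" + partitionPath + "/" + fileName;
    }

    public static void main(String[] args) {
        Map<String, String> disabled = Map.of(WRITE_OBJECT_STORE_PARTITIONED_PATHS, "false");
        System.out.println(newDataLocation(Map.of(), "s3://bucket/data", "0101/0110/1001/10110010", "ds=2025-12-12", "f.parquet"));
        System.out.println(newDataLocation(disabled, "s3://bucket/data", "0101/0110/1001/10110010", "ds=2025-12-12", "f.parquet"));
    }
}
```

Whatever the real layout turns out to be, the point of the sketch is only that a single boolean read once in the constructor decides whether `ds=2025-12-12`-style segments appear in the path.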

findinpath reviewed

Dec 12, 2025


...in/trino-iceberg/src/main/java/io/trino/plugin/iceberg/util/ObjectStoreLocationProvider.java

		private static final int ENTROPY_DIR_DEPTH = 3;
		private final String storageLocation;
		private final String context;
		private final boolean includePartitionPaths;

Contributor

findinpath Dec 12, 2025 (edited)

Given that we don't have a dedicated table property in the Trino Iceberg connector, I'm advocating for a compatibility product test with Spark to ensure end to end that Trino honors the `write.object-storage.partitioned-paths` table property.

The unit test is fine, but it doesn't give us full certainty that the entire write operation (and eventual read operation afterwards) succeeds through the query engine.

Respect write.object-storage.partitioned-paths in Iceberg

fdb765c

ebyhr force-pushed the ebi/iceberg-object-location branch from b3eb612 to fdb765c

December 12, 2025 13:05

Member

findepi commented Dec 12, 2025

> > the hash is derived from partitions, or is effectively a random number?
>
> It's derived from partitions + file names. Since the file name includes the query ID and a UUID, it's effectively random.

if it's effectively just random, i don't get the point of trying to derive it from anything. It's confusing.

> > does this change require any docs update?
>
> Good question. I didn't update the docs because I thought this was mostly an implementation detail.

in a sense it's not. when using this you may also need to talk to AWS support and have them tweak your S3 bucket configuration.

Labels: cla-signed, iceberg (Iceberg connector)

		{
		private static final HashFunction HASH_FUNC = murmur3_32_fixed();
		private static final Base64.Encoder BASE64_ENCODER = Base64.getUrlEncoder().withoutPadding();
		// Length of entropy generated in the file location
