Movatterモバイル変換


[0]ホーム

URL:


Skip to content

Navigation Menu

Sign in
Appearance settings

Search code, repositories, users, issues, pull requests...

Provide feedback

We read every piece of feedback, and take your input very seriously.

Saved searches

Use saved searches to filter your results more quickly

Sign up
Appearance settings

Use 20-bit base2 algorithm andwrite.object-storage.partitioned-paths property in Iceberg ObjectStoreLocationProvider#27633

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to ourterms of service andprivacy statement. We’ll occasionally send you account related emails.

Already on GitHub?Sign in to your account

Open
ebyhr wants to merge2 commits intotrinodb:master
base:master
Choose a base branch
Loading
fromebyhr:ebi/iceberg-object-location

Conversation

@ebyhr
Copy link
Member

@ebyhrebyhr commentedDec 12, 2025
edited
Loading

Description

apache/iceberg#11112 changed the hash scheme from 32-bit base64 to 20-bit base2 inObjectStoreLocationProvider. The first commit follows the upstream change.

The second commit adds support forwrite.object-storage.partitioned-paths table property.
The property avoids including partition values in the location if it's disabled.

Release notes

##Iceberg* Add support for`write.object-storage.partitioned-paths` Iceberg table property. ({issue}`27633`)

findepi reacted with thumbs up emoji
@github-actionsgithub-actionsbot added the icebergIceberg connector labelDec 12, 2025
@findepi
Copy link
Member

apache/iceberg#11112 changed the hash scheme from 32-bit base64 to 20-bit base2 inObjectStoreLocationProvider. The first commit follows the upstream change.

the hash is derived from partitions, or is effectively a random number?

The second commit adds support forwrite.object-storage.partitioned-paths table property.
The property avoids including partition values in the location if it's disabled.

when the partitioning is not disabled, the hash part is written in the path before or after the partitions string?

@findepi
Copy link
Member

does this change require any docs update?

@ebyhr
Copy link
MemberAuthor

the hash is derived from partitions, or is effectively a random number?

It's derived from partitions + file names. Since the file name includes the query ID and a UUID, it's effectively random.

when the partitioning is not disabled, the hash part is written in the path before or after the partitions string?

The hash part comes first. e.g.s3://table-location/data/1101/1101/0101/10100010/part=2/...

does this change require any docs update?

Good question. I didn't update the docs because I thought this was mostly an implementation detail.

findepi reacted with thumbs up emoji

{
privatestaticfinalHashFunctionHASH_FUNC =murmur3_32_fixed();
privatestaticfinalBase64.EncoderBASE64_ENCODER =Base64.getUrlEncoder().withoutPadding();
// Length of entropy generated in the file location
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

Can we also linkLocationProviders code snippethttps://github.com/apache/iceberg/blob/849f218d8d9a6783ba04d2d4eafc07754ec3885c/core/src/main/java/org/apache/iceberg/LocationProviders.java#L131-L136 here ?
These values may seem rather mystical to a naive reader of thetrino-iceberg code.

Copy link
MemberAuthor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

We already have a link here to the class. This is sufficient to me:

// based on org.apache.iceberg.LocationProviders.ObjectStoreLocationProvider

findinpath reacted with thumbs up emoji
@findinpath
Copy link
Contributor

does this change require any docs update?

this was mostly an implementation detail.

The functionality stays unchanged. The only benefit ispotentially improved object store layout for AWS S3.
I think that using a consistent object layout scheme to whatapache/iceberg is using is worth calling in RN.

this.storageLocation =stripTrailingSlash(dataLocation(properties,tableLocation));
// if the storage location is within the table prefix, don't add table and database name context
this.context =storageLocation.startsWith(stripTrailingSlash(tableLocation)) ?null :pathContext(tableLocation);
this.includePartitionPaths =propertyAsBoolean(properties,WRITE_OBJECT_STORE_PARTITIONED_PATHS,WRITE_OBJECT_STORE_PARTITIONED_PATHS_DEFAULT);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

nit: skimming through the code, I see that the previous maintainers opted forTableProperties. approach. Consider a consistent referencing scheme within the class for the constants fromTableProperties class.

privatestaticfinalintENTROPY_DIR_DEPTH =3;
privatefinalStringstorageLocation;
privatefinalStringcontext;
privatefinalbooleanincludePartitionPaths;
Copy link
Contributor

@findinpathfindinpathDec 12, 2025
edited
Loading

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

Given that we don't have a dedicated table property in the Trino Iceberg connector, I'm advocating for a compatibility product tests in spark to ensure end to end that Trino does honor thewrite.object-storage.partitioned-paths table property.

The unit test is fine, but it doesn't give us full certainty that the entire write operation (and eventual read operation afterwards) succeeds through the query engine.

@ebyhrebyhrforce-pushed theebi/iceberg-object-location branch fromb3eb612 tofdb765cCompareDecember 12, 2025 13:05
@findepi
Copy link
Member

the hash is derived from partitions, or is effectively a random number?

It's derived from partitions + file names. Since the file name includes the query ID and a UUID, it's effectively random.

if it's effectively just random, i don't get the point of trying to derive it from anything. It's confusing.

does this change require any docs update?

Good question. I didn't update the docs because I thought this was mostly an implementation detail.

in a sense it's not. when using this you may also need to talk to AWS support and have them tweak your S3 bucket configuration.

Sign up for freeto join this conversation on GitHub. Already have an account?Sign in to comment

Reviewers

@raunaqmorarkaraunaqmorarkaraunaqmorarka approved these changes

@findepifindepiAwaiting requested review from findepi

@chenjian2664chenjian2664Awaiting requested review from chenjian2664

+1 more reviewer

@findinpathfindinpathfindinpath left review comments

Reviewers whose approvals may not affect merge requirements

Assignees

No one assigned

Labels

cla-signedicebergIceberg connector

Milestone

No milestone

Development

Successfully merging this pull request may close these issues.

4 participants

@ebyhr@findepi@findinpath@raunaqmorarka

[8]ページ先頭

©2009-2025 Movatter.jp