- Notifications
You must be signed in to change notification settings - Fork3.4k
Use 20-bit base2 algorithm andwrite.object-storage.partitioned-paths property in Iceberg ObjectStoreLocationProvider#27633
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to ourterms of service andprivacy statement. We’ll occasionally send you account related emails.
Already on GitHub?Sign in to your account
base:master
Are you sure you want to change the base?
Conversation
854a0a0 tob3eb612Comparefindepi commentedDec 12, 2025
the hash is derived from partitions, or is effectively a random number?
when the partitioning is not disabled, the hash part is written in the path before or after the partitions string? |
findepi commentedDec 12, 2025
does this change require any docs update? |
ebyhr commentedDec 12, 2025
It's derived from partitions + file names. Since the file name includes the query ID and a UUID, it's effectively random.
The hash part comes first. e.g.
Good question. I didn't update the docs because I thought this was mostly an implementation detail. |
| { | ||
| privatestaticfinalHashFunctionHASH_FUNC =murmur3_32_fixed(); | ||
| privatestaticfinalBase64.EncoderBASE64_ENCODER =Base64.getUrlEncoder().withoutPadding(); | ||
| // Length of entropy generated in the file location |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others.Learn more.
Can we also linkLocationProviders code snippethttps://github.com/apache/iceberg/blob/849f218d8d9a6783ba04d2d4eafc07754ec3885c/core/src/main/java/org/apache/iceberg/LocationProviders.java#L131-L136 here ?
These values may seem rather mystical to a naive reader of thetrino-iceberg code.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others.Learn more.
We already have a link here to the class. This is sufficient to me:
Line 30 in400e081
| // based on org.apache.iceberg.LocationProviders.ObjectStoreLocationProvider |
findinpath commentedDec 12, 2025
The functionality stays unchanged. The only benefit ispotentially improved object store layout for AWS S3. |
| this.storageLocation =stripTrailingSlash(dataLocation(properties,tableLocation)); | ||
| // if the storage location is within the table prefix, don't add table and database name context | ||
| this.context =storageLocation.startsWith(stripTrailingSlash(tableLocation)) ?null :pathContext(tableLocation); | ||
| this.includePartitionPaths =propertyAsBoolean(properties,WRITE_OBJECT_STORE_PARTITIONED_PATHS,WRITE_OBJECT_STORE_PARTITIONED_PATHS_DEFAULT); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others.Learn more.
nit: skimming through the code, I see that the previous maintainers opted forTableProperties. approach. Consider a consistent referencing scheme within the class for the constants fromTableProperties class.
| privatestaticfinalintENTROPY_DIR_DEPTH =3; | ||
| privatefinalStringstorageLocation; | ||
| privatefinalStringcontext; | ||
| privatefinalbooleanincludePartitionPaths; |
findinpathDec 12, 2025 • edited
Loading Uh oh!
There was an error while loading.Please reload this page.
edited
Uh oh!
There was an error while loading.Please reload this page.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others.Learn more.
Given that we don't have a dedicated table property in the Trino Iceberg connector, I'm advocating for a compatibility product tests in spark to ensure end to end that Trino does honor thewrite.object-storage.partitioned-paths table property.
The unit test is fine, but it doesn't give us full certainty that the entire write operation (and eventual read operation afterwards) succeeds through the query engine.
b3eb612 tofdb765cComparefindepi commentedDec 12, 2025
if it's effectively just random, i don't get the point of trying to derive it from anything. It's confusing.
in a sense it's not. when using this you may also need to talk to AWS support and have them tweak your S3 bucket configuration. |
Uh oh!
There was an error while loading.Please reload this page.
Description
apache/iceberg#11112 changed the hash scheme from 32-bit base64 to 20-bit base2 inObjectStoreLocationProvider. The first commit follows the upstream change.
The second commit adds support for
write.object-storage.partitioned-pathstable property.The property avoids including partition values in the location if it's disabled.
Release notes