Movatterモバイル変換


[0]ホーム

URL:


Skip to content

Navigation Menu

Sign in
Appearance settings

Search code, repositories, users, issues, pull requests...

Provide feedback

We read every piece of feedback, and take your input very seriously.

Saved searches

Use saved searches to filter your results more quickly

Sign up
Appearance settings

Fixed geo point block loader slowness#136147

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to ourterms of service andprivacy statement. We’ll occasionally send you account related emails.

Already on GitHub?Sign in to your account

Draft
Kubik42 wants to merge3 commits intoelastic:main
base:main
Choose a base branch
Loading
fromKubik42:kubik-geo-point-block-loader-fix

Conversation

Kubik42
Copy link
Contributor

This addresses#135891

TODO: write up explanation of what went wrong and his this PR fixes it

@elasticsearchmachine
Copy link
Collaborator

Hi@Kubik42, I've created a changelog YAML for you.

@Kubik42Kubik42force-pushed thekubik-geo-point-block-loader-fix branch frome7afa0b to1f168f2CompareOctober 8, 2025 03:10
@Kubik42
Copy link
ContributorAuthor

Kubik42 commentedOct 8, 2025
edited
Loading

The reason for failing tests ishere. Whenever there is noFieldExtractPreference (ie.FieldExtractPreference.NONE, just like in the SDH),geo_points are mapped toBytesRef. This means that when the block loader sanity checker runs, it expects the underlying element type to beBytesRef. However, since we've now started to rely on doc_values, the type is actuallyLONG.

Perhaps the bigger problem here is thatNONE is being assigned whenDOC_VALUES should be the one being assigned. This would normally not be an issue since field types generally use the same underlying types for storing values, regardless of whether they're using doc_values vs stored fields. However, geo_points are special and the way we store them changes. More specifically, if doc_values are available, we store them asLongs and when the stored flag is toggled, we store them asBytesRef; although technically itsStrings, but more on that later.

@Kubik42
Copy link
ContributorAuthor

Kubik42 commentedOct 8, 2025
edited
Loading

In my rabbit hole of understanding how BlockLoaders work, I found a somewhat of a bug thats in main. When ageo_point is specified withstore = true anddoc_values = false, we end up returning an imprecise result.

For example, the following command fails:

./gradlew ":x-pack:plugin:logsdb:yamlRestTest" --tests "org.elasticsearch.xpack.logsdb.LogsdbTestSuiteIT.test {yaml=/51_esql_synthetic_source/Simple from}" -Dtests.seed=76E814C77C112204 -Dtests.locale=xh -Dtests.timezone=Europe/Saratov -Druntime.java=25

with:

        Expected: "POINT (-74.00600004941225 40.712799984030426)"             but: was "POINT (-74.006 40.7128)"            at org.hamcrest.MatcherAssert.assertThat(MatcherAssert.java:20)            at org.junit.Assert.assertThat(Assert.java:964)            at org.junit.Assert.assertThat(Assert.java:930)            at org.elasticsearch.test.rest.yaml.section.MatchAssertion.doAssert(MatchAssertion.java:100)            at org.elasticsearch.test.rest.yaml.section.Assertion.execute(Assertion.java:68)            at org.elasticsearch.test.rest.yaml.ESClientYamlSuiteTestCase.executeSection(ESClientYamlSuiteTestCase.java:522)            ... 41 more

Based on what I've read in the code, this precision loss isexpected. That being said, its not documentedpublicly.

@Kubik42
Copy link
ContributorAuthor

Kubik42 commentedOct 8, 2025
edited
Loading

There is also a problem with usingBlockStoredFieldsReader.BytesFromStringsBlockLoader(name()) even when the field is stored. This is despite the fact that during indexing, theStoredField is created with a String representation of the geo_point;code.

When block loading, I get the error:Unknown geometry type: 825699888, which comes fromhere.

My guess is that it has something to do with the geo_point being stored as a String, rather than being converted to WKT first. That said, even when I usedWellKnownText.toWKT() I still received a similar error -Unknown geometry type: 1414416719.

For now, I've just followed what we do inGeoShapeWithDocValuesFieldMapper and changed the block loader toBlockStoredFieldsReader.BytesFromBytesRefsBlockLoader, which has worked.

This needs further discussion.

@Kubik42Kubik42force-pushed thekubik-geo-point-block-loader-fix branch from1f168f2 tob586f4cCompareOctober 9, 2025 01:08
}
if (store) {
context.doc().add(newStoredField(fieldType().name(),geometry.toString()));
context.doc()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

I don't think we should be changing how geo points are stored as stored fields.

If compute engine is expecting a different type with NONE field extract preference, maybe we should just give that without falling back to source based block loader? By implementing a different block loader for this case that mimics what its synthetic source support does?

I think want to implement a block loader that return WKB binary format that is converted from numeric doc values. This is kind of what happens now with source based block loader for geoip field type (but then by synthesizing all fields).

@Kubik42
Copy link
ContributorAuthor

Kubik42 commentedOct 9, 2025
edited
Loading

I did a little more investigating and found that the reason we end up withFieldExtractPreference.NONE is because we initializeFieldExtractExechere, where bothdocValuesAttributes andboundsAttributes are empty sets. This leads to thefollowing check failing, which ultimately returnsdefaultPreference, which isFieldExtractPreference.NONE.

This got me thinking, why are we setting the preference toNONE for attributes with doc_values? Shouldn't it beDOC_VALUES? If we think about it,NONE implies we can load the values however we want, so we're not restricted to using doc_values for that. This made me notice that insidethis function, the attribute itself contains aDataType, which in turns exposeshasDocValues(). So, ultimately, wouldn't it make more sense to just do:

if (docValuesAttributes.contains(attr) || attr.dataType().hasDocValues()) {    return MappedFieldType.FieldExtractPreference.DOC_VALUES;}

However, doing so causes some type inequality exceptions.

Nhat pointed outthis code, which suggests that doc_values shouldn't be always be used. However, that conflicts with the definition ofFieldExtractPreference.NONE, which states that we can use anything.

@Kubik42Kubik42force-pushed thekubik-geo-point-block-loader-fix branch frome9ba832 to7176d9eCompareOctober 10, 2025 22:58
@Kubik42Kubik42force-pushed thekubik-geo-point-block-loader-fix branch from7176d9e to8573b2fCompareOctober 10, 2025 23:04
@Kubik42Kubik42force-pushed thekubik-geo-point-block-loader-fix branch from4fa26e3 to4ee4a46CompareOctober 11, 2025 01:37
@Kubik42Kubik42force-pushed thekubik-geo-point-block-loader-fix branch fromadc51f9 to5cedfbaCompareOctober 13, 2025 04:52
Copy link
Member

@martijnvgmartijnvg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

I left a few comments.

* This implies that we need to load the value from _source. This however is very slow, especially when synthetic source is enabled.
* We're better off reading from doc_values and converting to BytesRef to satisfy the checker. This is what this block loader is for.
*/
staticclassBytesRefFromLongsBlockLoaderextendsBlockDocValuesReader.DocValuesBlockLoader {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

seal classes?

privatefinalbooleanhasDocValuesSkipper;

privateIndexType(
IndexType(
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

I don't think is intended even for tests. Can you use one of the static methods in this class?

}

@Override
publicTestBlockbuild() {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

why can this be removed now?


// then
// verify that we use the correct block value reader
assertThat(loader,instanceOf(BlockSourceReader.GeometriesBlockLoader.class));
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

Maybe I misunderstand this, but stored field and stored preference is handled yet in the block loader method?

Sign up for freeto join this conversation on GitHub. Already have an account?Sign in to comment

Reviewers

@martijnvgmartijnvgmartijnvg left review comments

At least 1 approving review is required to merge this pull request.

Assignees

No one assigned

Labels

>bug:StorageEngine/MappingThe storage related side of mappingsTeam:StorageEnginev9.3.0

Projects

None yet

Milestone

No milestone

Development

Successfully merging this pull request may close these issues.

3 participants

@Kubik42@elasticsearchmachine@martijnvg

[8]ページ先頭

©2009-2025 Movatter.jp