Ingest data with semantic_text fields

This page provides instructions for ingesting data into semantic_text fields. Learn how to index pre-chunked content, use copy_to and multi-fields to collect values from multiple fields, and perform updates and partial updates to optimize ingestion costs.

To index pre-chunked content, provide your text as an array of strings. Each element in the array represents a single chunk that will be sent directly to the inference service without further chunking.

  1. Disable automatic chunking

    Disable automatic chunking in your index mapping by setting chunking_settings.strategy to none:

    PUT test-index
    {
      "mappings": {
        "properties": {
          "my_semantic_field": {
            "type": "semantic_text",
            "chunking_settings": {
              "strategy": "none"
            }
          }
        }
      }
    }
    1. Disables automatic chunking on my_semantic_field.
  2. Index documents

    Index documents with pre-chunked text as an array:

    PUT test-index/_doc/1
    {
      "my_semantic_field": ["my first chunk", "my second chunk", ...]
      ...
    }
    1. The text is pre-chunked and provided as an array of strings. Each element represents a single chunk.
Important

When providing pre-chunked input:

  • Ensure that you set the chunking strategy to none to avoid additional processing.
  • Size each chunk carefully, staying within the token limit of the inference service and the underlying model.
  • If a chunk exceeds the model's token limit, the behavior depends on the service:
    • Some services (such as OpenAI) will return an error.
    • Others (such as elastic and elasticsearch) will automatically truncate the input.
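
One way to prepare pre-chunked input is to split the text on the client before indexing. The following is a minimal sketch, not a recommended chunking method: it uses a whitespace word count as a rough stand-in for the model's token count, which real tokenizers will not match exactly. The helper names and the max_words parameter are hypothetical, so pick a budget with a safety margin below your inference service's actual limit.

```python
import json


def chunk_by_words(text: str, max_words: int) -> list[str]:
    """Split text into chunks of at most max_words whitespace-separated words.

    Assumption: word count is only an approximation of token count.
    """
    if max_words < 1:
        raise ValueError("max_words must be at least 1")
    words = text.split()
    return [
        " ".join(words[start:start + max_words])
        for start in range(0, len(words), max_words)
    ]


def build_document(text: str, max_words: int) -> dict:
    """Build a document body whose semantic_text field is an array of chunks."""
    return {"my_semantic_field": chunk_by_words(text, max_words)}


# Request body for PUT test-index/_doc/1 (sending the request is left out).
doc = build_document("one two three four five", max_words=2)
print(json.dumps(doc))
# prints {"my_semantic_field": ["one two", "three four", "five"]}
```

Because the chunking strategy is none, each array element reaches the inference service unchanged, so the budget chosen here is the only guard against oversized chunks.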

You can use a single semantic_text field to collect values from multiple fields for semantic search. The semantic_text field type can serve as the target of copy_to fields, be part of a multi-field structure, or contain multi-fields internally.

Use copy_to to copy values from source fields to a semantic_text field:

PUT test-index
{
  "mappings": {
    "properties": {
      "source_field": {
        "type": "text",
        "copy_to": "infer_field"
      },
      "infer_field": {
        "type": "semantic_text",
        "inference_id": ".elser-2-elasticsearch"
      }
    }
  }
}
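
When several source fields should feed the same semantic_text field, the mapping repeats the same copy_to entry for each of them. The sketch below generates such a mapping body in Python. The function name and the title and body field names are hypothetical; the inference ID is the one used in the request above.

```python
import json


def build_copy_to_mapping(source_fields, target_field, inference_id):
    """Build a mapping where each text source field copies into one semantic_text field."""
    if target_field in source_fields:
        raise ValueError("target_field must not also be a source field")
    properties = {
        name: {"type": "text", "copy_to": target_field}
        for name in source_fields
    }
    properties[target_field] = {
        "type": "semantic_text",
        "inference_id": inference_id,
    }
    return {"mappings": {"properties": properties}}


# Request body for PUT test-index; "title" and "body" are illustrative names.
mapping = build_copy_to_mapping(
    ["title", "body"], "infer_field", ".elser-2-elasticsearch"
)
print(json.dumps(mapping, indent=2))
```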

Declare semantic_text as a multi-field:

PUT test-index
{
  "mappings": {
    "properties": {
      "source_field": {
        "type": "text",
        "fields": {
          "infer_field": {
            "type": "semantic_text",
            "inference_id": ".elser-2-elasticsearch"
          }
        }
      }
    }
  }
}

When updating documents that contain semantic_text fields, it's important to understand how inference is triggered:

Full document updates
Full document updates re-run inference on all semantic_text fields, even if their values did not change. This ensures that embeddings remain consistent with the current document state but can increase ingestion costs.
Partial updates using the Bulk API
Partial updates submitted through the Bulk API reuse existing embeddings when you omit semantic_text fields. Inference does not run for omitted fields, which can significantly reduce processing time and cost.
Partial updates using the Update API
Partial updates submitted through the Update API re-run inference on all semantic_text fields, even when you omit them from the doc object. Embeddings are re-generated regardless of whether field values changed.

To preserve existing embeddings and avoid unnecessary inference costs:

  • Use partial updates with the Bulk API.
  • Omit any semantic_text fields that did not change from the doc object in your request.
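
The two recommendations above can be sketched as a small helper that builds a Bulk API request body containing only the fields whose values changed. This is an illustration under assumptions: the function names are hypothetical, the caller is assumed to hold both the current and the desired field values, and the title field is invented for the example. The output is the newline-delimited JSON body for POST _bulk, with update actions that each carry a doc object.

```python
import json


def build_partial_doc(current: dict, desired: dict) -> dict:
    """Keep only fields that are new or whose value differs from the current one."""
    return {
        field: value
        for field, value in desired.items()
        if field not in current or current[field] != value
    }


def build_bulk_partial_updates(index: str, docs: dict) -> str:
    """Build an NDJSON bulk body of partial updates.

    docs maps a document ID to a (current, desired) pair of field dicts.
    Documents with no changed fields are skipped entirely.
    """
    lines = []
    for doc_id, (current, desired) in docs.items():
        partial = build_partial_doc(current, desired)
        if not partial:
            continue
        lines.append(json.dumps({"update": {"_index": index, "_id": doc_id}}))
        lines.append(json.dumps({"doc": partial}))
    # A bulk body must end with a newline; return "" when there is nothing to send.
    return "\n".join(lines) + "\n" if lines else ""


current = {"title": "Old title", "my_semantic_field": "unchanged text"}
desired = {"title": "New title", "my_semantic_field": "unchanged text"}
body = build_bulk_partial_updates("test-index", {"1": (current, desired)})
print(body, end="")
# prints:
# {"update": {"_index": "test-index", "_id": "1"}}
# {"doc": {"title": "New title"}}
```

Because my_semantic_field is unchanged, it is left out of the doc object, so per the behavior described above the existing embeddings are reused.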

For indices containing semantic_text fields, updates that use scripts have the following behavior:

  • Supported: Update API
  • Not supported: Bulk API. Scripted updates will fail even if the script targets non-semantic_text fields.
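
As a rough illustration of the rule above, the sketch below picks an API based on whether an update uses a script and builds a scripted request for the Update API. The helper names, the views field, and the script are hypothetical; the index name and document ID are assumed to be URL-safe, and sending the request is left out.

```python
import json


def choose_update_api(uses_script: bool) -> str:
    """Scripted updates must use the Update API; plain partial updates can use the Bulk API."""
    return "update" if uses_script else "bulk"


def build_scripted_update(index: str, doc_id: str, source: str, params: dict):
    """Return (method, path, body) for a scripted update through the Update API."""
    body = {"script": {"source": source, "lang": "painless", "params": params}}
    return "POST", f"/{index}/_update/{doc_id}", body


# "views" is an illustrative non-semantic_text field.
method, path, body = build_scripted_update(
    "test-index", "1", "ctx._source.views += params.count", {"count": 1}
)
print(choose_update_api(uses_script=True))  # prints update
print(method, path)  # prints POST /test-index/_update/1
print(json.dumps(body))
```

Routing scripted updates this way avoids the Bulk API failure, at the cost that inference re-runs on all semantic_text fields, as described for the Update API above.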
