Embedding Documents using Optimized and Quantized Embedders

This example embeds documents using quantized embedders.

The embedders are based on optimized models, created using optimum-intel and IPEX.

Example text is based on SBERT.

from langchain_community.embeddings import QuantizedBiEncoderEmbeddings

model_name = "Intel/bge-small-en-v1.5-rag-int8-static"
encode_kwargs = {"normalize_embeddings": True}  # set True to compute cosine similarity

model = QuantizedBiEncoderEmbeddings(
    model_name=model_name,
    encode_kwargs=encode_kwargs,
    query_instruction="Represent this sentence for searching relevant passages: ",
)

API Reference: QuantizedBiEncoderEmbeddings

loading configuration file inc_config.json from cache at
INCConfig {
  "distillation": {},
  "neural_compressor_version": "2.4.1",
  "optimum_version": "1.16.2",
  "pruning": {},
  "quantization": {
    "dataset_num_samples": 50,
    "is_static": true
  },
  "save_onnx_model": false,
  "torch_version": "2.2.0",
  "transformers_version": "4.37.2"
}

Using `INCModel` to load a TorchScript model will be deprecated in v1.15.0, to load your model please use `IPEXModel` instead.
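The comment on encode_kwargs says that normalization is what makes cosine similarity available. The following is a minimal sketch of why, using made-up toy vectors rather than real model output and only the Python standard library: once two vectors are scaled to unit length, their plain dot product equals their cosine similarity. The helper names l2_normalize, dot, and cosine_similarity are illustrative and are not part of LangChain.

```python
import math

def l2_normalize(v):
    """Scale a vector to unit L2 norm."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]

def dot(a, b):
    """Plain dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))

def cosine_similarity(a, b):
    """Cosine similarity computed from the unnormalized vectors."""
    return dot(a, b) / (math.sqrt(dot(a, a)) * math.sqrt(dot(b, b)))

# Toy vectors (assumed values, not embeddings from the model).
a = [3.0, 4.0]
b = [4.0, 3.0]

# After normalization, the dot product alone gives the cosine similarity.
dot_of_normalized = dot(l2_normalize(a), l2_normalize(b))
cos = cosine_similarity(a, b)
```

This is why the later comparison can use a matrix product directly: with normalize_embeddings set to True, no extra division by vector norms should be needed.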

Let's ask a question and compare it to two documents. The first contains the answer to the question, and the second does not.

We can check which one better suits our query.

question = "How many people live in Berlin?"

documents = [
    "Berlin had a population of 3,520,031 registered inhabitants in an area of 891.82 square kilometers.",
    "Berlin is well known for its museums.",
]

doc_vecs = model.embed_documents(documents)

Batches: 100%|██████████| 1/1 [00:00<00:00,  4.18it/s]

query_vec = model.embed_query(question)

import torch

doc_vecs_torch = torch.tensor(doc_vecs)

query_vec_torch = torch.tensor(query_vec)

query_vec_torch @ doc_vecs_torch.T

tensor([0.7980, 0.6529])

We can see that indeed the first one ranks higher.
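With more than two documents, reading scores off a tensor by eye becomes impractical. The sketch below shows one way to turn the scores into a ranked list. It uses toy unit-length vectors as stand-ins for the output of embed_query and embed_documents, so it runs without the model; rank_documents is a hypothetical helper, not a LangChain function.

```python
def rank_documents(query_vec, doc_vecs, documents):
    """Return (score, document) pairs sorted by descending dot-product score."""
    scores = [sum(q * d for q, d in zip(query_vec, vec)) for vec in doc_vecs]
    return sorted(zip(scores, documents), key=lambda pair: pair[0], reverse=True)

# Toy unit-length vectors (assumed values, not real embeddings).
toy_query = [1.0, 0.0]
toy_doc_vecs = [[0.6, 0.8], [0.8, 0.6]]
toy_texts = ["doc A", "doc B"]

ranked = rank_documents(toy_query, toy_doc_vecs, toy_texts)
for score, text in ranked:
    print(f"{score:.4f}  {text}")
```

Under the assumption that the embeddings are plain lists of floats, as embed_query and embed_documents return in the walkthrough above, the same helper could be called as rank_documents(query_vec, doc_vecs, documents).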
