Intel® Extension for Transformers Quantized Text Embeddings

Load quantized BGE embedding models generated by Intel® Extension for Transformers (ITREX) and use ITREX Neural Engine, a high-performance NLP backend, to accelerate the inference of models without compromising accuracy.

For more details, refer to our blog post, Efficient Natural Language Embedding Models with Intel Extension for Transformers, and the BGE optimization example.

from langchain_community.embeddings import QuantizedBgeEmbeddings

model_name = "Intel/bge-small-en-v1.5-sts-int8-static-inc"
encode_kwargs = {"normalize_embeddings": True}  # set True to compute cosine similarity

model = QuantizedBgeEmbeddings(
    model_name=model_name,
    encode_kwargs=encode_kwargs,
    query_instruction="Represent this sentence for searching relevant passages: ",
)

API Reference: QuantizedBgeEmbeddings

/home/yuwenzho/.conda/envs/bge/lib/python3.9/site-packages/tqdm/auto.py:21: TqdmWarning: IProgress not found. Please update jupyter and ipywidgets. See https://ipywidgets.readthedocs.io/en/stable/user_install.html
  from .autonotebook import tqdm as notebook_tqdm
2024-03-04 10:17:17 [INFO] Start to extarct onnx model ops...
2024-03-04 10:17:17 [INFO] Extract onnxruntime model done...
2024-03-04 10:17:17 [INFO] Start to implement Sub-Graph matching and replacing...
2024-03-04 10:17:18 [INFO] Sub-Graph match and replace done...

usage

text = "This is a test document."
query_result = model.embed_query(text)
doc_result = model.embed_documents([text])
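
The `normalize_embeddings` setting above is described as enabling cosine similarity. As a minimal sketch of why, the snippet below shows that for unit-length vectors the dot product equals the cosine similarity. The vectors are hypothetical stand-ins so the example runs without the model; in practice they would be `query_result` and `doc_result[0]`. The helper names (`dot`, `l2_normalize`, `cosine_similarity`) are illustrative and not part of any library.

```python
import math
from typing import List


def dot(a: List[float], b: List[float]) -> float:
    """Plain dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def l2_normalize(v: List[float]) -> List[float]:
    """Scale a vector to unit L2 norm."""
    norm = math.sqrt(dot(v, v))
    return [x / norm for x in v]


def cosine_similarity(a: List[float], b: List[float]) -> float:
    """Cosine similarity computed from scratch (no normalization assumed)."""
    return dot(a, b) / (math.sqrt(dot(a, a)) * math.sqrt(dot(b, b)))


# Hypothetical stand-in vectors (assumption): replace with
# model.embed_query(text) and model.embed_documents([text])[0].
query_vec = l2_normalize([0.2, 0.1, 0.7])
doc_vec = l2_normalize([0.25, 0.05, 0.65])

# With unit-length vectors, the dot product alone is the cosine similarity.
score = dot(query_vec, doc_vec)
print(f"similarity: {score:.4f}")
```

When embeddings are already normalized, ranking documents by dot product against the query gives the same order as ranking by cosine similarity, which avoids recomputing norms for every comparison.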

Related

Embedding model conceptual guide
Embedding model how-to guides
