VectorizeRetriever
This notebook shows how to use the LangChain Vectorize retriever.
Vectorize helps you build AI apps faster and with less hassle. It automates data extraction, finds the best vectorization strategy using RAG evaluation, and lets you quickly deploy real-time RAG pipelines for your unstructured data. Your vector search indexes stay up-to-date, and it integrates with your existing vector database, so you maintain full control of your data. Vectorize handles the heavy lifting, freeing you to focus on building robust AI solutions without getting bogged down by data management.
Setup
In the following steps, we'll set up the Vectorize environment and create a RAG pipeline.
Create a Vectorize Account & Get Your Access Token
1. Sign up for a free Vectorize account here
2. Generate an access token in the Access Token section
3. Gather your organization ID: from the browser URL, extract the UUID that follows /organization/ (a programmatic sketch is shown below)
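If you prefer to extract the organization ID programmatically, here is a minimal sketch. The example URL is hypothetical; the only assumption taken from the steps above is that the ID is the UUID segment following /organization/ in the path.
import re

# Hypothetical URL copied from the browser address bar after logging in.
url = "https://platform.vectorize.io/organization/123e4567-e89b-12d3-a456-426614174000"

# The organization ID is the UUID segment that follows /organization/.
match = re.search(r"/organization/([0-9a-fA-F-]{36})", url)
if match:
    print("Organization ID:", match.group(1))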
Configure token and organization ID
import getpass

VECTORIZE_ORG_ID = getpass.getpass("Enter Vectorize organization ID: ")
VECTORIZE_API_TOKEN = getpass.getpass("Enter Vectorize API Token: ")
Installation
This retriever lives in the langchain-vectorize package:
!pip install -qU langchain-vectorize
Download a PDF file
!wget"https://raw.githubusercontent.com/vectorize-io/vectorize-clients/refs/tags/python-0.1.3/tests/python/tests/research.pdf"
Initialize the Vectorize client
import vectorize_client as v

api = v.ApiClient(v.Configuration(access_token=VECTORIZE_API_TOKEN))
Create a File Upload Source Connector
import json
import os

import urllib3

connectors_api = v.ConnectorsApi(api)
response = connectors_api.create_source_connector(
    VECTORIZE_ORG_ID, [{"type": "FILE_UPLOAD", "name": "From API"}]
)
source_connector_id = response.connectors[0].id
Upload the PDF file
file_path="research.pdf"
http= urllib3.PoolManager()
uploads_api= v.UploadsApi(api)
metadata={"created-from-api":True}
upload_response= uploads_api.start_file_upload_to_connector(
VECTORIZE_ORG_ID,
source_connector_id,
v.StartFileUploadToConnectorRequest(
name=file_path.split("/")[-1],
content_type="application/pdf",
# add additional metadata that will be stored along with each chunk in the vector database
metadata=json.dumps(metadata),
),
)
withopen(file_path,"rb")as f:
response= http.request(
"PUT",
upload_response.upload_url,
body=f,
headers={
"Content-Type":"application/pdf",
"Content-Length":str(os.path.getsize(file_path)),
},
)
if response.status!=200:
print("Upload failed: ", response.data)
else:
print("Upload successful")
Connect to the AI Platform and Vector Database
ai_platforms = connectors_api.get_ai_platform_connectors(VECTORIZE_ORG_ID)
builtin_ai_platform = [
    c.id for c in ai_platforms.ai_platform_connectors if c.type == "VECTORIZE"
][0]

vector_databases = connectors_api.get_destination_connectors(VECTORIZE_ORG_ID)
builtin_vector_db = [
    c.id for c in vector_databases.destination_connectors if c.type == "VECTORIZE"
][0]
Configure and Deploy the Pipeline
pipelines = v.PipelinesApi(api)
response = pipelines.create_pipeline(
    VECTORIZE_ORG_ID,
    v.PipelineConfigurationSchema(
        source_connectors=[
            v.SourceConnectorSchema(
                id=source_connector_id, type="FILE_UPLOAD", config={}
            )
        ],
        destination_connector=v.DestinationConnectorSchema(
            id=builtin_vector_db, type="VECTORIZE", config={}
        ),
        ai_platform=v.AIPlatformSchema(
            id=builtin_ai_platform, type="VECTORIZE", config={}
        ),
        pipeline_name="My Pipeline From API",
        schedule=v.ScheduleSchema(type="manual"),
    ),
)

pipeline_id = response.data.id
Configure tracing (optional)
If you want to get automated tracing from individual queries, you can also set your LangSmith API key by uncommenting below:
# os.environ["LANGSMITH_API_KEY"] = getpass.getpass("Enter your LangSmith API key: ")
# os.environ["LANGSMITH_TRACING"] = "true"
Instantiation
from langchain_vectorize.retrievers import VectorizeRetriever

retriever = VectorizeRetriever(
    api_token=VECTORIZE_API_TOKEN,
    organization=VECTORIZE_ORG_ID,
    pipeline_id=pipeline_id,
)
Usage
query="Apple Shareholders equity"
retriever.invoke(query, num_results=2)
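invoke returns a list of LangChain Document objects. A quick sketch for inspecting the hits, reusing query from above; the exact metadata keys depend on what the pipeline stored (for example, the "created-from-api" flag attached at upload time):
docs = retriever.invoke(query, num_results=2)
for doc in docs:
    # page_content holds the retrieved chunk text;
    # metadata holds whatever the pipeline stored alongside the chunk.
    print(doc.page_content[:200])
    print(doc.metadata)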
Use within a chain
Like other retrievers, VectorizeRetriever can be incorporated into LLM applications via chains.
We will need an LLM or chat model:
pip install -qU "langchain[google-genai]"
import getpass
import os

if not os.environ.get("GOOGLE_API_KEY"):
    os.environ["GOOGLE_API_KEY"] = getpass.getpass("Enter API key for Google Gemini: ")

from langchain.chat_models import init_chat_model

llm = init_chat_model("gemini-2.0-flash", model_provider="google_genai")
from langchain_core.output_parsers import StrOutputParser
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.runnables import RunnablePassthrough

prompt = ChatPromptTemplate.from_template(
    """Answer the question based only on the context provided.

Context: {context}

Question: {question}"""
)


def format_docs(docs):
    return "\n\n".join(doc.page_content for doc in docs)


chain = (
    {"context": retriever | format_docs, "question": RunnablePassthrough()}
    | prompt
    | llm
    | StrOutputParser()
)
chain.invoke("...")
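Because the chain is a standard LCEL runnable, you can also stream the answer token by token instead of waiting for the full string (the "..." stands in for your question, as above):
for chunk in chain.stream("..."):
    print(chunk, end="")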
API reference
For detailed documentation of all VectorizeRetriever features and configurations head to the API reference.
Related
- Retriever conceptual guide
- Retriever how-to guides