VectorizeRetriever
This notebook shows how to use the LangChain Vectorize retriever.
Vectorize helps you build AI apps faster and with less hassle. It automates data extraction, finds the best vectorization strategy using RAG evaluation, and lets you quickly deploy real-time RAG pipelines for your unstructured data. Your vector search indexes stay up-to-date, and it integrates with your existing vector database, so you maintain full control of your data. Vectorize handles the heavy lifting, freeing you to focus on building robust AI solutions without getting bogged down by data management.
Setup
In the following steps, we'll set up the Vectorize environment and create a RAG pipeline.
Create a Vectorize Account & Get Your Access Token
1. Sign up for a free Vectorize account here
2. Generate an access token in the Access Token section
3. Gather your organization ID: from the browser URL, extract the UUID that follows /organization/ (a programmatic sketch is shown below)
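If you prefer to extract the organization ID programmatically, here is a minimal sketch. The example URL is hypothetical; the only assumption taken from the steps above is that the ID is the UUID segment following /organization/ in the path.
import re

# Hypothetical URL copied from the browser address bar after logging in.
url = "https://platform.vectorize.io/organization/123e4567-e89b-12d3-a456-426614174000"

# The organization ID is the UUID segment that follows /organization/.
match = re.search(r"/organization/([0-9a-fA-F-]{36})", url)
if match:
    print("Organization ID:", match.group(1))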
Configure token and organization ID
import getpass

VECTORIZE_ORG_ID = getpass.getpass("Enter Vectorize organization ID: ")
VECTORIZE_API_TOKEN = getpass.getpass("Enter Vectorize API Token: ")
Installation
This retriever lives in the langchain-vectorize package:
!pip install -qU langchain-vectorize
Download a PDF file
!wget"https://raw.githubusercontent.com/vectorize-io/vectorize-clients/refs/tags/python-0.1.3/tests/python/tests/research.pdf"
Initialize the Vectorize client
import vectorize_client as v

api = v.ApiClient(v.Configuration(access_token=VECTORIZE_API_TOKEN))
Create a File Upload Source Connector
import json
import os

import urllib3

connectors_api = v.ConnectorsApi(api)
response = connectors_api.create_source_connector(
    VECTORIZE_ORG_ID, [{"type": "FILE_UPLOAD", "name": "From API"}]
)
source_connector_id = response.connectors[0].id
Upload the PDF file
file_path="research.pdf"
http= urllib3.PoolManager()
uploads_api= v.UploadsApi(api)
metadata={"created-from-api":True}
upload_response= uploads_api.start_file_upload_to_connector(
VECTORIZE_ORG_ID,
source_connector_id,
v.StartFileUploadToConnectorRequest(
name=file_path.split("/")[-1],
content_type="application/pdf",
# add additional metadata that will be stored along with each chunk in the vector database
metadata=json.dumps(metadata),
),
)
withopen(file_path,"rb")as f:
response= http.request(
"PUT",
upload_response.upload_url,
body=f,
headers={
"Content-Type":"application/pdf",
"Content-Length":str(os.path.getsize(file_path)),
},
)
if response.status!=200:
print("Upload failed: ", response.data)
else:
print("Upload successful")
Connect to the AI Platform and Vector Database
ai_platforms = connectors_api.get_ai_platform_connectors(VECTORIZE_ORG_ID)
builtin_ai_platform = [
    c.id for c in ai_platforms.ai_platform_connectors if c.type == "VECTORIZE"
][0]

vector_databases = connectors_api.get_destination_connectors(VECTORIZE_ORG_ID)
builtin_vector_db = [
    c.id for c in vector_databases.destination_connectors if c.type == "VECTORIZE"
][0]
Configure and Deploy the Pipeline
pipelines = v.PipelinesApi(api)
response = pipelines.create_pipeline(
    VECTORIZE_ORG_ID,
    v.PipelineConfigurationSchema(
        source_connectors=[
            v.SourceConnectorSchema(
                id=source_connector_id, type="FILE_UPLOAD", config={}
            )
        ],
        destination_connector=v.DestinationConnectorSchema(
            id=builtin_vector_db, type="VECTORIZE", config={}
        ),
        ai_platform=v.AIPlatformSchema(
            id=builtin_ai_platform, type="VECTORIZE", config={}
        ),
        pipeline_name="My Pipeline From API",
        schedule=v.ScheduleSchema(type="manual"),
    ),
)

pipeline_id = response.data.id
Configure tracing (optional)
If you want to get automated tracing from individual queries, you can also set your LangSmith API key by uncommenting below:
# os.environ["LANGSMITH_API_KEY"] = getpass.getpass("Enter your LangSmith API key: ")
# os.environ["LANGSMITH_TRACING"] = "true"
Instantiation
from langchain_vectorize.retrievers import VectorizeRetriever

retriever = VectorizeRetriever(
    api_token=VECTORIZE_API_TOKEN,
    organization=VECTORIZE_ORG_ID,
    pipeline_id=pipeline_id,
)
Usage
query="Apple Shareholders equity"
retriever.invoke(query, num_results=2)
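invoke returns a list of LangChain Document objects. A quick sketch for inspecting the hits, reusing query from above; the exact metadata keys depend on what the pipeline stored (for example, the "created-from-api" flag attached at upload time):
docs = retriever.invoke(query, num_results=2)
for doc in docs:
    # page_content holds the retrieved chunk text;
    # metadata holds whatever the pipeline stored alongside the chunk.
    print(doc.page_content[:200])
    print(doc.metadata)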
Use within a chain
Like other retrievers, VectorizeRetriever can be incorporated into LLM applications via chains.
We will need an LLM or chat model:
pip install -qU "langchain[google-genai]"
import getpass
import os

if not os.environ.get("GOOGLE_API_KEY"):
    os.environ["GOOGLE_API_KEY"] = getpass.getpass("Enter API key for Google Gemini: ")

from langchain.chat_models import init_chat_model

llm = init_chat_model("gemini-2.0-flash", model_provider="google_genai")
from langchain_core.output_parsers import StrOutputParser
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.runnables import RunnablePassthrough

prompt = ChatPromptTemplate.from_template(
    """Answer the question based only on the context provided.

Context: {context}

Question: {question}"""
)


def format_docs(docs):
    return "\n\n".join(doc.page_content for doc in docs)


chain = (
    {"context": retriever | format_docs, "question": RunnablePassthrough()}
    | prompt
    | llm
    | StrOutputParser()
)
chain.invoke("...")
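Because the chain is a standard LCEL runnable, you can also stream the answer token by token instead of waiting for the full string (the "..." stands in for your question, as above):
for chunk in chain.stream("..."):
    print(chunk, end="")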
API reference
For detailed documentation of all VectorizeRetriever features and configurations head to the API reference.
Related
- Retriever conceptual guide
- Retriever how-to guides