
VectorizeRetriever

This notebook shows how to use the LangChain Vectorize retriever.

Vectorize helps you build AI apps faster and with less hassle. It automates data extraction, finds the best vectorization strategy using RAG evaluation, and lets you quickly deploy real-time RAG pipelines for your unstructured data. Your vector search indexes stay up-to-date, and it integrates with your existing vector database, so you maintain full control of your data. Vectorize handles the heavy lifting, freeing you to focus on building robust AI solutions without getting bogged down by data management.

Setup

In the following steps, we'll set up the Vectorize environment and create a RAG pipeline.

Create a Vectorize Account & Get Your Access Token

1. Sign up for a free Vectorize account here
2. Generate an access token in the Access Token section
3. Gather your organization ID: from the browser URL, extract the UUID after /organization/ (see the sketch below)
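
If it helps, here is a minimal sketch of pulling the organization ID out of such a URL with Python's standard library. The URL below is a made-up placeholder, not a real organization:

from urllib.parse import urlparse

# Hypothetical URL copied from the browser's address bar.
url = "https://platform.vectorize.io/organization/123e4567-e89b-12d3-a456-426614174000"

# The organization ID is the path segment right after "organization".
segments = urlparse(url).path.strip("/").split("/")
org_id = segments[segments.index("organization") + 1]
print(org_id)  # 123e4567-e89b-12d3-a456-426614174000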

Configure token and organization ID

import getpass

VECTORIZE_ORG_ID = getpass.getpass("Enter Vectorize organization ID: ")
VECTORIZE_API_TOKEN = getpass.getpass("Enter Vectorize API Token: ")

Installation

This retriever lives in the langchain-vectorize package:

!pip install -qU langchain-vectorize

Download a PDF file

!wget"https://raw.githubusercontent.com/vectorize-io/vectorize-clients/refs/tags/python-0.1.3/tests/python/tests/research.pdf"

Initialize the Vectorize client

import vectorize_client as v

api = v.ApiClient(v.Configuration(access_token=VECTORIZE_API_TOKEN))

Create a File Upload Source Connector

import json
import os

import urllib3

connectors_api = v.ConnectorsApi(api)
response = connectors_api.create_source_connector(
    VECTORIZE_ORG_ID, [{"type": "FILE_UPLOAD", "name": "From API"}]
)
source_connector_id = response.connectors[0].id

Upload the PDF file

file_path = "research.pdf"

http = urllib3.PoolManager()
uploads_api = v.UploadsApi(api)
metadata = {"created-from-api": True}

upload_response = uploads_api.start_file_upload_to_connector(
    VECTORIZE_ORG_ID,
    source_connector_id,
    v.StartFileUploadToConnectorRequest(
        name=file_path.split("/")[-1],
        content_type="application/pdf",
        # add additional metadata that will be stored along with each chunk in the vector database
        metadata=json.dumps(metadata),
    ),
)

with open(file_path, "rb") as f:
    response = http.request(
        "PUT",
        upload_response.upload_url,
        body=f,
        headers={
            "Content-Type": "application/pdf",
            "Content-Length": str(os.path.getsize(file_path)),
        },
    )

if response.status != 200:
    print("Upload failed:", response.data)
else:
    print("Upload successful")

Connect to the AI Platform and Vector Database

ai_platforms = connectors_api.get_ai_platform_connectors(VECTORIZE_ORG_ID)
builtin_ai_platform = [
    c.id for c in ai_platforms.ai_platform_connectors if c.type == "VECTORIZE"
][0]

vector_databases = connectors_api.get_destination_connectors(VECTORIZE_ORG_ID)
builtin_vector_db = [
    c.id for c in vector_databases.destination_connectors if c.type == "VECTORIZE"
][0]

Configure and Deploy the Pipeline

pipelines = v.PipelinesApi(api)
response = pipelines.create_pipeline(
    VECTORIZE_ORG_ID,
    v.PipelineConfigurationSchema(
        source_connectors=[
            v.SourceConnectorSchema(
                id=source_connector_id, type="FILE_UPLOAD", config={}
            )
        ],
        destination_connector=v.DestinationConnectorSchema(
            id=builtin_vector_db, type="VECTORIZE", config={}
        ),
        ai_platform=v.AIPlatformSchema(
            id=builtin_ai_platform, type="VECTORIZE", config={}
        ),
        pipeline_name="My Pipeline From API",
        schedule=v.ScheduleSchema(type="manual"),
    ),
)
pipeline_id = response.data.id

Configure tracing (optional)

If you want to get automated tracing from individual queries, you can also set your LangSmith API key by uncommenting below:

# os.environ["LANGSMITH_API_KEY"] = getpass.getpass("Enter your LangSmith API key: ")
# os.environ["LANGSMITH_TRACING"] = "true"

Instantiation

from langchain_vectorize.retrievers import VectorizeRetriever

retriever = VectorizeRetriever(
    api_token=VECTORIZE_API_TOKEN,
    organization=VECTORIZE_ORG_ID,
    pipeline_id=pipeline_id,
)

Usage

query = "Apple Shareholders equity"
retriever.invoke(query, num_results=2)
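
The call returns a list of LangChain Document objects. As a minimal sketch of inspecting the results (exactly what you see depends on your pipeline and the metadata you attached at upload time):

docs = retriever.invoke(query, num_results=2)
for doc in docs:
    # Each Document carries the retrieved chunk text plus the metadata stored
    # with it, e.g. the "created-from-api" key attached during upload.
    print(doc.page_content[:200])
    print(doc.metadata)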

Use within a chain

Like other retrievers, VectorizeRetriever can be incorporated into LLM applications via chains.

We will need an LLM or chat model:

pip install -qU "langchain[google-genai]"
import getpass
import os

if not os.environ.get("GOOGLE_API_KEY"):
    os.environ["GOOGLE_API_KEY"] = getpass.getpass("Enter API key for Google Gemini: ")

from langchain.chat_models import init_chat_model

llm = init_chat_model("gemini-2.0-flash", model_provider="google_genai")
from langchain_core.output_parsers import StrOutputParser
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.runnables import RunnablePassthrough

prompt = ChatPromptTemplate.from_template(
    """Answer the question based only on the context provided.

Context: {context}

Question: {question}"""
)


def format_docs(docs):
    return "\n\n".join(doc.page_content for doc in docs)


chain = (
    {"context": retriever | format_docs, "question": RunnablePassthrough()}
    | prompt
    | llm
    | StrOutputParser()
)
chain.invoke("...")
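
For example, reusing the query from the Usage section above (the answer depends on what your pipeline has indexed):

chain.invoke("What is Apple's shareholders equity?")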

API reference

For detailed documentation of all VectorizeRetriever features and configurations, head to the API reference.
