
Xata

Xata is a serverless data platform based on PostgreSQL. It provides a Python SDK for interacting with your database and a UI for managing your data. Xata has a native vector type that can be added to any table and supports similarity search. LangChain inserts vectors directly into Xata and queries it for the nearest neighbors of a given vector, so you can use any of the LangChain Embeddings integrations with Xata.

This notebook shows how to use Xata as a VectorStore.

Setup

Create a database to use as a vector store

In the Xata UI, create a new database. You can name it whatever you want; in this notebook we'll use langchain. Then create a table. Again, you can name it anything, but we will use vectors. Add the following columns via the UI:

  • content of type "Text". This is used to store the Document.page_content values.
  • embedding of type "Vector". Set the dimension to match the embedding model you plan to use. In this notebook we use OpenAI embeddings, which have 1536 dimensions.
  • source of type "Text". This is used as a metadata column by this example.
  • any other columns you want to use as metadata. They are populated from the Document.metadata object. For example, if the Document.metadata object has a title property, you can create a title column in the table and it will be populated.
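The column layout above can be pictured as a plain mapping from one document to one table row. The following is a minimal sketch of that mapping, not the actual XataVectorStore implementation: `to_row` and `EMBEDDING_DIM` are hypothetical names introduced only for illustration, and the check on the vector length simply mirrors the 1536-dimension column described above.

```python
from typing import Any

# Assumption: matches the dimension chosen for the "embedding" Vector column.
EMBEDDING_DIM = 1536


def to_row(page_content: str, metadata: dict[str, Any], embedding: list[float]) -> dict[str, Any]:
    """Hypothetical helper: build one table row from a document and its embedding."""
    if len(embedding) != EMBEDDING_DIM:
        raise ValueError(f"expected {EMBEDDING_DIM} dimensions, got {len(embedding)}")
    # Metadata keys become column names, so they must not shadow the fixed columns.
    clash = {"content", "embedding"}.intersection(metadata)
    if clash:
        raise ValueError(f"metadata keys clash with fixed columns: {sorted(clash)}")
    row: dict[str, Any] = {"content": page_content, "embedding": embedding}
    row.update(metadata)  # e.g. "source", "title"
    return row


row = to_row("Some text", {"source": "speech.txt", "title": "Demo"}, [0.0] * EMBEDDING_DIM)
print(sorted(row))
```

Each metadata key ends up as its own column, so a key such as title is only stored if a matching column exists in the table.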

Let's first install our dependencies:

%pip install --upgrade --quiet xata langchain-openai langchain-community tiktoken langchain

Let's load the OpenAI key into the environment. If you don't have one, you can create an OpenAI account and generate an API key there.

import getpass
import os

# Prompt for the key only if it is not already set in the environment.
if "OPENAI_API_KEY" not in os.environ:
    os.environ["OPENAI_API_KEY"] = getpass.getpass("OpenAI API Key:")

Similarly, we need the connection details for Xata. You can create a new API key from your account settings. To find the database URL, go to the Settings page of the database you created. The database URL should look something like this: https://demo-uni3q8.eu-west-1.xata.sh/db/langchain.

api_key = getpass.getpass("Xata API key: ")
db_url = input("Xata database URL (copy it from your DB settings):")

from langchain_community.document_loaders import TextLoader
from langchain_community.vectorstores.xata import XataVectorStore
from langchain_openai import OpenAIEmbeddings
from langchain_text_splitters import CharacterTextSplitter
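
Because the database URL is pasted in by hand, it can help to check its shape before passing it on. The sketch below is only an illustration: `parse_db_url` is a hypothetical helper, and reading the host as `<workspace>.<region>.xata.sh` with a `/db/<database>` path is an assumption inferred from the example URL above, not a documented contract.

```python
from urllib.parse import urlparse


def parse_db_url(db_url: str) -> dict[str, str]:
    """Hypothetical helper: split a Xata database URL into its visible parts."""
    parsed = urlparse(db_url.strip())
    if parsed.scheme != "https" or not parsed.hostname:
        raise ValueError("expected an https:// URL")
    host_parts = parsed.hostname.split(".")
    path_parts = [part for part in parsed.path.split("/") if part]
    # Assumed host shape: <workspace>.<region>.xata.sh
    if len(host_parts) != 4 or host_parts[-2:] != ["xata", "sh"]:
        raise ValueError("expected a host like <workspace>.<region>.xata.sh")
    # Assumed path shape: /db/<database>
    if len(path_parts) != 2 or path_parts[0] != "db":
        raise ValueError("expected a path like /db/<database>")
    return {
        "workspace": host_parts[0],
        "region": host_parts[1],
        "database": path_parts[1],
    }


parts = parse_db_url("https://demo-uni3q8.eu-west-1.xata.sh/db/langchain")
print(parts)
```

A check like this catches a truncated or mistyped URL early, before any request is made to the database.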

Create the Xata vector store

Let's import our test dataset:

loader = TextLoader("../../how_to/state_of_the_union.txt")
documents = loader.load()
text_splitter = CharacterTextSplitter(chunk_size=1000, chunk_overlap=0)
docs = text_splitter.split_documents(documents)

embeddings = OpenAIEmbeddings()

Now create the actual vector store, backed by the Xata table.

vector_store = XataVectorStore.from_documents(
    docs, embeddings, api_key=api_key, db_url=db_url, table_name="vectors"
)

After running the above command, if you go to the Xata UI, you should see the documents loaded together with their embeddings. To use an existing Xata table that already contains vector contents, call the XataVectorStore constructor directly:

vector_store = XataVectorStore(
    api_key=api_key, db_url=db_url, embedding=embeddings, table_name="vectors"
)

Similarity Search

query = "What did the president say about Ketanji Brown Jackson"
found_docs = vector_store.similarity_search(query)
print(found_docs)

Similarity Search with score (vector distance)

query = "What did the president say about Ketanji Brown Jackson"
result = vector_store.similarity_search_with_score(query)
for doc, score in result:
    print(f"document={doc}, score={score}")
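
To make the idea of a scored nearest-neighbor lookup concrete, here is a toy, in-memory sketch that ranks a few hand-written vectors against a query vector. It does not call Xata. Cosine similarity is an assumption chosen for illustration (this notebook does not state which metric Xata uses, and with a distance metric lower scores would mean closer matches), and `cosine_similarity`, `top_k`, and the sample rows are hypothetical.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two non-zero vectors of equal length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def top_k(query_vec: list[float], rows: list[tuple[str, list[float]]], k: int = 2):
    """Return the k rows most similar to query_vec as (text, score) pairs."""
    scored = [(text, cosine_similarity(query_vec, vec)) for text, vec in rows]
    # Higher cosine similarity means a closer match, so sort in descending order.
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:k]


# Tiny 3-dimensional stand-ins for real 1536-dimensional embeddings.
rows = [
    ("about the court", [0.9, 0.1, 0.0]),
    ("about the economy", [0.1, 0.9, 0.0]),
    ("about health care", [0.0, 0.2, 0.9]),
]
results = top_k([1.0, 0.0, 0.0], rows, k=2)
for text, score in results:
    print(f"document={text!r}, score={score:.3f}")
```

In the real store the vectors come from the embedding model and the ranking runs inside the database, so only the top matches and their scores are returned to the client.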
