CloudflareVectorize VectorStore

This notebook covers how to get started with the CloudflareVectorize vector store.

Setup

This Python package is a wrapper around Cloudflare's REST API. To interact with the API, you need to provide an API token with the appropriate privileges.

You can create and manage API tokens here:

https://dash.cloudflare.com/YOUR-ACCT-NUMBER/api-tokens

Credentials

CloudflareVectorize depends on WorkersAI (if you want to use it for Embeddings), and D1 (if you are using it to store and retrieve raw values).

While you can create a single api_token with Edit privileges to all needed resources (WorkersAI, Vectorize & D1), you may want to follow the principle of "least privilege access" and create separate API tokens for each service.

Note: If provided, these service-specific tokens take precedence over a global token; you can supply them instead of a global token.

You can also set these parameters as environment variables.
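
For reference, a .env file for this walkthrough might look like the following. The variable names match the os.getenv calls used throughout this notebook; the values are placeholders you would replace with your own.

# .env (placeholder values)
CF_ACCOUNT_ID="your-cloudflare-account-id"
CF_API_TOKEN="your-globally-scoped-api-token"
CF_VECTORIZE_API_TOKEN="your-vectorize-api-token"
CF_D1_API_TOKEN="your-d1-api-token"
CF_AI_API_TOKEN="your-workers-ai-api-token"
CF_D1_DATABASE_ID="your-d1-database-id"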

import os

from dotenv import load_dotenv

load_dotenv(".env")

cf_acct_id = os.getenv("CF_ACCOUNT_ID")

# single "globally scoped" token with WorkersAI, Vectorize & D1
api_token = os.getenv("CF_API_TOKEN")

# OR, separate tokens with access to each service
cf_vectorize_token = os.getenv("CF_VECTORIZE_API_TOKEN")
cf_d1_token = os.getenv("CF_D1_API_TOKEN")

Initialization

import asyncio
import json
import uuid
import warnings

from langchain_cloudflare.embeddings import (
    CloudflareWorkersAIEmbeddings,
)
from langchain_cloudflare.vectorstores import (
    CloudflareVectorize,
)
from langchain_community.document_loaders import WikipediaLoader
from langchain_core.documents import Document
from langchain_text_splitters import RecursiveCharacterTextSplitter

warnings.filterwarnings("ignore")

# name your vectorize index
vectorize_index_name = f"test-langchain-{uuid.uuid4().hex}"

Embeddings

To store embeddings and perform semantic search and retrieval, you must embed your raw values. Specify an embedding model that is available on WorkersAI:

https://developers.cloudflare.com/workers-ai/models/

MODEL_WORKERSAI = "@cf/baai/bge-large-en-v1.5"

# needed if you want to use WorkersAI for embeddings
cf_ai_token = os.getenv("CF_AI_API_TOKEN")

embedder = CloudflareWorkersAIEmbeddings(
    account_id=cf_acct_id, api_token=cf_ai_token, model_name=MODEL_WORKERSAI
)

Raw Values with D1

Vectorize only stores embeddings, metadata and namespaces. If you want to store and retrieve raw values, you must leverage Cloudflare's SQL database, D1.

You can create a database here and retrieve its id:

https://dash.cloudflare.com/YOUR-ACCT-NUMBER/workers/d1

# provide the id of your D1 Database
d1_database_id = os.getenv("CF_D1_DATABASE_ID")
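
If you prefer to create the D1 database programmatically, the sketch below shows one way to do it against Cloudflare's REST API and capture the new database id. This is a hedged example and is not part of langchain-cloudflare: the endpoint path, request body, and result.uuid field are assumptions based on Cloudflare's public D1 API docs, and the database name is made up, so verify against the D1 API reference before relying on it.

import requests

# Hedged sketch: create a D1 database via Cloudflare's REST API and capture its id.
resp = requests.post(
    f"https://api.cloudflare.com/client/v4/accounts/{cf_acct_id}/d1/database",
    headers={"Authorization": f"Bearer {cf_d1_token}"},
    json={"name": "my-langchain-d1"},  # hypothetical database name
)
resp.raise_for_status()
d1_database_id = resp.json()["result"]["uuid"]  # id to pass to CloudflareVectorize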

CloudflareVectorize Class

Now we can create the CloudflareVectorize instance. Here we pass:

  • The embedding instance from earlier
  • The account ID
  • Either a global API token for all services (WorkersAI, Vectorize, D1) or individual API tokens for each service
cfVect = CloudflareVectorize(
    embedding=embedder,
    account_id=cf_acct_id,
    d1_api_token=cf_d1_token,  # (Optional if using global token)
    vectorize_api_token=cf_vectorize_token,  # (Optional if using global token)
    d1_database_id=d1_database_id,  # (Optional if not using D1)
)

Cleanup

Before we get started, let's delete any test-langchain* indexes we have for this walkthrough.

# depending on your notebook environment you might need to include:
# import nest_asyncio
# nest_asyncio.apply()

arr_indexes = cfVect.list_indexes()
arr_indexes = [x for x in arr_indexes if "test-langchain" in x.get("name")]
arr_async_requests = [
    cfVect.adelete_index(index_name=x.get("name")) for x in arr_indexes
]
await asyncio.gather(*arr_async_requests);

Gotchas

D1 Database ID provided but no "global" api_token and no d1_api_token

try:
    cfVect = CloudflareVectorize(
        embedding=embedder,
        account_id=cf_acct_id,
        # api_token=api_token,  # (Optional if using service-specific tokens)
        ai_api_token=cf_ai_token,  # (Optional if using global token)
        # d1_api_token=cf_d1_token,  # (Optional if using global token)
        vectorize_api_token=cf_vectorize_token,  # (Optional if using global token)
        d1_database_id=d1_database_id,  # (Optional if not using D1)
    )
except Exception as e:
    print(str(e))
`d1_database_id` provided, but no global `api_token` provided and no `d1_api_token` provided.

Manage Vector Store

Creating an Index

Let's start off this example by creating an index (first deleting it if it exists). If the index doesn't exist, we will get an error from Cloudflare telling us so.

%%capture

try:
    cfVect.delete_index(index_name=vectorize_index_name, wait=True)
except Exception as e:
    print(e)

r = cfVect.create_index(
    index_name=vectorize_index_name, description="A Test Vectorize Index", wait=True
)
print(r)
{'created_on': '2025-05-13T05:38:04.487284Z', 'modified_on': '2025-05-13T05:38:04.487284Z', 'name': 'test-langchain-5c177bb404f74d438c916462ad73d27a', 'description': 'A Test Vectorize Index', 'config': {'dimensions': 1024, 'metric': 'cosine'}}

Listing Indexes

Now, we can list our indexes on our account

indexes = cfVect.list_indexes()
indexes = [x for x in indexes if "test-langchain" in x.get("name")]
print(indexes)
[{'created_on': '2025-05-13T05:38:04.487284Z', 'modified_on': '2025-05-13T05:38:04.487284Z', 'name': 'test-langchain-5c177bb404f74d438c916462ad73d27a', 'description': 'A Test Vectorize Index', 'config': {'dimensions': 1024, 'metric': 'cosine'}}]

Get Index Info

We can also retrieve more granular information about a specific index.

This call returns a processedUpToMutation value, which can be used to track the status of operations such as creating indexes and adding or deleting records.

r = cfVect.get_index_info(index_name=vectorize_index_name)
print(r)
{'dimensions': 1024, 'vectorCount': 0}

Adding Metadata Indexes

It is common to assist retrieval by supplying metadata filters in queries. In Vectorize, this is accomplished by first creating a "metadata index" on your Vectorize Index. We will do so for our example by creating one on the section field in our documents.

Reference: https://developers.cloudflare.com/vectorize/reference/metadata-filtering/

r = cfVect.create_metadata_index(
    property_name="section",
    index_type="string",
    index_name=vectorize_index_name,
    wait=True,
)
print(r)
{'mutationId': '7fc5f849-4d35-420c-bb3f-b950a79e48b6'}

Listing Metadata Indexes

r = cfVect.list_metadata_indexes(index_name=vectorize_index_name)
print(r)
[{'propertyName': 'section', 'indexType': 'String'}]

Adding Documents

For this example, we will use LangChain's Wikipedia loader to pull an article about Cloudflare. We will store this in Vectorize and query its contents later.

docs = WikipediaLoader(query="Cloudflare", load_max_docs=2).load()

We will then create some simple chunks with metadata based on the chunk sections.

text_splitter = RecursiveCharacterTextSplitter(
    # Set a really small chunk size, just to show.
    chunk_size=100,
    chunk_overlap=20,
    length_function=len,
    is_separator_regex=False,
)
texts = text_splitter.create_documents([docs[0].page_content])

running_section = ""
for idx, text in enumerate(texts):
    if text.page_content.startswith("="):
        running_section = text.page_content
        running_section = running_section.replace("=", "").strip()
    else:
        if running_section == "":
            text.metadata = {"section": "Introduction"}
        else:
            text.metadata = {"section": running_section}

print(len(texts))
print(texts[0], "\n\n", texts[-1])
55
page_content='Cloudflare, Inc., is an American company that provides content delivery network services,' metadata={'section': 'Introduction'}

page_content='attacks, Cloudflare ended up being attacked as well; Google and other companies eventually' metadata={'section': 'DDoS mitigation'}

Now we will add documents to our Vectorize Index.

Note: Adding embeddings to Vectorize happens asynchronously, meaning there will be a small delay between adding the embeddings and being able to query them. By default, add_documents has a wait=True parameter which waits for this operation to complete before returning a response. If you do not want the program to wait for embedding availability, you can set this to wait=False.

r = cfVect.add_documents(index_name=vectorize_index_name, documents=texts, wait=True)
print(json.dumps(r)[:300])
["433a614a-2253-4c54-951f-0e40379a52c4", "608a9cb6-ab71-4e5c-8831-ebedeb9749e8", "40a0eead-a781-46a7-a6a3-1940ec57c9ae", "64081e01-12d1-4760-9b3c-84ee1e4ba199", "af465fb9-9e3b-49a6-b00a-6a9eec4fc623", "2898e362-b667-46ab-ac20-651d8e13f2bf", "a2c0095b-2cbc-4724-bbcb-86cd702bfa84", "cc659763-37cb-42cb

Query vector store

We will do some searches on our embeddings. We can specify our search query and the number of top results we want with k.

query_documents = cfVect.similarity_search(
    index_name=vectorize_index_name, query="Workers AI", k=100, return_metadata="none"
)

print(f"{len(query_documents)} results:\n{query_documents[:3]}")
55 results:
[Document(id='24405ae0-c125-4177-a1c2-8b1849c13ab7', metadata={}, page_content="In 2023, Cloudflare launched Workers AI, a framework allowing for use of Nvidia GPU's within"), Document(id='ca33b19e-4e28-4e1b-8ed7-94f133dbf8a7', metadata={}, page_content='based on queries by leveraging Workers AI.Cloudflare announced plans in September 2024 to launch a'), Document(id='14602058-73fe-4307-a1c2-95956d6392ad', metadata={}, page_content='=== Artificial intelligence ===')]

Output

If you want to return metadata, you can pass return_metadata="all" | "indexed". The default is "all".

If you want to return the embedding values, you can pass return_values=True. The default is False. Embeddings will be returned in the metadata field under the special _values field.

Note: return_metadata="none" and return_values=True will return only the _values field in metadata.

Note: If you return metadata or values, the results will be limited to the top 20.

https://developers.cloudflare.com/vectorize/platform/limits/

query_documents = cfVect.similarity_search(
    index_name=vectorize_index_name,
    query="Workers AI",
    return_values=True,
    return_metadata="all",
    k=100,
)
print(f"{len(query_documents)} results:\n{str(query_documents[0])[:500]}")
20 results:
page_content='In 2023, Cloudflare launched Workers AI, a framework allowing for use of Nvidia GPU's within' metadata={'section': 'Artificial intelligence', '_values': [0.014350891, 0.0053482056, -0.022354126, 0.002948761, 0.010406494, -0.016067505, -0.002029419, -0.023513794, 0.020141602, 0.023742676, 0.01361084, 0.003019333, 0.02748108, -0.023162842, 0.008979797, -0.029373169, -0.03643799, -0.03842163, -0.004463196, 0.021255493, 0.02192688, -0.005947113, -0.060272217, -0.055389404, -0.031188965

If you'd like the similarity scores to be returned, you can use similarity_search_with_score.

query_documents = cfVect.similarity_search_with_score(
    index_name=vectorize_index_name,
    query="Workers AI",
    k=100,
    return_metadata="all",
)
print(f"{len(query_documents)} results:\n{str(query_documents[0])[:500]}")
20 results:
(Document(id='24405ae0-c125-4177-a1c2-8b1849c13ab7', metadata={'section': 'Artificial intelligence'}, page_content="In 2023, Cloudflare launched Workers AI, a framework allowing for use of Nvidia GPU's within"), 0.7851709)

Including D1 for "Raw Values"

All of the add and search methods on CloudflareVectorize support an include_d1 parameter (default True).

This configures whether you want to store/retrieve raw values.

If you do not want to use D1 for this, you can set include_d1=False. This will return documents with an empty page_content field.

Note: Your D1 table name MUST MATCH your Vectorize index name! If you run create_index with include_d1=True, or instantiate CloudflareVectorize(d1_database_id=...), this D1 table will be created along with your Vectorize index.

query_documents = cfVect.similarity_search_with_score(
    index_name=vectorize_index_name,
    query="california",
    k=100,
    return_metadata="all",
    include_d1=False,
)
print(f"{len(query_documents)} results:\n{str(query_documents[0])[:500]}")
20 results:
(Document(id='64081e01-12d1-4760-9b3c-84ee1e4ba199', metadata={'section': 'Introduction'}, page_content=''), 0.60426825)

Query by turning into retriever

You can also transform the vector store into a retriever for easier usage in your chains.

retriever = cfVect.as_retriever(
    search_type="similarity",
    search_kwargs={"k": 1, "index_name": vectorize_index_name},
)
r = retriever.get_relevant_documents("california")
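
To see what the retriever returned, you can inspect the list of documents it gives back (a small illustrative follow-up; r is the result of the call above):

print(f"{len(r)} result(s):\n{r[0].page_content}")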

Searching with Metadata Filtering

As mentioned before, Vectorize supports filtered search via filters on indexed metadata fields. Here is an example where we search for Introduction values within the indexed section metadata field.

More info on searching on metadata fields is here: https://developers.cloudflare.com/vectorize/reference/metadata-filtering/

query_documents = cfVect.similarity_search_with_score(
    index_name=vectorize_index_name,
    query="California",
    k=100,
    md_filter={"section": "Introduction"},
    return_metadata="all",
)
print(f"{len(query_documents)} results:\n -{str(query_documents[:3])}")
6 results:
- [(Document(id='64081e01-12d1-4760-9b3c-84ee1e4ba199', metadata={'section': 'Introduction'}, page_content="and other services. Cloudflare's headquarters are in San Francisco, California. According to"), 0.60426825), (Document(id='608a9cb6-ab71-4e5c-8831-ebedeb9749e8', metadata={'section': 'Introduction'}, page_content='network services, cybersecurity, DDoS mitigation, wide area network services, reverse proxies,'), 0.52082914), (Document(id='433a614a-2253-4c54-951f-0e40379a52c4', metadata={'section': 'Introduction'}, page_content='Cloudflare, Inc., is an American company that provides content delivery network services,'), 0.50490546)]

You can do more sophisticated filtering as well

https://developers.cloudflare.com/vectorize/reference/metadata-filtering/#valid-filter-examples

query_documents = cfVect.similarity_search_with_score(
    index_name=vectorize_index_name,
    query="California",
    k=100,
    md_filter={"section": {"$ne": "Introduction"}},
    return_metadata="all",
)
print(f"{len(query_documents)} results:\n -{str(query_documents[:3])}")
20 results:
- [(Document(id='daeb7891-ec00-4c09-aa73-fc8e9a226ca8', metadata={}, page_content='== Products =='), 0.56540567), (Document(id='8c91ed93-d306-4cf9-ad1e-157e90a01ddf', metadata={'section': 'History'}, page_content='Since at least 2017, Cloudflare has been using a wall of lava lamps in their San Francisco'), 0.5604333), (Document(id='1400609f-0937-4700-acde-6e770d2dbd12', metadata={'section': 'History'}, page_content='their San Francisco headquarters as a source of randomness for encryption keys, alongside double'), 0.55573463)]
query_documents = cfVect.similarity_search_with_score(
    index_name=vectorize_index_name,
    query="DNS",
    k=100,
    md_filter={"section": {"$in": ["Products", "History"]}},
    return_metadata="all",
)
print(f"{len(query_documents)} results:\n -{str(query_documents)}")
20 results:
- [(Document(id='253a0987-1118-4ab2-a444-b8a50f0b4a63', metadata={'section': 'Products'}, page_content='protocols such as DNS over HTTPS, SMTP, and HTTP/2 with support for HTTP/2 Server Push. As of 2023,'), 0.7205538), (Document(id='112b61d1-6c34-41d6-a22f-7871bf1cca7b', metadata={'section': 'Products'}, page_content='utilizing edge computing, reverse proxies for web traffic, data center interconnects, and a content'), 0.58178145), (Document(id='36929a30-32a9-482a-add7-6c109bbf8f82', metadata={'section': 'Products'}, page_content='and a content distribution network to serve content across its network of servers. It supports'), 0.5797795), (Document(id='485ac8dc-c2ad-443a-90fc-8be9e004eaee', metadata={'section': 'History'}, page_content='the New York Stock Exchange under the stock ticker NET. It opened for public trading on September'), 0.5678468), (Document(id='1c7581d5-0b06-45d6-874c-554907f4f7f7', metadata={'section': 'Products'}, page_content='Cloudflare provides network and security products for consumers and businesses, utilizing edge'), 0.55722594), (Document(id='f2fd02ac-3bab-4565-a6e2-14d9963e8fd9', metadata={'section': 'History'}, page_content='Cloudflare has acquired web-services and security companies, including StopTheHacker (February'), 0.5558441), (Document(id='1315a8ff-6509-4350-ae84-21e11da282b3', metadata={'section': 'Products'}, page_content='Push. As of 2023, Cloudflare handles an average of 45 million HTTP requests per second.'), 0.55429655), (Document(id='f5b0c9d0-89c2-43ec-a9b7-5a5b376a5a85', metadata={'section': 'Products'}, page_content='It supports transport layer protocols TCP, UDP, QUIC, and many application layer protocols such as'), 0.54969466), (Document(id='cc659763-37cb-42cb-bf09-465df1b5bc1a', metadata={'section': 'History'}, page_content='Cloudflare was founded in July 2009 by Matthew Prince, Lee Holloway, and Michelle Zatlyn. Prince'), 0.54691005), (Document(id='b467348b-9a9b-4bf1-9104-27570891c9e4', metadata={'section': 'History'}, page_content='2019, Cloudflare submitted its S-1 filing for an initial public offering on the New York Stock'), 0.533554), (Document(id='7966591b-ff56-4346-aca8-341daece01fc', metadata={'section': 'History'}, page_content='Networks (March 2024), BastionZero (May 2024), and Kivera (October 2024).'), 0.53296596), (Document(id='c7657276-c631-4331-98ec-af308387ea49', metadata={'section': 'Products'}, page_content='Verizon’s October 2024 outage.'), 0.53137076), (Document(id='9418e10c-426b-45fa-a1a4-672074310890', metadata={'section': 'Products'}, page_content='Cloudflare also provides analysis and reports on large-scale outages, including Verizon’s October'), 0.53107977), (Document(id='db5507e2-0103-4275-a9f8-466f977255c0', metadata={'section': 'History'}, page_content='a product of Unspam Technologies that served as some inspiration for the basis of Cloudflare. From'), 0.528889), (Document(id='9d840318-be0e-4cf7-8a60-eaab50d45c9e', metadata={'section': 'History'}, page_content='of Cloudflare. From 2009, the company was venture-capital funded. 
On August 15, 2019, Cloudflare'), 0.52717584), (Document(id='db9137cc-051b-4b20-8d49-8a32bb2b99a7', metadata={'section': 'History'}, page_content='(December 2021), Vectrix (February 2022), Area 1 Security (February 2022), Nefeli Networks (March'), 0.52209044), (Document(id='dfaffd2f-4492-444d-accf-180b1f841463', metadata={'section': 'Products'}, page_content='As of 2024, Cloudflare servers are powered by AMD EPYC 9684X processors.'), 0.5169676), (Document(id='65bbd754-22d1-435a-860a-9259f6cf7dea', metadata={'section': 'History'}, page_content='(February 2014), CryptoSeal (June 2014), Eager Platform Co. (December 2016), Neumob (November'), 0.5132974), (Document(id='1400609f-0937-4700-acde-6e770d2dbd12', metadata={'section': 'History'}, page_content='their San Francisco headquarters as a source of randomness for encryption keys, alongside double'), 0.50999177), (Document(id='b77cef8b-1140-4d92-891b-0048ea70ae3a', metadata={'section': 'History'}, page_content='Neumob (November 2017), S2 Systems (January 2020), Linc (December 2020), Zaraz (December 2021),'), 0.5092492)]

Search by Namespace

We can also search for vectors by namespace. We just need to add it to the namespaces array when adding it to our vector database.

https://developers.cloudflare.com/vectorize/reference/metadata-filtering/#namespace-versus-metadata-filtering

namespace_name=f"test-namespace-{uuid.uuid4().hex[:8]}"

new_documents=[
Document(
page_content="This is a new namespace specific document!",
metadata={"section":"Namespace Test1"},
),
Document(
page_content="This is another namespace specific document!",
metadata={"section":"Namespace Test2"},
),
]

r= cfVect.add_documents(
index_name=vectorize_index_name,
documents=new_documents,
namespaces=[namespace_name]*len(new_documents),
wait=True,
)
query_documents= cfVect.similarity_search(
index_name=vectorize_index_name,
query="California",
namespace=namespace_name,
)

print(f"{len(query_documents)} results:\n -{str(query_documents)}")
2 results:
- [Document(id='65c4f7f4-aa4f-46b4-85ba-c90ea18dc7ed', metadata={'section': 'Namespace Test2', '_namespace': 'test-namespace-9cc13b96'}, page_content='This is another namespace specific document!'), Document(id='96350f98-7053-41c7-b6bb-5acdd3ab67bd', metadata={'section': 'Namespace Test1', '_namespace': 'test-namespace-9cc13b96'}, page_content='This is a new namespace specific document!')]

Search by IDs

We can also retrieve specific records by their IDs. To do so, we need to set the Vectorize index name on the index_name state param.

This will return both _namespace and _values as well as other metadata.

sample_ids = [x.id for x in query_documents]

cfVect.index_name = vectorize_index_name
query_documents = cfVect.get_by_ids(
    sample_ids,
)
print(str(query_documents[:3])[:500])
[Document(id='65c4f7f4-aa4f-46b4-85ba-c90ea18dc7ed', metadata={'section': 'Namespace Test2', '_namespace': 'test-namespace-9cc13b96', '_values': [-0.0005841255, 0.014480591, 0.040771484, 0.005218506, 0.015579224, 0.0007543564, -0.005138397, -0.022720337, 0.021835327, 0.038970947, 0.017456055, 0.022705078, 0.013450623, -0.015686035, -0.019119263, -0.01512146, -0.017471313, -0.007183075, -0.054382324, -0.01914978, 0.0005302429, 0.018600464, -0.083740234, -0.006462097, 0.0005598068, 0.024230957, -0

The namespace will be included in the _namespace field in metadata along with your other metadata (if you requested it in return_metadata).

Note: You cannot set the _namespace or _values fields in metadata as they are reserved. They will be stripped out during the insert process.
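
As an illustration of that note, the minimal sketch below (reusing the index and namespace from above; the document text is made up) passes the reserved keys in metadata; they are simply dropped on insert, and the namespaces parameter is what actually assigns the namespace.

# Hedged sketch: reserved metadata keys are stripped during insert.
doc_with_reserved = Document(
    page_content="Reserved-key demo document",
    metadata={"section": "Namespace Test1", "_namespace": "ignored", "_values": [0.0]},
)
r = cfVect.add_documents(
    index_name=vectorize_index_name,
    documents=[doc_with_reserved],
    namespaces=[namespace_name],  # this, not metadata["_namespace"], sets the namespace
    wait=True,
)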

Upserts

Vectorize supports upserts, which you can perform by setting upsert=True.

query_documents[0].page_content = "Updated: " + query_documents[0].page_content
print(query_documents[0].page_content)
Updated: This is another namespace specific document!
new_document_id = "12345678910"
new_document = Document(
    id=new_document_id,
    page_content="This is a new document!",
    metadata={"section": "Introduction"},
)

r = cfVect.add_documents(
    index_name=vectorize_index_name,
    documents=[new_document, query_documents[0]],
    upsert=True,
    wait=True,
)

query_documents_updated = cfVect.get_by_ids([new_document_id, query_documents[0].id])
print(str(query_documents_updated[0])[:500])
print(query_documents_updated[0].page_content)
print(query_documents_updated[1].page_content)
page_content='This is a new document!' metadata={'section': 'Introduction', '_namespace': None, '_values': [-0.007522583, 0.0023021698, 0.009963989, 0.031051636, -0.021316528, 0.0048103333, 0.026046753, 0.01348114, 0.026306152, 0.040374756, 0.03225708, 0.007423401, 0.031021118, -0.007347107, -0.034179688, 0.002111435, -0.027191162, -0.020950317, -0.021636963, -0.0030593872, -0.04977417, 0.018859863, -0.08062744, -0.027679443, 0.012512207, 0.0053634644, 0.008079529, -0.010528564, 0.07312012, 0.02
This is a new document!
Updated: This is another namespace specific document!

Deleting Records

We can delete records by their ids as well

r = cfVect.delete(index_name=vectorize_index_name, ids=sample_ids, wait=True)
print(r)
True

And to confirm deletion

query_documents = cfVect.get_by_ids(sample_ids)
assert len(query_documents) == 0

Creating from Documents

LangChain stipulates that all vector stores must have a from_documents method to instantiate a new vector store from documents. This is a more streamlined method than the individual create and add steps shown above.

You can do that as shown here:

vectorize_index_name="test-langchain-from-docs"
cfVect= CloudflareVectorize.from_documents(
account_id=cf_acct_id,
index_name=vectorize_index_name,
documents=texts,
embedding=embedder,
d1_database_id=d1_database_id,
d1_api_token=cf_d1_token,
vectorize_api_token=cf_vectorize_token,
wait=True,
)
# query for documents
query_documents= cfVect.similarity_search(
index_name=vectorize_index_name,
query="Edge Computing",
)

print(f"{len(query_documents)} results:\n{str(query_documents[0])[:300]}")
20 results:
page_content='utilizing edge computing, reverse proxies for web traffic, data center interconnects, and a content' metadata={'section': 'Products'}

Async Examples

This section shows some async examples.

Creating Indexes

vectorize_index_name1 = f"test-langchain-{uuid.uuid4().hex}"
vectorize_index_name2 = f"test-langchain-{uuid.uuid4().hex}"

# depending on your notebook environment you might need to include these:
# import nest_asyncio
# nest_asyncio.apply()

async_requests = [
    cfVect.acreate_index(index_name=vectorize_index_name1),
    cfVect.acreate_index(index_name=vectorize_index_name2),
]

res = await asyncio.gather(*async_requests);

Creating Metadata Indexes

async_requests = [
    cfVect.acreate_metadata_index(
        property_name="section",
        index_type="string",
        index_name=vectorize_index_name1,
        wait=True,
    ),
    cfVect.acreate_metadata_index(
        property_name="section",
        index_type="string",
        index_name=vectorize_index_name2,
        wait=True,
    ),
]

await asyncio.gather(*async_requests);

Adding Documents

async_requests = [
    cfVect.aadd_documents(index_name=vectorize_index_name1, documents=texts, wait=True),
    cfVect.aadd_documents(index_name=vectorize_index_name2, documents=texts, wait=True),
]

await asyncio.gather(*async_requests);

Querying/Search

async_requests = [
    cfVect.asimilarity_search(index_name=vectorize_index_name1, query="Workers AI"),
    cfVect.asimilarity_search(index_name=vectorize_index_name2, query="Edge Computing"),
]

async_results = await asyncio.gather(*async_requests);
print(f"{len(async_results[0])} results:\n{str(async_results[0][0])[:300]}")
print(f"{len(async_results[1])} results:\n{str(async_results[1][0])[:300]}")
20 results:
page_content='In 2023, Cloudflare launched Workers AI, a framework allowing for use of Nvidia GPU's within'
20 results:
page_content='utilizing edge computing, reverse proxies for web traffic, data center interconnects, and a content'

Returning Metadata/Values

async_requests = [
    cfVect.asimilarity_search(
        index_name=vectorize_index_name1,
        query="California",
        return_values=True,
        return_metadata="all",
    ),
    cfVect.asimilarity_search(
        index_name=vectorize_index_name2,
        query="California",
        return_values=True,
        return_metadata="all",
    ),
]

async_results = await asyncio.gather(*async_requests);
print(f"{len(async_results[0])} results:\n{str(async_results[0][0])[:300]}")
print(f"{len(async_results[1])} results:\n{str(async_results[1][0])[:300]}")
20 results:
page_content='and other services. Cloudflare's headquarters are in San Francisco, California. According to' metadata={'section': 'Introduction', '_values': [-0.031219482, -0.018295288, -0.006000519, 0.017532349, 0.016403198, -0.029922485, -0.007133484, 0.004447937, 0.04559326, -0.011405945, 0.034820
20 results:
page_content='and other services. Cloudflare's headquarters are in San Francisco, California. According to' metadata={'section': 'Introduction', '_values': [-0.031219482, -0.018295288, -0.006000519, 0.017532349, 0.016403198, -0.029922485, -0.007133484, 0.004447937, 0.04559326, -0.011405945, 0.034820

Searching with Metadata Filtering

async_requests = [
    cfVect.asimilarity_search(
        index_name=vectorize_index_name1,
        query="Cloudflare services",
        k=2,
        md_filter={"section": "Products"},
        return_metadata="all",
        # return_values=True
    ),
    cfVect.asimilarity_search(
        index_name=vectorize_index_name2,
        query="Cloudflare services",
        k=2,
        md_filter={"section": "Products"},
        return_metadata="all",
        # return_values=True
    ),
]

async_results = await asyncio.gather(*async_requests);
print(f"{len(async_results[0])} results:\n{str(async_results[0][-1])[:300]}")
print(f"{len(async_results[1])} results:\n{str(async_results[1][0])[:300]}")
9 results:
page_content='It supports transport layer protocols TCP, UDP, QUIC, and many application layer protocols such as' metadata={'section': 'Products'}
9 results:
page_content='Cloudflare provides network and security products for consumers and businesses, utilizing edge' metadata={'section': 'Products'}

Cleanup

Let's finish by deleting all of the indexes we created in this notebook.

arr_indexes = cfVect.list_indexes()
arr_indexes = [x for x in arr_indexes if "test-langchain" in x.get("name")]
arr_async_requests = [
    cfVect.adelete_index(index_name=x.get("name")) for x in arr_indexes
]
await asyncio.gather(*arr_async_requests);

API Reference

https://developers.cloudflare.com/api/resources/vectorize/

https://developers.cloudflare.com/vectorize/
