Rockset

⚠️ Deprecation Notice: Rockset Integration Disabled

As of June 2024, Rockset has been acquired by OpenAI and has shut down its public services.
Rockset was a real-time analytics database known for world-class indexing and retrieval. Now, its core team and technology are being integrated into OpenAI's infrastructure to power future AI products.
This LangChain integration is no longer functional and is preserved for archival purposes only.

Rockset is a real-time analytics database which enables queries on massive, semi-structured data without operational burden. With Rockset, ingested data is queryable within one second and analytical queries against that data typically execute in milliseconds. Rockset is compute optimized, making it suitable for serving high concurrency applications in the sub-100TB range (or larger than 100s of TBs with rollups).

This notebook demonstrates how to use Rockset as a document loader in LangChain. To get started, make sure you have a Rockset account and an API key available.

Setting up the environment

Go to the Rockset console and get an API key. Find your API region from the API reference. For the purpose of this notebook, we will assume you're using Rockset from Oregon (us-west-2).
Set the environment variable ROCKSET_API_KEY.
Install the Rockset Python client, which LangChain uses to interact with the Rockset database.

%pip install --upgrade --quiet rockset
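
Once ROCKSET_API_KEY is set, the key can be read from the environment instead of being pasted into the notebook. The following is a minimal sketch; the helper name read_rockset_api_key is hypothetical and is not part of the Rockset client or LangChain.

```python
import os
from typing import Mapping, Optional


def read_rockset_api_key(env: Optional[Mapping[str, str]] = None) -> str:
    """Return the key stored in ROCKSET_API_KEY, or raise if it is unset.

    Hypothetical helper for illustration only.
    """
    # Default to the real process environment; a mapping can be passed for testing.
    env = os.environ if env is None else env
    key = env.get("ROCKSET_API_KEY", "").strip()
    if not key:
        raise RuntimeError("Set the ROCKSET_API_KEY environment variable first.")
    return key
```

The returned string could then be passed wherever the examples in this notebook use the "<api key>" placeholder.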

Loading Documents

The Rockset integration with LangChain allows you to load documents from Rockset collections with SQL queries. To do this, you must construct a RocksetLoader object. Here is an example snippet that initializes a RocksetLoader.

from langchain_community.document_loaders import RocksetLoader
from rockset import Regions, RocksetClient, models

loader = RocksetLoader(
    RocksetClient(Regions.usw2a1, "<api key>"),
    models.QueryRequestSql(query="SELECT * FROM langchain_demo LIMIT 3"),  # SQL query
    ["text"],  # content columns
    metadata_keys=["id", "date"],  # metadata columns
)

API Reference: RocksetLoader

Here, you can see that the following query is run:

SELECT * FROM langchain_demo LIMIT 3

The text column in the collection is used as the page content, and the record's id and date columns are used as metadata (if you do not pass anything into metadata_keys, the whole Rockset document will be used as metadata).

To execute the query and access an iterator over the resulting Documents, run:
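
The mapping from a result row to page content and metadata can be illustrated without a Rockset connection. The sketch below is not the loader's actual implementation; row_to_parts is a hypothetical helper, and a plain dict stands in for one query result row.

```python
from typing import Any, Dict, List, Optional, Tuple


def row_to_parts(
    row: Dict[str, Any],
    content_keys: List[str],
    metadata_keys: Optional[List[str]] = None,
) -> Tuple[str, Dict[str, Any]]:
    """Split one result row into (page_content, metadata). Illustrative only."""
    # Content columns are joined with a newline, the default this guide describes.
    page_content = "\n".join(str(row[key]) for key in content_keys if key in row)
    if metadata_keys is None:
        # No metadata_keys given: the whole row becomes the metadata.
        metadata = dict(row)
    else:
        metadata = {key: row[key] for key in metadata_keys if key in row}
    return page_content, metadata


# A made-up row shaped like the langchain_demo records in this notebook.
row = {"id": 83209, "date": "2022-11-13T18:26:45.000000Z", "text": "Lorem ipsum"}
content, metadata = row_to_parts(row, ["text"], metadata_keys=["id", "date"])
print(content)
print(metadata)
```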

loader.lazy_load()

To execute the query and access all resulting Documents at once, run:

loader.load()
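
The difference between the two calls is the usual iterator-versus-list trade-off. The toy class below is a stand-in, not RocksetLoader itself: ToyLoader and its in-memory rows are hypothetical, and it only mirrors the common pattern in which load() collects everything that lazy_load() yields.

```python
from typing import Dict, Iterator, List


class ToyLoader:
    """Hypothetical stand-in that yields rows from memory instead of querying Rockset."""

    def __init__(self, rows: List[Dict[str, str]]) -> None:
        self.rows = rows
        self.produced = 0  # counts how many rows have been produced so far

    def lazy_load(self) -> Iterator[Dict[str, str]]:
        # Rows are produced one at a time, only as the caller asks for them.
        for row in self.rows:
            self.produced += 1
            yield row

    def load(self) -> List[Dict[str, str]]:
        # Consumes the whole iterator and returns every row at once.
        return list(self.lazy_load())


loader = ToyLoader([{"text": "a"}, {"text": "b"}, {"text": "c"}])
first = next(loader.lazy_load())  # pulls a single row from the iterator
produced_after_first = loader.produced
everything = loader.load()  # materializes all rows in a list
```

Iterating is the better fit when results are processed one by one; collecting a list is convenient when the result set is small.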

Here is an example response of loader.load():

[
    Document(
        page_content="Lorem ipsum dolor sit amet, consectetur adipiscing elit. Maecenas a libero porta, dictum ipsum eget, hendrerit neque. Morbi blandit, ex ut suscipit viverra, enim velit tincidunt tellus, a tempor velit nunc et ex. Proin hendrerit odio nec convallis lobortis. Aenean in purus dolor. Vestibulum orci orci, laoreet eget magna in, commodo euismod justo.",
        metadata={"id": 83209, "date": "2022-11-13T18:26:45.000000Z"}
    ),
    Document(
        page_content="Integer at finibus odio. Nam sit amet enim cursus lacus gravida feugiat vestibulum sed libero. Aenean eleifend est quis elementum tincidunt. Curabitur sit amet ornare erat. Nulla id dolor ut magna volutpat sodales fringilla vel ipsum. Donec ultricies, lacus sed fermentum dignissim, lorem elit aliquam ligula, sed suscipit sapien purus nec ligula.",
        metadata={"id": 89313, "date": "2022-11-13T18:28:53.000000Z"}
    ),
    Document(
        page_content="Morbi tortor enim, commodo id efficitur vitae, fringilla nec mi. Nullam molestie faucibus aliquet. Praesent a est facilisis, condimentum justo sit amet, viverra erat. Fusce volutpat nisi vel purus blandit, et facilisis felis accumsan. Phasellus luctus ligula ultrices tellus tempor hendrerit. Donec at ultricies leo.",
        metadata={"id": 87732, "date": "2022-11-13T18:49:04.000000Z"}
    )
]

Using multiple columns as content

You can choose to use multiple columns as content:

from langchain_community.document_loaders import RocksetLoader
from rockset import Regions, RocksetClient, models

loader = RocksetLoader(
    RocksetClient(Regions.usw2a1, "<api key>"),
    # WHERE must precede LIMIT in SQL
    models.QueryRequestSql(query="SELECT * FROM langchain_demo WHERE id=38 LIMIT 1"),
    ["sentence1", "sentence2"],  # TWO content columns
)

API Reference: RocksetLoader

Assuming the "sentence1" field is "This is the first sentence." and the "sentence2" field is "This is the second sentence.", the page_content of the resulting Document would be:

This is the first sentence.
This is the second sentence.

You can define your own function to join content columns by setting the content_columns_joiner argument in the RocksetLoader constructor. content_columns_joiner is a function that takes a List[Tuple[str, Any]] as an argument, representing a list of (column name, column value) tuples. By default, it joins the column values with a newline.

For example, if you wanted to join sentence1 and sentence2 with a space instead of a newline, you could set content_columns_joiner like so:

RocksetLoader(
    RocksetClient(Regions.usw2a1, "<api key>"),
    models.QueryRequestSql(query="SELECT * FROM langchain_demo WHERE id=38 LIMIT 1"),
    ["sentence1", "sentence2"],
    content_columns_joiner=lambda docs: " ".join(
        [doc[1] for doc in docs]
    ),  # join with a space instead of \n
)

The page_content of the resulting Document would be:

This is the first sentence. This is the second sentence.

Often you will want to include the column name in the page_content. You can do that like this:

RocksetLoader(
    RocksetClient(Regions.usw2a1, "<api key>"),
    models.QueryRequestSql(query="SELECT * FROM langchain_demo WHERE id=38 LIMIT 1"),
    ["sentence1", "sentence2"],
    content_columns_joiner=lambda docs: "\n".join(
        [f"{doc[0]}: {doc[1]}" for doc in docs]  # "column name: column value"
    ),
)

This would result in the following page_content:

sentence1: This is the first sentence.
sentence2: This is the second sentence.
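
Because the Rockset service is no longer available, the joiner behavior described in this section can be checked offline by calling the joiner functions directly on a list of (column name, column value) tuples. This is a sketch under that assumption; the function names and the Columns alias are hypothetical, and nothing here touches RocksetLoader.

```python
from typing import Any, List, Tuple

# Shape of the argument a content_columns_joiner receives, per this guide.
Columns = List[Tuple[str, Any]]


def join_with_newline(columns: Columns) -> str:
    # Mirrors the default described above: one column value per line.
    return "\n".join(str(value) for _, value in columns)


def join_with_space(columns: Columns) -> str:
    # Same values, separated by a single space.
    return " ".join(str(value) for _, value in columns)


def join_with_names(columns: Columns) -> str:
    # Prefix each value with its column name.
    return "\n".join(f"{name}: {value}" for name, value in columns)


columns: Columns = [
    ("sentence1", "This is the first sentence."),
    ("sentence2", "This is the second sentence."),
]
print(join_with_space(columns))
print(join_with_names(columns))
```

Named functions like these can be passed as content_columns_joiner in place of a lambda, which makes them easier to test on their own.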

Related

Document loader conceptual guide
Document loader how-to guides
