Async API

We demonstrate the following functionality supported by LanceDB, using its asynchronous API:

  • Automatic versioning
  • Instant rollback
  • Appends, updates, deletions
  • Schema evolution

Let's first prepare the data. We will use a CSV file of quotes from Rick and Morty.

In [50]:
!wget http://vectordb-recipes.s3.us-west-2.amazonaws.com/rick_and_morty_quotes.csv
!head rick_and_morty_quotes.csv
--2024-12-17 15:58:31--  http://vectordb-recipes.s3.us-west-2.amazonaws.com/rick_and_morty_quotes.csv
Resolving vectordb-recipes.s3.us-west-2.amazonaws.com (vectordb-recipes.s3.us-west-2.amazonaws.com)... 3.5.84.162, 3.5.76.76, 52.92.228.138, ...
Connecting to vectordb-recipes.s3.us-west-2.amazonaws.com (vectordb-recipes.s3.us-west-2.amazonaws.com)|3.5.84.162|:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: 8236 (8.0K) [text/csv]
Saving to: ‘rick_and_morty_quotes.csv.3’

rick_and_morty_quot 100%[===================>]   8.04K  --.-KB/s    in 0s

2024-12-17 15:58:31 (160 MB/s) - ‘rick_and_morty_quotes.csv.3’ saved [8236/8236]

id,author,quote
1,Rick," Morty, you got to come on. You got to come with me."
2,Morty," Rick, what’s going on?"
3,Rick," I got a surprise for you, Morty."
4,Morty," It’s the middle of the night. What are you talking about?"
5,Rick," I got a surprise for you."
6,Morty," Ow! Ow! You’re tugging me too hard."
7,Rick," I got a surprise for you, Morty."
8,Rick," What do you think of this flying vehicle, Morty? I built it out of stuff I found in the garage."
9,Morty," Yeah, Rick, it’s great. Is this the surprise?"

Let's load this into a pandas dataframe.

It has three columns: a quote id, the first name of the quote's author, and the quote string:

In [51]:
import pandas as pd
df = pd.read_csv("rick_and_morty_quotes.csv")
df.head()
Out[51]:
    id  author  quote
0   1   Rick    Morty, you got to come on. You got to come wi...
1   2   Morty   Rick, what’s going on?
2   3   Rick    I got a surprise for you, Morty.
3   4   Morty   It’s the middle of the night. What are you ta...
4   5   Rick    I got a surprise for you.

Creating a LanceDB table from a pandas dataframe is straightforward using create_table.

We'll start with a local LanceDB connection.

In [35]:
!pip install lancedb -q
In [52]:
import lancedb
async_db = await lancedb.connect_async("~/.lancedb")
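
The cells in this notebook use `await` at the top level, which works in Jupyter and IPython but not in a plain Python script. Outside a notebook, one common pattern is to put the awaited calls in a coroutine and hand it to `asyncio.run`. The sketch below shows only that pattern: `fake_connect` is a hypothetical stand-in that marks where `lancedb.connect_async` would be awaited, so the example runs without LanceDB installed.

```python
import asyncio


async def fake_connect(uri):
    # Hypothetical stand-in for `await lancedb.connect_async(uri)`.
    # It yields to the event loop once and returns a placeholder object.
    await asyncio.sleep(0)
    return {"uri": uri}


async def main():
    # In a real script, the awaited LanceDB calls from the cells would go here.
    db = await fake_connect("~/.lancedb")
    return db


# asyncio.run creates an event loop, runs the coroutine, and closes the loop.
result = asyncio.run(main())
print(result)
```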
In [53]:
await async_db.drop_table("rick_and_morty")
async_table = await async_db.create_table("rick_and_morty", df, mode="overwrite")
await async_table.to_pandas()
[2024-12-17T23:58:46Z WARN  lance::dataset::write::insert] No existing dataset at ~/.lancedb/rick_and_morty.lance, it will be created
Out[53]:
    id  author  quote
0   1   Rick    Morty, you got to come on. You got to come wi...
1   2   Morty   Rick, what’s going on?
2   3   Rick    I got a surprise for you, Morty.
3   4   Morty   It’s the middle of the night. What are you ta...
4   5   Rick    I got a surprise for you.
5   6   Morty   Ow! Ow! You’re tugging me too hard.
6   7   Rick    I got a surprise for you, Morty.
7   8   Rick    What do you think of this flying vehicle, Mor...
8   9   Morty   Yeah, Rick, it’s great. Is this the surprise?
9   10  Rick    Morty, I had to I had to I had to I had to ma...

Updates

Now, since Rick is the smartest man in the multiverse, he deserves to have his quotes attributed to his full name: Richard Daniel Sanchez.

This can be done via AsyncTable.update. It needs two arguments:

  1. A where string filter (SQL syntax) that selects the rows to update
  2. A dict of updates, where the keys are the names of the columns to update and the values are the new values
In [54]:
await async_table.update(where="author='Morty'", updates={"author": "Richard Daniel Sanchez"})
await async_table.to_pandas()
Out[54]:
    id  author  quote
0   1   Rick    Morty, you got to come on. You got to come wi...
1   3   Rick    I got a surprise for you, Morty.
2   5   Rick    I got a surprise for you.
3   7   Rick    I got a surprise for you, Morty.
4   8   Rick    What do you think of this flying vehicle, Mor...
5   10  Rick    Morty, I had to I had to I had to I had to ma...
6   12  Rick    We’re gonna drop it down there just get a who...
7   14  Rick    Come on, Morty. Just take it easy, Morty. It’...
8   16  Rick    When I drop the bomb you know, I want you to ...
9   18  Rick    And Jessica’s gonna be Eve,…
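
To make the two arguments concrete, here is a rough analogy in SQLite using only the Python standard library. It is a sketch of the idea (a SQL filter picks the rows, a dict supplies the new column values), not a description of how LanceDB executes an update. The helper `apply_update`, the table name `quotes`, and the three sample rows are made up for illustration; the filter and the updates dict are the ones passed in the cell above.

```python
import sqlite3


def apply_update(conn, table, where, updates):
    """Set each column in `updates` to its new value on rows matching `where`.

    `table` and `where` are interpolated into the SQL text, so this
    hypothetical helper is only suitable for trusted input.
    """
    assignments = ", ".join(f"{column} = ?" for column in updates)
    cursor = conn.execute(
        f"UPDATE {table} SET {assignments} WHERE {where}",
        list(updates.values()),
    )
    return cursor.rowcount  # number of rows that matched the filter


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE quotes (id INTEGER, author TEXT, quote TEXT)")
conn.executemany(
    "INSERT INTO quotes VALUES (?, ?, ?)",
    [
        (1, "Rick", "Morty, you got to come on."),
        (2, "Morty", "Rick, what's going on?"),
        (3, "Rick", "I got a surprise for you, Morty."),
    ],
)

# Same filter string and updates dict as in the notebook cell.
changed = apply_update(
    conn, "quotes", "author='Morty'", {"author": "Richard Daniel Sanchez"}
)
authors = [row[0] for row in conn.execute("SELECT author FROM quotes ORDER BY id")]
print(changed, authors)
```

Only rows that satisfy the filter are touched; every other row keeps its original values.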

Schema evolution

Let's add a new_id column to the table, where each value is the original id plus 1.

In [55]:
await async_table.add_columns({"new_id": "id + 1"})
await async_table.to_pandas()
Out[55]:
    id  author  quote                                             new_id
0   1   Rick    Morty, you got to come on. You got to come wi...  2
1   3   Rick    I got a surprise for you, Morty.                  4
2   5   Rick    I got a surprise for you.                         6
3   7   Rick    I got a surprise for you, Morty.                  8
4   8   Rick    What do you think of this flying vehicle, Mor...  9
5   10  Rick    Morty, I had to I had to I had to I had to ma...  11
6   12  Rick    We’re gonna drop it down there just get a who...  13
7   14  Rick    Come on, Morty. Just take it easy, Morty. It’...  15
8   16  Rick    When I drop the bomb you know, I want you to ...  17
9   18  Rick    And Jessica’s gonna be Eve,…                      19

If we look at the schema, we see that a new int64 column was added.

In [56]:
await async_table.schema()
Out[56]:
id: int64
author: string
quote: string
new_id: int64
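
The value passed for new_id above is a SQL expression that is evaluated per row. As a loose analogy only (SQLite from the standard library, not LanceDB internals), the same `id + 1` expression can be used to fill a new integer column. The table name `quotes` and the three sample ids are assumptions chosen to match the first rows of the output above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE quotes (id INTEGER, author TEXT)")
conn.executemany(
    "INSERT INTO quotes VALUES (?, ?)",
    [(1, "Rick"), (3, "Rick"), (8, "Rick")],
)

# Add the column, then fill it by evaluating the expression for each row.
conn.execute("ALTER TABLE quotes ADD COLUMN new_id INTEGER")
conn.execute("UPDATE quotes SET new_id = id + 1")

rows = conn.execute("SELECT id, new_id FROM quotes ORDER BY id").fetchall()
print(rows)  # [(1, 2), (3, 4), (8, 9)]
```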

Rollback

Suppose we used the table and found that the new column should hold different values. How do we add a different new column without losing the change history?

First, major operations are automatically versioned in LanceDB. Version 1 is the table creation, with the initial insertion of data. Version 2 is the update. Version 3 is adding the new column.

In [57]:
await async_table.checkout_latest()
await async_table.list_versions()
Out[57]:
[{'version': 1,
  'timestamp': datetime.datetime(2024, 12, 17, 15, 58, 46, 983259),
  'metadata': {}},
 {'version': 2,
  'timestamp': datetime.datetime(2024, 12, 17, 15, 59, 0, 291948),
  'metadata': {}},
 {'version': 3,
  'timestamp': datetime.datetime(2024, 12, 17, 15, 59, 8, 381165),
  'metadata': {}}]

We can restore version 2, the state of the table before we added the new_id column.

In [58]:
await async_table.checkout(2)
await async_table.restore()
await async_table.to_pandas()
Out[58]:
    id  author  quote
0   1   Rick    Morty, you got to come on. You got to come wi...
1   3   Rick    I got a surprise for you, Morty.
2   5   Rick    I got a surprise for you.
3   7   Rick    I got a surprise for you, Morty.
4   8   Rick    What do you think of this flying vehicle, Mor...
5   10  Rick    Morty, I had to I had to I had to I had to ma...
6   12  Rick    We’re gonna drop it down there just get a who...
7   14  Rick    Come on, Morty. Just take it easy, Morty. It’...
8   16  Rick    When I drop the bomb you know, I want you to ...
9   18  Rick    And Jessica’s gonna be Eve,…

Notice that we now have one more version, not one fewer. When we restore an old version, we're not deleting the version history; we're creating a new version whose schema and data are equivalent to the restored old version. In this way, we can keep track of all of the changes and always roll back to a previous state.

In [59]:
await async_table.list_versions()
Out[59]:
[{'version': 1,
  'timestamp': datetime.datetime(2024, 12, 17, 15, 58, 46, 983259),
  'metadata': {}},
 {'version': 2,
  'timestamp': datetime.datetime(2024, 12, 17, 15, 59, 0, 291948),
  'metadata': {}},
 {'version': 3,
  'timestamp': datetime.datetime(2024, 12, 17, 15, 59, 8, 381165),
  'metadata': {}},
 {'version': 4,
  'timestamp': datetime.datetime(2024, 12, 17, 15, 59, 22, 800694),
  'metadata': {}}]
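
The behavior described above (a restore adds a version instead of removing any) can be modeled with a small append-only log. The class below is a toy written for this explanation, not LanceDB's implementation: the name `VersionedTable` and its methods are hypothetical, and it stores a full copy of the rows per version for simplicity, which a real storage engine would avoid.

```python
import copy


class VersionedTable:
    """Toy append-only version log (hypothetical, for illustration only)."""

    def __init__(self, rows):
        # Version 1 is the table creation with the initial data.
        self._versions = [copy.deepcopy(rows)]

    @property
    def version(self):
        return len(self._versions)

    def rows(self, version=None):
        # Return the rows at a given version (latest if omitted).
        v = self.version if version is None else version
        return copy.deepcopy(self._versions[v - 1])

    def commit(self, rows):
        # Every change appends a new version; old versions are never edited.
        self._versions.append(copy.deepcopy(rows))
        return self.version

    def restore(self, version):
        # Restoring commits the old state again as a brand-new version.
        return self.commit(self.rows(version))


table = VersionedTable([{"id": 1, "author": "Rick"}])
table.commit([{"id": 1, "author": "Richard Daniel Sanchez"}])  # version 2
table.commit([{"id": 1, "author": "Richard Daniel Sanchez", "new_id": 2}])  # version 3
restored = table.restore(2)  # version 4 has the same rows as version 2
print(restored, table.rows())
```

After the restore, version 3 (with the extra column) is still readable, which is what makes it possible to move back and forth between states.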

Add another new column

Now we'll change the value of the new_id column and add it to the restored dataset again.

In [60]:
await async_table.add_columns({"new_id": "id + 10"})
In [61]:
await async_table.schema()
Out[61]:
id: int64
author: string
quote: string
new_id: int64

Deletion

What if the whole show was just Rick-isms? Let's delete any quote not said by Rick.

In [62]:
await async_table.delete("author != 'Richard Daniel Sanchez'")

We can see that the number of rows has been reduced to 34.

In [63]:
await async_table.count_rows()
Out[63]:
34

OK, we had our fun. Let's get back to the full quote set.

In [67]:
await async_table.checkout(5)
await async_table.restore()
In [68]:
await async_table.count_rows()
Out[68]:
99

History

We now have 7 versions in the data. We can review the operation that corresponds to each version below:

In [32]:
await async_table.version()
Out[32]:
6

Versions:

  • 1 - Create
  • 2 - Update
  • 3 - Add a new column
  • 4 - Restore (2)
  • 5 - Add a new column
  • 6 - Delete
  • 7 - Restore

Summary

We never had to explicitly manage the versioning, and we never had to create expensive and slow snapshots. LanceDB automatically tracks the full history of the operations we performed and supports fast rollbacks. In production this is critical for debugging issues and for minimizing downtime by rolling back to a previously successful state in seconds.

