
Working with tables


A Table is a collection of Records in a LanceDB Database. Tables in Lance have a schema that defines the columns and their types. These schemas can include nested columns and can evolve over time.

This guide will show how to create tables, insert data into them, and update the data.

Creating a LanceDB Table

Initialize a LanceDB connection and create a table

import lancedb

uri = "data/sample-lancedb"
db = lancedb.connect(uri)

import lancedb

uri = "data/sample-lancedb"
async_db = await lancedb.connect_async(uri)

LanceDB allows ingesting data from various sources: dict, list[dict], pd.DataFrame, pa.Table, or an Iterator[pa.RecordBatch]. Let's take a look at some of these.

import * as lancedb from "@lancedb/lancedb";
import * as arrow from "apache-arrow";

const uri = "data/sample-lancedb";
const db = await lancedb.connect(uri);

const lancedb = require("vectordb");
const arrow = require("apache-arrow");

const uri = "data/sample-lancedb";
const db = await lancedb.connect(uri);

From list of tuples or dictionaries

data = [
    {"vector": [1.1, 1.2], "lat": 45.5, "long": -122.7},
    {"vector": [0.2, 1.8], "lat": 40.1, "long": -74.1},
]
db.create_table("test_table", data)
db["test_table"].head()

data = [
    {"vector": [1.1, 1.2], "lat": 45.5, "long": -122.7},
    {"vector": [0.2, 1.8], "lat": 40.1, "long": -74.1},
]
async_tbl = await async_db.create_table("test_table_async", data)
await async_tbl.head()

Note

If the table already exists, LanceDB will raise an error by default.

create_table supports an optional exist_ok parameter. When set to True and the table exists, it simply opens the existing table. The data you passed in will NOT be appended to the table in that case.

db.create_table("test_table", data, exist_ok=True)

await async_db.create_table("test_table_async", data, exist_ok=True)

Sometimes you want to make sure that you start fresh. If you want to overwrite the table, you can pass in mode="overwrite" to the create_table function.

db.create_table("test_table", data, mode="overwrite")

await async_db.create_table("test_table_async", data, mode="overwrite")

You can create a LanceDB table in JavaScript using an array of records as follows.

const _tbl = await db.createTable(
  "myTable",
  [
    { vector: [3.1, 4.1], item: "foo", price: 10.0 },
    { vector: [5.9, 26.5], item: "bar", price: 20.0 },
  ],
  { mode: "overwrite" },
);

This will infer the schema from the provided data. If you want to explicitly provide a schema, you can use apache-arrow to declare a schema.

const schema = new arrow.Schema([
  new arrow.Field(
    "vector",
    new arrow.FixedSizeList(
      2,
      new arrow.Field("item", new arrow.Float32(), true),
    ),
  ),
  new arrow.Field("item", new arrow.Utf8(), true),
  new arrow.Field("price", new arrow.Float32(), true),
]);
const data = [
  { vector: [3.1, 4.1], item: "foo", price: 10.0 },
  { vector: [5.9, 26.5], item: "bar", price: 20.0 },
];
const tbl = await db.createTable("myTable", data, { schema });

Note

createTable supports an optional existOk parameter. When set to true and the table exists, it simply opens the existing table. The data you passed in will NOT be appended to the table in that case.

const tbl = await db.createTable("myTable", data, { existOk: true });

Sometimes you want to make sure that you start fresh. If you want to overwrite the table, you can pass in mode: "overwrite" to the createTable function.

const tbl = await db.createTable("myTable", data, { mode: "overwrite" });

const tbl = await db.createTable(
  "myTable",
  [
    { vector: [3.1, 4.1], item: "foo", price: 10.0 },
    { vector: [5.9, 26.5], item: "bar", price: 20.0 },
  ],
  { writeMode: lancedb.WriteMode.Overwrite },
);

This will infer the schema from the provided data. If you want to explicitly provide a schema, you can use apache-arrow to declare a schema.

const schema = new arrow.Schema([
  new arrow.Field(
    "vector",
    new arrow.FixedSizeList(
      2,
      new arrow.Field("item", new arrow.Float32(), true),
    ),
  ),
  new arrow.Field("item", new arrow.Utf8(), true),
  new arrow.Field("price", new arrow.Float32(), true),
]);
const data = [
  { vector: [3.1, 4.1], item: "foo", price: 10.0 },
  { vector: [5.9, 26.5], item: "bar", price: 20.0 },
];
const tbl = await db.createTable({ name: "myTableWithSchema", data, schema });

Warning

existOk is not available in vectordb

If the table already exists, vectordb will raise an error by default. You can use writeMode: WriteMode.Overwrite to overwrite the table. But this will delete the existing table and create a new one with the same name.

Sometimes you want to make sure that you start fresh.

If you want to overwrite the table, you can pass in writeMode: lancedb.WriteMode.Overwrite to the createTable function.

const table = await con.createTable(tableName, data, {
  writeMode: WriteMode.Overwrite,
});

From a Pandas DataFrame

import pandas as pd

data = pd.DataFrame({
    "vector": [[1.1, 1.2, 1.3, 1.4], [0.2, 1.8, 0.4, 3.6]],
    "lat": [45.5, 40.1],
    "long": [-122.7, -74.1],
})
db.create_table("my_table_pandas", data)
db["my_table_pandas"].head()

import pandas as pd

data = pd.DataFrame({
    "vector": [[1.1, 1.2, 1.3, 1.4], [0.2, 1.8, 0.4, 3.6]],
    "lat": [45.5, 40.1],
    "long": [-122.7, -74.1],
})
async_tbl = await async_db.create_table("my_table_async_pd", data)
await async_tbl.head()

Note

Data is converted to Arrow before being written to disk. For maximum control over how data is saved, either provide the PyArrow schema to convert to or else provide a PyArrow Table directly.

The vector column needs to be a Vector (defined as pyarrow.FixedSizeList) type.

import pyarrow as pa

custom_schema = pa.schema([
    pa.field("vector", pa.list_(pa.float32(), 4)),
    pa.field("lat", pa.float32()),
    pa.field("long", pa.float32()),
])
tbl = db.create_table("my_table_custom_schema", data, schema=custom_schema)

import pyarrow as pa

custom_schema = pa.schema([
    pa.field("vector", pa.list_(pa.float32(), 4)),
    pa.field("lat", pa.float32()),
    pa.field("long", pa.float32()),
])
async_tbl = await async_db.create_table("my_table_async_custom_schema", data, schema=custom_schema)

From a Polars DataFrame

LanceDB supports Polars, a modern, fast DataFrame library written in Rust. Just like in Pandas, the Polars integration is enabled by PyArrow under the hood. A deeper integration between LanceDB Tables and Polars DataFrames is on the way.

import polars as pl

data = pl.DataFrame({
    "vector": [[3.1, 4.1], [5.9, 26.5]],
    "item": ["foo", "bar"],
    "price": [10.0, 20.0],
})
tbl = db.create_table("my_table_pl", data)

import polars as pl

data = pl.DataFrame({
    "vector": [[3.1, 4.1], [5.9, 26.5]],
    "item": ["foo", "bar"],
    "price": [10.0, 20.0],
})
async_tbl = await async_db.create_table("my_table_async_pl", data)

From an Arrow Table

You can also create LanceDB tables directly from Arrow tables. LanceDB supports the float16 data type!

import pyarrow as pa
import numpy as np

dim = 16
total = 2
schema = pa.schema([
    pa.field("vector", pa.list_(pa.float16(), dim)),
    pa.field("text", pa.string()),
])
data = pa.Table.from_arrays(
    [
        pa.array(
            [np.random.randn(dim).astype(np.float16) for _ in range(total)],
            pa.list_(pa.float16(), dim),
        ),
        pa.array(["foo", "bar"]),
    ],
    ["vector", "text"],
)
tbl = db.create_table("f16_tbl", data, schema=schema)

import pyarrow as pa
import numpy as np

dim = 16
total = 2
schema = pa.schema([
    pa.field("vector", pa.list_(pa.float16(), dim)),
    pa.field("text", pa.string()),
])
data = pa.Table.from_arrays(
    [
        pa.array(
            [np.random.randn(dim).astype(np.float16) for _ in range(total)],
            pa.list_(pa.float16(), dim),
        ),
        pa.array(["foo", "bar"]),
    ],
    ["vector", "text"],
)
async_tbl = await async_db.create_table("f16_tbl_async", data, schema=schema)
const db = await lancedb.connect(databaseDir);
const dim = 16;
const total = 10;
const f16Schema = new Schema([
  new Field("id", new Int32()),
  new Field(
    "vector",
    new FixedSizeList(dim, new Field("item", new Float16(), true)),
    false,
  ),
]);
const data = lancedb.makeArrowTable(
  Array.from(Array(total), (_, i) => ({
    id: i,
    vector: Array.from(Array(dim), Math.random),
  })),
  { schema: f16Schema },
);
const _table = await db.createTable("f16_tbl", data);

const dim = 16;
const total = 10;
const schema = new Schema([
  new Field("id", new Int32()),
  new Field(
    "vector",
    new FixedSizeList(dim, new Field("item", new Float16(), true)),
    false,
  ),
]);
const data = lancedb.makeArrowTable(
  Array.from(Array(total), (_, i) => ({
    id: i,
    vector: Array.from(Array(dim), Math.random),
  })),
  { schema },
);
const table = await db.createTable("f16_tbl", data);

From Pydantic Models

When you create an empty table without data, you must specify the table schema. LanceDB supports creating tables by specifying a PyArrow schema or a specialized Pydantic model called LanceModel.

For example, the following Content model specifies a table with 5 columns: movie_id, vector, genres, title, and imdb_id. When you create a table, you can pass the class as the value of the schema parameter to create_table. The vector column is a Vector type, which is a specialized Pydantic type that can be configured with the vector dimensions. It is also important to note that LanceDB only understands subclasses of lancedb.pydantic.LanceModel (which itself derives from pydantic.BaseModel).

from lancedb.pydantic import Vector, LanceModel
import pyarrow as pa

class Content(LanceModel):
    movie_id: int
    vector: Vector(128)
    genres: str
    title: str
    imdb_id: int

    @property
    def imdb_url(self) -> str:
        return f"https://www.imdb.com/title/tt{self.imdb_id}"

tbl = db.create_table("movielens_small", schema=Content)

from lancedb.pydantic import Vector, LanceModel
import pyarrow as pa

class Content(LanceModel):
    movie_id: int
    vector: Vector(128)
    genres: str
    title: str
    imdb_id: int

    @property
    def imdb_url(self) -> str:
        return f"https://www.imdb.com/title/tt{self.imdb_id}"

async_tbl = await async_db.create_table("movielens_small_async", schema=Content)

Nested schemas

Sometimes your data model may contain nested objects. For example, you may want to store the document string and the document source name as a nested Document object:

from pydantic import BaseModel

class Document(BaseModel):
    content: str
    source: str

This can be used as the type of a LanceDB table column:

class NestedSchema(LanceModel):
    id: str
    vector: Vector(1536)
    document: Document

tbl = db.create_table("nested_table", schema=NestedSchema)

class NestedSchema(LanceModel):
    id: str
    vector: Vector(1536)
    document: Document

async_tbl = await async_db.create_table("nested_table_async", schema=NestedSchema)

This creates a struct column called "document" that has two subfields called "content" and "source":

In [28]: tbl.schema
Out[28]:
id: string not null
vector: fixed_size_list<item: float>[1536] not null
    child 0, item: float
document: struct<content: string not null, source: string not null> not null
    child 0, content: string not null
    child 1, source: string not null

Validators

Note that neither Pydantic nor PyArrow automatically validates that input data is of the correct timezone, but this is easy to add as a custom field validator:

from datetime import datetime
from zoneinfo import ZoneInfo

from lancedb.pydantic import LanceModel
from pydantic import Field, field_validator, ValidationError

tzname = "America/New_York"
tz = ZoneInfo(tzname)

class TestModel(LanceModel):
    dt_with_tz: datetime = Field(json_schema_extra={"tz": tzname})

    @field_validator("dt_with_tz")
    @classmethod
    def tz_must_match(cls, dt: datetime) -> datetime:
        assert dt.tzinfo == tz
        return dt

ok = TestModel(dt_with_tz=datetime.now(tz))

try:
    TestModel(dt_with_tz=datetime.now(ZoneInfo("Asia/Shanghai")))
    assert 0 == 1, "this should raise ValidationError"
except ValidationError:
    print("A ValidationError was raised.")

When you run this code it should print "A ValidationError was raised."

Pydantic custom types

LanceDB does NOT yet support converting Pydantic custom types. If this is something you need, please file a feature request on the LanceDB GitHub repo.

Using Iterators / Writing Large Datasets

It is recommended to use iterators to add large datasets in batches when creating your table. Unlike manually adding batches using table.add(), this does not create multiple versions of your dataset.

LanceDB additionally supports PyArrow's RecordBatch iterators or other generators producing supported data types.

Here's an example using a RecordBatch iterator to create a table.

import pyarrow as pa

def make_batches():
    for i in range(5):
        yield pa.RecordBatch.from_arrays(
            [
                pa.array(
                    [[3.1, 4.1, 5.1, 6.1], [5.9, 26.5, 4.7, 32.8]],
                    pa.list_(pa.float32(), 4),
                ),
                pa.array(["foo", "bar"]),
                pa.array([10.0, 20.0]),
            ],
            ["vector", "item", "price"],
        )

schema = pa.schema([
    pa.field("vector", pa.list_(pa.float32(), 4)),
    pa.field("item", pa.utf8()),
    pa.field("price", pa.float32()),
])
db.create_table("batched_table", make_batches(), schema=schema)

import pyarrow as pa

def make_batches():
    for i in range(5):
        yield pa.RecordBatch.from_arrays(
            [
                pa.array(
                    [[3.1, 4.1, 5.1, 6.1], [5.9, 26.5, 4.7, 32.8]],
                    pa.list_(pa.float32(), 4),
                ),
                pa.array(["foo", "bar"]),
                pa.array([10.0, 20.0]),
            ],
            ["vector", "item", "price"],
        )

schema = pa.schema([
    pa.field("vector", pa.list_(pa.float32(), 4)),
    pa.field("item", pa.utf8()),
    pa.field("price", pa.float32()),
])
await async_db.create_table("batched_table", make_batches(), schema=schema)

You can also use iterators of other types, like Pandas DataFrames or Python lists, directly in the above example.

Open existing tables

If you forget the name of your table, you can always get a listing of all table names.

print(db.table_names())
print(await async_db.table_names())

Then, you can open any existing tables.

tbl = db.open_table("test_table")

async_tbl = await async_db.open_table("test_table_async")

If you forget the name of your table, you can always get a listing of all table names.

console.log(await db.tableNames());

Then, you can open any existing tables.

const tbl = await db.openTable("my_table");

Creating empty table

You can create an empty table for scenarios where you want to add data to the table later. An example would be when you want to collect data from a stream/external file and then add it to a table in batches.

An empty table can be initialized via a PyArrow schema.

import lancedb
import pyarrow as pa

schema = pa.schema([
    pa.field("vector", pa.list_(pa.float32(), 2)),
    pa.field("item", pa.string()),
    pa.field("price", pa.float32()),
])
tbl = db.create_table("test_empty_table", schema=schema)

import lancedb
import pyarrow as pa

schema = pa.schema([
    pa.field("vector", pa.list_(pa.float32(), 2)),
    pa.field("item", pa.string()),
    pa.field("price", pa.float32()),
])
async_tbl = await async_db.create_table("test_empty_table_async", schema=schema)

Alternatively, you can also use Pydantic to specify the schema for the empty table. Note that we do not directly import pydantic but instead use lancedb.pydantic, whose LanceModel is a subclass of pydantic.BaseModel that has been extended to support LanceDB-specific types like Vector.

import lancedb
from lancedb.pydantic import Vector, LanceModel

class Item(LanceModel):
    vector: Vector(2)
    item: str
    price: float

tbl = db.create_table("test_empty_table_new", schema=Item.to_arrow_schema())

import lancedb
from lancedb.pydantic import Vector, LanceModel

class Item(LanceModel):
    vector: Vector(2)
    item: str
    price: float

async_tbl = await async_db.create_table("test_empty_table_async_new", schema=Item.to_arrow_schema())

Once the empty table has been created, you can add data to it via the various methods listed in the Adding to a table section.

const schema = new arrow.Schema([
  new arrow.Field("id", new arrow.Int32()),
  new arrow.Field("name", new arrow.Utf8()),
]);
const emptyTbl = await db.createEmptyTable("empty_table", schema);

const schema = new arrow.Schema([
  new arrow.Field("id", new arrow.Int32()),
  new arrow.Field("name", new arrow.Utf8()),
]);
const empty_tbl = await db.createTable({ name: "empty_table", schema });

Adding to a table

After a table has been created, you can always add more data to it using the add method.

You can add any of the valid data structures accepted by a LanceDB table, i.e., dict, list[dict], pd.DataFrame, or Iterator[pa.RecordBatch]. Below are some examples.

Add a Pandas DataFrame

df = pd.DataFrame({
    "vector": [[1.3, 1.4], [9.5, 56.2]],
    "item": ["banana", "apple"],
    "price": [5.0, 7.0],
})
tbl.add(df)

df = pd.DataFrame({
    "vector": [[1.3, 1.4], [9.5, 56.2]],
    "item": ["banana", "apple"],
    "price": [5.0, 7.0],
})
await async_tbl.add(df)

Add a Polars DataFrame

df = pl.DataFrame({
    "vector": [[1.3, 1.4], [9.5, 56.2]],
    "item": ["banana", "apple"],
    "price": [5.0, 7.0],
})
tbl.add(df)

df = pl.DataFrame({
    "vector": [[1.3, 1.4], [9.5, 56.2]],
    "item": ["banana", "apple"],
    "price": [5.0, 7.0],
})
await async_tbl.add(df)

Add an Iterator

You can also add a large dataset batch in one go using an iterator of any supported data type.

def make_batches_for_add():
    for i in range(5):
        yield [
            {"vector": [3.1, 4.1], "item": "peach", "price": 6.0},
            {"vector": [5.9, 26.5], "item": "pear", "price": 5.0},
        ]

tbl.add(make_batches_for_add())

def make_batches_for_add():
    for i in range(5):
        yield [
            {"vector": [3.1, 4.1], "item": "peach", "price": 6.0},
            {"vector": [5.9, 26.5], "item": "pear", "price": 5.0},
        ]

await async_tbl.add(make_batches_for_add())

Add a PyArrow table

If you have data coming in as a PyArrow table, you can add it directly to the LanceDB table.

pa_table = pa.Table.from_arrays(
    [
        pa.array([[9.1, 6.7], [9.9, 31.2]], pa.list_(pa.float32(), 2)),
        pa.array(["mango", "orange"]),
        pa.array([7.0, 4.0]),
    ],
    ["vector", "item", "price"],
)
tbl.add(pa_table)

pa_table = pa.Table.from_arrays(
    [
        pa.array([[9.1, 6.7], [9.9, 31.2]], pa.list_(pa.float32(), 2)),
        pa.array(["mango", "orange"]),
        pa.array([7.0, 4.0]),
    ],
    ["vector", "item", "price"],
)
await async_tbl.add(pa_table)

Add a Pydantic Model

Assuming that a table has been created with the correct schema as shown above, you can add data items that are valid Pydantic models to the table.

pydantic_model_items = [
    Item(vector=[8.1, 4.7], item="pineapple", price=10.0),
    Item(vector=[6.9, 9.3], item="avocado", price=9.0),
]
tbl.add(pydantic_model_items)

pydantic_model_items = [
    Item(vector=[8.1, 4.7], item="pineapple", price=10.0),
    Item(vector=[6.9, 9.3], item="avocado", price=9.0),
]
await async_tbl.add(pydantic_model_items)
Ingesting Pydantic models with LanceDB embedding API

When using LanceDB's embedding API, you can add Pydantic models directly to the table. LanceDB will automatically convert the vector field to a vector before adding it to the table. You need to specify the default value of the vector field as None to allow LanceDB to automatically vectorize the data.

import lancedb
from lancedb.pydantic import Vector, LanceModel
from lancedb.embeddings import get_registry

embed_fcn = get_registry().get("huggingface").create(name="BAAI/bge-small-en-v1.5")

class Schema(LanceModel):
    text: str = embed_fcn.SourceField()
    vector: Vector(embed_fcn.ndims()) = embed_fcn.VectorField(default=None)

tbl = db.create_table("my_table_with_embedding", schema=Schema, mode="overwrite")
models = [Schema(text="hello"), Schema(text="world")]
tbl.add(models)

import lancedb
from lancedb.pydantic import Vector, LanceModel
from lancedb.embeddings import get_registry

embed_fcn = get_registry().get("huggingface").create(name="BAAI/bge-small-en-v1.5")

class Schema(LanceModel):
    text: str = embed_fcn.SourceField()
    vector: Vector(embed_fcn.ndims()) = embed_fcn.VectorField(default=None)

async_tbl = await async_db.create_table("my_table_async_with_embedding", schema=Schema, mode="overwrite")
models = [Schema(text="hello"), Schema(text="world")]
await async_tbl.add(models)
await tbl.add([
  { vector: [1.3, 1.4], item: "fizz", price: 100.0 },
  { vector: [9.5, 56.2], item: "buzz", price: 200.0 },
]);

Upserting into a table

Upserting lets you insert new rows or update existing rows in a table. To upsert in LanceDB, use the merge insert API.

table = db.create_table(
    "users",
    [
        {"id": 0, "name": "Alice"},
        {"id": 1, "name": "Bob"},
    ],
)
new_users = [
    {"id": 1, "name": "Bobby"},
    {"id": 2, "name": "Charlie"},
]
res = (
    table.merge_insert("id")
    .when_matched_update_all()
    .when_not_matched_insert_all()
    .execute(new_users)
)
table.count_rows()  # 3
res  # {'num_inserted_rows': 1, 'num_updated_rows': 1, 'num_deleted_rows': 0}

API Reference: lancedb.table.Table.merge_insert

table = await db.create_table(
    "users",
    [
        {"id": 0, "name": "Alice"},
        {"id": 1, "name": "Bob"},
    ],
)
new_users = [
    {"id": 1, "name": "Bobby"},
    {"id": 2, "name": "Charlie"},
]
res = await (
    table.merge_insert("id")
    .when_matched_update_all()
    .when_not_matched_insert_all()
    .execute(new_users)
)
await table.count_rows()  # 3
res
# MergeResult(version=2, num_updated_rows=1,
#             num_inserted_rows=1, num_deleted_rows=0)

API Reference: lancedb.table.AsyncTable.merge_insert

const table = await db.createTable("users", [
  { id: 0, name: "Alice" },
  { id: 1, name: "Bob" },
]);
const newUsers = [
  { id: 1, name: "Bobby" },
  { id: 2, name: "Charlie" },
];
await table
  .mergeInsert("id")
  .whenMatchedUpdateAll()
  .whenNotMatchedInsertAll()
  .execute(newUsers);
await table.countRows(); // 3

API Reference: lancedb.Table.mergeInsert

Read more in the guide on merge insert.

Deleting from a table

Use the delete() method on tables to delete rows from a table. To choose which rows to delete, provide a filter that matches on the metadata columns. This can delete any number of rows that match the filter.

tbl.delete('item = "fizz"')

await async_tbl.delete('item = "fizz"')

Deleting row with specific column value

data = [
    {"x": 1, "vector": [1, 2]},
    {"x": 2, "vector": [3, 4]},
    {"x": 3, "vector": [5, 6]},
]

# Synchronous client
tbl = db.create_table("delete_row", data)
tbl.to_pandas()
#    x      vector
# 0  1  [1.0, 2.0]
# 1  2  [3.0, 4.0]
# 2  3  [5.0, 6.0]

tbl.delete("x = 2")
tbl.to_pandas()
#    x      vector
# 0  1  [1.0, 2.0]
# 1  3  [5.0, 6.0]

data = [
    {"x": 1, "vector": [1, 2]},
    {"x": 2, "vector": [3, 4]},
    {"x": 3, "vector": [5, 6]},
]

async_db = await lancedb.connect_async(uri)
async_tbl = await async_db.create_table("delete_row_async", data)
await async_tbl.to_pandas()
#    x      vector
# 0  1  [1.0, 2.0]
# 1  2  [3.0, 4.0]
# 2  3  [5.0, 6.0]

await async_tbl.delete("x = 2")
await async_tbl.to_pandas()
#    x      vector
# 0  1  [1.0, 2.0]
# 1  3  [5.0, 6.0]

Delete from a list of values

to_remove = [1, 5]
to_remove = ", ".join(str(v) for v in to_remove)
tbl.delete(f"x IN ({to_remove})")
tbl.to_pandas()
#    x      vector
# 0  3  [5.0, 6.0]

to_remove = [1, 5]
to_remove = ", ".join(str(v) for v in to_remove)
await async_tbl.delete(f"x IN ({to_remove})")
await async_tbl.to_pandas()
#    x      vector
# 0  3  [5.0, 6.0]

await tbl.delete('item = "fizz"');

Deleting row with specific column value

const con = await lancedb.connect("./.lancedb");
const data = [
  { id: 1, vector: [1, 2] },
  { id: 2, vector: [3, 4] },
  { id: 3, vector: [5, 6] },
];
const tbl = await con.createTable("my_table", data);
await tbl.delete("id = 2");
await tbl.countRows(); // Returns 2

Delete from a list of values

const to_remove = [1, 5];
await tbl.delete(`id IN (${to_remove.join(",")})`);
await tbl.countRows(); // Returns 1

Updating a table

This can be used to update zero to all rows, depending on how many rows match the where clause. The update queries follow the form of a SQL UPDATE statement. The where parameter is a SQL filter that matches on the metadata columns. The values or values_sql parameters are used to provide the new values for the columns.

  • where (str): The SQL where clause to use when updating rows. For example, 'x = 2' or 'x IN (1, 2, 3)'. The filter must not be empty, or it will error.
  • values (dict): The values to update. The keys are the column names and the values are the values to set.
  • values_sql (dict): The values to update. The keys are the column names and the values are the SQL expressions to set. For example, {'x': 'x + 1'} will increment the value of the x column by 1.

SQL syntax

See SQL filters for more information on the supported SQL syntax.

Warning

Updating nested columns is not yet supported.

API Reference: lancedb.table.Table.update

import lancedb
import pandas as pd

# Create a table from a pandas DataFrame
data = pd.DataFrame({"x": [1, 2, 3], "vector": [[1, 2], [3, 4], [5, 6]]})
tbl = db.create_table("test_table", data, mode="overwrite")

# Update the table where x = 2
tbl.update(where="x = 2", values={"vector": [10, 10]})

# Get the updated table as a pandas DataFrame
df = tbl.to_pandas()
print(df)

import lancedb
import pandas as pd

# Create a table from a pandas DataFrame
data = pd.DataFrame({"x": [1, 2, 3], "vector": [[1, 2], [3, 4], [5, 6]]})
async_tbl = await async_db.create_table("update_table_async", data)

# Update the table where x = 2
await async_tbl.update({"vector": [10, 10]}, where="x = 2")

# Get the updated table as a pandas DataFrame
df = await async_tbl.to_pandas()
print(df)

Output

   x        vector
0  1    [1.0, 2.0]
1  3    [5.0, 6.0]
2  2  [10.0, 10.0]

API Reference: lancedb.Table.update

import * as lancedb from "@lancedb/lancedb";

const db = await lancedb.connect("./.lancedb");
const data = [
  { x: 1, vector: [1, 2] },
  { x: 2, vector: [3, 4] },
  { x: 3, vector: [5, 6] },
];
const tbl = await db.createTable("my_table", data);
await tbl.update({ values: { vector: [10, 10] }, where: "x = 2" });

API Reference: vectordb.Table.update

const lancedb = require("vectordb");

const db = await lancedb.connect("./.lancedb");
const data = [
  { x: 1, vector: [1, 2] },
  { x: 2, vector: [3, 4] },
  { x: 3, vector: [5, 6] },
];
const tbl = await db.createTable("my_table", data);
await tbl.update({ where: "x = 2", values: { vector: [10, 10] } });

Updating using a SQL query

The values parameter is used to provide the new values for the columns as literal values. You can also use the values_sql / valuesSql parameter to provide SQL expressions for the new values. For example, you can use values_sql={"x": "x + 1"} to increment the value of the x column by 1.

# Increment x by 1 for all rows
tbl.update(values_sql={"x": "x + 1"})
print(tbl.to_pandas())

# Increment x by 1 for all rows
await async_tbl.update(updates_sql={"x": "x + 1"})
print(await async_tbl.to_pandas())

Output

   x        vector
0  2    [1.0, 2.0]
1  4    [5.0, 6.0]
2  3  [10.0, 10.0]

Coming Soon!

await tbl.update({ valuesSql: { x: "x + 1" } });

Note

When rows are updated, they are moved out of the index. The row will still show up in ANN queries, but the query will not be as fast as it would be if the row was in the index. If you update a large proportion of rows, consider rebuilding the index afterwards.

Drop a table

Use the drop_table() method on the database to remove a table.

db.drop_table("my_table")

await async_db.drop_table("my_table_async")

This permanently removes the table and is not recoverable, unlike deleting rows. By default, if the table does not exist, an exception is raised. To suppress this, you can pass in ignore_missing=True.

awaitdb.dropTable("myTable");

This permanently removes the table and is not recoverable, unlike deleting rows. If the table does not exist, an exception is raised.

Changing schemas

While tables must have a schema specified when they are created, you can change the schema over time. There are three methods to alter the schema of a table:

  • add_columns: Add new columns to the table
  • alter_columns: Alter the name, nullability, or data type of a column
  • drop_columns: Drop columns from the table

Adding new columns

You can add new columns to the table with the add_columns method. New columns are filled with values based on a SQL expression. For example, you can add a new column double_price, fill it with the value of price * 2, and set the expected data type for it.

tbl.add_columns({"double_price": "cast((price * 2) as float)"})

await tbl.add_columns({"double_price": "cast((price * 2) as float)"})

API Reference: lancedb.table.Table.add_columns

await tbl.addColumns([
  { name: "double_price", valueSql: "cast((price * 2) as Float)" },
]);

API Reference: lancedb.Table.addColumns

If you want to fill it with null, you can use cast(NULL as <data_type>) as the SQL expression to fill the column with nulls, while controlling the data type of the column. Available data types are based on the DataFusion data types. You can use any of the SQL types, such as BIGINT:

cast(NULL as BIGINT)

Using Arrow data types and the arrow_typeof function is not yet supported.

Altering existing columns

You can alter the name, nullability, or data type of a column with the alter_columns method.

Changing the name or nullability of a column just updates the metadata. Because of this, it's a fast operation. Changing the data type of a column requires rewriting the column, which can be a heavy operation.

import pyarrow as pa

tbl.alter_columns({
    "path": "double_price",
    "rename": "dbl_price",
    "data_type": pa.float64(),
    "nullable": True,
})

import pyarrow as pa

await tbl.alter_columns({
    "path": "double_price",
    "rename": "dbl_price",
    "data_type": pa.float64(),
    "nullable": True,
})

API Reference: lancedb.table.Table.alter_columns

await tbl.alterColumns([
  {
    path: "double_price",
    rename: "dbl_price",
    dataType: "float",
    nullable: true,
  },
]);

API Reference: lancedb.Table.alterColumns

Dropping columns

You can drop columns from the table with the drop_columns method. This will remove the column from the schema.

tbl.drop_columns(["dbl_price"])

await tbl.drop_columns(["dbl_price"])

API Reference: lancedb.table.Table.drop_columns

await tbl.dropColumns(["dbl_price"]);

API Reference: lancedb.Table.dropColumns

Handling bad vectors

In LanceDB Python, you can use the on_bad_vectors parameter to choose how invalid vector values are handled. Invalid vectors are vectors that are not valid because:

  1. They are the wrong dimension
  2. They contain NaN values
  3. They are null but are on a non-nullable field

By default, LanceDB will raise an error if it encounters a bad vector. You canalso choose one of the following options:

  • drop: Ignore rows with bad vectors
  • fill: Replace bad values (NaNs) or missing values (too few dimensions) with the fill value specified in the fill_value parameter. An input like [1.0, NaN, 3.0] will be replaced with [1.0, 0.0, 3.0] if fill_value=0.0.
  • null: Replace bad vectors with null (only works if the column is nullable). A bad vector [1.0, NaN, 3.0] will be replaced with null if the column is nullable. If the vector column is non-nullable, then bad vectors will cause an error.

Consistency

In LanceDB OSS, users can set the read_consistency_interval parameter on connections to achieve different levels of read consistency. This parameter determines how frequently the database synchronizes with the underlying storage system to check for updates made by other processes. If another process updates a table, the database will not see the changes until the next synchronization.

There are three possible settings for read_consistency_interval:

  1. Unset (default): The database does not check for updates to tables made by other processes. This provides the best query performance, but means that clients may not see the most up-to-date data. This setting is suitable for applications where the data does not change during the lifetime of the table reference.
  2. Zero seconds (Strong consistency): The database checks for updates on every read. This provides the strongest consistency guarantees, ensuring that all clients see the latest committed data. However, it has the most overhead. This setting is suitable when consistency matters more than having high QPS.
  3. Custom interval (Eventual consistency): The database checks for updates at a custom interval, such as every 5 seconds. This provides eventual consistency, allowing for some lag between write and read operations. Performance wise, this is a middle ground between strong consistency and no consistency check. This setting is suitable for applications where immediate consistency is not critical, but clients should see updated data eventually.

Consistency in LanceDB Cloud

This is only tunable in LanceDB OSS. In LanceDB Cloud, readers are always eventually consistent.

To set strong consistency, use timedelta(0):

from datetime import timedelta

uri = "data/sample-lancedb"
db = lancedb.connect(uri, read_consistency_interval=timedelta(0))
tbl = db.open_table("test_table")

from datetime import timedelta

uri = "data/sample-lancedb"
async_db = await lancedb.connect_async(uri, read_consistency_interval=timedelta(0))
async_tbl = await async_db.open_table("test_table_async")

For eventual consistency, use a custom timedelta:

from datetime import timedelta

uri = "data/sample-lancedb"
db = lancedb.connect(uri, read_consistency_interval=timedelta(seconds=5))
tbl = db.open_table("test_table")

from datetime import timedelta

uri = "data/sample-lancedb"
async_db = await lancedb.connect_async(uri, read_consistency_interval=timedelta(seconds=5))
async_tbl = await async_db.open_table("test_table_async")

By default, a Table will never check for updates from other writers. To manually check for updates, you can use checkout_latest:

tbl = db.open_table("test_table")

# (Other writes happen to test_table from another process)

# Check for updates
tbl.checkout_latest()

async_tbl = await async_db.open_table("test_table_async")

# (Other writes happen to test_table_async from another process)

# Check for updates
await async_tbl.checkout_latest()

To set strong consistency, use 0:

const db = await lancedb.connect({
  uri: "./.lancedb",
  readConsistencyInterval: 0,
});
const tbl = await db.openTable("my_table");

For eventual consistency, specify the update interval as seconds:

const db = await lancedb.connect({
  uri: "./.lancedb",
  readConsistencyInterval: 5,
});
const tbl = await db.openTable("my_table");

What's next?

Learn the best practices on creating an ANN index and getting the most out of it.


  1. The vectordb package is a legacy package that is deprecated in favor of @lancedb/lancedb. The vectordb package will continue to receive bug fixes and security updates until September 2024. We recommend all new projects use @lancedb/lancedb. See the migration guide for more information.

