
DuckDB

In Python, LanceDB tables can also be queried with DuckDB, an in-process SQL OLAP database. This means you can write complex SQL queries to analyze your data in LanceDB.

This integration is built on Apache Arrow, which provides zero-copy data sharing between LanceDB and DuckDB. DuckDB can push column selections and basic filters down to LanceDB, which reduces the amount of data that has to be scanned to answer your query. The integration also streams data from LanceDB tables, so you can aggregate tables that don't fit into memory. All of this uses the same mechanism described in DuckDB's blog post DuckDB quacks Arrow.

We can demonstrate this by first installing `duckdb` and `lancedb`.

```shell
pip install duckdb lancedb
```

We will re-use the dataset created previously:

```python
import lancedb

db = lancedb.connect("data/sample-lancedb")
data = [
    {"vector": [3.1, 4.1], "item": "foo", "price": 10.0},
    {"vector": [5.9, 26.5], "item": "bar", "price": 20.0},
]
table = db.create_table("pd_table", data=data)
```

The `to_lance` method returns the LanceDB table's underlying `LanceDataset`, which DuckDB can read through its Arrow compatibility layer. To query the resulting Lance dataset in DuckDB, reference the Python variable that holds it by name in your SQL query. DuckDB resolves that name to the variable automatically.

```python
import duckdb

arrow_table = table.to_lance()

duckdb.query("SELECT * FROM arrow_table")
```
```
┌─────────────┬─────────┬────────┐
│   vector    │  item   │ price  │
│   float[]   │ varchar │ double │
├─────────────┼─────────┼────────┤
│ [3.1, 4.1]  │ foo     │   10.0 │
│ [5.9, 26.5] │ bar     │   20.0 │
└─────────────┴─────────┴────────┘
```

You can run any other DuckDB SQL query on your data in the same way.

```python
duckdb.query("SELECT mean(price) FROM arrow_table")
```
```
┌─────────────┐
│ mean(price) │
│   double    │
├─────────────┤
│        15.0 │
└─────────────┘
```
