Example - MultiModal CLIP Embeddings

The Disappearing Embedding Function

Previously, to use a vector database, you had to run the embedding process yourself and interact with the system using vectors directly. This release of LanceDB makes that much more convenient: you no longer need to handle the embeddings yourself.

  1. LanceDB ships with sentence-transformer, openai, and openclip embedding functions that can be saved directly as table metadata
  2. You no longer have to generate the vectors yourself, either at query time or at ingestion time
  3. The embedding function interface is extensible so you can create your own
  4. The function is persisted as table metadata so you can use it across sessions
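To make points 3 and 4 more concrete, here is a toy, pure-Python sketch of the general pattern: a registry maps a name to an embedding class, and a small JSON config is enough to rebuild the function in a later session. This is not LanceDB's implementation. `TOY_REGISTRY`, `register`, `CharCountEmbedding`, and `rebuild` are hypothetical names made up for this illustration.

```python
import json

# Hypothetical toy registry: maps a short name to an embedding class.
TOY_REGISTRY = {}


def register(name):
    """Class decorator that records an embedding class under `name`."""
    def wrap(cls):
        cls.registry_name = name
        TOY_REGISTRY[name] = cls
        return cls
    return wrap


@register("char-count")
class CharCountEmbedding:
    """Toy embedding: counts how often each letter of `alphabet` occurs."""

    def __init__(self, alphabet="abc"):
        self.alphabet = alphabet

    def ndims(self):
        # One dimension per letter in the alphabet.
        return len(self.alphabet)

    def embed(self, text):
        return [float(text.count(ch)) for ch in self.alphabet]

    def to_config(self):
        # Everything needed to rebuild this function in a later session.
        return {"name": self.registry_name, "model": {"alphabet": self.alphabet}}


def rebuild(config_json):
    """Recreate an embedding function from its serialized config."""
    config = json.loads(config_json)
    return TOY_REGISTRY[config["name"]](**config["model"])


func = CharCountEmbedding(alphabet="dog")
saved = json.dumps(func.to_config())  # stands in for metadata stored with a table
restored = rebuild(saved)             # what a later session would do
print(restored.ndims(), restored.embed("good dog"))
```

The point of the sketch is only that a name plus a handful of parameters can stand in for the function itself, which is what makes it possible to store it next to the data.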
In [1]:
import lancedb

Multi-modal search made easy

In this example we'll go over multi-modal image search using:

  • Oxford Pet dataset
  • OpenClip model
  • LanceDB

Data

First, download the dataset from https://www.robots.ox.ac.uk/~vgg/data/pets/. Specifically, download the images.tar.gz file.

This notebook assumes you've downloaded it into your ~/Downloads directory. When you extract the tarball, it will create an images directory.
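If you'd rather extract the tarball from Python than from a shell, a minimal sketch using only the standard library could look like this. The helper name `extract_archive` is made up for this example, and the archive path simply follows the ~/Downloads assumption above.

```python
import tarfile
from pathlib import Path


def extract_archive(archive_path, dest_dir):
    """Extract a .tar.gz archive into dest_dir and return the destination path."""
    archive = Path(archive_path).expanduser()
    dest = Path(dest_dir).expanduser()
    dest.mkdir(parents=True, exist_ok=True)
    with tarfile.open(archive, "r:gz") as tar:
        if hasattr(tarfile, "data_filter"):
            # Newer Python versions support extraction filters that block unsafe paths.
            tar.extractall(dest, filter="data")
        else:
            tar.extractall(dest)
    return dest


# Assumed location, following the notebook's ~/Downloads convention.
archive = Path("~/Downloads/images.tar.gz").expanduser()
if archive.exists():
    images_dir = extract_archive(archive, "~/Downloads") / "images"
    print(len(list(images_dir.glob("*.jpg"))), "JPEG files found")
```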

Define embedding function

We'll use the OpenClipEmbeddingFunction here for multi-modal image search.

In [7]:
from lancedb.embeddings import EmbeddingFunctionRegistry

registry = EmbeddingFunctionRegistry.get_instance()
clip = registry.get("open-clip").create()
In [6]:
!pip install open_clip_torch
In [8]:
clip
Out[8]:
OpenClipEmbeddings(name='ViT-B-32', pretrained='laion2b_s34b_b79k', device='cpu', batch_size=64, normalize=True)

The data model

We'll declare a new model that subclasses LanceModel (a special pydantic model) to represent the table. This table has two columns, one for the image_uri and one for the vector generated from those images. The embedding function defines the number of dimensions in its vectors so you don't need to look it up.

We use the VectorField method from the embedding function to annotate the model so that LanceDB knows to use the open-clip embedding function to generate query embeddings that correspond to the vector column.

We also use the SourceField so that when adding data, LanceDB knows to automatically use open-clip to encode the input images.

Finally, because we're working with images, we add a convenience property image to open the image and return a PIL Image so it can be visualized in Jupyter Notebook.

In [ ]:
from PIL import Image
from lancedb.pydantic import LanceModel, Vector


class Pets(LanceModel):
    vector: Vector(clip.ndims()) = clip.VectorField()
    image_uri: str = clip.SourceField()

    @property
    def image(self):
        return Image.open(self.image_uri)

Create the table

First, we connect to a local lancedb directory.

In [ ]:
db = lancedb.connect("~/.lancedb")

Next, we get all of the paths for the images we downloaded and create a table. Notice that we didn't have to worry about generating the image embeddings ourselves.

In [ ]:
import pandas as pd
from pathlib import Path
from random import sample

if "pets" in db:
    table = db["pets"]
else:
    table = db.create_table("pets", schema=Pets)
    # use a sampling of 1000 images
    p = Path("~/Downloads/images").expanduser()
    uris = [str(f) for f in p.glob("*.jpg")]
    uris = sample(uris, 1000)
    table.add(pd.DataFrame({"image_uri": uris}))
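One thing to watch in the cell above: `random.sample` raises a `ValueError` when asked for more items than the list contains, so the call fails if the images directory holds fewer than 1000 JPEGs (or the path is wrong and the list is empty). A small defensive variant, with the hypothetical helper name `sample_image_uris` and an optional seed added for reproducibility, might look like this:

```python
import random
from pathlib import Path


def sample_image_uris(image_dir, k=1000, seed=None):
    """Return up to k randomly chosen .jpg paths from image_dir as strings."""
    # Sort first so that a given seed always sees the files in the same order.
    uris = sorted(str(f) for f in Path(image_dir).expanduser().glob("*.jpg"))
    rng = random.Random(seed)  # pass a seed for a reproducible sample
    # Cap k at the number of files so sample() cannot raise ValueError.
    return rng.sample(uris, min(k, len(uris)))
```

In the notebook, `uris = sample_image_uris("~/Downloads/images")` could then replace the two `uris = ...` lines.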
In [ ]:
table.head().to_pandas()
Out[ ]:
   vector                                              image_uri
0  [0.018789755, 0.11621179, -0.09760579, -0.0268...  /Users/changshe/Downloads/images/leonberger_14...
1  [0.021960497, 0.06073219, -0.1625527, 0.021481...  /Users/changshe/Downloads/images/havanese_63.jpg
2  [0.0074375155, 0.084355146, -0.027461205, -0.0...  /Users/changshe/Downloads/images/english_cocke...
3  [-0.01220356, 0.020815236, -0.08587208, -0.027...  /Users/changshe/Downloads/images/shiba_inu_143...
4  [-0.010112503, 0.14021927, -0.14588796, -0.046...  /Users/changshe/Downloads/images/saint_bernard...

Querying via text

We don't need to generate the embeddings when querying, either. LanceDB does that automatically, so you can query directly using text input.

The pydantic model we declared for the table schema also makes it really easy for us to work with the search results.

In [ ]:
rs = table.search("dog").limit(3).to_pydantic(Pets)
rs[0].image
Out[ ]:
[image: rs[0].image, the top result for the query "dog"]
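Conceptually, a text query like the one above is turned into a vector and compared against the stored image vectors, and the closest rows come back first. The sketch below shows that idea with a brute-force cosine-similarity ranking in plain Python. It is an illustration only: the source doesn't describe which distance metric or index LanceDB uses here, and the tiny 3-dimensional vectors and file names are made up (real CLIP vectors are much longer).

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def top_k(query_vector, rows, k=3):
    """Rank rows (dicts with 'image_uri' and 'vector') by similarity to the query."""
    ranked = sorted(
        rows,
        key=lambda row: cosine_similarity(query_vector, row["vector"]),
        reverse=True,
    )
    return ranked[:k]


# Made-up rows standing in for a table of image embeddings.
rows = [
    {"image_uri": "dog_1.jpg", "vector": [0.9, 0.1, 0.0]},
    {"image_uri": "cat_1.jpg", "vector": [0.1, 0.9, 0.0]},
    {"image_uri": "dog_2.jpg", "vector": [0.8, 0.2, 0.1]},
]
query_vector = [1.0, 0.0, 0.0]  # pretend this is the embedding of the text "dog"

results = top_k(query_vector, rows, k=2)
print([row["image_uri"] for row in results])
```

Because text and images are embedded into the same vector space, the same ranking step works whether the query vector came from a sentence or from a picture.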

Querying via images

The great thing about CLIP is that it's multi-modal, so you can search using not just text but images as well.

Create a query image using PIL

In [ ]:
from PIL import Image

p = Path("~/Downloads/images/samoyed_100.jpg").expanduser()
query_image = Image.open(p)
query_image
Out[ ]:
[image: the query image, samoyed_100.jpg]

Pass in the query_image to the search API

In [ ]:
rs = table.search(query_image).limit(3).to_pydantic(Pets)
rs[2].image
Out[ ]:
[image: rs[2].image, the third result for the image query]

Persistence

Embedding functions are persisted as table metadata, so they're much easier to reuse across sessions.

For example, we can recreate the database connection and table object:

In [ ]:
db = lancedb.connect("~/.lancedb")
table = db["pets"]

We can observe that it's read out as table metadata

In [ ]:
import json

json.loads(table.schema.metadata[b"embedding_functions"])[0]
Out[ ]:
{'name': 'open-clip',
 'model': {'name': 'ViT-B-32',
  'pretrained': 'laion2b_s34b_b79k',
  'device': 'cpu',
  'batch_size': 64,
  'normalize': True},
 'source_column': 'image_uri',
 'vector_column': 'vector'}
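If you want to inspect this metadata without a live table, the sketch below rebuilds a stand-in for `table.schema.metadata` from the values printed above and summarizes each entry. The helper name `describe_embedding_functions` is hypothetical, and the stand-in dictionary only mirrors the single key used in this notebook; the real metadata may hold more.

```python
import json

# Stand-in for table.schema.metadata, populated with the values printed above.
metadata = {
    b"embedding_functions": json.dumps([
        {
            "name": "open-clip",
            "model": {
                "name": "ViT-B-32",
                "pretrained": "laion2b_s34b_b79k",
                "device": "cpu",
                "batch_size": 64,
                "normalize": True,
            },
            "source_column": "image_uri",
            "vector_column": "vector",
        }
    ]).encode("utf-8")
}


def describe_embedding_functions(schema_metadata):
    """Return one human-readable summary line per stored embedding function."""
    # The value is a JSON-encoded list with one config per embedding function.
    configs = json.loads(schema_metadata[b"embedding_functions"])
    return [
        f"{c['name']} ({c['model']['name']}): {c['source_column']} -> {c['vector_column']}"
        for c in configs
    ]


summaries = describe_embedding_functions(metadata)
print(summaries[0])
```

With a real table, passing `table.schema.metadata` instead of the stand-in should give the same kind of summary, assuming the metadata has the shape shown in the output above.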

And we can also run queries as before without having to reinstantiate the embedding function explicitly

In [ ]:
rs = table.search("big dog").limit(3).to_pydantic(Pets)
rs[0].image
Out[ ]:
[image: rs[0].image, the top result for the query "big dog"]

LanceDB makes multimodal AI easy

  • LanceDB's new embedding functions feature makes life easier for builders of LLM apps
  • You no longer need to manually encode the data yourself
  • You no longer need to figure out how many dimensions your vectors have
  • You no longer need to manually encode the query
  • And with the right embedding model, you can search way more than just text