Example - MultiModal CLIP Embeddings¶
The Disappearing Embedding Function¶
Previously, to use a vector database, you had to run the embedding process yourself and interact with the system using vectors directly. With this release of LanceDB, that work is handled for you.
- LanceDB provides sentence-transformer, OpenAI, and OpenCLIP embedding functions that can be saved directly as table metadata
- You no longer have to generate the vectors yourself at either query time or ingestion time
- The embedding function interface is extensible so you can create your own
- The function is persisted as table metadata so you can use it across sessions
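To make the idea of "persisted by name and configuration" concrete, here is a minimal, self-contained sketch of a registry pattern. It is only an illustration: `_REGISTRY`, `register`, `CharCountEmbedding`, and `from_metadata` are hypothetical names invented for this example, and this is not LanceDB's implementation.

```python
import json

# Toy registry mapping a name to a class. This only illustrates persisting a
# function by name plus configuration; it is not LanceDB's actual code.
_REGISTRY = {}


def register(name):
    """Decorator that records a class under `name` in the toy registry."""
    def wrap(cls):
        _REGISTRY[name] = cls
        return cls
    return wrap


@register("char-count")
class CharCountEmbedding:
    """Hypothetical embedding: counts of a few letters, purely for illustration."""

    def __init__(self, alphabet="abc"):
        self.alphabet = alphabet

    def ndims(self):
        # The function itself knows how many dimensions it produces.
        return len(self.alphabet)

    def embed(self, text):
        return [float(text.count(ch)) for ch in self.alphabet]

    def to_metadata(self):
        # Only the name and config are stored, never the object itself.
        return json.dumps({"name": "char-count", "model": {"alphabet": self.alphabet}})


def from_metadata(blob):
    """Rebuild an embedding function from its serialized name and config."""
    spec = json.loads(blob)
    return _REGISTRY[spec["name"]](**spec["model"])


fn = CharCountEmbedding(alphabet="abc")
restored = from_metadata(fn.to_metadata())
print(restored.embed("abcabca"))  # [3.0, 2.0, 2.0]
```

Because only a name and a small configuration dictionary are stored, a later session can rebuild an equivalent function without the caller passing it in again.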
import lancedb
Multi-modal search made easy¶
In this example we'll go over multi-modal image search using:
- Oxford Pet dataset
- OpenClip model
- LanceDB
Data¶
First, download the dataset from https://www.robots.ox.ac.uk/~vgg/data/pets/. Specifically, download the images.tar.gz file.
This notebook assumes you've downloaded it into your ~/Downloads directory. When you extract the tarball, it will create an images directory.
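If you prefer to extract the archive from Python rather than the shell, the following sketch uses only the standard library. The helper name `extract_images` is hypothetical, and the archive path is an assumption based on the ~/Downloads convention used in this notebook.

```python
import tarfile
from pathlib import Path


def extract_images(tarball, dest):
    """Extract `tarball` into `dest` and return the sorted .jpg paths found."""
    dest = Path(dest).expanduser()
    dest.mkdir(parents=True, exist_ok=True)
    with tarfile.open(Path(tarball).expanduser(), "r:gz") as tar:
        tar.extractall(dest)
    # The tarball is expected to unpack into an "images" subdirectory.
    return sorted((dest / "images").glob("*.jpg"))


# Assumed location; nothing happens if the archive has not been downloaded yet.
archive = Path("~/Downloads/images.tar.gz").expanduser()
if archive.exists():
    jpgs = extract_images(archive, "~/Downloads")
    print(f"extracted {len(jpgs)} jpg files")
```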
Define embedding function¶
We'll use the OpenClipEmbeddingFunction here for multi-modal image search.
from lancedb.embeddings import EmbeddingFunctionRegistry

# Look up the open-clip embedding function in the registry and instantiate it
registry = EmbeddingFunctionRegistry.get_instance()
clip = registry.get("open-clip").create()
!pip install open_clip_torch
clip
OpenClipEmbeddings(name='ViT-B-32', pretrained='laion2b_s34b_b79k', device='cpu', batch_size=64, normalize=True)
The data model¶
We'll declare a new model that subclasses LanceModel (a special pydantic model) to represent the table. This table has two columns, one for the image_uri and one for the vector generated from those images. The embedding function defines the number of dimensions in its vectors, so you don't need to look it up.
We use the VectorField method from the embedding function to annotate the model so that LanceDB knows to use the open-clip embedding function to generate query embeddings that correspond to the vector column.
We also use the SourceField so that when adding data, LanceDB knows to automatically use open-clip to encode the input images.
Finally, because we're working with images, we add a convenience property image to open the image and return a PIL Image so it can be visualized in a Jupyter notebook.
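from PIL import Image
from lancedb.pydantic import LanceModel, Vector

class Pets(LanceModel):
    # Vector column, filled in by the open-clip embedding function
    vector: Vector(clip.ndims()) = clip.VectorField()
    # Source column holding the image path that gets embedded
    image_uri: str = clip.SourceField()

    @property
    def image(self):
        return Image.open(self.image_uri)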
Create the table¶
First we connect to a local lancedb directory.
db = lancedb.connect("~/.lancedb")
Next we get all of the paths for the images we downloaded and create a table. Notice that we didn't have to worry about generating the image embeddings ourselves.
import pandas as pd
from pathlib import Path
from random import sample

if "pets" in db:
    table = db["pets"]
else:
    table = db.create_table("pets", schema=Pets)
    # use a sampling of 1000 images
    p = Path("~/Downloads/images").expanduser()
    uris = [str(f) for f in p.glob("*.jpg")]
    uris = sample(uris, 1000)
    table.add(pd.DataFrame({"image_uri": uris}))
table.head().to_pandas()
| | vector | image_uri |
|---|---|---|
| 0 | [0.018789755, 0.11621179, -0.09760579, -0.0268... | /Users/changshe/Downloads/images/leonberger_14... |
| 1 | [0.021960497, 0.06073219, -0.1625527, 0.021481... | /Users/changshe/Downloads/images/havanese_63.jpg |
| 2 | [0.0074375155, 0.084355146, -0.027461205, -0.0... | /Users/changshe/Downloads/images/english_cocke... |
| 3 | [-0.01220356, 0.020815236, -0.08587208, -0.027... | /Users/changshe/Downloads/images/shiba_inu_143... |
| 4 | [-0.010112503, 0.14021927, -0.14588796, -0.046... | /Users/changshe/Downloads/images/saint_bernard... |
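One practical note on the sampling step: `random.sample` raises a `ValueError` when asked for more items than the list contains, which would happen if the images directory holds fewer than 1000 .jpg files. The sketch below caps the sample size; `sample_image_uris` and its `seed` parameter are hypothetical additions for illustration, not part of the original notebook.

```python
from pathlib import Path
from random import Random


def sample_image_uris(image_dir, k=1000, seed=None):
    """Return up to `k` .jpg paths from `image_dir` as strings."""
    uris = sorted(str(f) for f in Path(image_dir).expanduser().glob("*.jpg"))
    # random.sample raises ValueError when k exceeds the population size,
    # so cap k at the number of files actually found.
    return Random(seed).sample(uris, min(k, len(uris)))
```

The returned list can be wrapped in a DataFrame with an image_uri column and passed to table.add, exactly as in the table creation cell. Passing a fixed `seed` makes the sample reproducible across runs.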
Querying via text¶
We don't need to generate the embeddings when querying either. LanceDB does that automatically, so you can query directly using text input.
The pydantic model we declared for the table schema also makes it easy to work with the search results.
rs = table.search("dog").limit(3).to_pydantic(Pets)
rs[0].image
Querying via images¶
The great thing about CLIP is that it's multi-modal, so you can search using not just text but images as well.
Create a query image using PIL.
from PIL import Image

p = Path("~/Downloads/images/samoyed_100.jpg").expanduser()
query_image = Image.open(p)
query_image
Pass the query_image to the search API.
rs = table.search(query_image).limit(3).to_pydantic(Pets)
rs[2].image
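Conceptually, a text query and an image query both end up as a vector in the same space, and the search returns the stored vectors closest to it. The sketch below shows that idea as a brute-force cosine-similarity ranking in plain Python. The 3-dimensional vectors and labels are made up for illustration, and this says nothing about how LanceDB indexes or measures distance internally.

```python
import math


def normalize(v):
    """Scale a vector to unit length."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def top_k(query, rows, k=3):
    """Return the k (label, score) pairs with the highest cosine similarity."""
    q = normalize(query)
    scored = [
        (label, sum(a * b for a, b in zip(q, normalize(vec))))
        for label, vec in rows
    ]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:k]


# Made-up 3-dimensional "embeddings"; real CLIP vectors have clip.ndims() entries.
rows = [
    ("samoyed", [0.9, 0.1, 0.0]),
    ("havanese", [0.7, 0.3, 0.1]),
    ("sphynx", [0.0, 0.2, 0.9]),
]
query = [1.0, 0.0, 0.0]  # stands in for an embedded text or image query
print([label for label, _ in top_k(query, rows, k=2)])  # ['samoyed', 'havanese']
```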
Persistence¶
Embedding functions are persisted as table metadata, which makes them much easier to use across sessions.
For example, we can recreate the database connection and table object.
db = lancedb.connect("~/.lancedb")
table = db["pets"]
We can observe that the embedding function is read back from the table metadata.
import json

json.loads(table.schema.metadata[b"embedding_functions"])[0]
{'name': 'open-clip', 'model': {'name': 'ViT-B-32', 'pretrained': 'laion2b_s34b_b79k', 'device': 'cpu', 'batch_size': 64, 'normalize': True}, 'source_column': 'image_uri', 'vector_column': 'vector'}
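As a small follow-up, the sketch below shows one way to summarize that metadata without a database connection. The `metadata` dictionary is a stand-in built from the output shown above, and `describe_embedding_functions` is a hypothetical helper, not a LanceDB API.

```python
import json

# Stand-in for table.schema.metadata, copied from the output shown above.
metadata = {
    b"embedding_functions": json.dumps([
        {
            "name": "open-clip",
            "model": {
                "name": "ViT-B-32",
                "pretrained": "laion2b_s34b_b79k",
                "device": "cpu",
                "batch_size": 64,
                "normalize": True,
            },
            "source_column": "image_uri",
            "vector_column": "vector",
        }
    ]).encode("utf-8")
}


def describe_embedding_functions(metadata):
    """Return one summary string per embedding function entry."""
    entries = json.loads(metadata[b"embedding_functions"])
    return [
        f'{e["source_column"]} -> {e["vector_column"]} via {e["name"]} ({e["model"]["name"]})'
        for e in entries
    ]


print(describe_embedding_functions(metadata))
# ['image_uri -> vector via open-clip (ViT-B-32)']
```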
We can also run queries as before without having to reinstantiate the embedding function explicitly.
rs = table.search("big dog").limit(3).to_pydantic(Pets)
rs[0].image
LanceDB makes multimodal AI easy¶
- LanceDB's new embedding functions feature makes it easy for builders of LLM apps
- You no longer need to manually encode the data yourself
- You no longer need to figure out how many dimensions your vectors have
- You no longer need to manually encode the query
- And with the right embedding model, you can search way more than just text