Full image support (LLM operators & embeddings w/ CLIP) #37


Open

jmelovich wants to merge 13 commits into lotus-data:main from jmelovich:mm-integration

Conversation

@jmelovich

Changes:

  • Added a new dataframe operator `load_images(img_path_col, encoded_img_col)` (in utils.py). The first parameter refers to a column of file paths (or URLs) to images, and the second is the name of the new column to append. It iterates through the dataframe, loads each image from its path, and encodes it as a base64 string (to cut down on the conversions needed later for any vision LLM call).
    • Example usage: `df.load_images("image_path", "image")` -> the 'image' column can then be used as normal for getting the images in any other LOTUS operation.
  • Modified task_instructions.py to automatically extract any image data and append it correctly to the messages array.
  • Modified sem_topk.py to likewise extract image data and append it to the messages array.
  • Added support for the CLIP family of embedding/retrieval models (in clip_model.py).
    • Included batching support to help reduce memory usage when indexing large datasets.
    • Also included a custom method for creating combined text+image embeddings, with user-configurable weights (an even split by default).
      • Example usage: `rm = CLIPModelRetriever(similarity_weights=[0.4, 0.4, 0.1, 0.1])  # [text-text, image-image, text-image, image-text]`
  • Added chunking to reduce memory usage for the sem_search operation. The chunk size is user-configurable: `sem_search(chunk_size=1000)`.
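To make the `load_images` idea concrete, here is a minimal sketch of such an operator. The function name and column parameters follow the PR description; the body is an assumption (URL fetching is omitted for brevity, and the real implementation may differ):

```python
import base64

import pandas as pd


def load_images(df: pd.DataFrame, img_path_col: str, encoded_img_col: str) -> pd.DataFrame:
    """Append a column of base64-encoded image data loaded from file paths.

    Hypothetical re-implementation of the PR's ``load_images`` operator:
    reads the raw bytes at each path and stores them as a base64 string.
    """
    def encode(path: str) -> str:
        with open(path, "rb") as f:
            return base64.b64encode(f.read()).decode("utf-8")

    df[encoded_img_col] = df[img_path_col].apply(encode)
    return df
```

Usage would mirror the PR's example: `df = load_images(df, "image_path", "image")`, after which the `"image"` column holds base64 strings ready to embed in a vision-LLM message payload.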

I did most of my testing on larger datasets, but I created a very simple Jupyter notebook (examples/multimodal_tests.ipynb) that demonstrates CLIP working with a dataframe of images, as well as sem_topk, sem_filter, sem_map, and sem_search. In my own testing, sem_sim_join and sem_agg work as well.
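The chunked `sem_search` mentioned above can be illustrated with a small standalone sketch. This is not the PR's actual code; it is a hedged example of the general technique (score documents in fixed-size chunks so only one chunk participates in the matrix product at a time), assuming L2-normalized embeddings so the dot product equals cosine similarity:

```python
import numpy as np


def chunked_top_k(query_emb: np.ndarray, doc_embs: np.ndarray,
                  k: int, chunk_size: int = 1000) -> np.ndarray:
    """Return indices of the top-k most similar documents, scoring in chunks.

    Illustrative stand-in for a chunked semantic search: the full score
    vector is kept (one float per document), but the similarity matmul
    only ever touches ``chunk_size`` embeddings at once, capping peak memory.
    """
    scores = np.empty(len(doc_embs))
    for start in range(0, len(doc_embs), chunk_size):
        chunk = doc_embs[start:start + chunk_size]
        scores[start:start + chunk_size] = chunk @ query_emb
    # Highest scores first.
    return np.argsort(scores)[::-1][:k]
```

The trade-off is a Python-level loop over chunks in exchange for a bounded intermediate, which is the same motivation the PR gives for its user-configurable `chunk_size`.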

CLIP integration and vision LLM support partially working
- after 3 failed backoffs when making an OpenAI LLM request, it skips
- implemented a basic method for creating combined text+image embeddings with CLIP

removed accidental testing image data from repo
@liana313
Collaborator

liana313 commented Nov 22, 2024 (edited)

Thanks for the awesome work on this, @jmelovich! It looks like there is overlap with PR #33, which plans to support images using the pandas types extension, which will be slightly more extensible as we add support for more types as well. We've started a review of PR #33 and plan to merge it soon -- can you compare and merge with it instead of main? Also we'd be happy to coordinate ongoing dev efforts with you! Feel free to join our Slack here so we can coordinate offline as well: https://join.slack.com/t/lotus-fnm8919/shared_invite/zt-2tnq6948j-juGuSIR0__fsh~kUmZ6TJw

@jmelovich
Author

> Thanks for the awesome work on this, @jmelovich! It looks like there is overlap with PR #37, which plans to support images using the pandas types extension, which will be slightly more extensible as we add support for more types as well. We've started a review of PR #37 and plan to merge it soon -- can you compare and merge with it instead of main? Also we'd be happy to coordinate ongoing dev efforts with you! Feel free to join our Slack here so we can coordinate offline as well: https://join.slack.com/t/lotus-fnm8919/shared_invite/zt-2tnq6948j-juGuSIR0__fsh~kUmZ6TJw

Yes, I should be able to rework it fairly simply to use the pandas types extension. Are you sure you meant to mention PR #37? That is this current one; I assume you meant to refer to PR #33?

@liana313
Collaborator

Yes, sorry, I meant PR #33 (edited), which adds support for images and has an implementation for each operator. One other main difference is that you added a new class for CLIP, although I believe we can support it using SentenceTransformers (example here: https://www.sbert.net/examples/applications/image-search/README.html)

@jmelovich
Author

> Yes, sorry, I meant PR #33 (edited), which adds support for images and has an implementation for each operator. One other main difference is that you added a new class for CLIP, although I believe we can support it using SentenceTransformers (example here: https://www.sbert.net/examples/applications/image-search/README.html)

Ok interesting, I was not familiar with SentenceTransformers, so I'll check that out. In addition, one of the most useful things I added in my CLIP implementation was the ability to create combined text & image embeddings, so that both an image and text can be used to produce a single embedding; this has proved very useful on some VQA datasets I've tested, like Infoseek. If there is a way to implement this CLIP class more simply with SentenceTransformers, I will look into it.
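One way the weighted multimodal scoring could work is sketched below. Only the weight ordering ([text-text, image-image, text-image, image-text]) comes from the PR description; the `combined_similarity` function, its signature, and the linear-combination formula are assumptions for illustration:

```python
import numpy as np


def combined_similarity(q_text: np.ndarray, q_img: np.ndarray,
                        d_text: np.ndarray, d_img: np.ndarray,
                        weights=(0.25, 0.25, 0.25, 0.25)) -> float:
    """Weighted similarity over text/image embedding pairs.

    Hypothetical sketch: CLIP maps text and images into a shared space,
    so all four cross-modal dot products are meaningful. ``weights``
    follows the PR's ordering [text-text, image-image, text-image,
    image-text], with an even split by default.
    """
    sims = np.array([
        q_text @ d_text,  # text-text
        q_img @ d_img,    # image-image
        q_text @ d_img,   # text-image
        q_img @ d_text,   # image-text
    ])
    return float(np.dot(weights, sims))
```

With weights like `[0.4, 0.4, 0.1, 0.1]` (the PR's example), same-modality matches dominate while cross-modal matches still contribute, which is one plausible reading of why this helps on VQA-style data where the query mixes an image and a text question.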

Also, I want to note that my PR does support every operator I've tested, which is all but sem_dedup and sem_extract (I'm just not sure what to extract from an image).


