Full image support (LLM operators & embeddings w/ CLIP) #37


Open

jmelovich wants to merge 13 commits into lotus-data:main from jmelovich:mm-integration

Conversation

@jmelovich

Changes:

  • Added a new dataframe operator `load_images(img_path_col, encoded_img_col)` (in utils.py). The first parameter refers to a column of file paths (or URLs) to images, and the second is the name of the new column to append. It iterates through the dataframe, loads each image from its path, and encodes it as a base64 string (to cut down on the conversions needed later for any vision LLM call).
    • Example usage: `df.load_images("image_path", "image")` -> the 'image' column can then be used as normal for getting the images in any other LOTUS operation.
  • Modified task_instructions.py to automatically extract any image data and append it correctly to the messages array.
  • Modified sem_topk.py to likewise extract image data and append it to the messages array.
  • Added support for the CLIP family of embedding/retrieval models (in clip_model.py).
    • Included batching support to help reduce memory usage when indexing large datasets.
    • Also included a custom method for creating combined text+image embeddings, with user-configurable weights (an even split by default).
      • Example usage: `rm = CLIPModelRetriever(similarity_weights=[0.4, 0.4, 0.1, 0.1])  # [text-text, image-image, text-image, image-text]`
  • Added chunking to reduce memory usage for the sem_search operation. The chunk size is user-configurable: `sem_search(chunk_size=1000)`.
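To make the `load_images` idea concrete, here is a minimal sketch of such an operator. The function name and column parameters follow the PR description; the body is an assumption (URL fetching is omitted for brevity, and the real implementation may differ):

```python
import base64

import pandas as pd


def load_images(df: pd.DataFrame, img_path_col: str, encoded_img_col: str) -> pd.DataFrame:
    """Append a column of base64-encoded image data loaded from file paths.

    Hypothetical re-implementation of the PR's ``load_images`` operator:
    reads the raw bytes at each path and stores them as a base64 string.
    """
    def encode(path: str) -> str:
        with open(path, "rb") as f:
            return base64.b64encode(f.read()).decode("utf-8")

    df[encoded_img_col] = df[img_path_col].apply(encode)
    return df
```

Usage would mirror the PR's example: `df = load_images(df, "image_path", "image")`, after which the `"image"` column holds base64 strings ready to embed in a vision-LLM message payload.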

I did most of my testing on larger datasets, but I created a very simple Jupyter notebook (examples/multimodal_tests.ipynb) that demonstrates CLIP working with a dataframe of images, as well as sem_topk, sem_filter, sem_map, and sem_search. In my own testing, sem_sim_join and sem_agg work as well.
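The chunked `sem_search` mentioned above can be illustrated with a small standalone sketch. This is not the PR's actual code; it is a hedged example of the general technique (score documents in fixed-size chunks so only one chunk participates in the matrix product at a time), assuming L2-normalized embeddings so the dot product equals cosine similarity:

```python
import numpy as np


def chunked_top_k(query_emb: np.ndarray, doc_embs: np.ndarray,
                  k: int, chunk_size: int = 1000) -> np.ndarray:
    """Return indices of the top-k most similar documents, scoring in chunks.

    Illustrative stand-in for a chunked semantic search: the full score
    vector is kept (one float per document), but the similarity matmul
    only ever touches ``chunk_size`` embeddings at once, capping peak memory.
    """
    scores = np.empty(len(doc_embs))
    for start in range(0, len(doc_embs), chunk_size):
        chunk = doc_embs[start:start + chunk_size]
        scores[start:start + chunk_size] = chunk @ query_emb
    # Highest scores first.
    return np.argsort(scores)[::-1][:k]
```

The trade-off is a Python-level loop over chunks in exchange for a bounded intermediate, which is the same motivation the PR gives for its user-configurable `chunk_size`.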

CLIP integration and vision LLM support partially working
- after 3 failed backoffs when making an OpenAI LLM request, it skips
- implemented a basic method for creating combined text+image embeddings with CLIP

removed accidental testing image data from repo
@liana313
Collaborator

liana313 commented Nov 22, 2024 (edited)

Thanks for the awesome work on this, @jmelovich! It looks like there is overlap with PR #33, which plans to support images using the pandas types extension, which will be slightly more extensible as we add support for more types as well. We've started a review of PR #33 and plan to merge it soon -- can you compare and merge with it instead of main? Also we'd be happy to coordinate ongoing dev efforts with you! Feel free to join our Slack here so we can coordinate offline as well: https://join.slack.com/t/lotus-fnm8919/shared_invite/zt-2tnq6948j-juGuSIR0__fsh~kUmZ6TJw

@jmelovich
Author

> Thanks for the awesome work on this, @jmelovich! It looks like there is overlap with PR #37, which plans to support images using the pandas types extension, which will be slightly more extensible as we add support for more types as well. We've started a review of PR #37 and plan to merge it soon -- can you compare and merge with it instead of main? Also we'd be happy to coordinate ongoing dev efforts with you! Feel free to join our Slack here so we can coordinate offline as well: https://join.slack.com/t/lotus-fnm8919/shared_invite/zt-2tnq6948j-juGuSIR0__fsh~kUmZ6TJw

Yes, I should be able to rework it fairly simply to use the pandas types extension. Are you sure you meant to mention PR #37? That is this current one; I assume you meant to refer to PR #33?

@liana313
Collaborator

Yes, sorry, I meant PR #33 (edited), which adds support for images and has an implementation for each operator. One other main difference is that you added a new class for CLIP, although I believe we can support it using SentenceTransformers (example here: https://www.sbert.net/examples/applications/image-search/README.html)

@jmelovich
Author

> Yes, sorry, I meant PR #33 (edited), which adds support for images and has an implementation for each operator. One other main difference is that you added a new class for CLIP, although I believe we can support it using SentenceTransformers (example here: https://www.sbert.net/examples/applications/image-search/README.html)

Ok interesting, I was not familiar with SentenceTransformers, so I'll check that out. In addition, one of the most useful things I added in my CLIP implementation was the ability to create combined text & image embeddings, so that both an image and text can be used to produce a single embedding; this has proved very useful on some VQA datasets I've tested, like Infoseek. If there is a way to implement this CLIP class more simply with SentenceTransformers, I will look into it.
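One way the weighted multimodal scoring could work is sketched below. Only the weight ordering ([text-text, image-image, text-image, image-text]) comes from the PR description; the `combined_similarity` function, its signature, and the linear-combination formula are assumptions for illustration:

```python
import numpy as np


def combined_similarity(q_text: np.ndarray, q_img: np.ndarray,
                        d_text: np.ndarray, d_img: np.ndarray,
                        weights=(0.25, 0.25, 0.25, 0.25)) -> float:
    """Weighted similarity over text/image embedding pairs.

    Hypothetical sketch: CLIP maps text and images into a shared space,
    so all four cross-modal dot products are meaningful. ``weights``
    follows the PR's ordering [text-text, image-image, text-image,
    image-text], with an even split by default.
    """
    sims = np.array([
        q_text @ d_text,  # text-text
        q_img @ d_img,    # image-image
        q_text @ d_img,   # text-image
        q_img @ d_text,   # image-text
    ])
    return float(np.dot(weights, sims))
```

With weights like `[0.4, 0.4, 0.1, 0.1]` (the PR's example), same-modality matches dominate while cross-modal matches still contribute, which is one plausible reading of why this helps on VQA-style data where the query mixes an image and a text question.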

Also, I want to note that my PR does support every operator I've tested, which is all but sem_dedup and sem_extract (I'm just not sure what to extract from an image).


