cogvlm

Tiny-scale experiment showing that CLIP models trained using detailed captions generated by multimodal models (CogVLM and LLaVA 1.5) outperform models trained using the original alt-texts on a range of classification and retrieval tasks.

clip synthetic-data multimodal vision-language-model llava cogvlm

Updated Mar 6, 2024
Python

williamcfrancis/vlm-comparison-gemini-cog

A comparative study of two of the best-performing open-source vision-language models: Google Gemini Vision and CogVLM.

ai gemini vision vlm vision-and-language vision-language-model cogvlm google-gemini gemini-pro

Updated Jan 28, 2024
Python

cogvlm

Here are 6 public repositories matching this topic...

THUDM/CogVLM2

jhc13/taggui

gokayfem/awesome-vlm-architectures

ProGamerGov/VLM-Captioning-Tools

nopperl/clip-synthetic-captions

williamcfrancis/vlm-comparison-gemini-cog
