image-captioning

Here are 878 public repositories matching this topic...

By language: All 878, Jupyter Notebook 409, Python 373, HTML 17, JavaScript 12, Java 4, Lua 4, OpenEdge ABL 3, C++ 2, C 1, C# 1

Sorted by: most stars

salesforce/LAVIS

Stars: 10.4k

LAVIS - A One-stop Library for Language-Vision Intelligence

deep-learning salesforce image-captioning deep-learning-library vision-framework vision-and-language multimodal-deep-learning multimodal-datasets vision-language-transformer vision-language-pretraining visual-question-anwsering

Updated Nov 18, 2024
Jupyter Notebook

salesforce/BLIP

Stars: 5.1k

PyTorch code for BLIP: Bootstrapping Language-Image Pre-training for Unified Vision-Language Understanding and Generation

image-captioning visual-reasoning visual-question-answering vision-language vision-language-transformer image-text-retrieval vision-and-language-pre-training

Updated Aug 5, 2024
Jupyter Notebook

OpenGVLab/InternGPT

InternGPT (iGPT) is an open-source demo platform where you can easily showcase your AI models. It now supports DragGAN, ChatGPT, ImageBind, multimodal chat like GPT-4, SAM, interactive image editing, etc. Try it at igpt.opengvlab.com (an online demo system supporting DragGAN, ChatGPT, ImageBind, and SAM)

sam click vqa image-captioning llama gpt gradio husky multimodal video-generation vicuna gpt-4 llm chatgpt langchain foundation-model segment-anything internimage imagebind draggan

Updated Aug 20, 2024
Python

sgrvinod/a-PyTorch-Tutorial-to-Image-Captioning

Stars: 2.8k

Show, Attend, and Tell | a PyTorch Tutorial to Image Captioning

computer-vision pytorch image-captioning show-attend-and-tell attention-mechanism encoder-decoder pytorch-tutorial mscoco

Updated Jul 28, 2022
Python

OFA-Sys/OFA

Stars: 2.5k

Official repository of OFA (ICML 2022). Paper: OFA: Unifying Architectures, Tasks, and Modalities Through a Simple Sequence-to-Sequence Learning Framework

prompt chinese image-captioning pretrained-models visual-question-answering multimodal text-to-image-synthesis vision-language pretraining referring-expression-comprehension prompt-tuning

Updated Apr 24, 2024
Python

ttengwang/Caption-Anything

Stars: 1.7k

Caption-Anything is a versatile tool combining image segmentation, visual captioning, and ChatGPT, generating tailored captions with diverse controls for user preferences. https://huggingface.co/spaces/TencentARC/Caption-Anything https://huggingface.co/spaces/VIPLab/Caption-Anything

image-captioning controllable-image-captioning controllable-generation chatgpt segment-anything

Updated Aug 29, 2023
Python

peteanderson80/bottom-up-attention

Stars: 1.4k

Bottom-up attention model for image captioning and VQA, based on Faster R-CNN and Visual Genome

caffe vqa faster-rcnn image-captioning captioning-images mscoco mscoco-dataset visual-question-answering

Updated Feb 3, 2023
Jupyter Notebook

imaginary-cloud/CameraManager

Stars: 1.4k

Simple Swift class that provides all the configuration you need to create a custom camera view in your app

swift ios camera cocoapods carthage swift-package-manager video-recording custom-camera image-captioning qrcode-reader

Updated Jul 19, 2024
Swift

NVlabs/prismer

Stars: 1.3k

The implementation of "Prismer: A Vision-Language Model with Multi-Task Experts".

vqa image-captioning language-model multi-task-learning vision-and-language multi-modal-learning vision-language-model

Updated Jan 17, 2024
Python

microsoft/Oscar

Stars: 1k

Oscar and VinVL

vqa image-captioning oscar vision-and-language pre-training image-text-search vinvl

Updated Aug 28, 2023
Python

ruotianluo/self-critical.pytorch

Stars: 1k

Unofficial PyTorch implementation of Self-critical Sequence Training for Image Captioning, and others.

image-captioning

Updated Oct 5, 2023
Python

YehLi/xmodaler

Stars: 970

X-modaler is a versatile and high-performance codebase for cross-modal analytics (e.g., image captioning, video captioning, vision-language pre-training, visual question answering, visual commonsense reasoning, and cross-modal retrieval).

image-captioning video-captioning visual-question-answering vision-and-language cross-modal-retrieval pretraining tden

Updated Feb 27, 2023
Python

jhc13/taggui

Stars: 936

Tag manager and captioner for image datasets

image-captioning image-tagging tag-manager pyside6 stable-diffusion llava cogvlm florence-2

Updated Feb 22, 2025
Python

yunjey/show-attend-and-tell

Stars: 906

TensorFlow Implementation of "Show, Attend and Tell"

tensorflow image-captioning show-attend-and-tell attention-mechanism mscoco-image-dataset

Updated Jul 28, 2018
Jupyter Notebook

SkalskiP/awesome-foundation-and-multimodal-models

Stars: 606

👁️ + 💬 + 🎧 = 🤖 Curated list of top foundation and multimodal models! [Paper + Code + Examples + Tutorials]

nlp computer-vision image-captioning clip blip multimodal zero-shot-detection foundational-models llava segment-anything open-vocabulary-detection open-vocabulary-segmentation grounding-dino

Updated Feb 29, 2024
Python

kdexd/virtex

Stars: 561

[CVPR 2021] VirTex: Learning Visual Representations from Textual Annotations

model-zoo image-captioning pretrained-models coco-dataset cvpr2021

Updated Jan 1, 2024
Python

kuanghuei/SCAN

Stars: 558

PyTorch source code for "Stacked Cross Attention for Image-Text Matching" (ECCV 2018)

computer-vision deep-learning neural-network pytorch image-captioning cross-modal visual-semantic

Updated May 18, 2023
Python

aimagelab/meshed-memory-transformer

Stars: 531

Meshed-Memory Transformer for Image Captioning. CVPR 2020

pytorch transformer image-captioning captioning-images visual-semantic caption-generation cvpr2020

Updated Dec 21, 2022
Python

subho406/OmniNet

Stars: 512

Official PyTorch implementation of "OmniNet: A unified architecture for multi-modal multi-task learning" | Authors: Subhojeet Pramanik, Priyanka Agrawal, Aman Hussain

nlp machine-learning deep-learning neural-network artificial-intelligence transformer image-captioning video-recognition multimodal-learning multitask-learning

Updated Oct 31, 2020
Python

gokayfem/ComfyUI_VLM_nodes

Stars: 481

Custom ComfyUI nodes for Vision Language Models, Large Language Models, Image to Music, Text to Music, Consistent and Random Creative Prompt Generation

image-captioning nodes vlm custom-nodes img2text llm mllm llava comfyui siglip phi15 joytag img2sfx

Updated Feb 13, 2025
Python
