qwen2-vl
Here are 34 public repositories matching this topic...
Use PEFT or full-parameter training for CPT/SFT/DPO/GRPO of 500+ LLMs (Qwen3, Qwen3-MoE, Llama4, InternLM3, GLM4, Mistral, Yi1.5, DeepSeek-R1, ...) and 200+ MLLMs (Qwen2.5-VL, Qwen2.5-Omni, Qwen2-Audio, Ovis2, InternVL3, Llava, MiniCPM-V-2.6, GLM4v, Xcomposer2.5, DeepSeek-VL2, Phi4, GOT-OCR2, ...).
- Updated Apr 28, 2025 - Python
Streamline the fine-tuning process for multimodal models: PaliGemma 2, Florence-2, and Qwen2.5-VL.
- Updated Apr 28, 2025 - Python
An open-source implementation for fine-tuning the Qwen2-VL and Qwen2.5-VL series by Alibaba Cloud.
- Updated Apr 28, 2025 - Python
Paddle Multimodal Integration and eXploration, supporting mainstream multi-modal tasks, including end-to-end large-scale multi-modal pretrained models and a diffusion model toolbox, with high performance and flexibility.
- Updated Apr 28, 2025 - Python
A pure C++ high-performance OpenAI-compatible LLM service (faster than `vllm serve`), implemented with GPRS+TensorRT-LLM+Tokenizers.cpp, supporting chat and function calling, AI agents, distributed multi-GPU inference, multimodal capabilities, and a Gradio chat interface.
- Updated Apr 18, 2025 - Python
Are VLMs Ready for Autonomous Driving? An Empirical Study from the Reliability, Data, and Metric Perspectives
- Updated Feb 22, 2025 - Python
A Python-based CLI tool for captioning images with WD-series, Joy-caption-pre-alpha, Meta Llama 3.2 Vision Instruct, and Qwen2-VL Instruct models.
- Updated Mar 18, 2025 - Python
An intelligent search assistant based on multimodal large models, enabling smart information retrieval and knowledge integration on the Xiaohongshu platform.
- Updated Nov 6, 2024 - Python
This project demonstrates how to use the Qwen2-VL model from Hugging Face for Optical Character Recognition (OCR) and Visual Question Answering (VQA). The model combines vision and language capabilities, enabling users to analyze images and generate context-based responses.
- Updated Oct 18, 2024 - Jupyter Notebook
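As a minimal sketch of the OCR/VQA workflow the project above describes (not its actual code): the snippet below queries Qwen2-VL through Hugging Face `transformers`. The checkpoint name `Qwen/Qwen2-VL-2B-Instruct` and the use of the `qwen-vl-utils` helper package are assumptions taken from common Qwen2-VL usage, and may differ from what the repository does.

```python
# A hedged sketch of OCR/VQA with Qwen2-VL via Hugging Face transformers.
# Checkpoint name and qwen_vl_utils usage are assumptions, not the repo's code.

def build_messages(image: str, question: str) -> list:
    """Build the multimodal chat structure Qwen2-VL's processor expects."""
    return [
        {
            "role": "user",
            "content": [
                {"type": "image", "image": image},   # local path or URL
                {"type": "text", "text": question},
            ],
        }
    ]


def run_vqa(image: str, question: str, max_new_tokens: int = 128) -> str:
    """Answer a question about an image (downloads model weights on first call)."""
    # Heavy imports live inside the function so the sketch can be read
    # and the message-building helper used without installing them.
    from transformers import AutoProcessor, Qwen2VLForConditionalGeneration
    from qwen_vl_utils import process_vision_info  # pip install qwen-vl-utils

    model_id = "Qwen/Qwen2-VL-2B-Instruct"  # assumed checkpoint
    model = Qwen2VLForConditionalGeneration.from_pretrained(model_id)
    processor = AutoProcessor.from_pretrained(model_id)

    messages = build_messages(image, question)
    text = processor.apply_chat_template(
        messages, tokenize=False, add_generation_prompt=True
    )
    image_inputs, video_inputs = process_vision_info(messages)
    inputs = processor(
        text=[text], images=image_inputs, videos=video_inputs,
        padding=True, return_tensors="pt",
    )
    output_ids = model.generate(**inputs, max_new_tokens=max_new_tokens)
    # Strip the prompt tokens before decoding, keeping only the answer.
    trimmed = [out[len(inp):] for inp, out in zip(inputs.input_ids, output_ids)]
    return processor.batch_decode(trimmed, skip_special_tokens=True)[0]
```

For OCR, the same `run_vqa` call works with a prompt such as "Read all the text in this image"; the model returns the transcription as free-form text.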
A case study of fine-tuning Qwen2-VL with LLaMA-Factory in the cultural tourism domain (historical literature and museums).
- Updated Sep 17, 2024
An open-source server implementation for inference with Qwen2-VL series models using FastAPI.
- Updated Nov 20, 2024 - Python
A sample for trying out QwenLM/Qwen2-VL on Colaboratory.
- Updated Sep 4, 2024 - Jupyter Notebook
Community-built Qwen AI Provider for Vercel AI SDK - Integrate Alibaba Cloud's Qwen models with Vercel's AI application framework
- Updated Mar 10, 2025 - TypeScript
A workshop collecting multi-modal LLM examples, samples, reference architectures, and demos on Amazon SageMaker.
- Updated Mar 16, 2025 - Jupyter Notebook
Official implementation of CMMCoT: Enhancing Complex Multi-Image Comprehension via Multi-Modal Chain-of-Thought and Memory Augmentation
- Updated Apr 24, 2025 - Python
This open-source project delivers a complete pipeline for converting multi-page documents (PDFs/images) into structured JSON using Vision LLMs on Amazon SageMaker. The solution leverages the SWIFT Framework to fine-tune models specifically for document understanding tasks.
- Updated Mar 22, 2025 - Jupyter Notebook