qwen2-vl
Here are 34 public repositories matching this topic...
Use PEFT or full-parameter training for CPT/SFT/DPO/GRPO of 500+ LLMs (Qwen3, Qwen3-MoE, Llama4, InternLM3, GLM4, Mistral, Yi1.5, DeepSeek-R1, ...) and 200+ MLLMs (Qwen2.5-VL, Qwen2.5-Omni, Qwen2-Audio, Ovis2, InternVL3, Llava, MiniCPM-V-2.6, GLM4v, Xcomposer2.5, DeepSeek-VL2, Phi4, GOT-OCR2, ...).
- Updated Apr 28, 2025 - Python
Streamline the fine-tuning process for multimodal models: PaliGemma 2, Florence-2, and Qwen2.5-VL.
- Updated Apr 28, 2025 - Python
An open-source implementation for fine-tuning the Qwen2-VL and Qwen2.5-VL series by Alibaba Cloud.
- Updated Apr 28, 2025 - Python
Paddle Multimodal Integration and eXploration, supporting mainstream multi-modal tasks, including end-to-end large-scale multi-modal pretrained models and a diffusion model toolbox, with high performance and flexibility.
- Updated Apr 28, 2025 - Python
A pure C++ high-performance OpenAI-compatible LLM service (faster than `vllm serve`), implemented with GPRS+TensorRT-LLM+Tokenizers.cpp, supporting chat and function calling, AI agents, distributed multi-GPU inference, multimodal capabilities, and a Gradio chat interface.
- Updated Apr 18, 2025 - Python
Are VLMs Ready for Autonomous Driving? An Empirical Study from the Reliability, Data, and Metric Perspectives
- Updated Feb 22, 2025 - Python
A Python-based CLI tool for captioning images with WD-series, Joy-caption-pre-alpha, Meta Llama 3.2 Vision Instruct, and Qwen2-VL Instruct models.
- Updated Mar 18, 2025 - Python
An intelligent search assistant based on multimodal large models, enabling smart information retrieval and knowledge integration on the Xiaohongshu platform.
- Updated Nov 6, 2024 - Python
This project demonstrates how to use the Qwen2-VL model from Hugging Face for Optical Character Recognition (OCR) and Visual Question Answering (VQA). The model combines vision and language capabilities, enabling users to analyze images and generate context-based responses.
- Updated Oct 18, 2024 - Jupyter Notebook
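As a minimal sketch of the OCR/VQA workflow the project above describes (not its actual code): the snippet below queries Qwen2-VL through Hugging Face `transformers`. The checkpoint name `Qwen/Qwen2-VL-2B-Instruct` and the use of the `qwen-vl-utils` helper package are assumptions taken from common Qwen2-VL usage, and may differ from what the repository does.

```python
# A hedged sketch of OCR/VQA with Qwen2-VL via Hugging Face transformers.
# Checkpoint name and qwen_vl_utils usage are assumptions, not the repo's code.

def build_messages(image: str, question: str) -> list:
    """Build the multimodal chat structure Qwen2-VL's processor expects."""
    return [
        {
            "role": "user",
            "content": [
                {"type": "image", "image": image},   # local path or URL
                {"type": "text", "text": question},
            ],
        }
    ]


def run_vqa(image: str, question: str, max_new_tokens: int = 128) -> str:
    """Answer a question about an image (downloads model weights on first call)."""
    # Heavy imports live inside the function so the sketch can be read
    # and the message-building helper used without installing them.
    from transformers import AutoProcessor, Qwen2VLForConditionalGeneration
    from qwen_vl_utils import process_vision_info  # pip install qwen-vl-utils

    model_id = "Qwen/Qwen2-VL-2B-Instruct"  # assumed checkpoint
    model = Qwen2VLForConditionalGeneration.from_pretrained(model_id)
    processor = AutoProcessor.from_pretrained(model_id)

    messages = build_messages(image, question)
    text = processor.apply_chat_template(
        messages, tokenize=False, add_generation_prompt=True
    )
    image_inputs, video_inputs = process_vision_info(messages)
    inputs = processor(
        text=[text], images=image_inputs, videos=video_inputs,
        padding=True, return_tensors="pt",
    )
    output_ids = model.generate(**inputs, max_new_tokens=max_new_tokens)
    # Strip the prompt tokens before decoding, keeping only the answer.
    trimmed = [out[len(inp):] for inp, out in zip(inputs.input_ids, output_ids)]
    return processor.batch_decode(trimmed, skip_special_tokens=True)[0]
```

For OCR, the same `run_vqa` call works with a prompt such as "Read all the text in this image"; the model returns the transcription as free-form text.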
A case study of fine-tuning Qwen2-VL with LLaMA-Factory in the cultural tourism domain (historical literature and museums).
- Updated Sep 17, 2024
An open-source server implementation for inference with Qwen2-VL series models using FastAPI.
- Updated Nov 20, 2024 - Python
A sample for trying out QwenLM/Qwen2-VL on Colaboratory.
- Updated Sep 4, 2024 - Jupyter Notebook
Community-built Qwen AI Provider for Vercel AI SDK - Integrate Alibaba Cloud's Qwen models with Vercel's AI application framework
- Updated Mar 10, 2025 - TypeScript
A workshop collecting multi-modal LLM examples, samples, reference architectures, and demos on Amazon SageMaker.
- Updated Mar 16, 2025 - Jupyter Notebook
Official implementation of CMMCoT: Enhancing Complex Multi-Image Comprehension via Multi-Modal Chain-of-Thought and Memory Augmentation
- Updated Apr 24, 2025 - Python
This open-source project delivers a complete pipeline for converting multi-page documents (PDFs/images) into structured JSON using Vision LLMs on Amazon SageMaker. The solution leverages the SWIFT Framework to fine-tune models specifically for document understanding tasks.
- Updated Mar 22, 2025 - Jupyter Notebook