grpo
Here are 192 public repositories matching this topic...
Use PEFT or Full-parameter to CPT/SFT/DPO/GRPO 600+ LLMs (Qwen3, Qwen3-MoE, DeepSeek-R1, GLM4.5, InternLM3, Llama4, ...) and 300+ MLLMs (Qwen3-VL, Qwen3-Omni, InternVL3.5, Ovis2.5, GLM4.5v, Llava, Phi4, ...) (AAAI 2025).
Updated Feb 20, 2026 - Python
Agent Reinforcement Trainer: train multi-step agents for real-world tasks using GRPO. Give your agents on-the-job training. Reinforcement learning for Qwen2.5, Qwen3, Llama, and more!
Updated Feb 19, 2026 - Python
Solve Visual Understanding with Reinforced VLMs
Updated Oct 21, 2025 - Python
Comprehensive open-source library of AI research and engineering skills for any AI model. Package the skills and your Claude Code/Codex/Gemini agent will be an AI research agent with full horsepower. Maintained by Orchestra Research.
Updated Feb 19, 2026 - TeX
Skywork-R1V is an advanced multimodal AI model series developed by Skywork AI, specializing in vision-language reasoning.
Updated Dec 15, 2025 - Python
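https://adongwanai.github.io/AgentGuide | AI Agent Development Guide | LangGraph in Practice | Advanced RAG | Career Transition to LLMs | LLM Interviews | Algorithm Engineer | Interview Question Bank | Reinforcement Learning | Data Synthesis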
Updated Feb 12, 2026 - HTML
verl-agent is an extension of veRL, designed for training LLM/VLM agents via RL. verl-agent is also the official code for paper "Group-in-Group Policy Optimization for LLM Agent Training"
Updated Feb 11, 2026 - Python
MixGRPO: Unlocking Flow-based GRPO Efficiency with Mixed ODE-SDE
Updated Feb 4, 2026 - Python
The open source post-building layer for agents. Our environment data and evals power agent post-training (RL, SFT) and monitoring.
Updated Feb 20, 2026 - Python
🌾 OAT: A research-friendly framework for LLM online alignment, including reinforcement learning, preference learning, etc.
Updated Jan 29, 2026 - Python
Explore the Multimodal “Aha Moment” on 2B Model
Updated Mar 18, 2025 - Python
[NeurIPS 2025] AutoVLA: A Vision-Language-Action Model for End-to-End Autonomous Driving with Adaptive Reasoning and Reinforcement Fine-Tuning
Updated Feb 3, 2026 - Python
BioReason: Incentivizing Multimodal Biological Reasoning within a DNA-LLM Model | NeurIPS '25
Updated Dec 22, 2025 - Jupyter Notebook
A curated list of papers on reinforcement learning for video generation
Updated Feb 19, 2026
OpenThinkIMG is an end-to-end open-source framework that empowers LVLMs to think with images.
Updated Jun 1, 2025 - Jupyter Notebook
Unleashing the Power of VLMs in Autonomous Driving via Reinforcement Learning and Reasoning
Updated Mar 26, 2025 - Python
[AAAI 2026] GUI-G²: Gaussian Reward Modeling for GUI Grounding
Updated Feb 2, 2026 - Python