#grpo

Here are 192 public repositories matching this topic...

Use PEFT or full-parameter training to CPT/SFT/DPO/GRPO 600+ LLMs (Qwen3, Qwen3-MoE, DeepSeek-R1, GLM4.5, InternLM3, Llama4, ...) and 300+ MLLMs (Qwen3-VL, Qwen3-Omni, InternVL3.5, Ovis2.5, GLM4.5v, Llava, Phi4, ...) (AAAI 2025).

  • Updated Feb 20, 2026
  • Python

ART

Agent Reinforcement Trainer: train multi-step agents for real-world tasks using GRPO. Give your agents on-the-job training. Reinforcement learning for Qwen2.5, Qwen3, Llama, and more!

  • Updated Feb 19, 2026
  • Python

Solve Visual Understanding with Reinforced VLMs

  • Updated Oct 21, 2025
  • Python

Comprehensive open-source library of AI research and engineering skills for any AI model. Package the skills and your Claude Code/Codex/Gemini agent becomes an AI research agent with full horsepower. Maintained by Orchestra Research.

  • Updated Feb 19, 2026
  • TeX

Skywork-R1V is an advanced multimodal AI model series developed by Skywork AI, specializing in vision-language reasoning.

  • Updated Dec 15, 2025
  • Python

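https://adongwanai.github.io/AgentGuide | AI Agent Development Guide | LangGraph in Practice | Advanced RAG | Career Transition to LLMs | LLM Interviews | Algorithm Engineer | Interview Question Bank | Reinforcement Learning | Data Synthesis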

  • Updated Feb 12, 2026
  • HTML

verl-agent is an extension of veRL, designed for training LLM/VLM agents via RL. verl-agent is also the official code for the paper "Group-in-Group Policy Optimization for LLM Agent Training".

  • Updated Feb 11, 2026
  • Python

MixGRPO: Unlocking Flow-based GRPO Efficiency with Mixed ODE-SDE

  • Updated Feb 4, 2026
  • Python

judgeval

The open source post-building layer for agents. Our environment data and evals power agent post-training (RL, SFT) and monitoring.

  • Updated Feb 20, 2026
  • Python

🌾 OAT: A research-friendly framework for LLM online alignment, including reinforcement learning, preference learning, etc.

  • Updated Jan 29, 2026
  • Python

A collection of every awesome work about R1!

  • Updated May 2, 2025
  • Python

[NeurIPS 2025] AutoVLA: A Vision-Language-Action Model for End-to-End Autonomous Driving with Adaptive Reasoning and Reinforcement Fine-Tuning

  • Updated Feb 3, 2026
  • Python

Official implementation of GDPO: Group reward-Decoupled Normalization Policy Optimization for Multi-reward RL Optimization

  • Updated Feb 17, 2026
  • Python

Agentic RAG R1 Framework via Reinforcement Learning

  • Updated Feb 16, 2026
  • Python

BioReason: Incentivizing Multimodal Biological Reasoning within a DNA-LLM Model | NeurIPS '25

  • Updated Dec 22, 2025
  • Jupyter Notebook

A curated list of papers on reinforcement learning for video generation

  • Updated Feb 19, 2026

OpenThinkIMG is an end-to-end open-source framework that empowers LVLMs to think with images.

  • Updated Jun 1, 2025
  • Jupyter Notebook

Unleashing the Power of VLMs in Autonomous Driving via Reinforcement Learning and Reasoning

  • Updated Mar 26, 2025
  • Python

[AAAI 2026] GUI-G²: Gaussian Reward Modeling for GUI Grounding

  • Updated Feb 2, 2026
  • Python
