# grpo

Here are 192 public repositories matching this topic...

By language: Python 142, Jupyter Notebook 26, HTML 2, TeX 2, C# 1, C++ 1, Rust 1, TypeScript 1

modelscope/ms-swift

Use PEFT or Full-parameter to CPT/SFT/DPO/GRPO 600+ LLMs (Qwen3, Qwen3-MoE, DeepSeek-R1, GLM4.5, InternLM3, Llama4, ...) and 300+ MLLMs (Qwen3-VL, Qwen3-Omni, InternVL3.5, Ovis2.5, GLM4.5v, Llava, Phi4, ...) (AAAI 2025).

moe llama lora embedding liger peft multimodal reranker sft megatron llm internvl deepseek-r1 grpo open-r1 qwen3 llama4 qwen3-vl qwen3-next qwen3-omni

Updated Feb 20, 2026
Python

OpenPipe/ART

Agent Reinforcement Trainer: train multi-step agents for real-world tasks using GRPO. Give your agents on-the-job training. Reinforcement learning for Qwen2.5, Qwen3, Llama, and more!

agent reinforcement-learning rl lora llms qwen agentic-ai grpo qwen3

Updated Feb 19, 2026
Python

om-ai-lab/VLM-R1

Solve Visual Understanding with Reinforced VLMs

reinforcement-learning vlm multimodal llm qwen deepseek-r1 grpo r1-zero vlm-r1 multimodal-r1

Updated Oct 21, 2025
Python

Orchestra-Research/AI-Research-SKILLs

Comprehensive open-source library of AI research and engineering skills for any AI model. Package the skills and your claude code/codex/gemini agent will be an AI research agent with full horsepower. Maintained by Orchestra Research.

ai skills gemini codex claude ai-research machine-leanring megatron huggingface gpt-5 vllm grpo claude-code claude-skills

Updated Feb 19, 2026
TeX

SkyworkAI/Skywork-R1V

Skywork-R1V is an advanced multimodal AI model series developed by Skywork AI, specializing in vision-language reasoning.

reinforcement-learning reasoning vlm llm multimodal-understanding deepseek-r1 grpo vlm-r1 multimodal-r1 r1v skywork-r1v

Updated Dec 15, 2025
Python

adongwanai/AgentGuide

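https://adongwanai.github.io/AgentGuide | AI Agent Development Guide | LangGraph in Practice | Advanced RAG | Career Switch to LLMs | LLM Interviews | Algorithm Engineer | Interview Question Bank | Reinforcement Learning | Data Synthesis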

tutorial interview multi-agent job-hunting rag sft ai-agent llm langchain crewai graphrag grpo agenticrag

Updated Feb 12, 2026
HTML

langfengQ/verl-agent

verl-agent is an extension of veRL designed for training LLM/VLM agents via RL. It is also the official code for the paper "Group-in-Group Policy Optimization for LLM Agent Training".

reinforcement-learning agent-framework large-language-models llm-training llm-agents deepseek-r1 grpo gigpo

Updated Feb 11, 2026
Python

Tencent-Hunyuan/MixGRPO

MixGRPO: Unlocking Flow-based GRPO Efficiency with Mixed ODE-SDE

reinforcement-learning diffusion grpo

Updated Feb 4, 2026
Python

JudgmentLabs/judgeval

The open source post-building layer for agents. Our environment data and evals power agent post-training (RL, SFT) and monitoring.

agent open-source reinforcement-learning openai rl agents llm prompt-engineering langchain llama-index llm-evaluation langgraph llm-observability agentic-ai grpo

Updated Feb 20, 2026
Python

sail-sg/oat

🌾 OAT: A research-friendly framework for LLM online alignment, including reinforcement learning, preference learning, etc.

thompson-sampling alignment reasoning distributed-training ppo dueling-bandits dpo distributed-rl llm online-rl rlhf llm-aligment online-alignment llm-exploration grpo r1-zero

Updated Jan 29, 2026
Python

turningpoint-ai/VisualThinker-R1-Zero

Explore the Multimodal “Aha Moment” on a 2B Model

reinforcement-learning reasoning r1 post-training multimodal deepseek deepseek-r1 grpo deepseek-r1-zero r1-zero multimodal-journey multimodal-r1

Updated Mar 18, 2025
Python

modelscope/awesome-deep-reasoning

Collect every awesome work about r1!

collection rl reasoning r1 o1 qwen deepseek grpo

Updated May 2, 2025
Python

ucla-mobility/AutoVLA

[NeurIPS 2025] AutoVLA: A Vision-Language-Action Model for End-to-End Autonomous Driving with Adaptive Reasoning and Reinforcement Fine-Tuning

autonomous-driving vision-language-action reinforcement-finetuning grpo

Updated Feb 3, 2026
Python

NVlabs/GDPO

Official implementation of GDPO: Group reward-Decoupled Normalization Policy Optimization for Multi-reward RL Optimization

rl reasoning trl llm agentic-ai grpo verl

Updated Feb 17, 2026
Python

jiangxinke/Agentic-RAG-R1

Agentic RAG R1 Framework via Reinforcement Learning

rl rag agentic grpo

Updated Feb 16, 2026
Python

bowang-lab/BioReason

BioReason: Incentivizing Multimodal Biological Reasoning within a DNA-LLM Model | NeurIPS '25

bioinformatics computational-biology dna reasoning foundation-models large-language-models grpo

Updated Dec 22, 2025
Jupyter Notebook

wendell0218/Awesome-RL-for-Video-Generation

A curated list of papers on reinforcement learning for video generation

reinforcement-learning ppo video-generation dpo reward-model grpo

Updated Feb 19, 2026

zhaochen0110/OpenThinkIMG

OpenThinkIMG is an end-to-end open-source framework that empowers LVLMs to think with images.

reinforcement-learning lvlm grpo vision-tool

Updated Jun 1, 2025
Jupyter Notebook

hustvl/AlphaDrive

Unleashing the Power of VLMs in Autonomous Driving via Reinforcement Learning and Reasoning

reinforcement-learning planning autonomous-driving reasoning vision-language-model grpo

Updated Mar 26, 2025
Python

ZJU-REAL/GUI-G2

[AAAI 2026] GUI-G²: Gaussian Reward Modeling for GUI Grounding

guiagent grpo gaussianreward

Updated Feb 2, 2026
Python
