# FlowReasoner

This paper proposes a query-level meta-agent named FlowReasoner to automate the design of query-level multi-agent systems, i.e., one system per user query. Our core idea is to incentivize a reasoning-based meta-agent via external execution feedback. Concretely, by distilling DeepSeek R1, we first endow FlowReasoner with basic reasoning ability for generating multi-agent systems. We then further enhance it via reinforcement learning (RL) with external execution feedback. A multi-purpose reward is designed to guide the RL training along the aspects of performance, complexity, and efficiency. In this manner, FlowReasoner can generate a personalized multi-agent system for each user query via deliberative reasoning. Experiments on both engineering and competition code benchmarks demonstrate the superiority of FlowReasoner. Remarkably, it surpasses o1-mini by 10.52% accuracy across three benchmarks.
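As a rough illustration of the multi-purpose reward mentioned above, the sketch below combines the three aspects (performance, complexity, efficiency) as a weighted sum. The weights, argument names, and linear form are illustrative assumptions, not the reward actually used in the paper.

```python
# Illustrative sketch of a multi-purpose RL reward. All weights and the
# linear combination are assumptions for illustration only.

def multi_purpose_reward(pass_rate: float, num_agents: int, runtime_s: float,
                         w_perf: float = 1.0, w_comp: float = 0.1,
                         w_eff: float = 0.05) -> float:
    """Reward higher pass rates; penalize larger and slower systems."""
    performance = w_perf * pass_rate          # fraction of test cases passed
    complexity_penalty = w_comp * num_agents  # discourage needlessly large systems
    efficiency_penalty = w_eff * runtime_s    # discourage slow workflows
    return performance - complexity_penalty - efficiency_penalty

# Example: 80% pass rate, 3 agents, 12 s runtime.
r = multi_purpose_reward(0.8, 3, 12.0)
```

Under this toy form, a system that passes the same test cases with fewer agents or a shorter runtime receives a strictly higher reward.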

## Installation

We follow MetaGPT to install the required dependencies. Please run the following commands:

```shell
git clone https://github.com/sail-sg/FlowReasoner
cd code
pip install --upgrade -e .
```

All experiments are conducted on NVIDIA A100 GPUs with 80GB of memory.

## Configure optimization parameters

Configure LLM parameters in `config/config2.yaml` (see `examples/FlowReasoner/config2.example.yaml` for reference):

```yaml
models:
  "<model_name>":
    # model: "gpt-4-turbo"  # or gpt-3.5-turbo
    api_type: "openai"  # or azure / ollama / groq etc.
    base_url: "<your base url>"
    api_key: "<your api key>"
    temperature: 0
  "<model_name>":
    api_type: "openai"
    base_url: "<your base url>"
    api_key: "<your api key>"
    temperature: 0
CALC_USAGE: True
```

## Run the inference

Using default parameters:

```shell
python -m examples.FlowReasoner.optimize --dataset MATH
```

Or with custom parameters:

```shell
python -m examples.FlowReasoner.optimize --dataset MATH --sample n --optimized_path xxx ...
```

Note that the test cases of each dataset should be split into two parts, keyed `val` and `test` respectively. The `val` test cases are used as external execution feedback to optimize the workflow.
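One way to produce such a split is sketched below. The `val`/`test` keys follow this README, while the split ratio, seed, and function name are illustrative assumptions rather than the repository's actual preprocessing.

```python
import random

def split_cases(cases, val_ratio=0.2, seed=0):
    """Split a dataset's test cases into 'val' (used as external execution
    feedback during workflow optimization) and 'test' (held out for final
    evaluation). Ratio and seed are illustrative assumptions."""
    cases = list(cases)
    random.Random(seed).shuffle(cases)  # deterministic shuffle for reproducibility
    k = max(1, int(len(cases) * val_ratio))
    return {"val": cases[:k], "test": cases[k:]}

# Example: 10 cases -> 2 for validation feedback, 8 held out.
split = split_cases(range(10))
```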

## Training Stage

The SFT dataset is generated by the inference stage. SFT is conducted with the standard training process of LLaMA-Factory, while RL is based on EasyRL.
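The inference-to-SFT handoff could be sketched as below. The record schema (`instruction`/`input`/`output`) and the `<think>` wrapper are assumptions about the data format, not the repository's actual exporter.

```python
import json

def build_sft_record(query: str, reasoning: str, workflow_code: str) -> dict:
    """Pack one inference-stage trace into an Alpaca-style SFT record.
    The exact schema expected by LLaMA-Factory here is an assumption."""
    return {
        "instruction": query,
        "input": "",
        "output": f"<think>{reasoning}</think>\n{workflow_code}",
    }

# Hypothetical trace: a user query plus the meta-agent's reasoning and output.
records = [build_sft_record("Sort a list of integers.",
                            "A builtin suffices here.",
                            "sorted(xs)")]
sft_json = json.dumps(records, indent=2)
```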

## Acknowledgments

This repository is based on the codebases of MetaGPT, LLaMA-Factory, and EasyRL. Thanks for their impressive work!

## Citation

If you find our work helpful, please cite it as:

```bibtex
@misc{gao2025flowreasonerreinforcingquerylevelmetaagents,
      title={FlowReasoner: Reinforcing Query-Level Meta-Agents},
      author={Hongcheng Gao and Yue Liu and Yufei He and Longxu Dou and Chao Du and Zhijie Deng and Bryan Hooi and Min Lin and Tianyu Pang},
      year={2025},
      eprint={2504.15257},
      archivePrefix={arXiv},
      primaryClass={cs.AI},
      url={https://arxiv.org/abs/2504.15257},
}
```
