
OpenR: An Open Source Framework for Advanced Reasoning with Large Language Models

Paper · Tutorial · Code · Docs · Data · Model · Issue · Demo

[English] [中文]



Table of Contents 📖
  1. News and Updates
  2. Features
  3. TODO
  4. Benchmark
  5. Plots
  6. Datasets and Models
  7. Getting Started
  8. Usage
  9. Join Us
  10. Contact
  11. Response Examples
  12. Community
  13. Reference

News and Updates

  • [29/11/2024] We have now added a demo page on ModelScope. Many thanks to @wangxingjun778!
  • [24/10/2024] OpenR now supports MCTS reasoning (#24)! 🌲
  • [15/10/2024] Our report is on arXiv!
  • [12/10/2024] OpenR has been released! 🚀

Features

✅ Process-supervision Data Generation
  - OmegaPRM: Improve Mathematical Reasoning in Language Models by Automated Process Supervision

✅ Online Policy Training
  - RL Training: APPO, GRPO, TPPO

✅ Generative and Discriminative PRM Training
  - PRM Training: Supervised Training for PRMs
  - Generative RM Training: Direct GenRM

✅ Multiple Search Strategies (an illustrative Best-of-N sketch follows this list)
  - Greedy Search
  - Best-of-N
  - Beam Search
  - MCTS
  - rStar: Mutual Reasoning Makes Smaller LLMs Stronger Problem-Solvers
  - Critic-MCTS: Under Review

✅ Test-time Computation and Scaling Law
  - TBA, see Benchmark
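
To make the search strategies above concrete, here is a minimal, framework-agnostic sketch of Best-of-N reranking guided by a process reward model: sample N candidate solutions from the policy, score each solution step by step with the PRM, aggregate the step scores, and keep the best candidate. The generate_candidates and score_steps callables are placeholders for illustration, not OpenR APIs.

from typing import Callable, List

def best_of_n(
    question: str,
    generate_candidates: Callable[[str, int], List[List[str]]],  # policy: returns n solutions, each a list of steps
    score_steps: Callable[[str, List[str]], List[float]],        # PRM: returns one score per step
    n: int = 8,
) -> List[str]:
    """Return the candidate whose aggregated step score is highest."""
    best, best_score = None, float("-inf")
    for steps in generate_candidates(question, n):
        step_scores = score_steps(question, steps)
        aggregated = min(step_scores)  # min over steps is one common PRM aggregation; last-step or mean also work
        if aggregated > best_score:
            best, best_score = steps, aggregated
    return best

Greedy search, beam search, and MCTS differ mainly in when the reward model is consulted: Best-of-N scores complete solutions, while beam search and MCTS use step-level scores to prune or expand partial solutions during generation.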

TODO

High-priority items (we value your contributions!):

👨‍💻 Data
  - Re-implement Journey Learning

👨‍💻 RL Training
  - Distributed Training
  - Reinforcement Fine-Tuning (RFT) #80

👨‍💻 PRM
  - Larger-scale training
  - GenRM-CoT implementation
  - Soft-label training #57

👨‍💻 Reasoning
  - Optimize code structure #53
  - More reasoning tasks (AIME, etc.) #53
  - Multi-modal reasoning #82
  - Reasoning in code generation #68
  - Dots #75
  - Consistency check
  - Benchmarking

Benchmark

See Benchmark!

Plots

PRM Results · Inference Results

Provided Datasets and Models

MATH-APS (Our Dataset)

MATH-psa (Our Process Reward Model)

Getting Started

Installation

conda create -n open_reasoner python=3.10
conda activate open_reasoner
pip install -r requirements.txt
pip3 install "fschat[model_worker,webui]"
pip install -U pydantic
cd envs/MATH/latex2sympy
pip install -e .
cd -

Download Base Models

Before running the project, please ensure that all required base models are downloaded. The models used in this project include:

  • Qwen2.5-Math-1.5B-Instruct, Qwen2.5-Math-7B-Instruct
  • peiyi9979/mistral-7b-sft
  • peiyi9979/math-shepherd-mistral-7b-prm

To download these models, please refer to the Hugging Face model downloading tutorial for step-by-step guidance on downloading models from the Hugging Face Hub.

Please make sure that all models are saved in their directories according to the project setup before proceeding.
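
For example, one way to fetch a model programmatically is with the huggingface_hub library; the snippet below is a generic sketch (the repo ID and local directory are examples, so point them at the models and paths your setup expects):

from huggingface_hub import snapshot_download

# Download one of the required models into a local directory.
# Repeat for each base model listed above, adjusting repo_id and local_dir.
snapshot_download(
    repo_id="peiyi9979/math-shepherd-mistral-7b-prm",
    local_dir="models/math-shepherd-mistral-7b-prm",
)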

Quickstart

Before running inference, please modify the following variables in the scripts under reason/llm_service/ to set the appropriate base models for your usage:

  • $MODEL_BASE: Set this to the directory where your models are stored.
  • $POLICY_MODEL_NAME: Set this to the name of the policy model you wish to use.
  • $VALUE_MODEL_NAME: Set this to the name of the value model you wish to use.
  • $NUM_LM_WORKER: Set this to the number of language model (LM) workers to start.
  • $NUM_RM_WORKER: Set this to the number of reward model (RM) workers to start.

With these set, you can prepare the services and run inference using the different techniques described below.
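
As a concrete illustration, the snippet below shows plausible values for these variables and checks that the corresponding model directories exist. It assumes the worker scripts resolve model paths as $MODEL_BASE/<model name>; confirm this against the actual scripts before relying on it.

import os

# Illustrative values only; substitute your own storage path and model names.
MODEL_BASE = os.path.expanduser("~/models")         # $MODEL_BASE
POLICY_MODEL_NAME = "Qwen2.5-Math-1.5B-Instruct"    # $POLICY_MODEL_NAME
VALUE_MODEL_NAME = "math-shepherd-mistral-7b-prm"   # $VALUE_MODEL_NAME

for name in (POLICY_MODEL_NAME, VALUE_MODEL_NAME):
    path = os.path.join(MODEL_BASE, name)
    print(f"{path}: {'found' if os.path.isdir(path) else 'missing'}")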

Start LM & RM Services

For example, to start the LM and RM services for the Math Shepherd model, run the following command:

sh reason/llm_service/create_service_math_shepherd.sh

To kill the server processes, we recommend using the following command:

tmux kill-session -t {Your Session Name}  # default session name is `FastChat`

Usage

Run Inference

⚠️ Make sure the inputs (--LM, --RM) in the script align with the variables ($POLICY_MODEL_NAME, $VALUE_MODEL_NAME) used by the launched workers!

export PYTHONPATH=$(pwd)

sh scripts/eval/cot_greedy.sh
# Method: cot. Average result: ({'majority_vote': 0.734, 'total_completion_tokens': 559.13},)

sh scripts/eval/cot_rerank.sh
# Method: best_of_n. Average result: ({'majority_vote': 0.782,
#                                       'prm_min_max': 0.772,
#                                       'prm_min_vote': 0.792,
#                                       'prm_last_max': 0.776,
#                                       'prm_last_vote': 0.792,
#                                       'total_completion_tokens': 4431.268},)

sh scripts/eval/beam_search.sh
# Method: beam_search. Average result: ({'majority_vote': 0.74, 'total_completion_tokens': 2350.492},)

sh scripts/eval/vanila_mcts.sh

Run Training

⚠️ Before training, please modify $dataset_path, $model_name_or_path, and $prm_name_or_path in train/mat/scripts/train_llm.sh.

cd train/mat/scripts
bash train_llm.sh

Run PRM Learning

cd prm/code

# single gpu
python finetune_qwen_single_gpu.py --model_path $YOUR_MODEL_PATH \
                                   --train_data_path $TRAIN_DATA_PATH \
                                   --test_data_path $TEST_DATA_PATH

# multi gpu
torchrun --nproc_per_node=2 finetune_qwen.py --model_path $YOUR_MODEL_PATH \
                                             --data_path $YOUR_DATA_FOLDER_PATH \
                                             --datasets both
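
For intuition about what supervised PRM training does, the toy sketch below treats each reasoning step as a binary classification target (correct vs. incorrect) and optimizes a step-level cross-entropy loss. It is a self-contained PyTorch illustration with random features, not the repo's finetune_qwen.py; in practice the step representations come from the language model being fine-tuned.

import torch
from torch import nn

class TinyPRMHead(nn.Module):
    """Toy process-reward head: one correctness logit per reasoning step."""
    def __init__(self, hidden_size: int = 64):
        super().__init__()
        self.scorer = nn.Linear(hidden_size, 1)

    def forward(self, step_features: torch.Tensor) -> torch.Tensor:
        # step_features: (num_steps, hidden_size) -> (num_steps,) logits
        return self.scorer(step_features).squeeze(-1)

def train_step(head, optimizer, step_features, step_labels):
    """One supervised update: binary cross-entropy on per-step correctness labels."""
    logits = head(step_features)
    loss = nn.functional.binary_cross_entropy_with_logits(logits, step_labels)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()

if __name__ == "__main__":
    torch.manual_seed(0)
    head = TinyPRMHead()
    optimizer = torch.optim.AdamW(head.parameters(), lr=1e-3)
    # Fake example: 5 steps, labeled 1 until the first incorrect step, then 0.
    features = torch.randn(5, 64)
    labels = torch.tensor([1.0, 1.0, 1.0, 0.0, 0.0])
    for _ in range(3):
        print("loss:", train_step(head, optimizer, features, labels))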

Join Us

Every contribution is valuable to the community.

Thank you for your interest in OpenR! 🥰 We are deeply committed to the open-source community, and we welcome contributions from everyone. Your efforts, whether big or small, help us grow and improve. Contributions aren't limited to code: answering questions, helping others, enhancing our documentation, and sharing the project are equally impactful.

Feel free to check out the contribution guidance!

Future Plan

  • Add More Comprehensive Evaluations on RL Training and Search Strategies

  • Scaling the Prover-Verifier Model Size

  • Support Self-improvement Training

Contact

The OpenR community is maintained by:

License

OpenR is released under the MIT License.

Citation

If you find our resources helpful, please cite our paper:

@misc{wang2024tutorial,
  author = {Jun Wang},
  title = {A Tutorial on LLM Reasoning: Relevant Methods Behind ChatGPT o1},
  year = {2024},
  url = {https://github.com/openreasoner/openr/blob/main/reports/tutorial.pdf},
  note = {Available on GitHub}
}

@article{wang2024openr,
  title={OpenR: An Open Source Framework for Advanced Reasoning with Large Language Models},
  author={Wang, Jun and Fang, Meng and Wan, Ziyu and Wen, Muning and Zhu, Jiachen and Liu, Anjie and Gong, Ziqin and Song, Yan and Chen, Lei and Ni, Lionel M and others},
  journal={arXiv preprint arXiv:2410.09671},
  year={2024}
}

Response Examples

Comparing PRMs: Math-psa (Ours) vs. Math-Shepherd

QA 1 · QA 2

Justifying RL Training

QA 3 · QA 4

Exploring Test-time Computation

QA 5 · QA 6 · QA 7

Community

WeChat:

Reference

Inference-time Computing

[1] Alphazero-like tree-search can guide large language model decoding and training.

[2] Reasoning with language model is planning with world model.

[3] Scaling LLM test-time compute optimally can be more effective than scaling model parameters.

[4] Think before you speak: Training language models with pause tokens.

From Outcome Supervision to Process Supervision

[1] Training verifiers to solve math word problems.

[2] Solving math word problems with process- and outcome-based feedback.

[3] Let's verify step by step.

[4] Making large language models better reasoners with step-aware verifier.

[5] OVM: Outcome-supervised value models for planning in mathematical reasoning.

[6] Generative verifiers: Reward modeling as next-token prediction.

Data Acquisition

[1] STaR: Bootstrapping reasoning with reasoning.

[2] Quiet-STaR: Language models can teach themselves to think before speaking.

[3] Improve mathematical reasoning in language models by automated process supervision.

[4] Shepherd: A critic for language model generation.

[5] Math-Shepherd: Verify and reinforce LLMs step-by-step without human annotations.
