
OAT


Installation | Usage | Examples | Benchmarking | Citation


Updates

  • 26/01/2025: We support reinforcement learning with verifiable rewards (RLVR) for math reasoning.

Introduction

Oat 🌾 is a simple yet efficient framework for running online LLM alignment algorithms. Its key features include:

  • High Efficiency: Oat implements a distributed Actor-Learner-Oracle architecture, with each component optimized using state-of-the-art tools:
    • Actor: Utilizes vLLM for accelerated online response sampling.
    • Learner: Leverages DeepSpeed ZeRO strategies to enhance memory efficiency.
    • Oracle: A model-based oracle served by Mosec as a remote service, supporting dynamic batching, data parallelism and pipeline parallelism.
  • Simplified Workflow: Oat simplifies the experimental pipeline of LLM alignment. With an Oracle served online, we can flexibly query it for preference data labeling as well as anytime model evaluation. All you need is to launch experiments and monitor real-time learning curves (e.g., win rate) on wandb (see the reproduced results); there is no need for manual training, checkpointing and loading for evaluation.
  • Oracle Simulation: Oat provides a diverse set of oracles to simulate preference/reward/verification feedback.
    • Verifiable rewards are supported using rule-based functions.
    • Lightweight reward models run within the actor's process, enabling quick testing on as few as two GPUs.
    • Larger and more capable reward models can be served remotely, harnessing additional compute and memory resources.
    • LLM-as-a-judge is supported via querying the OpenAI API for model-based pairwise ranking.
  • Ease of Use: Oat's modular structure allows researchers to easily inherit and modify existing classes, enabling rapid prototyping and experimentation with new algorithms.
  • Cutting-Edge Algorithms: Oat implements state-of-the-art online algorithms, fostering innovation and fair benchmarking.
    • PPO (online RL) for math reasoning.
    • Online DPO/SimPO/IPO for online preference learning (a minimal sketch of the DPO objective follows this list).
    • Online exploration (active alignment) algorithms, including SEA, APL and XPO.
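
To make the preference-learning objective concrete, below is a minimal, self-contained sketch of the DPO loss on a batch of preference pairs. It follows the standard DPO formulation; oat's actual learner classes and method signatures may differ, so treat this as an illustration rather than oat's API.

import torch
import torch.nn.functional as F

def dpo_loss(policy_chosen_logps, policy_rejected_logps,
             ref_chosen_logps, ref_rejected_logps, beta=0.1):
    # DPO objective: -log sigmoid(beta * (policy margin - reference margin)),
    # where each margin is log p(chosen) - log p(rejected) summed over tokens.
    pi_margin = policy_chosen_logps - policy_rejected_logps
    ref_margin = ref_chosen_logps - ref_rejected_logps
    return -F.logsigmoid(beta * (pi_margin - ref_margin)).mean()

Online variants recompute these log-probabilities on freshly sampled, freshly labeled pairs at each iteration, rather than on a fixed offline dataset.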

Installation

In a Python environment with a supported version (>=3.8, <=3.10), you can install oat via PyPI:

pip install vllm==0.6.2 && pip install oat-llm

Or you can install it in "editable" mode for local development:

git clone git@github.com:sail-sg/oat.git
cd oat
pip install vllm==0.6.2 && pip install -e .
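
To verify the installation, here is a quick check using only the Python standard library (the PyPI distribution is named oat-llm; the top-level import name is assumed here to be oat):

import importlib.metadata

import oat  # import name assumed; adjust if the package exposes a different module
print(importlib.metadata.version("oat-llm"))  # prints the installed version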

Usage
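
The repository's examples walk through concrete experiment launches. As a compact orientation, the sketch below collapses the distributed Actor-Learner-Oracle loop into a single toy process. Every class here is a stand-in written for the description above (oat's real actors use vLLM, its oracles include rule-based verifiers, reward models and LLM judges, and its learners run on DeepSpeed); none of these names are oat's actual API.

import random
from typing import List, Tuple

class ToyActor:
    """Stand-in for the vLLM-backed actor: samples candidate responses."""
    def sample(self, prompt: str, n: int = 2) -> List[str]:
        return [f"{prompt} :: candidate {random.random():.3f}" for _ in range(n)]

class ToyOracle:
    """Stand-in for a rule-based oracle: returns the index of the preferred response."""
    def preference(self, a: str, b: str) -> int:
        return 0 if a >= b else 1  # arbitrary toy rule for illustration

class ToyLearner:
    """Stand-in for the DeepSpeed-backed learner: consumes preference pairs."""
    def __init__(self) -> None:
        self.pairs: List[Tuple[str, str]] = []
    def step(self, chosen: str, rejected: str) -> None:
        # A real learner would run a DPO/PPO update here.
        self.pairs.append((chosen, rejected))

actor, oracle, learner = ToyActor(), ToyOracle(), ToyLearner()
for prompt in ["2 + 2 = ?", "Name the capital of France."]:
    a, b = actor.sample(prompt)
    winner = oracle.preference(a, b)
    chosen, rejected = (a, b) if winner == 0 else (b, a)
    learner.step(chosen, rejected)
print(f"Collected {len(learner.pairs)} preference pairs.")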

Benchmarking

The benchmarking compares oat with the online DPO implementation from huggingface/trl. Below, we outline the configurations used for oat and present the benchmarking results. Notably, oat 🌾 achieves up to 2.5x computational efficiency compared to trl 🤗.

Please refer to Appendix C of our paper for a detailed discussion of the benchmarking methods and results.

Citation

If you find this codebase useful for your research, please consider citing

@misc{liu2025oat,
  author       = {Zichen Liu and Changyu Chen and Chao Du and Wee Sun Lee and Min Lin},
  title        = {OAT: A research-friendly framework for LLM online alignment},
  howpublished = {https://github.com/sail-sg/oat},
  year         = {2025}
}

@article{liu2024sea,
  title   = {Sample-Efficient Alignment for LLMs},
  author  = {Zichen Liu and Changyu Chen and Chao Du and Wee Sun Lee and Min Lin},
  journal = {arXiv preprint arXiv:2411.01493},
  year    = {2024}
}

License

oat is distributed under the terms of the Apache-2.0 license.

Acknowledgement

We thank the awesome projects that have contributed to the development of oat, including vLLM, DeepSpeed and Mosec.

Disclaimer

This is not an official Sea Limited or Garena Online Private Limited product.
