schatty/oprl


A Modular Library for Off-Policy Reinforcement Learning with a focus on SafeRL and distributed computing

schatty.github.io/oprl


OPRL

A Modular Library for Off-Policy Reinforcement Learning with a focus on SafeRL and distributed computing. Benchmarking results are available on the associated homepage: schatty.github.io/oprl

Disclaimer

The project is under active renovation. For the old code with the D4PG algorithm working with multiprocessing queues and mujoco_py, please refer to the branch d4pg_legacy.

Roadmap 🏗

Switching to mujoco 3.1.1
Replacing multiprocessing queues with RabbitMQ for distributed RL
Baselines with DDPG, TQC for dm_control for 1M steps
Tests
Support for SafetyGymnasium
Style and readability improvements
Baselines with distributed algorithms for dm_control
D4PG logic on top of TQC

Installation

pip install -r requirements.txt
cd src && pip install -e .

To work with SafetyGymnasium, install it manually:

git clone https://github.com/PKU-Alignment/safety-gymnasium
cd safety-gymnasium && pip install -e .

Usage

To run DDPG in a single process

python src/oprl/configs/ddpg.py --env walker-walk

To run distributed DDPG

Run RabbitMQ

docker run -it --rm --name rabbitmq -p 5672:5672 -p 15672:15672 rabbitmq:3.12-management

Run training

python src/oprl/configs/d3pg.py --env walker-walk
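Distributed training depends on the RabbitMQ broker started in the previous step being reachable. The snippet below is a minimal sketch of a pre-flight check, assuming the broker listens on localhost at port 5672 (the AMQP port published by the docker command above). The helper name broker_reachable is hypothetical and not part of oprl. It only confirms that the TCP port accepts connections, not that an AMQP handshake succeeds.

```python
import socket


def broker_reachable(host: str = "localhost", port: int = 5672,
                     timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port opens within `timeout`.

    Hypothetical helper for illustration; host and port defaults are
    assumptions taken from the docker command in this README.
    """
    try:
        # create_connection resolves the host and performs the TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name-resolution failures.
        return False
```

Calling broker_reachable() before launching the d3pg.py config gives an early, readable failure if the container is not running, instead of an error from inside the training processes.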

Tests

cd src && pip install -e .
cd .. && pip install -r tests/functional/requirements.txt
python -m pytest tests

Results

Results for single process DDPG and TQC:

Acknowledgements

DDPG and TD3 code is based on the official TD3 implementation: sfujim/TD3
TQC code is based on the official TQC implementation: SamsungLabs/tqc
SafetyGymnasium: PKU-Alignment/safety-gymnasium
