ProspectivePulse/rl_optimize_warehouse_storage_management (Public)

Optimize the storage of items in a simulated warehouse environment

License

MIT license

Folders and files (27 commits)

- docs
- images
- models
- notebooks
- references
- reports
- src
- .gitignore
- LICENSE
- Makefile
- README.md
- config.yaml
- requirements.txt
- setup.py
- test_environment.py
- tox.ini

DQN Agent for Warehouse Storage Optimization (OpenAI Gym)

Project Overview:
- This project implements a Deep Q-Network (DQN) agent in PyTorch to solve a simulated warehouse environment, with TensorBoard used for training visualization.
- The objective of the algorithm is to optimize the usage of storage space in the simulated warehouse environment.
Features:
- TensorBoard logging (episode length mean, reward mean, exploration rate, frames per second)
- Model saving/loading for inference
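
To make the logged episode metrics concrete, here is a minimal sketch of how rolling means such as episode length mean and episode reward mean can be computed. It is not taken from this repository: the `EpisodeStats` class, its method names, and the window size are assumptions made for illustration only.

```python
from collections import deque


class EpisodeStats:
    """Rolling means over the most recent episodes (hypothetical helper)."""

    def __init__(self, window=100):
        # Only the last `window` episodes contribute to the means (assumed size).
        self.lengths = deque(maxlen=window)
        self.rewards = deque(maxlen=window)

    def record(self, length, total_reward):
        """Store the step count and total reward of one finished episode."""
        self.lengths.append(length)
        self.rewards.append(total_reward)

    def means(self):
        """Return (ep_len_mean, ep_reward_mean), or (None, None) if empty."""
        if not self.lengths:
            return None, None
        n = len(self.lengths)
        return sum(self.lengths) / n, sum(self.rewards) / n


# Example: with a window of 3, only the last three episodes are averaged.
stats = EpisodeStats(window=3)
for length, reward in [(10, 1.0), (20, 2.0), (30, 3.0), (40, 4.0)]:
    stats.record(length, reward)
ep_len_mean, ep_reward_mean = stats.means()
```

Averaging over a window rather than plotting raw per-episode values smooths out the noise of individual episodes, which is why curves of this kind are easier to read for trends such as a plateau.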
Model Performance Graphs (TensorBoard):
Starting from the top left image, here is an interpretation of the graphs displayed:
- rollout/ep_len_mean (Episode Length Mean): Shows the average number of steps per episode. The curve rises steeply, plateaus, and then dips slightly, which suggests that the agent is learning to complete tasks efficiently (or is reaching the terminal conditions sooner). On the whole, this is consistent with policy convergence.
- rollout/ep_reward_mean (Episode Reward Mean): Shows the average reward per episode. The sharp increase followed by a plateau shows learning progress and then performance stabilization. This indicates successful training, as long as the plateau corresponds to the desired behaviour.
- rollout/exploration_rate: Shows the epsilon decay in an epsilon-greedy policy. It drops from ~0.5 to 0.01 early in training, which confirms the shift from exploration to exploitation and suggests that the epsilon decay schedule was well configured.
- time/fps (Frames Per Second): Shows the training speed. It increases and then stabilizes, which indicates good training pipeline performance. Although this is not critical for policy performance, it is helpful when profiling runs.
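
The exploration rate curve described above can be illustrated with a small sketch of an epsilon-greedy policy with a decaying epsilon. The start and end values (0.5 and 0.01) come from the graph description; the linear shape of the decay, the `decay_steps` value, and the function names `linear_epsilon` and `select_action` are assumptions, not the project's confirmed configuration.

```python
import random


def linear_epsilon(step, start=0.5, end=0.01, decay_steps=10_000):
    """Linearly anneal epsilon from `start` to `end` over `decay_steps` steps.

    `decay_steps` and the linear shape are assumptions for illustration.
    """
    fraction = min(max(step, 0) / decay_steps, 1.0)
    return start + fraction * (end - start)


def select_action(q_values, epsilon, rng=random):
    """Epsilon-greedy: random action with probability epsilon, else greedy."""
    if rng.random() < epsilon:
        return rng.randrange(len(q_values))  # explore
    # Exploit: index of the largest Q-value.
    return max(range(len(q_values)), key=q_values.__getitem__)


# Early in training the agent explores often; later it mostly exploits.
early = linear_epsilon(0)
late = linear_epsilon(50_000)
greedy_action = select_action([0.1, 0.9, 0.3], epsilon=0.0)
```

With epsilon at 0.01, roughly one action in a hundred is still random, so the agent keeps a small amount of exploration after the decay finishes.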
Next Steps:
- Set up experience replay
- Confirm the exploration strategy
- Create/Upload the following files:
  - dqn_model.pt
  - any other relevant .ipynb files
- Upload TensorBoard logs
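
As a starting point for the experience replay item above, here is a minimal sketch of a fixed-capacity replay buffer with uniform sampling. It does not reflect code in this repository: the `Transition` fields, the `ReplayBuffer` class, and the capacity used in the example are all hypothetical.

```python
import random
from collections import deque, namedtuple

# One stored interaction with the environment (field names are assumptions).
Transition = namedtuple(
    "Transition", ["state", "action", "reward", "next_state", "done"]
)


class ReplayBuffer:
    """Fixed-size buffer that discards the oldest transitions when full."""

    def __init__(self, capacity):
        self.memory = deque(maxlen=capacity)

    def push(self, state, action, reward, next_state, done):
        """Store one transition, evicting the oldest if at capacity."""
        self.memory.append(Transition(state, action, reward, next_state, done))

    def sample(self, batch_size, rng=random):
        """Draw `batch_size` distinct transitions uniformly at random."""
        return rng.sample(list(self.memory), batch_size)

    def __len__(self):
        return len(self.memory)


# Example: a capacity of 3 keeps only the three most recent transitions.
buffer = ReplayBuffer(capacity=3)
for t in range(5):
    buffer.push(state=t, action=0, reward=1.0, next_state=t + 1, done=False)
batch = buffer.sample(batch_size=2)
```

Sampling uniformly from past transitions breaks the correlation between consecutive steps, which is the usual motivation for experience replay in DQN training.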
