This project focuses on comparing different Reinforcement Learning algorithms, including Monte Carlo, Q-Learning, Q(λ), epsilon-greedy variations, and more.


# kochlisGit/Reinforcement-Learning-Algorithms


This project focuses on comparing different Reinforcement Learning algorithms. I have implemented three custom (OpenAI Gym-like) environments to test the algorithms:

  1. Tic-Tac-Toe (the classical tic-tac-toe game)
  2. Frozen Lake (a custom implementation of the OpenAI Gym Frozen Lake)
  3. Multi-Bandit-Army (exploration and exploitation of the fruit machine with the best winning chance)
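For reference, a Gym-like multi-armed bandit environment can be as small as the sketch below. This is illustrative only: the class name, constructor, and `step` signature here are assumptions modeled on the classic Gym API, not the repository's actual environments.

```python
import random

class MultiBanditArmy:
    """Minimal Gym-like k-armed bandit: each 'fruit machine' pays 1
    with its own hidden win probability, 0 otherwise.
    (Hypothetical sketch; the repo's real environment API may differ.)"""

    def __init__(self, win_probs, seed=None):
        self.win_probs = list(win_probs)
        self.action_space = range(len(win_probs))
        self.rng = random.Random(seed)

    def step(self, action):
        # Bernoulli reward drawn from the chosen machine's win probability.
        reward = 1 if self.rng.random() < self.win_probs[action] else 0
        return None, reward, False, {}   # obs, reward, done, info

env = MultiBanditArmy([0.2, 0.5, 0.8], seed=0)
_, r, _, _ = env.step(2)
```

The algorithms below only need this `step(action) -> reward` interface.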

# Exploration-Exploitation Algorithms

Epsilon Greedy

The agent explores every possible action with a small probability, but most often exploits the best action: https://www.ijntr.org/download_data/IJNTR06090006.pdf
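The rule above fits in a few lines. A minimal sketch (the function name and its signature are my own, not the repository's):

```python
import random

def epsilon_greedy(q_values, epsilon, rng=random):
    """With probability epsilon take a uniformly random action (explore);
    otherwise take the action with the highest estimated value (exploit)."""
    if rng.random() < epsilon:
        return rng.randrange(len(q_values))
    return max(range(len(q_values)), key=lambda a: q_values[a])
```

With `epsilon = 0.1`, roughly one action in ten is random; with `epsilon = 0` the agent is purely greedy.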

Decaying Epsilon Greedy

The agent starts by exploring every possible action with a very high initial probability; that probability then decays over time. This is an improvement over the plain Epsilon-Greedy algorithm: http://tokic.com/www/tokicm/publikationen/papers/AdaptiveEpsilonGreedyExploration.pdf
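One common way to realize this is an exponential schedule with a floor, as sketched below; the particular constants (start value, floor, decay rate) are illustrative, not the repository's settings:

```python
def decayed_epsilon(step, eps_start=1.0, eps_min=0.01, decay=0.995):
    """Exponentially decay the exploration probability over time,
    never letting it drop below eps_min.
    (Schedule parameters are illustrative assumptions.)"""
    return max(eps_min, eps_start * decay ** step)
```

Early on the agent explores almost every step; after many steps it explores only `eps_min` of the time.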

Initial Optimistic Values

Initially, every action's value estimate is set optimistically high, so all actions look like the best one. The agent always greedily exploits the best-looking action, which forces it to try each action until its estimate settles; over time, the truly best action ends up with the highest mean reward: https://ieeexplore.ieee.org/document/8167915
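A sketch of the idea on a bandit, assuming incremental sample-mean updates (function name and the `q_init=5.0` choice are mine, for illustration):

```python
def optimistic_bandit(n_arms, rewards_fn, steps, q_init=5.0):
    """Purely greedy selection with optimistic initial estimates:
    every arm starts inflated at q_init, so each gets pulled until its
    sample mean falls to a realistic level, after which the agent
    settles on the best arm."""
    q = [q_init] * n_arms
    counts = [0] * n_arms
    for _ in range(steps):
        a = max(range(n_arms), key=lambda i: q[i])   # always greedy
        r = rewards_fn(a)
        counts[a] += 1
        q[a] += (r - q[a]) / counts[a]               # incremental mean
    return q, counts
```

Note there is no epsilon at all: the optimism itself drives the exploration.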

Upper Confidence Bound (UCB)

This tutorial explains very well how this algorithm works and why it is superior to epsilon greedy: https://www.geeksforgeeks.org/upper-confidence-bound-algorithm-in-reinforcement-learning/
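In short, UCB1 picks the arm maximizing the sample mean plus an uncertainty bonus that shrinks as the arm is pulled more. A minimal sketch (function name and the exploration constant `c` are my own choices):

```python
import math

def ucb_action(q_values, counts, t, c=2.0):
    """UCB1 selection: sample mean plus an exploration bonus
    c * sqrt(ln(t) / n_a) that shrinks as arm a is pulled more often."""
    for a, n in enumerate(counts):
        if n == 0:            # pull every arm once before using the bound
            return a
    return max(range(len(q_values)),
               key=lambda a: q_values[a] + c * math.sqrt(math.log(t) / counts[a]))
```

Unlike epsilon greedy, exploration here is directed: rarely tried arms get a large bonus instead of being picked uniformly at random.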

Thompson Sampling (or Bayesian Sampling)

Probably one of the best exploration-exploitation algorithms. However, it is used less often because it is harder to implement: https://proceedings.neurips.cc/paper/2011/file/e53a0a2978c28872a4505bdb51db06dc-Paper.pdf
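For Bernoulli rewards, though, the core of Thompson Sampling is short: keep a Beta posterior per arm, draw one sample from each, and play the largest draw. A sketch (the function name and Beta(1, 1) prior are my assumptions):

```python
import random

def thompson_action(successes, failures, rng=random):
    """Bernoulli Thompson Sampling: draw one sample from each arm's
    Beta(s + 1, f + 1) posterior (uniform prior) and play the arm
    with the largest draw."""
    samples = [rng.betavariate(s + 1, f + 1)
               for s, f in zip(successes, failures)]
    return max(range(len(samples)), key=lambda a: samples[a])
```

After each pull, increment the chosen arm's success or failure count; arms with uncertain posteriors keep getting sampled until the evidence settles.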
