This project focuses on comparing different Reinforcement Learning algorithms, including Monte Carlo, Q-Learning, Q(λ), epsilon-greedy variations, and more.


# kochlisGit/Reinforcement-Learning-Algorithms


This project focuses on comparing different Reinforcement Learning algorithms. I have implemented three custom (OpenAI Gym-like) environments to test the algorithms:

  1. Tic-Tac-Toe (the classical tic-tac-toe game)
  2. Frozen Lake (a custom implementation of the OpenAI Gym Frozen Lake)
  3. Multi-Bandit-Army (exploration and exploitation of the fruit machine with the best winning chance)
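For reference, a Gym-like multi-armed bandit environment can be as small as the sketch below. This is illustrative only: the class name, constructor, and `step` signature here are assumptions modeled on the classic Gym API, not the repository's actual environments.

```python
import random

class MultiBanditArmy:
    """Minimal Gym-like k-armed bandit: each 'fruit machine' pays 1
    with its own hidden win probability, 0 otherwise.
    (Hypothetical sketch; the repo's real environment API may differ.)"""

    def __init__(self, win_probs, seed=None):
        self.win_probs = list(win_probs)
        self.action_space = range(len(win_probs))
        self.rng = random.Random(seed)

    def step(self, action):
        # Bernoulli reward drawn from the chosen machine's win probability.
        reward = 1 if self.rng.random() < self.win_probs[action] else 0
        return None, reward, False, {}   # obs, reward, done, info

env = MultiBanditArmy([0.2, 0.5, 0.8], seed=0)
_, r, _, _ = env.step(2)
```

The algorithms below only need this `step(action) -> reward` interface.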

# Exploration-Exploitation Algorithms

Epsilon Greedy

The agent explores every possible action with a small probability, but most often exploits the best action: https://www.ijntr.org/download_data/IJNTR06090006.pdf
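The rule above fits in a few lines. A minimal sketch (the function name and its signature are my own, not the repository's):

```python
import random

def epsilon_greedy(q_values, epsilon, rng=random):
    """With probability epsilon take a uniformly random action (explore);
    otherwise take the action with the highest estimated value (exploit)."""
    if rng.random() < epsilon:
        return rng.randrange(len(q_values))
    return max(range(len(q_values)), key=lambda a: q_values[a])
```

With `epsilon = 0.1`, roughly one action in ten is random; with `epsilon = 0` the agent is purely greedy.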

Decaying Epsilon Greedy

The agent starts by exploring every possible action with a very high initial probability; that probability then decays over time. This is an improvement over the plain Epsilon-Greedy algorithm: http://tokic.com/www/tokicm/publikationen/papers/AdaptiveEpsilonGreedyExploration.pdf
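One common way to realize this is an exponential schedule with a floor, as sketched below; the particular constants (start value, floor, decay rate) are illustrative, not the repository's settings:

```python
def decayed_epsilon(step, eps_start=1.0, eps_min=0.01, decay=0.995):
    """Exponentially decay the exploration probability over time,
    never letting it drop below eps_min.
    (Schedule parameters are illustrative assumptions.)"""
    return max(eps_min, eps_start * decay ** step)
```

Early on the agent explores almost every step; after many steps it explores only `eps_min` of the time.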

Initial Optimistic Values

Initially, every action's value estimate is set optimistically high, so all actions look like the best one. The agent always greedily exploits the best-looking action, which forces it to try each action until its estimate settles; over time, the truly best action ends up with the highest mean reward: https://ieeexplore.ieee.org/document/8167915
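A sketch of the idea on a bandit, assuming incremental sample-mean updates (function name and the `q_init=5.0` choice are mine, for illustration):

```python
def optimistic_bandit(n_arms, rewards_fn, steps, q_init=5.0):
    """Purely greedy selection with optimistic initial estimates:
    every arm starts inflated at q_init, so each gets pulled until its
    sample mean falls to a realistic level, after which the agent
    settles on the best arm."""
    q = [q_init] * n_arms
    counts = [0] * n_arms
    for _ in range(steps):
        a = max(range(n_arms), key=lambda i: q[i])   # always greedy
        r = rewards_fn(a)
        counts[a] += 1
        q[a] += (r - q[a]) / counts[a]               # incremental mean
    return q, counts
```

Note there is no epsilon at all: the optimism itself drives the exploration.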

Upper Confidence Bound (UCB)

This tutorial explains very well how this algorithm works and why it is superior to epsilon greedy: https://www.geeksforgeeks.org/upper-confidence-bound-algorithm-in-reinforcement-learning/
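In short, UCB1 picks the arm maximizing the sample mean plus an uncertainty bonus that shrinks as the arm is pulled more. A minimal sketch (function name and the exploration constant `c` are my own choices):

```python
import math

def ucb_action(q_values, counts, t, c=2.0):
    """UCB1 selection: sample mean plus an exploration bonus
    c * sqrt(ln(t) / n_a) that shrinks as arm a is pulled more often."""
    for a, n in enumerate(counts):
        if n == 0:            # pull every arm once before using the bound
            return a
    return max(range(len(q_values)),
               key=lambda a: q_values[a] + c * math.sqrt(math.log(t) / counts[a]))
```

Unlike epsilon greedy, exploration here is directed: rarely tried arms get a large bonus instead of being picked uniformly at random.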

Thompson Sampling (or Bayesian Sampling)

Probably one of the best exploration-exploitation algorithms. However, it is used less often because it is harder to implement: https://proceedings.neurips.cc/paper/2011/file/e53a0a2978c28872a4505bdb51db06dc-Paper.pdf
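For Bernoulli rewards, though, the core of Thompson Sampling is short: keep a Beta posterior per arm, draw one sample from each, and play the largest draw. A sketch (the function name and Beta(1, 1) prior are my assumptions):

```python
import random

def thompson_action(successes, failures, rng=random):
    """Bernoulli Thompson Sampling: draw one sample from each arm's
    Beta(s + 1, f + 1) posterior (uniform prior) and play the arm
    with the largest draw."""
    samples = [rng.betavariate(s + 1, f + 1)
               for s, f in zip(successes, failures)]
    return max(range(len(samples)), key=lambda a: samples[a])
```

After each pull, increment the chosen arm's success or failure count; arms with uncertain posteriors keep getting sampled until the evidence settles.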
