wumo/Reinforcement-Learning-An-Introduction
Kotlin implementation of algorithms, examples, and exercises from Sutton and Barto: Reinforcement Learning (2nd Edition). The purpose of this project is to make RL algorithms easier to understand and to experiment with.
Inspired by ShangtongZhang/reinforcement-learning-an-introduction (Python) and idsc-frazzoli/subare (Java 8).
Features:
- Algorithms and problems are separated, so you can experiment with various combinations of <algorithm, problem> or <algorithm, function approximator, problem> (a rough sketch of this idea follows below).
- The implementation stays very close to the pseudocode in the book, so reading the source code will help you understand the original algorithms.
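For illustration only, here is a sketch of how such a separation might look. The names `MDP`, `step`, and `solve` are hypothetical and do not refer to this repository's actual API:

```kotlin
// Hypothetical sketch only: the repository's real types and function names may differ.
// The point is that a problem (an MDP) is defined independently of the algorithm
// that solves it, so any <algorithm, problem> pair can be combined.
interface MDP<S, A> {
    val states: List<S>
    fun actions(state: S): List<A>
    fun step(state: S, action: A): Pair<S, Double>   // next state and reward
}

// An "algorithm" is just a function from a problem to a value estimate,
// so experiments can swap either side freely.
fun <S, A> solve(problem: MDP<S, A>, algorithm: (MDP<S, A>) -> Map<S, Double>): Map<S, Double> =
    algorithm(problem)
```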
Model-based (Dynamic Programming):
- Policy Iteration (Action-Value Iteration) (p.65)
- Value Iteration (p.67)
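To give the flavour of these algorithms, here is a minimal tabular Value Iteration sketch (not this repository's code), assuming a finite deterministic MDP given as successor and reward functions:

```kotlin
import kotlin.math.abs

// Minimal tabular Value Iteration sketch (not the repository's implementation).
fun valueIteration(
    numStates: Int,
    numActions: Int,
    next: (s: Int, a: Int) -> Int,       // deterministic successor state (assumed interface)
    reward: (s: Int, a: Int) -> Double,  // immediate reward (assumed interface)
    gamma: Double = 0.9,
    theta: Double = 1e-6
): DoubleArray {
    val v = DoubleArray(numStates)
    while (true) {
        var delta = 0.0
        for (s in 0 until numStates) {
            val old = v[s]
            // Bellman optimality backup: V(s) <- max_a [ r(s,a) + γ V(s') ]
            v[s] = (0 until numActions).maxOf { a -> reward(s, a) + gamma * v[next(s, a)] }
            delta = maxOf(delta, abs(v[s] - old))
        }
        if (delta < theta) return v      // stop when the largest update is below θ
    }
}
```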
Monte Carlo (episode backup):
- First-visit MC prediction (p.76)
- Monte Carlo Exploring Starts (p.81)
- On-Policy first-visit MC control (p.83)
- Off-policy MC prediction (p.90)
- Off-policy MC control (p.91)
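A minimal sketch of first-visit MC prediction (not this repository's code), assuming an episode is represented as a list of (state, reward) pairs:

```kotlin
// First-visit Monte Carlo prediction sketch (not the repository's code).
// An episode is a list of (state, reward) pairs, where the reward is the one
// received on leaving that state.
fun firstVisitMcPrediction(
    episodes: List<List<Pair<Int, Double>>>,
    gamma: Double = 1.0
): Map<Int, Double> {
    val returnsSum = mutableMapOf<Int, Double>()
    val returnsCount = mutableMapOf<Int, Int>()
    for (episode in episodes) {
        // Index of the first visit to each state in this episode.
        val firstVisit = mutableMapOf<Int, Int>()
        episode.forEachIndexed { t, (s, _) -> if (s !in firstVisit) firstVisit[s] = t }
        // Walk backwards, accumulating the return G_t.
        var g = 0.0
        for (t in episode.indices.reversed()) {
            val (s, r) = episode[t]
            g = gamma * g + r
            if (firstVisit[s] == t) {            // only the first visit contributes
                returnsSum[s] = (returnsSum[s] ?: 0.0) + g
                returnsCount[s] = (returnsCount[s] ?: 0) + 1
            }
        }
    }
    return returnsSum.mapValues { (s, sum) -> sum / returnsCount.getValue(s) }
}
```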
Temporal Difference (one-step backup):
- Tabular TD(0) (p.98)
- Sarsa (p.106)
- Q-learning (p.107)
- Expected Sarsa (p.109)
- Double Q-Learning (p.111)
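A minimal tabular Q-learning sketch (not this repository's code), assuming a generic environment `step` function that returns (nextState, reward, done):

```kotlin
import kotlin.random.Random

// Tabular Q-learning sketch (not the repository's code); `start` and `step`
// are assumed environment hooks.
fun qLearning(
    numStates: Int, numActions: Int,
    start: () -> Int,
    step: (s: Int, a: Int) -> Triple<Int, Double, Boolean>,
    episodes: Int = 500,
    alpha: Double = 0.1, gamma: Double = 1.0, epsilon: Double = 0.1
): Array<DoubleArray> {
    val q = Array(numStates) { DoubleArray(numActions) }
    fun epsilonGreedy(s: Int): Int =
        if (Random.nextDouble() < epsilon) Random.nextInt(numActions)
        else q[s].indices.maxByOrNull { q[s][it] }!!
    repeat(episodes) {
        var s = start()
        var done = false
        while (!done) {
            val a = epsilonGreedy(s)
            val (s2, r, d) = step(s, a)
            // Target bootstraps from the greedy action value of the next state.
            val target = if (d) r else r + gamma * (q[s2].maxOrNull() ?: 0.0)
            q[s][a] += alpha * (target - q[s][a])
            s = s2
            done = d
        }
    }
    return q
}
```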
n-step Temporal Difference (unify MC and TD):
- n-step TD prediction (p.117)
- n-step Sarsa (p.120)
- Off-policy n-step Sarsa (p.122)
- n-step Tree Backup (p.125)
- Off-policy n-step Q(σ) (p.128)
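A sketch of n-step TD prediction (not this repository's code). For simplicity it computes the updates after the episode has been collected, whereas the book's algorithm performs them online with a delay of n steps:

```kotlin
import kotlin.math.min
import kotlin.math.pow

// n-step TD prediction sketch (not the repository's code): the update for time
// tau uses the n-step return G_{tau:tau+n}.
fun nStepTdPrediction(
    v: DoubleArray,                                     // value estimates, updated in place
    generateEpisode: () -> List<Pair<Int, Double>>,     // (state, reward on leaving it)
    n: Int, alpha: Double = 0.1, gamma: Double = 1.0
) {
    val episode = generateEpisode()
    val states = episode.map { it.first }
    val rewards = episode.map { it.second }
    val bigT = episode.size                             // terminal time T
    for (tau in 0 until bigT) {
        // Discounted rewards from tau+1 up to min(tau+n, T) ...
        var g = 0.0
        for (i in tau until min(tau + n, bigT)) {
            g += gamma.pow(i - tau) * rewards[i]
        }
        // ... plus the bootstrapped value of S_{tau+n} if the episode has not ended by then.
        if (tau + n < bigT) g += gamma.pow(n) * v[states[tau + n]]
        val s = states[tau]
        v[s] += alpha * (g - v[s])
    }
}
```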
Dyna (Integrate Planning, Acting, and Learning):
- Random-sample one-step tabular Q-planning (p.133)
- Tabular Dyna-Q (p.135)
- Tabular Dyna-Q+ (p.138)
- Prioritized Sweeping (p.140)
- Prioritized Sweeping Stochastic Environment (p.141)
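A sketch of tabular Dyna-Q (not this repository's code): each real transition triggers one direct Q-learning update, records the transition in a deterministic model, and then performs a number of planning updates from randomly sampled remembered transitions:

```kotlin
import kotlin.random.Random

// Tabular Dyna-Q sketch (not the repository's code).
class DynaQ(
    numStates: Int, private val numActions: Int,
    private val alpha: Double = 0.1, private val gamma: Double = 0.95,
    private val epsilon: Double = 0.1, private val planningSteps: Int = 10
) {
    val q = Array(numStates) { DoubleArray(numActions) }
    private val model = mutableMapOf<Pair<Int, Int>, Pair<Int, Double>>() // (s, a) -> (s', r)

    fun selectAction(s: Int): Int =
        if (Random.nextDouble() < epsilon) Random.nextInt(numActions)
        else q[s].indices.maxByOrNull { q[s][it] }!!

    fun observe(s: Int, a: Int, r: Double, s2: Int) {
        update(s, a, r, s2)                      // direct RL update from real experience
        model[s to a] = s2 to r                  // model learning (deterministic model)
        repeat(planningSteps) {                  // planning from simulated experience
            val (sa, outcome) = model.entries.random().toPair()
            update(sa.first, sa.second, outcome.second, outcome.first)
        }
    }

    private fun update(s: Int, a: Int, r: Double, s2: Int) {
        q[s][a] += alpha * (r + gamma * (q[s2].maxOrNull() ?: 0.0) - q[s][a])
    }
}
```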
On-policy Prediction with Function Approximation:
- Gradient Monte Carlo algorithm (p.165)
- Semi-gradient TD(0) (p.166)
- n-step semi-gradient TD (p.171)
- Least-Squares TD (p.186)
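A sketch of semi-gradient TD(0) with linear function approximation (not this repository's code); with a linear value function v̂(s,w) = w·x(s), the semi-gradient is simply the feature vector x(s):

```kotlin
// Semi-gradient TD(0) sketch with linear function approximation (not the
// repository's code); `features` and `generateEpisode` are assumed interfaces.
fun semiGradientTd0(
    w: DoubleArray,                                         // weights, updated in place
    features: (state: Int) -> DoubleArray,                  // feature vector x(s)
    generateEpisode: () -> List<Triple<Int, Double, Int>>,  // (state, reward, nextState)
    isTerminal: (state: Int) -> Boolean,
    alpha: Double = 0.01, gamma: Double = 1.0
) {
    fun value(s: Int): Double {
        if (isTerminal(s)) return 0.0                       // terminal states have value 0
        val x = features(s)
        var v = 0.0
        for (i in w.indices) v += w[i] * x[i]
        return v
    }
    for ((s, r, s2) in generateEpisode()) {
        val delta = r + gamma * value(s2) - value(s)        // TD error
        val x = features(s)
        for (i in w.indices) w[i] += alpha * delta * x[i]   // w <- w + α δ x(s)
    }
}
```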
On-policy Control with Function Approximation:
- Episodic semi-gradient Sarsa (p.198)
- Episodic semi-gradient n-step Sarsa (p.200)
- Differential semi-gradient Sarsa (p.203)
- Differential semi-gradient n-step Sarsa (p.206)
Off-policy Methods with Approximation:
- Semi-gradient off-policy TD(0) (p.210)
- Semi-gradient Expected Sarsa (p.210)
- n-step semi-gradient off-policy Sarsa (p.211)
- n-step semi-gradient off-policy Q(σ) (p.211)
Eligibility Traces:
- Off-line λ-return (p.238)
- Semi-gradient TD(λ) prediction (p.240)
- True Online TD(λ) prediction (p.246)
- Sarsa(λ) (p.250)
- True Online Sarsa(λ) (p.252)
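A sketch of semi-gradient TD(λ) prediction with linear features and accumulating traces (not this repository's code); the trace vector z accumulates feature vectors, so a single TD error updates all recently visited features:

```kotlin
// Semi-gradient TD(λ) prediction sketch with linear features (not the
// repository's code); `features` and `generateEpisode` are assumed interfaces.
fun semiGradientTdLambda(
    w: DoubleArray,
    features: (state: Int) -> DoubleArray,
    generateEpisode: () -> List<Triple<Int, Double, Int>>,  // (state, reward, nextState)
    isTerminal: (state: Int) -> Boolean,
    lambda: Double = 0.9, alpha: Double = 0.01, gamma: Double = 1.0
) {
    fun value(s: Int): Double {
        if (isTerminal(s)) return 0.0
        val x = features(s)
        var v = 0.0
        for (i in w.indices) v += w[i] * x[i]
        return v
    }
    val z = DoubleArray(w.size)                                    // eligibility trace vector
    for ((s, r, s2) in generateEpisode()) {
        val x = features(s)
        for (i in z.indices) z[i] = gamma * lambda * z[i] + x[i]   // z <- γλz + ∇v̂(s,w)
        val delta = r + gamma * value(s2) - value(s)               // TD error
        for (i in w.indices) w[i] += alpha * delta * z[i]          // w <- w + α δ z
    }
}
```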
Policy Gradient Methods:
- REINFORCE, A Monte-Carlo Policy-Gradient Method (episodic) (p.271)
- REINFORCE with Baseline (episodic) (p.273)
- One-step Actor-Critic (episodic) (p.274)
- Actor-Critic with Eligibility Traces (episodic) (p.275)
- Actor-Critic with Eligibility Traces (continuing) (p.277)
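A sketch of episodic REINFORCE with a tabular softmax policy (not this repository's code); for this parameterisation the score function is ∂ln π(a|s,θ)/∂θ[s][b] = 1{b = a} − π(b|s):

```kotlin
import kotlin.math.exp
import kotlin.math.pow
import kotlin.random.Random

// REINFORCE sketch (not the repository's code): Monte Carlo policy gradient
// with tabular action preferences θ[s][a] and a softmax policy.
class Reinforce(numStates: Int, private val numActions: Int,
                private val alpha: Double = 0.01, private val gamma: Double = 1.0) {
    private val theta = Array(numStates) { DoubleArray(numActions) }  // action preferences

    fun policy(s: Int): DoubleArray {
        val expPrefs = theta[s].map { exp(it) }
        val sum = expPrefs.sum()
        return DoubleArray(numActions) { expPrefs[it] / sum }
    }

    fun sampleAction(s: Int): Int {
        val p = policy(s)
        var u = Random.nextDouble()
        for (a in 0 until numActions) { u -= p[a]; if (u <= 0) return a }
        return numActions - 1
    }

    // episode: list of (state, action, reward) triples in time order
    fun updateFromEpisode(episode: List<Triple<Int, Int, Double>>) {
        for (t in episode.indices) {
            // Return G_t from time t onwards.
            var g = 0.0
            for (k in t until episode.size) g += gamma.pow(k - t) * episode[k].third
            val (s, a, _) = episode[t]
            val p = policy(s)
            for (b in 0 until numActions) {
                val grad = (if (b == a) 1.0 else 0.0) - p[b]      // ∇ln π(a|s,θ) component
                theta[s][b] += alpha * gamma.pow(t) * g * grad    // θ <- θ + α γ^t G ∇ln π
            }
        }
    }
}
```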
Problems:
- Grid world (p.61)
- Jack's Car Rental and exercise 4.4 (p.65)
- Gambler's Problem (p.68)
- Blackjack (p.76)
- Random Walk (p.102)
- Windy Gridworld and King's Moves (p.106)
- Cliff Walking (p.108)
- Maximization Bias Example (p.110)
- 19-state Random Walk (p.118)
- Dyna Maze (p.136)
- Rod Maneuvering (p.141)
- 1000-state Random Walk (p.166)
- Mountain Car (p.198)
- Access-Control Queuing Task (p.204)
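As an example of how simple some of these testbeds are, here is a sketch of the 19-state random walk environment as described in the book (not this repository's code):

```kotlin
import kotlin.random.Random

// Sketch of the 19-state random walk testbed (not the repository's code).
// States 1..19, start in the middle; each step moves left or right with equal
// probability; terminating on the left gives reward -1, on the right +1.
class RandomWalk19 {
    val numStates = 19
    var state = 10       // middle state
        private set

    fun reset() { state = 10 }

    /** One random step; returns (nextState, reward, done). States 0 and 20 are terminal. */
    fun step(): Triple<Int, Double, Boolean> {
        state += if (Random.nextBoolean()) 1 else -1
        return when (state) {
            0 -> Triple(0, -1.0, true)
            20 -> Triple(20, 1.0, true)
            else -> Triple(state, 0.0, false)
        }
    }
}
```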
Built with Maven.
Try the test cases.
Figure 7.2: Performance of n-step TD methods as a function of α, for various values of n, on a 19-state random walk task
Figure 10.1: The Mountain Car task and the cost-to-go function learned during one run
Figure 10.4: Effect of α and n on early performance of n-step semi-gradient Sarsa and tile-coding function approximation on the Mountain Car task
Figure 12.3: 19-state Random walk results: Performance of the offline λ-return algorithm
Figure 12.6: 19-state Random walk results: Performance of TD(λ)
Figure 12.8: 19-state Random walk results: Performance of online λ-return algorithms
Figure 12.10: Early performance on the Mountain Car task of Sarsa(λ) with replacing traces
Figure 12.11: Summary comparison of Sarsa(λ) algorithms on the Mountain Car task.