wumo/Reinforcement-Learning-An-Introduction
Kotlin implementation of algorithms, examples, and exercises from Sutton and Barto: Reinforcement Learning (2nd Edition). The purpose of this project is to make RL algorithms easier to understand and to experiment with.
Inspired by ShangtongZhang/reinforcement-learning-an-introduction (Python) and idsc-frazzoli/subare (Java 8).
Features:
- Algorithms and problems are separated, so you can experiment with various combinations of <algorithm, problem> or <algorithm, function approximator, problem> (a rough sketch of this idea follows below).
- The implementation stays very close to the pseudocode in the book, so reading the source code will help you understand the original algorithms.
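For illustration only, here is a sketch of how such a separation might look. The names `MDP`, `step`, and `solve` are hypothetical and do not refer to this repository's actual API:

```kotlin
// Hypothetical sketch only: the repository's real types and function names may differ.
// The point is that a problem (an MDP) is defined independently of the algorithm
// that solves it, so any <algorithm, problem> pair can be combined.
interface MDP<S, A> {
    val states: List<S>
    fun actions(state: S): List<A>
    fun step(state: S, action: A): Pair<S, Double>   // next state and reward
}

// An "algorithm" is just a function from a problem to a value estimate,
// so experiments can swap either side freely.
fun <S, A> solve(problem: MDP<S, A>, algorithm: (MDP<S, A>) -> Map<S, Double>): Map<S, Double> =
    algorithm(problem)
```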
Model-based (Dynamic Programming):
- Policy Iteration (Action-Value Iteration) (p.65)
- Value Iteration (p.67)
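To give the flavour of these algorithms, here is a minimal tabular Value Iteration sketch (not this repository's code), assuming a finite deterministic MDP given as successor and reward functions:

```kotlin
import kotlin.math.abs

// Minimal tabular Value Iteration sketch (not the repository's implementation).
fun valueIteration(
    numStates: Int,
    numActions: Int,
    next: (s: Int, a: Int) -> Int,       // deterministic successor state (assumed interface)
    reward: (s: Int, a: Int) -> Double,  // immediate reward (assumed interface)
    gamma: Double = 0.9,
    theta: Double = 1e-6
): DoubleArray {
    val v = DoubleArray(numStates)
    while (true) {
        var delta = 0.0
        for (s in 0 until numStates) {
            val old = v[s]
            // Bellman optimality backup: V(s) <- max_a [ r(s,a) + γ V(s') ]
            v[s] = (0 until numActions).maxOf { a -> reward(s, a) + gamma * v[next(s, a)] }
            delta = maxOf(delta, abs(v[s] - old))
        }
        if (delta < theta) return v      // stop when the largest update is below θ
    }
}
```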
Monte Carlo (episode backup):
- First-visit MC prediction (p.76)
- Monte Carlo Exploring Starts (p.81)
- On-Policy first-visit MC control (p.83)
- Off-policy MC prediction (p.90)
- Off-policy MC control (p.91)
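A minimal sketch of first-visit MC prediction (not this repository's code), assuming an episode is represented as a list of (state, reward) pairs:

```kotlin
// First-visit Monte Carlo prediction sketch (not the repository's code).
// An episode is a list of (state, reward) pairs, where the reward is the one
// received on leaving that state.
fun firstVisitMcPrediction(
    episodes: List<List<Pair<Int, Double>>>,
    gamma: Double = 1.0
): Map<Int, Double> {
    val returnsSum = mutableMapOf<Int, Double>()
    val returnsCount = mutableMapOf<Int, Int>()
    for (episode in episodes) {
        // Index of the first visit to each state in this episode.
        val firstVisit = mutableMapOf<Int, Int>()
        episode.forEachIndexed { t, (s, _) -> if (s !in firstVisit) firstVisit[s] = t }
        // Walk backwards, accumulating the return G_t.
        var g = 0.0
        for (t in episode.indices.reversed()) {
            val (s, r) = episode[t]
            g = gamma * g + r
            if (firstVisit[s] == t) {            // only the first visit contributes
                returnsSum[s] = (returnsSum[s] ?: 0.0) + g
                returnsCount[s] = (returnsCount[s] ?: 0) + 1
            }
        }
    }
    return returnsSum.mapValues { (s, sum) -> sum / returnsCount.getValue(s) }
}
```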
Temporal Difference (one-step backup):
- Tabular TD(0) (p.98)
- Sarsa (p.106)
- Q-learning (p.107)
- Expected Sarsa (p.109)
- Double Q-Learning (p.111)
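A minimal tabular Q-learning sketch (not this repository's code), assuming a generic environment `step` function that returns (nextState, reward, done):

```kotlin
import kotlin.random.Random

// Tabular Q-learning sketch (not the repository's code); `start` and `step`
// are assumed environment hooks.
fun qLearning(
    numStates: Int, numActions: Int,
    start: () -> Int,
    step: (s: Int, a: Int) -> Triple<Int, Double, Boolean>,
    episodes: Int = 500,
    alpha: Double = 0.1, gamma: Double = 1.0, epsilon: Double = 0.1
): Array<DoubleArray> {
    val q = Array(numStates) { DoubleArray(numActions) }
    fun epsilonGreedy(s: Int): Int =
        if (Random.nextDouble() < epsilon) Random.nextInt(numActions)
        else q[s].indices.maxByOrNull { q[s][it] }!!
    repeat(episodes) {
        var s = start()
        var done = false
        while (!done) {
            val a = epsilonGreedy(s)
            val (s2, r, d) = step(s, a)
            // Target bootstraps from the greedy action value of the next state.
            val target = if (d) r else r + gamma * (q[s2].maxOrNull() ?: 0.0)
            q[s][a] += alpha * (target - q[s][a])
            s = s2
            done = d
        }
    }
    return q
}
```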
n-step Temporal Difference (unify MC and TD):
- n-step TD prediction (p.117)
- n-step Sarsa (p.120)
- Off-policy n-step Sarsa (p.122)
- n-step Tree Backup (p.125)
- Off-policy n-step Q(σ) (p.128)
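A sketch of n-step TD prediction (not this repository's code). For simplicity it computes the updates after the episode has been collected, whereas the book's algorithm performs them online with a delay of n steps:

```kotlin
import kotlin.math.min
import kotlin.math.pow

// n-step TD prediction sketch (not the repository's code): the update for time
// tau uses the n-step return G_{tau:tau+n}.
fun nStepTdPrediction(
    v: DoubleArray,                                     // value estimates, updated in place
    generateEpisode: () -> List<Pair<Int, Double>>,     // (state, reward on leaving it)
    n: Int, alpha: Double = 0.1, gamma: Double = 1.0
) {
    val episode = generateEpisode()
    val states = episode.map { it.first }
    val rewards = episode.map { it.second }
    val bigT = episode.size                             // terminal time T
    for (tau in 0 until bigT) {
        // Discounted rewards from tau+1 up to min(tau+n, T) ...
        var g = 0.0
        for (i in tau until min(tau + n, bigT)) {
            g += gamma.pow(i - tau) * rewards[i]
        }
        // ... plus the bootstrapped value of S_{tau+n} if the episode has not ended by then.
        if (tau + n < bigT) g += gamma.pow(n) * v[states[tau + n]]
        val s = states[tau]
        v[s] += alpha * (g - v[s])
    }
}
```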
Dyna (Integrate Planning, Acting, and Learning):
- Random-sample one-step tabular Q-planning (p.133)
- Tabular Dyna-Q (p.135)
- Tabular Dyna-Q+ (p.138)
- Prioritized Sweeping (p.140)
- Prioritized Sweeping Stochastic Environment (p.141)
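A sketch of tabular Dyna-Q (not this repository's code): each real transition triggers one direct Q-learning update, records the transition in a deterministic model, and then performs a number of planning updates from randomly sampled remembered transitions:

```kotlin
import kotlin.random.Random

// Tabular Dyna-Q sketch (not the repository's code).
class DynaQ(
    numStates: Int, private val numActions: Int,
    private val alpha: Double = 0.1, private val gamma: Double = 0.95,
    private val epsilon: Double = 0.1, private val planningSteps: Int = 10
) {
    val q = Array(numStates) { DoubleArray(numActions) }
    private val model = mutableMapOf<Pair<Int, Int>, Pair<Int, Double>>() // (s, a) -> (s', r)

    fun selectAction(s: Int): Int =
        if (Random.nextDouble() < epsilon) Random.nextInt(numActions)
        else q[s].indices.maxByOrNull { q[s][it] }!!

    fun observe(s: Int, a: Int, r: Double, s2: Int) {
        update(s, a, r, s2)                      // direct RL update from real experience
        model[s to a] = s2 to r                  // model learning (deterministic model)
        repeat(planningSteps) {                  // planning from simulated experience
            val (sa, outcome) = model.entries.random().toPair()
            update(sa.first, sa.second, outcome.second, outcome.first)
        }
    }

    private fun update(s: Int, a: Int, r: Double, s2: Int) {
        q[s][a] += alpha * (r + gamma * (q[s2].maxOrNull() ?: 0.0) - q[s][a])
    }
}
```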
On-policy Prediction with Function Approximation:
- Gradient Monte Carlo algorithm (p.165)
- Semi-gradient TD(0) (p.166)
- n-step semi-gradient TD (p.171)
- Least-Squares TD (p.186)
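A sketch of semi-gradient TD(0) with linear function approximation (not this repository's code); with a linear value function v̂(s,w) = w·x(s), the semi-gradient is simply the feature vector x(s):

```kotlin
// Semi-gradient TD(0) sketch with linear function approximation (not the
// repository's code); `features` and `generateEpisode` are assumed interfaces.
fun semiGradientTd0(
    w: DoubleArray,                                         // weights, updated in place
    features: (state: Int) -> DoubleArray,                  // feature vector x(s)
    generateEpisode: () -> List<Triple<Int, Double, Int>>,  // (state, reward, nextState)
    isTerminal: (state: Int) -> Boolean,
    alpha: Double = 0.01, gamma: Double = 1.0
) {
    fun value(s: Int): Double {
        if (isTerminal(s)) return 0.0                       // terminal states have value 0
        val x = features(s)
        var v = 0.0
        for (i in w.indices) v += w[i] * x[i]
        return v
    }
    for ((s, r, s2) in generateEpisode()) {
        val delta = r + gamma * value(s2) - value(s)        // TD error
        val x = features(s)
        for (i in w.indices) w[i] += alpha * delta * x[i]   // w <- w + α δ x(s)
    }
}
```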
On-policy Control with Function Approximation:
- Episodic semi-gradient Sarsa (p.198)
- Episodic semi-gradient n-step Sarsa (p.200)
- Differential semi-gradient Sarsa (p.203)
- Differential semi-gradient n-step Sarsa (p.206)
Off-policy Methods with Approximation:
- Semi-gradient off-policy TD(0) (p.210)
- Semi-gradient Expected Sarsa (p.210)
- n-step semi-gradient off-policy Sarsa (p.211)
- n-step semi-gradient off-policy Q(σ) (p.211)
Eligibility Traces:
- Off-line λ-return (p.238)
- Semi-gradient TD(λ) prediction (p.240)
- True Online TD(λ) prediction (p.246)
- Sarsa(λ) (p.250)
- True Online Sarsa(λ) (p.252)
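A sketch of semi-gradient TD(λ) prediction with linear features and accumulating traces (not this repository's code); the trace vector z accumulates feature vectors, so a single TD error updates all recently visited features:

```kotlin
// Semi-gradient TD(λ) prediction sketch with linear features (not the
// repository's code); `features` and `generateEpisode` are assumed interfaces.
fun semiGradientTdLambda(
    w: DoubleArray,
    features: (state: Int) -> DoubleArray,
    generateEpisode: () -> List<Triple<Int, Double, Int>>,  // (state, reward, nextState)
    isTerminal: (state: Int) -> Boolean,
    lambda: Double = 0.9, alpha: Double = 0.01, gamma: Double = 1.0
) {
    fun value(s: Int): Double {
        if (isTerminal(s)) return 0.0
        val x = features(s)
        var v = 0.0
        for (i in w.indices) v += w[i] * x[i]
        return v
    }
    val z = DoubleArray(w.size)                                    // eligibility trace vector
    for ((s, r, s2) in generateEpisode()) {
        val x = features(s)
        for (i in z.indices) z[i] = gamma * lambda * z[i] + x[i]   // z <- γλz + ∇v̂(s,w)
        val delta = r + gamma * value(s2) - value(s)               // TD error
        for (i in w.indices) w[i] += alpha * delta * z[i]          // w <- w + α δ z
    }
}
```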
Policy Gradient Methods:
- REINFORCE, A Monte-Carlo Policy-Gradient Method (episodic) (p.271)
- REINFORCE with Baseline (episodic) (p.273)
- One-step Actor-Critic (episodic) (p.274)
- Actor-Critic with Eligibility Traces (episodic) (p.275)
- Actor-Critic with Eligibility Traces (continuing) (p.277)
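A sketch of episodic REINFORCE with a tabular softmax policy (not this repository's code); for this parameterisation the score function is ∂ln π(a|s,θ)/∂θ[s][b] = 1{b = a} − π(b|s):

```kotlin
import kotlin.math.exp
import kotlin.math.pow
import kotlin.random.Random

// REINFORCE sketch (not the repository's code): Monte Carlo policy gradient
// with tabular action preferences θ[s][a] and a softmax policy.
class Reinforce(numStates: Int, private val numActions: Int,
                private val alpha: Double = 0.01, private val gamma: Double = 1.0) {
    private val theta = Array(numStates) { DoubleArray(numActions) }  // action preferences

    fun policy(s: Int): DoubleArray {
        val expPrefs = theta[s].map { exp(it) }
        val sum = expPrefs.sum()
        return DoubleArray(numActions) { expPrefs[it] / sum }
    }

    fun sampleAction(s: Int): Int {
        val p = policy(s)
        var u = Random.nextDouble()
        for (a in 0 until numActions) { u -= p[a]; if (u <= 0) return a }
        return numActions - 1
    }

    // episode: list of (state, action, reward) triples in time order
    fun updateFromEpisode(episode: List<Triple<Int, Int, Double>>) {
        for (t in episode.indices) {
            // Return G_t from time t onwards.
            var g = 0.0
            for (k in t until episode.size) g += gamma.pow(k - t) * episode[k].third
            val (s, a, _) = episode[t]
            val p = policy(s)
            for (b in 0 until numActions) {
                val grad = (if (b == a) 1.0 else 0.0) - p[b]      // ∇ln π(a|s,θ) component
                theta[s][b] += alpha * gamma.pow(t) * g * grad    // θ <- θ + α γ^t G ∇ln π
            }
        }
    }
}
```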
Problems:
- Grid world (p.61)
- Jack's Car Rental and exercise 4.4 (p.65)
- Gambler's Problem (p.68)
- Blackjack (p.76)
- Random Walk (p.102)
- Windy Gridworld and King's Moves (p.106)
- Cliff Walking (p.108)
- Maximization Bias Example (p.110)
- 19-state Random Walk (p.118)
- Dyna Maze (p.136)
- Rod Maneuvering (p.141)
- 1000-state Random Walk (p.166)
- Mountain Car (p.198)
- Access-Control Queuing Task (p.204)
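As an example of how simple some of these testbeds are, here is a sketch of the 19-state random walk environment as described in the book (not this repository's code):

```kotlin
import kotlin.random.Random

// Sketch of the 19-state random walk testbed (not the repository's code).
// States 1..19, start in the middle; each step moves left or right with equal
// probability; terminating on the left gives reward -1, on the right +1.
class RandomWalk19 {
    val numStates = 19
    var state = 10       // middle state
        private set

    fun reset() { state = 10 }

    /** One random step; returns (nextState, reward, done). States 0 and 20 are terminal. */
    fun step(): Triple<Int, Double, Boolean> {
        state += if (Random.nextBoolean()) 1 else -1
        return when (state) {
            0 -> Triple(0, -1.0, true)
            20 -> Triple(20, 1.0, true)
            else -> Triple(state, 0.0, false)
        }
    }
}
```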
Built with Maven.
Try the test cases.
Figure 7.2: Performance of n-step TD methods as a function of α, for various values of n, on a 19-state random walk task
Figure 10.1: The Mountain Car task and the cost-to-go function learned during one run
Figure 10.4: Effect of α and n on early performance of n-step semi-gradient Sarsa and tile-coding function approximation on the Mountain Car task
Figure 12.3: 19-state Random walk results: Performance of the offline λ-return algorithm
Figure 12.6: 19-state Random walk results: Performance of TD(λ)
Figure 12.8: 19-state Random walk results: Performance of online λ-return algorithms
Figure 12.10: Early performance on the Mountain Car task of Sarsa(λ) with replacing traces
Figure 12.11: Summary comparison of Sarsa(λ) algorithms on the Mountain Car task.