Studying Reinforcement Learning Guide
- A simple, collected guide for studying RL/Deep RL in one to 2.5 months.
Introduction to Reinforcement Learning by Joelle Pineau, McGill University:
Applications of RL.
When to use RL?
RL vs supervised learning
What is an MDP (Markov Decision Process)?
Components of an RL agent:
- States
- Actions (probabilistic effects)
- Reward function
- Initial state distribution
                   +---------------------+
          +------->|        Agent        |--------+
          |        +---------------------+        |
    state |                                       | action
    S(t)  | reward                                | a(t)
          | r(t)                                  |
          |        +---------------------+        |
          +--------|     Environment     |<-------+
      S(t+1),      +---------------------+
      r(t+1)

    * Sutton and Barto (1998)
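The loop in the diagram can be sketched in a few lines. The environment below is a made-up two-state toy (not any particular library): the agent observes S(t), emits a(t), and the environment returns S(t+1) and r(t+1).

```python
import random

random.seed(0)  # for reproducibility

class ToyEnv:
    """Hypothetical two-state environment: action 1 taken in state 1 pays 1."""
    def __init__(self):
        self.state = 0

    def step(self, action):
        reward = 1.0 if (self.state == 1 and action == 1) else 0.0
        self.state = random.choice([0, 1])   # next state S(t+1)
        return self.state, reward            # environment returns S(t+1), r(t+1)

def random_agent(state):
    return random.choice([0, 1])             # a placeholder policy: a(t) ~ uniform

env = ToyEnv()
state, total_reward = env.state, 0.0
for t in range(100):
    action = random_agent(state)             # agent observes S(t), picks a(t)
    state, reward = env.step(action)         # environment replies with S(t+1), r(t+1)
    total_reward += reward
```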
Explanation of the Markov Property:
Maximizing utility in:
- Episodic tasks
- Continuing tasks
- The discount factor, gamma γ
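The discount factor is easiest to see as code: the return G(t) = r(t+1) + γ·r(t+2) + γ²·r(t+3) + … can be accumulated backwards over a reward sequence. A minimal sketch:

```python
def discounted_return(rewards, gamma=0.9):
    """G(t) = r(t+1) + gamma*r(t+2) + gamma^2*r(t+3) + ..."""
    g = 0.0
    for r in reversed(rewards):   # fold from the end: g <- r + gamma * g
        g = r + gamma * g
    return g

# With gamma = 0.5: 1 + 0.5 + 0.25 = 1.75
print(discounted_return([1, 1, 1], gamma=0.5))  # 1.75
```

For a continuing task, γ < 1 keeps this sum finite; for an episodic task the sum simply ends with the episode.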
What is the policy & what to do with it?
- A policy defines the action-selection strategy at every state:
Value functions:
- The value-of-a-policy equations are two forms of Bellman’s equation.
- (This is a dynamic programming algorithm).
- Iterative Policy Evaluation:
- Main idea: turn Bellman equations into update rules.
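"Turn Bellman equations into update rules" can be shown on a made-up 2-state MDP (states A, B; actions stay/go; A→B pays reward 1). The transition table is invented purely for illustration; the update is the Bellman expectation backup applied repeatedly:

```python
# Iterative policy evaluation:
#   V(s) <- sum_a pi(a|s) * sum_s' P(s'|s,a) * [R + gamma * V(s')]
gamma = 0.9
# (state, action) -> list of (prob, next_state, reward); a toy, deterministic MDP
P = {
    ("A", "stay"): [(1.0, "A", 0.0)],
    ("A", "go"):   [(1.0, "B", 1.0)],
    ("B", "stay"): [(1.0, "B", 0.0)],
    ("B", "go"):   [(1.0, "A", 0.0)],
}
# evaluate the uniformly random policy
policy = {"A": {"stay": 0.5, "go": 0.5}, "B": {"stay": 0.5, "go": 0.5}}

V = {"A": 0.0, "B": 0.0}
for _ in range(1000):                      # sweep until (approximate) convergence
    V = {
        s: sum(
            pi * sum(p * (r + gamma * V[s2]) for p, s2, r in P[(s, a)])
            for a, pi in policy[s].items()
        )
        for s in V
    }
```

Each sweep is a contraction by γ, so V converges to the fixed point of Bellman's equation for this policy.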
Optimal policies and optimal value functions.
- Finding a good policy: Policy Iteration (see the talk below by Pieter Abbeel)
- Finding a good policy: Value iteration
- Asynchronous value iteration:
- Instead of updating all states on every iteration, focus on important states.
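Value iteration replaces the expectation over the policy with a max over actions; sketched on the same toy MDP as above (states A, B; A→B pays 1). Updating `V[s]` in place, state by state, is already the asynchronous flavor; a prioritized version would just pick the order of states more carefully:

```python
# Value iteration: V(s) <- max_a sum_s' P(s'|s,a) * [R + gamma * V(s')]
gamma = 0.9
P = {  # toy MDP, invented for illustration
    ("A", "stay"): [(1.0, "A", 0.0)],
    ("A", "go"):   [(1.0, "B", 1.0)],
    ("B", "stay"): [(1.0, "B", 0.0)],
    ("B", "go"):   [(1.0, "A", 0.0)],
}
actions = ["stay", "go"]

V = {"A": 0.0, "B": 0.0}
for _ in range(1000):
    for s in V:                            # in-place, state-by-state updates
        V[s] = max(
            sum(p * (r + gamma * V[s2]) for p, s2, r in P[(s, a)])
            for a in actions
        )

# read off the greedy policy from the converged values
greedy = {s: max(actions, key=lambda a: sum(p * (r + gamma * V[s2])
                                            for p, s2, r in P[(s, a)]))
          for s in V}
```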
Key challenges in RL:
- Designing the problem domain:
  - State representation
  - Action choice
  - Cost/reward signal
- Acquiring data for training:
  - Exploration / exploitation
  - High-cost actions
  - Time-delayed cost/reward signal
- Function approximation
- Validation / confidence measures
The RL lingo.
In large state spaces we need approximation:
- Fitted Q-iteration:
- Use supervised learning to estimate the Q-function from a batch of training data:
- Input, Output and Loss.
  - e.g., the Arcade Learning Environment
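The "use supervised learning to estimate the Q-function" step can be sketched on a made-up 2-state MDP (A→B pays reward 1). From a fixed batch of transitions (s, a, r, s'), each round builds regression targets y = r + γ·max Q(s', ·) and fits Q to them; the per-(s, a) mean of targets stands in here for the regressor (in practice: trees, a neural net, etc.):

```python
import random

random.seed(0)

gamma = 0.9
ACTIONS = ["stay", "go"]
P = {  # (state, action) -> (next_state, reward); a toy, deterministic MDP
    ("A", "stay"): ("A", 0.0),
    ("A", "go"):   ("B", 1.0),
    ("B", "stay"): ("B", 0.0),
    ("B", "go"):   ("A", 0.0),
}

# collect a batch once, with a random behavior policy
batch, s = [], "A"
for _ in range(500):
    a = random.choice(ACTIONS)
    s2, r = P[(s, a)]
    batch.append((s, a, r, s2))
    s = s2

Q = {(st, a): 0.0 for st in "AB" for a in ACTIONS}
for _ in range(200):                       # fitted Q-iteration rounds
    targets = {}
    for s, a, r, s2 in batch:
        y = r + gamma * max(Q[(s2, b)] for b in ACTIONS)   # regression target
        targets.setdefault((s, a), []).append(y)
    # supervised "fit": least squares per key reduces to the mean of its targets
    Q = {k: sum(v) / len(v) for k, v in targets.items()}
```

The input is (s, a), the output is the scalar target y, and the loss is squared error; that is the Input/Output/Loss triple the bullet above refers to.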
Deep Q-network (DQN) and tips.
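Two of the DQN "tips" are structural and can be sketched without a neural network (plain dicts stand in for the online and target networks here; the one-state, two-action problem is invented for illustration): (1) experience replay, learning from randomly sampled past transitions, and (2) a frozen target network, synced only occasionally, to stabilize the bootstrap targets.

```python
import random
from collections import deque

random.seed(1)

replay = deque(maxlen=10_000)              # (s, a, r, s2) transitions
q, q_target = {}, {}                       # dicts standing in for networks
alpha, gamma, sync_every = 0.1, 0.99, 100

def train_step(step):
    global q_target
    s, a, r, s2 = random.choice(replay)    # (1) sample uniformly from replay
    best_next = max(q_target.get((s2, b), 0.0) for b in (0, 1))
    target = r + gamma * best_next         # (2) bootstrap from the frozen copy
    key = (s, a)
    q[key] = q.get(key, 0.0) + alpha * (target - q.get(key, 0.0))
    if step % sync_every == 0:
        q_target = dict(q)                 # periodically sync target <- online

# fill the buffer on a made-up problem: in state 0, action 1 pays reward 1
for _ in range(1000):
    a = random.choice((0, 1))
    replay.append((0, a, float(a), 0))
for step in range(5000):
    train_step(step)
```

Without the target copy, the max in the target chases its own updates; freezing it between syncs is what makes DQN training markedly more stable.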
Deep Reinforcement Learning by Pieter Abbeel, EE & CS, UC Berkeley
Why Policy Optimization?
Cross Entropy Method (CEM) / Finite Differences / Fixing Random Seed
Likelihood Ratio (LR) Policy Gradient
Natural Gradient / Trust Regions (-> TRPO)
Actor-Critic (-> GAE, A3C)
Path Derivatives (PD) (-> DPG, DDPG, SVG)
Stochastic Computation Graphs (generalizes LR / PD)
Guided Policy Search (GPS)
Inverse Reinforcement Learning
- Inverse RL vs. behavioral cloning
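Of the topics above, the likelihood-ratio policy gradient is the simplest to sketch. On a two-armed bandit with a softmax policy (arm payoffs invented; arm 1 is better on average), the score-function estimator ∇J(θ) = E[∇ log π(a|θ) · R] gives the update below:

```python
import math
import random

random.seed(0)

theta = [0.0, 0.0]                         # one logit per arm
payoff = {0: 0.2, 1: 0.8}                  # made-up mean reward of each arm
lr = 0.1

def softmax(logits):
    m = max(logits)                        # shift for numerical stability
    exps = [math.exp(x - m) for x in logits]
    z = sum(exps)
    return [e / z for e in exps]

for _ in range(2000):
    probs = softmax(theta)
    a = random.choices([0, 1], weights=probs)[0]    # sample a ~ pi(.|theta)
    r = 1.0 if random.random() < payoff[a] else 0.0
    for k in range(2):
        # grad of log softmax: 1[k == a] - pi(k)
        grad_log = (1.0 if k == a else 0.0) - probs[k]
        theta[k] += lr * grad_log * r      # likelihood-ratio (REINFORCE) update
```

The same estimator scales from this bandit to full trajectories; subtracting a baseline (the actor-critic idea listed above) only reduces its variance, it does not bias it.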
Explanation with implementation for some of the topics mentioned in the Deep Reinforcement Learning talk, written by Arthur Juliani:
- The TF / Python implementations can be found here.
- Part 0 — Q-Learning Agents
- Part 1 — Two-Armed Bandit
- Part 1.5 — Contextual Bandits
- Part 2 — Policy-Based Agents
- Part 3 — Model-Based RL
- Part 4 — Deep Q-Networks and Beyond
- Part 5 — Visualizing an Agent’s Thoughts and Actions
- Part 6 — Partial Observability and Deep Recurrent Q-Networks
- Part 7 — Action-Selection Strategies for Exploration
- Part 8 — Asynchronous Actor-Critic Agents (A3C)
- Before starting on the books, here is a neat overview of Deep RL by Yuxi Li:
- Reinforcement Learning: An Introduction by Richard S. Sutton and Andrew G. Barto
- Algorithms for Reinforcement Learning by Csaba Szepesvári.
- Reinforcement Learning and Dynamic Programming using Function Approximators by Lucian Busoniu, Robert Babuska, Bart De Schutter, and Damien Ernst.
Reinforcement Learning by David Silver.
- Lecture 1: Introduction to Reinforcement Learning
- Lecture 2: Markov Decision Processes
- Lecture 3: Planning by Dynamic Programming
- Lecture 4: Model-Free Prediction
- Lecture 5: Model-Free Control
- Lecture 6: Value Function Approximation
- Lecture 7: Policy Gradient Methods
- Lecture 8: Integrating Learning and Planning
- Lecture 9: Exploration and Exploitation
- Lecture 10: Case Study: RL in Classic Games
CS 294: Deep Reinforcement Learning, Spring 2017 by John Schulman and Pieter Abbeel.
- Instructors: Sergey Levine, John Schulman, Chelsea Finn
- My Bad Notes
