Studying Reinforcement Learning Guide
- A simple, collected guide for studying RL/Deep RL in one to 2.5 months.
Introduction to Reinforcement Learning by Joelle Pineau, McGill University:
Applications of RL.
When to use RL?
RL vs supervised learning
What is an MDP (Markov Decision Process)?
Components of an RL agent:
- States
- Actions (probabilistic effects)
- Reward function
- Initial state distribution
                   +---------------------+
          +------->|        Agent        |--------+
          |        +---------------------+        |
    state |                                       | action
    S(t)  | reward                                | a(t)
          | r(t)                                  |
          |        +---------------------+        |
          +--------|     Environment     |<-------+
      S(t+1),      +---------------------+
      r(t+1)

    * Sutton and Barto (1998)
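The loop in the diagram can be sketched in a few lines. The environment below is a made-up two-state toy (not any particular library): the agent observes S(t), emits a(t), and the environment returns S(t+1) and r(t+1).

```python
import random

random.seed(0)  # for reproducibility

class ToyEnv:
    """Hypothetical two-state environment: action 1 taken in state 1 pays 1."""
    def __init__(self):
        self.state = 0

    def step(self, action):
        reward = 1.0 if (self.state == 1 and action == 1) else 0.0
        self.state = random.choice([0, 1])   # next state S(t+1)
        return self.state, reward            # environment returns S(t+1), r(t+1)

def random_agent(state):
    return random.choice([0, 1])             # a placeholder policy: a(t) ~ uniform

env = ToyEnv()
state, total_reward = env.state, 0.0
for t in range(100):
    action = random_agent(state)             # agent observes S(t), picks a(t)
    state, reward = env.step(action)         # environment replies with S(t+1), r(t+1)
    total_reward += reward
```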
Explanation of the Markov Property:
Maximizing utility in:
- Episodic tasks
- Continuing tasks
- The discount factor, gamma γ
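The discount factor is easiest to see as code: the return G(t) = r(t+1) + γ·r(t+2) + γ²·r(t+3) + … can be accumulated backwards over a reward sequence. A minimal sketch:

```python
def discounted_return(rewards, gamma=0.9):
    """G(t) = r(t+1) + gamma*r(t+2) + gamma^2*r(t+3) + ..."""
    g = 0.0
    for r in reversed(rewards):   # fold from the end: g <- r + gamma * g
        g = r + gamma * g
    return g

# With gamma = 0.5: 1 + 0.5 + 0.25 = 1.75
print(discounted_return([1, 1, 1], gamma=0.5))  # 1.75
```

For a continuing task, γ < 1 keeps this sum finite; for an episodic task the sum simply ends with the episode.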
What is the policy & what to do with it?
- A policy defines the action-selection strategy at every state:
Value functions:
- The value-of-a-policy equations are two forms of Bellman’s equation.
- (This is a dynamic programming algorithm).
- Iterative Policy Evaluation:
- Main idea: turn Bellman equations into update rules.
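"Turn Bellman equations into update rules" can be shown on a made-up 2-state MDP (states A, B; actions stay/go; A→B pays reward 1). The transition table is invented purely for illustration; the update is the Bellman expectation backup applied repeatedly:

```python
# Iterative policy evaluation:
#   V(s) <- sum_a pi(a|s) * sum_s' P(s'|s,a) * [R + gamma * V(s')]
gamma = 0.9
# (state, action) -> list of (prob, next_state, reward); a toy, deterministic MDP
P = {
    ("A", "stay"): [(1.0, "A", 0.0)],
    ("A", "go"):   [(1.0, "B", 1.0)],
    ("B", "stay"): [(1.0, "B", 0.0)],
    ("B", "go"):   [(1.0, "A", 0.0)],
}
# evaluate the uniformly random policy
policy = {"A": {"stay": 0.5, "go": 0.5}, "B": {"stay": 0.5, "go": 0.5}}

V = {"A": 0.0, "B": 0.0}
for _ in range(1000):                      # sweep until (approximate) convergence
    V = {
        s: sum(
            pi * sum(p * (r + gamma * V[s2]) for p, s2, r in P[(s, a)])
            for a, pi in policy[s].items()
        )
        for s in V
    }
```

Each sweep is a contraction by γ, so V converges to the fixed point of Bellman's equation for this policy.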
Optimal policies and optimal value functions.
- Finding a good policy: Policy Iteration (see the talk below by Pieter Abbeel)
- Finding a good policy: Value iteration
- Asynchronous value iteration:
- Instead of updating all states on every iteration, focus on important states.
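Value iteration replaces the expectation over the policy with a max over actions; sketched on the same toy MDP as above (states A, B; A→B pays 1). Updating `V[s]` in place, state by state, is already the asynchronous flavor; a prioritized version would just pick the order of states more carefully:

```python
# Value iteration: V(s) <- max_a sum_s' P(s'|s,a) * [R + gamma * V(s')]
gamma = 0.9
P = {  # toy MDP, invented for illustration
    ("A", "stay"): [(1.0, "A", 0.0)],
    ("A", "go"):   [(1.0, "B", 1.0)],
    ("B", "stay"): [(1.0, "B", 0.0)],
    ("B", "go"):   [(1.0, "A", 0.0)],
}
actions = ["stay", "go"]

V = {"A": 0.0, "B": 0.0}
for _ in range(1000):
    for s in V:                            # in-place, state-by-state updates
        V[s] = max(
            sum(p * (r + gamma * V[s2]) for p, s2, r in P[(s, a)])
            for a in actions
        )

# read off the greedy policy from the converged values
greedy = {s: max(actions, key=lambda a: sum(p * (r + gamma * V[s2])
                                            for p, s2, r in P[(s, a)]))
          for s in V}
```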
Key challenges in RL:
- Designing the problem domain:
  - State representation
  - Action choice
  - Cost/reward signal
- Acquiring data for training:
  - Exploration / exploitation
  - High-cost actions
  - Time-delayed cost/reward signal
- Function approximation
- Validation / confidence measures
The RL lingo.
In large state spaces we need approximation:
- Fitted Q-iteration:
- Use supervised learning to estimate the Q-function from a batch of training data:
- Input, Output and Loss.
  - e.g., the Arcade Learning Environment
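The "use supervised learning to estimate the Q-function" step can be sketched on a made-up 2-state MDP (A→B pays reward 1). From a fixed batch of transitions (s, a, r, s'), each round builds regression targets y = r + γ·max Q(s', ·) and fits Q to them; the per-(s, a) mean of targets stands in here for the regressor (in practice: trees, a neural net, etc.):

```python
import random

random.seed(0)

gamma = 0.9
ACTIONS = ["stay", "go"]
P = {  # (state, action) -> (next_state, reward); a toy, deterministic MDP
    ("A", "stay"): ("A", 0.0),
    ("A", "go"):   ("B", 1.0),
    ("B", "stay"): ("B", 0.0),
    ("B", "go"):   ("A", 0.0),
}

# collect a batch once, with a random behavior policy
batch, s = [], "A"
for _ in range(500):
    a = random.choice(ACTIONS)
    s2, r = P[(s, a)]
    batch.append((s, a, r, s2))
    s = s2

Q = {(st, a): 0.0 for st in "AB" for a in ACTIONS}
for _ in range(200):                       # fitted Q-iteration rounds
    targets = {}
    for s, a, r, s2 in batch:
        y = r + gamma * max(Q[(s2, b)] for b in ACTIONS)   # regression target
        targets.setdefault((s, a), []).append(y)
    # supervised "fit": least squares per key reduces to the mean of its targets
    Q = {k: sum(v) / len(v) for k, v in targets.items()}
```

The input is (s, a), the output is the scalar target y, and the loss is squared error; that is the Input/Output/Loss triple the bullet above refers to.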
Deep Q-network (DQN) and tips.
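Two of the DQN "tips" are structural and can be sketched without a neural network (plain dicts stand in for the online and target networks here; the one-state, two-action problem is invented for illustration): (1) experience replay, learning from randomly sampled past transitions, and (2) a frozen target network, synced only occasionally, to stabilize the bootstrap targets.

```python
import random
from collections import deque

random.seed(1)

replay = deque(maxlen=10_000)              # (s, a, r, s2) transitions
q, q_target = {}, {}                       # dicts standing in for networks
alpha, gamma, sync_every = 0.1, 0.99, 100

def train_step(step):
    global q_target
    s, a, r, s2 = random.choice(replay)    # (1) sample uniformly from replay
    best_next = max(q_target.get((s2, b), 0.0) for b in (0, 1))
    target = r + gamma * best_next         # (2) bootstrap from the frozen copy
    key = (s, a)
    q[key] = q.get(key, 0.0) + alpha * (target - q.get(key, 0.0))
    if step % sync_every == 0:
        q_target = dict(q)                 # periodically sync target <- online

# fill the buffer on a made-up problem: in state 0, action 1 pays reward 1
for _ in range(1000):
    a = random.choice((0, 1))
    replay.append((0, a, float(a), 0))
for step in range(5000):
    train_step(step)
```

Without the target copy, the max in the target chases its own updates; freezing it between syncs is what makes DQN training markedly more stable.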
Deep Reinforcement Learning by Pieter Abbeel, EE & CS, UC Berkeley
Why Policy Optimization?
Cross Entropy Method (CEM) / Finite Differences / Fixing Random Seed
Likelihood Ratio (LR) Policy Gradient
Natural Gradient / Trust Regions (-> TRPO)
Actor-Critic (-> GAE, A3C)
Path Derivatives (PD) (-> DPG, DDPG, SVG)
Stochastic Computation Graphs (generalizes LR / PD)
Guided Policy Search (GPS)
Inverse Reinforcement Learning
- Inverse RL vs. behavioral cloning
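Of the topics above, the likelihood-ratio policy gradient is the simplest to sketch. On a two-armed bandit with a softmax policy (arm payoffs invented; arm 1 is better on average), the score-function estimator ∇J(θ) = E[∇ log π(a|θ) · R] gives the update below:

```python
import math
import random

random.seed(0)

theta = [0.0, 0.0]                         # one logit per arm
payoff = {0: 0.2, 1: 0.8}                  # made-up mean reward of each arm
lr = 0.1

def softmax(logits):
    m = max(logits)                        # shift for numerical stability
    exps = [math.exp(x - m) for x in logits]
    z = sum(exps)
    return [e / z for e in exps]

for _ in range(2000):
    probs = softmax(theta)
    a = random.choices([0, 1], weights=probs)[0]    # sample a ~ pi(.|theta)
    r = 1.0 if random.random() < payoff[a] else 0.0
    for k in range(2):
        # grad of log softmax: 1[k == a] - pi(k)
        grad_log = (1.0 if k == a else 0.0) - probs[k]
        theta[k] += lr * grad_log * r      # likelihood-ratio (REINFORCE) update
```

The same estimator scales from this bandit to full trajectories; subtracting a baseline (the actor-critic idea listed above) only reduces its variance, it does not bias it.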
Explanation with implementation for some of the topics mentioned in the Deep Reinforcement Learning talk, written by Arthur Juliani:
- The TF / Python implementations can be found here.
- Part 0 — Q-Learning Agents
- Part 1 — Two-Armed Bandit
- Part 1.5 — Contextual Bandits
- Part 2 — Policy-Based Agents
- Part 3 — Model-Based RL
- Part 4 — Deep Q-Networks and Beyond
- Part 5 — Visualizing an Agent’s Thoughts and Actions
- Part 6 — Partial Observability and Deep Recurrent Q-Networks
- Part 7 — Action-Selection Strategies for Exploration
- Part 8 — Asynchronous Actor-Critic Agents (A3C)
- Before starting on the books, here is a neat overview of Deep RL by Yuxi Li:
- Reinforcement Learning: An Introduction by Richard S. Sutton and Andrew G. Barto
- Algorithms for Reinforcement Learning by Csaba Szepesvári.
- Reinforcement Learning and Dynamic Programming using Function Approximators by Lucian Busoniu, Robert Babuska, Bart De Schutter, and Damien Ernst.
Reinforcement Learning by David Silver.
- Lecture 1: Introduction to Reinforcement Learning
- Lecture 2: Markov Decision Processes
- Lecture 3: Planning by Dynamic Programming
- Lecture 4: Model-Free Prediction
- Lecture 5: Model-Free Control
- Lecture 6: Value Function Approximation
- Lecture 7: Policy Gradient Methods
- Lecture 8: Integrating Learning and Planning
- Lecture 9: Exploration and Exploitation
- Lecture 10: Case Study: RL in Classic Games
CS 294: Deep Reinforcement Learning, Spring 2017 by John Schulman and Pieter Abbeel.
- Instructors: Sergey Levine, John Schulman, Chelsea Finn
- My Bad Notes
