Finalized Unit 2

This commit is contained in:
simoninithomas
2022-12-03 11:54:15 +01:00
parent c5a39a171c
commit 8116d87b8f
11 changed files with 55 additions and 19 deletions

View File

@@ -30,6 +30,8 @@
title: Quiz
- local: unit1/conclusion
title: Conclusion
- local: unit1/additional-readings
title: Additional Readings
- title: Bonus Unit 1. Introduction to Deep Reinforcement Learning with Huggy
sections:
- local: unitbonus1/introduction
@@ -60,8 +62,8 @@
title: Second Quiz
- local: unit2/conclusion
title: Conclusion
- local: unit2/additional-readings
title: Additional Readings
- title: Unit 3. Deep Q-Learning with Atari Games
sections:
- local: unit3/introduction
@@ -78,8 +80,8 @@
title: Quiz
- local: unit3/conclusion
title: Conclusion
- local: unit3/additional-readings
title: Additional Readings
- title: Unit Bonus 2. Automatic Hyperparameter Tuning with Optuna
sections:
- local: unitbonus2/introduction

View File

@@ -0,0 +1,11 @@
# Additional Readings [[additional-readings]]
## Deep Reinforcement Learning [[deep-rl]]
- [Reinforcement Learning: An Introduction, Richard Sutton and Andrew G. Barto, Chapters 1, 2 and 3](http://incompleteideas.net/book/RLbook2020.pdf)
- [Foundations of Deep RL Series, L1 MDPs, Exact Solution Methods, Max-ent RL by Pieter Abbeel](https://youtu.be/2GwBez0D20A)
- [Spinning Up RL by OpenAI Part 1: Key concepts of RL](https://spinningup.openai.com/en/latest/spinningup/rl_intro.html)
## Gym [[gym]]
- [Getting Started With OpenAI Gym: The Basic Building Blocks](https://blog.paperspace.com/getting-started-with-openai-gym/)

View File

@@ -1 +0,0 @@
# Additional Reading [[additional-reading]]

View File

@@ -0,0 +1,13 @@
# Additional Readings [[additional-readings]]
## Monte Carlo and TD Learning [[mc-td]]
To dive deeper into Monte Carlo and Temporal Difference Learning:
- <a href="https://stats.stackexchange.com/questions/355820/why-do-temporal-difference-td-methods-have-lower-variance-than-monte-carlo-met">Why do temporal difference (TD) methods have lower variance than Monte Carlo methods?</a>
- <a href="https://stats.stackexchange.com/questions/336974/when-are-monte-carlo-methods-preferred-over-temporal-difference-ones"> When are Monte Carlo methods preferred over temporal difference ones?</a>
## Q-Learning [[q-learning]]
- <a href="http://incompleteideas.net/book/RLbook2020.pdf">Reinforcement Learning: An Introduction, Richard Sutton and Andrew G. Barto Chapter 5, 6 and 7</a>
- <a href="https://youtu.be/Psrhxy88zww">Foundations of Deep RL Series, L2 Deep Q-Learning by Pieter Abbeel</a>

View File

@@ -5,9 +5,9 @@ The Bellman equation **simplifies our state value or state-action value calcula
<img src="https://huggingface.co/datasets/huggingface-deep-rl-course/course-images/resolve/main/en/unit3/bellman.jpg" alt="Bellman equation"/>
With what we learned so far, we know that if we calculate the \\(V(S_t)\\) (value of a state), we need to calculate the return starting at that state and then follow the policy forever after. **(Our policy that we defined in the following example is a Greedy Policy, and for simplification, we don't discount the reward).**
So to calculate \\(V(S_t)\\), we need to calculate the sum of the expected rewards. Hence:
<figure>
<img src="https://huggingface.co/datasets/huggingface-deep-rl-course/course-images/resolve/main/en/unit3/bellman2.jpg" alt="Bellman equation"/>
@@ -35,7 +35,7 @@ The Bellman equation is a recursive equation that works like this: instead of st
</figure>
If we go back to our example, we can say that the value of State 1 is equal to the expected cumulative return if we start at that state.
<img src="https://huggingface.co/datasets/huggingface-deep-rl-course/course-images/resolve/main/en/unit3/bellman2.jpg" alt="Bellman equation"/>

View File

@@ -1,2 +1,9 @@
# Hands-on [[hands-on]]
Now that we've studied the Q-Learning algorithm, let's implement it from scratch and train our Q-Learning agent in two environments:
1. [Frozen-Lake-v1 (non-slippery and slippery version)](https://www.gymlibrary.dev/environments/toy_text/frozen_lake/) ☃️: where our agent will need to **go from the starting state (S) to the goal state (G)** by walking only on frozen tiles (F) and avoiding holes (H).
2. [An autonomous taxi](https://www.gymlibrary.dev/environments/toy_text/taxi/) 🚖 will need **to learn to navigate** a city to **transport its passengers from point A to point B.**
<img src="https://huggingface.co/datasets/huggingface-deep-rl-course/course-images/resolve/main/en/unit3/envs.gif" alt="Environments"/>
Thanks to a [leaderboard](https://huggingface.co/spaces/huggingface-projects/Deep-Reinforcement-Learning-Leaderboard), you'll be able to compare your results with other classmates and exchange best practices to improve your agent's scores. Who will win the challenge for Unit 2?

View File

@@ -1,6 +1,7 @@
# Introduction to Q-Learning [[introduction-q-learning]]
<img src="https://huggingface.co/datasets/huggingface-deep-rl-course/course-images/resolve/main/en/unit3/thumbnail.jpg" alt="Unit 2 thumbnail" width="100%">
In the first chapter of this class, we learned about Reinforcement Learning (RL), the RL process, and the different methods to solve an RL problem. We also **trained our first agents and uploaded them to the Hugging Face Hub.**
@@ -14,13 +15,11 @@ We'll also **implement our first RL agent from scratch**: a Q-Learning agent an
<img src="https://huggingface.co/datasets/huggingface-deep-rl-course/course-images/resolve/main/en/unit3/envs.gif" alt="Environments"/>
Concretely, we'll:
- Learn about **value-based methods**.
- Learn about the **differences between Monte Carlo and Temporal Difference Learning**.
- Study and implement **our first RL algorithm**: Q-Learning.
This unit is **fundamental if you want to be able to work on Deep Q-Learning**: the first Deep RL algorithm that played Atari games and surpassed human-level performance on some of them (Breakout, Space Invaders…).

View File

@@ -121,6 +121,6 @@ Now we **continue to interact with this environment with our updated value func
If we summarize:
- With *Monte Carlo*, we update the value function from a complete episode, and so we **use the actual accurate discounted return of this episode.**
- With *TD Learning*, we update the value function from a step, so we replace \\(G_t\\) that we don't have with **an estimated return called TD target.**
<img src="https://huggingface.co/datasets/huggingface-deep-rl-course/course-images/resolve/main/en/unit3/Summary.jpg" alt="Summary"/>

View File

@@ -22,4 +22,4 @@ And to find this optimal policy (hence solving the RL problem), there **are two
<img src="https://huggingface.co/datasets/huggingface-deep-rl-course/course-images/resolve/main/en/unit3/two-approaches.jpg" alt="Two RL approaches"/>
And in this unit, **we'll dive deeper into the value-based methods.**

View File

@@ -1 +0,0 @@
# Additional Reading [[additional-reading]]

View File

@@ -0,0 +1,6 @@
# Additional Readings [[additional-readings]]
- [Foundations of Deep RL Series, L2 Deep Q-Learning by Pieter Abbeel](https://youtu.be/Psrhxy88zww)
- [Playing Atari with Deep Reinforcement Learning](https://arxiv.org/abs/1312.5602)
- [Double Q-Learning](https://papers.nips.cc/paper/2010/hash/091d584fced301b442654dd8c23b3fc9-Abstract.html)
- [Prioritized Experience Replay](https://arxiv.org/abs/1511.05952)