diff --git a/units/en/_toctree.yml b/units/en/_toctree.yml
index d5541fc..f75636e 100644
--- a/units/en/_toctree.yml
+++ b/units/en/_toctree.yml
@@ -30,6 +30,8 @@
     title: Quiz
   - local: unit1/conclusion
     title: Conclusion
+  - local: unit1/additional-readings
+    title: Additional Readings
 - title: Bonus Unit 1. Introduction to Deep Reinforcement Learning with Huggy
   sections:
   - local: unitbonus1/introduction
@@ -60,8 +62,8 @@
     title: Second Quiz
   - local: unit2/conclusion
     title: Conclusion
-  - local: unit2/additional-reading
-    title: Additional Reading
+  - local: unit2/additional-readings
+    title: Additional Readings
 - title: Unit 3. Deep Q-Learning with Atari Games
   sections:
   - local: unit3/introduction
@@ -78,8 +80,8 @@
     title: Quiz
   - local: unit3/conclusion
     title: Conclusion
-  - local: unit3/additional-reading
-    title: Additional Reading
+  - local: unit3/additional-readings
+    title: Additional Readings
 - title: Unit Bonus 2. Automatic Hyperparameter Tuning with Optuna
   sections:
   - local: unitbonus2/introduction
diff --git a/units/en/unit1/additional-readings.mdx b/units/en/unit1/additional-readings.mdx
new file mode 100644
index 0000000..73e6a1e
--- /dev/null
+++ b/units/en/unit1/additional-readings.mdx
@@ -0,0 +1,11 @@
+# Additional Readings [[additional-readings]]
+
+## Deep Reinforcement Learning [[deep-rl]]
+
+- [Reinforcement Learning: An Introduction, Richard Sutton and Andrew G. Barto, Chapters 1, 2 and 3](http://incompleteideas.net/book/RLbook2020.pdf)
+- [Foundations of Deep RL Series, L1 MDPs, Exact Solution Methods, Max-ent RL by Pieter Abbeel](https://youtu.be/2GwBez0D20A)
+- [Spinning Up RL by OpenAI, Part 1: Key concepts of RL](https://spinningup.openai.com/en/latest/spinningup/rl_intro.html)
+
+## Gym [[gym]]
+
+- [Getting Started With OpenAI Gym: The Basic Building Blocks](https://blog.paperspace.com/getting-started-with-openai-gym/)
diff --git a/units/en/unit2/additional-reading.mdx b/units/en/unit2/additional-reading.mdx
deleted file mode 100644
index 46a60ee..0000000
--- a/units/en/unit2/additional-reading.mdx
+++ /dev/null
@@ -1 +0,0 @@
-# Additional Reading [[additional-reading]]
diff --git a/units/en/unit2/additional-readings.mdx b/units/en/unit2/additional-readings.mdx
new file mode 100644
index 0000000..ebc3fa9
--- /dev/null
+++ b/units/en/unit2/additional-readings.mdx
@@ -0,0 +1,13 @@
+# Additional Readings [[additional-readings]]
+
+## Monte Carlo and TD Learning [[mc-td]]
+
+To dive deeper into Monte Carlo and Temporal Difference Learning:
+
+- Why do temporal difference (TD) methods have lower variance than Monte Carlo methods?
+- When are Monte Carlo methods preferred over temporal difference ones?
+
+## Q-Learning [[q-learning]]
+
+- Reinforcement Learning: An Introduction, Richard Sutton and Andrew G. Barto, Chapters 5, 6 and 7
+- Foundations of Deep RL Series, L2 Deep Q-Learning by Pieter Abbeel
Bellman equation @@ -35,7 +35,7 @@ The Bellman equation is a recursive equation that works like this: instead of st
-If we go back to our example, the value of State 1= expected cumulative return if we start at that state. +If we go back to our example, we can say that the value of State 1 is equal to the expected cumulative return if we start at that state. Bellman equation diff --git a/units/en/unit2/hands-on.mdx b/units/en/unit2/hands-on.mdx index a9f9bb1..7d4365c 100644 --- a/units/en/unit2/hands-on.mdx +++ b/units/en/unit2/hands-on.mdx @@ -1,2 +1,9 @@ # Hands-on [[hands-on]] -n + +Now that we studied the Q-Learning algorithm, let's implement it from scratch and train our Q-Learning agent in two environments: +1. [Frozen-Lake-v1 (non-slippery and slippery version)](https://www.gymlibrary.dev/environments/toy_text/frozen_lake/) ☃️ : where our agent will need to **go from the starting state (S) to the goal state (G)** by walking only on frozen tiles (F) and avoiding holes (H). +2. [An autonomous taxi](https://www.gymlibrary.dev/environments/toy_text/taxi/) 🚖 will need **to learn to navigate** a city to **transport its passengers from point A to point B.** + +Environments + +Thanks to a [leaderboard](https://huggingface.co/spaces/huggingface-projects/Deep-Reinforcement-Learning-Leaderboard), you'll be able to compare your results with other classmates and exchange the best practices to improve your agent's scores Who will win the challenge for Unit 2? diff --git a/units/en/unit2/introduction.mdx b/units/en/unit2/introduction.mdx index a05b237..98c56d5 100644 --- a/units/en/unit2/introduction.mdx +++ b/units/en/unit2/introduction.mdx @@ -1,6 +1,7 @@ # Introduction to Q-Learning [[introduction-q-learning]] -ADD THUMBNAIL +Unit 2 thumbnail + In the first chapter of this class, we learned about Reinforcement Learning (RL), the RL process, and the different methods to solve an RL problem. We also **trained our first agents and uploaded them to the Hugging Face Hub.** @@ -14,13 +15,11 @@ We'll also **implement our first RL agent from scratch**: a Q-Learning agent an Environments +Concretely, we will: -Concretely, we'll: - -* learn about value-based methods -* learn about the differences between Monte Carlo and Temporal Difference Learning -* study and implement our first RL algorithm: Q-Learning -* implement our first RL agent +- Learn about **value-based methods**. +- Learn about the **differences between Monte Carlo and Temporal Difference Learning**. +- Study and implement **our first RL algorithm**: Q-Learning.s This unit is **fundamental if you want to be able to work on Deep Q-Learning**: the first Deep RL algorithm that played Atari games and beat the human level on some of them (breakout, space invaders…). 
diff --git a/units/en/unit2/mc-vs-td.mdx b/units/en/unit2/mc-vs-td.mdx index 5410237..e78ee78 100644 --- a/units/en/unit2/mc-vs-td.mdx +++ b/units/en/unit2/mc-vs-td.mdx @@ -121,6 +121,6 @@ Now we **continue to interact with this environment with our updated value func If we summarize: - With *Monte Carlo*, we update the value function from a complete episode, and so we **use the actual accurate discounted return of this episode.** - - With *TD learning*, we update the value function from a step, so we replace \\(G_t\\) that we don't have with **an estimated return called TD target.** + - With *TD Learning*, we update the value function from a step, so we replace \\(G_t\\) that we don't have with **an estimated return called TD target.** Summary diff --git a/units/en/unit2/what-is-rl.mdx b/units/en/unit2/what-is-rl.mdx index fef3e96..2c31486 100644 --- a/units/en/unit2/what-is-rl.mdx +++ b/units/en/unit2/what-is-rl.mdx @@ -22,4 +22,4 @@ And to find this optimal policy (hence solving the RL problem), there **are two Two RL approaches -And in this unit, **we'll dive deeper into the Value-based methods.** +And in this unit, **we'll dive deeper into the value-based methods.** diff --git a/units/en/unit3/additional-reading.mdx b/units/en/unit3/additional-reading.mdx deleted file mode 100644 index 46a60ee..0000000 --- a/units/en/unit3/additional-reading.mdx +++ /dev/null @@ -1 +0,0 @@ -# Additional Reading [[additional-reading]] diff --git a/units/en/unit3/additional-readings.mdx b/units/en/unit3/additional-readings.mdx new file mode 100644 index 0000000..1c91b69 --- /dev/null +++ b/units/en/unit3/additional-readings.mdx @@ -0,0 +1,6 @@ +# Additional Readings [[additional-readings]] + +- [Foundations of Deep RL Series, L2 Deep Q-Learning by Pieter Abbeel](https://youtu.be/Psrhxy88zww) +- [Playing Atari with Deep Reinforcement Learning](https://arxiv.org/abs/1312.5602) +- [Double Deep Q-Learning](https://papers.nips.cc/paper/2010/hash/091d584fced301b442654dd8c23b3fc9-Abstract.html) +- [Prioritized Experience Replay](https://arxiv.org/abs/1511.05952)