From 9b531c3be00359013c54b192a0b1b486a31cf5b3 Mon Sep 17 00:00:00 2001
From: Thomas Simonini
Date: Sat, 31 Dec 2022 20:52:40 +0100
Subject: [PATCH] Some small updates

---
 units/en/_toctree.yml                            |  6 +++++-
 units/en/live1/live1.mdx                         | 10 ++++++++++
 units/en/unit1/conclusion.mdx                    |  6 ++++++
 units/en/unit2/conclusion.mdx                    |  2 ++
 units/en/unit2/two-types-value-based-methods.mdx |  2 +-
 units/en/unit3/conclusion.mdx                    |  3 +++
 units/en/unit3/from-q-to-dqn.mdx                 |  2 +-
 units/en/unitbonus1/conclusion.mdx               |  4 +++-
 units/en/unitbonus2/hands-on.mdx                 |  5 +++++
 9 files changed, 36 insertions(+), 4 deletions(-)
 create mode 100644 units/en/live1/live1.mdx

diff --git a/units/en/_toctree.yml b/units/en/_toctree.yml
index a46425e..be9464a 100644
--- a/units/en/_toctree.yml
+++ b/units/en/_toctree.yml
@@ -46,6 +46,10 @@
     title: Play with Huggy
   - local: unitbonus1/conclusion
     title: Conclusion
+- title: Live 1. How the course works, Q&A, and playing with Huggy 🐶
+  sections:
+  - local: live1/live1
+    title: Live 1. How the course works, Q&A, and playing with Huggy 🐶
 - title: Unit 2. Introduction to Q-Learning
   sections:
   - local: unit2/introduction
@@ -96,7 +100,7 @@
     title: Conclusion
   - local: unit3/additional-readings
     title: Additional Readings
-- title: Unit Bonus 2. Automatic Hyperparameter Tuning with Optuna
+- title: Bonus Unit 2. Automatic Hyperparameter Tuning with Optuna
   sections:
   - local: unitbonus2/introduction
     title: Introduction
diff --git a/units/en/live1/live1.mdx b/units/en/live1/live1.mdx
new file mode 100644
index 0000000..821ad23
--- /dev/null
+++ b/units/en/live1/live1.mdx
@@ -0,0 +1,10 @@
+# Live 1: Deep RL Course. Intro, Q&A, and playing with Huggy 🐶
+
+In this first live stream, we explained how the course works (scope, units, challenges, and more) and answered your questions.
+
+And finally, we watched some LunarLander agents you've trained and played with your Huggies 🐶.
+
+
+
+To know when the next live session is scheduled, **check the Discord server**.
+We will also send **you an email**. If you can't participate, don't worry, we record the live sessions.
\ No newline at end of file
diff --git a/units/en/unit1/conclusion.mdx b/units/en/unit1/conclusion.mdx
index de31951..d84665b 100644
--- a/units/en/unit1/conclusion.mdx
+++ b/units/en/unit1/conclusion.mdx
@@ -14,3 +14,9 @@
 You will be able then to play with him 🤗.
+
+Finally, we would love **to hear what you think of the course and how we can improve it**. If you have any feedback, please 👉 [fill out this form](https://forms.gle/BzKXWzLAGZESGNaE9)
+
+### Keep Learning, stay awesome 🤗
+
+
diff --git a/units/en/unit2/conclusion.mdx b/units/en/unit2/conclusion.mdx
index f271ce0..42ad84e 100644
--- a/units/en/unit2/conclusion.mdx
+++ b/units/en/unit2/conclusion.mdx
@@ -15,5 +15,7 @@
 In the next chapter, we're going to dive deeper by studying our first Deep Reinforcement Learning algorithm.
 
 Atari environments
 
+Finally, we would love **to hear what you think of the course and how we can improve it**. If you have any feedback, please 👉 [fill out this form](https://forms.gle/BzKXWzLAGZESGNaE9)
 ### Keep Learning, stay awesome 🤗
+
diff --git a/units/en/unit2/two-types-value-based-methods.mdx b/units/en/unit2/two-types-value-based-methods.mdx
index df83311..5ceb191 100644
--- a/units/en/unit2/two-types-value-based-methods.mdx
+++ b/units/en/unit2/two-types-value-based-methods.mdx
@@ -62,7 +62,7 @@ For each state, the state-value function outputs the expected return if the agen
 In the action-value function, for each state and action pair, the action-value function **outputs the expected return** if the agent starts in that state and takes that action, and then follows the policy forever after.
 
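Reviewer note, for context on the formula line this hunk corrects: the action-value function described above has the standard definition (notation as in Sutton and Barto; the rendered page shows it as an image):

```latex
Q_{\pi}(s, a) = \mathbb{E}_{\pi}\!\left[\sum_{k=0}^{\infty} \gamma^{k} R_{t+k+1} \,\middle|\, S_t = s, A_t = a\right]
```

That is, the expected discounted return when starting in state \\(s\\), taking action \\(a\\), and following policy \\(π\\) thereafter.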
-The value of taking action an in state \\(s\\) under a policy \\(π\\) is:
+The value of taking action \\(a\\) in state \\(s\\) under a policy \\(π\\) is:
 
 Action State value function
 Action State value function
diff --git a/units/en/unit3/conclusion.mdx b/units/en/unit3/conclusion.mdx
index 1e3592d..5b9754d 100644
--- a/units/en/unit3/conclusion.mdx
+++ b/units/en/unit3/conclusion.mdx
@@ -11,4 +11,7 @@ Don't hesitate to train your agent in other environments (Pong, Seaquest, QBert,
 In the next unit, **we're going to learn about Optuna**. One of the most critical tasks in Deep Reinforcement Learning is to find a good set of training hyperparameters. And Optuna is a library that helps you to automate the search.
 
+Finally, we would love **to hear what you think of the course and how we can improve it**. If you have any feedback, please 👉 [fill out this form](https://forms.gle/BzKXWzLAGZESGNaE9)
+
 ### Keep Learning, stay awesome 🤗
+
diff --git a/units/en/unit3/from-q-to-dqn.mdx b/units/en/unit3/from-q-to-dqn.mdx
index d4f77a8..5b119c2 100644
--- a/units/en/unit3/from-q-to-dqn.mdx
+++ b/units/en/unit3/from-q-to-dqn.mdx
@@ -13,7 +13,7 @@ Internally, our Q-function has **a Q-table, a table where each cell corresponds
 The problem is that Q-Learning is a *tabular method*. This becomes a problem when the state and action spaces **are not small enough to be represented efficiently as arrays and tables**. In other words, it is **not scalable**.
 Q-Learning worked well with small state space environments like:
 
-- FrozenLake, we had 14 states.
+- FrozenLake, we had 16 states.
 - Taxi-v3, we had 500 states.
 
 But think of what we're going to do today: we will train an agent to learn to play Space Invaders, a more complex game, using the frames as input.
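Reviewer note on the scale gap this hunk touches (FrozenLake has 16 states, not 14): a quick sketch of why tabular Q-Learning stops scaling. The state and action counts below come from the course text; the Atari frame dimensions are the standard 210×160 grayscale screen.

```python
import numpy as np

# Tabular Q-Learning: one table entry per (state, action) pair.
# FrozenLake-v1 (4x4 grid): 16 states, 4 actions -> a tiny table.
frozenlake_q = np.zeros((16, 4))
print(frozenlake_q.size)  # 64 entries

# Taxi-v3: 500 states, 6 actions -> still easily tabular.
taxi_q = np.zeros((500, 6))
print(taxi_q.size)  # 3000 entries

# Space Invaders from raw frames: a 210x160 grayscale screen with
# 256 possible values per pixel. The number of distinct states is
# 256 ** (210 * 160) -- far too many to enumerate in any table,
# which is why DQN approximates Q(s, a) with a neural network.
atari_states = 256 ** (210 * 160)
print(atari_states > 10 ** 80000)  # True: astronomically large
```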
diff --git a/units/en/unitbonus1/conclusion.mdx b/units/en/unitbonus1/conclusion.mdx
index 57cd254..d715edc 100644
--- a/units/en/unitbonus1/conclusion.mdx
+++ b/units/en/unitbonus1/conclusion.mdx
@@ -6,5 +6,7 @@ You can now sit and enjoy playing with your Huggy 🐶. And don't **forget to sp
 Huggy cover
 
+Finally, we would love **to hear what you think of the course and how we can improve it**. If you have any feedback, please 👉 [fill out this form](https://forms.gle/BzKXWzLAGZESGNaE9)
+
+### Keep Learning, stay awesome 🤗
-### Keep Learning, Stay Awesome 🤗
diff --git a/units/en/unitbonus2/hands-on.mdx b/units/en/unitbonus2/hands-on.mdx
index 4130e38..a49dcf7 100644
--- a/units/en/unitbonus2/hands-on.mdx
+++ b/units/en/unitbonus2/hands-on.mdx
@@ -9,3 +9,8 @@ Now that you've learned to use Optuna, we give you some ideas to apply what you'
 By doing that, you're going to see how Optuna is valuable and powerful in training better agents.
 
 Have fun,
+
+Finally, we would love **to hear what you think of the course and how we can improve it**. If you have any feedback, please 👉 [fill out this form](https://forms.gle/BzKXWzLAGZESGNaE9)
+
+### Keep Learning, stay awesome 🤗
+