From d48dc1ad6cdc16b7e71ddc9087047ae8ca7c62c0 Mon Sep 17 00:00:00 2001
From: Thomas Simonini
Date: Sat, 31 Dec 2022 20:23:46 +0100
Subject: [PATCH 01/11] Update introduction.mdx

---
 units/en/unit0/introduction.mdx | 9 +++++----
 1 file changed, 5 insertions(+), 4 deletions(-)

diff --git a/units/en/unit0/introduction.mdx b/units/en/unit0/introduction.mdx
index 3118d0d..aba76dd 100644
--- a/units/en/unit0/introduction.mdx
+++ b/units/en/unit0/introduction.mdx
@@ -23,7 +23,7 @@ In this course, you will:
 - 📖 Study Deep Reinforcement Learning in **theory and practice.**
 - 🧑‍💻 Learn to **use famous Deep RL libraries** such as [Stable Baselines3](https://stable-baselines3.readthedocs.io/en/master/), [RL Baselines3 Zoo](https://github.com/DLR-RM/rl-baselines3-zoo), [Sample Factory](https://samplefactory.dev/) and [CleanRL](https://github.com/vwxyzjn/cleanrl).
-- 🤖 **Train agents in unique environments** such as [SnowballFight](https://huggingface.co/spaces/ThomasSimonini/SnowballFight), [Huggy the Doggo 🐶](https://huggingface.co/spaces/ThomasSimonini/Huggy), [MineRL (Minecraft ⛏️)](https://minerl.io/), [VizDoom (Doom)](https://vizdoom.cs.put.edu.pl/) and classical ones such as [Space Invaders](https://www.gymlibrary.dev/environments/atari/) and [PyBullet](https://pybullet.org/wordpress/).
+- 🤖 **Train agents in unique environments** such as [SnowballFight](https://huggingface.co/spaces/ThomasSimonini/SnowballFight), [Huggy the Doggo 🐶](https://huggingface.co/spaces/ThomasSimonini/Huggy), [VizDoom (Doom)](https://vizdoom.cs.put.edu.pl/) and classical ones such as [Space Invaders](https://www.gymlibrary.dev/environments/atari/), [PyBullet](https://pybullet.org/wordpress/) and more.
 - 💾 Share your **trained agents with one line of code to the Hub** and also download powerful agents from the community.
 - 🏆 Participate in challenges where you will **evaluate your agents against other teams. You'll also get to play against the agents you'll train.**
@@ -58,7 +58,8 @@ You can choose to follow this course either:
 Both paths **are completely free**. Whatever path you choose, we advise you **to follow the recommended pace to enjoy the course and challenges with your fellow classmates.**
-You don't need to tell us which path you choose. At the end of March, when we verify the assignments **if you get more than 80% of the assignments done, you'll get a certificate.**
+
+You don't need to tell us which path you choose. At the end of March, when we verify the assignments, **if you get more than 80% of the assignments done, you'll get a certificate.**

 ## The Certification Process [[certification-process]]
@@ -92,7 +93,7 @@ You need only 3 things:
 ## What is the publishing schedule? [[publishing-schedule]]

-We publish **a new unit every Monday** (except Monday, the 26th of December).
+We publish **a new unit every Tuesday**.

 Schedule 1
 Schedule 2
@@ -128,7 +129,7 @@ In this new version of the course, you have two types of challenges:
 Challenges

-These AI vs.AI challenges will be announced **later in December**.
+These AI vs. AI challenges will be announced **in January**.

 ## I found a bug, or I want to improve the course [[contribute]]

From 8a68a5e8dc0a7683b6d8769b0b4f9a4befcc8394 Mon Sep 17 00:00:00 2001
From: Thomas Simonini
Date: Sat, 31 Dec 2022 20:25:03 +0100
Subject: [PATCH 02/11] Update setup.mdx

---
 units/en/unit0/setup.mdx | 1 +
 1 file changed, 1 insertion(+)

diff --git a/units/en/unit0/setup.mdx b/units/en/unit0/setup.mdx
index 4fc55bb..d02fd75 100644
--- a/units/en/unit0/setup.mdx
+++ b/units/en/unit0/setup.mdx
@@ -21,6 +21,7 @@ We have multiple RL-related channels:
 - `rl-announcements`: where we give the last information about the course.
 - `rl-discussions`: where you can exchange about RL and share information.
 - `rl-study-group`: where you can create and join study groups.
+- `rl-i-made-this`: where you can share your projects and models.
 If this is your first time using Discord, we wrote a Discord 101 to get the best practices. Check the next section.

From 10d539f24f78e808d1c3b2db4ac5bbf940c66100 Mon Sep 17 00:00:00 2001
From: Thomas Simonini
Date: Sat, 31 Dec 2022 20:31:21 +0100
Subject: [PATCH 03/11] Update discord101.mdx

---
 units/en/unit0/discord101.mdx | 8 +++++++-
 1 file changed, 7 insertions(+), 1 deletion(-)

diff --git a/units/en/unit0/discord101.mdx b/units/en/unit0/discord101.mdx
index f970432..9904168 100644
--- a/units/en/unit0/discord101.mdx
+++ b/units/en/unit0/discord101.mdx
@@ -9,7 +9,13 @@ Discord is a free chat platform. If you've used Slack, **it's quite similar**. T
 Starting in Discord can be a bit intimidating, so let me take you through it.

-When you sign-up to our Discord server, you'll need to specify which topics you're interested in by **clicking #role-assignment at the left**. Here, you can pick different categories. Make sure to **click "Reinforcement Learning"**! :fire:. You'll then get to **introduce yourself in the `#introduction-yourself` channel**.
+When you sign up to our Discord server, you'll need to specify which topics you're interested in by **clicking #role-assignment at the left**.
+
+Discord
+
+In #role-assignment, you can pick different categories. Make sure to **click "Reinforcement Learning"**. You'll then get to **introduce yourself in the `#introduction-yourself` channel**.
+
+Discord

 ## So which channels are interesting to me? [[channels]]

From 9b531c3be00359013c54b192a0b1b486a31cf5b3 Mon Sep 17 00:00:00 2001
From: Thomas Simonini
Date: Sat, 31 Dec 2022 20:52:40 +0100
Subject: [PATCH 04/11] Some small updates

---
 units/en/_toctree.yml                            |  6 +++++-
 units/en/live1/live1.mdx                         | 10 ++++++++++
 units/en/unit1/conclusion.mdx                    |  6 ++++++
 units/en/unit2/conclusion.mdx                    |  2 ++
 units/en/unit2/two-types-value-based-methods.mdx |  2 +-
 units/en/unit3/conclusion.mdx                    |  3 +++
 units/en/unit3/from-q-to-dqn.mdx                 |  2 +-
 units/en/unitbonus1/conclusion.mdx               |  4 +++-
 units/en/unitbonus2/hands-on.mdx                 |  5 +++++
 9 files changed, 36 insertions(+), 4 deletions(-)
 create mode 100644 units/en/live1/live1.mdx

diff --git a/units/en/_toctree.yml b/units/en/_toctree.yml
index a46425e..be9464a 100644
--- a/units/en/_toctree.yml
+++ b/units/en/_toctree.yml
@@ -46,6 +46,10 @@ title: Play with Huggy
   - local: unitbonus1/conclusion
     title: Conclusion
+- title: Live 1. How the course works, Q&A, and playing with Huggy 🐶
+  sections:
+  - local: live1/live1.mdx
+    title: Live 1. How the course works, Q&A, and playing with Huggy 🐶
 - title: Unit 2. Introduction to Q-Learning
   sections:
@@ -96,7 +100,7 @@ title: Conclusion
   - local: unit3/additional-readings
     title: Additional Readings
-- title: Unit Bonus 2. Automatic Hyperparameter Tuning with Optuna
+- title: Bonus Unit 2. Automatic Hyperparameter Tuning with Optuna
   sections:
   - local: unitbonus2/introduction
     title: Introduction

diff --git a/units/en/live1/live1.mdx b/units/en/live1/live1.mdx
new file mode 100644
index 0000000..821ad23
--- /dev/null
+++ b/units/en/live1/live1.mdx
@@ -0,0 +1,10 @@
+# Live 1: Deep RL Course. Intro, Q&A, and playing with Huggy 🐶
+
+In this first live stream, we explained how the course works (scope, units, challenges, and more) and answered your questions.
+
+And finally, we saw some LunarLander agents you've trained and played with your Huggies 🐶.
+
+
+
+
+To know when the next live is scheduled, **check the Discord server**. We will also send **you an email**. If you can't participate, don't worry, we record the live sessions.
\ No newline at end of file

diff --git a/units/en/unit1/conclusion.mdx b/units/en/unit1/conclusion.mdx
index de31951..d84665b 100644
--- a/units/en/unit1/conclusion.mdx
+++ b/units/en/unit1/conclusion.mdx
@@ -14,3 +14,9 @@ You will then be able to play with him 🤗.
+
+Finally, we would love **to hear what you think of the course and how we can improve it**. If you have any feedback, please 👉 [fill out this form](https://forms.gle/BzKXWzLAGZESGNaE9)
+
+### Keep Learning, stay awesome 🤗
+
+

diff --git a/units/en/unit2/conclusion.mdx b/units/en/unit2/conclusion.mdx
index f271ce0..42ad84e 100644
--- a/units/en/unit2/conclusion.mdx
+++ b/units/en/unit2/conclusion.mdx
@@ -15,5 +15,7 @@ In the next chapter, we're going to dive deeper by studying our first Deep Rei
 Atari environments

+Finally, we would love **to hear what you think of the course and how we can improve it**. If you have any feedback, please 👉 [fill out this form](https://forms.gle/BzKXWzLAGZESGNaE9)

 ### Keep Learning, stay awesome 🤗
+

diff --git a/units/en/unit2/two-types-value-based-methods.mdx b/units/en/unit2/two-types-value-based-methods.mdx
index df83311..5ceb191 100644
--- a/units/en/unit2/two-types-value-based-methods.mdx
+++ b/units/en/unit2/two-types-value-based-methods.mdx
@@ -62,7 +62,7 @@ For each state, the state-value function outputs the expected return if the agent

 In the action-value function, for each state and action pair, the action-value function **outputs the expected return** if the agent starts in that state, takes that action, and then follows the policy forever after.
-The value of taking action an in state \\(s\\) under a policy \\(ฯ€\\) is:
+The value of taking action \\(a\\) in state \\(s\\) under a policy \\(ฯ€\\) is:

 Action State value function
 Action State value function

diff --git a/units/en/unit3/conclusion.mdx b/units/en/unit3/conclusion.mdx
index 1e3592d..5b9754d 100644
--- a/units/en/unit3/conclusion.mdx
+++ b/units/en/unit3/conclusion.mdx
@@ -11,4 +11,7 @@ Don't hesitate to train your agent in other environments (Pong, Seaquest, QBert,
 In the next unit, **we're going to learn about Optuna**. One of the most critical tasks in Deep Reinforcement Learning is to find a good set of training hyperparameters. And Optuna is a library that helps you to automate the search.

+Finally, we would love **to hear what you think of the course and how we can improve it**. If you have any feedback, please 👉 [fill out this form](https://forms.gle/BzKXWzLAGZESGNaE9)
+
 ### Keep Learning, stay awesome 🤗
+

diff --git a/units/en/unit3/from-q-to-dqn.mdx b/units/en/unit3/from-q-to-dqn.mdx
index d4f77a8..5b119c2 100644
--- a/units/en/unit3/from-q-to-dqn.mdx
+++ b/units/en/unit3/from-q-to-dqn.mdx
@@ -13,7 +13,7 @@ Internally, our Q-function has **a Q-table, a table where each cell corresponds
 The problem is that Q-Learning is a *tabular method*. This raises a problem in which the state and action spaces **are small enough to approximate value functions to be represented as arrays and tables**. Also, this is **not scalable**. Q-Learning worked well with small state space environments like:

-- FrozenLake, we had 14 states.
+- FrozenLake, we had 16 states.
 - Taxi-v3, we had 500 states.

 But think of what we're going to do today: we will train an agent to learn to play Space Invaders, a more complex game, using the frames as input.
diff --git a/units/en/unitbonus1/conclusion.mdx b/units/en/unitbonus1/conclusion.mdx
index 57cd254..d715edc 100644
--- a/units/en/unitbonus1/conclusion.mdx
+++ b/units/en/unitbonus1/conclusion.mdx
@@ -6,5 +6,7 @@ You can now sit and enjoy playing with your Huggy 🐶. And don't **forget to sp
 Huggy cover

+Finally, we would love **to hear what you think of the course and how we can improve it**. If you have any feedback, please 👉 [fill out this form](https://forms.gle/BzKXWzLAGZESGNaE9)
+
+### Keep Learning, stay awesome 🤗

-### Keep Learning, Stay Awesome 🤗

diff --git a/units/en/unitbonus2/hands-on.mdx b/units/en/unitbonus2/hands-on.mdx
index 4130e38..a49dcf7 100644
--- a/units/en/unitbonus2/hands-on.mdx
+++ b/units/en/unitbonus2/hands-on.mdx
@@ -9,3 +9,8 @@ Now that you've learned to use Optuna, we give you some ideas to apply what you've learned
 By doing that, you're going to see how Optuna is valuable and powerful in training better agents.

 Have fun!
+
+Finally, we would love **to hear what you think of the course and how we can improve it**. If you have any feedback, please 👉 [fill out this form](https://forms.gle/BzKXWzLAGZESGNaE9)
+
+### Keep Learning, stay awesome 🤗
+

From 6d2b2b6ae7311169825cd66914c45ed52586a250 Mon Sep 17 00:00:00 2001
From: Thomas Simonini
Date: Sat, 31 Dec 2022 20:58:56 +0100
Subject: [PATCH 05/11] Update _toctree.yml

---
 units/en/_toctree.yml | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/units/en/_toctree.yml b/units/en/_toctree.yml
index be9464a..520655b 100644
--- a/units/en/_toctree.yml
+++ b/units/en/_toctree.yml
@@ -48,7 +48,7 @@ title: Conclusion
 - title: Live 1. How the course works, Q&A, and playing with Huggy 🐶
   sections:
-  - local: live1/live1.mdx
+  - local: live1/live1
     title: Live 1. How the course works, Q&A, and playing with Huggy 🐶
 - title: Unit 2. Introduction to Q-Learning
   sections:

From 90ce3173b8d98b43adac05198bfe3157180d1d1c Mon Sep 17 00:00:00 2001
From: Thomas Simonini
Date: Sat, 31 Dec 2022 21:06:20 +0100
Subject: [PATCH 06/11] Update Unit 1 notebook

---
 notebooks/unit1/requirements-unit1.txt | 5 +++++
 notebooks/unit1/unit1.ipynb            | 2 +-
 units/en/unit1/hands-on.mdx            | 2 +-
 3 files changed, 7 insertions(+), 2 deletions(-)
 create mode 100644 notebooks/unit1/requirements-unit1.txt

diff --git a/notebooks/unit1/requirements-unit1.txt b/notebooks/unit1/requirements-unit1.txt
new file mode 100644
index 0000000..b67799f
--- /dev/null
+++ b/notebooks/unit1/requirements-unit1.txt
@@ -0,0 +1,5 @@
+stable-baselines3[extra]
+box2d
+box2d-kengz
+huggingface_sb3
+pyglet==1.5.1
\ No newline at end of file

diff --git a/notebooks/unit1/unit1.ipynb b/notebooks/unit1/unit1.ipynb
index e3af2df..fff439e 100644
--- a/notebooks/unit1/unit1.ipynb
+++ b/notebooks/unit1/unit1.ipynb
@@ -247,7 +247,7 @@
   },
   "outputs": [],
   "source": [
-    "!pip install -r https://huggingface.co/spaces/ThomasSimonini/temp-space-requirements/raw/main/requirements/requirements-unit1.txt"
+    "!pip install -r https://raw.githubusercontent.com/huggingface/deep-rl-class/main/notebooks/unit1/requirements-unit1.txt"
   ]
 },
 {

diff --git a/units/en/unit1/hands-on.mdx b/units/en/unit1/hands-on.mdx
index d3645de..0d5732d 100644
--- a/units/en/unit1/hands-on.mdx
+++ b/units/en/unit1/hands-on.mdx
@@ -139,7 +139,7 @@ To make things easier, we created a script to install all these dependencies.
 ```
 ```python
-!pip install -r https://huggingface.co/spaces/ThomasSimonini/temp-space-requirements/raw/main/requirements/requirements-unit1.txt
+!pip install -r https://raw.githubusercontent.com/huggingface/deep-rl-class/main/notebooks/unit1/requirements-unit1.txt
 ```

 During the notebook, we'll need to generate a replay video. To do so, with colab, **we need to have a virtual screen to be able to render the environment** (and thus record the frames).
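Review note on PATCH 04's FrozenLake correction (14 → 16 states): the default FrozenLake map is a 4 × 4 grid, so a tabular Q-function needs one row per cell — 16, as the patch says. A minimal, framework-free sketch of that arithmetic (illustrative only, not code from the course notebooks):

```python
# FrozenLake's default map is a 4 x 4 grid: one discrete state per cell,
# which is why the corrected count in PATCH 04 is 16, not 14.
n_rows, n_cols = 4, 4
n_states = n_rows * n_cols       # 16 states
n_actions = 4                    # left, down, right, up

# A tabular Q-function is just a table with one entry per (state, action) pair.
q_table = [[0.0] * n_actions for _ in range(n_states)]

print(len(q_table), len(q_table[0]))  # 16 4
```

For Taxi-v3, the same table would have 500 rows and 6 columns — still small enough for an array, which is the scalability point the `from-q-to-dqn` page is making.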
From 45d6455cd20bb92ed4d51a85f0a1772aaed67933 Mon Sep 17 00:00:00 2001
From: Thomas Simonini
Date: Sat, 31 Dec 2022 21:09:32 +0100
Subject: [PATCH 07/11] Update schedule

---
 units/en/communication/publishing-schedule.mdx | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/units/en/communication/publishing-schedule.mdx b/units/en/communication/publishing-schedule.mdx
index fe24045..c4fa7a7 100644
--- a/units/en/communication/publishing-schedule.mdx
+++ b/units/en/communication/publishing-schedule.mdx
@@ -1,6 +1,6 @@
 # Publishing Schedule [[publishing-schedule]]

-We publish a **new unit every Monday** (except Monday, the 26th of December).
+We publish a **new unit every Tuesday**.

 If you don't want to miss any of the updates, don't forget to:

From 633639d3e77f844badc5913c73458dc868d2d5ad Mon Sep 17 00:00:00 2001
From: Thomas Simonini
Date: Sat, 31 Dec 2022 21:11:47 +0100
Subject: [PATCH 08/11] Update

---
 units/en/live1/live1.mdx          | 3 +--
 units/en/unit3/deep-q-network.mdx | 2 +-
 2 files changed, 2 insertions(+), 3 deletions(-)

diff --git a/units/en/live1/live1.mdx b/units/en/live1/live1.mdx
index 821ad23..f81bca8 100644
--- a/units/en/live1/live1.mdx
+++ b/units/en/live1/live1.mdx
@@ -1,4 +1,4 @@
-# Live 1: Deep RL Course. Intro, Q&A, and playing with Huggy 🐶
+# Live 1: How the course works, Q&A, and playing with Huggy 🐶

 In this first live stream, we explained how the course works (scope, units, challenges, and more) and answered your questions.
@@ -6,5 +6,4 @@ And finally, we saw some LunarLander agents you've trained and played with your Huggies 🐶.

-
 To know when the next live is scheduled, **check the Discord server**. We will also send **you an email**. If you can't participate, don't worry, we record the live sessions.
\ No newline at end of file

diff --git a/units/en/unit3/deep-q-network.mdx b/units/en/unit3/deep-q-network.mdx
index 75c66d3..b69dc58 100644
--- a/units/en/unit3/deep-q-network.mdx
+++ b/units/en/unit3/deep-q-network.mdx
@@ -30,7 +30,7 @@ No, because one frame is not enough to have a sense of motion! But what if I add
 Temporal Limitation

 That's why, to capture temporal information, we stack four frames together.

-Then, the stacked frames are processed by three convolutional layers. These layers **allow us to capture and exploit spatial relationships in images**. But also, because frames are stacked together, **you can exploit some spatial properties across those frames**.
+Then, the stacked frames are processed by three convolutional layers. These layers **allow us to capture and exploit spatial relationships in images**. But also, because frames are stacked together, **you can exploit some temporal properties across those frames**.

 If you don't know what convolutional layers are, don't worry. You can check the [Lesson 4 of this free Deep Reinforcement Learning Course by Udacity](https://www.udacity.com/course/deep-learning-pytorch--ud188)

From 31105e358f59c4dc9448a4b2ee7ed8a1d84097ff Mon Sep 17 00:00:00 2001
From: Thomas Simonini
Date: Sat, 31 Dec 2022 21:22:17 +0100
Subject: [PATCH 09/11] Update

---
 units/en/_toctree.yml    | 2 +-
 units/en/live1/live1.mdx | 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/units/en/_toctree.yml b/units/en/_toctree.yml
index 520655b..d7772a7 100644
--- a/units/en/_toctree.yml
+++ b/units/en/_toctree.yml
@@ -46,7 +46,7 @@ title: Play with Huggy
   - local: unitbonus1/conclusion
     title: Conclusion
-- title: Live 1. How the course works, Q&A, and playing with Huggy 🐶
+- title: Live 1. How the course works, Q&A, and playing with Huggy
   sections:
   - local: live1/live1
     title: Live 1. How the course works, Q&A, and playing with Huggy 🐶

diff --git a/units/en/live1/live1.mdx b/units/en/live1/live1.mdx
index f81bca8..624365d 100644
--- a/units/en/live1/live1.mdx
+++ b/units/en/live1/live1.mdx
@@ -1,4 +1,4 @@
-# Live 1: How the course works, Q&A, and playing with Huggy 🐶
+# Live 1: How the course works, Q&A, and playing with Huggy

 In this first live stream, we explained how the course works (scope, units, challenges, and more) and answered your questions.

From f9de15477c430d258f0cde18221a05a0c5898666 Mon Sep 17 00:00:00 2001
From: Thomas Simonini
Date: Sat, 31 Dec 2022 21:30:31 +0100
Subject: [PATCH 10/11] Replace Huggy image

---
 units/en/unit1/introduction.mdx | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/units/en/unit1/introduction.mdx b/units/en/unit1/introduction.mdx
index 660ec20..f8017cd 100644
--- a/units/en/unit1/introduction.mdx
+++ b/units/en/unit1/introduction.mdx
@@ -22,7 +22,6 @@ It's essential **to master these elements** before diving into implementing Deep Reinforcement Learning agents.
 After this unit, in a bonus unit, you'll be **able to train Huggy the Dog 🐶 to fetch the stick and play with him 🤗**.

-
+Huggy

 So let's get started! 🚀

From e60f817254b9198b9367d71e8a0a59df6e368c0f Mon Sep 17 00:00:00 2001
From: Thomas Simonini
Date: Sat, 31 Dec 2022 21:35:06 +0100
Subject: [PATCH 11/11] Update conclusion

---
 units/en/unit1/conclusion.mdx | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/units/en/unit1/conclusion.mdx b/units/en/unit1/conclusion.mdx
index d84665b..504c1c0 100644
--- a/units/en/unit1/conclusion.mdx
+++ b/units/en/unit1/conclusion.mdx
@@ -12,8 +12,7 @@ In the next (bonus) unit, we're going to reinforce what we just learned by **training Huggy the Dog 🐶 to fetch the stick**.
 You will then be able to play with him 🤗.

-
+Huggy

 Finally, we would love **to hear what you think of the course and how we can improve it**. If you have any feedback, please 👉 [fill out this form](https://forms.gle/BzKXWzLAGZESGNaE9)
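Review note on PATCH 08's spatial → temporal wording fix in `deep-q-network.mdx`: stacking the four most recent frames is exactly what gives the network a sense of motion that a single frame lacks. A minimal, framework-free sketch of the idea (the `observe` helper and its repeat-first-frame padding rule are illustrative assumptions, not the course's actual preprocessing):

```python
from collections import deque

# DQN-style frame stacking: the observation fed to the network is the
# four most recent frames together, so motion is visible across the stack.
FRAME_STACK = 4

frames = deque(maxlen=FRAME_STACK)  # the oldest frame falls out automatically

def observe(new_frame):
    """Append the newest frame and return the stacked observation."""
    frames.append(new_frame)
    # Pad by repeating the oldest frame until the stack is full
    # (a common choice at episode start; other schemes exist).
    while len(frames) < FRAME_STACK:
        frames.appendleft(frames[0])
    return list(frames)

obs = None
for t in range(6):                  # simulate six environment steps
    obs = observe(f"frame_{t}")

print(obs)  # ['frame_2', 'frame_3', 'frame_4', 'frame_5']
```

After six steps only the last four frames remain, oldest first — the temporal ordering the convolutional layers can then exploit across the stack.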