Mirror of https://github.com/huggingface/deep-rl-class.git (synced 2026-04-13 18:00:45 +08:00)
Merge pull request #171 from huggingface/ThomasSimonini/BigUpdate
Big Update (small typos, feedback form etc)
notebooks/unit1/requirements-unit1.txt (new file, +5)
@@ -0,0 +1,5 @@
+stable-baselines3[extra]
+box2d
+box2d-kengz
+huggingface_sb3
+pyglet==1.5.1
@@ -247,7 +247,7 @@
 },
 "outputs": [],
 "source": [
-"!pip install -r https://huggingface.co/spaces/ThomasSimonini/temp-space-requirements/raw/main/requirements/requirements-unit1.txt"
+"!pip install -r https://raw.githubusercontent.com/huggingface/deep-rl-class/main/notebooks/unit1/requirements-unit1.txt"
 ]
 },
 {
@@ -46,6 +46,10 @@
     title: Play with Huggy
   - local: unitbonus1/conclusion
     title: Conclusion
+- title: Live 1. How the course work, Q&A, and playing with Huggy
+  sections:
+  - local: live1/live1
+    title: Live 1. How the course work, Q&A, and playing with Huggy 🐶
 - title: Unit 2. Introduction to Q-Learning
   sections:
   - local: unit2/introduction
@@ -96,7 +100,7 @@
     title: Conclusion
   - local: unit3/additional-readings
     title: Additional Readings
-- title: Unit Bonus 2. Automatic Hyperparameter Tuning with Optuna
+- title: Bonus Unit 2. Automatic Hyperparameter Tuning with Optuna
   sections:
   - local: unitbonus2/introduction
     title: Introduction
@@ -1,6 +1,6 @@
 # Publishing Schedule [[publishing-schedule]]
 
-We publish a **new unit every Monday** (except Monday, the 26th of December).
+We publish a **new unit every Tuesday**.
 
 If you don't want to miss any of the updates, don't forget to:
units/en/live1/live1.mdx (new file, +9)
@@ -0,0 +1,9 @@
+# Live 1: How the course work, Q&A, and playing with Huggy
+
+In this first live stream, we explained how the course work (scope, units, challenges, and more) and answered your questions.
+
+And finally, we saw some LunarLander agents you've trained and play with your Huggies 🐶
+
+<Youtube id="JeJIswxyrsM" />
+
+To know when the next live is scheduled **check the discord server**. We will also send **you an email**. If you can't participate, don't worry, we record the live sessions.
@@ -9,7 +9,13 @@ Discord is a free chat platform. If you've used Slack, **it's quite similar**. T
 
 Starting in Discord can be a bit intimidating, so let me take you through it.
 
-When you sign-up to our Discord server, you'll need to specify which topics you're interested in by **clicking #role-assignment at the left**. Here, you can pick different categories. Make sure to **click "Reinforcement Learning"**! :fire:. You'll then get to **introduce yourself in the `#introduction-yourself` channel**.
+When you sign-up to our Discord server, you'll need to specify which topics you're interested in by **clicking #role-assignment at the left**.
+
+<img src="https://huggingface.co/datasets/huggingface-deep-rl-course/course-images/resolve/main/en/unit0/discord1.jpg" alt="Discord"/>
+
+In #role-assignment, you can pick different categories. Make sure to **click "Reinforcement Learning"**. You'll then get to **introduce yourself in the `#introduction-yourself` channel**.
+
+<img src="https://huggingface.co/datasets/huggingface-deep-rl-course/course-images/resolve/main/en/unit0/discord2.jpg" alt="Discord"/>
 
 ## So which channels are interesting to me? [[channels]]
@@ -23,7 +23,7 @@ In this course, you will:
 
 - 📖 Study Deep Reinforcement Learning in **theory and practice.**
 - 🧑‍💻 Learn to **use famous Deep RL libraries** such as [Stable Baselines3](https://stable-baselines3.readthedocs.io/en/master/), [RL Baselines3 Zoo](https://github.com/DLR-RM/rl-baselines3-zoo), [Sample Factory](https://samplefactory.dev/) and [CleanRL](https://github.com/vwxyzjn/cleanrl).
-- 🤖 **Train agents in unique environments** such as [SnowballFight](https://huggingface.co/spaces/ThomasSimonini/SnowballFight), [Huggy the Doggo 🐶](https://huggingface.co/spaces/ThomasSimonini/Huggy), [MineRL (Minecraft ⛏️)](https://minerl.io/), [VizDoom (Doom)](https://vizdoom.cs.put.edu.pl/) and classical ones such as [Space Invaders](https://www.gymlibrary.dev/environments/atari/) and [PyBullet](https://pybullet.org/wordpress/).
+- 🤖 **Train agents in unique environments** such as [SnowballFight](https://huggingface.co/spaces/ThomasSimonini/SnowballFight), [Huggy the Doggo 🐶](https://huggingface.co/spaces/ThomasSimonini/Huggy), [VizDoom (Doom)](https://vizdoom.cs.put.edu.pl/) and classical ones such as [Space Invaders](https://www.gymlibrary.dev/environments/atari/), [PyBullet](https://pybullet.org/wordpress/) and more.
 - 💾 Share your **trained agents with one line of code to the Hub** and also download powerful agents from the community.
 - 🏆 Participate in challenges where you will **evaluate your agents against other teams. You'll also get to play against the agents you'll train.**
@@ -58,7 +58,8 @@ You can choose to follow this course either:
 
 Both paths **are completely free**.
 Whatever path you choose, we advise you **to follow the recommended pace to enjoy the course and challenges with your fellow classmates.**
-You don't need to tell us which path you choose. At the end of March, when we verify the assignments **if you get more than 80% of the assignments done, you'll get a certificate.**
+
+You don't need to tell us which path you choose. At the end of March, when we will verify the assignments **if you get more than 80% of the assignments done, you'll get a certificate.**
 
 ## The Certification Process [[certification-process]]
@@ -92,7 +93,7 @@ You need only 3 things:
 
 ## What is the publishing schedule? [[publishing-schedule]]
 
-We publish **a new unit every Monday** (except Monday, the 26th of December).
+We publish **a new unit every Tuesday**.
 
 <img src="https://huggingface.co/datasets/huggingface-deep-rl-course/course-images/resolve/main/en/communication/schedule1.png" alt="Schedule 1" width="100%"/>
 <img src="https://huggingface.co/datasets/huggingface-deep-rl-course/course-images/resolve/main/en/communication/schedule2.png" alt="Schedule 2" width="100%"/>
@@ -128,7 +129,7 @@ In this new version of the course, you have two types of challenges:
 
 <img src="https://huggingface.co/datasets/huggingface-deep-rl-course/course-images/resolve/main/en/unit0/challenges.jpg" alt="Challenges" width="100%"/>
 
-These AI vs.AI challenges will be announced **later in December**.
+These AI vs.AI challenges will be announced **in January**.
 
 
 ## I found a bug, or I want to improve the course [[contribute]]
@@ -21,6 +21,7 @@ We have multiple RL-related channels:
 - `rl-announcements`: where we give the last information about the course.
 - `rl-discussions`: where you can exchange about RL and share information.
 - `rl-study-group`: where you can create and join study groups.
+- `rl-i-made-this`: where you can share your projects and models.
 
 If this is your first time using Discord, we wrote a Discord 101 to get the best practices. Check the next section.
@@ -12,5 +12,10 @@ In the next (bonus) unit, we’re going to reinforce what we just learned by **t
 
 You will be able then to play with him 🤗.
 
+<video src="https://huggingface.co/datasets/huggingface-deep-rl-course/course-images/resolve/main/en/unit1/huggy.mp4" alt="Huggy" type="video/mp4">
+</video>
 <img src="https://huggingface.co/datasets/huggingface-deep-rl-course/course-images/resolve/main/en/unit1/huggy.jpg" alt="Huggy"/>
+
+Finally, we would love **to hear what you think of the course and how we can improve it**. If you have some feedback then, please 👉 [fill this form](https://forms.gle/BzKXWzLAGZESGNaE9)
 
 ### Keep Learning, stay awesome 🤗
@@ -139,7 +139,7 @@ To make things easier, we created a script to install all these dependencies.
 ```
 
 ```python
-!pip install -r https://huggingface.co/spaces/ThomasSimonini/temp-space-requirements/raw/main/requirements/requirements-unit1.txt
+!pip install -r https://raw.githubusercontent.com/huggingface/deep-rl-class/main/notebooks/unit1/requirements-unit1.txt
 ```
 
 During the notebook, we'll need to generate a replay video. To do so, with colab, **we need to have a virtual screen to be able to render the environment** (and thus record the frames).
@@ -22,7 +22,6 @@ It's essential **to master these elements** before diving into implementing Dee
 
 After this unit, in a bonus unit, you'll be **able to train Huggy the Dog 🐶 to fetch the stick and play with him 🤗**.
 
 <video src="https://huggingface.co/datasets/huggingface-deep-rl-course/course-images/resolve/main/en/unit1/huggy.mp4" alt="Huggy" type="video/mp4">
 </video>
 <img src="https://huggingface.co/datasets/huggingface-deep-rl-course/course-images/resolve/main/en/unit1/huggy.jpg" alt="Huggy"/>
 
 So let's get started! 🚀
@@ -15,5 +15,7 @@ In the next chapter, we’re going to dive deeper by studying our first Deep Rei
 <img src="https://huggingface.co/datasets/huggingface-deep-rl-course/course-images/resolve/main/en/unit4/atari-envs.gif" alt="Atari environments"/>
 
 
+Finally, we would love **to hear what you think of the course and how we can improve it**. If you have some feedback then, please 👉 [fill this form](https://forms.gle/BzKXWzLAGZESGNaE9)
+
 ### Keep Learning, stay awesome 🤗
@@ -62,7 +62,7 @@ For each state, the state-value function outputs the expected return if the agen
 
 In the action-value function, for each state and action pair, the action-value function **outputs the expected return** if the agent starts in that state and takes action, and then follows the policy forever after.
 
-The value of taking action an in state \\(s\\) under a policy \\(π\\) is:
+The value of taking action \\(a\\) in state \\(s\\) under a policy \\(π\\) is:
 
 <img src="https://huggingface.co/datasets/huggingface-deep-rl-course/course-images/resolve/main/en/unit3/action-state-value-function-1.jpg" alt="Action State value function"/>
 <img src="https://huggingface.co/datasets/huggingface-deep-rl-course/course-images/resolve/main/en/unit3/action-state-value-function-2.jpg" alt="Action State value function"/>
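The hunk above fixes a typo in the action-value definition; the equation itself lives in the linked course images. For reference, the standard textbook form of that definition is (a reconstruction in conventional RL notation, not the exact content of the images):

```latex
Q_{\pi}(s, a) = \mathbb{E}_{\pi}\!\left[ G_t \mid S_t = s, A_t = a \right]
             = \mathbb{E}_{\pi}\!\left[ \sum_{k=0}^{\infty} \gamma^{k} R_{t+k+1} \;\middle|\; S_t = s, A_t = a \right]
```

where \\(G_t\\) is the discounted return from time \\(t\\), matching the state-value function described just before it but conditioned on the first action \\(a\\) as well as the state \\(s\\).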
@@ -11,4 +11,7 @@ Don't hesitate to train your agent in other environments (Pong, Seaquest, QBert,
 
 In the next unit, **we're going to learn about Optuna**. One of the most critical task in Deep Reinforcement Learning is to find a good set of training hyperparameters. And Optuna is a library that helps you to automate the search.
 
+Finally, we would love **to hear what you think of the course and how we can improve it**. If you have some feedback then, please 👉 [fill this form](https://forms.gle/BzKXWzLAGZESGNaE9)
+
+### Keep Learning, stay awesome 🤗
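The hunk above mentions that Optuna automates the search for training hyperparameters. As a rough illustration of what such a search does, here is a plain-Python random-search sketch — this is not Optuna's actual API, and the `objective` function is a made-up stand-in for "train an agent and return its mean reward":

```python
import random

def objective(lr, gamma):
    # Hypothetical stand-in for training an agent and scoring it:
    # a smooth surface peaked near lr=1e-3, gamma=0.99.
    return -((lr - 1e-3) ** 2) * 1e6 - ((gamma - 0.99) ** 2) * 100

def random_search(n_trials, seed=0):
    # Sample hyperparameters, score each trial, keep the best.
    # (Optuna layers smarter samplers and pruning on top of this idea.)
    rng = random.Random(seed)
    best_params, best_score = None, float("-inf")
    for _ in range(n_trials):
        params = {
            "lr": 10 ** rng.uniform(-5, -2),    # log-uniform learning rate
            "gamma": rng.uniform(0.9, 0.9999),  # discount factor
        }
        score = objective(**params)
        if score > best_score:
            best_params, best_score = params, score
    return best_params, best_score

best_params, best_score = random_search(200)
```

With 200 trials the best sample lands close to the peak; a real study would replace `objective` with an actual training-and-evaluation run.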
@@ -30,7 +30,7 @@ No, because one frame is not enough to have a sense of motion! But what if I add
 <img src="https://huggingface.co/datasets/huggingface-deep-rl-course/course-images/resolve/main/en/unit4/temporal-limitation-2.jpg" alt="Temporal Limitation"/>
 That’s why, to capture temporal information, we stack four frames together.
 
-Then, the stacked frames are processed by three convolutional layers. These layers **allow us to capture and exploit spatial relationships in images**. But also, because frames are stacked together, **you can exploit some spatial properties across those frames**.
+Then, the stacked frames are processed by three convolutional layers. These layers **allow us to capture and exploit spatial relationships in images**. But also, because frames are stacked together, **you can exploit some temporal properties across those frames**.
 
 If you don't know what are convolutional layers, don't worry. You can check the [Lesson 4 of this free Deep Reinforcement Learning Course by Udacity](https://www.udacity.com/course/deep-learning-pytorch--ud188)
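The fix above ("spatial" → "temporal") is about what stacking buys you. A minimal sketch of the four-frame stack it describes — class name and the 84×84 frame size are illustrative assumptions, not the course's actual preprocessing code:

```python
from collections import deque

import numpy as np

class FrameStack:
    """Keep the last `num_frames` frames and expose them as one observation."""

    def __init__(self, num_frames=4, frame_shape=(84, 84)):
        # Start with black frames; the deque drops the oldest automatically.
        self.frames = deque(
            [np.zeros(frame_shape, dtype=np.uint8) for _ in range(num_frames)],
            maxlen=num_frames,
        )

    def push(self, frame):
        self.frames.append(frame)
        # Shape (4, 84, 84): the leading axis carries time, which is what
        # lets the convolutional layers pick up motion across frames.
        return np.stack(self.frames, axis=0)

stack = FrameStack()
obs = stack.push(np.ones((84, 84), dtype=np.uint8))
```

The newest frame ends up at index 3, the oldest at index 0, so the network sees both where objects are and how they just moved.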
@@ -13,7 +13,7 @@ Internally, our Q-function has **a Q-table, a table where each cell corresponds
 The problem is that Q-Learning is a *tabular method*. This raises a problem in which the states and actions spaces **are small enough to approximate value functions to be represented as arrays and tables**. Also, this is **not scalable**.
 Q-Learning worked well with small state space environments like:
 
-- FrozenLake, we had 14 states.
+- FrozenLake, we had 16 states.
 - Taxi-v3, we had 500 states.
 
 But think of what we're going to do today: we will train an agent to learn to play Space Invaders a more complex game, using the frames as input.
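To make the scalability point above concrete, here is a back-of-the-envelope sketch. The state counts come from the hunk itself; the action counts are the standard Gym ones (4 for FrozenLake, 6 for Taxi-v3), and the 84×84 frame size is an illustrative assumption:

```python
import numpy as np

# Tabular Q-Learning stores one row per state, one column per action.
frozenlake_q = np.zeros((16, 4))  # FrozenLake: 16 states x 4 actions = 64 cells
taxi_q = np.zeros((500, 6))       # Taxi-v3: 500 states x 6 actions = 3000 cells

# A single 84x84 grayscale frame with 256 intensity levels has
# 256 ** 7056 possible configurations -- no table can have that many rows,
# which is why Deep Q-Learning swaps the table for a neural network
# that maps raw frames to Q-values.
pixels_per_frame = 84 * 84  # 7056
```

Tiny tables for FrozenLake and Taxi, an astronomically large state space for pixels: that gap is the motivation for function approximation.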
@@ -6,5 +6,7 @@ You can now sit and enjoy playing with your Huggy 🐶. And don't **forget to sp
 
 <img src="https://huggingface.co/datasets/huggingface-deep-rl-course/course-images/resolve/main/en/notebooks/unit-bonus1/huggy-cover.jpeg" alt="Huggy cover" width="100%">
 
+Finally, we would love **to hear what you think of the course and how we can improve it**. If you have some feedback then, please 👉 [fill this form](https://forms.gle/BzKXWzLAGZESGNaE9)
+
-### Keep Learning, stay awesome 🤗
+### Keep Learning, Stay Awesome 🤗
@@ -9,3 +9,8 @@ Now that you've learned to use Optuna, we give you some ideas to apply what you'
 By doing that, you're going to see how Optuna is valuable and powerful in training better agents,
 
 Have fun,
+
+Finally, we would love **to hear what you think of the course and how we can improve it**. If you have some feedback then, please 👉 [fill this form](https://forms.gle/BzKXWzLAGZESGNaE9)
+
+### Keep Learning, stay awesome 🤗
+