mirror of
https://github.com/huggingface/deep-rl-class.git
synced 2026-02-03 02:14:53 +08:00
Various updates
@@ -1,6 +1,8 @@
# [The Hugging Face Deep Reinforcement Learning Course 🤗 (v2.0)](https://huggingface.co/deep-rl-course/unit0/introduction)

This repository contains the Deep Reinforcement Learning Course mdx files and notebooks. **The website is here**: https://huggingface.co/deep-rl-course/unit0/introduction?fw=pt

If you like the course, don't hesitate to **⭐ star this repository. This helps us 🤗**.

- The syllabus 📚: https://simoninithomas.github.io/deep-rl-course
@@ -17,6 +17,7 @@
},
"source": [
"# DEPRECATED NOTEBOOK, THE NEW UNIT 1 IS HERE: https://huggingface.co/deep-rl-course/unit1/introduction",
"\n",
"**Everything below is deprecated** 👇, the new version of the course is here: https://huggingface.co/deep-rl-course/unit1/introduction",
"\n",
"\n",
@@ -1,4 +1,4 @@
# DEPRECATED UNIT, THE NEW UNIT 2 IS HERE: https://huggingface.co/deep-rl-course/unit2/introduction

**Everything below is deprecated** 👇, the new version of the course is here: https://huggingface.co/deep-rl-course/unit2/introduction
@@ -16,8 +16,11 @@
"id": "njb_ProuHiOe"
},
"source": [
"# DEPRECATED NOTEBOOK, THE NEW UNIT 2 IS HERE: https://huggingface.co/deep-rl-course/unit2/introduction",
"\n",
"**Everything below is deprecated** 👇, the new version of the course is here: https://huggingface.co/deep-rl-course/unit2/introduction",
"\n",
"\n",
"# Unit 2: Q-Learning with FrozenLake-v1 ⛄ and Taxi-v3 🚕\n",
"\n",
"In this notebook, **you'll code your first Reinforcement Learning agent from scratch**: a FrozenLake ❄️ agent trained with Q-Learning. You'll share it with the community and experiment with different configurations.\n",
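Since the Unit 2 notebook builds Q-Learning from scratch, the update it revolves around can be sketched in a few lines of plain Python (the dict-based table and parameter names here are illustrative, not the notebook's actual code):

```python
def q_update(Q, s, a, r, s_next, actions, alpha=0.1, gamma=0.99):
    """One tabular Q-Learning update:
    Q(s,a) <- Q(s,a) + alpha * (r + gamma * max_a' Q(s',a') - Q(s,a))."""
    best_next = max(Q[(s_next, a2)] for a2 in actions)
    Q[(s, a)] += alpha * (r + gamma * best_next - Q[(s, a)])
    return Q

# Tiny 2-state, 2-action table, all zeros initially
actions = [0, 1]
Q = {(s, a): 0.0 for s in [0, 1] for a in actions}
Q = q_update(Q, s=0, a=1, r=1.0, s_next=1, actions=actions)
print(Q[(0, 1)])  # 0.1
```

Because the table starts at zero, only the immediate reward moves the estimate on the first update; the discounted bootstrap term kicks in once neighboring states have nonzero values.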
@@ -1,6 +1,10 @@
# DEPRECATED, THE NEW VERSION OF THIS UNIT IS HERE: https://huggingface.co/deep-rl-course/unit3/introduction

**Everything below is deprecated** 👇, the new version of the course is here: https://huggingface.co/deep-rl-course/unit3/introduction

# Unit 3: Deep Q-Learning with Atari Games 👾

In this Unit, **we'll study our first Deep Reinforcement Learning agent**: Deep Q-Learning.

And **we'll train it to play Space Invaders and other Atari environments using [RL-Zoo](https://github.com/DLR-RM/rl-baselines3-zoo)**, a training framework for RL based on Stable-Baselines that provides scripts for training and evaluating agents, tuning hyperparameters, plotting results, and recording videos.
@@ -16,6 +16,11 @@
"id": "k7xBVPzoXxOg"
},
"source": [
"# DEPRECATED NOTEBOOK, THE NEW UNIT 3 IS HERE: https://huggingface.co/deep-rl-course/unit3/introduction",
"\n",
"**Everything below is deprecated** 👇, the new version of the course is here: https://huggingface.co/deep-rl-course/unit3/introduction",
"\n",
"\n",
"# Unit 3: Deep Q-Learning with Atari Games 👾 using RL Baselines3 Zoo\n",
"\n",
"In this notebook, **you'll train a Deep Q-Learning agent** playing Space Invaders using [RL Baselines3 Zoo](https://github.com/DLR-RM/rl-baselines3-zoo), a training framework based on [Stable-Baselines3](https://stable-baselines3.readthedocs.io/en/master/) that provides scripts for training, evaluating agents, tuning hyperparameters, plotting results, and recording videos.\n",
@@ -1,3 +1,7 @@
# DEPRECATED, THE NEW VERSION OF THIS UNIT IS HERE: https://huggingface.co/deep-rl-course/unit5/introduction

**Everything below is deprecated** 👇, the new version of the course is here: https://huggingface.co/deep-rl-course/unit5/introduction

# Unit 4: An Introduction to Unity ML-Agents with Hugging Face 🤗

@@ -16,6 +16,11 @@
"id": "2D3NL_e4crQv"
},
"source": [
"# DEPRECATED NOTEBOOK, THE NEW VERSION OF THIS UNIT IS HERE: https://huggingface.co/deep-rl-course/unit5/introduction",
"\n",
"**Everything below is deprecated** 👇, the new version of the course is here: https://huggingface.co/deep-rl-course/unit5/introduction",
"\n",
"\n",
"# Unit 4: Let's learn about Unity ML-Agents with Hugging Face 🤗\n",
"\n"
]
@@ -561,4 +566,4 @@
},
"nbformat": 4,
"nbformat_minor": 0
}
}
@@ -1,6 +1,10 @@
# DEPRECATED, THE NEW VERSION OF THIS UNIT IS HERE: https://huggingface.co/deep-rl-course/unit4/introduction

**Everything below is deprecated** 👇, the new version of the course is here: https://huggingface.co/deep-rl-course/unit4/introduction

# Unit 5: Policy Gradient with PyTorch

In this Unit, **we'll study Policy Gradient Methods**.

And we'll **implement Reinforce (a policy gradient method) from scratch using PyTorch**, before testing its robustness on CartPole-v1, PixelCopter, and Pong.
@@ -16,6 +16,11 @@
"id": "CjRWziAVU2lZ"
},
"source": [
"# DEPRECATED NOTEBOOK, THE NEW VERSION OF THIS UNIT IS HERE: https://huggingface.co/deep-rl-course/unit4/introduction",
"\n",
"**Everything below is deprecated** 👇, the new version of the course is here: https://huggingface.co/deep-rl-course/unit4/introduction",
"\n",
"\n",
"# Unit 5: Code your first Deep Reinforcement Learning Algorithm with PyTorch: Reinforce. And test its robustness 💪\n",
"In this notebook, you'll code your first Deep Reinforcement Learning algorithm from scratch: Reinforce (also called Monte Carlo Policy Gradient).\n",
"\n",
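The Reinforce algorithm the notebook builds boils down to weighting each action's log-probability by the return that followed it. A minimal pure-Python sketch of that loss (the notebook's real version uses PyTorch tensors so the sum can be backpropagated; names here are illustrative):

```python
def reinforce_loss(log_probs, rewards, gamma=0.99):
    """Reinforce (Monte Carlo policy gradient) loss:
    loss = -sum_t log pi(a_t|s_t) * G_t, with G_t the discounted return from step t."""
    returns, g = [], 0.0
    for r in reversed(rewards):        # returns-to-go, computed backwards
        g = r + gamma * g
        returns.insert(0, g)
    return -sum(lp * g for lp, g in zip(log_probs, returns))

# Toy 2-step episode with fixed log-probabilities
loss = reinforce_loss([-0.5, -0.5], rewards=[1.0, 1.0], gamma=1.0)
print(loss)  # 1.5  (returns are [2.0, 1.0])
```

Early actions get credit for everything that follows them, which is exactly why Monte Carlo policy gradients have low bias but high variance.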
@@ -1,3 +1,7 @@
# DEPRECATED, THE NEW VERSION OF THIS UNIT IS HERE: https://huggingface.co/deep-rl-course/unit6/introduction

**Everything below is deprecated** 👇, the new version of the course is here: https://huggingface.co/deep-rl-course/unit6/introduction

# Unit 7: Advantage Actor Critic (A2C) using Robotics Simulations with PyBullet 🤖

One of the major industries that use Reinforcement Learning is robotics. Unfortunately, **having access to robot equipment is very expensive**. Fortunately, some simulations exist to train robots:
@@ -32,7 +36,7 @@ Thanks to a leaderboard, you'll be able to compare your results with other class
The leaderboard 👉 https://huggingface.co/spaces/chrisjay/Deep-Reinforcement-Learning-Leaderboard

## Additional readings 📚
- [Making Sense of the Bias / Variance Trade-off in (Deep) Reinforcement Learning](https://blog.mlreview.com/making-sense-of-the-bias-variance-trade-off-in-deep-reinforcement-learning-79cf1e83d565)
- [Bias-variance Tradeoff in Reinforcement Learning](https://www.endtoend.ai/blog/bias-variance-tradeoff-in-reinforcement-learning/)
- [Foundations of Deep RL Series, L3 Policy Gradients and Advantage Estimation by Pieter Abbeel](https://youtu.be/AKbX1Zvo7r8)
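The "Advantage" in A2C, and the bias/variance trade-off the readings above discuss, comes down to the one-step estimate below (a sketch under simplified assumptions; real A2C implementations often use n-step or GAE variants):

```python
def advantage(reward, value_s, value_next, gamma=0.99, done=False):
    """One-step advantage estimate used by actor-critic methods:
    A(s, a) = r + gamma * V(s') - V(s), dropping the bootstrap at episode end."""
    bootstrap = 0.0 if done else gamma * value_next
    return reward + bootstrap - value_s

print(advantage(1.0, value_s=0.5, value_next=1.0, gamma=0.9))  # 1.4
```

Bootstrapping from the critic's `V(s')` instead of a full Monte Carlo return trades some bias (the critic may be wrong) for much lower variance than Reinforce.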
@@ -34,6 +34,11 @@
{
"cell_type": "markdown",
"source": [
"# DEPRECATED NOTEBOOK, THE NEW VERSION OF THIS UNIT IS HERE: https://huggingface.co/deep-rl-course/unit6/introduction",
"\n",
"**Everything below is deprecated** 👇, the new version of the course is here: https://huggingface.co/deep-rl-course/unit6/introduction",
"\n",
"\n",
"# Unit 7: Advantage Actor Critic (A2C) using Robotics Simulations with PyBullet 🤖\n",
"In this small notebook, you'll learn to use A2C with PyBullet and train an agent to walk: more precisely, a spider (they say Ant, but come on... it's a spider 😆) 🕸️\n",
"\n",
@@ -533,4 +538,4 @@
}
}
]
}
}
@@ -1,3 +1,7 @@
# DEPRECATED, THE NEW VERSION OF THIS UNIT IS HERE: https://huggingface.co/deep-rl-course/unit8/introduction

**Everything below is deprecated** 👇, the new version of the course is here: https://huggingface.co/deep-rl-course/unit8/introduction

# Unit 8: Proximal Policy Optimization (PPO) with PyTorch

Today we'll learn about Proximal Policy Optimization (PPO), an architecture that improves our agent's training stability by avoiding policy updates that are too large. To do that, we use a ratio that indicates the difference between our current and old policy and clip it to the range $[1 - \epsilon, 1 + \epsilon]$. This ensures that the policy update is not too large and that training is more stable.
@@ -29,7 +33,7 @@ Thanks to a leaderboard, you'll be able to compare your results with other class
The leaderboard 👉 https://huggingface.co/spaces/chrisjay/Deep-Reinforcement-Learning-Leaderboard

## Additional readings 📚
- [Towards Delivering a Coherent Self-Contained Explanation of Proximal Policy Optimization by Daniel Bick](https://fse.studenttheses.ub.rug.nl/25709/1/mAI_2021_BickD.pdf)
- [What is the way to understand Proximal Policy Optimization Algorithm in RL?](https://stackoverflow.com/questions/46422845/what-is-the-way-to-understand-proximal-policy-optimization-algorithm-in-rl)
- [Foundations of Deep RL Series, L4 TRPO and PPO by Pieter Abbeel](https://youtu.be/KjWF8VIMGiY)
- [OpenAI PPO Blogpost](https://openai.com/blog/openai-baselines-ppo/)
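The clipping this unit describes can be written down directly. A minimal per-sample sketch in plain Python (the notebook's real agent does this on PyTorch tensors over a whole batch):

```python
def ppo_clip_objective(ratio, advantage, eps=0.2):
    """PPO's clipped surrogate objective for one sample:
    min(ratio * A, clip(ratio, 1 - eps, 1 + eps) * A)."""
    clipped_ratio = max(1.0 - eps, min(ratio, 1.0 + eps))
    return min(ratio * advantage, clipped_ratio * advantage)

# If the new policy over-weights an action (ratio 1.5) with positive advantage,
# the clip caps the incentive at (1 + eps) * A:
print(ppo_clip_objective(1.5, advantage=2.0))   # 2.4
# With negative advantage the un-clipped term is the smaller one, so it is kept:
print(ppo_clip_objective(1.5, advantage=-2.0))  # -3.0
```

Taking the `min` makes the objective pessimistic: the agent gains nothing from pushing the ratio outside $[1 - \epsilon, 1 + \epsilon]$, which is exactly what keeps updates small and training stable.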
@@ -16,6 +16,11 @@
"id": "-cf5-oDPjwf8"
},
"source": [
"# DEPRECATED NOTEBOOK, THE NEW VERSION OF THIS UNIT IS HERE: https://huggingface.co/deep-rl-course/unit8/introduction",
"\n",
"**Everything below is deprecated** 👇, the new version of the course is here: https://huggingface.co/deep-rl-course/unit8/introduction",
"\n",
"\n",
"# Unit 8: Proximal Policy Optimization (PPO) with PyTorch 🤖\n",
"\n",
"In this unit, you'll learn to **code your PPO agent from scratch with PyTorch**.\n",
@@ -1,8 +1,13 @@
# DEPRECATED, THE NEW VERSION OF THIS UNIT IS HERE: https://huggingface.co/deep-rl-course/unitbonus3/decision-transformers

**Everything below is deprecated** 👇, the new version of the course is here: https://huggingface.co/deep-rl-course/unitbonus3/decision-transformers

# Unit 9: Decision Transformers and offline Reinforcement Learning 🤖



In this Unit, you'll learn what Decision Transformers and Offline Reinforcement Learning are. Then you'll train your first Offline Decision Transformer model from scratch to make a half-cheetah run.

This course is **self-paced**, so you can start whenever you want.

@@ -18,12 +23,12 @@ Here are the steps for this Unit:

2️⃣ 👩‍💻 Then dive into the first hands-on.
👩‍💻 The hands-on 👉 [](https://colab.research.google.com/drive/1K3UuajwoPY1MzRKNkONNRS3gS5DxZ-qF?usp=sharing)

3️⃣ 📖 Read [Train your first Decision Transformer](https://huggingface.co/blog/train-decision-transformers)

4️⃣ 👩‍💻 Then dive into the hands-on, where **you'll train your first Offline Decision Transformer model from scratch to make a half-cheetah run**.
👩‍💻 The hands-on 👉 https://github.com/huggingface/blog/blob/main/notebooks/101_train-decision-transformers.ipynb

## How to make the most of this course

To make the most of the course, my advice is to:
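A key idea behind Decision Transformers is that they condition each timestep on the *return-to-go* rather than the raw reward. A minimal sketch of that quantity (illustrative; not the hands-on notebook's code):

```python
def returns_to_go(rewards, gamma=1.0):
    """Return-to-go at each timestep: R_t = sum_{t' >= t} gamma^(t'-t) * r_t'.
    The Decision Transformer paper uses the undiscounted case (gamma = 1)."""
    rtg, g = [], 0.0
    for r in reversed(rewards):   # accumulate from the end of the trajectory
        g = r + gamma * g
        rtg.insert(0, g)
    return rtg

print(returns_to_go([1.0, 0.0, 2.0]))  # [3.0, 2.0, 2.0]
```

At inference time you feed the model a target return-to-go, and it autoregressively generates actions it predicts will achieve that return.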
@@ -9,11 +9,11 @@ Discord is a free chat platform. If you've used Slack, **it's quite similar**. T

Starting in Discord can be a bit intimidating, so let me take you through it.

When you sign up to our Discord server, you'll need to specify which topics you're interested in by **clicking #role-assignment on the left**.

<img src="https://huggingface.co/datasets/huggingface-deep-rl-course/course-images/resolve/main/en/unit0/discord1.jpg" alt="Discord"/>

In #role-assignment, you can pick different categories. Make sure to **click "Reinforcement Learning"**. You'll then get to **introduce yourself in the `#introduce-yourself` channel**.

<img src="https://huggingface.co/datasets/huggingface-deep-rl-course/course-images/resolve/main/en/unit0/discord2.jpg" alt="Discord"/>
@@ -43,13 +43,6 @@ You can either do this hands-on by reading the notebook or following it with the

In this notebook, you'll train your **first Deep Reinforcement Learning agent**: a Lunar Lander agent that will learn to **land correctly on the Moon 🌕**, using [Stable-Baselines3](https://stable-baselines3.readthedocs.io/en/master/), a Deep Reinforcement Learning library. You'll share it with the community and experiment with different configurations.

⬇️ Here is an example of what **you will achieve in just a couple of minutes.** ⬇️

```python
%%html
<video controls autoplay><source src="https://huggingface.co/ThomasSimonini/ppo-LunarLander-v2/resolve/main/replay.mp4" type="video/mp4"></video>
```

### The environment 🎮

- [LunarLander-v2](https://www.gymlibrary.dev/environments/box2d/lunar_lander/)
@@ -92,7 +85,7 @@ Before diving into the notebook, you need to:

🔲 📝 **Read Unit 0**, which gives you all the **information about the course and helps you onboard** 🤗

🔲 📚 **Develop an understanding of the foundations of Reinforcement Learning** by reading Unit 1

## A small recap of what Deep Reinforcement Learning is 📚
<img src="https://huggingface.co/datasets/huggingface-deep-rl-course/course-images/resolve/main/en/unit1/RL_process_game.jpg" alt="The RL process" width="100%">
@@ -22,6 +22,6 @@ It's essential **to master these elements** before diving into implementing Dee

After this unit, in a bonus unit, you'll be **able to train Huggy the Dog 🐶 to fetch the stick and play with him 🤗**.

<video src="https://huggingface.co/datasets/huggingface-deep-rl-course/course-images/resolve/main/en/unit0/huggy.mp4" type="video/mp4" controls autoplay loop />

So let's get started! 🚀