mirror of
https://github.com/huggingface/deep-rl-class.git
synced 2026-04-13 18:00:45 +08:00
Updates MLAgents Unit
@@ -144,10 +144,10 @@
       title: (Optional) What is curiosity in Deep Reinforcement Learning?
     - local: unit5/hands-on
       title: Hands-on
-    - local: unit5/conclusion
-      title: Conclusion
+    - local: unit5/bonus
+      title: Bonus. Learn to create your own environments with Unity and MLAgents
+    - local: unit5/conclusion
+      title: Conclusion
 - title: What's next? New Units Publishing Schedule
   sections:
   - local: communication/publishing-schedule

@@ -11,9 +11,9 @@ For instance:
 <img src="https://huggingface.co/datasets/huggingface-deep-rl-course/course-images/resolve/main/en/unit5/envs-unity.jpeg" alt="Example envs"/>

-In the next unit, we're going to learn about multi-agents. And you're going to train your first multi-agents to compete in Soccer and Snowball fights against other classmates' agents.
+In the next unit, we're going to learn about multi-agents. And you're going to train your first multi-agents to compete in Soccer and Snowball fight against other classmates' agents.

-TODO add image
+<img src="https://huggingface.co/datasets/huggingface-deep-rl-course/course-images/resolve/main/en/unit7/snowballfight.gif" alt="Snowball fight"/>

 Finally, we would love **to hear what you think of the course and how we can improve it**. If you have any feedback, please 👉 [fill this form](https://forms.gle/BzKXWzLAGZESGNaE9)

@@ -1,6 +1,6 @@
 # How do Unity ML-Agents work? [[how-mlagents-works]]

-Before training our agent, we need to understand what ML-Agents is and how it works.
+Before training our agent, we need to understand **what ML-Agents is and how it works**.

 ## What is Unity ML-Agents? [[what-is-mlagents]]

@@ -1,6 +1,9 @@
 # An Introduction to Unity ML-Agents [[introduction-to-ml-agents]]

-One of the critical elements in Reinforcement Learning is **to be able to create environments**. An interesting tool to use for that is game engines such as Godot, Unity, or Unreal Engine.
+One of the challenges in Reinforcement Learning is to **create environments**. Fortunately for us, game engines are the perfect tool to use.
+Game engines like [Unity](https://unity.com/), [Godot](https://godotengine.org/), or [Unreal Engine](https://www.unrealengine.com/) are programs made to create video games. They are perfectly suited
+for creating environments: they provide physics systems, 2D/3D rendering, and more.

 One of them, [Unity](https://unity.com/), created the [Unity ML-Agents Toolkit](https://github.com/Unity-Technologies/ml-agents), a plugin based on the game engine Unity that allows us **to use the Unity Game Engine as an environment builder to train agents**.

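Whatever engine builds the environment, the Python side trains agents through the same reset/observe/act/step loop. Here is a minimal pure-Python sketch of that loop, using a hypothetical stand-in environment (`ToyEnv` is an illustration; a real setup would connect to a Unity build through the toolkit's `mlagents_envs` package instead):

```python
# Stand-in for a built environment (illustrative only): a real setup would use
# mlagents_envs.environment.UnityEnvironment pointed at a Unity build.
class ToyEnv:
    def __init__(self, horizon=10):
        self.horizon = horizon  # episode length in timesteps
        self.t = 0

    def reset(self):
        self.t = 0
        return 0.0  # initial observation (a placeholder scalar here)

    def step(self, action):
        self.t += 1
        reward = 1.0 if action == 1 else 0.0  # toy reward: +1 for action 1
        done = self.t >= self.horizon
        return 0.0, reward, done

# The generic interaction loop a trainer runs against any such environment.
def run_episode(env, policy):
    obs = env.reset()
    total = 0.0
    done = False
    while not done:
        action = policy(obs)
        obs, reward, done = env.step(action)
        total += reward
    return total
```

The loop is the same no matter which engine renders the world, which is exactly why a game engine can serve as an environment builder.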
@@ -1,26 +1,30 @@
 # The SnowballTarget Environment

 TODO Add gif snowballtarget environment

 ## The Agent's Goal

 The first agent you're going to train is Julien the bear (the name is based on our [CTO Julien Chaumond](https://twitter.com/julien_c)), and its task is **to hit targets with snowballs**.

-The goal in this environment is that Julien the bear **hits as many targets as possible in the limited time** (1000 timesteps). To do that, it will need **to position itself correctly relative to the target and shoot**. In addition, to avoid "snowball spamming" (aka shooting a snowball every timestep), **Julien the bear has a "cool off" system** (it needs to wait 0.5 seconds after a shot to be able to shoot again).
+The goal in this environment is that Julien the bear **hits as many targets as possible in the limited time** (1000 timesteps). To do that, it will need **to position itself correctly relative to the target and shoot**.
+
+In addition, to avoid "snowball spamming" (aka shooting a snowball every timestep), **Julien the bear has a "cool off" system** (it needs to wait 0.5 seconds after a shot to be able to shoot again).

 ADD GIF COOLOFF

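The "cool off" system described above can be sketched as a simple timer. This is a hypothetical illustration of the mechanic, not the environment's actual Unity code:

```python
# Hypothetical sketch of the "cool off" mechanic (not the actual SnowballTarget
# code): after each shot, the agent must wait before it can shoot again.
class CoolOffTimer:
    def __init__(self, cool_off_seconds=0.5):
        self.cool_off_seconds = cool_off_seconds
        self.time_since_last_shot = cool_off_seconds  # ready to shoot at start

    def step(self, dt):
        """Advance the timer by dt seconds (one simulation step)."""
        self.time_since_last_shot += dt

    def try_shoot(self):
        """Return True and reset the timer if the agent is allowed to shoot."""
        if self.time_since_last_shot >= self.cool_off_seconds:
            self.time_since_last_shot = 0.0
            return True
        return False
```

A shot attempt on every timestep then succeeds at most once every 0.5 simulated seconds, which is what prevents snowball spamming.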
 ## The reward function and the reward engineering problem

-The reward function is simple. **The environment gives a +1 reward every time the agent hits a target**.
-Because the agent's goal is to maximize the expected cumulative reward, it will try to hit as many targets as possible.
+The reward function is simple. **The environment gives a +1 reward every time the agent's snowball hits a target**.
+Because the agent's goal is to maximize the expected cumulative reward, **it will try to hit as many targets as possible**.

 We could have a more complex reward function (with a penalty to push the agent to go faster, etc.). But when you design an environment, you need to avoid the *reward engineering problem*: making the reward function too complex in order to force your agent to behave as you want it to.
 Why? Because by doing that, **you might miss interesting strategies that the agent will find with a simpler reward function**.

-TODO ADD IMAGE REWARD
+<img src="https://huggingface.co/datasets/huggingface-deep-rl-course/course-images/resolve/main/en/unit7/snowballtarget_reward.png" alt="Reward system"/>

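The reward function above is simple enough to write in two lines. This sketch is an illustration of the description, not the environment's actual code:

```python
# Sketch of the reward described above: +1 whenever a snowball hits a target,
# nothing else (no speed penalty, no shaping).
def reward(snowball_hit_target: bool) -> float:
    return 1.0 if snowball_hit_target else 0.0

# The return the agent maximizes is then just the number of targets hit
# over the episode's 1000 timesteps.
def episode_return(hits: list) -> float:
    return sum(reward(h) for h in hits)
```

Keeping the function this sparse is a deliberate design choice: the agent is free to discover its own positioning and timing strategies rather than being steered by hand-tuned penalty terms.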
 ## The observation space

-Regarding observations, we don't use normal vision (frame), but we use raycasts.
-
-TOOD ADD raycasts that can each detect objects (target, walls) and how much we have
+Regarding observations, we don't use normal vision (a frame), but **we use raycasts**.
+
+Think of raycasts as lasers that detect whether they pass through an object.

@@ -29,6 +33,15 @@ Think of raycasts as lasers that detect whether they pass through an object.
 <figcaption>Source: <a href="https://github.com/Unity-Technologies/ml-agents">ML-Agents documentation</a></figcaption>
 </figure>

+In this environment, our agent has multiple sets of raycasts:
+-
+
+TODO ADD raycasts that can each detect objects (target, walls) and how much we have

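To make the raycast idea concrete, here is a hypothetical sketch of how a set of rays can be flattened into an observation vector. The tag list, sensor range, and encoding below are illustrative assumptions, not the actual ML-Agents `RayPerceptionSensor` implementation:

```python
# Hypothetical raycast-observation encoding (illustration only). Each ray
# reports a one-hot tag for what it hit (target or wall) plus a normalized
# hit distance; a ray that hits nothing reports distance 1.0 (max range).
def raycast_observation(hits):
    """hits: one entry per ray, either (tag, distance) or None.
    Returns a flat list: [hit_target, hit_wall, normalized_distance] per ray."""
    tags = ["target", "wall"]  # assumed detectable tags
    max_distance = 20.0        # assumed sensor range
    obs = []
    for hit in hits:
        one_hot = [0.0] * len(tags)
        distance = 1.0  # "nothing hit" encoded as max range
        if hit is not None:
            tag, d = hit
            one_hot[tags.index(tag)] = 1.0
            distance = min(d / max_distance, 1.0)
        obs.extend(one_hot + [distance])
    return obs
```

With this encoding, more rays or more detectable tags simply make the vector longer, which is why raycasts give a much smaller observation than a rendered frame.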
 ## The action space

 The action space is discrete with TODO ADD
