Update MLAgents

2026-06-15 14:36:45 +08:00 · 2023-01-07 17:27:14 +01:00
parent 98f4c85709
commit bce8ba85ed
5 changed files with 49 additions and 13 deletions
--- a/units/en/unit5/conclusion.mdx
+++ b/units/en/unit5/conclusion.mdx
@@ -5,8 +5,10 @@ Congrats on finishing this unit! You’ve just trained your first ML-Agents and
 The best way to learn is to **practice and try stuff**. Why not try another environment? [ML-Agents has 18 different environments](https://github.com/Unity-Technologies/ml-agents/blob/develop/docs/Learning-Environment-Examples.md).

 For instance:
- *Worm*, where you teach a worm to crawl.
- *Walker*: teach an agent to walk towards a goal.
+- [Worm](https://huggingface.co/spaces/unity/ML-Agents-Worm), where you teach a worm to crawl.
+- [Walker](https://huggingface.co/spaces/unity/ML-Agents-Walker): teach an agent to walk towards a goal.
+
+Check the documentation to find out how to train them and to see the list of ML-Agents environments already integrated on the Hub: https://github.com/huggingface/ml-agents#getting-started

 <img src="https://huggingface.co/datasets/huggingface-deep-rl-course/course-images/resolve/main/en/unit5/envs-unity.jpeg" alt="Example envs"/>

--- a/units/en/unit5/hands-on.mdx
+++ b/units/en/unit5/hands-on.mdx
@@ -1 +1,29 @@
 # Hands-on
+
+<CourseFloatingBanner classNames="absolute z-10 right-0 top-0"
+notebooks={[
+  {label: "Google Colab", value: "https://colab.research.google.com/github/huggingface/deep-rl-class/blob/main/notebooks/unit5/unit5.ipynb"}
+  ]}
+  askForHelpUrl="http://hf.co/join/discord" />
+
+
+Now that we've learned what ML-Agents is and how it works, and studied the two environments we're going to use, we're ready to train our agents.
+
+- The first one will learn to **shoot snowballs at spawning targets**.
+- The second one needs to **press a button to spawn a pyramid, then navigate to the pyramid, knock it over, and move to the gold brick at the top**. To do that, it will need to explore its environment, and we will use a technique called curiosity.
+
+<img src="https://huggingface.co/datasets/huggingface-deep-rl-course/course-images/resolve/main/en/unit7/envs.png" alt="Environments" />
+
+After that, you'll be able to watch your agents playing directly in your browser.
+
+The ML-Agents integration on the Hub **is still experimental**, and some features will be added in the future. For now, to validate this hands-on for the certification process, you just need to push your trained models to the Hub.
+There are no minimum results to attain to validate this one. But if you want to get nice results, you can try to attain:
+
+- For [Pyramids](https://huggingface.co/spaces/unity/ML-Agents-Pyramids): Mean Reward = 1.75
+- For [SnowballTarget](https://huggingface.co/spaces/ThomasSimonini/ML-Agents-SnowballTarget): Mean Reward = 15 or 30 targets shot in an episode.
+
+  For more information about the certification process, check this section 👉 https://huggingface.co/deep-rl-course/en/unit0/introduction#certification-process
+
+  **To start the hands-on, click on the Open In Colab button** 👇 :
+
+  [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/huggingface/deep-rl-class/blob/master/notebooks/unit5/unit5.ipynb)
--- a/units/en/unit5/introduction.mdx
+++ b/units/en/unit5/introduction.mdx
@@ -1,6 +1,6 @@
 # An Introduction to Unity ML-Agents [[introduction-to-ml-agents]]

-One of the challenges in Reinforcement Learning is to **create environments**. Fortunately for us, game engines are the perfect tool to use.
+One of the challenges in Reinforcement Learning is to **create environments**. Fortunately for us, we can use game engines.
 Game engines like [Unity](https://unity.com/), [Godot](https://godotengine.org/) or [Unreal Engine](https://www.unrealengine.com/), are programs made to create video games. They are perfectly suited
 for creating environments: they provide physics systems, 2D/3D rendering, and more.

--- a/units/en/unit5/pyramids.mdx
+++ b/units/en/unit5/pyramids.mdx
@@ -11,7 +11,6 @@ The reward function is:

 <img src="https://huggingface.co/datasets/huggingface-deep-rl-course/course-images/resolve/main/en/unit5/pyramids-reward.png" alt="Pyramids Environment"/>

-
 To train this new agent that seeks that button and then the Pyramid to destroy, we’ll use a combination of two types of rewards:

 - The *extrinsic one* given by the environment (illustration above).
@@ -27,7 +26,8 @@ In terms of observation, we **use 148 raycasts that can each detect objects** (s

 We also use a **boolean variable indicating the switch state** (did we turn on or not the switch to spawn the Pyramid) and a vector that **contains the agent’s speed**.

-ADD SCREENSHOT CODE
+<img src="https://huggingface.co/datasets/huggingface-deep-rl-course/course-images/resolve/main/en/unit5/pyramids-obs-code.png" alt="Pyramids obs code"/>
+

 ## The action space

--- a/units/en/unit5/snowball-target.mdx
+++ b/units/en/unit5/snowball-target.mdx
@@ -1,14 +1,14 @@
 # The SnowballTarget Environment

-TODO Add gif snowballtarget environment
+<img src="https://huggingface.co/datasets/huggingface-deep-rl-course/course-images/resolve/main/en/unit5/snowballtarget.gif" alt="SnowballTarget"/>

 ## The Agent's Goal

-The first agent you're going to train is Julien the bear (the name is based after our [CTO Julien Chaumond](https://twitter.com/julien_c)) **to hit targets with snowballs**.
+The first agent you're going to train is Julien the bear 🐻 (named after our [CTO Julien Chaumond](https://twitter.com/julien_c)) **to hit targets with snowballs**.

-The goal in this environment is that Julien the bear **hit as many targets as possible in the limited time** (1000 timesteps). To do that, it will need **to place itself correctly from the target and shoot**.
+The goal in this environment is for Julien to **hit as many targets as possible in the limited time** (1000 timesteps). To do that, it will need **to position itself correctly relative to the target and shoot**.

-In addition, to avoid "snowball spamming" (aka shooting a snowball every timestep), **Julien the bear has a "cool off" system** (it needs to wait 0.5 seconds after a shoot to be able to shoot again).
+In addition, to avoid "snowball spamming" (aka shooting a snowball every timestep), **Julien has a "cool off" system** (it needs to wait 0.5 seconds after a shot to be able to shoot again).

 <figure>
 <img src="https://huggingface.co/datasets/huggingface-deep-rl-course/course-images/resolve/main/en/unit7/cooloffsystem.gif" alt="Cool Off System"/>
@@ -17,8 +17,11 @@ In addition, to avoid "snowball spamming" (aka shooting a snowball every timeste

 ## The reward function and the reward engineering problem

-The reward function is simple. **The environment gives a +1 reward every time the agent's snowball hits a target**.
-Because the agent's goal is to maximize the expected cumulative reward, **it will try to hit as many targets as possible**.
+The reward function is simple. **The environment gives a +1 reward every time the agent's snowball hits a target**, and because the agent's goal is to maximize the expected cumulative reward, **it will try to hit as many targets as possible**.
+
+In terms of code, it looks like this:
+
+<img src="https://huggingface.co/datasets/huggingface-deep-rl-course/course-images/resolve/main/en/unit5/snowballtarget-reward-code.png" alt="Reward"/>

 We could have a more complex reward function (with a penalty to push the agent to go faster, etc.). But when you design an environment, you need to avoid the *reward engineering problem*, which is having a too complex reward function to force your agent to behave as you want it to do.
 Why? Because by doing that, **you might miss interesting strategies that the agent will find with a simpler reward function**.
@@ -38,11 +41,14 @@ Think of raycasts as lasers that will detect if it passes through an object.


 In this environment our agent have multiple set of raycasts:
-
 <img src="https://huggingface.co/datasets/huggingface-deep-rl-course/course-images/resolve/main/en/unit7/snowball_target_raycasts.png" alt="Raycasts"/>

+TODO: Add explanation vector
+
+<img src="https://huggingface.co/datasets/huggingface-deep-rl-course/course-images/resolve/main/en/unit5/snowballtarget-obs-code.png" alt="Obs"/>

 ## The action space

 The action space is discrete with TODO ADD
-IMAGE
+
+<img src="https://huggingface.co/datasets/huggingface-deep-rl-course/course-images/resolve/main/en/unit5/snowballtarget_action_space.png" alt="Action Space"/>