From bce8ba85ed6c856d294044e96b265d012c1fcd6b Mon Sep 17 00:00:00 2001
From: simoninithomas
Date: Sat, 7 Jan 2023 17:27:14 +0100
Subject: [PATCH] Update MLAgents

---
 units/en/unit5/conclusion.mdx      |  6 ++++--
 units/en/unit5/hands-on.mdx        | 28 ++++++++++++++++++++++++++++
 units/en/unit5/introduction.mdx    |  2 +-
 units/en/unit5/pyramids.mdx        |  4 ++--
 units/en/unit5/snowball-target.mdx | 22 ++++++++++++++--------
 5 files changed, 49 insertions(+), 13 deletions(-)

diff --git a/units/en/unit5/conclusion.mdx b/units/en/unit5/conclusion.mdx
index 4719c61..8f173fc 100644
--- a/units/en/unit5/conclusion.mdx
+++ b/units/en/unit5/conclusion.mdx
@@ -5,8 +5,10 @@ Congrats on finishing this unit! You’ve just trained your first ML-Agents and
 The best way to learn is to **practice and try stuff**. Why not try another environment? [ML-Agents has 18 different environments](https://github.com/Unity-Technologies/ml-agents/blob/develop/docs/Learning-Environment-Examples.md). For instance:
 
-- *Worm*, where you teach a worm to crawl.
-- *Walker*: teach an agent to walk towards a goal.
+- [Worm](https://huggingface.co/spaces/unity/ML-Agents-Worm), where you teach a worm to crawl.
+- [Walker](https://huggingface.co/spaces/unity/ML-Agents-Walker): teach an agent to walk towards a goal.
+
+Check the documentation to find out how to train them, and to see the list of ML-Agents environments already integrated on the Hub: https://github.com/huggingface/ml-agents#getting-started
 
 Example envs

diff --git a/units/en/unit5/hands-on.mdx b/units/en/unit5/hands-on.mdx
index 3654eda..258ae79 100644
--- a/units/en/unit5/hands-on.mdx
+++ b/units/en/unit5/hands-on.mdx
@@ -1 +1,29 @@
 # Hands-on
+
+
+
+Now that we've learned what ML-Agents is and how it works, and studied the two environments we're going to use, we're ready to train our agents.
+
+- The first one will learn to **shoot snowballs onto spawning targets**.
+- The second needs to **press a button to spawn a pyramid, then navigate to the pyramid, knock it over, and move to the gold brick at the top**. To do that, it will need to explore its environment, and we will use a technique called curiosity.
+
+Environments
+
+After that, you'll be able to watch your agents playing directly in your browser.
+
+The ML-Agents integration on the Hub **is still experimental**; some features will be added in the future. But for now, to validate this hands-on for the certification process, you just need to push your trained models to the Hub.
+There are no results to attain to validate this one, but if you want to get nice results, you can try to reach:
+
+- For [Pyramids](https://huggingface.co/spaces/unity/ML-Agents-Pyramids): Mean Reward = 1.75
+- For [SnowballTarget](https://huggingface.co/spaces/ThomasSimonini/ML-Agents-SnowballTarget): Mean Reward = 15, or 30 targets shot in an episode.
+
+For more information about the certification process, check this section 👉 https://huggingface.co/deep-rl-course/en/unit0/introduction#certification-process
+
+**To start the hands-on, click on the Open In Colab button** 👇:
+
+[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/huggingface/deep-rl-class/blob/master/notebooks/unit5/unit5.ipynb)
diff --git a/units/en/unit5/introduction.mdx b/units/en/unit5/introduction.mdx
index 5746ac3..5ef5795 100644
--- a/units/en/unit5/introduction.mdx
+++ b/units/en/unit5/introduction.mdx
@@ -1,6 +1,6 @@
 # An Introduction to Unity ML-Agents [[introduction-to-ml-agents]]
 
-One of the challenges in Reinforcement Learning is to **create environments**. Fortunately for us, game engines are the perfect tool to use.
+One of the challenges in Reinforcement Learning is to **create environments**. Fortunately for us, we can use game engines.
 Game engines like [Unity](https://unity.com/), [Godot](https://godotengine.org/) or [Unreal Engine](https://www.unrealengine.com/), are programs made to create video games. They are perfectly suited for creating environments: they provide physics systems, 2D/3D rendering, and more.
diff --git a/units/en/unit5/pyramids.mdx b/units/en/unit5/pyramids.mdx
index 2b6cdf9..4ddf267 100644
--- a/units/en/unit5/pyramids.mdx
+++ b/units/en/unit5/pyramids.mdx
@@ -11,7 +11,6 @@ The reward function is:
 Pyramids Environment
 
-
 To train this new agent that seeks that button and then the Pyramid to destroy, we’ll use a combination of two types of rewards:
 
 - The *extrinsic one* given by the environment (illustration above).
@@ -27,7 +26,8 @@ In terms of observation, we **use 148 raycasts that can each detect objects** (s
 
 We also use a **boolean variable indicating the switch state** (did we turn on or not the switch to spawn the Pyramid) and a vector that **contains the agent’s speed**.
 
-ADD SCREENSHOT CODE
+Pyramids obs code
+
 
 ## The action space

diff --git a/units/en/unit5/snowball-target.mdx b/units/en/unit5/snowball-target.mdx
index a277b5d..5101716 100644
--- a/units/en/unit5/snowball-target.mdx
+++ b/units/en/unit5/snowball-target.mdx
@@ -1,14 +1,14 @@
 # The SnowballTarget Environment
 
-TODO Add gif snowballtarget environment
+SnowballTarget
 
 ## The Agent's Goal
 
-The first agent you're going to train is Julien the bear (the name is based after our [CTO Julien Chaumond](https://twitter.com/julien_c)) **to hit targets with snowballs**.
+The first agent you're going to train is Julien the bear 🐻 (named after our [CTO Julien Chaumond](https://twitter.com/julien_c)); you'll teach it **to hit targets with snowballs**.
 
-The goal in this environment is that Julien the bear **hit as many targets as possible in the limited time** (1000 timesteps). To do that, it will need **to place itself correctly from the target and shoot**.
+The goal in this environment is that Julien **hits as many targets as possible in the limited time** (1000 timesteps). To do that, it will need **to position itself correctly relative to the target and shoot**.
 
-In addition, to avoid "snowball spamming" (aka shooting a snowball every timestep), **Julien the bear has a "cool off" system** (it needs to wait 0.5 seconds after a shoot to be able to shoot again).
Cool Off System @@ -17,8 +17,11 @@ In addition, to avoid "snowball spamming" (aka shooting a snowball every timeste ## The reward function and the reward engineering problem -The reward function is simple. **The environment gives a +1 reward every time the agent's snowball hits a target**. -Because the agent's goal is to maximize the expected cumulative reward, **it will try to hit as many targets as possible**. +The reward function is simple. **The environment gives a +1 reward every time the agent's snowball hits a target** and because the agent's goal is to maximize the expected cumulative reward, **it will try to hit as many targets as possible**. + +In terms of code it looks like this: + +Reward We could have a more complex reward function (with a penalty to push the agent to go faster, etc.). But when you design an environment, you need to avoid the *reward engineering problem*, which is having a too complex reward function to force your agent to behave as you want it to do. Why? Because by doing that, **you might miss interesting strategies that the agent will find with a simpler reward function**. @@ -38,11 +41,14 @@ Think of raycasts as lasers that will detect if it passes through an object. In this environment our agent have multiple set of raycasts: -- Raycasts +TODO: ADd explanation vector + +Obs ## The action space The action space is discrete with TODO ADD -IMAGE + +Action Space