From bce8ba85ed6c856d294044e96b265d012c1fcd6b Mon Sep 17 00:00:00 2001
From: simoninithomas
Date: Sat, 7 Jan 2023 17:27:14 +0100
Subject: [PATCH] Update MLAgents

---
 units/en/unit5/conclusion.mdx      |  6 ++++--
 units/en/unit5/hands-on.mdx        | 28 ++++++++++++++++++++++++++++
 units/en/unit5/introduction.mdx    |  2 +-
 units/en/unit5/pyramids.mdx        |  4 ++--
 units/en/unit5/snowball-target.mdx | 22 ++++++++++++++--------
 5 files changed, 49 insertions(+), 13 deletions(-)

diff --git a/units/en/unit5/conclusion.mdx b/units/en/unit5/conclusion.mdx
index 4719c61..8f173fc 100644
--- a/units/en/unit5/conclusion.mdx
+++ b/units/en/unit5/conclusion.mdx
@@ -5,8 +5,10 @@ Congrats on finishing this unit! You’ve just trained your first ML-Agents and
 The best way to learn is to **practice and try stuff**. Why not try another environment? [ML-Agents has 18 different environments](https://github.com/Unity-Technologies/ml-agents/blob/develop/docs/Learning-Environment-Examples.md). For instance:
 
-- *Worm*, where you teach a worm to crawl.
-- *Walker*: teach an agent to walk towards a goal.
+- [Worm](https://huggingface.co/spaces/unity/ML-Agents-Worm), where you teach a worm to crawl.
+- [Walker](https://huggingface.co/spaces/unity/ML-Agents-Walker): teach an agent to walk towards a goal.
+
+Check the documentation to find out how to train them, and to see the list of ML-Agents environments already integrated on the Hub: https://github.com/huggingface/ml-agents#getting-started
 
 Example envs

diff --git a/units/en/unit5/hands-on.mdx b/units/en/unit5/hands-on.mdx
index 3654eda..258ae79 100644
--- a/units/en/unit5/hands-on.mdx
+++ b/units/en/unit5/hands-on.mdx
@@ -1 +1,29 @@
 # Hands-on
+
+
+
+Now that we've learned what ML-Agents is and how it works, and studied the two environments we're going to use, we're ready to train our agents.
+
+- The first one will learn to **shoot snowballs onto spawning targets**.
+- The second needs to **press a button to spawn a pyramid, then navigate to the pyramid, knock it over, and move to the gold brick at the top**. To do that, it will need to explore its environment, and we will use a technique called curiosity.
+
+Environments
+
+After that, you'll be able to watch your agents playing directly in your browser.
+
+The ML-Agents integration on the Hub **is still experimental**; some features will be added in the future. But for now, to validate this hands-on for the certification process, you just need to push your trained models to the Hub.
+There are no results to attain to validate this one, but if you want to get nice results, you can try to reach:
+
+- For [Pyramids](https://huggingface.co/spaces/unity/ML-Agents-Pyramids): Mean Reward = 1.75
+- For [SnowballTarget](https://huggingface.co/spaces/ThomasSimonini/ML-Agents-SnowballTarget): Mean Reward = 15, or 30 targets shot in an episode.
+
+For more information about the certification process, check this section 👉 https://huggingface.co/deep-rl-course/en/unit0/introduction#certification-process
+
+**To start the hands-on, click on the Open In Colab button** 👇:
+
+[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/huggingface/deep-rl-class/blob/master/notebooks/unit5/unit5.ipynb)
diff --git a/units/en/unit5/introduction.mdx b/units/en/unit5/introduction.mdx
index 5746ac3..5ef5795 100644
--- a/units/en/unit5/introduction.mdx
+++ b/units/en/unit5/introduction.mdx
@@ -1,6 +1,6 @@
 # An Introduction to Unity ML-Agents [[introduction-to-ml-agents]]
 
-One of the challenges in Reinforcement Learning is to **create environments**. Fortunately for us, game engines are the perfect tool to use.
+One of the challenges in Reinforcement Learning is to **create environments**. Fortunately for us, we can use game engines.
 Game engines like [Unity](https://unity.com/), [Godot](https://godotengine.org/) or [Unreal Engine](https://www.unrealengine.com/), are programs made to create video games. They are perfectly suited for creating environments: they provide physics systems, 2D/3D rendering, and more.
diff --git a/units/en/unit5/pyramids.mdx b/units/en/unit5/pyramids.mdx
index 2b6cdf9..4ddf267 100644
--- a/units/en/unit5/pyramids.mdx
+++ b/units/en/unit5/pyramids.mdx
@@ -11,7 +11,6 @@ The reward function is:
 Pyramids Environment
 
-
 To train this new agent that seeks that button and then the Pyramid to destroy, we’ll use a combination of two types of rewards:
 
 - The *extrinsic one* given by the environment (illustration above).
@@ -27,7 +26,8 @@ In terms of observation, we **use 148 raycasts that can each detect objects** (s
 
 We also use a **boolean variable indicating the switch state** (did we turn on or not the switch to spawn the Pyramid) and a vector that **contains the agent’s speed**.
 
-ADD SCREENSHOT CODE
+Pyramids obs code
+
 
 ## The action space

diff --git a/units/en/unit5/snowball-target.mdx b/units/en/unit5/snowball-target.mdx
index a277b5d..5101716 100644
--- a/units/en/unit5/snowball-target.mdx
+++ b/units/en/unit5/snowball-target.mdx
@@ -1,14 +1,14 @@
 # The SnowballTarget Environment
 
-TODO Add gif snowballtarget environment
+SnowballTarget
 
 ## The Agent's Goal
 
-The first agent you're going to train is Julien the bear (the name is based after our [CTO Julien Chaumond](https://twitter.com/julien_c)) **to hit targets with snowballs**.
+The first agent you're going to train is Julien the bear 🐻 (named after our [CTO Julien Chaumond](https://twitter.com/julien_c)); you'll teach it **to hit targets with snowballs**.
 
-The goal in this environment is that Julien the bear **hit as many targets as possible in the limited time** (1000 timesteps). To do that, it will need **to place itself correctly from the target and shoot**.
+The goal in this environment is that Julien **hits as many targets as possible in the limited time** (1000 timesteps). To do that, it will need **to position itself correctly relative to the target and shoot**.
 
-In addition, to avoid "snowball spamming" (aka shooting a snowball every timestep), **Julien the bear has a "cool off" system** (it needs to wait 0.5 seconds after a shoot to be able to shoot again).
Cool Off System @@ -17,8 +17,11 @@ In addition, to avoid "snowball spamming" (aka shooting a snowball every timeste ## The reward function and the reward engineering problem -The reward function is simple. **The environment gives a +1 reward every time the agent's snowball hits a target**. -Because the agent's goal is to maximize the expected cumulative reward, **it will try to hit as many targets as possible**. +The reward function is simple. **The environment gives a +1 reward every time the agent's snowball hits a target** and because the agent's goal is to maximize the expected cumulative reward, **it will try to hit as many targets as possible**. + +In terms of code it looks like this: + +Reward We could have a more complex reward function (with a penalty to push the agent to go faster, etc.). But when you design an environment, you need to avoid the *reward engineering problem*, which is having a too complex reward function to force your agent to behave as you want it to do. Why? Because by doing that, **you might miss interesting strategies that the agent will find with a simpler reward function**. @@ -38,11 +41,14 @@ Think of raycasts as lasers that will detect if it passes through an object. In this environment our agent have multiple set of raycasts: -- Raycasts +TODO: ADd explanation vector + +Obs ## The action space The action space is discrete with TODO ADD -IMAGE + +Action Space