diff --git a/units/en/_toctree.yml b/units/en/_toctree.yml
index 3d6a696..8d1b138 100644
--- a/units/en/_toctree.yml
+++ b/units/en/_toctree.yml
@@ -144,10 +144,10 @@
     title: (Optional) What is curiosity in Deep Reinforcement Learning?
   - local: unit5/hands-on
     title: Hands-on
-  - local: unit5/conclusion
-    title: Conclusion
   - local: unit5/bonus
     title: Bonus. Learn to create your own environments with Unity and MLAgents
+  - local: unit5/conclusion
+    title: Conclusion
 - title: What's next? New Units Publishing Schedule
   sections:
   - local: communication/publishing-schedule
diff --git a/units/en/unit5/conclusion.mdx b/units/en/unit5/conclusion.mdx
index 4083bb1..4719c61 100644
--- a/units/en/unit5/conclusion.mdx
+++ b/units/en/unit5/conclusion.mdx
@@ -11,9 +11,9 @@
 For instance:
 
 Example envs
 
-In the next unit, we're going to learn about multi-agents. And you're going to train your first multi-agents to compete in Soccer and Snowball fights against other classmate's agents.
+In the next unit, we're going to learn about multi-agents. And you're going to train your first multi-agents to compete in Soccer and Snowball fight against other classmates' agents.
 
-TODO add image
+Snowball fight
 
 Finally, we would love **to hear what you think of the course and how we can improve it**. If you have some feedback then, please 👉 [fill this form](https://forms.gle/BzKXWzLAGZESGNaE9)
diff --git a/units/en/unit5/how-mlagents-works.mdx b/units/en/unit5/how-mlagents-works.mdx
index 55c29f9..95f3b87 100644
--- a/units/en/unit5/how-mlagents-works.mdx
+++ b/units/en/unit5/how-mlagents-works.mdx
@@ -1,6 +1,6 @@
 # How do Unity ML-Agents work? [[how-mlagents-works]]
 
-Before training our agent, we need to understand what is ML-Agents and how it works.
+Before training our agent, we need to understand **what ML-Agents is and how it works**.
 
 ## What is Unity ML-Agents? [[what-is-mlagents]]
 
diff --git a/units/en/unit5/introduction.mdx b/units/en/unit5/introduction.mdx
index fc13ec3..5746ac3 100644
--- a/units/en/unit5/introduction.mdx
+++ b/units/en/unit5/introduction.mdx
@@ -1,6 +1,9 @@
 # An Introduction to Unity ML-Agents [[introduction-to-ml-agents]]
 
-One of the critical elements in Reinforcement Learning is **to be able to create environments**. An interesting tool to use for that is game engines such as Godot, Unity, or Unreal Engine.
+One of the challenges in Reinforcement Learning is to **create environments**. Fortunately for us, game engines are the perfect tool to use.
+Game engines like [Unity](https://unity.com/), [Godot](https://godotengine.org/) or [Unreal Engine](https://www.unrealengine.com/) are programs made to create video games. They are perfectly suited
+for creating environments: they provide physics systems, 2D/3D rendering, and more.
+
 One of them, [Unity](https://unity.com/), created the [Unity ML-Agents Toolkit](https://github.com/Unity-Technologies/ml-agents), a plugin based on the game engine Unity that allows us **to use the Unity Game Engine as an environment builder to train agents**.
diff --git a/units/en/unit5/snowball-target.mdx b/units/en/unit5/snowball-target.mdx
index 4d1e7fe..c65511d 100644
--- a/units/en/unit5/snowball-target.mdx
+++ b/units/en/unit5/snowball-target.mdx
@@ -1,26 +1,30 @@
 # The SnowballTarget Environment
 
+TODO Add gif snowballtarget environment
+
 ## The Agent's Goal
 
 The first agent you're going to train is Julien the bear (the name is based after our [CTO Julien Chaumond](https://twitter.com/julien_c)) **to hit targets with snowballs**.
 
-The goal in this environment is that Julien the bear **hit as many targets as possible in the limited time** (1000 timesteps). To do that, it will need **to place itself correctly from the target and shoot**. In addition, to avoid "snowball spamming" (aka shooting a snowball every timestep),**Julien the bear has a "cool off" system** (it needs to wait 0.5 seconds after a shoot to be able to shoot again).
+The goal in this environment is for Julien the bear to **hit as many targets as possible in the limited time** (1000 timesteps). To do that, it will need **to place itself correctly relative to the target and shoot**.
+
+In addition, to avoid "snowball spamming" (aka shooting a snowball every timestep), **Julien the bear has a "cool off" system** (it needs to wait 0.5 seconds after a shot to be able to shoot again).
+
+ADD GIF COOLOFF
 
 ## The reward function and the reward engineering problem
 
-The reward function is simple. **The environment gives a +1 reward every time the agent hits a target**.
-Because the agent's goal is to maximize the expected cumulative reward, it will try to hit as many targets as possible.
+The reward function is simple. **The environment gives a +1 reward every time the agent's snowball hits a target**.
+Because the agent's goal is to maximize the expected cumulative reward, **it will try to hit as many targets as possible**.
 
 We could have a more complex reward function (with a penalty to push the agent to go faster, etc.). But when you design an environment, you need to avoid the *reward engineering problem*, which is having a too complex reward function to force your agent to behave as you want it to do.
 
 Why? Because by doing that, **you might miss interesting strategies that the agent will find with a simpler reward function**.
 
-TODO ADD IMAGE REWARD
+Reward system
 
 ## The observation space
 
-Regarding observations, we don't use normal vision (frame), but we use raycasts.
-
- TOOD ADD raycasts that can each detect objects (target, walls) and how much we have
+Regarding observations, we don't use normal vision (frames), but **we use raycasts**.
 
 Think of raycasts as lasers that will detect if it passes through an object.
@@ -29,6 +33,15 @@ Think of raycasts as lasers that will detect if it passes through an object.
 
 Source: ML-Agents documentation
 
+
+In this environment, our agent has multiple sets of raycasts:
+-
+
+
+ TODO ADD raycasts that can each detect objects (target, walls) and how much we have
+
+
+
 ## The action space
 
 The action space is discrete with TODO ADD