Updates MLAgents Unit

This commit is contained in:
simoninithomas
2023-01-07 10:12:52 +01:00
parent fb12b509ef
commit 759bf0d113
5 changed files with 29 additions and 13 deletions

View File

@@ -144,10 +144,10 @@
title: (Optional) What is curiosity in Deep Reinforcement Learning?
- local: unit5/hands-on
title: Hands-on
- local: unit5/conclusion
title: Conclusion
- local: unit5/bonus
title: Bonus. Learn to create your own environments with Unity and MLAgents
- local: unit5/conclusion
title: Conclusion
- title: What's next? New Units Publishing Schedule
sections:
- local: communication/publishing-schedule

View File

@@ -11,9 +11,9 @@ For instance:
<img src="https://huggingface.co/datasets/huggingface-deep-rl-course/course-images/resolve/main/en/unit5/envs-unity.jpeg" alt="Example envs"/>
In the next unit, we're going to learn about multi-agents. And you're going to train your first multi-agents to compete in Soccer and Snowball fights against other classmate's agents.
In the next unit, we're going to learn about multi-agents. And you're going to train your first multi-agents to compete in Soccer and Snowball fight against other classmates' agents.
TODO add image
<img src="https://huggingface.co/datasets/huggingface-deep-rl-course/course-images/resolve/main/en/unit7/snowballfight.gif" alt="Snowball fight"/>
Finally, we would love **to hear what you think of the course and how we can improve it**. If you have any feedback, please 👉 [fill this form](https://forms.gle/BzKXWzLAGZESGNaE9)

View File

@@ -1,6 +1,6 @@
# How do Unity ML-Agents work? [[how-mlagents-works]]
Before training our agent, we need to understand what is ML-Agents and how it works.
Before training our agent, we need to understand **what ML-Agents is and how it works**.
## What is Unity ML-Agents? [[what-is-mlagents]]

View File

@@ -1,6 +1,9 @@
# An Introduction to Unity ML-Agents [[introduction-to-ml-agents]]
One of the critical elements in Reinforcement Learning is **to be able to create environments**. An interesting tool to use for that is game engines such as Godot, Unity, or Unreal Engine.
One of the challenges in Reinforcement Learning is to **create environments**. Fortunately for us, game engines are the perfect tool to use.
Game engines like [Unity](https://unity.com/), [Godot](https://godotengine.org/), or [Unreal Engine](https://www.unrealengine.com/) are programs made to create video games. They are perfectly suited
for creating environments: they provide physics systems, 2D/3D rendering, and more.
One of them, [Unity](https://unity.com/), created the [Unity ML-Agents Toolkit](https://github.com/Unity-Technologies/ml-agents), a plugin that allows us **to use the Unity Game Engine as an environment builder to train agents**.

View File

@@ -1,26 +1,30 @@
# The SnowballTarget Environment
TODO Add gif snowballtarget environment
## The Agent's Goal
The first agent you're going to train is Julien the bear (named after our [CTO Julien Chaumond](https://twitter.com/julien_c)), whose goal is **to hit targets with snowballs**.
The goal in this environment is that Julien the bear **hit as many targets as possible in the limited time** (1000 timesteps). To do that, it will need **to place itself correctly from the target and shoot**. In addition, to avoid "snowball spamming" (aka shooting a snowball every timestep),**Julien the bear has a "cool off" system** (it needs to wait 0.5 seconds after a shoot to be able to shoot again).
The goal in this environment is for Julien the bear to **hit as many targets as possible in the limited time** (1000 timesteps). To do that, it will need **to position itself correctly relative to the target and shoot**.
In addition, to avoid "snowball spamming" (i.e., shooting a snowball every timestep), **Julien the bear has a "cool off" system** (it needs to wait 0.5 seconds after shooting before it can shoot again).
ADD GIF COOLOFF
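A minimal sketch of how such a cool-off could work in code (the class, the method names, and the conversion of 0.5 seconds into timesteps are all hypothetical; the real environment handles this inside Unity):

```python
# Toy sketch of a "cool off" system. The course says the agent must wait
# 0.5 seconds between shots; here we assume (hypothetically) that this
# corresponds to 25 timesteps.
COOL_OFF_STEPS = 25

class SnowballCannon:
    def __init__(self):
        # Start far enough in the past that the first shot is allowed.
        self.last_shot_step = -COOL_OFF_STEPS

    def can_shoot(self, step: int) -> bool:
        return step - self.last_shot_step >= COOL_OFF_STEPS

    def try_shoot(self, step: int) -> bool:
        if self.can_shoot(step):
            self.last_shot_step = step
            return True
        return False  # still cooling off: the shot is ignored

cannon = SnowballCannon()
print(cannon.try_shoot(0))   # True: first shot goes out
print(cannon.try_shoot(10))  # False: still cooling off
print(cannon.try_shoot(25))  # True: cool-off has elapsed
```

This kind of constraint matters for learning: the agent can't brute-force the task by spamming actions, so it has to learn *when* shooting is worthwhile.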
## The reward function and the reward engineering problem
The reward function is simple. **The environment gives a +1 reward every time the agent hits a target**.
Because the agent's goal is to maximize the expected cumulative reward, it will try to hit as many targets as possible.
The reward function is simple. **The environment gives a +1 reward every time the agent's snowball hits a target**.
Because the agent's goal is to maximize the expected cumulative reward, **it will try to hit as many targets as possible**.
We could have a more complex reward function (with a penalty to push the agent to go faster, etc.). But when you design an environment, you need to avoid the *reward engineering problem*: making the reward function overly complex in order to force your agent to behave the way you want.
Why? Because by doing that, **you might miss interesting strategies that the agent will find with a simpler reward function**.
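The simple reward described above can be sketched in a few lines (the function and variable names are illustrative, not the environment's actual code):

```python
# Sketch of the simple reward: +1 each time a snowball hits a target,
# and nothing else -- no shaping terms, to sidestep reward engineering.
def reward(snowball_hit_target: bool) -> float:
    return 1.0 if snowball_hit_target else 0.0

# The cumulative reward over an episode is then just the number of hits:
episode_hits = [False, True, False, True, True]
print(sum(reward(h) for h in episode_hits))  # 3.0
```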
TODO ADD IMAGE REWARD
<img src="https://huggingface.co/datasets/huggingface-deep-rl-course/course-images/resolve/main/en/unit7/snowballtarget_reward.png" alt="Reward system"/>
## The observation space
Regarding observations, we don't use normal vision (frame), but we use raycasts.
TODO ADD raycasts that can each detect objects (target, walls) and how many we have
Regarding observations, we don't use normal vision (frame), but **we use raycasts**.
Think of raycasts as lasers that detect whether they pass through an object.
@@ -29,6 +33,15 @@ Think of raycasts as lasers that will detect if it passes through an object.
<figcaption>Source: <a href="https://github.com/Unity-Technologies/ml-agents">ML-Agents documentation</a></figcaption>
</figure>
In this environment, our agent has multiple sets of raycasts:
-
TODO ADD raycasts that can each detect objects (target, walls) and how many we have
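To build intuition, here is a toy 2D sketch of what a raycast measures (the geometry, angles, and object handling are invented for illustration; the real environment uses Unity's ray perception sensors):

```python
import math

# Toy 2D raycast: march along a ray from the agent and return the distance
# to the first object hit (objects are treated as small axis-aligned boxes),
# or max_dist if the ray hits nothing.
def raycast(origin, angle, obstacles, max_dist=10.0, step=0.1):
    d = 0.0
    while d < max_dist:
        x = origin[0] + d * math.cos(angle)
        y = origin[1] + d * math.sin(angle)
        if any(abs(x - ox) < 0.5 and abs(y - oy) < 0.5 for ox, oy in obstacles):
            return d
        d += step
    return max_dist

# One "set" of raycasts: several rays fanned out in front of the agent.
targets = [(3.0, 0.0)]                    # hypothetical target position
angles = [-0.4, -0.2, 0.0, 0.2, 0.4]      # ray directions in radians
observation = [raycast((0.0, 0.0), a, targets) for a in angles]
print(observation)  # only the central ray reports a short distance: target ahead
```

The resulting vector of distances (one number per ray) is what the agent observes, which is much more compact than a full camera frame.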
## The action space
The action space is discrete with TODO ADD