mirror of
https://github.com/huggingface/deep-rl-class.git
synced 2026-04-13 18:00:45 +08:00
Updates MLAgents Unit
@@ -144,10 +144,10 @@
       title: (Optional) What is curiosity in Deep Reinforcement Learning?
     - local: unit5/hands-on
       title: Hands-on
-    - local: unit5/conclusion
-      title: Conclusion
+    - local: unit5/bonus
+      title: Bonus. Learn to create your own environments with Unity and MLAgents
+    - local: unit5/conclusion
+      title: Conclusion
 - title: What's next? New Units Publishing Schedule
   sections:
   - local: communication/publishing-schedule

@@ -11,9 +11,9 @@ For instance:
 <img src="https://huggingface.co/datasets/huggingface-deep-rl-course/course-images/resolve/main/en/unit5/envs-unity.jpeg" alt="Example envs"/>

-In the next unit, we're going to learn about multi-agents. And you're going to train your first multi-agents to compete in Soccer and Snowball fights against other classmates' agents.
+In the next unit, we're going to learn about multi-agents. And you're going to train your first multi-agents to compete in Soccer and Snowball fight against other classmates' agents.

-TODO add image
+<img src="https://huggingface.co/datasets/huggingface-deep-rl-course/course-images/resolve/main/en/unit7/snowballfight.gif" alt="Snowball fight"/>

 Finally, we would love **to hear what you think of the course and how we can improve it**. If you have any feedback, please 👉 [fill this form](https://forms.gle/BzKXWzLAGZESGNaE9)

@@ -1,6 +1,6 @@
 # How do Unity ML-Agents work? [[how-mlagents-works]]

-Before training our agent, we need to understand what ML-Agents is and how it works.
+Before training our agent, we need to understand **what ML-Agents is and how it works**.

 ## What is Unity ML-Agents? [[what-is-mlagents]]

@@ -1,6 +1,9 @@
 # An Introduction to Unity ML-Agents [[introduction-to-ml-agents]]

-One of the critical elements in Reinforcement Learning is **to be able to create environments**. An interesting tool to use for that is game engines such as Godot, Unity, or Unreal Engine.
+One of the challenges in Reinforcement Learning is to **create environments**. Fortunately for us, game engines are the perfect tool to use.
+Game engines like [Unity](https://unity.com/), [Godot](https://godotengine.org/), or [Unreal Engine](https://www.unrealengine.com/) are programs made to create video games. They are perfectly suited
+for creating environments: they provide physics systems, 2D/3D rendering, and more.

 One of them, [Unity](https://unity.com/), created the [Unity ML-Agents Toolkit](https://github.com/Unity-Technologies/ml-agents), a plugin based on the game engine Unity that allows us **to use the Unity Game Engine as an environment builder to train agents**.

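Whatever engine builds the environment, the Python side trains agents through the same reset/observe/act/step loop. Here is a minimal pure-Python sketch of that loop, using a hypothetical stand-in environment (`ToyEnv` is an illustration; a real setup would connect to a Unity build through the toolkit's `mlagents_envs` package instead):

```python
# Stand-in for a built environment (illustrative only): a real setup would use
# mlagents_envs.environment.UnityEnvironment pointed at a Unity build.
class ToyEnv:
    def __init__(self, horizon=10):
        self.horizon = horizon  # episode length in timesteps
        self.t = 0

    def reset(self):
        self.t = 0
        return 0.0  # initial observation (a placeholder scalar here)

    def step(self, action):
        self.t += 1
        reward = 1.0 if action == 1 else 0.0  # toy reward: +1 for action 1
        done = self.t >= self.horizon
        return 0.0, reward, done

# The generic interaction loop a trainer runs against any such environment.
def run_episode(env, policy):
    obs = env.reset()
    total = 0.0
    done = False
    while not done:
        action = policy(obs)
        obs, reward, done = env.step(action)
        total += reward
    return total
```

The loop is the same no matter which engine renders the world, which is exactly why a game engine can serve as an environment builder.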
@@ -1,26 +1,30 @@
 # The SnowballTarget Environment

 TODO Add gif snowballtarget environment

 ## The Agent's Goal

 The first agent you're going to train is Julien the bear (the name is based on our [CTO Julien Chaumond](https://twitter.com/julien_c)), and its task is **to hit targets with snowballs**.

-The goal in this environment is that Julien the bear **hits as many targets as possible in the limited time** (1000 timesteps). To do that, it will need **to position itself correctly relative to the target and shoot**. In addition, to avoid "snowball spamming" (aka shooting a snowball every timestep), **Julien the bear has a "cool off" system** (it needs to wait 0.5 seconds after a shot to be able to shoot again).
+The goal in this environment is that Julien the bear **hits as many targets as possible in the limited time** (1000 timesteps). To do that, it will need **to position itself correctly relative to the target and shoot**.
+
+In addition, to avoid "snowball spamming" (aka shooting a snowball every timestep), **Julien the bear has a "cool off" system** (it needs to wait 0.5 seconds after a shot to be able to shoot again).

 ADD GIF COOLOFF

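The "cool off" system described above can be sketched as a simple timer. This is a hypothetical illustration of the mechanic, not the environment's actual Unity code:

```python
# Hypothetical sketch of the "cool off" mechanic (not the actual SnowballTarget
# code): after each shot, the agent must wait before it can shoot again.
class CoolOffTimer:
    def __init__(self, cool_off_seconds=0.5):
        self.cool_off_seconds = cool_off_seconds
        self.time_since_last_shot = cool_off_seconds  # ready to shoot at start

    def step(self, dt):
        """Advance the timer by dt seconds (one simulation step)."""
        self.time_since_last_shot += dt

    def try_shoot(self):
        """Return True and reset the timer if the agent is allowed to shoot."""
        if self.time_since_last_shot >= self.cool_off_seconds:
            self.time_since_last_shot = 0.0
            return True
        return False
```

A shot attempt on every timestep then succeeds at most once every 0.5 simulated seconds, which is what prevents snowball spamming.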
 ## The reward function and the reward engineering problem

-The reward function is simple. **The environment gives a +1 reward every time the agent hits a target**.
-Because the agent's goal is to maximize the expected cumulative reward, it will try to hit as many targets as possible.
+The reward function is simple. **The environment gives a +1 reward every time the agent's snowball hits a target**.
+Because the agent's goal is to maximize the expected cumulative reward, **it will try to hit as many targets as possible**.

 We could have a more complex reward function (with a penalty to push the agent to go faster, etc.). But when you design an environment, you need to avoid the *reward engineering problem*: making the reward function too complex in order to force your agent to behave as you want it to.
 Why? Because by doing that, **you might miss interesting strategies that the agent will find with a simpler reward function**.

-TODO ADD IMAGE REWARD
+<img src="https://huggingface.co/datasets/huggingface-deep-rl-course/course-images/resolve/main/en/unit7/snowballtarget_reward.png" alt="Reward system"/>

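The reward function above is simple enough to write in two lines. This sketch is an illustration of the description, not the environment's actual code:

```python
# Sketch of the reward described above: +1 whenever a snowball hits a target,
# nothing else (no speed penalty, no shaping).
def reward(snowball_hit_target: bool) -> float:
    return 1.0 if snowball_hit_target else 0.0

# The return the agent maximizes is then just the number of targets hit
# over the episode's 1000 timesteps.
def episode_return(hits: list) -> float:
    return sum(reward(h) for h in hits)
```

Keeping the function this sparse is a deliberate design choice: the agent is free to discover its own positioning and timing strategies rather than being steered by hand-tuned penalty terms.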
 ## The observation space

-Regarding observations, we don't use normal vision (frame), but we use raycasts.
-
-TOOD ADD raycasts that can each detect objects (target, walls) and how much we have
+Regarding observations, we don't use normal vision (a frame), but **we use raycasts**.
+
+Think of raycasts as lasers that detect whether they pass through an object.

@@ -29,6 +33,15 @@ Think of raycasts as lasers that detect whether they pass through an object.
 <figcaption>Source: <a href="https://github.com/Unity-Technologies/ml-agents">ML-Agents documentation</a></figcaption>
 </figure>

+In this environment, our agent has multiple sets of raycasts:
+-
+
+TODO ADD raycasts that can each detect objects (target, walls) and how much we have

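To make the raycast idea concrete, here is a hypothetical sketch of how a set of rays can be flattened into an observation vector. The tag list, sensor range, and encoding below are illustrative assumptions, not the actual ML-Agents `RayPerceptionSensor` implementation:

```python
# Hypothetical raycast-observation encoding (illustration only). Each ray
# reports a one-hot tag for what it hit (target or wall) plus a normalized
# hit distance; a ray that hits nothing reports distance 1.0 (max range).
def raycast_observation(hits):
    """hits: one entry per ray, either (tag, distance) or None.
    Returns a flat list: [hit_target, hit_wall, normalized_distance] per ray."""
    tags = ["target", "wall"]  # assumed detectable tags
    max_distance = 20.0        # assumed sensor range
    obs = []
    for hit in hits:
        one_hot = [0.0] * len(tags)
        distance = 1.0  # "nothing hit" encoded as max range
        if hit is not None:
            tag, d = hit
            one_hot[tags.index(tag)] = 1.0
            distance = min(d / max_distance, 1.0)
        obs.extend(one_hot + [distance])
    return obs
```

With this encoding, more rays or more detectable tags simply make the vector longer, which is why raycasts give a much smaller observation than a rendered frame.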
 ## The action space

 The action space is discrete with TODO ADD
