mirror of
https://github.com/huggingface/deep-rl-class.git
synced 2026-04-08 21:30:45 +08:00
Update MLAgents draft
@@ -76,9 +76,21 @@
     title: Conclusion
   - local: unit2/additional-readings
     title: Additional Readings
-- title: Unit 4. Introduction to ML-Agents
+- title: Unit 5. Introduction to ML-Agents
   sections:
-  - local: unit4/introduction
+  - local: unit5/introduction
     title: Introduction
-  - local: unit4/how-mlagents-works
+  - local: unit5/how-mlagents-works
     title: How ML-Agents works?
+  - local: unit5/shoot-target-env
+    title: The Shoot Target environment
+  - local: unit5/pyramids
+    title: The Pyramids environment
+  - local: unit5/curiosity
+    title: (Optional) What is curiosity in Deep Reinforcement Learning?
+  - local: unit5/hands-on
+    title: Hands-on
+  - local: unit5/conclusion
+    title: Conclusion
+  - local: unit5/bonus
+    title: Bonus. Learn to create your own environments with Unity and MLAgents
@@ -1,3 +0,0 @@
-# An Introduction to Unity ML-Agents [[introduction-to-ml-agents]]
-
-Environment: Snowball target
19
units/en/unit5/bonus.mdx
Normal file
@@ -0,0 +1,19 @@
# Bonus: Learn to create your own environments with Unity and MLAgents

**You can create your own reinforcement learning environments with Unity and MLAgents.** Using a game engine such as Unity can be intimidating at first, so here are the steps you can follow to learn smoothly.

## Step 1: Know how to use Unity

- The best way to learn Unity is to take the ["Create with Code" course](https://learn.unity.com/course/create-with-code): a series of videos for beginners where **you will create 5 small games with Unity**.

## Step 2: Create the simplest environment with this tutorial

- Once you know how to use Unity, you can create your [first basic RL environment using this tutorial](https://github.com/Unity-Technologies/ml-agents/blob/release_20_docs/docs/Learning-Environment-Create-New.md).

## Step 3: Iterate and create nice environments

- Now that you've created your first simple environment, you can iterate toward more complex ones using the [MLAgents documentation (especially the Designing Agents and Agent sections)](https://github.com/Unity-Technologies/ml-agents/blob/release_20_docs/docs/).
- In addition, you can take the free course ["Create a hummingbird environment"](https://learn.unity.com/course/ml-agents-hummingbirds) by [Adam Kelly](https://twitter.com/aktwelve).

Have fun! And if you create custom environments, don't hesitate to share them in the `#rl-i-made-this` Discord channel.
20
units/en/unit5/conclusion.mdx
Normal file
@@ -0,0 +1,20 @@
# Conclusion

Congrats on finishing this unit! You’ve just trained your first ML-Agents agent and shared it to the Hub 🥳.

The best way to learn is to **practice and try stuff**. Why not try another environment? [ML-Agents has 18 different environments](https://github.com/Unity-Technologies/ml-agents/blob/develop/docs/Learning-Environment-Examples.md).

For instance:
- *Worm*: teach a worm to crawl.
- *Walker*: teach an agent to walk towards a goal.

<img src="https://huggingface.co/datasets/huggingface-deep-rl-course/course-images/resolve/main/en/unit5/envs-unity.jpeg" alt="Example envs"/>

In the next unit, we're going to learn about multi-agent systems. You're going to train your first multi-agents to compete in Soccer and Snowball fights against your classmates' agents.

TODO add image

Finally, we would love **to hear what you think of the course and how we can improve it**. If you have feedback, please 👉 [fill this form](https://forms.gle/BzKXWzLAGZESGNaE9)

### Keep Learning, stay awesome 🤗
50
units/en/unit5/curiosity.mdx
Normal file
@@ -0,0 +1,50 @@
# (Optional) What is curiosity in Deep Reinforcement Learning?

This is an (optional) introduction to curiosity. If you want to learn more, you can read my two articles where I dive into the mathematical details:

- [Curiosity-Driven Learning through Next State Prediction](https://medium.com/data-from-the-trenches/curiosity-driven-learning-through-next-state-prediction-f7f4e2f592fa)
- [Random Network Distillation: a new take on Curiosity-Driven Learning](https://medium.com/data-from-the-trenches/curiosity-driven-learning-through-random-network-distillation-488ffd8e5938)

## Two Major Problems in Modern RL

To understand what curiosity is, we first need to understand the two major problems with RL:

First, the *sparse rewards problem*: **most rewards do not contain information, and hence are set to zero**.

Remember that RL is based on the *reward hypothesis*: the idea that each goal can be described as the maximization of rewards. Rewards therefore act as feedback for RL agents; **if they don’t receive any, their knowledge of which action is appropriate (or not) cannot change**.

<figure>
<img src="https://huggingface.co/datasets/huggingface-deep-rl-course/course-images/resolve/main/en/unit5/curiosity1.png" alt="Curiosity"/>
<figcaption>Source: Thanks to the reward, our agent knows that this action at that state was good</figcaption>
</figure>

For instance, in [ViZDoom](https://vizdoom.cs.put.edu.pl/), a set of environments based on the game Doom, in the scenario “DoomMyWayHome,” your agent is only rewarded **if it finds the vest**.
However, the vest is far away from your starting point, so most of your rewards will be zero. If our agent does not receive useful feedback (dense rewards), it will take much longer to learn an optimal policy, and **it can spend its time turning around without finding the goal**.

<img src="https://huggingface.co/datasets/huggingface-deep-rl-course/course-images/resolve/main/en/unit5/curiosity2.png" alt="Curiosity"/>

The second big problem is that **the extrinsic reward function is handmade: in each environment, a human has to implement a reward function**. But how can we scale that to big and complex environments?

## So what is curiosity?

A solution to these problems is **to develop a reward function that is intrinsic to the agent, i.e., generated by the agent itself**. The agent will act as a self-learner, since it is both the student and its own feedback master.

**This intrinsic reward mechanism is known as curiosity**, because this reward pushes the agent to explore states that are novel/unfamiliar. To achieve that, our agent receives a high reward when it explores new trajectories.

This reward design is inspired by how humans act: **we naturally have an intrinsic desire to explore environments and discover new things**.

There are different ways to calculate this intrinsic reward. The classical one (curiosity through next-state prediction) calculates curiosity **as the error our agent makes when predicting the next state, given the current state and the action taken**.

<img src="https://huggingface.co/datasets/huggingface-deep-rl-course/course-images/resolve/main/en/unit5/curiosity3.png" alt="Curiosity"/>

The idea of curiosity is to **encourage our agent to perform actions that reduce the uncertainty in the agent’s ability to predict the consequences of its own actions** (uncertainty is higher in areas where the agent has spent less time, or in areas with complex dynamics).

If the agent spends a lot of time on these states, it will be good at predicting the next state (low curiosity). On the other hand, if a state is new and unexplored, it will be bad at predicting the next state (high curiosity).

<img src="https://huggingface.co/datasets/huggingface-deep-rl-course/course-images/resolve/main/en/unit5/curiosity4.png" alt="Curiosity"/>

Using curiosity pushes our agent to favor transitions with high prediction error (which is higher in areas where the agent has spent less time, or in areas with complex dynamics) and **consequently to better explore the environment**.
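The next-state-prediction idea can be sketched in a few lines of Python. This is a toy sketch only: the forward model here is a hand-rolled linear model, whereas the real curiosity module learns a neural network. All names (`ForwardModel`, `intrinsic_reward`) are illustrative, not ML-Agents API.

```python
# Toy sketch of curiosity through next-state prediction (illustrative
# only; the real implementation uses neural networks for the forward
# model). The intrinsic reward is the squared error of a learned forward
# model that predicts the next state from the current state and action.

class ForwardModel:
    """Linear forward model: predicted_next = w_s * state + w_a * action."""
    def __init__(self, lr=0.1):
        self.w_s, self.w_a = 0.0, 0.0
        self.lr = lr

    def predict(self, state, action):
        return self.w_s * state + self.w_a * action

    def intrinsic_reward(self, state, action, next_state):
        # Curiosity = how wrong the model is about the real next state.
        return (next_state - self.predict(state, action)) ** 2

    def update(self, state, action, next_state):
        # One gradient step on the squared prediction error.
        error = next_state - self.predict(state, action)
        self.w_s += self.lr * error * state
        self.w_a += self.lr * error * action

model = ForwardModel()
state, action, next_state = 1.0, 1.0, 2.0  # a transition seen repeatedly

rewards = []
for _ in range(50):
    rewards.append(model.intrinsic_reward(state, action, next_state))
    model.update(state, action, next_state)

# Familiar transitions become predictable: curiosity fades with experience.
print(rewards[0], round(rewards[-1], 6))  # prints: 4.0 0.0
```

The first visit to this transition yields a large intrinsic reward; as the forward model learns it, the reward shrinks towards zero, which is exactly why curiosity pushes the agent towards states it has not mastered yet.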
There are also **other curiosity calculation methods**. ML-Agents uses a more advanced one called curiosity through random network distillation. This is out of the scope of this tutorial, but if you’re interested, [I wrote an article explaining it in detail](https://medium.com/data-from-the-trenches/curiosity-driven-learning-through-random-network-distillation-488ffd8e5938).
1
units/en/unit5/hands-on.mdx
Normal file
@@ -0,0 +1 @@
# Hands-on
@@ -13,9 +13,9 @@ It’s developed by [Unity Technologies](https://unity.com/), the developers of
 <figcaption>Firewatch was made with Unity</figcaption>
 </figure>
 
-## The four components [[four-components]]
+## The six components [[six-components]]
 
-With Unity ML-Agents, you have four essential components:
+With Unity ML-Agents, you have six essential components:
 
 <figure>
 <img src="https://huggingface.co/datasets/huggingface-deep-rl-course/course-images/resolve/main/en/unit5/mlagents-1.png" alt="MLAgents"/>
@@ -23,15 +23,17 @@ With Unity ML-Agents, you have four essential components:
 </figure>
 
 - The first is the *Learning Environment*, which contains **the Unity scene (the environment) and the environment elements** (game characters).
-- The second is the *Python API* which contains **the low-level Python interface for interacting and manipulating the environment**. It’s the API we use to launch the training.
-- Then, we have the *Communicator* that **connects the environment (C#) with the Python API (Python)**.
-- Finally, we have the *Python trainers*: the **Reinforcement algorithms made with PyTorch (PPO, SAC…)**.
+- The second is the *Python Low-level API*, which contains **the low-level Python interface for interacting with and manipulating the environment**. It’s the API we use to launch the training.
+- Then, we have the *External Communicator*, which **connects the Learning Environment (made with C#) with the Python Low-level API (Python)**.
+- The *Python trainers*: the **reinforcement learning algorithms, made with PyTorch (PPO, SAC…)**.
+- The *Gym wrapper*: encapsulates the RL environment in a gym wrapper.
+- The *PettingZoo wrapper*: PettingZoo is the multi-agent version of the gym wrapper.
 
 ## Inside the Learning Component [[inside-learning-component]]
 
 Inside the Learning Component, we have **three important elements**:
 
-- The first is the *agent*, the actor of the scene. We’ll **train the agent by optimizing its policy** (which will tell us what action to take in each state). The policy is called *Brain*.
+- The first is the *agent component*, the actor of the scene. We’ll **train the agent by optimizing its policy** (which will tell us what action to take in each state). The policy is called the *Brain*.
 - Finally, there is the *Academy*. This component **orchestrates agents and their decision-making processes**. Think of this Academy as a teacher that handles the requests from the Python API.
 
 To better understand its role, let’s remember the RL process. This can be modeled as a loop that works like this:
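As a minimal sketch of that loop in Python: `ToyEnv` below is a hypothetical stand-in written for this example; in ML-Agents the real environment is the Unity scene, reached through the External Communicator and the Python Low-level API.

```python
# A minimal sketch of the RL loop that the Academy orchestrates.
# ToyEnv is a hypothetical stand-in, not the ML-Agents API.

class ToyEnv:
    def reset(self):
        self.t = 0
        return 0.0                        # initial observation (state S0)

    def step(self, action):
        self.t += 1
        obs = float(self.t)               # next state S'
        reward = 1.0 if action == 1 else 0.0
        done = self.t >= 3                # episode ends after 3 steps
        return obs, reward, done

env = ToyEnv()
obs, done, total = env.reset(), False, 0.0
while not done:
    action = 1                            # the Brain (policy) picks an action
    obs, reward, done = env.step(action)  # environment returns S', reward, done
    total += reward
print(total)  # prints: 3.0
```

The shape is the classic state → action → reward → next state cycle; the Academy's job is to keep the Unity-side agents stepping through exactly this cycle in response to requests from the Python side.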
26
units/en/unit5/introduction.mdx
Normal file
@@ -0,0 +1,26 @@
# An Introduction to Unity ML-Agents [[introduction-to-ml-agents]]

One of the critical elements in Reinforcement Learning is **being able to create environments**. Game engines such as Godot, Unity, or Unreal Engine are interesting tools for that.

One of them, [Unity](https://unity.com/), created the [Unity ML-Agents Toolkit](https://github.com/Unity-Technologies/ml-agents), a plugin based on the Unity game engine that allows us **to use the Unity Game Engine as an environment builder to train agents**.

<figure>
<img src="https://huggingface.co/datasets/huggingface-deep-rl-course/course-images/resolve/main/en/unit5/example-envs.png" alt="MLAgents environments"/>
<figcaption>Source: <a href="https://github.com/Unity-Technologies/ml-agents">ML-Agents documentation</a></figcaption>
</figure>

From playing football (soccer) to learning to walk and jumping over big walls, the Unity ML-Agents Toolkit provides a ton of exceptional pre-made environments.

In this Unit, we're going to learn to use ML-Agents, but **don't worry if you don't know how to use the Unity Game Engine**: you won't need to use it.

Today, we're going to train two agents:
- The first one will learn to **shoot snowballs at spawning targets**.
- The second needs to **press a button to spawn a pyramid, then navigate to the pyramid, knock it over, and move to the gold brick at the top**. To do that, it will need to explore its environment, and we will use a technique called curiosity.

TODO: Add illustration environments

Then, after training, we'll push the trained agents to the Hugging Face Hub, and you'll be able to visualize them playing directly in your browser without having to use the Unity Editor. You'll also be able to visualize and download other trained agents from the community.

Doing this Unit will prepare you for the next challenge, where you will train agents in multi-agent environments and compete against your classmates' agents.

Sounds exciting? Let's get started!
32
units/en/unit5/pyramids.mdx
Normal file
@@ -0,0 +1,32 @@
# The Pyramids environment

The goal in this environment is to train our agent to **get the gold brick on top of the Pyramid. To do that, it needs to press a button to spawn a pyramid, navigate to the pyramid, knock it over, and move to the gold brick at the top**.

<img src="https://huggingface.co/datasets/huggingface-deep-rl-course/course-images/resolve/main/en/unit5/pyramids.png" alt="Pyramids Environment"/>

## The reward function

The reward function is:

<img src="https://huggingface.co/datasets/huggingface-deep-rl-course/course-images/resolve/main/en/unit5/pyramids-reward.png" alt="Pyramids Environment"/>

To train this new agent that seeks the button and then the Pyramid to destroy, we’ll use a combination of two types of rewards:

- The *extrinsic* one, given by the environment (illustration above).
- But also an *intrinsic* one called **curiosity**. This second reward will **push our agent to be curious, or in other terms, to better explore its environment**.

If you want to know more about curiosity, the next (optional) section explains the basics.
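As a sketch of how the two signals combine: each reward signal is scaled by a weight and summed. The `strength` values below are illustrative assumptions for this example, not official defaults, and `total_reward` is a hypothetical helper, not an ML-Agents function.

```python
# Sketch of combining an extrinsic and an intrinsic (curiosity) reward.
# The strength coefficients here are illustrative placeholders.

def total_reward(extrinsic, intrinsic,
                 extrinsic_strength=1.0, curiosity_strength=0.02):
    # Each reward signal is scaled by its strength, then summed.
    return extrinsic_strength * extrinsic + curiosity_strength * intrinsic

# Even when the environment gives no reward, a novel state (high
# prediction error) still produces a learning signal:
print(round(total_reward(extrinsic=0.0, intrinsic=5.0), 3))  # prints: 0.1
```

Keeping the curiosity weight small relative to the extrinsic one is the usual design choice: exploration gets a nudge, but the environment's own goal still dominates.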
## The observation space

In terms of observation, we **use 148 raycasts that can each detect objects** (switch, bricks, golden brick, and walls).

We also use a **boolean variable indicating the switch state** (did we turn the switch on to spawn the Pyramid or not) and a vector that **contains the agent’s speed**.

## The action space

The action space is **discrete** with four possible actions:

<img src="https://huggingface.co/datasets/huggingface-deep-rl-course/course-images/resolve/main/en/unit5/pyramids-action.png" alt="Pyramids Environment"/>
29
units/en/unit5/shoot-target-env.mdx
Normal file
@@ -0,0 +1,29 @@
# The Shoot Target Environment

## The Agent's Goal

The first agent you're going to train is Julien the bear (named after our [CTO Julien Chaumond](https://twitter.com/julien_c)), who will learn to shoot targets with snowballs.

The goal in this environment is for Julien the bear to shoot the maximum number of spawned targets in a limited time. To do that, he will need to move correctly towards the targets and shoot.
Given that he needs to wait 2 seconds after launching a snowball, he needs to learn to shoot accurately.

TODO ADD GIF

## The reward function

TODO ADD IMAGE REWARD

## The observation space

In terms of observations, we don’t use normal vision (frames); instead, we use TODO ADD raycasts that can each detect objects (target, walls).

Think of raycasts as lasers that detect whether they pass through an object.

<figure>
<img src="https://huggingface.co/datasets/huggingface-deep-rl-course/course-images/resolve/main/en/unit5/raycasts.png" alt="Raycasts"/>
<figcaption>Source: <a href="https://github.com/Unity-Technologies/ml-agents">ML-Agents documentation</a></figcaption>
</figure>
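The laser analogy can be sketched with a toy 1-D raycast. This is purely illustrative: real ML-Agents raycasts are cast in 3-D and configured in the Unity Editor, and the `raycast` function below is a hypothetical helper, not the ML-Agents API.

```python
# Toy 1-D raycast: returns which object the ray hits first and the
# normalized distance to it. Illustrative only; not the ML-Agents API.

def raycast(origin, direction, objects, max_dist=10.0):
    hits = []
    for tag, pos in objects:
        d = (pos - origin) * direction    # signed distance along the ray
        if 0.0 <= d <= max_dist:
            hits.append((d, tag))
    if not hits:
        return ("none", 1.0)              # nothing in range
    d, tag = min(hits)                    # closest hit wins
    return (tag, d / max_dist)

# Julien at x=0 looking right (+1); a target at x=4, a wall behind it at x=8.
objects = [("target", 4.0), ("wall", 8.0)]
print(raycast(0.0, 1.0, objects))  # prints: ('target', 0.4)
```

Each ray contributes a small observation like `('target', 0.4)`: what it hit and how far away. Stacking many rays at different angles gives the agent a cheap, low-dimensional substitute for vision.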

## The action space

The action space is discrete with TODO ADD IMAGE