mirror of
https://github.com/huggingface/deep-rl-class.git
synced 2026-04-03 10:38:27 +08:00
Update MLAgents
@@ -5,8 +5,10 @@ Congrats on finishing this unit! You’ve just trained your first ML-Agents and
The best way to learn is to **practice and try stuff**. Why not try another environment? [ML-Agents has 18 different environments](https://github.com/Unity-Technologies/ml-agents/blob/develop/docs/Learning-Environment-Examples.md).

For instance:
- *Worm*, where you teach a worm to crawl.
- *Walker*: teach an agent to walk towards a goal.
- [Worm](https://huggingface.co/spaces/unity/ML-Agents-Worm), where you teach a worm to crawl.
- [Walker](https://huggingface.co/spaces/unity/ML-Agents-Walker): teach an agent to walk towards a goal.

Check the documentation to learn how to train them and to see the list of ML-Agents environments already integrated on the Hub: https://github.com/huggingface/ml-agents#getting-started

<img src="https://huggingface.co/datasets/huggingface-deep-rl-course/course-images/resolve/main/en/unit5/envs-unity.jpeg" alt="Example envs"/>
@@ -1 +1,29 @@
# Hands-on

<CourseFloatingBanner classNames="absolute z-10 right-0 top-0"
notebooks={[
{label: "Google Colab", value: "https://colab.research.google.com/github/huggingface/deep-rl-class/blob/main/notebooks/unit5/unit5.ipynb"}
]}
askForHelpUrl="http://hf.co/join/discord" />

Now that we've learned what ML-Agents is and how it works, and studied the two environments we're going to use, we're ready to train our agents.

- The first one will learn to **shoot snowballs at spawning targets**.
- The second needs to **press a button to spawn a pyramid, then navigate to the pyramid, knock it over, and move to the gold brick at the top**. To do that, it will need to explore its environment, and we will use a technique called curiosity.

<img src="https://huggingface.co/datasets/huggingface-deep-rl-course/course-images/resolve/main/en/unit7/envs.png" alt="Environments" />

After that, you'll be able to watch your agents play directly in your browser.

The ML-Agents integration on the Hub **is still experimental**; some features will be added in the future. But for now, to validate this hands-on for the certification process, you just need to push your trained models to the Hub.
There are no results to attain to validate this one. But if you want to get nice results, you can try to attain:

- For [Pyramids](https://huggingface.co/spaces/unity/ML-Agents-Pyramids): Mean Reward = 1.75
- For [SnowballTarget](https://huggingface.co/spaces/ThomasSimonini/ML-Agents-SnowballTarget): Mean Reward = 15 or 30 targets shot in an episode.

For more information about the certification process, check this section 👉 https://huggingface.co/deep-rl-course/en/unit0/introduction#certification-process

**To start the hands-on, click on the Open In Colab button** 👇 :

[Open In Colab](https://colab.research.google.com/github/huggingface/deep-rl-class/blob/master/notebooks/unit5/unit5.ipynb)
@@ -1,6 +1,6 @@
# An Introduction to Unity ML-Agents [[introduction-to-ml-agents]]

One of the challenges in Reinforcement Learning is to **create environments**. Fortunately for us, game engines are the perfect tool to use.
One of the challenges in Reinforcement Learning is to **create environments**. Fortunately for us, we can use game engines.
Game engines like [Unity](https://unity.com/), [Godot](https://godotengine.org/), or [Unreal Engine](https://www.unrealengine.com/) are programs made to create video games. They are perfectly suited
for creating environments: they provide physics systems, 2D/3D rendering, and more.
@@ -11,7 +11,6 @@ The reward function is:
<img src="https://huggingface.co/datasets/huggingface-deep-rl-course/course-images/resolve/main/en/unit5/pyramids-reward.png" alt="Pyramids Environment"/>

To train this new agent, which seeks the button and then the Pyramid to destroy, we’ll use a combination of two types of rewards:

- The *extrinsic one* given by the environment (illustration above).
@@ -27,7 +26,8 @@ In terms of observation, we **use 148 raycasts that can each detect objects** (s
We also use a **boolean variable indicating the switch state** (whether or not we turned on the switch to spawn the Pyramid) and a vector that **contains the agent’s speed**.

ADD SCREENSHOT CODE
<img src="https://huggingface.co/datasets/huggingface-deep-rl-course/course-images/resolve/main/en/unit5/pyramids-obs-code.png" alt="Pyramids obs code"/>
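To make the shape of this observation concrete, here is a minimal Python sketch that packs the three parts described above (raycast readings, the switch boolean, and the agent's velocity) into one flat vector. The function name `build_observation`, the ordering of the parts, the 0/1 encoding of the boolean, and the toy values are all assumptions made for illustration; the environment's actual code is the one shown in the screenshot.

```python
from typing import List, Sequence


def build_observation(
    ray_readings: Sequence[float],
    switch_is_on: bool,
    velocity: Sequence[float],
) -> List[float]:
    """Concatenate the three observation parts into one flat list of floats.

    Hypothetical helper: the ordering and the 0/1 encoding are assumptions.
    """
    return [*ray_readings, 1.0 if switch_is_on else 0.0, *velocity]


# Toy values: 4 made-up ray readings stand in for the real raycast output.
obs = build_observation(
    ray_readings=[0.0, 1.0, 0.0, 0.5],
    switch_is_on=True,
    velocity=[0.1, 0.0, -0.2],
)
print(len(obs))
```

The point of the sketch is only that the policy receives a single numeric vector, with the switch state reduced to one number alongside the ray and velocity values.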
## The action space
@@ -1,14 +1,14 @@
# The SnowballTarget Environment

TODO Add gif snowballtarget environment
<img src="https://huggingface.co/datasets/huggingface-deep-rl-course/course-images/resolve/main/en/unit5/snowballtarget.gif" alt="SnowballTarget"/>

## The Agent's Goal

The first agent you're going to train is Julien the bear (the name is based after our [CTO Julien Chaumond](https://twitter.com/julien_c)) **to hit targets with snowballs**.
The first agent you're going to train is Julien the bear 🐻 (named after our [CTO Julien Chaumond](https://twitter.com/julien_c)), who learns **to hit targets with snowballs**.

The goal in this environment is that Julien the bear **hit as many targets as possible in the limited time** (1000 timesteps). To do that, it will need **to place itself correctly from the target and shoot**.
The goal in this environment is for Julien to **hit as many targets as possible in the limited time** (1000 timesteps). To do that, it will need **to position itself correctly relative to the target and shoot**.

In addition, to avoid "snowball spamming" (aka shooting a snowball every timestep), **Julien the bear has a "cool off" system** (it needs to wait 0.5 seconds after a shoot to be able to shoot again).
In addition, to avoid "snowball spamming" (that is, shooting a snowball every timestep), **Julien has a "cool off" system** (it needs to wait 0.5 seconds after a shot before it can shoot again).
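The cool-off rule described above can be sketched in a few lines of Python. This is only an illustration of the idea, not the environment's implementation: the class `CoolOffShooter`, its method `try_shoot`, and the request times are hypothetical; only the 0.5-second delay comes from the text.

```python
from dataclasses import dataclass


@dataclass
class CoolOffShooter:
    """Hypothetical helper: allows a shot only once the cool-off has elapsed."""

    cool_off_seconds: float = 0.5  # delay stated in the text
    _last_shot_time: float = float("-inf")  # no shot has been fired yet

    def try_shoot(self, now: float) -> bool:
        """Return True and record the shot if enough time has passed."""
        if now - self._last_shot_time >= self.cool_off_seconds:
            self._last_shot_time = now
            return True
        return False


shooter = CoolOffShooter()
# The agent asks to shoot every 0.25 s (made-up times, exactly representable
# as floats); only every other request gets through.
requests = [0.0, 0.25, 0.5, 0.75, 1.0]
allowed = [shooter.try_shoot(t) for t in requests]
print(allowed)  # [True, False, True, False, True]
```

With such a rule, asking to shoot at every timestep brings no benefit, so the agent has an incentive to aim before it shoots.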
<figure>
<img src="https://huggingface.co/datasets/huggingface-deep-rl-course/course-images/resolve/main/en/unit7/cooloffsystem.gif" alt="Cool Off System"/>
@@ -17,8 +17,11 @@ In addition, to avoid "snowball spamming" (aka shooting a snowball every timeste
## The reward function and the reward engineering problem

The reward function is simple. **The environment gives a +1 reward every time the agent's snowball hits a target**.
Because the agent's goal is to maximize the expected cumulative reward, **it will try to hit as many targets as possible**.
The reward function is simple. **The environment gives a +1 reward every time the agent's snowball hits a target**, and because the agent's goal is to maximize the expected cumulative reward, **it will try to hit as many targets as possible**.

In terms of code it looks like this:

<img src="https://huggingface.co/datasets/huggingface-deep-rl-course/course-images/resolve/main/en/unit5/snowballtarget-reward-code.png" alt="Reward"/>

We could have a more complex reward function (with a penalty to push the agent to go faster, etc.). But when you design an environment, you need to avoid the *reward engineering problem*, which is having an overly complex reward function to force your agent to behave as you want it to.
Why? Because by doing that, **you might miss interesting strategies that the agent will find with a simpler reward function**.
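As a rough illustration of the contrast drawn above, the Python sketch below compares the simple +1-per-hit reward with a hypothetical "engineered" variant that also subtracts a small penalty at every step. The function names, the penalty value of 0.01, and the 10-step episode are assumptions made for the example; they are not taken from the environment.

```python
def sparse_reward(hit_target: bool) -> float:
    """Reward described in the text: +1 each time a snowball hits a target."""
    return 1.0 if hit_target else 0.0


def shaped_reward(hit_target: bool, time_penalty: float = 0.01) -> float:
    """Hypothetical engineered variant: the same +1, minus a per-step penalty."""
    return sparse_reward(hit_target) - time_penalty


# Made-up episode of 10 steps in which the agent hits a target 3 times.
hits = [False, True, False, False, True, False, False, False, True, False]

sparse_return = sum(sparse_reward(h) for h in hits)
shaped_return = sum(shaped_reward(h) for h in hits)
print(sparse_return)
print(round(shaped_return, 2))
```

Every extra term such as `time_penalty` is one more design decision that steers the agent toward the behavior the designer had in mind, which is exactly the trade-off the paragraph above warns about.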
@@ -38,11 +41,14 @@ Think of raycasts as lasers that will detect if it passes through an object.
In this environment, our agent has multiple sets of raycasts:
-
<img src="https://huggingface.co/datasets/huggingface-deep-rl-course/course-images/resolve/main/en/unit7/snowball_target_raycasts.png" alt="Raycasts"/>

TODO: Add explanation vector

<img src="https://huggingface.co/datasets/huggingface-deep-rl-course/course-images/resolve/main/en/unit5/snowballtarget-obs-code.png" alt="Obs"/>

## The action space

The action space is discrete with TODO ADD
IMAGE

<img src="https://huggingface.co/datasets/huggingface-deep-rl-course/course-images/resolve/main/en/unit5/snowballtarget_action_space.png" alt="Action Space"/>