Add hands on and conclusion

2026-06-15 06:27:24 +08:00 · 2023-01-31 16:43:38 +01:00
parent bd35700e90
commit 02a2b0de3f
3 changed files with 60 additions and 48 deletions
--- a/units/en/unit7/conclusion.mdx
+++ b/units/en/unit7/conclusion.mdx
@@ -0,0 +1,11 @@
+# Conclusion
+
+That’s all for today. Congratulations on finishing this unit and the tutorial!
+
+The best way to learn is to practice and try stuff. **Why not train another agent with a different configuration?**
+
+And don’t hesitate to check the [leaderboard](https://huggingface.co/spaces/huggingface-projects/AIvsAI-SoccerTwos) from time to time.
+
+See you in Unit 8 🔥,
+
+## Keep Learning, Stay awesome 🤗
--- a/units/en/unit7/hands-on.mdx
+++ b/units/en/unit7/hands-on.mdx
@@ -8,7 +8,7 @@ To validate this hands-on for the certification process, you just need to push a

 For more information about the certification process, check this section 👉 [https://huggingface.co/deep-rl-course/en/unit0/introduction#certification-process](https://huggingface.co/deep-rl-course/en/unit0/introduction#certification-process)

-This hands-on will be different, since to get correct results you need to train your agents from 4 hours to 8 hours. And given the risk of timeout in colab we advise you to train on your own computer. You don’t need a super computer, a simple laptop is good enough for this exercise.
+This hands-on will be different: to get correct results, **you need to train your agents for 4 to 8 hours**. Given the risk of timeouts in Colab, we advise you to train on your own computer. You don’t need a supercomputer: a simple laptop is good enough for this exercise.

 Let's get started,

@@ -59,48 +59,55 @@ We're constantly trying to improve our tutorials, so **if you find some issues

 ⚠ ⚠ ⚠  We’re not going to use the same version as in Unit 5: Introduction to ML-Agents ⚠ ⚠ ⚠

-We advise you to use (conda)[[https://docs.conda.io/en/latest/](https://docs.conda.io/en/latest/)] as a package manager and create a new environment.
+We advise you to use [conda](https://docs.conda.io/en/latest/) as a package manager and to create a new environment.

 With conda, we create a new environment called rl:

+```bash
 conda create --name rl python=3.8
-
 conda activate rl
+```

 To train our agents correctly and push them to the Hub, we need to install an experimental version of ML-Agents (the `aivsai` branch of the Hugging Face ML-Agents fork).

+```bash
 git clone --branch aivsai https://github.com/huggingface/ml-agents/
+```

 When the cloning is done (it takes 2.5 GB), we go inside the repository and install the package:

+```bash
 cd ml-agents
 pip install -e ./ml-agents-envs
 pip install -e ./ml-agents
+```

 We also need to install PyTorch:

+```bash
 pip install torch
+```

 Now that everything is installed, we need to add the environment training executable. Depending on your operating system, download one of the executables below, unzip it, and place it in a new folder inside `ml-agents` called `training-envs-executables`.

 In the end, your executable should be in `ml-agents/training-envs-executables/SoccerTwos`.

-Windows: [https://drive.google.com/file/d/1sqFxbEdTMubjVktnV4C6ICjp89wLhUcP/view?usp=sharing](https://drive.google.com/file/d/1sqFxbEdTMubjVktnV4C6ICjp89wLhUcP/view?usp=sharing)
+Windows: Download [this executable](https://drive.google.com/file/d/1sqFxbEdTMubjVktnV4C6ICjp89wLhUcP/view?usp=sharing)

-Linux (Ubuntu): [https://drive.google.com/file/d/1KuqBKYiXiIcU4kNMqEzhgypuFP5_45CL/view?usp=sharing](https://drive.google.com/file/d/1KuqBKYiXiIcU4kNMqEzhgypuFP5_45CL/view?usp=sharing)
+Linux (Ubuntu): Download [this executable](https://drive.google.com/file/d/1KuqBKYiXiIcU4kNMqEzhgypuFP5_45CL/view?usp=sharing)

 Mac: [ADD Berangere]
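+
+A minimal sketch of the folder setup described above, assuming you run it from the root of the cloned `ml-agents` repository and that the downloaded archive was saved as `~/Downloads/SoccerTwos.zip` (a hypothetical name and location; adjust it to your own download):
+
+```bash
+# Hypothetical archive path: change it to wherever your browser saved the download
+ARCHIVE="$HOME/Downloads/SoccerTwos.zip"
+DEST="./training-envs-executables"
+
+# Create the folder that will hold the environment executable
+mkdir -p "$DEST"
+
+# Extract the archive into the folder if it exists, otherwise print a warning
+if [ -f "$ARCHIVE" ]; then
+  unzip -o "$ARCHIVE" -d "$DEST"
+else
+  echo "Archive not found: $ARCHIVE" >&2
+fi
+
+# List the folder content to find the exact executable path for the training command
+ls "$DEST"
+```
+
+On Linux, the extracted executable may also need execute permission (`chmod +x` on the file) before `mlagents-learn` can launch it.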

 ## Step 1: Understand the environment

-The environment is called `SoccerTwos` it was made by the Unity MLAgents Team. You can find its documentation here: [https://github.com/Unity-Technologies/ml-agents/blob/develop/docs/Learning-Environment-Examples.md#soccer-twos](https://github.com/Unity-Technologies/ml-agents/blob/develop/docs/Learning-Environment-Examples.md#soccer-twos)
+The environment is called `SoccerTwos`. It was made by the Unity MLAgents Team. You can find its documentation [here](https://github.com/Unity-Technologies/ml-agents/blob/develop/docs/Learning-Environment-Examples.md#soccer-twos).

 The goal in this environment **is to get the ball into the opponent's goal while preventing the ball from entering its own goal.**

 <figure>
-<img src=”[https://huggingface.co/datasets/huggingface-deep-rl-course/course-images/resolve/main/en/unit10/soccertwos.gif”](https://huggingface.co/datasets/huggingface-deep-rl-course/course-images/resolve/main/en/unit10/soccertwos.gif%E2%80%9D) alt=”SoccerTwos”/>
+<img src="https://huggingface.co/datasets/huggingface-deep-rl-course/course-images/resolve/main/en/unit10/soccertwos.gif" alt="SoccerTwos"/>

-<figcaption>This environment was made by the <a href=”[https://github.com/Unity-Technologies/ml-agents”](https://github.com/Unity-Technologies/ml-agents%E2%80%9D)>Unity MLAgents Team</a></figcaption>
+<figcaption>This environment was made by the <a href="https://github.com/Unity-Technologies/ml-agents">Unity MLAgents Team</a></figcaption>

 </figure>

@@ -108,7 +115,7 @@ The goal in this environment **is to get the ball into the opponent's goal while

 The reward function is:

-<img src=”https://huggingface.co/datasets/huggingface-deep-rl-course/course-images/resolve/main/en/unit10/soccerreward.png” alt=”SoccerTwos Reward”/>
+<img src="https://huggingface.co/datasets/huggingface-deep-rl-course/course-images/resolve/main/en/unit10/soccerreward.png" alt="SoccerTwos Reward"/>

 ### The observation space

@@ -128,15 +135,15 @@ The observation space is composed vector size of 336:

 The action space is three discrete branches:

-<img src=”https://huggingface.co/datasets/huggingface-deep-rl-course/course-images/resolve/main/en/unit10/socceraction.png” alt=”SoccerTwos Action”/>
+<img src="https://huggingface.co/datasets/huggingface-deep-rl-course/course-images/resolve/main/en/unit10/socceraction.png" alt="SoccerTwos Action"/>

 ## Step 2: Understand MA-POCA

 [https://arxiv.org/pdf/2111.05992.pdf](https://arxiv.org/pdf/2111.05992.pdf)

-## Step 3: Define the Config
+## Step 3: Define the config file

-We already learned in (Unit 5)[https://huggingface.co/deep-rl-course/unit5/introduction] that in ML-Agents, you define **the training hyperparameters into config.yaml files.**
+We already learned in [Unit 5](https://huggingface.co/deep-rl-course/unit5/introduction) that in ML-Agents, you define **the training hyperparameters in `config.yaml` files.**

 There are multiple hyperparameters. To understand them better, check the explanation of each one in **[the documentation](https://github.com/Unity-Technologies/ml-agents/blob/release_20_docs/docs/Training-Configuration-File.md)**.

@@ -198,7 +205,9 @@ For 5M timesteps (which is the recommended value) it will take from 5 to 8 hours

 Depending on the executable you use (Windows, Ubuntu, Mac), the training command will look like this (your executable path may be different, so check it before running):

-`mlagents-learn ./config/poca/SoccerTwos.yaml --env=./training-envs-executables/SoccerTwos.exe --run-id="SoccerTwos" --no-graphics`
+```bash
+mlagents-learn ./config/poca/SoccerTwos.yaml --env=./training-envs-executables/SoccerTwos.exe --run-id="SoccerTwos" --no-graphics
+```

 The executable contains 8 copies of SoccerTwos.

@@ -220,9 +229,8 @@ Create a new token (**[https://huggingface.co/settings/tokens](https://huggingfa

 Copy the token, run this command, and paste the token:

-```
+```bash
 huggingface-cli login
-
 ```

 Then, we need to run `mlagents-push-to-hf`.
@@ -237,9 +245,11 @@ If the repo does not exist **it will be created automatically**

 In my case:

-`mlagents-push-to-hf  --run-id="SoccerTwos" --local-dir="./results/SoccerTwos" --repo-id="ThomasSimonini/poca-SoccerTwos" --commit-message="First Push"`
-
+```bash
+mlagents-push-to-hf --run-id="SoccerTwos" --local-dir="./results/SoccerTwos" --repo-id="ThomasSimonini/poca-SoccerTwos" --commit-message="First Push"
 ```
+
+```bash
 mlagents-push-to-hf --run-id="YOUR_RUN_ID" --local-dir="YOUR_LOCAL_DIR" --repo-id="YOUR_REPO_ID" --commit-message="First Push"
 ```

@@ -259,38 +269,27 @@ But in order that everything works perfectly you need to check:

 1. That your model has this tag: `ML-Agents-SoccerTwos`. This is the tag we use to select the models added to the challenge pool. To check it, go to your model page and look at its tags.

-verify1.png
+<img src="https://huggingface.co/datasets/huggingface-deep-rl-course/course-images/resolve/main/en/unit10/verify1.png" alt="Verify"/>
+

 If it’s not the case, you just need to modify the README and add it.

-verify2.png
+<img src="https://huggingface.co/datasets/huggingface-deep-rl-course/course-images/resolve/main/en/unit10/verify2.png" alt="Verify"/>

-1. That you have a SoccerTwos onnx file
+2. That you have a `SoccerTwos.onnx` file

-verify3.png
+<img src="https://huggingface.co/datasets/huggingface-deep-rl-course/course-images/resolve/main/en/unit10/verify3.png" alt="Verify"/>
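+
+Before pushing, you can also check locally that training produced the model file. A minimal sketch, assuming ML-Agents exported the model as `./results/<run-id>/SoccerTwos.onnx` and that the run id is `SoccerTwos`, as in the training command (adjust both if yours differ):
+
+```bash
+# Run id passed to mlagents-learn (assumption: "SoccerTwos", as in the training command)
+RUN_ID="SoccerTwos"
+# Assumed location of the exported model: results/<run-id>/<behavior name>.onnx
+MODEL="./results/${RUN_ID}/SoccerTwos.onnx"
+
+if [ -f "$MODEL" ]; then
+  echo "Found model: $MODEL"
+else
+  echo "Model not found: $MODEL (check the run id and that training completed)" >&2
+fi
+```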

 We strongly advise you to create a new model repository when you push to the Hub if you want to train it again or train a new version.

 ## Step 7: Visualize some matches in our demo

-Now that your model is part of AI vs AI Challenge, you can visualize how good it is compared to others.
+Now that your model is part of the AI vs AI Challenge, **you can visualize how good it is compared to others**.

 In order to do that, you just need to go to this demo:

-Select your model as team blue (or team purple if you prefer) and another. The best to compare your model is either with the one who’s on top of the leaderboard. Or use the baseline model as opponent [https://huggingface.co/unity/MLAgents-SoccerTwos](https://huggingface.co/unity/MLAgents-SoccerTwos)
+- Select your model as team blue (or team purple if you prefer) and another model as its opponent. The best way to compare your model is to play it against the model at the top of the leaderboard, or against the [baseline model](https://huggingface.co/unity/MLAgents-SoccerTwos).

-This matches you see live are not used to the calculation of your result but are good way to visualize how good your agent is.
+The matches you see live are not used in the calculation of your result, **but they are a good way to visualize how good your agent is**.

 And don't hesitate to share the best score your agent gets on Discord in the #rl-i-made-this channel 🔥
-
-### Conclusion
-
-That’s all for today. Congrats on finishing this tutorial!
-
-The best way to learn is to practice and try stuff. Why not training another agent with a different configuration?
-
-And don’t hesitate from time to time to check the [leaderboard](https://huggingface.co/spaces/huggingface-projects/AIvsAI-SoccerTwos)
-
-See you on Unit 8 🔥,
-
-## Keep Learning, Stay awesome 🤗
--- a/units/en/unit7/self-play.mdx
+++ b/units/en/unit7/self-play.mdx
@@ -1,6 +1,6 @@
 # Self-Play: a classic technique to train competitive agents in adversarial games

-Now that we studied the basics of multi-agents. We're ready to go deeper. As mentioned in the introduction, we're going **to train agents in an adversarial games with SoccerTwos a 2vs2 game.
+Now that we have studied the basics of multi-agent systems, we're ready to go deeper. As mentioned in the introduction, we're going **to train agents in an adversarial game with SoccerTwos, a 2vs2 game**.

 <figure>
 <img src="https://huggingface.co/datasets/huggingface-deep-rl-course/course-images/resolve/main/en/unit10/soccertwos.gif" alt="SoccerTwos"/>
@@ -17,9 +17,9 @@ On the one hand, we need to find how to get a well-trained opponent to play agai

 Think of a child who has just started to learn soccer. Playing against a very good soccer player will be useless, since it will be too hard to win or even get the ball from time to time. The child will continuously lose without having time to learn a good policy.

-The best solution would be to have an opponent that is on the same level as the agent and will upgrade its level as the agent upgrade its own. Because if the opponent is too strong we’ll learn nothing and if it is too weak, we’re going to overlearn useless behavior against a stronger opponent then.
+The best solution would be **to have an opponent that is on the same level as the agent and that improves as the agent improves**. If the opponent is too strong, we’ll learn nothing; if it is too weak, we’ll overlearn behavior that is useless against a stronger opponent.

-This solution is called *self-play*. In self-play, the agent uses former copies of itself (of its policy) as an opponent. This way, the agent will play against an agent of the same level (challenging but not too much), have opportunities to improve gradually its policy, and then, as it becomes better update its opponent. It’s a way to bootstrap an opponent and have a gradual increase of opponent complexity.
+This solution is called *self-play*. In self-play, **the agent uses former copies of itself (of its policy) as an opponent**. This way, the agent plays against an agent of the same level (challenging, but not too much), has opportunities to gradually improve its policy, and then updates its opponent as it becomes better. It’s a way to bootstrap an opponent and progressively increase the opponent's complexity.

 It’s the same way humans learn in competition:

@@ -28,32 +28,34 @@ It’s the same way human learn in competition:

 We do the same with self-play:

- We start with a copy of our agent as an opponent this way this opponent is on a similar level.
- We learn from it, and when we acquired some skills, we update our opponent with a more recent copy of our training policy.
+- We **start with a copy of our agent as an opponent**, so that this opponent is on a similar level.
+- We **learn from it**, and when we have acquired some skills, we **update our opponent with a more recent copy of our training policy**.

-The theory behind self-play is not something new, it was already used by Arthur Samuel’s checker player system in the fifties, and by Gerald Tesauro’s TD-Gammon in 1955. If you want to learn more about the history of self-play check this very good blogpost by Andrew Cohen: [https://blog.unity.com/technology/training-intelligent-adversaries-using-self-play-with-ml-agents](https://blog.unity.com/technology/training-intelligent-adversaries-using-self-play-with-ml-agents)
+The theory behind self-play is not something new: it was already used by Arthur Samuel’s checkers player system in the fifties, and by Gerald Tesauro’s TD-Gammon in 1995. If you want to learn more about the history of self-play, [check this very good blogpost by Andrew Cohen](https://blog.unity.com/technology/training-intelligent-adversaries-using-self-play-with-ml-agents).

 ## Self-Play in MLAgents

-Self-Play is integrated into the MLAgents library and is managed by multiple hyperparameters that we’re going to study. But the main focus as explained in the documentation is the tradeoff between the skill level and generality of the final policy and the stability of learning.
+Self-Play is integrated into the MLAgents library and is managed by multiple hyperparameters that we’re going to study. But the main focus, as explained in the documentation, is the **tradeoff between the skill level and generality of the final policy and the stability of learning**.

 Training against a set of slowly changing or unchanging adversaries with low diversity **results in more stable training, but with a risk of overfitting if the change is too slow.**

 We then need to control:

- How often do we change opponents with swap_steps and team_change parameters.
- The number of opponents saved with window parameter. A larger value of `window`
+- **How often we change opponents**, with the `swap_steps` and `team_change` parameters.
+- The **number of opponents saved**, with the `window` parameter. A larger value of `window`
  means that an agent's pool of opponents will contain a larger diversity of behaviors since it will contain policies from earlier in the training run.
- Probability of playing against the current self vs opponent sampled in the pool with play_against_latest_model_ratio. A larger value of `play_against_latest_model_ratio`
+- The **probability of playing against the current self vs. an opponent sampled from the pool**, with `play_against_latest_model_ratio`. A larger value of `play_against_latest_model_ratio`
  indicates that an agent will be playing against the current opponent more often.
- The number of training steps before saving a new opponent with save_steps parameters. A larger value of `save_steps`
+- The **number of training steps before saving a new opponent**, with the `save_steps` parameter. A larger value of `save_steps`
  will yield a set of opponents that cover a wider range of skill levels and possibly play styles since the policy receives more training.

-To get more details about these hyperparameters you definitely need to check this part of the documentation: [https://github.com/Unity-Technologies/ml-agents/blob/develop/docs/Training-Configuration-File.md#self-play](https://github.com/Unity-Technologies/ml-agents/blob/develop/docs/Training-Configuration-File.md#self-play)
+To get more details about these hyperparameters, you definitely need [to check this part of the documentation](https://github.com/Unity-Technologies/ml-agents/blob/develop/docs/Training-Configuration-File.md#self-play).
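+
+To see how these hyperparameters are set for the SoccerTwos hands-on, you can print the self-play section of its training config. A minimal sketch, assuming the config path used in the hands-on (`./config/poca/SoccerTwos.yaml`, relative to the `ml-agents` repository root) and that the section starts with a `self_play:` key:
+
+```bash
+# Config path used in the hands-on training command (run from the ml-agents repository root)
+CONFIG="./config/poca/SoccerTwos.yaml"
+
+if [ -f "$CONFIG" ]; then
+  # Print the self_play key with line numbers, plus the lines that follow it,
+  # which are expected to hold swap_steps, team_change, window, save_steps, ...
+  grep -n -A 8 "self_play:" "$CONFIG"
+else
+  echo "Config not found: $CONFIG (run this from the ml-agents repository root)" >&2
+fi
+```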


 ## The ELO Score to evaluate our agent
+
 ### What is ELO Score?
+
 In adversarial games, tracking the **cumulative reward is not always a meaningful metric to track the learning progress**, because this metric **depends on the skill of the opponent.**

 Instead, we’re using an ***ELO rating system*** (named after Arpad Elo) that calculates the **relative skill level** between 2 players from a given population in a zero-sum game.
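+
+As a worked example, here is a minimal sketch of the standard Elo update written as shell functions around `awk`. The K-factor of 32 is an assumption for illustration; ML-Agents may use a different value:
+
+```bash
+# Expected score of player A against player B: E_A = 1 / (1 + 10^((R_B - R_A) / 400))
+elo_expected() {
+  awk -v ra="$1" -v rb="$2" 'BEGIN { printf "%.4f\n", 1 / (1 + 10 ^ ((rb - ra) / 400)) }'
+}
+
+# New rating of A after one game: R_A' = R_A + K * (S_A - E_A)
+# S_A is 1 for a win, 0.5 for a draw and 0 for a loss; K defaults to 32 (assumption)
+elo_update() {
+  awk -v ra="$1" -v rb="$2" -v s="$3" -v k="${4:-32}" \
+    'BEGIN { e = 1 / (1 + 10 ^ ((rb - ra) / 400)); printf "%.1f\n", ra + k * (s - e) }'
+}
+
+elo_expected 1200 1200   # prints 0.5000
+elo_update 1200 1200 1   # prints 1216.0 (win against an equally rated opponent)
+elo_update 1200 1200 0   # prints 1184.0 (loss against an equally rated opponent)
+```
+
+Because the expected score only depends on the difference between the two ratings, beating a higher-rated opponent raises your rating more than beating a lower-rated one, which is what makes Elo a relative measure of skill.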