From d5bedcee2f735fe1eac2e6819f4664687af3aa70 Mon Sep 17 00:00:00 2001 From: simoninithomas Date: Tue, 31 Jan 2023 16:17:50 +0100 Subject: [PATCH] Add illustration links --- units/en/unit7/introduction.mdx | 2 +- units/en/unit7/multi-agent-setting.mdx | 14 ++++++++++++-- units/en/unit7/self-play.mdx | 10 +++++----- 3 files changed, 18 insertions(+), 8 deletions(-) diff --git a/units/en/unit7/introduction.mdx b/units/en/unit7/introduction.mdx index ead04df..92be024 100644 --- a/units/en/unit7/introduction.mdx +++ b/units/en/unit7/introduction.mdx @@ -14,7 +14,7 @@ This worked great, and the single-agent system is useful for many applications.
A patchwork of all the environments you’ve trained your agents on since the beginning of the course
But, as humans, **we live in a multi-agent world**. Our intelligence comes from interaction with other agents. And so, our **goal is to create agents that can interact with other humans and other agents**. diff --git a/units/en/unit7/multi-agent-setting.mdx b/units/en/unit7/multi-agent-setting.mdx index 83f670d..6185df4 100644 --- a/units/en/unit7/multi-agent-setting.mdx +++ b/units/en/unit7/multi-agent-setting.mdx @@ -11,7 +11,12 @@ To design this multi-agent reinforcement learning system (MARL), we have two so ## Decentralized system -[ADD illustration decentralized approach] +
+Decentralized +
+Source: Introduction to Multi-Agent Reinforcement Learning +
+
In decentralized learning, **each agent is trained independently of the others**. In the example given, each vacuum learns to clean as much of the space as it can **without caring about what other vacuums (agents) are doing**. @@ -24,7 +29,12 @@ And this is problematic for many reinforcement learning algorithms **that can't ## Centralized approach -[ADD illustration centralized approach] +
+Centralized +
+Source: Introduction to Multi-Agent Reinforcement Learning +
+
In this architecture, **we have a high-level process that collects agents' experiences**: the experience buffer. And we'll use these experiences **to learn a common policy**. diff --git a/units/en/unit7/self-play.mdx b/units/en/unit7/self-play.mdx index f553432..4d5ac48 100644 --- a/units/en/unit7/self-play.mdx +++ b/units/en/unit7/self-play.mdx @@ -1,11 +1,11 @@ # Self-Play: a classic technique to train competitive agents in adversarial games -Now that we studied the basics of multi-agents. We're ready to go deeper. As mentioned in the introduction, we're going to train agents in an adversarial games a Soccer 2vs2 game. +Now that we've studied the basics of multi-agent systems, we're ready to go deeper. As mentioned in the introduction, we're going **to train agents in an adversarial game with SoccerTwos, a 2vs2 game**.
-”SoccerTwos”/ +SoccerTwos -
This environment was made by the Unity MLAgents Team
+
This environment was made by the Unity MLAgents Team
@@ -80,7 +80,7 @@ After every game: So if A and B have ratings Ra and Rb, then the **expected scores are** given by: -”ELO Score +ELO Score Then, at the end of the game, to update the player’s actual Elo score, we use a linear adjustment **proportional to the amount by which the player over-performed or under-performed.** @@ -91,7 +91,7 @@ We also define a maximum adjustment rating per game: the K-factor. If Player A was expected to score Ea points but actually scored Sa points, then the player’s rating is updated using the formula: -”ELO Score +ELO Score ### Example
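The two Elo formulas referenced above (the images show the expected score Ea = 1 / (1 + 10^((Rb − Ra) / 400)) and the update Ra ← Ra + K · (Sa − Ea)) can be sketched in Python. This is an illustrative snippet, not code from the course; the function names are ours:

```python
def expected_score(r_a: float, r_b: float) -> float:
    """Expected score of player A against player B under the Elo model."""
    return 1.0 / (1.0 + 10 ** ((r_b - r_a) / 400.0))


def update_rating(r_a: float, r_b: float, s_a: float, k: float = 32.0) -> float:
    """New rating for A after actually scoring s_a (1 = win, 0.5 = draw, 0 = loss).

    k is the K-factor: the maximum adjustment per game.
    """
    return r_a + k * (s_a - expected_score(r_a, r_b))


# Two equally rated players: the expected score is 0.5,
# so the winner gains K * (1 - 0.5) = 16 points.
print(update_rating(1200, 1200, 1.0))  # 1216.0
```

Note that against a stronger opponent the expected score drops below 0.5, so a win yields more than K/2 points, while a loss costs less, which is exactly the "proportional to over-performance" behavior described above.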