diff --git a/units/en/unit7/additional-readings.mdx b/units/en/unit7/additional-readings.mdx
index 71aba31..6cb3239 100644
--- a/units/en/unit7/additional-readings.mdx
+++ b/units/en/unit7/additional-readings.mdx
@@ -1,3 +1,17 @@
# Additional Readings [[additional-readings]]
-## Self-Play
+## An introduction to multi-agent systems
+
+- [Multi-agent reinforcement learning: An overview](https://www.dcsc.tudelft.nl/~bdeschutter/pub/rep/10_003.pdf)
+- [Multiagent Reinforcement Learning, Marc Lanctot](https://rlss.inria.fr/files/2019/07/RLSS_Multiagent.pdf)
+- [Example of a multi-agent environment](https://www.mathworks.com/help/reinforcement-learning/ug/train-3-agents-for-area-coverage.html?s_eid=PSM_15028)
+- [A list of different multi-agent environments](https://agents.inf.ed.ac.uk/blog/multiagent-learning-environments/)
+- [Multi-Agent Reinforcement Learning: Independent vs. Cooperative Agents](https://bit.ly/3nVK7My)
+- [Dealing with Non-Stationarity in Multi-Agent Deep Reinforcement Learning](https://bit.ly/3v7LxaT)
+
+## Self-Play and MA-POCA
+
+- [Training intelligent adversaries using self-play with ML-Agents](https://blog.unity.com/technology/training-intelligent-adversaries-using-self-play-with-ml-agents)
+- [Training complex cooperative behaviors with MLAgents](https://blog.unity.com/technology/ml-agents-v20-release-now-supports-training-complex-cooperative-behaviors)
+- [MLAgents plays dodgeball](https://blog.unity.com/technology/ml-agents-plays-dodgeball)
+- [On the Use and Misuse of Absorbing States in Multi-agent Reinforcement Learning (MA-POCA)](https://arxiv.org/pdf/2111.05992.pdf)
diff --git a/units/en/unit7/hands-on.mdx b/units/en/unit7/hands-on.mdx
index 27b3255..31db82b 100644
--- a/units/en/unit7/hands-on.mdx
+++ b/units/en/unit7/hands-on.mdx
@@ -29,7 +29,7 @@ It's a matchmaking algorithm where your pushed models are ranked by playing aga
AI vs. AI is three tools:
- A *matchmaking process* defining which model against which model and running the model fights using a background task in the Space.
-- A *leaderboard* getting the match history results and displaying the models ELO ratings: [ADD LEADERBOARD]
+- A *leaderboard* getting the match history results and displaying the models ELO ratings: https://huggingface.co/spaces/huggingface-projects/AIvsAI-SoccerTwos
- A *Space demo* to visualize your agents playing against others : https://huggingface.co/spaces/unity/ML-Agents-SoccerTwos
@@ -54,10 +54,54 @@ What will make the difference during this challenge are **the hyperparameters yo
# Step 0: Install MLAgents and download the correct executable
+You need to install a specific version of MLAgents for this hands-on.
-# Step 1: Understand the environment
-
+
+
+## Step 1: Understand the environment
+
+The environment is called `Pyramids`. It was made by the Unity MLAgents Team.
+
+The goal in this environment is to train our agent to **get the gold brick on the top of the Pyramid. To do that, it needs to press a button to spawn a Pyramid, navigate to the Pyramid, knock it over, and move to the gold brick at the top**.
+
+
+
+
+## The reward function
+
+The reward function is:
+
+
+
+In terms of code, it looks something like the sketch below.
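+
+This is a minimal Python sketch, not the environment's actual implementation (which lives in Unity/C#); the values used (+2 for reaching the gold brick, a small -0.001 penalty at every step) are assumed from the standard Pyramids example rather than taken from this page:
+
+```python
+# Minimal sketch of the extrinsic reward signal (values assumed:
+# +2 for reaching the gold brick, -0.001 penalty at every step).
+def extrinsic_reward(reached_gold_brick: bool) -> float:
+    reward = -0.001  # small per-step penalty, pushes the agent to act quickly
+    if reached_gold_brick:
+        reward += 2.0  # large reward for reaching the gold brick at the top
+    return reward
+```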
+
+
+To train this new agent, which must press the button and then knock over the Pyramid to reach the gold brick, we’ll use a combination of two types of rewards:
+
+- The *extrinsic* one, given by the environment (the reward function described above).
+- But also an *intrinsic* one called **curiosity**. The latter will **push our agent to be curious, or in other words, to better explore its environment**.
+
+If you want to know more about curiosity, the next section (optional) will explain the basics.
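+
+Conceptually, the agent is trained on a weighted combination of the two signals. The sketch below shows that combination; the `curiosity_strength` name and its value are assumptions for illustration, not taken from this page:
+
+```python
+# Sketch: combining the extrinsic and the intrinsic (curiosity) rewards.
+# `curiosity_strength` weights the curiosity signal; its name and value are
+# assumed here, and would be set via the training hyperparameters in practice.
+curiosity_strength = 0.02
+
+def total_reward(extrinsic: float, intrinsic: float) -> float:
+    return extrinsic + curiosity_strength * intrinsic
+```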
+
+## The observation space
+
+In terms of observation, we **use 148 raycasts that can each detect objects** (switch, bricks, golden brick, and walls).
+
+
+
+We also use a **boolean variable indicating the switch state** (did we turn on or off the switch to spawn the Pyramid) and a vector that **contains the agent’s speed**.
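+
+As a rough illustration, the observation could be assembled as one flat vector as below; the exact sizes and ordering are assumptions for the sketch, not the environment's actual layout:
+
+```python
+import numpy as np
+
+# Sketch of the observation vector (sizes and ordering are assumed):
+# - one value per raycast for the 148 raycasts,
+# - 1 value for the switch state (on/off),
+# - 3 values for the agent's speed.
+raycast_obs = np.zeros(148, dtype=np.float32)     # what each raycast detected
+switch_is_on = np.array([1.0], dtype=np.float32)  # did we press the button?
+agent_speed = np.zeros(3, dtype=np.float32)       # the agent's velocity vector
+
+observation = np.concatenate([raycast_obs, switch_is_on, agent_speed])
+print(observation.shape)  # (152,) in this sketch
+```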
+
+
+
+
+## The action space
+
+The action space is **discrete** with four possible actions:
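+
+One possible mapping is sketched below; the labels are an assumption based on the standard Pyramids agent (forward/backward movement and rotation) and are not taken from this page:
+
+```python
+from enum import IntEnum
+
+# Sketch: a discrete action space with four actions (labels assumed).
+class AgentAction(IntEnum):
+    MOVE_FORWARD = 0
+    MOVE_BACKWARD = 1
+    ROTATE_LEFT = 2
+    ROTATE_RIGHT = 3
+
+action = AgentAction.MOVE_FORWARD  # the policy picks one of these four indices each step
+```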
+
+
@@ -67,6 +111,8 @@ What will make the difference during this challenge are **the hyperparameters yo
+## MA-POCA
+
+- [On the Use and Misuse of Absorbing States in Multi-agent Reinforcement Learning (MA-POCA)](https://arxiv.org/pdf/2111.05992.pdf)
diff --git a/units/en/unit7/self-play.mdx b/units/en/unit7/self-play.mdx
index 307354e..f553432 100644
--- a/units/en/unit7/self-play.mdx
+++ b/units/en/unit7/self-play.mdx
@@ -104,16 +104,19 @@ Player B has a rating of 2300
- We first calculate the expected score:
\\(E_{A} = \frac{1}{1+10^{(2300-2600)/400}} = 0.849 \\)
+
\\(E_{B} = \frac{1}{1+10^{(2600-2300)/400}} = 0.151 \\)
- If the organizers determined that K=16 and A wins, the new rating would be:
\\(ELO_A = 2600 + 16*(1-0.849) = 2602 \\)
+
\\(ELO_B = 2300 + 16*(0-0.151) = 2298 \\)
- If the organizers determined that K=16 and B wins, the new rating would be:
\\(ELO_A = 2600 + 16*(0-0.849) = 2586 \\)
+
\\(ELO_B = 2300 + 16 *(1-0.151) = 2314 \\)
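+
+As a quick check of the arithmetic above, here is a minimal Python sketch of the Elo update with K=16 (function names are illustrative):
+
+```python
+def expected_score(rating_a: float, rating_b: float) -> float:
+    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400))
+
+def update(rating: float, expected: float, score: float, k: float = 16) -> float:
+    # score is 1 for a win, 0 for a loss, and 0.5 for a draw
+    return rating + k * (score - expected)
+
+e_a = expected_score(2600, 2300)  # ~0.849
+e_b = expected_score(2300, 2600)  # ~0.151
+
+# If A wins:
+print(round(update(2600, e_a, 1)), round(update(2300, e_b, 0)))  # 2602 2298
+
+# If B wins:
+print(round(update(2600, e_a, 0)), round(update(2300, e_b, 1)))  # 2586 2314
+```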