Add illustration links

This commit is contained in:
simoninithomas
2023-01-31 16:17:50 +01:00
parent 00974cc6b3
commit d5bedcee2f
3 changed files with 18 additions and 8 deletions


@@ -14,7 +14,7 @@ This worked great, and the single-agent system is useful for many applications.
<figcaption>
A patchwork of all the environments you've trained your agents on since the beginning of the course
</figcaption>
</figure>
But, as humans, **we live in a multi-agent world**. Our intelligence comes from interaction with other agents. And so, our **goal is to create agents that can interact with other humans and other agents**.


@@ -11,7 +11,12 @@ To design this multi-agents reinforcement learning system (MARL), we have two so
## Decentralized system
[ADD illustration decentralized approach]
<figure>
<img src="https://huggingface.co/datasets/huggingface-deep-rl-course/course-images/resolve/main/en/unit10/decentralized.png" alt="Decentralized"/>
<figcaption>
Source: <a href="https://www.youtube.com/watch?v=qgb0gyrpiGk"> Introduction to Multi-Agent Reinforcement Learning </a>
</figcaption>
</figure>
In decentralized learning, **each agent is trained independently from the others**. In the example given, each vacuum learns to clean as much space as it can **without caring about what the other vacuums (agents) are doing**.
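The idea above can be sketched in a few lines of Python. This is an illustrative toy (not the course's implementation): each agent holds its own estimate and updates it only from its own reward, with no shared buffer or shared policy.

```python
# Toy sketch of decentralized learning: each agent keeps its own
# parameters and learns only from its own experience.
class IndependentAgent:
    def __init__(self):
        self.value = 0.0  # this agent's own value estimate

    def learn(self, reward, lr=0.1):
        # Running update using only this agent's reward.
        self.value += lr * (reward - self.value)

# Three agents, three independent training loops: no communication.
agents = [IndependentAgent() for _ in range(3)]
for agent in agents:
    agent.learn(reward=1.0)
```

Each agent here sees the other agents only through the environment, which is exactly why the environment looks non-stationary from its point of view.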
@@ -24,7 +29,12 @@ And this is problematic for many reinforcement Learning algorithms **that can't
## Centralized approach
[ADD illustration centralized approach]
<figure>
<img src="https://huggingface.co/datasets/huggingface-deep-rl-course/course-images/resolve/main/en/unit10/centralized.png" alt="Centralized"/>
<figcaption>
Source: <a href="https://www.youtube.com/watch?v=qgb0gyrpiGk"> Introduction to Multi-Agent Reinforcement Learning </a>
</figcaption>
</figure>
In this architecture, **we have a high-level process that collects the agents' experiences**: the experience buffer. And we'll use these experiences **to learn a common policy**.
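A minimal sketch of that shared experience buffer, assuming nothing about the course's actual implementation: every agent pushes its transitions into one buffer, and a single common policy would then be updated from samples of it.

```python
# Illustrative sketch of a centralized experience buffer shared by all agents.
import random
from collections import deque

class SharedExperienceBuffer:
    def __init__(self, capacity=10_000):
        # deque with maxlen discards the oldest experiences when full.
        self.buffer = deque(maxlen=capacity)

    def add(self, agent_id, state, action, reward, next_state):
        # Experiences from every agent land in the same buffer.
        self.buffer.append((agent_id, state, action, reward, next_state))

    def sample(self, batch_size):
        # A single common policy would be trained on these mixed samples.
        return random.sample(self.buffer, batch_size)

# Usage: two agents contribute experiences, one policy learns from both.
buffer = SharedExperienceBuffer()
for agent_id in ("agent_0", "agent_1"):
    buffer.add(agent_id, state=0, action=1, reward=0.5, next_state=1)
batch = buffer.sample(2)
```

Because the learner sees experiences from all agents at once, the non-stationarity problem of the decentralized setting is reduced.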


@@ -1,11 +1,11 @@
# Self-Play: a classic technique to train competitive agents in adversarial games
Now that we studied the basics of multi-agents. We're ready to go deeper. As mentioned in the introduction, we're going to train agents in an adversarial games a Soccer 2vs2 game.
Now that we've studied the basics of multi-agent systems, we're ready to go deeper. As mentioned in the introduction, we're going **to train agents in an adversarial game with SoccerTwos, a 2vs2 game**.
<figure>
<img src=https://huggingface.co/datasets/huggingface-deep-rl-course/course-images/resolve/main/en/unit10/soccertwos.gif alt=SoccerTwos/>
<img src="https://huggingface.co/datasets/huggingface-deep-rl-course/course-images/resolve/main/en/unit10/soccertwos.gif" alt="SoccerTwos"/>
<figcaption>This environment was made by the <a href=https://github.com/Unity-Technologies/ml-agents>Unity MLAgents Team</a></figcaption>
<figcaption>This environment was made by the <a href="https://github.com/Unity-Technologies/ml-agents">Unity ML-Agents Team</a></figcaption>
</figure>
@@ -80,7 +80,7 @@ After every game:
So if players A and B have ratings Ra and Rb, then the **expected scores are** given by:
<img src=https://huggingface.co/datasets/huggingface-deep-rl-course/course-images/resolve/main/en/unit10/elo1.png alt=ELO Score/>
<img src="https://huggingface.co/datasets/huggingface-deep-rl-course/course-images/resolve/main/en/unit10/elo1.png" alt="ELO Score"/>
Then, at the end of the game, we need to update the players' actual Elo scores. We use a linear adjustment **proportional to the amount by which the player over-performed or under-performed.**
@@ -91,7 +91,7 @@ We also define a maximum adjustment rating per game: K-factor.
If Player A was expected to score Ea points but actually scored Sa points, then the player's rating is updated using the formula:
<img src=https://huggingface.co/datasets/huggingface-deep-rl-course/course-images/resolve/main/en/unit10/elo2.png alt=ELO Score/>
<img src="https://huggingface.co/datasets/huggingface-deep-rl-course/course-images/resolve/main/en/unit10/elo2.png" alt="ELO Score"/>
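The two formulas above can be sketched directly in Python. This follows the standard Elo definitions (logistic expected score with a 400-point scale, linear K-factor update); variable names are illustrative.

```python
# Sketch of the Elo rating update described above.
def expected_score(r_a: float, r_b: float) -> float:
    """Expected score Ea of player A against player B (first formula)."""
    return 1.0 / (1.0 + 10 ** ((r_b - r_a) / 400))

def update_elo(r_a: float, r_b: float, score_a: float, k: float = 32):
    """Return both players' new ratings after a game (second formula).

    score_a is Sa: 1 for a win by A, 0.5 for a draw, 0 for a loss.
    k is the K-factor, the maximum adjustment per game.
    """
    e_a = expected_score(r_a, r_b)
    e_b = expected_score(r_b, r_a)
    new_a = r_a + k * (score_a - e_a)          # Ra' = Ra + K(Sa - Ea)
    new_b = r_b + k * ((1 - score_a) - e_b)    # symmetric update for B
    return new_a, new_b
```

For two equally rated players (Ea = 0.5), a win moves the winner up by K/2 and the loser down by K/2, so ratings stay zero-sum.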
### Example