mirror of
https://github.com/huggingface/deep-rl-class.git
synced 2026-04-03 10:38:27 +08:00
Add illustration links
@@ -14,7 +14,7 @@ This worked great, and the single-agent system is useful for many applications.
<figcaption>
A patchwork of all the environments you’ve trained your agents on since the beginning of the course
</figcaption>
</figure>
But, as humans, **we live in a multi-agent world**. Our intelligence comes from interaction with other agents. And so, our **goal is to create agents that can interact with other humans and other agents**.
@@ -11,7 +11,12 @@ To design this multi-agent reinforcement learning system (MARL), we have two so
## Decentralized system
<figure>
<img src="https://huggingface.co/datasets/huggingface-deep-rl-course/course-images/resolve/main/en/unit10/decentralized.png" alt="Decentralized"/>
<figcaption>
Source: <a href="https://www.youtube.com/watch?v=qgb0gyrpiGk"> Introduction to Multi-Agent Reinforcement Learning </a>
</figcaption>
</figure>
In decentralized learning, **each agent is trained independently from the others**. In the example given, each vacuum learns to clean as much of the space as it can **without caring about what the other vacuums (agents) are doing**.
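As a minimal sketch of this idea (the `Policy` class and vacuum names here are illustrative, not from the course code), decentralized learning means each agent owns its own policy and updates it only from its own experience:

```python
import random

class Policy:
    """Toy stand-in for an agent's own policy network."""
    def __init__(self, n_actions):
        self.n_actions = n_actions
        self.seen = []  # transitions this agent alone has collected

    def act(self, obs):
        return random.randrange(self.n_actions)

    def update(self, transition):
        # Each agent updates from its OWN experience only:
        # it never sees the other agents' observations or actions.
        self.seen.append(transition)

# Each vacuum (agent) gets a separate, independent policy.
policies = {f"vacuum_{i}": Policy(n_actions=4) for i in range(3)}

for step in range(10):
    for name, policy in policies.items():
        obs = step                    # placeholder observation
        action = policy.act(obs)
        policy.update((obs, action))  # independent update per agent
```

From each agent's point of view, the other agents are just part of the environment, which is exactly why the environment appears non-stationary to standard single-agent algorithms.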
@@ -24,7 +29,12 @@ And this is problematic for many reinforcement learning algorithms **that can't
## Centralized approach
<figure>
<img src="https://huggingface.co/datasets/huggingface-deep-rl-course/course-images/resolve/main/en/unit10/centralized.png" alt="Centralized"/>
<figcaption>
Source: <a href="https://www.youtube.com/watch?v=qgb0gyrpiGk"> Introduction to Multi-Agent Reinforcement Learning </a>
</figcaption>
</figure>
In this architecture, **we have a high-level process that collects the agents' experiences**: the experience buffer. We then use these experiences **to learn a common policy**.
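As a hedged sketch (the buffer and function names are illustrative, not from the course code), the centralized approach pools every agent's experience into one shared buffer, and a single common policy is updated from that pooled data:

```python
from collections import deque

# Shared experience buffer: every agent's transitions go into the SAME buffer.
shared_buffer = deque(maxlen=10_000)

def collect(agent_id, obs, action, reward):
    """Called by each agent; experiences are pooled centrally."""
    shared_buffer.append((agent_id, obs, action, reward))

def update_common_policy(buffer):
    """Placeholder for a learning step on the pooled data.

    In practice this would be a gradient update of one shared policy;
    here we just compute the mean reward over the buffer.
    """
    return sum(r for (_, _, _, r) in buffer) / len(buffer)

# Four agents all contribute experience to the same buffer.
for agent_id in range(4):
    collect(agent_id, obs=0, action=1, reward=float(agent_id))

avg = update_common_policy(shared_buffer)  # one policy learns from all agents
```

Because the learner sees all agents' experiences together, the joint behavior of the group is part of the training data, which sidesteps the non-stationarity problem of the decentralized setting.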
@@ -1,11 +1,11 @@
# Self-Play: a classic technique to train competitive agents in adversarial games
Now that we've studied the basics of multi-agent systems, we're ready to go deeper. As mentioned in the introduction, we're going **to train agents in an adversarial game with SoccerTwos, a 2vs2 game**.
<figure>
<img src="https://huggingface.co/datasets/huggingface-deep-rl-course/course-images/resolve/main/en/unit10/soccertwos.gif" alt="SoccerTwos"/>
<figcaption>This environment was made by the <a href="https://github.com/Unity-Technologies/ml-agents">Unity MLAgents Team</a></figcaption>
</figure>
@@ -80,7 +80,7 @@ After every game:
So if A and B have ratings Ra and Rb, then the **expected scores are** given by:
<img src="https://huggingface.co/datasets/huggingface-deep-rl-course/course-images/resolve/main/en/unit10/elo1.png" alt="ELO Score"/>
Then, at the end of the game, we need to update the player's actual Elo score. We use a linear adjustment **proportional to the amount by which the player over-performed or under-performed.**
@@ -91,7 +91,7 @@ We also define a maximum adjustment rating per game: K-factor.
If Player A has Ea points but scored Sa points, then the player’s rating is updated using the formula:
<img src="https://huggingface.co/datasets/huggingface-deep-rl-course/course-images/resolve/main/en/unit10/elo2.png" alt="ELO Score"/>
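The two formulas above can be sketched in a few lines of Python. This is a minimal illustration of the standard Elo update (expected score, then a K-factor-weighted adjustment), not the course's own implementation:

```python
def expected_score(r_a: float, r_b: float) -> float:
    # E_a = 1 / (1 + 10 ** ((R_b - R_a) / 400))
    return 1.0 / (1.0 + 10 ** ((r_b - r_a) / 400))

def update_rating(r_a: float, e_a: float, s_a: float, k: float = 16) -> float:
    # R'_a = R_a + K * (S_a - E_a): adjustment proportional to
    # how much the player over- or under-performed, capped by K.
    return r_a + k * (s_a - e_a)

# A 2600-rated player beats a 2300-rated player (S_a = 1 for a win).
e_a = expected_score(2600, 2300)
new_r_a = update_rating(2600, e_a, s_a=1.0)
print(round(e_a, 3))      # 0.849  (the favorite was expected to win)
print(round(new_r_a, 1))  # 2602.4 (small gain, since the win was expected)
```

Note how winning as the heavy favorite yields only a small rating gain, while an upset win by the underdog would move both ratings much more.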
### Example