mirror of
https://github.com/huggingface/deep-rl-class.git
synced 2026-02-12 06:35:21 +08:00
@@ -180,6 +180,8 @@
     title: Self-Play
   - local: unit7/hands-on
     title: Let's train our soccer team to beat your classmates' teams (AI vs. AI)
+  - local: unit7/quiz
+    title: Quiz
   - local: unit7/conclusion
     title: Conclusion
   - local: unit7/additional-readings
units/en/unit7/quiz.mdx (new file, 139 lines)
@@ -0,0 +1,139 @@
# Quiz

The best way to learn and [to avoid the illusion of competence](https://www.coursera.org/lecture/learning-how-to-learn/illusions-of-competence-BuFzf) **is to test yourself.** This will help you to find **where you need to reinforce your knowledge**.
### Q1: Choose the option that fits best when comparing the different types of multi-agent environments

- Your agents aim to maximize common benefits in ____ environments
- Your agents aim to maximize common benefits while minimizing the opponent's benefits in ____ environments

<Question
  choices={[
    {
      text: "competitive, cooperative",
      explain: "It is the other way around: you maximize common benefits in cooperative environments, while in competitive environments you also aim to reduce the opponent's score.",
      correct: false,
    },
    {
      text: "cooperative, competitive",
      explain: "",
      correct: true,
    },
  ]}
/>
### Q2: Which of the following statements are true about `decentralized` learning?

<Question
  choices={[
    {
      text: "Each agent is trained independently from the others",
      explain: "",
      correct: true,
    },
    {
      text: "Inputs from other agents are just considered environment data",
      explain: "",
      correct: true,
    },
    {
      text: "Considering other agents part of the environment makes the environment stationary",
      explain: "In decentralized learning, agents ignore the existence of the other agents and consider them part of the environment. However, because those other agents are also learning and changing, the environment is in constant change and becomes non-stationary.",
      correct: false,
    },
  ]}
/>
### Q3: Which of the following statements are true about `centralized` learning?

<Question
  choices={[
    {
      text: "It learns one common policy based on all the agents' interactions",
      explain: "",
      correct: true,
    },
    {
      text: "The reward is global",
      explain: "",
      correct: true,
    },
    {
      text: "The environment with this approach is stationary",
      explain: "",
      correct: true,
    },
  ]}
/>
### Q4: Explain in your own words what the `Self-Play` approach is

<details>
<summary>Solution</summary>

`Self-play` is an approach where you use copies of your agent, with the same policy, as its opponents, so that your agent trains against opponents of the same training level.

</details>
### Q5: When configuring `Self-play`, several parameters are important. Can you identify, from the definitions below, which parameter is being described in each case?

- The probability of playing against the current self vs. an opponent sampled from a pool
- Variety (dispersion) of training levels of the opponents you can face
- The number of training steps before spawning a new opponent
- Opponent change rate
<Question
  choices={[
    {
      text: "window, play_against_latest_model_ratio, save_steps, swap_steps+team_change",
      explain: "",
      correct: false,
    },
    {
      text: "play_against_latest_model_ratio, save_steps, window, swap_steps+team_change",
      explain: "",
      correct: false,
    },
    {
      text: "play_against_latest_model_ratio, window, save_steps, swap_steps+team_change",
      explain: "",
      correct: true,
    },
    {
      text: "swap_steps+team_change, save_steps, play_against_latest_model_ratio, window",
      explain: "",
      correct: false,
    },
  ]}
/>
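The parameters in this question come from the ML-Agents trainer configuration, where they live in the `self_play` section. Below is a minimal sketch of that section; the behavior name and the values are only illustrative, not necessarily the ones used in the hands-on:

```yaml
behaviors:
  SoccerTwos:
    # ...other trainer hyperparameters omitted...
    self_play:
      save_steps: 50000                     # training steps between snapshots added to the opponent pool
      team_change: 200000                   # training steps before switching the learning team
      swap_steps: 2000                      # steps between swapping the opponent's policy with another snapshot
      window: 10                            # size of the pool of past snapshots opponents are sampled from
      play_against_latest_model_ratio: 0.5  # probability of playing against the current self instead of a pooled snapshot
      initial_elo: 1200.0                   # starting ELO rating
```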
### Q6: What are the main motivations for using an ELO rating score?
<Question
  choices={[
    {
      text: "The score takes into account the difference in skill between you and your opponent",
      explain: "",
      correct: true,
    },
    {
      text: "Although the number of points exchanged depends on the result of the match and on the levels of the agents, the sum is always the same",
      explain: "",
      correct: true,
    },
    {
      text: "It's easy for an agent to keep a high score over time",
      explain: "That is called `rating deflation`: keeping a high rating requires a lot of skill over time",
      correct: false,
    },
    {
      text: "It works well for calculating the individual contributions of each player in a team",
      explain: "ELO uses the score achieved by the whole team, but individual contributions are not calculated",
      correct: false,
    },
  ]}
/>
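As a refresher on why these properties hold, here is the standard ELO update (the general formulas, not something specific to ML-Agents; \\(K\\) is the update factor). If player A with rating \\(R_A\\) plays player B with rating \\(R_B\\), A's expected score and rating update are:

\\[E_A = \frac{1}{1 + 10^{(R_B - R_A)/400}} \qquad\qquad R_A' = R_A + K \, (S_A - E_A)\\]

where \\(S_A\\) is 1 for a win, 0.5 for a draw, and 0 for a loss. Because \\(E_A + E_B = 1\\) and \\(S_A + S_B = 1\\), the points gained by one player are exactly the points lost by the other, which is why the sum stays constant, and the exchange is larger when the lower-rated player wins.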
Congrats on finishing this quiz 🥳! If you missed some elements, take time to read the chapter again to reinforce (😏) your knowledge.