mirror of
https://github.com/huggingface/deep-rl-class.git
synced 2026-03-30 08:40:27 +08:00
# Quiz

The best way to learn and [to avoid the illusion of competence](https://www.coursera.org/lecture/learning-how-to-learn/illusions-of-competence-BuFzf) **is to test yourself.** This will help you find **where you need to reinforce your knowledge**.

### Q1: Choose the option that best fits when comparing different types of multi-agent environments

- Your agents aim to maximize common benefits in ____ environments
- Your agents aim to maximize common benefits while minimizing the opponent's in ____ environments

<Question
	choices={[
		{
			text: "competitive, cooperative",
			explain: "You maximize common benefits in cooperative environments, while in competitive environments you also aim to reduce your opponent's score.",
			correct: false,
		},
		{
			text: "cooperative, competitive",
			explain: "",
			correct: true,
		},
	]}
/>

### Q2: Which of the following statements are true about `decentralized` learning?

<Question
	choices={[
		{
			text: "Each agent is trained independently from the others",
			explain: "",
			correct: true,
		},
		{
			text: "Inputs from other agents are just considered part of the environment data",
			explain: "",
			correct: true,
		},
		{
			text: "Considering other agents as part of the environment makes the environment stationary",
			explain: "In decentralized learning, agents ignore the existence of other agents and consider them part of the environment. However, this means the environment is constantly changing, which makes it non-stationary.",
			correct: false,
		},
	]}
/>

### Q3: Which of the following statements are true about `centralized` learning?

<Question
	choices={[
		{
			text: "It learns one common policy from the interactions of all agents",
			explain: "",
			correct: true,
		},
		{
			text: "The reward is global",
			explain: "",
			correct: true,
		},
		{
			text: "The environment with this approach is stationary",
			explain: "",
			correct: true,
		},
	]}
/>

### Q4: Explain in your own words what the `Self-Play` approach is

<details>
<summary>Solution</summary>

`Self-play` is an approach in which copies of your agent, sharing its policy, are used as opponents, so that your agent learns by playing against opponents at the same training level.

</details>

### Q5: When configuring `Self-play`, several parameters are important. Can you identify, from their definitions, which parameters we are talking about?

- The probability of playing against the current self vs. an opponent from a pool
- Variety (dispersion) of training levels of the opponents you can face
- The number of training steps before a new opponent snapshot is saved
- Opponent change rate

<Question
	choices={[
		{
			text: "window, play_against_latest_model_ratio, save_steps, swap_steps+team_change",
			explain: "",
			correct: false,
		},
		{
			text: "play_against_latest_model_ratio, save_steps, window, swap_steps+team_change",
			explain: "",
			correct: false,
		},
		{
			text: "play_against_latest_model_ratio, window, save_steps, swap_steps+team_change",
			explain: "",
			correct: true,
		},
		{
			text: "swap_steps+team_change, save_steps, play_against_latest_model_ratio, window",
			explain: "",
			correct: false,
		},
	]}
/>
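
To see how `play_against_latest_model_ratio`, `window`, `save_steps` and `swap_steps` interact, here is a minimal, hypothetical Python sketch of a self-play opponent pool. The parameter names are the ones Unity ML-Agents uses in its `self_play` trainer settings. The class `SelfPlayOpponentPool` and its logic are an illustrative simplification, not ML-Agents' implementation. The sketch also rests on two more assumptions: `team_change` is not modeled, and a single shared step counter drives every schedule.

```python
import random
from collections import deque


class SelfPlayOpponentPool:
    """Hypothetical, simplified opponent manager for self-play.

    Parameter names follow ML-Agents' self_play settings, but the logic is an
    illustrative sketch, not ML-Agents' implementation. team_change is not
    modeled, and one shared step counter is assumed for every schedule.
    """

    def __init__(self, save_steps, swap_steps, window,
                 play_against_latest_model_ratio, seed=0):
        self.save_steps = save_steps  # steps between two policy snapshots
        self.swap_steps = swap_steps  # steps between two opponent changes
        self.latest_ratio = play_against_latest_model_ratio
        # Only the `window` most recent snapshots are kept, which bounds how
        # far back (in training level) a sampled opponent can be.
        self.snapshots = deque(maxlen=window)
        self.rng = random.Random(seed)
        self.opponent = None

    def step(self, step, current_policy):
        # Store a frozen copy of the learning policy every save_steps steps.
        if step % self.save_steps == 0:
            self.snapshots.append(dict(current_policy))
        # Pick a new opponent every swap_steps steps (or on the first call).
        if self.opponent is None or step % self.swap_steps == 0:
            self.opponent = self._sample_opponent(current_policy)
        return self.opponent

    def _sample_opponent(self, current_policy):
        # With probability latest_ratio, face the live (still learning) policy;
        # otherwise pick uniformly among the stored snapshots.
        if not self.snapshots or self.rng.random() < self.latest_ratio:
            return current_policy
        return self.rng.choice(list(self.snapshots))


# Usage: a dict stands in for the learning policy's weights and is updated
# in place, so "playing against the latest model" tracks every update.
pool = SelfPlayOpponentPool(save_steps=100, swap_steps=50, window=3,
                            play_against_latest_model_ratio=0.5)
policy = {"version": 0}
for step in range(0, 1001, 10):
    policy["version"] = step
    opponent = pool.step(step, policy)
```

In this sketch, a larger `window` lets older snapshots, which are typically weaker, stay in the pool, so the range of opponent skill widens. A larger `play_against_latest_model_ratio` makes the agent face its current self more often.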

### Q6: What are the main motivations for using an ELO rating score?

<Question
	choices={[
		{
			text: "The score takes into account the difference in skill between you and your opponent",
			explain: "",
			correct: true,
		},
		{
			text: "Although the number of points exchanged depends on the result of the match and on the agents' ratings, the sum of the ratings always stays the same",
			explain: "",
			correct: true,
		},
		{
			text: "It's easy for an agent to keep a high rating",
			explain: "That is called `rating deflation`: keeping a high rating requires a lot of skill over time.",
			correct: false,
		},
		{
			text: "It works well for calculating the individual contributions of each player in a team",
			explain: "ELO uses the score achieved by the whole team, but individual contributions are not calculated.",
			correct: false,
		},
	]}
/>
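
The standard ELO update rule shows why the first two answers are correct. Below is a minimal Python sketch. It assumes the common logistic expected-score formula with a 400-point scale and a K-factor of 32. Both are conventional choices, not settings taken from this quiz, and the function names are illustrative.

```python
def expected_score(rating_a: float, rating_b: float) -> float:
    """Expected score of player A against player B (logistic, 400-point scale)."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400))


def update_ratings(rating_a: float, rating_b: float, score_a: float,
                   k: float = 32.0) -> tuple[float, float]:
    """Return both ratings after one match.

    score_a is 1.0 if A wins, 0.5 for a draw and 0.0 if A loses.
    """
    delta = k * (score_a - expected_score(rating_a, rating_b))
    # B's expected score is 1 - E_A and its actual score is 1 - score_a,
    # so B's change is exactly -delta and the sum of ratings is preserved.
    return rating_a + delta, rating_b - delta


# An upset (weaker A beats stronger B) moves more points than an expected win.
upset_a, upset_b = update_ratings(1000, 1400, 1.0)
favorite_a, favorite_b = update_ratings(1400, 1000, 1.0)
print(f"upset gain: {upset_a - 1000:.1f}, expected-win gain: {favorite_a - 1400:.1f}")
# prints: upset gain: 29.1, expected-win gain: 2.9
```

The gain depends on the rating gap, which is the first answer: beating a much stronger opponent pays about ten times more than beating a much weaker one. Whatever one player gains, the other loses, which is the second answer.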

Congrats on finishing this Quiz 🥳! If you missed some elements, take time to read the chapter again to reinforce (😏) your knowledge.