# Quiz [[quiz]]

The best way to learn and [to avoid the illusion of competence](https://www.coursera.org/lecture/learning-how-to-learn/illusions-of-competence-BuFzf) **is to test yourself.** This will help you find **where you need to reinforce your knowledge**.

### Q1: What is Reinforcement Learning?

<details>
<summary>Solution</summary>

Reinforcement learning is a **framework for solving control tasks (also called decision problems)** by building agents that learn from the environment by interacting with it through trial and error, **receiving rewards (positive or negative) as unique feedback**.

</details>

### Q2: Define the RL Loop

<img src="https://huggingface.co/datasets/huggingface-deep-rl-course/course-images/resolve/main/en/unit1/rl-loop-ex.jpg" alt="Exercise RL Loop"/>

At every step:

- Our Agent receives ______ from the environment
- Based on that ______ the Agent takes an ______
- Our Agent will move to the right
- The Environment goes to a ______
- The Environment gives a ______ to the Agent

<Question
  choices={[
    {
      text: "an action a0, action a0, state s0, state s1, reward r1",
      explain: "At every step: Our Agent receives **state s0** from the environment. Based on that **state s0** the Agent takes an **action a0**. Our Agent will move to the right. The Environment goes to a **new state s1**. The Environment gives **a reward r1** to the Agent."
    },
    {
      text: "state s0, state s0, action a0, new state s1, reward r1",
      explain: "",
      correct: true
    },
    {
      text: "a state s0, state s0, action a0, state s1, action a1",
      explain: "At every step: Our Agent receives **state s0** from the environment. Based on that **state s0** the Agent takes an **action a0**. Our Agent will move to the right. The Environment goes to a **new state s1**. The Environment gives **a reward r1** to the Agent."
    }
  ]}
/>
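
The loop above (receive state s0, take action a0, get new state s1 and reward r1) can be sketched with a toy environment. All names here (`CorridorEnv`, positions, rewards) are illustrative, not from the course code:

```python
class CorridorEnv:
    """Toy environment: the agent starts at position 0 and must reach
    position 4 by moving right (+1) or left (-1, clamped at 0)."""
    GOAL = 4

    def reset(self):
        self.position = 0  # initial state s0
        return self.position

    def step(self, action):
        # Apply the action and compute the new state s1
        self.position = max(0, self.position + action)
        done = self.position == self.GOAL
        reward = 1 if done else 0  # reward r1 from the environment
        return self.position, reward, done


env = CorridorEnv()
state = env.reset()      # the Agent receives state s0 from the environment
done = False
while not done:
    action = +1          # based on that state, the Agent takes action a0 (move right)
    state, reward, done = env.step(action)  # environment returns new state s1 and reward r1
```

Each pass through the `while` loop is exactly one step of the RL loop from the figure.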
### Q3: What's the difference between a state and an observation?

<Question
  choices={[
    {
      text: "The state is a complete description of the state of the world (there is no hidden information)",
      explain: "",
      correct: true
    },
    {
      text: "The state is a partial description of the state of the world",
      explain: ""
    },
    {
      text: "The observation is a complete description of the state of the world (there is no hidden information)",
      explain: ""
    },
    {
      text: "The observation is a partial description of the state",
      explain: "",
      correct: true
    },
    {
      text: "We receive a state when we play with the chess environment",
      explain: "Since we have access to the whole chessboard information.",
      correct: true
    },
    {
      text: "We receive an observation when we play with the chess environment",
      explain: "Since we have access to the whole chessboard information."
    },
    {
      text: "We receive a state when we play Super Mario Bros",
      explain: "We only see a part of the level close to the player, so we receive an observation."
    },
    {
      text: "We receive an observation when we play Super Mario Bros",
      explain: "We only see a part of the level close to the player.",
      correct: true
    }
  ]}
/>
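
One way to make the distinction concrete: in chess the agent sees the whole board (a state), while in Super Mario Bros it only sees a window around the player (an observation). A minimal sketch, with made-up grid contents:

```python
# The full grid is the state: a complete description, nothing hidden.
grid = ["goal", ".", ".", "agent", ".", ".", "trap"]
agent_index = grid.index("agent")

state = list(grid)  # complete description of the world

# The agent only perceives a 3-cell window around itself: an observation.
observation = grid[max(0, agent_index - 1): agent_index + 2]

# The observation hides the goal and the trap:
assert "goal" in state and "goal" not in observation
assert "trap" in state and "trap" not in observation
```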
### Q4: A task is an instance of a Reinforcement Learning problem. What are the two types of tasks?

<Question
  choices={[
    {
      text: "Episodic",
      explain: "In an episodic task, we have a starting point and an ending point (a terminal state). This creates an episode: a list of States, Actions, Rewards, and new States. For instance, think about Super Mario Bros: an episode begins at the launch of a new Mario level and ends when you're killed or you reach the end of the level.",
      correct: true
    },
    {
      text: "Recursive",
      explain: ""
    },
    {
      text: "Adversarial",
      explain: ""
    },
    {
      text: "Continuing",
      explain: "Continuing tasks are tasks that continue forever (no terminal state). In this case, the agent must learn how to choose the best actions and simultaneously interact with the environment.",
      correct: true
    }
  ]}
/>

### Q5: What is the exploration/exploitation tradeoff?

<details>
<summary>Solution</summary>

In Reinforcement Learning, we need to **balance how much we explore the environment and how much we exploit what we know about the environment**.

- *Exploration* is exploring the environment by **trying random actions in order to find more information about the environment**.

- *Exploitation* is **exploiting known information to maximize the reward**.

<img src="https://huggingface.co/datasets/huggingface-deep-rl-course/course-images/resolve/main/en/unit1/expexpltradeoff.jpg" alt="Exploration Exploitation Tradeoff" width="100%">

</details>
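
One common (though not the only) way to balance the two is an epsilon-greedy rule: act randomly with a small probability, otherwise take the best-known action. This sketch is illustrative, not course code; the Q-values are made up:

```python
import random

def epsilon_greedy(q_values, epsilon):
    """Explore with probability epsilon, otherwise exploit the best-known action."""
    if random.random() < epsilon:
        return random.randrange(len(q_values))  # exploration: random action
    return max(range(len(q_values)), key=q_values.__getitem__)  # exploitation

random.seed(0)  # fixed seed so the sketch is reproducible
q_values = [0.1, 0.9, 0.3]  # hypothetical estimated values for 3 actions
actions = [epsilon_greedy(q_values, epsilon=0.1) for _ in range(1000)]
# With epsilon=0.1, the greedy action (index 1) dominates, but every
# action is still tried occasionally.
```

Tuning `epsilon` is exactly the exploration/exploitation tradeoff: higher values gather more information, lower values collect more reward from what is already known.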
### Q6: What is a policy?

<details>
<summary>Solution</summary>

- The Policy π **is the brain of our Agent**. It's the function that tells us what action to take given the state we are in. So it defines the agent's behavior at a given time.

<img src="https://huggingface.co/datasets/huggingface-deep-rl-course/course-images/resolve/main/en/unit1/policy_1.jpg" alt="Policy">

</details>
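
In code, a deterministic policy is literally just a function from state to action. A toy sketch for a corridor with the goal at position 4 (illustrative, not course code):

```python
def policy(state):
    """pi(s): move right (+1) until the goal at position 4, then stay (0)."""
    return +1 if state < 4 else 0

# The policy fully defines the agent's behavior at every state:
assert policy(0) == +1  # far from the goal: move right
assert policy(4) == 0   # at the goal: stay
```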
### Q7: What are value-based methods?

<details>
<summary>Solution</summary>

- Value-based methods are one of the main approaches for solving RL problems.
- In value-based methods, instead of training a policy function, **we train a value function that maps a state to the expected value of being at that state**.

</details>
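
A minimal sketch of the idea, with a hand-written value table for the corridor example (the numbers are made up for illustration): we never write a policy directly; behavior falls out of the values.

```python
# V(s): hypothetical expected value of each state in a 5-cell corridor
# whose goal is at position 4 (values grow as we approach the goal).
value = {0: 0.6, 1: 0.7, 2: 0.8, 3: 0.9, 4: 1.0}

def greedy_action(state):
    """Derive behavior from values: step toward the neighbor with higher value."""
    left, right = max(state - 1, 0), min(state + 1, 4)
    return +1 if value[right] >= value[left] else -1

# The derived behavior moves toward the goal everywhere:
assert greedy_action(0) == +1
assert greedy_action(2) == +1
```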
### Q8: What are policy-based methods?

<details>
<summary>Solution</summary>

- In *Policy-Based Methods*, we learn a **policy function directly**.
- This policy function will **map each state to the best corresponding action at that state**, or to a **probability distribution over the set of possible actions at that state**.

</details>
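
For the "probability distribution" case, one standard construction is a softmax over per-action preferences. This is a sketch under made-up numbers, not the course's implementation:

```python
import math

def softmax_policy(preferences):
    """Map a state's action preferences to a probability distribution over actions."""
    exps = [math.exp(p) for p in preferences]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical preferences for three actions (e.g. left, right, jump) in one state:
probs = softmax_policy([2.0, 1.0, 0.5])

assert abs(sum(probs) - 1.0) < 1e-9   # a valid probability distribution
assert probs[0] > probs[1] > probs[2]  # higher preference -> higher probability
```

Policy-based methods adjust the preferences (usually neural-network outputs) so that actions leading to more reward become more probable.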

Congrats on finishing this quiz 🥳! If you missed some elements, take time to read the chapter again to reinforce (😏) your knowledge, but **do not worry**: throughout the course we'll revisit these concepts, and you'll **reinforce your theoretical knowledge with hands-on practice**.