# Quiz [[quiz]]
The best way to learn and [to avoid the illusion of competence](https://www.coursera.org/lecture/learning-how-to-learn/illusions-of-competence-BuFzf) **is to test yourself.** This will help you to find **where you need to reinforce your knowledge**.
### Q1: What is Reinforcement Learning?
<details>
<summary>Solution</summary>
Reinforcement learning is a **framework for solving control tasks (also called decision problems)** by building agents that learn from the environment by interacting with it through trial and error and **receiving rewards (positive or negative) as unique feedback**.
</details>
### Q2: Define the RL Loop
<img src="https://huggingface.co/datasets/huggingface-deep-rl-course/course-images/resolve/main/en/unit1/rl-loop-ex.jpg" alt="Exercise RL Loop"/>
At every step:
- Our Agent receives ______ from the environment
- Based on that ______ the Agent takes an ______
- Our Agent will move to the right
- The Environment goes to a ______
- The Environment gives a ______ to the Agent
<Question
choices={[
{
text: "an action a0, action a0, state s0, state s1, reward r1",
explain: "At every step: Our Agent receives **state s0** from the environment. Based on that **state s0** the Agent takes an **action a0**. Our Agent will move to the right. The Environment goes to a **new state s1**. The Environment gives **a reward r1** to the Agent."
},
{
text: "state s0, state s0, action a0, new state s1, reward r1",
explain: "",
correct: true
},
{
text: "a state s0, state s0, action a0, state s1, action a1",
explain: "At every step: Our Agent receives **state s0** from the environment. Based on that **state s0** the Agent takes an **action a0**. Our Agent will move to the right. The Environment goes to a **new state s1**. The Environment gives **a reward r1** to the Agent."
}
]}
/>
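The loop above can be sketched in code. Below is a minimal, self-contained toy environment (the `ToyEnv` class, its corridor dynamics, and the reward values are illustrative assumptions, not part of the course material) showing one episode of the state → action → new state → reward cycle:

```python
class ToyEnv:
    """A hypothetical 1-D corridor: the agent moves right until it reaches the goal."""

    def __init__(self, length=5):
        self.length = length
        self.state = 0

    def reset(self):
        self.state = 0
        return self.state  # the initial state s0

    def step(self, action):
        # action: +1 (move right) or -1 (move left)
        self.state = max(0, self.state + action)  # environment goes to new state s_{t+1}
        done = self.state >= self.length          # did we reach a terminal state?
        reward = 1.0 if done else 0.0             # environment gives reward r_{t+1}
        return self.state, reward, done


env = ToyEnv()
state = env.reset()  # the Agent receives state s0 from the environment
done = False
while not done:
    action = +1  # based on state s_t, the Agent takes action a_t (here: always move right)
    state, reward, done = env.step(action)  # the environment returns s_{t+1} and r_{t+1}
```

Real environments (e.g. Gymnasium ones) follow the same `reset`/`step` pattern, just with richer states and rewards.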
### Q3: What's the difference between a state and an observation?
<Question
choices={[
{
text: "The state is a complete description of the state of the world (there is no hidden information)",
explain: "",
correct: true
},
{
text: "The state is a partial description of the state",
explain: ""
},
{
text: "The observation is a complete description of the state of the world (there is no hidden information)",
explain: ""
},
{
text: "The observation is a partial description of the state",
explain: "",
correct: true
},
{
text: "We receive a state when we play in a chess environment",
explain: "Since we have access to the whole chessboard, there is no hidden information.",
correct: true
},
{
text: "We receive an observation when we play in a chess environment",
explain: "Since we have access to the whole chessboard, we receive a state, not an observation."
},
{
text: "We receive a state when we play Super Mario Bros",
explain: "We only see a part of the level close to the player, so we receive an observation."
},
{
text: "We receive an observation when we play Super Mario Bros",
explain: "We only see a part of the level close to the player.",
correct: true
}
]}
/>
### Q4: A task is an instance of a Reinforcement Learning problem. What are the two types of tasks?
<Question
choices={[
{
text: "Episodic",
explain: "In an episodic task, we have a starting point and an ending point (a terminal state). This creates an episode: a list of states, actions, rewards, and new states. For instance, think about Super Mario Bros: an episode begins at the launch of a new Mario level and ends when you're killed or you reach the end of the level.",
correct: true
},
{
text: "Recursive",
explain: ""
},
{
text: "Adversarial",
explain: ""
},
{
text: "Continuing",
explain: "Continuing tasks are tasks that continue forever (no terminal state). In this case, the agent must learn how to choose the best actions while simultaneously interacting with the environment.",
correct: true
}
]}
/>
### Q5: What is the exploration/exploitation tradeoff?
<details>
<summary>Solution</summary>
In Reinforcement Learning, we need to **balance how much we explore the environment and how much we exploit what we know about the environment**.
- *Exploration* is exploring the environment by **trying random actions in order to find more information about the environment**.
- *Exploitation* is **exploiting known information to maximize the reward**.
<img src="https://huggingface.co/datasets/huggingface-deep-rl-course/course-images/resolve/main/en/unit1/expexpltradeoff.jpg" alt="Exploration Exploitation Tradeoff" width="100%">
</details>
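A common way to balance this tradeoff is an epsilon-greedy rule: explore with probability ε, exploit otherwise. A minimal sketch (the function name, the Q-values, and ε = 0.1 are illustrative assumptions):

```python
import random

def epsilon_greedy(q_values, epsilon=0.1, rng=random):
    """With probability epsilon, explore (pick a random action);
    otherwise exploit (pick the action with the highest estimated value)."""
    if rng.random() < epsilon:
        return rng.randrange(len(q_values))                          # exploration
    return max(range(len(q_values)), key=q_values.__getitem__)       # exploitation

random.seed(0)
q = [0.1, 0.5, 0.2]  # hypothetical action-value estimates
actions = [epsilon_greedy(q, epsilon=0.1) for _ in range(1000)]
```

With ε = 0.1, roughly 90% of the picks exploit action 1 (the highest estimated value) while the rest explore at random.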
### Q6: What is a policy?
<details>
<summary>Solution</summary>
- The Policy π **is the brain of our Agent**. It's the function that tells us what action to take given the state we are in. So it defines the agent's behavior at a given time.
<img src="https://huggingface.co/datasets/huggingface-deep-rl-course/course-images/resolve/main/en/unit1/policy_1.jpg" alt="Policy">
</details>
### Q7: What are value-based methods?
<details>
<summary>Solution</summary>
- Value-based methods are one of the main approaches for solving RL problems.
- In Value-based methods, instead of training a policy function, **we train a value function that maps a state to the expected value of being at that state**.
</details>
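In other words, with a value-based method the policy is implicit: the agent acts greedily with respect to the learned values. A minimal sketch on a 4-state chain (the state values in `V` are hand-set for illustration, not learned):

```python
# Hypothetical state values V(s) for a 4-state chain (illustrative numbers).
V = {0: 0.1, 1: 0.4, 2: 0.7, 3: 1.0}

def greedy_policy(state, V):
    """Value-based control: no explicit policy function is trained; instead,
    move toward the neighboring state with the highest estimated value."""
    candidates = {}
    if state - 1 in V:
        candidates["left"] = V[state - 1]
    if state + 1 in V:
        candidates["right"] = V[state + 1]
    return max(candidates, key=candidates.get)
```

Because `V` increases to the right, the greedy policy always moves toward state 3.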
### Q8: What are policy-based methods?
<details>
<summary>Solution</summary>
- In *Policy-Based Methods*, we learn a **policy function directly**.
- This policy function will **map each state to the best corresponding action at that state**, or to a **probability distribution over the set of possible actions at that state**.
</details>
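The two flavors of policy described above can be sketched directly (the action names, the threshold, and the probabilities are hypothetical, chosen only to illustrate the deterministic vs. stochastic distinction):

```python
import random

def deterministic_policy(state):
    """Deterministic policy: maps each state to exactly one action."""
    return "right" if state < 3 else "left"

def stochastic_policy(state):
    """Stochastic policy: maps a state to a probability distribution
    over actions (hand-set probabilities, for illustration)."""
    return {"left": 0.2, "right": 0.8}

def sample_action(dist, rng=random):
    """Draw one action from a policy's probability distribution."""
    actions, probs = zip(*dist.items())
    return rng.choices(actions, weights=probs, k=1)[0]
```

Policy-based methods train such a function directly, typically by adjusting its parameters to make high-reward actions more probable.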
Congrats on finishing this quiz 🥳! If you missed some elements, take time to reread the chapter to reinforce (😏) your knowledge, but **do not worry**: during the course we'll go over these concepts again, and you'll **reinforce your theoretical knowledge with hands-on practice**.