# Quiz [[quiz]]

The best way to learn and [to avoid the illusion of competence](https://www.coursera.org/lecture/learning-how-to-learn/illusions-of-competence-BuFzf) **is to test yourself.** This will help you find **where you need to reinforce your knowledge**.

### Q1: What is Reinforcement Learning?

<details>
<summary>Solution</summary>

Reinforcement learning is a **framework for solving control tasks (also called decision problems)** by building agents that learn from the environment by interacting with it through trial and error, **receiving rewards (positive or negative) as unique feedback**.

</details>

### Q2: Define the RL Loop

<img src="https://huggingface.co/datasets/huggingface-deep-rl-course/course-images/resolve/main/en/unit1/rl-loop-ex.jpg" alt="Exercise RL Loop"/>

At every step:

- Our Agent receives ______ from the environment
- Based on that ______ the Agent takes an ______
- Our Agent will move to the right
- The Environment goes to a ______
- The Environment gives a ______ to the Agent

<Question
  choices={[
    {
      text: "an action a0, action a0, state s0, state s1, reward r1",
      explain: "At every step: Our Agent receives **state s0** from the environment. Based on that **state s0** the Agent takes an **action a0**. Our Agent will move to the right. The Environment goes to a **new state s1**. The Environment gives **a reward r1** to the Agent."
    },
    {
      text: "state s0, state s0, action a0, new state s1, reward r1",
      explain: "",
      correct: true
    },
    {
      text: "a state s0, state s0, action a0, state s1, action a1",
      explain: "At every step: Our Agent receives **state s0** from the environment. Based on that **state s0** the Agent takes an **action a0**. Our Agent will move to the right. The Environment goes to a **new state s1**. The Environment gives **a reward r1** to the Agent."
    }
  ]}
/>
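
The loop above (receive state s0, take action a0, get new state s1 and reward r1) can be sketched with a toy environment. All names here (`CorridorEnv`, positions, rewards) are illustrative, not from the course code:

```python
class CorridorEnv:
    """Toy environment: the agent starts at position 0 and must reach
    position 4 by moving right (+1) or left (-1, clamped at 0)."""
    GOAL = 4

    def reset(self):
        self.position = 0  # initial state s0
        return self.position

    def step(self, action):
        # Apply the action and compute the new state s1
        self.position = max(0, self.position + action)
        done = self.position == self.GOAL
        reward = 1 if done else 0  # reward r1 from the environment
        return self.position, reward, done


env = CorridorEnv()
state = env.reset()      # the Agent receives state s0 from the environment
done = False
while not done:
    action = +1          # based on that state, the Agent takes action a0 (move right)
    state, reward, done = env.step(action)  # environment returns new state s1 and reward r1
```

Each pass through the `while` loop is exactly one step of the RL loop from the figure.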
### Q3: What's the difference between a state and an observation?

<Question
  choices={[
    {
      text: "The state is a complete description of the state of the world (there is no hidden information)",
      explain: "",
      correct: true
    },
    {
      text: "The state is a partial description of the state of the world",
      explain: ""
    },
    {
      text: "The observation is a complete description of the state of the world (there is no hidden information)",
      explain: ""
    },
    {
      text: "The observation is a partial description of the state",
      explain: "",
      correct: true
    },
    {
      text: "We receive a state when we play with the chess environment",
      explain: "Since we have access to the whole chessboard information.",
      correct: true
    },
    {
      text: "We receive an observation when we play with the chess environment",
      explain: "Since we have access to the whole chessboard information."
    },
    {
      text: "We receive a state when we play Super Mario Bros",
      explain: "We only see a part of the level close to the player, so we receive an observation."
    },
    {
      text: "We receive an observation when we play Super Mario Bros",
      explain: "We only see a part of the level close to the player.",
      correct: true
    }
  ]}
/>
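
One way to make the distinction concrete: in chess the agent sees the whole board (a state), while in Super Mario Bros it only sees a window around the player (an observation). A minimal sketch, with made-up grid contents:

```python
# The full grid is the state: a complete description, nothing hidden.
grid = ["goal", ".", ".", "agent", ".", ".", "trap"]
agent_index = grid.index("agent")

state = list(grid)  # complete description of the world

# The agent only perceives a 3-cell window around itself: an observation.
observation = grid[max(0, agent_index - 1): agent_index + 2]

# The observation hides the goal and the trap:
assert "goal" in state and "goal" not in observation
assert "trap" in state and "trap" not in observation
```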
### Q4: A task is an instance of a Reinforcement Learning problem. What are the two types of tasks?

<Question
  choices={[
    {
      text: "Episodic",
      explain: "In an episodic task, we have a starting point and an ending point (a terminal state). This creates an episode: a list of States, Actions, Rewards, and new States. For instance, think about Super Mario Bros: an episode begins at the launch of a new Mario level and ends when you're killed or you reach the end of the level.",
      correct: true
    },
    {
      text: "Recursive",
      explain: ""
    },
    {
      text: "Adversarial",
      explain: ""
    },
    {
      text: "Continuing",
      explain: "Continuing tasks are tasks that continue forever (no terminal state). In this case, the agent must learn how to choose the best actions and simultaneously interact with the environment.",
      correct: true
    }
  ]}
/>

### Q5: What is the exploration/exploitation tradeoff?

<details>
<summary>Solution</summary>

In Reinforcement Learning, we need to **balance how much we explore the environment and how much we exploit what we know about the environment**.

- *Exploration* is exploring the environment by **trying random actions in order to find more information about the environment**.

- *Exploitation* is **exploiting known information to maximize the reward**.

<img src="https://huggingface.co/datasets/huggingface-deep-rl-course/course-images/resolve/main/en/unit1/expexpltradeoff.jpg" alt="Exploration Exploitation Tradeoff" width="100%">

</details>
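
One common (though not the only) way to balance the two is an epsilon-greedy rule: act randomly with a small probability, otherwise take the best-known action. This sketch is illustrative, not course code; the Q-values are made up:

```python
import random

def epsilon_greedy(q_values, epsilon):
    """Explore with probability epsilon, otherwise exploit the best-known action."""
    if random.random() < epsilon:
        return random.randrange(len(q_values))  # exploration: random action
    return max(range(len(q_values)), key=q_values.__getitem__)  # exploitation

random.seed(0)  # fixed seed so the sketch is reproducible
q_values = [0.1, 0.9, 0.3]  # hypothetical estimated values for 3 actions
actions = [epsilon_greedy(q_values, epsilon=0.1) for _ in range(1000)]
# With epsilon=0.1, the greedy action (index 1) dominates, but every
# action is still tried occasionally.
```

Tuning `epsilon` is exactly the exploration/exploitation tradeoff: higher values gather more information, lower values collect more reward from what is already known.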
### Q6: What is a policy?

<details>
<summary>Solution</summary>

- The Policy π **is the brain of our Agent**. It's the function that tells us what action to take given the state we are in. So it defines the agent's behavior at a given time.

<img src="https://huggingface.co/datasets/huggingface-deep-rl-course/course-images/resolve/main/en/unit1/policy_1.jpg" alt="Policy">

</details>
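
In code, a deterministic policy is literally just a function from state to action. A toy sketch for a corridor with the goal at position 4 (illustrative, not course code):

```python
def policy(state):
    """pi(s): move right (+1) until the goal at position 4, then stay (0)."""
    return +1 if state < 4 else 0

# The policy fully defines the agent's behavior at every state:
assert policy(0) == +1  # far from the goal: move right
assert policy(4) == 0   # at the goal: stay
```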
### Q7: What are value-based methods?

<details>
<summary>Solution</summary>

- Value-based methods are one of the main approaches for solving RL problems.
- In value-based methods, instead of training a policy function, **we train a value function that maps a state to the expected value of being at that state**.

</details>
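
A minimal sketch of the idea, with a hand-written value table for the corridor example (the numbers are made up for illustration): we never write a policy directly; behavior falls out of the values.

```python
# V(s): hypothetical expected value of each state in a 5-cell corridor
# whose goal is at position 4 (values grow as we approach the goal).
value = {0: 0.6, 1: 0.7, 2: 0.8, 3: 0.9, 4: 1.0}

def greedy_action(state):
    """Derive behavior from values: step toward the neighbor with higher value."""
    left, right = max(state - 1, 0), min(state + 1, 4)
    return +1 if value[right] >= value[left] else -1

# The derived behavior moves toward the goal everywhere:
assert greedy_action(0) == +1
assert greedy_action(2) == +1
```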
### Q8: What are policy-based methods?

<details>
<summary>Solution</summary>

- In *Policy-Based Methods*, we learn a **policy function directly**.
- This policy function will **map each state to the best corresponding action at that state**, or to a **probability distribution over the set of possible actions at that state**.

</details>
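
For the "probability distribution" case, one standard construction is a softmax over per-action preferences. This is a sketch under made-up numbers, not the course's implementation:

```python
import math

def softmax_policy(preferences):
    """Map a state's action preferences to a probability distribution over actions."""
    exps = [math.exp(p) for p in preferences]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical preferences for three actions (e.g. left, right, jump) in one state:
probs = softmax_policy([2.0, 1.0, 0.5])

assert abs(sum(probs) - 1.0) < 1e-9   # a valid probability distribution
assert probs[0] > probs[1] > probs[2]  # higher preference -> higher probability
```

Policy-based methods adjust the preferences (usually neural-network outputs) so that actions leading to more reward become more probable.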

Congrats on finishing this quiz 🥳! If you missed some elements, take time to read the chapter again to reinforce (😏) your knowledge, but **do not worry**: throughout the course we'll revisit these concepts, and you'll **reinforce your theoretical knowledge with hands-on practice**.