mirror of
https://github.com/huggingface/deep-rl-class.git
synced 2026-04-27 12:21:41 +08:00
# First Quiz [[quiz1]]

The best way to learn and [to avoid the illusion of competence](https://fr.coursera.org/lecture/learning-how-to-learn/illusions-of-competence-BuFzf) **is to test yourself.** This will help you to find **where you need to reinforce your knowledge**.
### Q1: What are the two main approaches to find an optimal policy?

<Question
  choices={[
    {
      text: "Policy-based methods",
      explain: "With Policy-Based methods, we train the policy directly to learn which action to take given a state.",
      correct: true
    },
    {
      text: "Random-based methods",
      explain: ""
    },
    {
      text: "Value-based methods",
      explain: "With Value-based methods, we train a value function to learn which state is more valuable and use this value function to take the action that leads to it.",
      correct: true
    },
    {
      text: "Evolution-strategies methods",
      explain: ""
    }
  ]}
/>
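The distinction can be sketched in a few lines of Python (a hypothetical example with made-up values, not from the course): a value-based agent acts greedily on a learned value function, while a policy-based agent samples from the probabilities the policy itself outputs.

```python
import random

actions = ["left", "right"]

# Value-based: a learned value function scores each action;
# the agent acts greedily with respect to those values.
q_values = {"left": 0.2, "right": 0.8}  # assumed learned action values
value_based_action = max(actions, key=lambda a: q_values[a])

# Policy-based: the policy directly outputs action probabilities;
# the agent samples its action from that distribution.
policy = {"left": 0.3, "right": 0.7}    # assumed learned probabilities
policy_based_action = random.choices(actions, weights=[policy[a] for a in actions])[0]

print(value_based_action)  # "right", the action with the highest value
```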
### Q2: What is the Bellman Equation?

<details>
<summary>Solution</summary>

**The Bellman equation is a recursive equation** that works like this: instead of starting from scratch for each state and calculating the full return, we can write the value of any state as:

V(St) = Rt+1 + gamma * V(St+1)

That is, the immediate reward plus the discounted value of the state that follows.

</details>
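As a quick numeric illustration (hypothetical values, not from the course), the recursion computes a state's value from one reward and the next state's value:

```python
# Bellman equation for a single state: V(St) = Rt+1 + gamma * V(St+1)
gamma = 0.99    # discount factor
reward = 1.0    # immediate reward Rt+1 (assumed)
V_next = 2.0    # value of the next state V(St+1) (assumed)

V_current = reward + gamma * V_next
print(V_current)  # 1.0 + 0.99 * 2.0 = 2.98
```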
### Q3: Define each part of the Bellman Equation

<img src="https://huggingface.co/datasets/huggingface-deep-rl-course/course-images/resolve/main/en/unit3/bellman4-quiz.jpg" alt="Bellman equation quiz"/>

<details>
<summary>Solution</summary>

<img src="https://huggingface.co/datasets/huggingface-deep-rl-course/course-images/resolve/main/en/unit3/bellman4.jpg" alt="Bellman equation solution"/>

</details>
### Q4: What is the difference between Monte Carlo and Temporal Difference learning methods?

<Question
  choices={[
    {
      text: "With Monte Carlo methods, we update the value function from a complete episode",
      explain: "",
      correct: true
    },
    {
      text: "With Monte Carlo methods, we update the value function from a step",
      explain: ""
    },
    {
      text: "With TD learning methods, we update the value function from a complete episode",
      explain: ""
    },
    {
      text: "With TD learning methods, we update the value function from a step",
      explain: "",
      correct: true
    }
  ]}
/>
### Q5: Define each part of the Temporal Difference learning formula

<img src="https://huggingface.co/datasets/huggingface-deep-rl-course/course-images/resolve/main/en/unit3/td-ex.jpg" alt="TD Learning exercise"/>

<details>
<summary>Solution</summary>

<img src="https://huggingface.co/datasets/huggingface-deep-rl-course/course-images/resolve/main/en/unit3/TD-1.jpg" alt="TD Learning solution"/>

</details>
</details>
|
|
|
|
|
|
### Q6: Define each part of Monte Carlo learning formula
|
|
|
|
<img src="https://huggingface.co/datasets/huggingface-deep-rl-course/course-images/resolve/main/en/unit3/mc-ex.jpg" alt="MC Learning exercise"/>
|
|
|
|
<details>
|
|
<summary>Solution</summary>
|
|
|
|
<img src="https://huggingface.co/datasets/huggingface-deep-rl-course/course-images/resolve/main/en/unit3/monte-carlo-approach.jpg" alt="MC Exercise"/>
|
|
|
|
</details>
|
|
|
|
Congrats on finishing this Quiz 🥳, if you missed some elements, take time to read again the chapter to reinforce (😏) your knowledge.
|