mirror of
https://github.com/huggingface/deep-rl-class.git
synced 2026-04-27 12:21:41 +08:00
# First Quiz [[quiz1]]

The best way to learn and [to avoid the illusion of competence](https://fr.coursera.org/lecture/learning-how-to-learn/illusions-of-competence-BuFzf) **is to test yourself.** This will help you to find **where you need to reinforce your knowledge**.
### Q1: What are the two main approaches to find an optimal policy?

<Question
  choices={[
    {
      text: "Policy-based methods",
      explain: "With Policy-Based methods, we train the policy directly to learn which action to take given a state.",
      correct: true
    },
    {
      text: "Random-based methods",
      explain: ""
    },
    {
      text: "Value-based methods",
      explain: "With Value-based methods, we train a value function to learn which state is more valuable and use this value function to take the action that leads to it.",
      correct: true
    },
    {
      text: "Evolution-strategies methods",
      explain: ""
    }
  ]}
/>
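The distinction can be sketched in a few lines of Python (a hypothetical example with made-up values, not from the course): a value-based agent acts greedily on a learned value function, while a policy-based agent samples from the probabilities the policy itself outputs.

```python
import random

actions = ["left", "right"]

# Value-based: a learned value function scores each action;
# the agent acts greedily with respect to those values.
q_values = {"left": 0.2, "right": 0.8}  # assumed learned action values
value_based_action = max(actions, key=lambda a: q_values[a])

# Policy-based: the policy directly outputs action probabilities;
# the agent samples its action from that distribution.
policy = {"left": 0.3, "right": 0.7}    # assumed learned probabilities
policy_based_action = random.choices(actions, weights=[policy[a] for a in actions])[0]

print(value_based_action)  # "right", the action with the highest value
```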
### Q2: What is the Bellman Equation?

<details>
<summary>Solution</summary>

**The Bellman equation is a recursive equation** that works like this: instead of starting from scratch for each state and calculating the full return, we can write the value of any state as:

V(St) = Rt+1 + gamma * V(St+1)

That is, the immediate reward plus the discounted value of the state that follows.

</details>
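As a quick numeric illustration (hypothetical values, not from the course), the recursion computes a state's value from one reward and the next state's value:

```python
# Bellman equation for a single state: V(St) = Rt+1 + gamma * V(St+1)
gamma = 0.99    # discount factor
reward = 1.0    # immediate reward Rt+1 (assumed)
V_next = 2.0    # value of the next state V(St+1) (assumed)

V_current = reward + gamma * V_next
print(V_current)  # 1.0 + 0.99 * 2.0 = 2.98
```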
### Q3: Define each part of the Bellman Equation

<img src="https://huggingface.co/datasets/huggingface-deep-rl-course/course-images/resolve/main/en/unit3/bellman4-quiz.jpg" alt="Bellman equation quiz"/>

<details>
<summary>Solution</summary>

<img src="https://huggingface.co/datasets/huggingface-deep-rl-course/course-images/resolve/main/en/unit3/bellman4.jpg" alt="Bellman equation solution"/>

</details>
### Q4: What is the difference between Monte Carlo and Temporal Difference learning methods?

<Question
  choices={[
    {
      text: "With Monte Carlo methods, we update the value function from a complete episode",
      explain: "",
      correct: true
    },
    {
      text: "With Monte Carlo methods, we update the value function from a step",
      explain: ""
    },
    {
      text: "With TD learning methods, we update the value function from a complete episode",
      explain: ""
    },
    {
      text: "With TD learning methods, we update the value function from a step",
      explain: "",
      correct: true
    }
  ]}
/>
### Q5: Define each part of the Temporal Difference learning formula

<img src="https://huggingface.co/datasets/huggingface-deep-rl-course/course-images/resolve/main/en/unit3/td-ex.jpg" alt="TD Learning exercise"/>

<details>
<summary>Solution</summary>

<img src="https://huggingface.co/datasets/huggingface-deep-rl-course/course-images/resolve/main/en/unit3/TD-1.jpg" alt="TD Learning solution"/>

</details>
</details>
|
|
|
|
|
|
### Q6: Define each part of Monte Carlo learning formula
|
|
|
|
<img src="https://huggingface.co/datasets/huggingface-deep-rl-course/course-images/resolve/main/en/unit3/mc-ex.jpg" alt="MC Learning exercise"/>
|
|
|
|
<details>
|
|
<summary>Solution</summary>
|
|
|
|
<img src="https://huggingface.co/datasets/huggingface-deep-rl-course/course-images/resolve/main/en/unit3/monte-carlo-approach.jpg" alt="MC Exercise"/>
|
|
|
|
</details>
|
|
|
|
Congrats on finishing this Quiz 🥳, if you missed some elements, take time to read again the chapter to reinforce (😏) your knowledge.
|