# Quiz

The best way to learn and [to avoid the illusion of competence](https://www.coursera.org/lecture/learning-how-to-learn/illusions-of-competence-BuFzf) **is to test yourself.** This will help you to find **where you need to reinforce your knowledge**.
### Q1: What are the advantages of policy-gradient methods over value-based methods? (Check all that apply)

<Question
  choices={[
    {
      text: "Policy-gradient methods can learn a stochastic policy",
      explain: "",
      correct: true
    },
    {
      text: "Policy-gradient methods are more effective in high-dimensional action spaces and continuous action spaces",
      explain: "",
      correct: true
    },
    {
      text: "Policy-gradient methods converge most of the time to a global maximum.",
      explain: "No, policy-gradient methods frequently converge to a local maximum instead of the global optimum.",
    },
  ]}
/>
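
To see why the first two correct answers hold, here is a minimal sketch (PyTorch; the class and variable names are illustrative, not from the course code) of a policy that is stochastic by construction and works in a continuous action space:

```python
import torch
import torch.nn as nn

class GaussianPolicy(nn.Module):
    """Illustrative policy network that outputs a Gaussian distribution:
    stochastic (actions are sampled, not argmax'd) and continuous."""
    def __init__(self, obs_dim: int, act_dim: int):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(obs_dim, 64), nn.Tanh())
        self.mean_head = nn.Linear(64, act_dim)            # mean of the Gaussian
        self.log_std = nn.Parameter(torch.zeros(act_dim))  # learned std (log scale)

    def forward(self, obs: torch.Tensor) -> torch.distributions.Normal:
        hidden = self.net(obs)
        return torch.distributions.Normal(self.mean_head(hidden),
                                          self.log_std.exp())

policy = GaussianPolicy(obs_dim=4, act_dim=2)
dist = policy(torch.randn(4))           # distribution over continuous actions
action = dist.sample()                  # stochastic: sampling, not a fixed argmax
log_prob = dist.log_prob(action).sum()  # used later in the gradient update
```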
### Q2: What is the Policy Gradient Theorem?

<details>
<summary>Solution</summary>

*The Policy Gradient Theorem* is a formula that helps us reformulate the objective function into a differentiable function that does not involve differentiating the state distribution.

<img src="https://huggingface.co/datasets/huggingface-deep-rl-course/course-images/resolve/main/en/unit6/policy_gradient_theorem.png" alt="Policy Gradient"/>

</details>
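
For reference, one common statement of the theorem (the formula shown in the image above), where \\(R(\tau)\\) is the cumulative return of a trajectory, is:

```latex
\nabla_\theta J(\theta) =
  \mathbb{E}_{\pi_\theta}\left[\sum_{t=0}^{T}
  \nabla_\theta \log \pi_\theta(a_t \mid s_t)\, R(\tau)\right]
```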

### Q3: What's the difference between policy-based methods and policy-gradient methods? (Check all that apply)

<Question
  choices={[
    {
      text: "Policy-based methods are a subset of policy-gradient methods.",
      explain: "It's the opposite: policy-gradient methods are a subset of policy-based methods.",
    },
    {
      text: "Policy-gradient methods are a subset of policy-based methods.",
      explain: "",
      correct: true,
    },
    {
      text: "In policy-based methods, we can optimize the parameter θ **indirectly** by maximizing a local approximation of the objective function with techniques like hill climbing, simulated annealing, or evolution strategies.",
      explain: "",
      correct: true,
    },
    {
      text: "In policy-gradient methods, we optimize the parameter θ **directly** by performing gradient ascent on the objective function J(θ).",
      explain: "",
      correct: true
    },
  ]}
/>
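
To make the last two answers concrete, here is a minimal sketch (NumPy; `evaluate_policy` and `grad_J` are hypothetical stand-ins, not course code) contrasting an indirect, gradient-free update with a direct gradient-ascent update:

```python
import numpy as np

def hill_climbing_step(theta, evaluate_policy, noise_scale=0.1):
    """Policy-based but gradient-free: perturb theta randomly and keep
    the candidate only if it improves the objective J(theta)."""
    candidate = theta + noise_scale * np.random.randn(*theta.shape)
    return candidate if evaluate_policy(candidate) > evaluate_policy(theta) else theta

def gradient_ascent_step(theta, grad_J, learning_rate=0.01):
    """Policy-gradient: move theta directly along the gradient of J(theta)."""
    return theta + learning_rate * grad_J(theta)
```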

### Q4: Why do we use gradient ascent instead of gradient descent to optimize J(θ)?

<Question
  choices={[
    {
      text: "We want to minimize J(θ), and gradient ascent gives us the direction of the steepest increase of J(θ)",
      explain: "No, we want to **maximize** J(θ).",
    },
    {
      text: "We want to maximize J(θ), and gradient ascent gives us the direction of the steepest increase of J(θ)",
      explain: "",
      correct: true
    },
  ]}
/>
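
A practical aside: deep learning optimizers minimize by convention, so gradient ascent on J(θ) is usually implemented as gradient descent on −J(θ). A toy sketch (PyTorch; the objective is illustrative, not from the course):

```python
import torch

theta = torch.tensor(0.5, requires_grad=True)
optimizer = torch.optim.SGD([theta], lr=0.1)

J = -(theta - 2.0) ** 2   # toy objective, maximized at theta = 2
loss = -J                 # minimizing -J(theta) == maximizing J(theta)
optimizer.zero_grad()
loss.backward()
optimizer.step()          # theta moves toward 2, so J(theta) increases
```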

Congrats on finishing this quiz 🥳! If you missed some elements, take time to read the chapter again to reinforce (😏) your knowledge.