Create quiz for unit 6

2026-06-15 06:27:24 +08:00 · 2023-12-03 18:58:37 +00:00
parent 93c5115ed6
commit a31043822e
1 changed files with 119 additions and 0 deletions
--- a/units/en/unit6/quiz.mdx
+++ b/units/en/unit6/quiz.mdx
@@ -0,0 +1,119 @@
+# Quiz
+
+The best way to learn and [to avoid the illusion of competence](https://www.coursera.org/lecture/learning-how-to-learn/illusions-of-competence-BuFzf) **is to test yourself.** This will help you to find **where you need to reinforce your knowledge**.
+
+
+### Q1: Which of the following interpretations of the bias-variance tradeoff is the most accurate in the field of Reinforcement Learning?
+
+<Question
+	choices={[
+		{
+			text: "The bias-variance tradeoff reflects how well my model generalizes from the tagged data we give it during training time.",
+			explain: "This is the traditional bias-variance tradeoff in Machine Learning. In Reinforcement Learning we don't have previously tagged data, only a reward signal.",
+      correct: false,
+		},
+    {
+			text: "The bias-variance tradeoff reflects how well the reinforcement signal R (reward) reflects the true reward the agent should get from the environment",
+			explain: "",
+      correct: true,
+		},		
+	]}
+/>
+
+### Q2: Which of the following statements are correct?
+<Question
+	choices={[
+		{
+			text: "A biased reward signal returns rewards similar to the real / expected ones from the environment",
+			explain: "If a reward signal is biased, it means the reward signal we get differs from the real reward we should be getting from an environment",
+      correct: false,
+		},
+    {
+			text: "An unbiased reward signal returns rewards similar to the real / expected ones from the environment",
+			explain: "If a reward signal is unbiased, it means the reward signal we get is similar to the real reward we should be getting from an environment",
+      correct: true,
+		},
+    {
+			text: "A reward signal with high variance has a lot of noise in it and gets affected by, for example, stochastic (non-constant) elements in the environment",
+			explain: "",
+      correct: true,
+		},		
+    {
+			text: "A reward signal with low variance has a lot of noise in it and gets affected by, for example, stochastic (non-constant) elements in the environment",
+			explain: "If a reward signal has low variance, then it's less affected by the noise of the environment, such as elements appearing randomly in the trajectory",
+      correct: false,
+		},
+	]}
+/>
+
+
+### Q3: Which of the following statements are true about the Monte Carlo method?
+<Question
+	choices={[
+		{
+			text: "It's a sampling mechanism, which means we don't analyze all the possible states, but only a sample of them",
+			explain: "",
+      correct: true,
+		},
+    {
+			text: "It's very resistant to stochasticity (random elements in the trajectory)",
+			explain: "Monte Carlo estimates returns from a random sample of trajectories each time. However, even identical trajectories can have different reward values if they contain stochastic elements",
+      correct: false,
+		},
+    {
+			text: "To reduce the impact of stochastic elements in Monte Carlo, we can sample n trajectories and average their returns, which reduces the impact of noise",
+			explain: "",
+      correct: true,
+		},		    
+	]}
+/>
+
+### Q4: What is the Advantage Actor-Critic Method (A2C)?
+<details>
+<summary>Solution</summary>
+
+The idea behind Actor-Critic is the following - we learn two function approximations:
+1. A policy that controls how our agent acts (π)
+2. A value function to assist the policy update by measuring how good the action taken is (q)
+
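The two approximations above can be sketched in a few lines of Python. This is only a minimal tabular illustration, not the course's implementation: the names `theta`, `q`, `policy`, and `update`, the toy sizes, the learning rates, and the discount `gamma` are all hypothetical, and the choice of a softmax policy with a TD-style critic update is an assumption made for the example.

```python
import math

N_STATES, N_ACTIONS = 2, 2  # hypothetical toy sizes

# Actor: action preferences theta[s][a], turned into a softmax policy pi(a|s).
theta = [[0.0] * N_ACTIONS for _ in range(N_STATES)]
# Critic: tabular action-value estimates q[s][a].
q = [[0.0] * N_ACTIONS for _ in range(N_STATES)]


def policy(state):
    """Softmax over the actor's preferences for this state."""
    exps = [math.exp(p) for p in theta[state]]
    total = sum(exps)
    return [e / total for e in exps]


def update(s, a, r, s_next, a_next, lr_actor=0.1, lr_critic=0.1, gamma=0.99):
    """One actor-critic step for a single transition (s, a, r, s', a')."""
    probs = policy(s)
    # Actor: follow grad log pi(a|s), scaled by the critic's estimate q(s, a).
    for b in range(N_ACTIONS):
        grad_log_pi = (1.0 if b == a else 0.0) - probs[b]
        theta[s][b] += lr_actor * grad_log_pi * q[s][a]
    # Critic: TD update of q(s, a) toward r + gamma * q(s', a').
    td_error = r + gamma * q[s_next][a_next] - q[s][a]
    q[s][a] += lr_critic * td_error


# Rewarding action 0 in state 0 twice nudges the policy toward that action.
update(0, 0, 1.0, 1, 0)
update(0, 0, 1.0, 1, 0)
print(policy(0))
```

Both tables change during training: the critic's estimate is what scales the actor's update, so the first call only moves `q`, and the second call starts shifting the policy.
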
+<img src="https://huggingface.co/datasets/huggingface-deep-rl-course/course-images/resolve/main/en/unit8/step2.jpg" alt="Actor-Critic, step 2"/>
+
+</details>
+
+### Q5: Which of the following statements are true about the Actor-Critic Method?
+<Question
+	choices={[
+    {
+			text: "The Critic does not learn from the training process",
+			explain: "Both the Actor and the Critic function parameters are updated during training time",
+      correct: false,
+		},
+		{
+			text: "The Actor learns a policy function, while the Critic learns a value function",
+			explain: "",
+      correct: true,
+		},
+    {
+			text: "It adds resistance to stochasticity and reduces high variance",
+			explain: "",
+      correct: true,
+		},	    
+	]}
+/>
+
+
+
+### Q6: What is Advantage in the A2C method?
+<details>
+<summary>Solution</summary>
+
+Instead of directly using the Action-Value function of the Critic as it is, we calculate an Advantage function: the relative advantage of an action compared to the other actions possible at a state.
+In other words: how much better taking that action at a state is compared to the average value of the state.
+
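As a small worked example of the idea above, the advantage can be written as A(s, a) = Q(s, a) - V(s). The sketch below assumes V(s) is the policy-weighted average of the action values; the function name `advantages` and the numbers are hypothetical and chosen only for illustration.

```python
def advantages(q_values, action_probs):
    """A(s, a) = Q(s, a) - V(s), with V(s) = sum_a pi(a|s) * Q(s, a)."""
    state_value = sum(p * qv for p, qv in zip(action_probs, q_values))
    return [qv - state_value for qv in q_values]


# Hypothetical numbers for one state with three actions.
q_values = [1.0, 2.0, 3.0]
action_probs = [0.25, 0.5, 0.25]

adv = advantages(q_values, action_probs)
print(adv)  # [-1.0, 0.0, 1.0]
```

A positive advantage means the action is better than the state's average, and a negative one means it is worse.
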
+<img src="https://huggingface.co/datasets/huggingface-deep-rl-course/course-images/resolve/main/en/unit8/advantage1.jpg" alt="Advantage in A2C"/>
+
+</details>
+
+Congrats on finishing this Quiz 🥳! If you missed some elements, take time to read the chapter again to reinforce (😏) your knowledge.