From 40cf7684e51765267c7c6f20f7da8b9858769780 Mon Sep 17 00:00:00 2001
From: Juan Martinez <36634572+josejuanmartinez@users.noreply.github.com>
Date: Wed, 6 Dec 2023 11:10:43 +0000
Subject: [PATCH] Fixes typo and comma(s)

---
 units/en/unit6/quiz.mdx | 44 +++++++++++++++++++++++-----------------------
 1 file changed, 21 insertions(+), 23 deletions(-)

diff --git a/units/en/unit6/quiz.mdx b/units/en/unit6/quiz.mdx
index b53a5ef..0fc9b38 100644
--- a/units/en/unit6/quiz.mdx
+++ b/units/en/unit6/quiz.mdx
@@ -10,12 +10,12 @@ The best way to learn and [to avoid the illusion of competence](https://www.cour
     {
       text: "The bias-variance tradeoff reflects how my model is able to generalize the knowledge to previously tagged data we give to the model during training time.",
       explain: "This is the traditional bias-variance tradeoff in Machine Learning. In our specific case of Reinforcement Learning, we don't have previously tagged data, but only a reward signal.",
-      correct: false,
+      correct: false,
     },
-    {
+    {
       text: "The bias-variance tradeoff reflects how well the reinforcement signal reflects the true reward the agent should get from the enviromment",
       explain: "",
-      correct: true,
+      correct: true,
     },
   ]}
 />
@@ -26,23 +26,22 @@ The best way to learn and [to avoid the illusion of competence](https://www.cour
     {
       text: "An unbiased reward signal returns rewards similar to the real / expected ones from the environment",
       explain: "",
-      correct: true,
+      correct: true,
     },
-    {
+    {
       text: "A biased reward signal returns rewards similar to the real / expected ones from the environment",
       explain: "If a reward signal is biased, it means the reward signal we get differs from the real reward we should be getting from an environment",
-      correct: false,
+      correct: false,
     },
-    ,
-    {
+    {
       text: "A reward signal with high variance has much noise in it and gets affected by, for example, stochastic (non constant) elements in the environment"
       explain: "",
-      correct: true,
+      correct: true,
     },
-    {
+    {
       text: "A reward signal with low variance has much noise in it and gets affected by, for example, stochastic (non constant) elements in the environment"
       explain: "If a reward signal has low variance, then it's less affected by the noise of the environment and produce similar values regardless the random elements in the environment",
-      correct: false,
+      correct: false,
     },
   ]}
 />
@@ -54,18 +53,17 @@ The best way to learn and [to avoid the illusion of competence](https://www.cour
     {
       text: "It's a sampling mechanism, which means we don't consider analyze all the possible states, but a sample of those",
       explain: "",
-      correct: true,
+      correct: true,
     },
-    {
+    {
       text: "It's very resistant to stochasticity (random elements in the trajectory)",
       explain: "Monte-carlo randomly estimates everytime a sample of trajectories. However, even same trajectories can have different reward values if they contain stochastic elements",
-      correct: false,
+      correct: false,
     },
-    ,
-    {
+    {
       text: "To reduce the impact of stochastic elements in Monte-Carlo, we can take `n` strategies and average them, reducing their impact impact in case of noise"
       explain: "",
-      correct: true,
+      correct: true,
     },
   ]}
 />
@@ -85,27 +83,27 @@ The idea behind Actor-Critic is that we learn two function approximations:
 
 ### Q5: Which of the following statemets are True about the Actor-Critic Method?
 
-### Q6: What is `Advantege` in the A2C method?
+### Q6: What is `Advantage` in the A2C method?
 
 Solution