mirror of
https://github.com/huggingface/deep-rl-class.git
synced 2026-04-02 02:00:15 +08:00
Update quiz.mdx
@@ -10,12 +10,12 @@ The best way to learn and [to avoid the illusion of competence](https://www.cour
   {
     text: "Policy-gradient methods can learn a stochastic policy",
     explain: "",
-    correct: true
+    correct: true,
   },
   {
     text: "Policy-gradient methods are more effective in high-dimensional action spaces and continuous actions spaces",
     explain: "",
-    correct: true
+    correct: true,
   },
   {
     text: "Policy-gradient converges most of the time on a global maximum.",
@@ -53,12 +53,12 @@ The best way to learn and [to avoid the illusion of competence](https://www.cour
     text: "In Policy-based methods, we can optimize the parameter θ **indirectly** by maximizing the local approximation of the objective function with techniques like hill climbing, simulated annealing, or evolution strategies.",
     explain: "",
     correct: true,
   }
-  {
-    text: "In Policy-gradient methods, we optimize the parameter θ **directly** by performing the gradient ascent on the performance of the objective function.",
-    explain: "",
-    correct: true
-  },
   },
+  {
+    text: "In Policy-gradient methods, we optimize the parameter θ **directly** by performing the gradient ascent on the performance of the objective function.",
+    explain: "",
+    correct: true,
+  },
 ]}
/>
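The quiz options above describe optimizing the policy parameters θ directly by gradient ascent on the objective. As a minimal sketch of that idea (hypothetical code, not part of this commit or the course repo): a softmax policy over two actions on a one-step bandit, updated with a REINFORCE-style gradient-ascent rule, so the stochastic policy shifts toward the rewarded action.

```python
import math
import random

random.seed(0)  # fixed seed so the stochastic run is reproducible

def softmax(theta):
    """Turn parameters theta into action probabilities (a stochastic policy)."""
    exps = [math.exp(t) for t in theta]
    total = sum(exps)
    return [e / total for e in exps]

def sample_action(probs):
    """Sample an action index from the policy's probabilities."""
    return 0 if random.random() < probs[0] else 1

# Hypothetical one-step bandit: action 1 pays 1.0, action 0 pays 0.0,
# so gradient ascent should shift probability mass toward action 1.
REWARDS = [0.0, 1.0]

theta = [0.0, 0.0]  # policy parameters, updated directly
lr = 0.1            # learning rate for the ascent step
for _ in range(500):
    probs = softmax(theta)
    a = sample_action(probs)
    r = REWARDS[a]
    # grad of log pi(a) w.r.t. theta_k is (1[k == a] - pi(k));
    # ascend on r * grad to increase expected reward
    for k in range(2):
        grad_log = (1.0 if k == a else 0.0) - probs[k]
        theta[k] += lr * r * grad_log

final_probs = softmax(theta)
```

After training, `final_probs` places most probability on the rewarded action while staying a proper stochastic policy (probabilities summing to 1), which is the property the first quiz option highlights.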