diff --git a/units/en/unit4/advantages-disadvantages.mdx b/units/en/unit4/advantages-disadvantages.mdx
index d9c3438..3739a72 100644
--- a/units/en/unit4/advantages-disadvantages.mdx
+++ b/units/en/unit4/advantages-disadvantages.mdx
@@ -38,7 +38,7 @@ Under a deterministic policy, the policy will either always move right when in a
 Under a value-based Reinforcement learning algorithm, we learn a **quasi-deterministic policy** ("greedy epsilon strategy"). Consequently, our agent can **spend a lot of time before finding the dust**.
-On the other hand, an optimal stochastic policy **will randomly move left or right in rose states**. Consequently, **it will not be stuck and will reach the goal state with a high probability**.
+On the other hand, an optimal stochastic policy **will randomly move left or right in red (colored) states**. Consequently, **it will not be stuck and will reach the goal state with a high probability**.
Hamster 1