diff --git a/units/en/unit2/q-learning.mdx b/units/en/unit2/q-learning.mdx
index 48f01d2..e78a598 100644
--- a/units/en/unit2/q-learning.mdx
+++ b/units/en/unit2/q-learning.mdx
@@ -144,7 +144,7 @@
 Is different from the policy we use during the training part:

 - *On-policy:* using the **same policy for acting and updating.**

-For instance, with Sarsa, another value-based algorithm, **the epsilon-greedy Policy selects the next state-action pair, not a greedy policy.**
+For instance, with Sarsa, another value-based algorithm, **the epsilon-greedy policy selects the next state-action pair, not a greedy policy.**