diff --git a/units/en/unit2/q-learning.mdx b/units/en/unit2/q-learning.mdx
index e78a598..2dd7190 100644
--- a/units/en/unit2/q-learning.mdx
+++ b/units/en/unit2/q-learning.mdx
@@ -80,7 +80,7 @@ We need to initialize the Q-table for each state-action pair. **Most of the tim
 
 Epsilon greedy strategy is a policy that handles the exploration/exploitation trade-off.
 
-The idea is that we define epsilon ɛ ≤ 1.0:
+The idea is that we define the initial epsilon ɛ = 1.0:
 
 - *With probability 1 — ɛ* : we do **exploitation** (aka our agent selects the action with the highest state-action pair value).
 - With probability ɛ: **we do exploration** (trying random action).
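
The rule the patch describes (exploit with probability 1 − ɛ, explore with probability ɛ) can be sketched roughly as follows. This is a minimal illustration, not code from the course; the function name `epsilon_greedy` and the list-of-lists Q-table layout are assumptions made for the example.

```python
import random

def epsilon_greedy(q_table, state, epsilon, n_actions):
    """Pick an action for `state` using the epsilon-greedy rule.

    q_table: list of per-state lists of action values (hypothetical layout).
    """
    if random.random() < epsilon:
        # Exploration: with probability epsilon, try a random action.
        return random.randrange(n_actions)
    # Exploitation: with probability 1 - epsilon, take the action
    # with the highest state-action value in the Q-table.
    row = q_table[state]
    return max(range(n_actions), key=lambda a: row[a])
```

With the initial ɛ = 1.0 the agent always explores; as ɛ is decayed toward 0 over training, the same function shifts smoothly toward pure exploitation.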