Rephrasing for initial epsilon value

This commit is contained in:
Artagon
2022-12-17 22:33:02 +01:00
parent 96714cdb10
commit fc66ea7e4a


@@ -80,7 +80,7 @@ We need to initialize the Q-table for each state-action pair. **Most of the tim
Epsilon greedy strategy is a policy that handles the exploration/exploitation trade-off.
-The idea is that we define epsilon ɛ 1.0:
+The idea is that we define the initial epsilon ɛ = 1.0:
- *With probability 1 − ɛ*: we do **exploitation** (i.e., our agent selects the action with the highest state-action pair value).
- With probability ɛ: **we do exploration** (trying a random action).
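
The two-branch rule above can be sketched in a few lines of Python. This is a minimal illustration, not the tutorial's actual implementation; the function name `epsilon_greedy_policy` and the NumPy Q-table layout (rows = states, columns = actions) are assumptions for the example.

```python
import numpy as np

def epsilon_greedy_policy(q_table, state, epsilon, rng):
    """Choose an action for `state` under the epsilon-greedy strategy.

    q_table: 2D array, q_table[state, action] holds state-action values.
    epsilon: probability of exploring (choosing a random action).
    rng: a numpy Generator, e.g. np.random.default_rng().
    """
    if rng.random() < epsilon:
        # Exploration: pick any action uniformly at random.
        return int(rng.integers(q_table.shape[1]))
    # Exploitation: pick the action with the highest Q-value for this state.
    return int(np.argmax(q_table[state]))
```

With the initial ɛ = 1.0 the agent always explores; as ɛ decays toward 0, the `argmax` branch dominates and the agent increasingly exploits what it has learned.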