From fc66ea7e4aa2b53c761367d55154566477a98c17 Mon Sep 17 00:00:00 2001
From: Artagon
Date: Sat, 17 Dec 2022 22:33:02 +0100
Subject: [PATCH] Rephrasing for initial epsilon value

---
 units/en/unit2/q-learning.mdx | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/units/en/unit2/q-learning.mdx b/units/en/unit2/q-learning.mdx
index e78a598..2dd7190 100644
--- a/units/en/unit2/q-learning.mdx
+++ b/units/en/unit2/q-learning.mdx
@@ -80,7 +80,7 @@ We need to initialize the Q-table for each state-action pair. **Most of the tim

 Epsilon greedy strategy is a policy that handles the exploration/exploitation trade-off.

-The idea is that we define epsilon ɛ ≤ 1.0:
+The idea is that we define the initial epsilon ɛ = 1.0:

 - *With probability 1 — ɛ* : we do **exploitation** (aka our agent selects the action with the highest state-action pair value).
 - With probability ɛ: **we do exploration** (trying random action).
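
The behavior this patch clarifies can be sketched in a few lines. This is an illustrative implementation only, not code from the patched course unit; the function name `epsilon_greedy_action` and the toy Q-table are hypothetical:

```python
import random

def epsilon_greedy_action(q_table, state, epsilon, n_actions):
    """Epsilon-greedy selection: explore with probability epsilon,
    otherwise exploit the action with the highest Q-value."""
    if random.random() < epsilon:
        # Exploration: pick a uniformly random action
        return random.randrange(n_actions)
    # Exploitation: pick the action with the highest state-action value
    return max(range(n_actions), key=lambda a: q_table[state][a])

# Hypothetical Q-table with 2 states and 3 actions
q_table = {0: [0.1, 0.5, 0.2], 1: [0.0, 0.0, 0.0]}

# With the initial epsilon = 1.0 (as the patched wording says),
# the agent always explores at the start of training; epsilon is
# then decayed so exploitation gradually takes over.
action = epsilon_greedy_action(q_table, state=0, epsilon=1.0, n_actions=3)
```

With epsilon = 1.0 every call explores; with epsilon = 0.0 the call above would always return the greedy action (index 1 for state 0 in this toy table).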