From b41a9c0342292bbb66c011b4ff3ce7a8a9a7de70 Mon Sep 17 00:00:00 2001
From: Keshavram Kuduwa <131107576+kuduwa-keshavram@users.noreply.github.com>
Date: Mon, 2 Jun 2025 16:58:16 +0530
Subject: [PATCH] Simple grammatical change in q-learning.mdx

---
 units/en/unit2/q-learning.mdx | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/units/en/unit2/q-learning.mdx b/units/en/unit2/q-learning.mdx
index 1ff8456..1357163 100644
--- a/units/en/unit2/q-learning.mdx
+++ b/units/en/unit2/q-learning.mdx
@@ -19,7 +19,7 @@ The **Q comes from "the Quality" (the value) of that action at that state.**
 Let's recap the difference between value and reward:
 
 - The *value of a state*, or a *state-action pair* is the expected cumulative reward our agent gets if it starts at this state (or state-action pair) and then acts accordingly to its policy.
-- The *reward* is the **feedback I get from the environment** after performing an action at a state.
+- The *reward* is the **feedback it gets from the environment** after performing an action at a state.
 
 Internally, our Q-function is encoded by **a Q-table, a table where each cell corresponds to a state-action pair value.** Think of this Q-table as **the memory or cheat sheet of our Q-function.**
 
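
To make the Q-table idea from the patched passage concrete, here is a minimal Python sketch of a tabular Q-function. It is not taken from the course notebooks; the sizes `n_states = 16` and `n_actions = 4` and the helper `greedy_action` are illustrative assumptions.

```python
import numpy as np

# Assumed sizes for illustration: a small grid world with 16 states and 4 actions.
n_states, n_actions = 16, 4

# The Q-table: one row per state, one column per action. Each cell holds the
# estimated value of taking that action in that state. Initializing to zero
# corresponds to the "empty cheat sheet" before any training.
Q = np.zeros((n_states, n_actions))

def greedy_action(Q: np.ndarray, state: int) -> int:
    """Look up the row for `state` and return the action with the highest value."""
    return int(np.argmax(Q[state]))
```

This also illustrates the distinction the patch is correcting: the scalar reward the environment returns after one step is immediate feedback, while `Q[state, action]` stores an estimate of the expected cumulative reward from that state-action pair onward.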