Simple grammatical change in q-learning.mdx

2026-06-30 01:36:25 +08:00 · 2025-06-02 16:58:16 +05:30
parent ab308e9034
commit b41a9c0342
1 changed file with 1 addition and 1 deletion
--- a/units/en/unit2/q-learning.mdx
+++ b/units/en/unit2/q-learning.mdx
@@ -19,7 +19,7 @@ The **Q comes from "the Quality" (the value) of that action at that state.**
 Let's recap the difference between value and reward:

 - The *value of a state*, or a *state-action pair* is the expected cumulative reward our agent gets if it starts at this state (or state-action pair) and then acts accordingly to its policy.
- The *reward* is the **feedback I get from the environment** after performing an action at a state.
+- The *reward* is the **feedback it gets from the environment** after performing an action at a state.

 Internally, our Q-function is encoded by **a Q-table, a table where each cell corresponds to a state-action pair value.** Think of this Q-table as **the memory or cheat sheet of our Q-function.**
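
For context on the last line of the hunk, here is a minimal sketch of a Q-table as "a table where each cell corresponds to a state-action pair value". The sizes, the stored value, and the helper names `q_value` and `greedy_action` are illustrative assumptions, not taken from the course code.

```python
# Hypothetical sizes: 4 states and 2 actions (not from the course).
N_STATES = 4
N_ACTIONS = 2

# The Q-table: one row per state, one column per action.
# Every cell starts at 0.0, i.e. the agent knows nothing yet.
q_table = [[0.0 for _ in range(N_ACTIONS)] for _ in range(N_STATES)]


def q_value(state: int, action: int) -> float:
    """Look up the stored value of a state-action pair."""
    return q_table[state][action]


def greedy_action(state: int) -> int:
    """Return the action with the highest stored value in this state."""
    row = q_table[state]
    return max(range(N_ACTIONS), key=lambda a: row[a])


# Pretend training has set the value of (state=2, action=1) to 0.5.
q_table[2][1] = 0.5

looked_up = q_value(2, 1)
best = greedy_action(2)
```

Reading a cell is what makes the table a "cheat sheet": given a state and an action, the Q-function returns whatever value is stored there. The reward, by contrast, is the single number the environment returns after one action, and it is not stored in the table directly.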