diff --git a/units/en/unit1/rl-framework.mdx b/units/en/unit1/rl-framework.mdx
index b6a01a1..b8e50d9 100644
--- a/units/en/unit1/rl-framework.mdx
+++ b/units/en/unit1/rl-framework.mdx
@@ -14,7 +14,7 @@ To understand the RL process, let’s imagine an agent learning to play a platfo
-- Our Agent receives **state $S_0$** from the **Environment** — we receive the first frame of our game (Environment).
+- Our Agent receives **state \\(S_0\\)** from the **Environment** — we receive the first frame of our game (Environment).
 - Based on that **state \\(S_0\\),** the Agent takes **action \\(A_0\\)** — our Agent will move to the right.
 - Environment goes to a **new** **state \\(S_1\\)** — new frame.
 - The environment gives some **reward \\(R_1\\)** to the Agent — we’re not dead *(Positive Reward +1)*.
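The state → action → next state → reward loop that these bullets describe can be sketched in a few lines of Python. This is an illustrative toy, not code from the course: the `ToyEnv` and `RandomAgent` classes are hypothetical stand-ins that only exist to make the loop concrete.

```python
import random

class ToyEnv:
    """Toy environment: the state is just the agent's x position."""
    def __init__(self):
        self.x = 0

    def reset(self):
        # The environment hands the agent its first state, S_0.
        self.x = 0
        return self.x

    def step(self, action):
        # action: +1 = move right, -1 = move left
        self.x += action
        reward = 1            # "we're not dead": positive reward +1
        done = self.x >= 5    # episode ends when the agent reaches x = 5
        return self.x, reward, done  # new state S_{t+1}, reward R_{t+1}, done flag

class RandomAgent:
    """Placeholder policy: picks left or right at random."""
    def act(self, state):
        return random.choice([-1, 1])

env = ToyEnv()
agent = RandomAgent()
state = env.reset()                 # receive state S_0 from the environment
total_reward = 0
for _ in range(100):
    action = agent.act(state)               # take action A_t based on state S_t
    state, reward, done = env.step(action)  # environment returns S_{t+1} and R_{t+1}
    total_reward += reward
    if done:
        break
print(total_reward)
```

Real course notebooks use a Gymnasium environment with the same `reset()`/`step()` shape; the point here is only the repeating state/action/reward cycle.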