diff --git a/units/en/unit2/bellman-equation.mdx b/units/en/unit2/bellman-equation.mdx
index 99d753a..6979d23 100644
--- a/units/en/unit2/bellman-equation.mdx
+++ b/units/en/unit2/bellman-equation.mdx
@@ -18,7 +18,7 @@ Then, to calculate the \\(V(S_{t+1})\\), we need to calculate the return startin
Bellman equation
-To calculate the value of State 2: the sum of rewards **if the agent started in that state**, and then followed the **policy for all the time steps.**
+To calculate the value of State 2: the sum of rewards if the agent started in that state, and then followed the policy for all the time steps.
So you may have noticed, we're repeating the computation of the value of different states, which can be tedious if you need to do it for each state value or state-action value.
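
Outside the diff itself, the idea the changed line describes — a state's value as the sum of rewards obtained from that state onward under the policy — can be sketched with a small recursion. The chain of states, the reward values, and the discount factor below are all hypothetical, purely to illustrate why recomputing the return from scratch for every state is repetitive work:

```python
# Sketch only: a toy deterministic chain of states 0 -> 1 -> 2 -> 3,
# where state 3 is terminal. Rewards and gamma are made-up values.
GAMMA = 0.9
REWARDS = [1.0, 0.0, 2.0, 0.0]  # hypothetical immediate reward in each state

def value(state: int) -> float:
    """V(state) = immediate reward + gamma * V(next state).

    Note the repetition the text points out: computing value(0)
    re-walks the whole chain that value(1) and value(2) already walked.
    """
    if state >= 3:  # terminal state: no future reward
        return REWARDS[3]
    return REWARDS[state] + GAMMA * value(state + 1)

print(value(2))  # 2.0 + 0.9 * 0.0  -> 2.0
print(value(0))  # 1.0 + 0.9 * (0.0 + 0.9 * 2.0) -> 2.62
```

The Bellman equation's point is exactly this structure: instead of summing the full return for every state independently, each state's value is defined in terms of the next state's value, so each piece is computed once.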