Small updates Unit 2

This commit is contained in:
simoninithomas
2022-12-12 03:57:37 +01:00
parent 5c8432379e
commit 7b9c1cf0a4


@@ -41,7 +41,7 @@ If we go back to our example, we can say that the value of State 1 is equal to t
To calculate the value of State 1: the sum of rewards **if the agent started in that state** and then followed the **policy for all the time steps.**
-This is equivalent to \\(V(S_{t})\\) = Immediate reward \\(R_{t+1}\\) + Discounted value of the next state \\(gamma * V(S_{t+1})\\)
+This is equivalent to \\(V(S_{t})\\) = Immediate reward \\(R_{t+1}\\) + Discounted value of the next state \\(\gamma * V(S_{t+1})\\)
<figure>
<img src="https://huggingface.co/datasets/huggingface-deep-rl-course/course-images/resolve/main/en/unit3/bellman6.jpg" alt="Bellman equation"/>
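The corrected equation \\(V(S_{t}) = R_{t+1} + \gamma V(S_{t+1})\\) can be sketched numerically. This is a minimal illustration I'm adding, not code from the course: it assumes a simple deterministic chain of states ending in a terminal state with value 0, and sweeps backwards so each state's value is its immediate reward plus the discounted value of the next state.

```python
# Hypothetical example (not from the course): recursive state values
# via the Bellman relation V(S_t) = R_{t+1} + gamma * V(S_{t+1})
# for a deterministic chain of states.

def state_values(rewards, gamma=0.99):
    """Return V(s) for each state in a chain ending in a terminal state.

    rewards[i] is the immediate reward R_{t+1} received on leaving state i.
    """
    values = [0.0] * (len(rewards) + 1)  # terminal state has value 0
    # Sweep backwards: V(S_t) = R_{t+1} + gamma * V(S_{t+1})
    for i in reversed(range(len(rewards))):
        values[i] = rewards[i] + gamma * values[i + 1]
    return values[:-1]  # drop the terminal state's value

print(state_values([1.0, 1.0, 1.0], gamma=0.5))  # → [1.75, 1.5, 1.0]
```

With \\(\gamma = 0.5\\), the value of the first state is \\(1 + 0.5 \times (1 + 0.5 \times 1) = 1.75\\): the immediate reward plus the discounted value of everything that follows, exactly as the equation above says.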