Update units/en/unit2/bellman-equation.mdx

Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
Author: Thomas Simonini
Date: 2022-12-03 11:13:19 +01:00
Committed by: GitHub
Commit: 3e9e315e53 (parent: cdb25393c4)


@@ -47,7 +47,7 @@ Which is equivalent to \\(V(S_{t})\\) = Immediate reward \\(R_{t+1}\\) + Dis
 <img src="https://huggingface.co/datasets/huggingface-deep-rl-course/course-images/resolve/main/en/unit3/bellman6.jpg" alt="Bellman equation"/>
-For simplification, here we don't discount, so gamma = 1.
+In the interest of simplicity, here we don't discount, so gamma = 1.
 - The value of \\(V(S_{t+1})\\) = Immediate reward \\(R_{t+2}\\) + Discounted value of the next state ( \\(\gamma * V(S_{t+2})\\) ).
 - And so on.
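
The lines changed above describe the Bellman recursion: a state's value is its immediate reward plus the discounted value of the next state. To make that concrete, here is a minimal Python sketch of the calculation. It is illustrative only; the function name `state_values` and the reward list are hypothetical, not taken from the course code.

```python
def state_values(rewards, gamma=1.0):
    """Compute V(S_t) = R_{t+1} + gamma * V(S_{t+1}) for one episode.

    rewards[t] is the reward received when leaving state S_t.
    The value of the terminal state (after the last reward) is 0.
    """
    values = [0.0] * (len(rewards) + 1)  # V(terminal) = 0
    # Work backwards: each value depends only on the next state's value.
    for t in reversed(range(len(rewards))):
        values[t] = rewards[t] + gamma * values[t + 1]
    return values[:-1]  # drop the terminal state's value

# With gamma = 1 (no discounting, as in the changed line above),
# each state's value is just the sum of the rewards that follow it:
print(state_values([1, 0, 0, 2]))  # [3.0, 2.0, 2.0, 2.0]
```

Running backwards from the terminal state means every `V(S_{t+1})` is already known when `V(S_t)` is computed, which is exactly the "and so on" chain the bullet list spells out.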