simoninithomas
2023-01-03 10:07:58 +01:00
parent 8e0bbdb82e
commit b94cc104e1


@@ -75,7 +75,7 @@ Since:
 We can rewrite the gradient of the sum as the sum of gradients:
-(\\ \nabla_\theta log P(\tau^{(i)};\theta)= \sum_{t=0}^{H} \nabla_\theta log \pi_\theta(a_{t}^{(i)}|s_{t}^{(i)}) \\)
+\\( \nabla_\theta log P(\tau^{(i)};\theta)= \sum_{t=0}^{H} \nabla_\theta log \pi_\theta(a_{t}^{(i)}|s_{t}^{(i)}) \\)
 So, the final formula for estimating the policy gradient is: