mirror of
https://github.com/huggingface/deep-rl-class.git
synced 2026-04-13 18:00:45 +08:00
Typo
This commit is contained in:
@@ -75,7 +75,7 @@ Since:
We can rewrite the gradient of the sum as the sum of gradients:
- (\\ \nabla_\theta log P(\tau^{(i)};\theta)= \sum_{t=0}^{H} \nabla_\theta log \pi_\theta(a_{t}^{(i)}|s_{t}^{(i)}) \\)
+ \\( \nabla_\theta log P(\tau^{(i)};\theta)= \sum_{t=0}^{H} \nabla_\theta log \pi_\theta(a_{t}^{(i)}|s_{t}^{(i)}) \\)
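The identity in the corrected line — that the gradient of a trajectory's log-probability reduces to the sum of per-step gradients of \\( log \pi_\theta(a_t|s_t) \\), because the dynamics terms do not depend on \\( \theta \\) — can be checked numerically. The sketch below is not from the course repo; the tabular softmax policy and the toy trajectory are illustrative assumptions.

```python
import numpy as np

def log_pi(theta, s, a):
    # log pi_theta(a|s) for a tabular softmax policy:
    # theta is a (num_states, num_actions) matrix of logits
    logits = theta[s]
    return logits[a] - np.log(np.sum(np.exp(logits)))

def grad_log_pi(theta, s, a):
    # analytic gradient of log pi_theta(a|s) w.r.t. theta:
    # zero everywhere except row s, where it is one_hot(a) - softmax(theta[s])
    g = np.zeros_like(theta)
    probs = np.exp(theta[s]) / np.sum(np.exp(theta[s]))
    g[s] = -probs
    g[s, a] += 1.0
    return g

# a toy trajectory of (state, action) pairs (hypothetical data)
trajectory = [(0, 1), (1, 0), (0, 0)]
theta = np.random.default_rng(0).normal(size=(2, 2))

# sum of per-step gradients = gradient of the whole trajectory log-probability
grad_sum = sum(grad_log_pi(theta, s, a) for s, a in trajectory)
```

Comparing `grad_sum` against a finite-difference gradient of the summed log-probabilities confirms the two sides of the equation agree.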
So, the final formula for estimating the policy gradient is: