simoninithomas
2023-01-03 10:07:58 +01:00
parent 8e0bbdb82e
commit b94cc104e1


@@ -75,7 +75,7 @@ Since:
 We can rewrite the gradient of the sum as the sum of gradients:
-(\\ \nabla_\theta log P(\tau^{(i)};\theta)= \sum_{t=0}^{H} \nabla_\theta log \pi_\theta(a_{t}^{(i)}|s_{t}^{(i)}) \\)
+\\( \nabla_\theta log P(\tau^{(i)};\theta)= \sum_{t=0}^{H} \nabla_\theta log \pi_\theta(a_{t}^{(i)}|s_{t}^{(i)}) \\)
 So, the final formula for estimating the policy gradient is: