diff --git a/units/en/unit4/pg-theorem.mdx b/units/en/unit4/pg-theorem.mdx
index 9bfee23..2ed0392 100644
--- a/units/en/unit4/pg-theorem.mdx
+++ b/units/en/unit4/pg-theorem.mdx
@@ -75,7 +75,7 @@ Since:
 
 We can rewrite the gradient of the sum as the sum of gradients:
 
-(\\ \nabla_\theta log P(\tau^{(i)};\theta)= \sum_{t=0}^{H} \nabla_\theta log \pi_\theta(a_{t}^{(i)}|s_{t}^{(i)}) \\)
+\\( \nabla_\theta log P(\tau^{(i)};\theta)= \sum_{t=0}^{H} \nabla_\theta log \pi_\theta(a_{t}^{(i)}|s_{t}^{(i)}) \\)
 
 So, the final formula for estimating the policy gradient is:
 