From b94cc104e17165148a15d0a1530de2573889b4e5 Mon Sep 17 00:00:00 2001
From: simoninithomas
Date: Tue, 3 Jan 2023 10:07:58 +0100
Subject: [PATCH] Typo

---
 units/en/unit4/pg-theorem.mdx | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/units/en/unit4/pg-theorem.mdx b/units/en/unit4/pg-theorem.mdx
index 9bfee23..2ed0392 100644
--- a/units/en/unit4/pg-theorem.mdx
+++ b/units/en/unit4/pg-theorem.mdx
@@ -75,7 +75,7 @@ Since:
 
 We can rewrite the gradient of the sum as the sum of gradients:
 
-(\\ \nabla_\theta log P(\tau^{(i)};\theta)= \sum_{t=0}^{H} \nabla_\theta log \pi_\theta(a_{t}^{(i)}|s_{t}^{(i)}) \\)
+\\( \nabla_\theta log P(\tau^{(i)};\theta)= \sum_{t=0}^{H} \nabla_\theta log \pi_\theta(a_{t}^{(i)}|s_{t}^{(i)}) \\)
 
 So, the final formula for estimating the policy gradient is: