From 9d777a01b0fa94d5492dc8ca350162e64cb3bfb8 Mon Sep 17 00:00:00 2001 From: Thomas Simonini Date: Tue, 5 Mar 2024 10:40:03 +0100 Subject: [PATCH 1/2] Update pg-theorem.mdx --- units/en/unit4/pg-theorem.mdx | 8 ++++++-- 1 file changed, 6 insertions(+), 2 deletions(-) diff --git a/units/en/unit4/pg-theorem.mdx b/units/en/unit4/pg-theorem.mdx index 9db62d9..dc7a320 100644 --- a/units/en/unit4/pg-theorem.mdx +++ b/units/en/unit4/pg-theorem.mdx @@ -27,9 +27,13 @@ We then multiply every term in the sum by \\(\frac{P(\tau;\theta)}{P(\tau;\theta \\( = \sum_{\tau} \frac{P(\tau;\theta)}{P(\tau;\theta)}\nabla_\theta P(\tau;\theta)R(\tau) \\) -We can simplify further this since \\( \frac{P(\tau;\theta)}{P(\tau;\theta)}\nabla_\theta P(\tau;\theta) = P(\tau;\theta)\frac{\nabla_\theta P(\tau;\theta)}{P(\tau;\theta)} \\) +We can simplify further this since -\\(= \sum_{\tau} P(\tau;\theta) \frac{\nabla_\theta P(\tau;\theta)}{P(\tau;\theta)}R(\tau) \\) +\\( \frac{P(\tau;\theta)}{P(\tau;\theta)}\nabla_\theta P(\tau;\theta) = P(\tau;\theta)\frac{\nabla_\theta P(\tau;\theta)}{P(\tau;\theta)} \\) + + + +\\ (P(\tau;\theta)\frac{\nabla_\theta P(\tau;\theta)}{P(\tau;\theta)}= \sum_{\tau} P(\tau;\theta) \frac{\nabla_\theta P(\tau;\theta)}{P(\tau;\theta)}R(\tau) \\) We can then use the *derivative log trick* (also called *likelihood ratio trick* or *REINFORCE trick*), a simple rule in calculus that implies that \\( \nabla_x log f(x) = \frac{\nabla_x f(x)}{f(x)} \\) From 72473f08a804333e01160ec62136e8635bd97412 Mon Sep 17 00:00:00 2001 From: Thomas Simonini Date: Tue, 5 Mar 2024 10:45:12 +0100 Subject: [PATCH 2/2] Update pg-theorem.mdx --- units/en/unit4/pg-theorem.mdx | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/units/en/unit4/pg-theorem.mdx b/units/en/unit4/pg-theorem.mdx index dc7a320..602ff69 100644 --- a/units/en/unit4/pg-theorem.mdx +++ b/units/en/unit4/pg-theorem.mdx @@ -33,7 +33,7 @@ We can simplify further this since -\\ (P(\tau;\theta)\frac{\nabla_\theta P(\tau;\theta)}{P(\tau;\theta)}= \sum_{\tau} P(\tau;\theta) \frac{\nabla_\theta P(\tau;\theta)}{P(\tau;\theta)}R(\tau) \\) +\\( P(\tau;\theta)\frac{\nabla_\theta P(\tau;\theta)}{P(\tau;\theta)}= \sum_{\tau} P(\tau;\theta) \frac{\nabla_\theta P(\tau;\theta)}{P(\tau;\theta)}R(\tau) \\) We can then use the *derivative log trick* (also called *likelihood ratio trick* or *REINFORCE trick*), a simple rule in calculus that implies that \\( \nabla_x log f(x) = \frac{\nabla_x f(x)}{f(x)} \\)