From 59bce06bea7a92529088db5404c5b991420c27d0 Mon Sep 17 00:00:00 2001
From: fzyzcjy <5236035+fzyzcjy@users.noreply.github.com>
Date: Wed, 8 Nov 2023 12:49:55 +0800
Subject: [PATCH] Update policy-gradient.mdx

---
 units/en/unit4/policy-gradient.mdx | 1 +
 1 file changed, 1 insertion(+)

diff --git a/units/en/unit4/policy-gradient.mdx b/units/en/unit4/policy-gradient.mdx
index 1a178d6..ccc34cb 100644
--- a/units/en/unit4/policy-gradient.mdx
+++ b/units/en/unit4/policy-gradient.mdx
@@ -54,6 +54,7 @@ Let's give some more details on this formula:

 - \\(R(\tau)\\) : Return from an arbitrary trajectory. To take this quantity and use it to calculate the expected return, we need to multiply it by the probability of each possible trajectory.

+- \\(P(\tau;\theta)\\) : Probability of each possible trajectory \\(\tau\\). That probability depends on \\(\theta\\), since it defines the policy used to select the actions of the trajectory, which has an impact on the states visited.

 Probability
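The line this patch adds describes \\(P(\tau;\theta)\\), the trajectory probability that weights each return in the expected-return objective \\(J(\theta) = \sum_{\tau} P(\tau;\theta) R(\tau)\\). A minimal toy sketch of that weighted sum (the trajectory probabilities and returns below are made-up illustration values, not from the patch or the course):

```python
# Toy illustration of the expected return:
#   J(theta) = sum over trajectories tau of P(tau; theta) * R(tau)
# Each entry pairs a (hypothetical) trajectory probability P(tau; theta)
# with that trajectory's return R(tau).
trajectories = [
    {"prob": 0.5, "ret": 1.0},   # P(tau1; theta) = 0.5, R(tau1) =  1.0
    {"prob": 0.3, "ret": -1.0},  # P(tau2; theta) = 0.3, R(tau2) = -1.0
    {"prob": 0.2, "ret": 2.0},   # P(tau3; theta) = 0.2, R(tau3) =  2.0
]

# Probability-weighted sum over all trajectories.
expected_return = sum(t["prob"] * t["ret"] for t in trajectories)
print(round(expected_return, 6))  # 0.5*1.0 + 0.3*(-1.0) + 0.2*2.0 = 0.6
```

Changing \\(\theta\\) would change the policy, and therefore the `prob` values: different actions make different states (and so different trajectories) more or less likely, which is exactly why the expectation depends on \\(\theta\\).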