Merge pull request #420 from fzyzcjy/patch-1

Super tiny fix format
2026-06-18 09:38:18 +08:00 · 2024-03-04 16:59:57 +01:00
parent 262cc0c608 59bce06bea
commit bf5a72ad6c
1 changed files with 1 additions and 0 deletions
--- a/units/en/unit4/policy-gradient.mdx
+++ b/units/en/unit4/policy-gradient.mdx
@@ -54,6 +54,7 @@ Let's give some more details on this formula:


 - \\(R(\tau)\\) :  Return from an arbitrary trajectory. To take this quantity and use it to calculate the expected return, we need to multiply it by the probability of each possible trajectory.
+
 - \\(P(\tau;\theta)\\) : Probability of each possible trajectory \\(\tau\\) (that probability depends on \\( \theta\\) since it defines the policy that it uses to select the actions of the trajectory which has an impact of the states visited).

 <img src="https://huggingface.co/datasets/huggingface-deep-rl-course/course-images/resolve/main/en/unit6/probability.png" alt="Probability"/>