requested change

2026-06-15 06:27:24 +08:00 · 2023-05-02 08:39:07 -05:00
parent 59d95f5825
commit afb42f18bd
1 changed files with 1 additions and 1 deletions
--- a/units/en/unit4/introduction.mdx
+++ b/units/en/unit4/introduction.mdx
@@ -8,7 +8,7 @@ Since the beginning of the course, we have only studied value-based methods, **

 <img src="https://huggingface.co/datasets/huggingface-deep-rl-course/course-images/resolve/main/en/unit3/link-value-policy.jpg" alt="Link value policy" />

-In value-based methods, the policy ** \\(π\\) is determined by the action value estimates by a function** (for instance, the greedy-policy, which selects the action with the highest value given a state).
+In value-based methods, the policy ** \(π\) only exists because of the action value estimates, since the policy is just a function** (for instance, the greedy policy) that selects the action with the highest value given a state.

 With policy-based methods, we want to optimize the policy directly **without having an intermediate step of learning a value function.**