requested change

dylwil3
2023-05-02 08:39:07 -05:00
committed by GitHub
parent 59d95f5825
commit afb42f18bd


@@ -8,7 +8,7 @@ Since the beginning of the course, we have only studied value-based methods, **
<img src="https://huggingface.co/datasets/huggingface-deep-rl-course/course-images/resolve/main/en/unit3/link-value-policy.jpg" alt="Link value policy" />
-In value-based methods, the policy ** \\(π\\) is determined by the action value estimates by a function** (for instance, the greedy-policy, which selects the action with the highest value given a state).
+In value-based methods, the policy ** \(π\) only exists because of the action value estimates since the policy is just a function** (for instance, greedy-policy) that will select the action with the highest value given a state.
With policy-based methods, we want to optimize the policy directly **without having an intermediate step of learning a value function.**
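The contrast in this hunk can be made concrete with a small sketch. It assumes a toy tabular setting. The names `Q`, `theta`, `greedy_policy`, `policy_probs`, `sample_action` and `reinforce_step` are hypothetical, and the softmax parameterization plus the REINFORCE-style update are illustrative choices, not code from the course. On the value-based side the policy is only an argmax over action-value estimates. On the policy-based side the policy \(π_θ(a \mid s)\) is itself parameterized and its parameters are adjusted directly, with no value function learned in between.

```python
import math
import random

# Value-based: hypothetical action-value estimates Q(s, a) for 2 states, 3 actions.
Q = {
    "s0": [1.0, 3.0, 2.0],
    "s1": [0.5, 0.1, 0.9],
}

def greedy_policy(state):
    """The policy exists only through Q: pick the action with the highest value."""
    values = Q[state]
    return max(range(len(values)), key=lambda a: values[a])

# Policy-based: hypothetical parameters theta, one preference per (state, action).
theta = {
    "s0": [0.2, 0.2, 0.2],
    "s1": [0.0, 1.0, -1.0],
}

def softmax(prefs):
    """Numerically stable softmax over a list of preferences."""
    m = max(prefs)
    exps = [math.exp(p - m) for p in prefs]
    total = sum(exps)
    return [e / total for e in exps]

def policy_probs(state):
    """pi_theta(a | s): a probability distribution over actions."""
    return softmax(theta[state])

def sample_action(state, rng=random):
    """Sample an action from the stochastic policy."""
    probs = policy_probs(state)
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]

def reinforce_step(state, action, ret, lr=0.1):
    """Gradient ascent on ret * log pi_theta(action | state) (REINFORCE-style sketch).

    For a softmax over preferences, d log pi(action) / d theta[a]
    equals 1[a == action] - pi(a).
    """
    probs = policy_probs(state)  # computed before theta is modified
    for a in range(len(probs)):
        grad_log_pi = (1.0 if a == action else 0.0) - probs[a]
        theta[state][a] += lr * ret * grad_log_pi
```

Usage: `greedy_policy("s0")` returns the argmax action of `Q["s0"]`, whereas `reinforce_step("s0", action=0, ret=1.0)` makes action 0 more probable under `policy_probs("s0")` by changing `theta` directly. A positive return pushes up the probability of the action that was taken.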