From afb42f18bd30938e00bf6f3ba4bb9a32b277890b Mon Sep 17 00:00:00 2001
From: dylwil3 <53534755+dylwil3@users.noreply.github.com>
Date: Tue, 2 May 2023 08:39:07 -0500
Subject: [PATCH] requested change

---
 units/en/unit4/introduction.mdx | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/units/en/unit4/introduction.mdx b/units/en/unit4/introduction.mdx
index 6dc4998..c087059 100644
--- a/units/en/unit4/introduction.mdx
+++ b/units/en/unit4/introduction.mdx
@@ -8,7 +8,7 @@ Since the beginning of the course, we have only studied value-based methods, **
 
 Link value policy
 
-In value-based methods, the policy ** \\(π\\) is determined by the action value estimates by a function** (for instance, the greedy-policy, which selects the action with the highest value given a state).
+In value-based methods, the policy ** \(π\) only exists because of the action value estimates since the policy is just a function** (for instance, greedy-policy) that will select the action with the highest value given a state.
 
 With policy-based methods, we want to optimize the policy directly **without having an intermediate step of learning a value function.**
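
For context on the line being changed: it describes how, in value-based methods, the policy is derived from the action-value estimates rather than learned directly. A minimal sketch of the greedy policy it mentions (the function and Q-table here are illustrative, not from the course code):

```python
import numpy as np

def greedy_policy(q_values: np.ndarray, state: int) -> int:
    """Derive the action from the action-value estimates:
    pick the action with the highest estimated value in `state`."""
    return int(np.argmax(q_values[state]))

# Toy Q-table: 2 states x 3 actions of action-value estimates.
q = np.array([[0.1, 0.5, 0.2],
              [0.9, 0.3, 0.4]])

print(greedy_policy(q, 0))  # action 1 has the highest value in state 0
```

This is exactly the "intermediate step" the surrounding text contrasts with policy-based methods, where the policy is optimized directly instead of being read off a value function.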