From 0744d542ada257bfe7def5a3cd635b5ef8f67322 Mon Sep 17 00:00:00 2001
From: Artagon
Date: Fri, 16 Dec 2022 20:31:49 +0100
Subject: [PATCH] =?UTF-8?q?Properly=20display=20=CF=80*?=
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

---
 units/en/unit2/two-types-value-based-methods.mdx | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/units/en/unit2/two-types-value-based-methods.mdx b/units/en/unit2/two-types-value-based-methods.mdx
index 47a17e2..3422e7d 100644
--- a/units/en/unit2/two-types-value-based-methods.mdx
+++ b/units/en/unit2/two-types-value-based-methods.mdx
@@ -10,7 +10,7 @@
 The value of a state is the **expected discounted return** the agent can get i

 But what does it mean to act according to our policy? After all, we don't have a policy in value-based methods since we train a value function and not a policy.

-Remember that the goal of an **RL agent is to have an optimal policy π.**
+Remember that the goal of an **RL agent is to have an optimal policy π\*.**

 To find the optimal policy, we learned about two different methods:
@@ -35,8 +35,8 @@
 Consequently, whatever method you use to solve your problem, **you will have a

 So the difference is:

-- In policy-based, **the optimal policy (denoted π*) is found by training the policy directly.**
-- In value-based, **finding an optimal value function (denoted Q* or V*, we'll study the difference after) in our leads to having an optimal policy.**
+- In policy-based, **the optimal policy (denoted π\*) is found by training the policy directly.**
+- In value-based, **finding an optimal value function (denoted Q\* or V\*, we'll study the difference later) leads to having an optimal policy.**

 Link between value and policy