From cdb25393c43b8d8fe23ce25ae1d9a9b43a1a0d2e Mon Sep 17 00:00:00 2001
From: Thomas Simonini
Date: Sat, 3 Dec 2022 11:13:04 +0100
Subject: [PATCH] Update units/en/unit2/two-types-value-based-methods.mdx

Co-authored-by: Sayak Paul
---
 units/en/unit2/two-types-value-based-methods.mdx | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/units/en/unit2/two-types-value-based-methods.mdx b/units/en/unit2/two-types-value-based-methods.mdx
index 5cf12da..78b9195 100644
--- a/units/en/unit2/two-types-value-based-methods.mdx
+++ b/units/en/unit2/two-types-value-based-methods.mdx
@@ -12,7 +12,7 @@ But what does it mean to act according to our policy? After all, we don't have a
 
 Remember that the goal of an **RL agent is to have an optimal policy π.**
 
-To find it, we learned that there are two different methods:
+To find the optimal policy, we learned about two different methods:
 
 - *Policy-based methods:* **Directly train the policy** to select what action to take given a state (or a probability distribution over actions at that state). In this case, we **don't have a value function.**
 