From 9479050de0b01c7576f638446afd6fb0a3cdbc0d Mon Sep 17 00:00:00 2001
From: Maxim Bonnaerens
Date: Thu, 26 Jan 2023 11:34:06 +0100
Subject: [PATCH] Minor text fixes

---
 units/en/unit1/tasks.mdx       | 2 +-
 units/en/unit1/two-methods.mdx | 4 ++--
 2 files changed, 3 insertions(+), 3 deletions(-)

diff --git a/units/en/unit1/tasks.mdx b/units/en/unit1/tasks.mdx
index 1be4fea..9eb83a2 100644
--- a/units/en/unit1/tasks.mdx
+++ b/units/en/unit1/tasks.mdx
@@ -17,7 +17,7 @@ For instance, think about Super Mario Bros: an episode begin at the launch of a
 
 ## Continuing tasks [[continuing-tasks]]
 
-These are tasks that continue forever (no terminal state). In this case, the agent must **learn how to choose the best actions and simultaneously interact with the environment.**
+These are tasks that continue forever (**no terminal state**). In this case, the agent must **learn how to choose the best actions and simultaneously interact with the environment.**
 
 For instance, an agent that does automated stock trading. For this task, there is no starting point and terminal state. **The agent keeps running until we decide to stop it.**
 
diff --git a/units/en/unit1/two-methods.mdx b/units/en/unit1/two-methods.mdx
index e6459c2..5818e5e 100644
--- a/units/en/unit1/two-methods.mdx
+++ b/units/en/unit1/two-methods.mdx
@@ -8,7 +8,7 @@ In other terms, how to build an RL agent that can **select the actions that ma
 
 ## The Policy π: the agent’s brain [[policy]]
 
-The Policy **π** is the **brain of our Agent**, it’s the function that tells us what **action to take given the state we are.** So it **defines the agent’s behavior** at a given time.
+The Policy **π** is the **brain of our Agent**, it’s the function that tells us what **action to take given the state we are in.** So it **defines the agent’s behavior** at a given time.
 
 Policy
@@ -67,7 +67,7 @@ If we recap:
 
 ## Value-based methods [[value-based]]
 
-In value-based methods, instead of training a policy function, we **train a value function** that maps a state to the expected value **of being at that state.**
+In value-based methods, instead of learning a policy function, we **learn a value function** that maps a state to the expected value **of being at that state.**
 
 The value of a state is the **expected discounted return** the agent can get if it **starts in that state, and then act according to our policy.**
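The last hunk above talks about the value of a state as the "expected discounted return". As a quick illustration for reviewers (not part of the patch itself), here is a minimal sketch of that quantity for a single observed reward sequence; the `rewards` list and `gamma` value are made-up example inputs, not anything from the course files:

```python
def discounted_return(rewards, gamma=0.99):
    """Sum of gamma**k * r_k over a sequence of rewards.

    The value of a state is the *expectation* of this quantity over
    trajectories that start in that state and follow the policy; this
    sketch just computes it for one concrete trajectory.
    """
    return sum(gamma**k * r for k, r in enumerate(rewards))


# Hypothetical trajectory: three rewards of 1.0, heavily discounted.
print(discounted_return([1.0, 1.0, 1.0], gamma=0.5))  # 1 + 0.5 + 0.25 = 1.75
```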