mirror of
https://github.com/huggingface/deep-rl-class.git
synced 2026-04-05 11:38:43 +08:00
Merge pull request #206 from Mxbonn/unit1
Minor text improvements unit 1.
@@ -17,7 +17,7 @@ For instance, think about Super Mario Bros: an episode begin at the launch of a
 ## Continuing tasks [[continuing-tasks]]
 
-These are tasks that continue forever (no terminal state). In this case, the agent must **learn how to choose the best actions and simultaneously interact with the environment.**
+These are tasks that continue forever (**no terminal state**). In this case, the agent must **learn how to choose the best actions and simultaneously interact with the environment.**
 
 For instance, an agent that does automated stock trading. For this task, there is no starting point and terminal state. **The agent keeps running until we decide to stop it.**
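The episodic/continuing distinction in this hunk can be sketched with a toy loop. The `ToyEnv` class, the horizon of 5, and the external step cap are illustrative assumptions, not part of the course repo:

```python
# Toy illustration (not from the course repo): an episodic task ends when a
# terminal state is reached; a continuing task only stops when we stop it.
class ToyEnv:
    def __init__(self, episodic: bool, horizon: int = 5):
        self.episodic = episodic  # whether a terminal state exists
        self.horizon = horizon    # assumed episode length for the episodic case
        self.t = 0

    def step(self) -> bool:
        """Advance one timestep; return True if a terminal state was reached."""
        self.t += 1
        return self.episodic and self.t >= self.horizon

def run(env: ToyEnv, max_steps: int = 100) -> int:
    """Interact until the terminal state, or until we externally stop the agent."""
    for _ in range(max_steps):
        if env.step():
            break
    return env.t
```

Running `run(ToyEnv(episodic=True))` stops at the terminal state after 5 steps, while `run(ToyEnv(episodic=False))` only halts at the external cap.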
@@ -8,7 +8,7 @@ In other terms, how to build an RL agent that can **select the actions that ma
 ## The Policy π: the agent’s brain [[policy]]
 
-The Policy **π** is the **brain of our Agent**, it’s the function that tells us what **action to take given the state we are.** So it **defines the agent’s behavior** at a given time.
+The Policy **π** is the **brain of our Agent**, it’s the function that tells us what **action to take given the state we are in.** So it **defines the agent’s behavior** at a given time.
 
 <figure>
 <img src="https://huggingface.co/datasets/huggingface-deep-rl-course/course-images/resolve/main/en/unit1/policy_1.jpg" alt="Policy" />
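The policy described in this hunk, a function from state to action, can be sketched minimally. The action names, state encoding, and probabilities below are hypothetical, chosen only to show the two common forms (deterministic π(s) = a and stochastic π(a|s)):

```python
# Illustrative sketch (not from the course repo) of a policy pi: state -> action.
import random

ACTIONS = ["left", "right", "up", "down"]  # assumed action space

def deterministic_policy(state: int) -> str:
    """pi(s) = a: the same state always yields the same action."""
    return ACTIONS[state % len(ACTIONS)]

def stochastic_policy(state: int) -> str:
    """pi(a|s): an action is sampled from a probability distribution."""
    probs = [0.1, 0.6, 0.2, 0.1]  # assumed distribution, for illustration only
    return random.choices(ACTIONS, weights=probs, k=1)[0]
```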
@@ -67,7 +67,7 @@ If we recap:
 ## Value-based methods [[value-based]]
 
-In value-based methods, instead of training a policy function, we **train a value function** that maps a state to the expected value **of being at that state.**
+In value-based methods, instead of learning a policy function, we **learn a value function** that maps a state to the expected value **of being at that state.**
 
 The value of a state is the **expected discounted return** the agent can get if it **starts in that state, and then act according to our policy.**
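The "expected discounted return" named in this hunk can be computed for one sampled trajectory as G = r₁ + γr₂ + γ²r₃ + …; the state value is the expectation of this over trajectories under the policy. A minimal sketch, with γ = 0.99 as an assumed discount factor:

```python
# Sketch (not from the course repo): discounted return of one reward sequence,
# accumulated backwards so each reward is discounted by gamma per step.
def discounted_return(rewards, gamma: float = 0.99) -> float:
    g = 0.0
    for r in reversed(rewards):
        g = r + gamma * g  # G_t = r_{t+1} + gamma * G_{t+1}
    return g
```

For example, `discounted_return([1, 2], gamma=0.5)` gives 1 + 0.5 × 2 = 2.0.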