Merge pull request #206 from Mxbonn/unit1

Minor text improvements unit 1.
Thomas Simonini
2023-02-25 10:23:14 +01:00
committed by GitHub
2 changed files with 3 additions and 3 deletions

@@ -17,7 +17,7 @@ For instance, think about Super Mario Bros: an episode begin at the launch of a
## Continuing tasks [[continuing-tasks]]
-These are tasks that continue forever (no terminal state). In this case, the agent must **learn how to choose the best actions and simultaneously interact with the environment.**
+These are tasks that continue forever (**no terminal state**). In this case, the agent must **learn how to choose the best actions and simultaneously interact with the environment.**
For instance, an agent that does automated stock trading. For this task, there is no starting point and terminal state. **The agent keeps running until we decide to stop it.**

@@ -8,7 +8,7 @@ In other terms, how to build an RL agent that can **select the actions that ma
## The Policy π: the agents brain [[policy]]
-The Policy **π** is the **brain of our Agent**, its the function that tells us what **action to take given the state we are.** So it **defines the agents behavior** at a given time.
+The Policy **π** is the **brain of our Agent**, its the function that tells us what **action to take given the state we are in.** So it **defines the agents behavior** at a given time.
<figure>
<img src="https://huggingface.co/datasets/huggingface-deep-rl-course/course-images/resolve/main/en/unit1/policy_1.jpg" alt="Policy" />
@@ -67,7 +67,7 @@ If we recap:
## Value-based methods [[value-based]]
-In value-based methods, instead of training a policy function, we **train a value function** that maps a state to the expected value **of being at that state.**
+In value-based methods, instead of learning a policy function, we **learn a value function** that maps a state to the expected value **of being at that state.**
The value of a state is the **expected discounted return** the agent can get if it **starts in that state, and then act according to our policy.**
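
Reviewer note: the "expected discounted return" in the last context line can be illustrated with a small sketch. This is not part of the commit or the course text. The function names, the `gamma` default, and the `sample_rewards` callback are all hypothetical, and the estimate is a plain Monte Carlo average over sampled episodes, assuming each episode starts in the state of interest and follows a fixed policy.

```python
from typing import Callable, Sequence


def discounted_return(rewards: Sequence[float], gamma: float = 0.99) -> float:
    """Return r_0 + gamma * r_1 + gamma**2 * r_2 + ... for one episode."""
    g = 0.0
    # Accumulate from the last reward backwards: G_t = r_t + gamma * G_{t+1}.
    for r in reversed(rewards):
        g = r + gamma * g
    return g


def estimate_state_value(
    sample_rewards: Callable[[], Sequence[float]],
    num_episodes: int = 1000,
    gamma: float = 0.99,
) -> float:
    """Average the discounted return over sampled episodes.

    `sample_rewards` is a hypothetical callback: each call is assumed to run
    one episode from the state of interest under the policy and return the
    rewards that were collected.
    """
    returns = [discounted_return(sample_rewards(), gamma) for _ in range(num_episodes)]
    return sum(returns) / len(returns)
```

With a stochastic environment or policy, the average over many sampled episodes approximates the value of the starting state. With a deterministic one, every episode gives the same return.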