mirror of
https://github.com/huggingface/deep-rl-class.git
synced 2026-04-05 11:38:43 +08:00
Merge pull request #206 from Mxbonn/unit1
Minor text improvements unit 1.
@@ -17,7 +17,7 @@ For instance, think about Super Mario Bros: an episode begin at the launch of a
 ## Continuing tasks [[continuing-tasks]]
 
-These are tasks that continue forever (no terminal state). In this case, the agent must **learn how to choose the best actions and simultaneously interact with the environment.**
+These are tasks that continue forever (**no terminal state**). In this case, the agent must **learn how to choose the best actions and simultaneously interact with the environment.**
 
 For instance, an agent that does automated stock trading. For this task, there is no starting point and terminal state. **The agent keeps running until we decide to stop it.**
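The episodic/continuing distinction in this hunk can be sketched with a toy loop. The `ToyEnv` class, the horizon of 5, and the external step cap are illustrative assumptions, not part of the course repo:

```python
# Toy illustration (not from the course repo): an episodic task ends when a
# terminal state is reached; a continuing task only stops when we stop it.
class ToyEnv:
    def __init__(self, episodic: bool, horizon: int = 5):
        self.episodic = episodic  # whether a terminal state exists
        self.horizon = horizon    # assumed episode length for the episodic case
        self.t = 0

    def step(self) -> bool:
        """Advance one timestep; return True if a terminal state was reached."""
        self.t += 1
        return self.episodic and self.t >= self.horizon

def run(env: ToyEnv, max_steps: int = 100) -> int:
    """Interact until the terminal state, or until we externally stop the agent."""
    for _ in range(max_steps):
        if env.step():
            break
    return env.t
```

Running `run(ToyEnv(episodic=True))` stops at the terminal state after 5 steps, while `run(ToyEnv(episodic=False))` only halts at the external cap.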
@@ -8,7 +8,7 @@ In other terms, how to build an RL agent that can **select the actions that ma
 ## The Policy π: the agent’s brain [[policy]]
 
-The Policy **π** is the **brain of our Agent**, it’s the function that tells us what **action to take given the state we are.** So it **defines the agent’s behavior** at a given time.
+The Policy **π** is the **brain of our Agent**, it’s the function that tells us what **action to take given the state we are in.** So it **defines the agent’s behavior** at a given time.
 
 <figure>
 <img src="https://huggingface.co/datasets/huggingface-deep-rl-course/course-images/resolve/main/en/unit1/policy_1.jpg" alt="Policy" />
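The policy described in this hunk, a function from state to action, can be sketched minimally. The action names, state encoding, and probabilities below are hypothetical, chosen only to show the two common forms (deterministic π(s) = a and stochastic π(a|s)):

```python
# Illustrative sketch (not from the course repo) of a policy pi: state -> action.
import random

ACTIONS = ["left", "right", "up", "down"]  # assumed action space

def deterministic_policy(state: int) -> str:
    """pi(s) = a: the same state always yields the same action."""
    return ACTIONS[state % len(ACTIONS)]

def stochastic_policy(state: int) -> str:
    """pi(a|s): an action is sampled from a probability distribution."""
    probs = [0.1, 0.6, 0.2, 0.1]  # assumed distribution, for illustration only
    return random.choices(ACTIONS, weights=probs, k=1)[0]
```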
@@ -67,7 +67,7 @@ If we recap:
 ## Value-based methods [[value-based]]
 
-In value-based methods, instead of training a policy function, we **train a value function** that maps a state to the expected value **of being at that state.**
+In value-based methods, instead of learning a policy function, we **learn a value function** that maps a state to the expected value **of being at that state.**
 
 The value of a state is the **expected discounted return** the agent can get if it **starts in that state, and then act according to our policy.**
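The "expected discounted return" named in this hunk can be computed for one sampled trajectory as G = r₁ + γr₂ + γ²r₃ + …; the state value is the expectation of this over trajectories under the policy. A minimal sketch, with γ = 0.99 as an assumed discount factor:

```python
# Sketch (not from the course repo): discounted return of one reward sequence,
# accumulated backwards so each reward is discounted by gamma per step.
def discounted_return(rewards, gamma: float = 0.99) -> float:
    g = 0.0
    for r in reversed(rewards):
        g = r + gamma * g  # G_t = r_{t+1} + gamma * G_{t+1}
    return g
```

For example, `discounted_return([1, 2], gamma=0.5)` gives 1 + 0.5 × 2 = 2.0.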