mirror of
https://github.com/huggingface/deep-rl-class.git
synced 2026-04-05 03:28:05 +08:00
Revert "Removing duplicate text directly below captions"
This reverts commit ec8973296a.
@@ -61,6 +61,8 @@ In a chess game, we have access to the whole board information, so we receive a
<figcaption>In Super Mario Bros, we only see the part of the level close to the player, so we receive an observation.</figcaption>
</figure>
In Super Mario Bros, we only see the part of the level close to the player, so we receive an observation.
In Super Mario Bros, we are in a partially observed environment. We receive an observation **since we only see a part of the level.**
<Tip>
@@ -85,6 +87,8 @@ The actions can come from a *discrete* or *continuous space*:
</figure>
Again, in Super Mario Bros, we have a finite set of actions since we have only 4 directions.
- *Continuous space*: the number of possible actions is **infinite**.
<figure>
@@ -82,6 +82,8 @@ Here we see that our value function **defined values for each possible state.**
<figcaption>Thanks to our value function, at each step our policy will select the state with the biggest value defined by the value function: -7, then -6, then -5 (and so on) to attain the goal.</figcaption>
</figure>
Thanks to our value function, at each step our policy will select the state with the biggest value defined by the value function: -7, then -6, then -5 (and so on) to attain the goal.
If we recap:
<img src="https://huggingface.co/datasets/huggingface-deep-rl-course/course-images/resolve/main/en/unit1/vbm_1.jpg" alt="Vbm recap" width="100%" />