mirror of
https://github.com/huggingface/deep-rl-class.git
synced 2026-04-10 06:08:31 +08:00
Removing duplicate text directly below captions
@@ -61,8 +61,6 @@ In a chess game, we have access to the whole board information, so we receive a
 <figcaption>In Super Mario Bros, we only see the part of the level close to the player, so we receive an observation.</figcaption>
 </figure>
 
-In Super Mario Bros, we only see the part of the level close to the player, so we receive an observation.
-
 In Super Mario Bros, we are in a partially observed environment. We receive an observation **since we only see a part of the level.**
 
 <Tip>
@@ -87,8 +85,6 @@ The actions can come from a *discrete* or *continuous space*:
 
 </figure>
 
-Again, in Super Mario Bros, we have a finite set of actions since we have only 4 directions.
-
 - *Continuous space*: the number of possible actions is **infinite**.
 
 <figure>
@@ -82,8 +82,6 @@ Here we see that our value function **defined values for each possible state.**
 <figcaption>Thanks to our value function, at each step our policy will select the state with the biggest value defined by the value function: -7, then -6, then -5 (and so on) to attain the goal.</figcaption>
 </figure>
 
-Thanks to our value function, at each step our policy will select the state with the biggest value defined by the value function: -7, then -6, then -5 (and so on) to attain the goal.
-
 If we recap:
 
 <img src="https://huggingface.co/datasets/huggingface-deep-rl-course/course-images/resolve/main/en/unit1/vbm_1.jpg" alt="Vbm recap" width="100%" />
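The caption in the last hunk describes a greedy value-based policy: at each step, move to the neighboring state with the biggest value (-7, then -6, then -5, and so on) until the goal is reached. A minimal sketch of that selection rule, assuming a toy chain of states whose names and values are illustrative and not taken from the course:

```python
# Hypothetical toy example of greedy value-based selection.
# State names and values are made up for illustration; higher value = closer to goal.
values = {"s0": -7, "s1": -6, "s2": -5, "s3": -4, "goal": 0}
neighbors = {
    "s0": ["s1"],
    "s1": ["s0", "s2"],
    "s2": ["s1", "s3"],
    "s3": ["s2", "goal"],
    "goal": [],
}

def greedy_path(start: str) -> list[str]:
    """Follow the highest-valued neighbor at each step until no neighbor improves."""
    path = [start]
    current = start
    while neighbors[current]:
        best = max(neighbors[current], key=values.get)
        if values[best] <= values[current]:
            break  # no neighbor is better than where we are: stop
        path.append(best)
        current = best
    return path

print(greedy_path("s0"))  # ['s0', 's1', 's2', 's3', 'goal']
```

Each step picks the neighbor with the biggest value, which walks the chain -7 → -6 → -5 → -4 → 0, matching the progression the caption describes.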