Update pyramids.mdx

This commit is contained in:
Thomas Simonini
2023-01-07 17:54:05 +01:00
committed by GitHub
parent cd118ad2cc
commit 19c3876657

@@ -11,6 +11,9 @@ The reward function is:
<img src="https://huggingface.co/datasets/huggingface-deep-rl-course/course-images/resolve/main/en/unit7/pyramids-reward.png" alt="Pyramids Environment"/>
In code, it looks like this:
<img src="https://huggingface.co/datasets/huggingface-deep-rl-course/course-images/resolve/main/en/unit7/pyramids-reward-code.png" alt="Pyramids Reward"/>
To train this new agent to seek the button and then the Pyramid to destroy, we'll use a combination of two types of rewards:
- The *extrinsic one* given by the environment (illustration above).
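The combination of the extrinsic reward from the environment with an intrinsic (curiosity) bonus can be sketched as a simple weighted sum. This is an illustrative sketch only; the function and the `curiosity_strength` coefficient are assumed names for illustration, not the course's or ML-Agents' actual API.

```python
def total_reward(extrinsic: float, intrinsic: float,
                 curiosity_strength: float = 0.02) -> float:
    """Combine the environment reward with a weighted curiosity bonus.

    `curiosity_strength` is a hypothetical coefficient controlling how
    much the intrinsic signal contributes relative to the extrinsic one.
    """
    return extrinsic + curiosity_strength * intrinsic

# Example: an agent receiving only a curiosity bonus still gets a
# non-zero learning signal, which encourages exploration.
r = total_reward(extrinsic=0.0, intrinsic=1.0, curiosity_strength=0.5)
```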
@@ -26,11 +29,11 @@ In terms of observation, we **use 148 raycasts that can each detect objects** (s
We also use a **boolean variable indicating the switch state** (whether we turned on the switch to spawn the Pyramid) and a vector that **contains the agent's speed**.
-<img src="https://huggingface.co/datasets/huggingface-deep-rl-course/course-images/resolve/main/en/unit5/pyramids-obs-code.png" alt="Pyramids obs code"/>
+<img src="https://huggingface.co/datasets/huggingface-deep-rl-course/course-images/resolve/main/en/unit7/pyramids-obs-code.png" alt="Pyramids obs code"/>
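Putting the pieces together, the full observation can be thought of as one flat vector: 148 raycast readings, one switch-state flag, and the agent's velocity. The sketch below is an assumption for illustration (the function name, the float encoding of the boolean, and the 3-D velocity shape are not taken from the course code):

```python
import numpy as np

def build_observation(raycasts, switch_on, velocity):
    """Assemble a flat observation vector (illustrative sketch).

    raycasts  : 148 raycast readings
    switch_on : boolean switch state, encoded as 0.0 / 1.0
    velocity  : the agent's velocity, assumed here to be 3-D
    """
    raycasts = np.asarray(raycasts, dtype=np.float32)        # shape (148,)
    switch = np.array([float(switch_on)], dtype=np.float32)  # shape (1,)
    velocity = np.asarray(velocity, dtype=np.float32)        # shape (3,)
    return np.concatenate([raycasts, switch, velocity])      # shape (152,)

obs = build_observation(np.zeros(148), switch_on=True, velocity=[0.0, 0.0, 1.0])
```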
## The action space
The action space is **discrete** with four possible actions:
-<img src="https://huggingface.co/datasets/huggingface-deep-rl-course/course-images/resolve/main/en/unit5/pyramids-action.png" alt="Pyramids Environment"/>
+<img src="https://huggingface.co/datasets/huggingface-deep-rl-course/course-images/resolve/main/en/unit7/pyramids-action.png" alt="Pyramids Environment"/>
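A discrete space with four actions maps each integer the policy outputs to one movement. The action names below are assumptions for illustration (the image above lists the canonical set); only the count of four comes from the text:

```python
from enum import IntEnum

class PyramidsAction(IntEnum):
    """Four discrete actions; names are hypothetical placeholders."""
    NOOP = 0
    FORWARD = 1
    ROTATE_LEFT = 2
    ROTATE_RIGHT = 3

# The policy outputs an integer in [0, 3]; the environment decodes it.
chosen = PyramidsAction(1)
```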