Mirror of https://github.com/huggingface/deep-rl-class.git, synced 2026-04-13 18:00:45 +08:00
Update pyramids.mdx
@@ -2,14 +2,14 @@
The goal in this environment is to train our agent to **get the gold brick on the top of the Pyramid. In order to do that, it needs to press a button to spawn a pyramid, navigate to the Pyramid, knock it over, and move to the gold brick at the top**.
-<img src="https://huggingface.co/datasets/huggingface-deep-rl-course/course-images/resolve/main/en/unit5/pyramids.png" alt="Pyramids Environment"/>
+<img src="https://huggingface.co/datasets/huggingface-deep-rl-course/course-images/resolve/main/en/unit7/pyramids.png" alt="Pyramids Environment"/>
## The reward function
The reward function is:
-<img src="https://huggingface.co/datasets/huggingface-deep-rl-course/course-images/resolve/main/en/unit5/pyramids-reward.png" alt="Pyramids Environment"/>
+<img src="https://huggingface.co/datasets/huggingface-deep-rl-course/course-images/resolve/main/en/unit7/pyramids-reward.png" alt="Pyramids Environment"/>
To train this new agent that seeks that button and then the Pyramid to destroy, we’ll use a combination of two types of rewards: