Mirror of https://github.com/huggingface/deep-rl-class.git, synced 2026-04-05 11:38:43 +08:00
Add illustrations
@@ -10,7 +10,10 @@ The goal in this environment is that Julien the bear **hit as many targets as po
In addition, to avoid "snowball spamming" (shooting a snowball at every timestep), **Julien the bear has a "cool off" system**: after each shot, it must wait 0.5 seconds before it can shoot again.
<figure>
<img src="https://huggingface.co/datasets/huggingface-deep-rl-course/course-images/resolve/main/en/unit7/cooloffsystem.gif" alt="Cool Off System"/>
<figcaption>The agent needs to wait 0.5s before being able to shoot a snowball again</figcaption>
</figure>
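To make the cool-off rule concrete, here is a minimal Python sketch. It is hypothetical: this commit does not show the environment's actual implementation, and the names `SnowballLauncher`, `can_shoot`, and `try_shoot` are invented for illustration. The sketch assumes only that time is measured in seconds and that a shot is refused until 0.5 s have passed since the previous one.

```python
from dataclasses import dataclass


@dataclass
class SnowballLauncher:
    """Hypothetical cool-off logic (not the environment's real code):
    after a shot, further shots are refused until `cooldown` seconds pass."""

    cooldown: float = 0.5                   # wait time after each shot, per the text
    last_shot_time: float = float("-inf")   # no shot fired yet

    def can_shoot(self, now: float) -> bool:
        # Shooting is allowed once at least `cooldown` seconds have elapsed.
        return now - self.last_shot_time >= self.cooldown

    def try_shoot(self, now: float) -> bool:
        # Fire only if the cool-off has expired; return whether a shot happened.
        if not self.can_shoot(now):
            return False
        self.last_shot_time = now
        return True


# Usage: the agent requests a shot every 0.1 s for 1.2 s ("snowball spamming").
launcher = SnowballLauncher()
times = [round(0.1 * i, 1) for i in range(13)]  # 0.0, 0.1, ..., 1.2
fired = [t for t in times if launcher.try_shoot(t)]
print(fired)  # [0.0, 0.5, 1.0]
```

Even though a shot is requested at every step, only three snowballs are fired. Because the check uses `>=`, a request arriving exactly 0.5 s after the previous shot is accepted; whether the real environment uses a strict or non-strict comparison is an assumption here.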
## The reward function and the reward engineering problem