Add illustrations

simoninithomas
2023-01-07 10:46:46 +01:00
parent 759bf0d113
commit 92dc5ce8eb

@@ -10,7 +10,10 @@ The goal in this environment is that Julien the bear **hit as many targets as po
In addition, to avoid "snowball spamming" (aka shooting a snowball every timestep), **Julien the bear has a "cool off" system** (it needs to wait 0.5 seconds after a shot to be able to shoot again).
-ADD GIF COOLOFF
+<figure>
+<img src="https://huggingface.co/datasets/huggingface-deep-rl-course/course-images/resolve/main/en/unit7/cooloffsystem.gif" alt="Cool Off System"/>
+<figcaption>The agent needs to wait 0.5s before being able to shoot a snowball again</figcaption>
+</figure>
## The reward function and the reward engineering problem