mirror of
https://github.com/huggingface/deep-rl-class.git
synced 2026-04-01 09:40:26 +08:00
Update snowball-target.mdx
# The SnowballTarget Environment
## The Agent's Goal
The first agent you're going to train is Julien the bear (the name is based after our [CTO Julien Chaumond](https://twitter.com/julien_c)) **to hit targets with snowballs**.
The goal in this environment is for Julien the bear to **hit as many targets as possible in the limited time** (1000 timesteps). To do that, it will need **to position itself correctly relative to the target and shoot**. In addition, to avoid "snowball spamming" (i.e., shooting a snowball every timestep), **Julien the bear has a "cool off" system** (it needs to wait 0.5 seconds after a shot before it can shoot again).
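To make the cool-off mechanic concrete, here is a minimal Python sketch of how such a timer could work. This is an illustration only, not the environment's actual (Unity/C#) implementation, and the 0.02 s simulation timestep is an assumed value:

```python
# Hypothetical sketch of a "cool off" timer; NOT the actual ML-Agents C# code.
# Assumption: a simulation timestep of 0.02 s, so 0.5 s of cool-off = 25 steps.
COOL_OFF_STEPS = 25

class SnowballShooter:
    def __init__(self):
        # Start ready to shoot.
        self.steps_since_shot = COOL_OFF_STEPS

    def step(self, wants_to_shoot: bool) -> bool:
        """Advance one timestep; return True if a snowball was fired."""
        self.steps_since_shot += 1
        if wants_to_shoot and self.steps_since_shot >= COOL_OFF_STEPS:
            self.steps_since_shot = 0
            return True
        return False

shooter = SnowballShooter()
# Even if the agent asks to shoot on every one of 100 timesteps,
# the cool-off lets only a handful of snowballs through.
print(sum(shooter.step(True) for _ in range(100)))  # → 4
```

Because shooting every timestep gains nothing, the agent has to learn *when* to shoot, not just to shoot constantly.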
## The reward function and the reward engineering problem
The reward function is simple. **The environment gives a +1 reward every time the agent hits a target**.
Because the agent's goal is to maximize the expected cumulative reward, it will try to hit as many targets as possible.
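As an illustration, the +1-per-hit reward and the cumulative return it produces can be sketched in a few lines of Python (the episode below is made-up data, not a real rollout):

```python
def reward(hit_target: bool) -> float:
    """SnowballTarget-style reward: +1 each time a target is hit, 0 otherwise."""
    return 1.0 if hit_target else 0.0

# Toy 10-step episode: the agent hits targets at timesteps 2, 5, and 9.
hits = [t in (2, 5, 9) for t in range(10)]

# The (undiscounted) cumulative reward is simply the number of targets hit.
episode_return = sum(reward(h) for h in hits)
print(episode_return)  # → 3.0
```

With this reward, maximizing the return and hitting the most targets are exactly the same objective.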
We could use a more complex reward function (for example, with a penalty to push the agent to go faster). But when you design an environment, you need to avoid the *reward engineering problem*: making the reward function too complex in order to force your agent to behave exactly as you want it to.
Why? Because by doing that, **you might miss interesting strategies that the agent will find with a simpler reward function**.
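To illustrate the contrast, here are two hypothetical reward functions; neither is the environment's real code. The first is the simple outcome-based reward the course uses, the second an over-engineered shaped variant whose extra terms (and their weights) are invented for this example:

```python
def simple_reward(hit_target: bool) -> float:
    # Reward only the outcome; let the agent discover the strategy itself.
    return 1.0 if hit_target else 0.0

def over_engineered_reward(hit_target: bool, distance_to_target: float,
                           took_a_shot: bool) -> float:
    # Hypothetical shaped reward. Each hand-tuned term below encodes OUR idea
    # of a good strategy, which can steer the agent away from better ones.
    r = 1.0 if hit_target else 0.0
    r -= 0.01 * distance_to_target  # penalize standing far from the target
    if took_a_shot and not hit_target:
        r -= 0.05                   # penalize missed shots
    return r
```

The shaped version quietly assumes that standing close and never missing is optimal; an agent trained on the simple reward is free to find a strategy we did not anticipate.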
TODO ADD IMAGE REWARD