diff --git a/units/en/unit5/snowball-target.mdx b/units/en/unit5/snowball-target.mdx
index 5101716..145d741 100644
--- a/units/en/unit5/snowball-target.mdx
+++ b/units/en/unit5/snowball-target.mdx
@@ -19,14 +19,15 @@ In addition, to avoid "snowball spamming" (aka shooting a snowball every timeste
 The reward function is simple. **The environment gives a +1 reward every time the agent's snowball hits a target** and because the agent's goal is to maximize the expected cumulative reward, **it will try to hit as many targets as possible**.
 
-In terms of code it looks like this:
-
-Reward
+Reward system
 
 We could have a more complex reward function (with a penalty to push the agent to go faster, etc.). But when you design an environment, you need to avoid the *reward engineering problem*, which is having a too complex reward function to force your agent to behave as you want it to do. Why? Because by doing that, **you might miss interesting strategies that the agent will find with a simpler reward function**.
 
-Reward system
+In terms of code it looks like this:
+
+Reward
+
 
 ## The observation space