diff --git a/units/en/unit5/pyramids.mdx b/units/en/unit5/pyramids.mdx
index 28941a7..2b6cdf9 100644
--- a/units/en/unit5/pyramids.mdx
+++ b/units/en/unit5/pyramids.mdx
@@ -23,6 +23,8 @@ If you want to know more about curiosity, the next section (optional) will expla
 In terms of observation, we **use 148 raycasts that can each detect objects** (switch, bricks, golden brick, and walls).
+
+We also use a **boolean variable indicating the switch state** (whether or not we turned on the switch to spawn the Pyramid) and a vector that **contains the agent's speed**.
 ADD SCREENSHOT CODE
diff --git a/units/en/unit5/snowball-target.mdx b/units/en/unit5/snowball-target.mdx
index e1061a5..a277b5d 100644
--- a/units/en/unit5/snowball-target.mdx
+++ b/units/en/unit5/snowball-target.mdx
@@ -13,7 +13,7 @@ In addition, to avoid "snowball spamming" (aka shooting a snowball every timeste
 Cool Off System
 The agent needs to wait 0.5s before being able to shoot a snowball again
-
+
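The cool-off system described in the caption above can be sketched in plain Python. This is a hypothetical illustration of the mechanic, not the environment's actual code (the real SnowballTarget environment implements this in Unity/C#, and the class and method names here are invented): the agent's shoot action only takes effect if at least 0.5s have elapsed since the last snowball.

```python
# Hypothetical sketch of the cool-off system: the agent may only shoot
# if at least COOL_OFF seconds have passed since its last shot.
# (The real environment is implemented in Unity/C#; names here are invented.)

COOL_OFF = 0.5  # seconds the agent must wait between snowballs


class SnowballShooter:
    def __init__(self):
        # Start with the cool-off already elapsed so the first shot is allowed.
        self.time_since_last_shot = COOL_OFF

    def step(self, dt: float, wants_to_shoot: bool) -> bool:
        """Advance time by dt seconds; return True if a snowball is fired."""
        self.time_since_last_shot += dt
        if wants_to_shoot and self.time_since_last_shot >= COOL_OFF:
            self.time_since_last_shot = 0.0
            return True
        return False


shooter = SnowballShooter()
print(shooter.step(0.02, True))  # True: cool-off has elapsed, the shot fires
print(shooter.step(0.02, True))  # False: only 0.02s since the last shot
```

Gating the action this way is what prevents the "snowball spamming" mentioned above: spamming the shoot action every timestep fires at most one snowball per 0.5s window.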
 ## The reward function and the reward engineering problem
@@ -39,10 +39,7 @@ Think of raycasts as lasers that will detect if they pass through an object.
 In this environment, our agent has multiple sets of raycasts:
-
-
-
-TOOD ADD raycasts that can each detect objects (target, walls) and how much we have
-
+Raycasts
 
 ## The action space
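To make the observation design described in the Pyramids hunk above concrete, here is a conceptual sketch of how such an observation could be flattened into a single vector: the 148 raycast readings, the boolean switch state, and the agent's speed. This is not the actual ML-Agents code; the raycast encoding is simplified to one number per ray, and the speed is assumed to be a 3-D velocity vector.

```python
import numpy as np

# Conceptual sketch (not the actual ML-Agents implementation) of flattening
# the Pyramids observation described above into one vector:
#   - 148 raycast readings (simplified here to one value per ray),
#   - 1 boolean for the switch state,
#   - the agent's speed (assumed here to be a 3-D velocity vector).

ray_obs = np.zeros(148)                # one reading per raycast
switch_on = np.array([1.0])            # 1.0 if the switch was turned on, else 0.0
velocity = np.array([0.0, 0.0, 1.5])   # agent's velocity (assumed 3-D)

observation = np.concatenate([ray_obs, switch_on, velocity])
print(observation.shape)  # (152,)
```

Stacking everything into one flat vector like this is the standard way a policy network consumes mixed observations: the network does not care which entries came from raycasts and which from scalars, as long as the layout is consistent every timestep.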