diff --git a/units/en/unit5/pyramids.mdx b/units/en/unit5/pyramids.mdx
index 8983692..c5d23f6 100644
--- a/units/en/unit5/pyramids.mdx
+++ b/units/en/unit5/pyramids.mdx
@@ -11,6 +11,9 @@ The reward function is:
 
 Pyramids Environment
 
+In terms of code, it looks like this:
+Pyramids Reward
+
 To train this new agent that seeks that button and then the Pyramid to destroy, we’ll use a combination of two types of rewards:
 - The *extrinsic one* given by the environment (illustration above).
@@ -26,11 +29,11 @@ In terms of observation, we **use 148 raycasts that can each detect objects** (s
 
 We also use a **boolean variable indicating the switch state** (did we turn on or not the switch to spawn the Pyramid) and a vector that **contains the agent’s speed**.
 
-Pyramids obs code
+Pyramids obs code
 
 ## The action space
 
 The action space is **discrete** with four possible actions:
 
-Pyramids Environment
+Pyramids Environment
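
The page this diff touches describes the Pyramids setup: a total reward built from an extrinsic signal plus an intrinsic curiosity signal, an observation made of 148 raycasts, a switch-state boolean, and a speed vector, and a discrete action space of four actions. A minimal sketch of that layout in plain Python follows; the names, shapes, and the `beta` scaling factor are illustrative assumptions, not the actual ML-Agents definitions.

```python
# Hypothetical sketch of the spaces and reward described on the page;
# dimensions and names are assumptions for illustration only.

RAYCASTS = 148        # each raycast can detect objects in the scene
SWITCH_STATE_DIM = 1  # boolean: has the switch been turned on?
SPEED_DIM = 3         # vector containing the agent's speed (assumed 3-D)

OBS_DIM = RAYCASTS + SWITCH_STATE_DIM + SPEED_DIM  # 152

N_ACTIONS = 4  # discrete action space with four possible actions


def combined_reward(extrinsic: float, curiosity: float, beta: float = 0.5) -> float:
    """Total reward = extrinsic (from the environment) + scaled intrinsic (curiosity).

    `beta` weighting the curiosity term is an assumed hyperparameter.
    """
    return extrinsic + beta * curiosity
```

In practice, ML-Agents computes the curiosity signal for you and the relative weighting is set in the trainer configuration rather than in user code; the function above only illustrates how the two signals combine.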