# The Pyramid environment

The goal in this environment is to train our agent to **get the gold brick on the top of the Pyramid. To do that, it needs to press a button to spawn a Pyramid, navigate to the Pyramid, knock it over, and move to the gold brick at the top**.

[Image: Pyramids Environment]

## The reward function

The reward function is:

[Image: Pyramids Environment]

In terms of code, it looks like this:

[Image: Pyramids Reward]

To train this new agent, which seeks the button and then the Pyramid to knock over, we’ll use a combination of two types of rewards:

- The *extrinsic* one, given by the environment (illustration above).
- An *intrinsic* one called **curiosity**. This second reward will **push our agent to be curious, or in other words, to better explore its environment**.

If you want to know more about curiosity, the next (optional) section explains the basics.

## The observation space

In terms of observation, we **use 148 raycasts that can each detect objects** (switch, bricks, golden brick, and walls).

We also use a **boolean variable indicating the switch state** (whether we turned the switch on or off to spawn the Pyramid) and a vector that **contains the agent’s speed**.

[Image: Pyramids obs code]

## The action space

The action space is **discrete** with four possible actions:

[Image: Pyramids Environment]
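
To make the reward combination from the reward function section more concrete, here is a minimal sketch of how an extrinsic reward and a curiosity reward could be mixed. It rests on several assumptions that are not from this page: the function names are hypothetical, curiosity is modeled as the prediction error of a forward model (one common formulation), the two signals are combined as a simple weighted sum, and the strength values are purely illustrative. The training library you use may combine them differently.

```python
def curiosity_reward(predicted_next_state, actual_next_state):
    """Intrinsic reward as the mean squared prediction error.

    Assumption: the agent has some forward model that predicts the next
    state. States it predicts poorly (novel ones) yield a larger reward.
    """
    squared_errors = [
        (predicted - actual) ** 2
        for predicted, actual in zip(predicted_next_state, actual_next_state)
    ]
    return sum(squared_errors) / len(squared_errors)


def total_reward(extrinsic, intrinsic,
                 extrinsic_strength=1.0, curiosity_strength=0.02):
    """Weighted sum of both signals (the strengths are illustrative values)."""
    return extrinsic_strength * extrinsic + curiosity_strength * intrinsic


# A step where the environment gives no reward, but the agent
# ended up in a state its forward model predicted badly.
intrinsic = curiosity_reward([0.0, 0.0], [1.0, 1.0])
step_reward = total_reward(extrinsic=0.0, intrinsic=intrinsic)
print(intrinsic, step_reward)
```

The point of the sketch is that the agent still gets a small learning signal on steps where the environment's reward is zero, which is what encourages it to keep exploring until it finds the button and the Pyramid.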