# The Pyramid environment
The goal in this environment is to train our agent to **get the gold brick on the top of the Pyramid. To do that, it needs to press a button to spawn a Pyramid, navigate to the Pyramid, knock it over, and move to the gold brick at the top**.
## The reward function
The reward function is:
In terms of code, it looks like this
To train this new agent that seeks that button and then the Pyramid to destroy, we’ll use a combination of two types of rewards:
- The *extrinsic one* given by the environment (illustration above).
- But also an *intrinsic* one called **curiosity**. This second will **push our agent to be curious, or in other terms, to better explore its environment**.
If you want to know more about curiosity, the next section (optional) will explain the basics.
## The observation space
In terms of observation, we **use 148 raycasts that can each detect objects** (switch, bricks, golden brick, and walls.)
We also use a **boolean variable indicating the switch state** (did we turn on or off the switch to spawn the Pyramid) and a vector that **contains the agent’s speed**.
## The action space
The action space is **discrete** with four possible actions: