From 19c387665718d1752fccf67ed79bc4017bde768c Mon Sep 17 00:00:00 2001
From: Thomas Simonini
Date: Sat, 7 Jan 2023 17:54:05 +0100
Subject: [PATCH] Update pyramids.mdx

---
 units/en/unit5/pyramids.mdx | 7 +++++--
 1 file changed, 5 insertions(+), 2 deletions(-)

diff --git a/units/en/unit5/pyramids.mdx b/units/en/unit5/pyramids.mdx
index 8983692..c5d23f6 100644
--- a/units/en/unit5/pyramids.mdx
+++ b/units/en/unit5/pyramids.mdx
@@ -11,6 +11,9 @@ The reward function is:
 
 Pyramids Environment
 
+In terms of code it looks like this
+Pyramids Reward
+
 To train this new agent that seeks that button and then the Pyramid to destroy, we’ll use a combination of two types of rewards:
 
 - The *extrinsic one* given by the environment (illustration above).
@@ -26,11 +29,11 @@ In terms of observation, we **use 148 raycasts that can each detect objects** (s
 
 We also use a **boolean variable indicating the switch state** (did we turn on or not the switch to spawn the Pyramid) and a vector that **contains the agent’s speed**.
 
-Pyramids obs code
+Pyramids obs code
 
 ## The action space
 
 The action space is **discrete** with four possible actions:
 
-Pyramids Environment
+Pyramids Environment