diff --git a/units/en/unit8/introduction.mdx b/units/en/unit8/introduction.mdx
index 08a091a..9843948 100644
--- a/units/en/unit8/introduction.mdx
+++ b/units/en/unit8/introduction.mdx
@@ -1,6 +1,8 @@
 # Introduction [[introduction]]
 
-In the last Unit, we learned about Advantage Actor Critic (A2C), a hybrid architecture combining value-based and policy-based methods that help to stabilize the training by reducing the variance with:
+<img src="..." alt="Unit 8"/>
+
+In Unit 6, we learned about Advantage Actor Critic (A2C), a hybrid architecture combining value-based and policy-based methods that helps to stabilize the training by reducing the variance with:
 - *An Actor* that controls **how our agent behaves** (policy-based method).
 - *A Critic* that measures **how good the action taken is** (value-based method).
 
@@ -10,13 +12,16 @@
 Today we'll learn about Proximal Policy Optimization (PPO), an architecture that **improves our agent's training stability by avoiding policy updates that are too large**. To do that, we use a ratio that indicates the difference between our current and old policy and clip this ratio to a specific range \\( [1 - \epsilon, 1 + \epsilon] \\).
 
 Doing this will ensure **that our policy update will not be too large and that the training is more stable.**
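+
+As a preview, this idea is captured by the clipped surrogate objective from the PPO paper, where \\( r_t(\theta) = \frac{\pi_\theta(a_t | s_t)}{\pi_{\theta_{old}}(a_t | s_t)} \\) is the ratio between the current and the old policy and \\( \hat{A}_t \\) is the advantage estimate:
+
+\\( L^{CLIP}(\theta) = \hat{\mathbb{E}}_t [ \min( r_t(\theta) \hat{A}_t, \text{clip}(r_t(\theta), 1 - \epsilon, 1 + \epsilon) \hat{A}_t ) ] \\)
+
+In code, the core of this loss takes only a few lines. Here's a minimal PyTorch-style sketch (a simplified illustration, not CleanRL's exact implementation; the function and tensor names are hypothetical) of what you'll build in this Unit:
+
+```python
+import torch
+
+def ppo_clip_loss(new_logprobs, old_logprobs, advantages, eps=0.2):
+    # r_t(theta): ratio between current and old policy probabilities
+    ratio = (new_logprobs - old_logprobs).exp()
+    # Unclipped surrogate objective
+    surr1 = ratio * advantages
+    # Clipped surrogate objective: the ratio is kept inside [1 - eps, 1 + eps]
+    surr2 = torch.clamp(ratio, 1.0 - eps, 1.0 + eps) * advantages
+    # PPO maximizes the minimum of the two; negate it to get a loss to minimize
+    return -torch.min(surr1, surr2).mean()
+```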
 
 This Unit is in two parts:
-- In this first part, you'll learn the theory behind PPO and use [CleanRL](https://github.com/vwxyzjn/cleanrl) to train your agent on TODO ADD
-- In the second part, we'll get deeper into PPO optimization by using [Sample-Factory](https://samplefactory.dev/).
+- In this first part, you'll learn the theory behind PPO and use [CleanRL](https://github.com/vwxyzjn/cleanrl) to train an agent that learns to jump on platforms.
+- In the second part, we'll get deeper into PPO optimization by using [Sample-Factory](https://samplefactory.dev/) and train an agent to play ViZDoom (an open source version of Doom).
 
-TODO ADD IMAGE TWO PARTS
+<figure>
+  <img src="..." alt="Environment"/>
+  <figcaption>These are the environments you're going to use to train your agents: VizDoom and GodotRL environments.</figcaption>
+</figure>
 
-And then, after the theory, we'll train a PPO agent using . TODO ADD
+And then, after the theory, we'll use CleanRL to train a PPO agent that jumps on platforms.
 
-TODO: ADD ENVIRONMENTS
+<img src="..." alt="Jump Hard"/>
 
 Sounds exciting? Let's get started! 🚀