mirror of
https://github.com/huggingface/deep-rl-class.git
synced 2026-05-08 22:54:22 +08:00
Update introduction
# Introduction [[introduction]]
<img src="https://huggingface.co/datasets/huggingface-deep-rl-course/course-images/resolve/main/en/unit9/thumbnail.png" alt="Unit 8"/>
In Unit 6, we learned about Advantage Actor-Critic (A2C), a hybrid architecture combining value-based and policy-based methods that helps stabilize training by reducing the variance with:
- *An Actor* that controls **how our agent behaves** (policy-based method).
- *A Critic* that measures **how good the action taken is** (value-based method).
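To make these two roles concrete, here is a tiny tabular actor-critic sketch. It is an illustrative toy, not the course's neural-network implementation, and all names in it are my own:

```python
import math
import random

class ActorCritic:
    """Minimal tabular actor-critic (illustrative sketch only)."""

    def __init__(self, n_states, n_actions, lr=0.1, gamma=0.99):
        # Actor: per-state action preferences (policy-based part)
        self.prefs = [[0.0] * n_actions for _ in range(n_states)]
        # Critic: one value estimate per state (value-based part)
        self.values = [0.0] * n_states
        self.lr, self.gamma = lr, gamma

    def policy(self, s):
        # Softmax over the actor's action preferences
        exps = [math.exp(p) for p in self.prefs[s]]
        total = sum(exps)
        return [e / total for e in exps]

    def act(self, s):
        probs = self.policy(s)
        return random.choices(range(len(probs)), weights=probs)[0]

    def update(self, s, a, r, s_next, done):
        # Critic's TD target and the advantage estimate
        target = r + (0.0 if done else self.gamma * self.values[s_next])
        advantage = target - self.values[s]
        # Critic: move the value estimate toward the target
        self.values[s] += self.lr * advantage
        # Actor: raise the preference for actions with positive advantage
        probs = self.policy(s)
        for b in range(len(self.prefs[s])):
            grad = (1.0 if b == a else 0.0) - probs[b]
            self.prefs[s][b] += self.lr * advantage * grad
```

Using the advantage (instead of the raw return) to scale the actor's update is exactly what reduces the variance of the policy gradient.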
Today we'll learn about Proximal Policy Optimization (PPO), an architecture that improves our agent's training stability by avoiding policy updates that are too large.
Doing this will ensure **that our policy update will not be too large and that the training is more stable.**
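To see how PPO keeps the update small, here's a minimal sketch of its clipped surrogate objective for a single (state, action) sample, in plain Python. This is illustrative only; the function name and scalar form are my own, not taken from CleanRL or the course:

```python
import math

def clipped_surrogate(new_log_prob, old_log_prob, advantage, clip_eps=0.2):
    """PPO's clipped surrogate objective for one sample."""
    # Probability ratio between the current policy and the old policy
    ratio = math.exp(new_log_prob - old_log_prob)
    # Clip the ratio to the trust region [1 - eps, 1 + eps]
    clipped_ratio = max(1.0 - clip_eps, min(1.0 + clip_eps, ratio))
    # Taking the minimum means the objective never rewards
    # moving the policy further than the clip range allows
    return min(ratio * advantage, clipped_ratio * advantage)
```

Because of the `min`, the objective is a pessimistic bound: once the new policy drifts more than `clip_eps` away from the old one, the gradient through the clipped term vanishes and the update stops growing.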
This Unit is in two parts:
- In this first part, you'll learn the theory behind PPO and use [CleanRL](https://github.com/vwxyzjn/cleanrl) to train an agent to learn to jump on platforms.
- In the second part, we'll dive deeper into PPO optimization by using [Sample-Factory](https://samplefactory.dev/) and train an agent to play ViZDoom (an open-source version of Doom).
<figure>
<img src="https://huggingface.co/datasets/huggingface-deep-rl-course/course-images/resolve/main/en/unit10/environments.png" alt="Environment"/>
<figcaption>These are the environments you're going to use to train your agents: VizDoom and GodotRL</figcaption>
</figure>
And then, after the theory, we'll train a PPO agent using CleanRL to jump on platforms.
<img src="https://huggingface.co/datasets/huggingface-deep-rl-course/course-images/resolve/main/en/unit9/jumphard.gif" alt="Jump Hard"/>
Sounds exciting? Let's get started! 🚀