Update introduction

simoninithomas
2023-02-05 19:18:39 +01:00
parent f45d499947
commit 83b4b30e5d


@@ -1,6 +1,8 @@
# Introduction [[introduction]]
In the last Unit, we learned about Advantage Actor Critic (A2C), a hybrid architecture combining value-based and policy-based methods that help to stabilize the training by reducing the variance with:
<img src="https://huggingface.co/datasets/huggingface-deep-rl-course/course-images/resolve/main/en/unit9/thumbnail.png" alt="Unit 8"/>
In Unit 6, we learned about Advantage Actor Critic (A2C), a hybrid architecture combining value-based and policy-based methods that helps stabilize training by reducing variance with:
- *An Actor* that controls **how our agent behaves** (policy-based method).
- *A Critic* that measures **how good the action taken is** (value-based method).
@@ -10,13 +12,16 @@ Today we'll learn about Proximal Policy Optimization (PPO), an architecture that
Doing this will ensure **that our policy update will not be too large and that the training is more stable.**
This Unit is in two parts:
- In this first part, you'll learn the theory behind PPO and use [CleanRL](https://github.com/vwxyzjn/cleanrl) to train your agent on TODO ADD
- In the second part, we'll get deeper into PPO optimization by using [Sample-Factory](https://samplefactory.dev/).
- In this first part, you'll learn the theory behind PPO and use [CleanRL](https://github.com/vwxyzjn/cleanrl) to train an agent to jump on platforms.
- In the second part, we'll get deeper into PPO optimization by using [Sample-Factory](https://samplefactory.dev/) to train an agent to play VizDoom (an open-source version of Doom).
TODO ADD IMAGE TWO PARTS
<figure>
<img src="https://huggingface.co/datasets/huggingface-deep-rl-course/course-images/resolve/main/en/unit10/environments.png" alt="Environment"/>
<figcaption>These are the environments you're going to use to train your agents: VizDoom and GodotRL environments</figcaption>
</figure>
And then, after the theory, we'll train a PPO agent using . TODO ADD
And then, after the theory, we'll train a PPO agent using CleanRL to jump on platforms.
TODO: ADD ENVIRONMENTS
<img src="https://huggingface.co/datasets/huggingface-deep-rl-course/course-images/resolve/main/en/unit9/jumphard.gif" alt="Jump Hard"/>
Sounds exciting? Let's get started! 🚀