mirror of
https://github.com/huggingface/deep-rl-class.git
synced 2026-03-31 17:21:01 +08:00
# Introduction [[introduction]]
<img src="https://huggingface.co/datasets/huggingface-deep-rl-course/course-images/resolve/main/en/unit9/thumbnail.png" alt="Unit 8"/>
In Unit 6, we learned about Advantage Actor-Critic (A2C), a hybrid architecture combining value-based and policy-based methods that stabilizes training by reducing variance with:
- *An Actor* that controls **how our agent behaves** (policy-based method).
- *A Critic* that measures **how good the action taken is** (value-based method).
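
As a quick refresher, the Critic's value estimate is what reduces the variance: instead of updating the Actor with raw returns, we use the *advantage*, the difference between the observed return and the Critic's prediction. A minimal sketch (the function name and numbers are illustrative):

```python
def advantage(observed_return, value_estimate):
    # A(s, a) = R - V(s): how much better the outcome was
    # than the Critic expected for this state
    return observed_return - value_estimate

# The Critic predicted 5.0, but the actual return was 7.5:
# the action did better than expected, so its probability is increased.
print(advantage(7.5, 5.0))  # 2.5
```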
Today we'll learn about Proximal Policy Optimization (PPO), an algorithm that **improves our agent's training stability by avoiding policy updates that are too large**. To do that, we compute a ratio that measures how much the current policy differs from the old one, and clip this ratio to the range \\( [1 - \epsilon, 1 + \epsilon] \\).
Doing this ensures **that our policy update will not be too large and that training is more stable.**
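
The clipping idea can be sketched in a few lines of plain Python (the function name is illustrative, not CleanRL's API):

```python
import math

def ppo_clipped_objective(logp_new, logp_old, advantage, epsilon=0.2):
    # Probability ratio between current and old policy: r = pi_new(a|s) / pi_old(a|s)
    ratio = math.exp(logp_new - logp_old)
    # Clip the ratio to [1 - epsilon, 1 + epsilon]
    clipped_ratio = max(1.0 - epsilon, min(ratio, 1.0 + epsilon))
    # Take the minimum of the unclipped and clipped terms (pessimistic bound),
    # so an overly large policy change gains nothing from the update
    return min(ratio * advantage, clipped_ratio * advantage)

# The new policy is e ≈ 2.72x more likely to take this action than the old one,
# but the objective is capped at the clipped value 1.2 * advantage:
print(ppo_clipped_objective(logp_new=0.0, logp_old=-1.0, advantage=1.0))  # 1.2
```

Notice that when the ratio already lies inside \\( [1 - \epsilon, 1 + \epsilon] \\), the clipping does nothing and we recover the usual ratio-times-advantage objective.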
This Unit is in two parts:
- In the first part, you'll learn the theory behind PPO and code your PPO agent from scratch based on the [CleanRL](https://github.com/vwxyzjn/cleanrl) implementation. To test its robustness, you'll use LunarLander-v2, **the first environment you used when you started this course**. At that time, you didn't know how PPO worked, and now **you can code it from scratch and train it. How incredible is that 🤩**.
- In the second part, we'll go deeper into PPO optimization by using [Sample-Factory](https://samplefactory.dev/) to train an agent that plays ViZDoom (an open source version of Doom).
<figure>
<img src="https://huggingface.co/datasets/huggingface-deep-rl-course/course-images/resolve/main/en/unit10/environments.png" alt="Environment"/>
<figcaption>These are the environments you're going to use to train your agents: VizDoom environments</figcaption>
</figure>
Sound exciting? Let's get started! 🚀