mirror of
https://github.com/huggingface/deep-rl-class.git
synced 2026-05-08 22:54:22 +08:00
Update introduction
# Introduction [[introduction]]
<img src="https://huggingface.co/datasets/huggingface-deep-rl-course/course-images/resolve/main/en/unit9/thumbnail.png" alt="Unit 8"/>
In Unit 6, we learned about Advantage Actor-Critic (A2C), a hybrid architecture combining value-based and policy-based methods that helps stabilize training by reducing the variance with:
- *An Actor* that controls **how our agent behaves** (policy-based method).
- *A Critic* that measures **how good the action taken is** (value-based method).
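To make these two roles concrete, here is a tiny tabular actor-critic sketch. It is an illustrative toy, not the course's neural-network implementation, and all names in it are my own:

```python
import math
import random

class ActorCritic:
    """Minimal tabular actor-critic (illustrative sketch only)."""

    def __init__(self, n_states, n_actions, lr=0.1, gamma=0.99):
        # Actor: per-state action preferences (policy-based part)
        self.prefs = [[0.0] * n_actions for _ in range(n_states)]
        # Critic: one value estimate per state (value-based part)
        self.values = [0.0] * n_states
        self.lr, self.gamma = lr, gamma

    def policy(self, s):
        # Softmax over the actor's action preferences
        exps = [math.exp(p) for p in self.prefs[s]]
        total = sum(exps)
        return [e / total for e in exps]

    def act(self, s):
        probs = self.policy(s)
        return random.choices(range(len(probs)), weights=probs)[0]

    def update(self, s, a, r, s_next, done):
        # Critic's TD target and the advantage estimate
        target = r + (0.0 if done else self.gamma * self.values[s_next])
        advantage = target - self.values[s]
        # Critic: move the value estimate toward the target
        self.values[s] += self.lr * advantage
        # Actor: raise the preference for actions with positive advantage
        probs = self.policy(s)
        for b in range(len(self.prefs[s])):
            grad = (1.0 if b == a else 0.0) - probs[b]
            self.prefs[s][b] += self.lr * advantage * grad
```

Using the advantage (instead of the raw return) to scale the actor's update is exactly what reduces the variance of the policy gradient.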
Today we'll learn about Proximal Policy Optimization (PPO), an architecture that improves our agent's training stability by avoiding policy updates that are too large.
Doing this will ensure **that our policy update will not be too large and that the training is more stable.**
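To see how PPO keeps the update small, here's a minimal sketch of its clipped surrogate objective for a single (state, action) sample, in plain Python. This is illustrative only; the function name and scalar form are my own, not taken from CleanRL or the course:

```python
import math

def clipped_surrogate(new_log_prob, old_log_prob, advantage, clip_eps=0.2):
    """PPO's clipped surrogate objective for one sample."""
    # Probability ratio between the current policy and the old policy
    ratio = math.exp(new_log_prob - old_log_prob)
    # Clip the ratio to the trust region [1 - eps, 1 + eps]
    clipped_ratio = max(1.0 - clip_eps, min(1.0 + clip_eps, ratio))
    # Taking the minimum means the objective never rewards
    # moving the policy further than the clip range allows
    return min(ratio * advantage, clipped_ratio * advantage)
```

Because of the `min`, the objective is a pessimistic bound: once the new policy drifts more than `clip_eps` away from the old one, the gradient through the clipped term vanishes and the update stops growing.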
This Unit is in two parts:
- In this first part, you'll learn the theory behind PPO and use [CleanRL](https://github.com/vwxyzjn/cleanrl) to train an agent to learn to jump on platforms.
- In the second part, we'll dive deeper into PPO optimization by using [Sample-Factory](https://samplefactory.dev/) and train an agent to play ViZDoom (an open-source version of Doom).
<figure>
<img src="https://huggingface.co/datasets/huggingface-deep-rl-course/course-images/resolve/main/en/unit10/environments.png" alt="Environment"/>
<figcaption>These are the environments you're going to use to train your agents: VizDoom and GodotRL</figcaption>
</figure>
And then, after the theory, we'll train a PPO agent using CleanRL to jump on platforms.
<img src="https://huggingface.co/datasets/huggingface-deep-rl-course/course-images/resolve/main/en/unit9/jumphard.gif" alt="Jump Hard"/>
Sounds exciting? Let's get started! 🚀