# Introduction [[introduction]]
<img src="https://huggingface.co/datasets/huggingface-deep-rl-course/course-images/resolve/main/en/unit6/thumbnail.png" alt="thumbnail"/>
In the last unit, we learned about Deep Q-Learning. In this value-based deep reinforcement learning algorithm, we **used a deep neural network to approximate the different Q-values for each possible action at a state.**
Indeed, since the beginning of the course, we only studied value-based methods, **where we estimate a value function as an intermediate step towards finding an optimal policy.**
<img src="https://huggingface.co/datasets/huggingface-deep-rl-course/course-images/resolve/main/en/unit3/link-value-policy.jpg" alt="Link value policy" />
In value-based methods, **the policy \\(π\\) exists only because of the action value estimates, since the policy is just a function** (for instance, a greedy policy) that selects the action with the highest value given a state.
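As a minimal sketch of this idea (the Q-values below are made-up toy numbers, not from any trained network), a greedy policy simply reads off the argmax of the estimated action values:

```python
import torch

def greedy_policy(q_values: torch.Tensor) -> int:
    """A greedy policy: pick the action with the highest estimated Q-value."""
    return int(torch.argmax(q_values))

# Hypothetical Q-value estimates for one state with 4 possible actions.
q_values = torch.tensor([0.2, 1.5, -0.3, 0.7])

action = greedy_policy(q_values)
print(action)  # → 1, the index of the highest Q-value
```

Notice that the policy itself has no parameters here: everything learned lives in the value estimates.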
But, with policy-based methods, we want to optimize the policy directly **without having an intermediate step of learning a value function.**
So today, **we'll learn about policy-based methods, and we'll study a subset of these methods called Policy Gradients**. Then we'll implement our first policy-gradient algorithm, Monte Carlo **Reinforce**, from scratch using PyTorch, before testing its robustness with CartPole-v1, PixelCopter, and Pong.
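To preview the difference, here is a minimal sketch of such a parameterized policy in PyTorch (the layer sizes are illustrative assumptions, not taken from this unit's notebook): instead of estimating values, the network directly outputs a probability distribution over actions, and we sample from it rather than taking an argmax.

```python
import torch
import torch.nn as nn

class PolicyNetwork(nn.Module):
    """Maps a state to a probability distribution over actions."""
    def __init__(self, state_dim: int = 4, n_actions: int = 2, hidden: int = 16):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim, hidden),
            nn.ReLU(),
            nn.Linear(hidden, n_actions),
            nn.Softmax(dim=-1),  # action probabilities, not action values
        )

    def forward(self, state: torch.Tensor) -> torch.Tensor:
        return self.net(state)

policy = PolicyNetwork()
probs = policy(torch.randn(4))  # a distribution over the 2 actions, summing to 1
action = torch.distributions.Categorical(probs).sample()  # sample, don't argmax
```

Training then means adjusting the network's weights \\(\theta\\) so that good actions become more probable, with no value function in between.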
<figure class="image table text-center m-0 w-full">
<img src="https://huggingface.co/datasets/huggingface-deep-rl-course/course-images/resolve/main/en/unit6/envs.gif" alt="Environments"/>
</figure>
Let's get started!
Now that we have seen the big picture of Policy-Gradient and its advantages and disadvantages, **let's study and implement one of them**: Reinforce.
## Reinforce (Monte Carlo Policy Gradient)
Reinforce, also called Monte-Carlo Policy Gradient, **uses an estimated return from an entire episode to update the policy parameter** \\(\theta\\).
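As a rough sketch of the Monte Carlo part (the rewards, discount factor, and action probabilities below are made-up toy numbers): after an episode finishes, we compute the discounted return \\(G_t\\) for every step by working backwards through the rewards, then weight the log-probabilities of the taken actions by those returns to form the loss.

```python
import torch

def discounted_returns(rewards: list[float], gamma: float = 0.99) -> torch.Tensor:
    """Compute the Monte Carlo return G_t for every step of one finished episode."""
    returns: list[float] = []
    g = 0.0
    for r in reversed(rewards):
        g = r + gamma * g  # G_t = r_t + gamma * G_{t+1}
        returns.insert(0, g)
    return torch.tensor(returns)

# Toy episode: three steps, reward 1 at each step, gamma = 0.5 for readability.
G = discounted_returns([1.0, 1.0, 1.0], gamma=0.5)
# G = [1 + 0.5*1.5, 1 + 0.5*1, 1] = [1.75, 1.5, 1.0]

# Reinforce loss: scale each action's log-probability by its return,
# so minimizing the negative sum performs gradient ascent on the objective.
log_probs = torch.log(torch.tensor([0.6, 0.7, 0.9]))  # hypothetical π(a_t|s_t)
loss = -(log_probs * G).sum()
```

In the real agent, `log_probs` comes from the policy network's output for the actions actually sampled, and `loss.backward()` updates \\(\theta\\).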
Now that we studied the theory behind Reinforce, **you’re ready to code your Reinforce agent with PyTorch**. And you'll test its robustness using CartPole-v1, PixelCopter, and Pong.
Start the tutorial here 👉 https://colab.research.google.com/github/huggingface/deep-rl-class/blob/main/unit5/unit5.ipynb
The leaderboard to compare your results with your classmates 🏆 👉 https://huggingface.co/spaces/chrisjay/Deep-Reinforcement-Learning-Leaderboard
<figure class="image table text-center m-0 w-full">
<img src="https://huggingface.co/datasets/huggingface-deep-rl-course/course-images/resolve/main/en/unit6/envs.gif" alt="Environments"/>
</figure>
---
Congrats on finishing this chapter! There was a lot of information. And congrats on finishing the tutorial. You’ve just coded your first Deep Reinforcement Learning agent from scratch using PyTorch and shared it on the Hub 🥳.
It's **normal if you still feel confused** by all these elements. **This was the same for me and for everyone who has studied RL.**
Take time to really grasp the material before continuing.
Don't hesitate to train your agent in other environments. The **best way to learn is to try things on your own!**
We published additional readings in the syllabus if you want to go deeper 👉 **[https://github.com/huggingface/deep-rl-class/blob/main/unit5/README.md](https://github.com/huggingface/deep-rl-class/blob/main/unit5/README.md)**
In the next unit, we're going to learn about a combination of policy-based and value-based methods called Actor-Critic methods.
And don't forget to share with your friends who want to learn 🤗!
Finally, we want **to improve and update the course iteratively with your feedback**. If you have some, please fill this form 👉 **[https://forms.gle/3HgA7bEHwAmmLfwh9](https://forms.gle/3HgA7bEHwAmmLfwh9)**
### **Keep learning, stay awesome 🤗,**