mirror of
https://github.com/huggingface/deep-rl-class.git
synced 2026-04-03 10:38:27 +08:00
Update PG and add hands-on
This commit is contained in:
@@ -120,8 +120,6 @@
|
||||
title: The advantages and disadvantages of Policy-based methods
|
||||
- local: unit4/policy-gradient
|
||||
title: Diving deeper into Policy-gradient methods
|
||||
- local: unit4/reinforce
|
||||
title: The Reinforce algorithm
|
||||
- local: unit4/hands-on
|
||||
title: Hands-on
|
||||
- local: unit4/quiz
|
||||
|
||||
@@ -1 +1,33 @@
|
||||
# Hands on
|
||||
|
||||
|
||||
|
||||
<CourseFloatingBanner classNames="absolute z-10 right-0 top-0"
|
||||
notebooks={[
|
||||
{label: "Google Colab", value: "https://colab.research.google.com/github/huggingface/deep-rl-class/blob/main/notebooks/unit4/unit4.ipynb"}
|
||||
]}
|
||||
askForHelpUrl="http://hf.co/join/discord" />
|
||||
|
||||
|
||||
|
||||
Now that we studied the theory behind Reinforce, **you’re ready to code your Reinforce agent with PyTorch**. And you'll test its robustness using CartPole-v1 and PixelCopter,.
|
||||
|
||||
You'll then be able to iterate and improve this implementation for more advanced environments.
|
||||
|
||||
<figure class="image table text-center m-0 w-full">
|
||||
<img src="https://huggingface.co/datasets/huggingface-deep-rl-course/course-images/resolve/main/en/unit6/envs.gif" alt="Environments"/>
|
||||
</figure>
|
||||
|
||||
|
||||
To validate this hands-on for the certification process, you need to push your trained models to the Hub.
|
||||
|
||||
- Get a result of >= 450 for `Cartpole-v1`.
|
||||
- Get a result of >= 5 for `PixelCopter`.
|
||||
|
||||
To find your result, go to the leaderboard and find your model, **the result = mean_reward - std of reward**. **If you don't see your model on the leaderboard, go at the bottom of the leaderboard page and click on the refresh button**.
|
||||
|
||||
For more information about the certification process, check this section 👉 https://huggingface.co/deep-rl-course/en/unit0/introduction#certification-process
|
||||
|
||||
**To start the hands-on click on Open In Colab button** 👇 :
|
||||
|
||||
[](https://colab.research.google.com/github/huggingface/deep-rl-class/blob/master/notebooks/unit4/unit4.ipynb)
|
||||
|
||||
@@ -85,8 +85,8 @@ Fortunately we're going to use a solution called the Policy Gradient Theorem tha
|
||||
|
||||
<img src="https://huggingface.co/datasets/huggingface-deep-rl-course/course-images/resolve/main/en/unit6/policy_gradient_theorem.png" alt="Policy Gradient"/>
|
||||
|
||||
## The Reinforce algorithm
|
||||
The Reinforce algorithm is a policy-gradient algorithm that works like this:
|
||||
## The Reinforce algorithm (Monte Carlo Reinforce)
|
||||
The Reinforce algorithm also called Monte-Carlo policy-gradient is a policy-gradient algorithm that **uses an estimated return from an entire episode to update the policy parameter** \\(\theta\\):
|
||||
|
||||
In a loop:
|
||||
- Use the policy \\(\pi_\theta\\) to collect an episode \\(\tau\\)
|
||||
|
||||
@@ -1,20 +0,0 @@
|
||||
# Monte Carlo Policy Gradient (Reinforce)
|
||||
|
||||
|
||||
|
||||
Now that we have seen the big picture of Policy-Gradient and its advantages and disadvantages, **let's study and implement one of them**: Reinforce.
|
||||
|
||||
## Reinforce (Monte Carlo Policy Gradient)
|
||||
|
||||
Reinforce, also called Monte-Carlo Policy Gradient, **uses an estimated return from an entire episode to update the policy parameter** \\(\theta\\).
|
||||
|
||||
|
||||
Now that we studied the theory behind Reinforce, **you’re ready to code your Reinforce agent with PyTorch**. And you'll test its robustness using CartPole-v1, PixelCopter, and Pong.
|
||||
|
||||
Start the tutorial here 👉 https://colab.research.google.com/github/huggingface/deep-rl-class/blob/main/unit5/unit5.ipynb
|
||||
|
||||
The leaderboard to compare your results with your classmates 🏆 👉 https://huggingface.co/spaces/chrisjay/Deep-Reinforcement-Learning-Leaderboard
|
||||
|
||||
<figure class="image table text-center m-0 w-full">
|
||||
<img src="https://huggingface.co/datasets/huggingface-deep-rl-course/course-images/resolve/main/en/unit6/envs.gif" alt="Environments"/>
|
||||
</figure>
|
||||
Reference in New Issue
Block a user