mirror of
https://github.com/huggingface/deep-rl-class.git
synced 2026-02-03 02:14:53 +08:00
Various updates
@@ -1,6 +1,8 @@
# [The Hugging Face Deep Reinforcement Learning Course 🤗 (v2.0)](https://huggingface.co/deep-rl-course/unit0/introduction)

This repository contains the Deep Reinforcement Learning Course mdx files and notebooks. **The website is here**: https://huggingface.co/deep-rl-course/unit0/introduction?fw=pt

If you like the course, don't hesitate to **⭐ star this repository. This helps us 🤗**.

- The syllabus 📚: https://simoninithomas.github.io/deep-rl-course
@@ -17,6 +17,7 @@
},
"source": [
"# DEPRECATED NOTEBOOK, THE NEW UNIT 1 IS HERE: https://huggingface.co/deep-rl-course/unit1/introduction",
"\n",
"**Everything below is deprecated** 👇, the new version of the course is here: https://huggingface.co/deep-rl-course/unit1/introduction",
"\n",
"\n",
@@ -1,4 +1,4 @@
# DEPRECATED UNIT, THE NEW UNIT 2 IS HERE: https://huggingface.co/deep-rl-course/unit2/introduction

**Everything below is deprecated** 👇, the new version of the course is here: https://huggingface.co/deep-rl-course/unit2/introduction
@@ -16,8 +16,11 @@
"id": "njb_ProuHiOe"
},
"source": [
"# DEPRECATED NOTEBOOK, THE NEW UNIT 2 IS HERE: https://huggingface.co/deep-rl-course/unit2/introduction",
"\n",
"**Everything below is deprecated** 👇, the new version of the course is here: https://huggingface.co/deep-rl-course/unit2/introduction",
"\n",
"\n",
"# Unit 2: Q-Learning with FrozenLake-v1 ⛄ and Taxi-v3 🚕\n",
"\n",
"In this notebook, **you'll code your first Reinforcement Learning agent from scratch**: a FrozenLake ❄️ agent trained with Q-Learning. You'll share it with the community and experiment with different configurations.\n",
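Since the Unit 2 notebook builds Q-Learning from scratch, the update it revolves around can be sketched in a few lines of plain Python (the dict-based table and parameter names here are illustrative, not the notebook's actual code):

```python
def q_update(Q, s, a, r, s_next, actions, alpha=0.1, gamma=0.99):
    """One tabular Q-Learning update:
    Q(s,a) <- Q(s,a) + alpha * (r + gamma * max_a' Q(s',a') - Q(s,a))."""
    best_next = max(Q[(s_next, a2)] for a2 in actions)
    Q[(s, a)] += alpha * (r + gamma * best_next - Q[(s, a)])
    return Q

# Tiny 2-state, 2-action table, all zeros initially
actions = [0, 1]
Q = {(s, a): 0.0 for s in [0, 1] for a in actions}
Q = q_update(Q, s=0, a=1, r=1.0, s_next=1, actions=actions)
print(Q[(0, 1)])  # 0.1
```

Because the table starts at zero, only the immediate reward moves the estimate on the first update; the discounted bootstrap term kicks in once neighboring states have nonzero values.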
@@ -1,6 +1,10 @@
# DEPRECATED, THE NEW VERSION OF THIS UNIT IS HERE: https://huggingface.co/deep-rl-course/unit3/introduction

**Everything below is deprecated** 👇, the new version of the course is here: https://huggingface.co/deep-rl-course/unit3/introduction

# Unit 3: Deep Q-Learning with Atari Games 👾

In this Unit, **we'll study our first Deep Reinforcement Learning agent**: Deep Q-Learning.

And **we'll train it to play Space Invaders and other Atari environments using [RL-Zoo](https://github.com/DLR-RM/rl-baselines3-zoo)**, a training framework for RL based on Stable-Baselines that provides scripts for training and evaluating agents, tuning hyperparameters, plotting results, and recording videos.
@@ -16,6 +16,11 @@
"id": "k7xBVPzoXxOg"
},
"source": [
"# DEPRECATED NOTEBOOK, THE NEW UNIT 3 IS HERE: https://huggingface.co/deep-rl-course/unit3/introduction",
"\n",
"**Everything below is deprecated** 👇, the new version of the course is here: https://huggingface.co/deep-rl-course/unit3/introduction",
"\n",
"\n",
"# Unit 3: Deep Q-Learning with Atari Games 👾 using RL Baselines3 Zoo\n",
"\n",
"In this notebook, **you'll train a Deep Q-Learning agent** playing Space Invaders using [RL Baselines3 Zoo](https://github.com/DLR-RM/rl-baselines3-zoo), a training framework based on [Stable-Baselines3](https://stable-baselines3.readthedocs.io/en/master/) that provides scripts for training, evaluating agents, tuning hyperparameters, plotting results, and recording videos.\n",
@@ -1,3 +1,7 @@
# DEPRECATED, THE NEW VERSION OF THIS UNIT IS HERE: https://huggingface.co/deep-rl-course/unit5/introduction

**Everything below is deprecated** 👇, the new version of the course is here: https://huggingface.co/deep-rl-course/unit5/introduction

# Unit 4: An Introduction to Unity ML-Agents with Hugging Face 🤗

@@ -16,6 +16,11 @@
"id": "2D3NL_e4crQv"
},
"source": [
"# DEPRECATED NOTEBOOK, THE NEW VERSION OF THIS UNIT IS HERE: https://huggingface.co/deep-rl-course/unit5/introduction",
"\n",
"**Everything below is deprecated** 👇, the new version of the course is here: https://huggingface.co/deep-rl-course/unit5/introduction",
"\n",
"\n",
"# Unit 4: Let's learn about Unity ML-Agents with Hugging Face 🤗\n",
"\n"
]
@@ -561,4 +566,4 @@
},
"nbformat": 4,
"nbformat_minor": 0
}
}
@@ -1,6 +1,10 @@
# DEPRECATED, THE NEW VERSION OF THIS UNIT IS HERE: https://huggingface.co/deep-rl-course/unit4/introduction

**Everything below is deprecated** 👇, the new version of the course is here: https://huggingface.co/deep-rl-course/unit4/introduction

# Unit 5: Policy Gradient with PyTorch

In this Unit, **we'll study Policy Gradient Methods**.

And we'll **implement Reinforce (a policy gradient method) from scratch using PyTorch**, before testing its robustness on CartPole-v1, PixelCopter, and Pong.
@@ -16,6 +16,11 @@
"id": "CjRWziAVU2lZ"
},
"source": [
"# DEPRECATED NOTEBOOK, THE NEW VERSION OF THIS UNIT IS HERE: https://huggingface.co/deep-rl-course/unit4/introduction",
"\n",
"**Everything below is deprecated** 👇, the new version of the course is here: https://huggingface.co/deep-rl-course/unit4/introduction",
"\n",
"\n",
"# Unit 5: Code your first Deep Reinforcement Learning Algorithm with PyTorch: Reinforce. And test its robustness 💪\n",
"In this notebook, you'll code your first Deep Reinforcement Learning algorithm from scratch: Reinforce (also called Monte Carlo Policy Gradient).\n",
"\n",
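The Reinforce algorithm the notebook builds boils down to weighting each action's log-probability by the return that followed it. A minimal pure-Python sketch of that loss (the notebook's real version uses PyTorch tensors so the sum can be backpropagated; names here are illustrative):

```python
def reinforce_loss(log_probs, rewards, gamma=0.99):
    """Reinforce (Monte Carlo policy gradient) loss:
    loss = -sum_t log pi(a_t|s_t) * G_t, with G_t the discounted return from step t."""
    returns, g = [], 0.0
    for r in reversed(rewards):        # returns-to-go, computed backwards
        g = r + gamma * g
        returns.insert(0, g)
    return -sum(lp * g for lp, g in zip(log_probs, returns))

# Toy 2-step episode with fixed log-probabilities
loss = reinforce_loss([-0.5, -0.5], rewards=[1.0, 1.0], gamma=1.0)
print(loss)  # 1.5  (returns are [2.0, 1.0])
```

Early actions get credit for everything that follows them, which is exactly why Monte Carlo policy gradients have low bias but high variance.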
@@ -1,3 +1,7 @@
# DEPRECATED, THE NEW VERSION OF THIS UNIT IS HERE: https://huggingface.co/deep-rl-course/unit6/introduction

**Everything below is deprecated** 👇, the new version of the course is here: https://huggingface.co/deep-rl-course/unit6/introduction

# Unit 7: Advantage Actor Critic (A2C) using Robotics Simulations with PyBullet 🤖

One of the major industries that use Reinforcement Learning is robotics. Unfortunately, **having access to robot equipment is very expensive**. Fortunately, some simulations exist to train robots:
@@ -32,7 +36,7 @@ Thanks to a leaderboard, you'll be able to compare your results with other class
The leaderboard 👉 https://huggingface.co/spaces/chrisjay/Deep-Reinforcement-Learning-Leaderboard

## Additional readings 📚
- [Making Sense of the Bias / Variance Trade-off in (Deep) Reinforcement Learning](https://blog.mlreview.com/making-sense-of-the-bias-variance-trade-off-in-deep-reinforcement-learning-79cf1e83d565)
- [Bias-variance Tradeoff in Reinforcement Learning](https://www.endtoend.ai/blog/bias-variance-tradeoff-in-reinforcement-learning/)
- [Foundations of Deep RL Series, L3 Policy Gradients and Advantage Estimation by Pieter Abbeel](https://youtu.be/AKbX1Zvo7r8)
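The "Advantage" in A2C, and the bias/variance trade-off the readings above discuss, comes down to the one-step estimate below (a sketch under simplified assumptions; real A2C implementations often use n-step or GAE variants):

```python
def advantage(reward, value_s, value_next, gamma=0.99, done=False):
    """One-step advantage estimate used by actor-critic methods:
    A(s, a) = r + gamma * V(s') - V(s), dropping the bootstrap at episode end."""
    bootstrap = 0.0 if done else gamma * value_next
    return reward + bootstrap - value_s

print(advantage(1.0, value_s=0.5, value_next=1.0, gamma=0.9))  # 1.4
```

Bootstrapping from the critic's `V(s')` instead of a full Monte Carlo return trades some bias (the critic may be wrong) for much lower variance than Reinforce.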
@@ -34,6 +34,11 @@
{
"cell_type": "markdown",
"source": [
"# DEPRECATED NOTEBOOK, THE NEW VERSION OF THIS UNIT IS HERE: https://huggingface.co/deep-rl-course/unit6/introduction",
"\n",
"**Everything below is deprecated** 👇, the new version of the course is here: https://huggingface.co/deep-rl-course/unit6/introduction",
"\n",
"\n",
"# Unit 7: Advantage Actor Critic (A2C) using Robotics Simulations with PyBullet 🤖\n",
"In this small notebook, you'll learn to use A2C with PyBullet and train an agent to walk: more precisely, a spider (they say Ant, but come on... it's a spider 😆) 🕸️\n",
"\n",
@@ -533,4 +538,4 @@
}
}
]
}
}
@@ -1,3 +1,7 @@
# DEPRECATED, THE NEW VERSION OF THIS UNIT IS HERE: https://huggingface.co/deep-rl-course/unit8/introduction

**Everything below is deprecated** 👇, the new version of the course is here: https://huggingface.co/deep-rl-course/unit8/introduction

# Unit 8: Proximal Policy Optimization (PPO) with PyTorch

Today we'll learn about Proximal Policy Optimization (PPO), an architecture that improves our agent's training stability by avoiding policy updates that are too large. To do that, we use a ratio that indicates the difference between our current and old policy and clip it to the range $[1 - \epsilon, 1 + \epsilon]$. This ensures that the policy update is not too large and that training is more stable.
@@ -29,7 +33,7 @@ Thanks to a leaderboard, you'll be able to compare your results with other class
The leaderboard 👉 https://huggingface.co/spaces/chrisjay/Deep-Reinforcement-Learning-Leaderboard

## Additional readings 📚
- [Towards Delivering a Coherent Self-Contained Explanation of Proximal Policy Optimization by Daniel Bick](https://fse.studenttheses.ub.rug.nl/25709/1/mAI_2021_BickD.pdf)
- [What is the way to understand Proximal Policy Optimization Algorithm in RL?](https://stackoverflow.com/questions/46422845/what-is-the-way-to-understand-proximal-policy-optimization-algorithm-in-rl)
- [Foundations of Deep RL Series, L4 TRPO and PPO by Pieter Abbeel](https://youtu.be/KjWF8VIMGiY)
- [OpenAI PPO Blogpost](https://openai.com/blog/openai-baselines-ppo/)
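The clipping this unit describes can be written down directly. A minimal per-sample sketch in plain Python (the notebook's real agent does this on PyTorch tensors over a whole batch):

```python
def ppo_clip_objective(ratio, advantage, eps=0.2):
    """PPO's clipped surrogate objective for one sample:
    min(ratio * A, clip(ratio, 1 - eps, 1 + eps) * A)."""
    clipped_ratio = max(1.0 - eps, min(ratio, 1.0 + eps))
    return min(ratio * advantage, clipped_ratio * advantage)

# If the new policy over-weights an action (ratio 1.5) with positive advantage,
# the clip caps the incentive at (1 + eps) * A:
print(ppo_clip_objective(1.5, advantage=2.0))   # 2.4
# With negative advantage the un-clipped term is the smaller one, so it is kept:
print(ppo_clip_objective(1.5, advantage=-2.0))  # -3.0
```

Taking the `min` makes the objective pessimistic: the agent gains nothing from pushing the ratio outside $[1 - \epsilon, 1 + \epsilon]$, which is exactly what keeps updates small and training stable.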
@@ -16,6 +16,11 @@
"id": "-cf5-oDPjwf8"
},
"source": [
"# DEPRECATED NOTEBOOK, THE NEW VERSION OF THIS UNIT IS HERE: https://huggingface.co/deep-rl-course/unit8/introduction",
"\n",
"**Everything below is deprecated** 👇, the new version of the course is here: https://huggingface.co/deep-rl-course/unit8/introduction",
"\n",
"\n",
"# Unit 8: Proximal Policy Optimization (PPO) with PyTorch 🤖\n",
"\n",
"In this unit, you'll learn to **code your PPO agent from scratch with PyTorch**.\n",
@@ -1,8 +1,13 @@
# DEPRECATED, THE NEW VERSION OF THIS UNIT IS HERE: https://huggingface.co/deep-rl-course/unitbonus3/decision-transformers

**Everything below is deprecated** 👇, the new version of the course is here: https://huggingface.co/deep-rl-course/unitbonus3/decision-transformers

# Unit 9: Decision Transformers and offline Reinforcement Learning 🤖



In this Unit, you'll learn what Decision Transformers and Offline Reinforcement Learning are. Then you'll train your first Offline Decision Transformer model from scratch to make a half-cheetah run.

This course is **self-paced**, so you can start whenever you want.

@@ -18,12 +23,12 @@ Here are the steps for this Unit:

2️⃣ 👩‍💻 Then dive into the first hands-on.
👩‍💻 The hands-on 👉 [](https://colab.research.google.com/drive/1K3UuajwoPY1MzRKNkONNRS3gS5DxZ-qF?usp=sharing)

3️⃣ 📖 Read [Train your first Decision Transformer](https://huggingface.co/blog/train-decision-transformers)

4️⃣ 👩‍💻 Then dive into the hands-on, where **you'll train your first Offline Decision Transformer model from scratch to make a half-cheetah run**.
👩‍💻 The hands-on 👉 https://github.com/huggingface/blog/blob/main/notebooks/101_train-decision-transformers.ipynb

## How to make the most of this course

To make the most of the course, my advice is to:
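A key idea behind Decision Transformers is that they condition each timestep on the *return-to-go* rather than the raw reward. A minimal sketch of that quantity (illustrative; not the hands-on notebook's code):

```python
def returns_to_go(rewards, gamma=1.0):
    """Return-to-go at each timestep: R_t = sum_{t' >= t} gamma^(t'-t) * r_t'.
    The Decision Transformer paper uses the undiscounted case (gamma = 1)."""
    rtg, g = [], 0.0
    for r in reversed(rewards):   # accumulate from the end of the trajectory
        g = r + gamma * g
        rtg.insert(0, g)
    return rtg

print(returns_to_go([1.0, 0.0, 2.0]))  # [3.0, 2.0, 2.0]
```

At inference time you feed the model a target return-to-go, and it autoregressively generates actions it predicts will achieve that return.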
@@ -9,11 +9,11 @@ Discord is a free chat platform. If you've used Slack, **it's quite similar**. T

Starting in Discord can be a bit intimidating, so let me take you through it.

When you sign up to our Discord server, you'll need to specify which topics you're interested in by **clicking #role-assignment on the left**.

<img src="https://huggingface.co/datasets/huggingface-deep-rl-course/course-images/resolve/main/en/unit0/discord1.jpg" alt="Discord"/>

In #role-assignment, you can pick different categories. Make sure to **click "Reinforcement Learning"**. You'll then get to **introduce yourself in the `#introduce-yourself` channel**.

<img src="https://huggingface.co/datasets/huggingface-deep-rl-course/course-images/resolve/main/en/unit0/discord2.jpg" alt="Discord"/>
@@ -43,13 +43,6 @@ You can either do this hands-on by reading the notebook or following it with the

In this notebook, you'll train your **first Deep Reinforcement Learning agent**: a Lunar Lander agent that will learn to **land correctly on the Moon 🌕**, using [Stable-Baselines3](https://stable-baselines3.readthedocs.io/en/master/), a Deep Reinforcement Learning library. You'll share it with the community and experiment with different configurations.

⬇️ Here is an example of what **you will achieve in just a couple of minutes.** ⬇️

```python
%%html
<video controls autoplay><source src="https://huggingface.co/ThomasSimonini/ppo-LunarLander-v2/resolve/main/replay.mp4" type="video/mp4"></video>
```

### The environment 🎮

- [LunarLander-v2](https://www.gymlibrary.dev/environments/box2d/lunar_lander/)
@@ -92,7 +85,7 @@ Before diving into the notebook, you need to:

🔲 📝 **Read Unit 0**, which gives you all the **information about the course and helps you onboard** 🤗

🔲 📚 **Develop an understanding of the foundations of Reinforcement Learning** by reading Unit 1

## A small recap of what Deep Reinforcement Learning is 📚
<img src="https://huggingface.co/datasets/huggingface-deep-rl-course/course-images/resolve/main/en/unit1/RL_process_game.jpg" alt="The RL process" width="100%">
@@ -22,6 +22,6 @@ It's essential **to master these elements** before diving into implementing Dee

After this unit, in a bonus unit, you'll be **able to train Huggy the Dog 🐶 to fetch the stick and play with him 🤗**.

<video src="https://huggingface.co/datasets/huggingface-deep-rl-course/course-images/resolve/main/en/unit0/huggy.mp4" type="video/mp4" controls autoplay loop />

So let's get started! 🚀