diff --git a/README.md b/README.md
index ee3f53c..d9cd483 100644
--- a/README.md
+++ b/README.md
@@ -1,6 +1,8 @@
 # [The Hugging Face Deep Reinforcement Learning Course 🤗 (v2.0)](https://huggingface.co/deep-rl-course/unit0/introduction)
 
-This repository contains the Deep Reinforcement Learning Course mdx files and notebooks. The website is here: https://huggingface.co/deep-rl-course/unit0/introduction?fw=pt
+If you like the course, don't hesitate to **⭐ star this repository. This helps us 🤗**.
+
+This repository contains the Deep Reinforcement Learning Course mdx files and notebooks. **The website is here**: https://huggingface.co/deep-rl-course/unit0/introduction?fw=pt
 
 - The syllabus 📚: https://simoninithomas.github.io/deep-rl-course
diff --git a/unit1/unit1.ipynb b/unit1/unit1.ipynb
index 34d6a09..49eb1b0 100644
--- a/unit1/unit1.ipynb
+++ b/unit1/unit1.ipynb
@@ -17,6 +17,7 @@
   },
   "source": [
    "# DEPRECATED NOTEBOOK, THE NEW UNIT 1 IS HERE: https://huggingface.co/deep-rl-course/unit1/introduction",
+   "\n",
    "**Everything below is deprecated** 👇, the new version of the course is here: https://huggingface.co/deep-rl-course/unit1/introduction",
    "\n",
    "\n",
diff --git a/unit2/README.md b/unit2/README.md
index d546e7f..ff6682d 100644
--- a/unit2/README.md
+++ b/unit2/README.md
@@ -1,4 +1,4 @@
-"# DEPRECIATED UNIT, THE NEW UNIT 2 IS HERE: https://huggingface.co/deep-rl-course/unit2/introduction
+# DEPRECATED UNIT, THE NEW UNIT 2 IS HERE: https://huggingface.co/deep-rl-course/unit2/introduction
 
 **Everything below is deprecated** 👇, the new version of the course is here: https://huggingface.co/deep-rl-course/unit2/introduction
diff --git a/unit2/unit2.ipynb b/unit2/unit2.ipynb
index c11b5bb..c6119cb 100644
--- a/unit2/unit2.ipynb
+++ b/unit2/unit2.ipynb
@@ -16,8 +16,11 @@
   "id": "njb_ProuHiOe"
  },
  "source": [
-  "# DEPRECIATED NOTEBOOK, THE NEW UNIT 2 IS HERE: https://huggingface.co/deep-rl-course/unit2/introduction
-  **Everything under is depreciated** 👇, the new version of the course is here: https://huggingface.co/deep-rl-course/unit2/introduction"
+  "# DEPRECATED NOTEBOOK, THE NEW UNIT 2 IS HERE: https://huggingface.co/deep-rl-course/unit2/introduction",
+  "\n",
+  "**Everything below is deprecated** 👇, the new version of the course is here: https://huggingface.co/deep-rl-course/unit2/introduction",
+  "\n",
+  "\n",
   "# Unit 2: Q-Learning with FrozenLake-v1 ⛄ and Taxi-v3 🚕\n",
   "\n",
   "In this notebook, **you'll code your first Reinforcement Learning agent from scratch**, playing FrozenLake ❄️ using Q-Learning, share it with the community, and experiment with different configurations\n",
diff --git a/unit3/README.md b/unit3/README.md
index 07f17c3..897d8f5 100644
--- a/unit3/README.md
+++ b/unit3/README.md
@@ -1,6 +1,10 @@
+# DEPRECATED, THE NEW VERSION OF THIS UNIT IS HERE: https://huggingface.co/deep-rl-course/unit3/introduction
+**Everything below is deprecated** 👇, the new version of the course is here: https://huggingface.co/deep-rl-course/unit3/introduction
+
+
 # Unit 3: Deep Q-Learning with Atari Games 👾
 
-In this Unit, **we'll study our first Deep Reinforcement Learning agent**: Deep Q-Learning. 
+In this Unit, **we'll study our first Deep Reinforcement Learning agent**: Deep Q-Learning.
 And **we'll train it to play Space Invaders and other Atari environments using [RL-Zoo](https://github.com/DLR-RM/rl-baselines3-zoo)**, a training framework for RL using Stable-Baselines3 that provides scripts for training, evaluating agents, tuning hyperparameters, plotting results, and recording videos.
diff --git a/unit3/unit3.ipynb b/unit3/unit3.ipynb
index 9657cce..fead1ad 100644
--- a/unit3/unit3.ipynb
+++ b/unit3/unit3.ipynb
@@ -16,6 +16,11 @@
   "id": "k7xBVPzoXxOg"
  },
  "source": [
+  "# DEPRECATED NOTEBOOK, THE NEW UNIT 3 IS HERE: https://huggingface.co/deep-rl-course/unit3/introduction",
+  "\n",
+  "**Everything below is deprecated** 👇, the new version of the course is here: https://huggingface.co/deep-rl-course/unit3/introduction",
+  "\n",
+  "\n",
   "# Unit 3: Deep Q-Learning with Atari Games 👾 using RL Baselines3 Zoo\n",
   "\n",
   "In this notebook, **you'll train a Deep Q-Learning agent** playing Space Invaders using [RL Baselines3 Zoo](https://github.com/DLR-RM/rl-baselines3-zoo), a training framework based on [Stable-Baselines3](https://stable-baselines3.readthedocs.io/en/master/) that provides scripts for training, evaluating agents, tuning hyperparameters, plotting results and recording videos.\n",
diff --git a/unit4/README.md b/unit4/README.md
index 2de68c8..9bf4fca 100644
--- a/unit4/README.md
+++ b/unit4/README.md
@@ -1,3 +1,7 @@
+# DEPRECATED, THE NEW VERSION OF THIS UNIT IS HERE: https://huggingface.co/deep-rl-course/unit5/introduction
+**Everything below is deprecated** 👇, the new version of the course is here: https://huggingface.co/deep-rl-course/unit5/introduction
+
+
 # Unit 4: An Introduction to Unity MLAgents with Hugging Face 🤗
 
 ![cover](https://miro.medium.com/max/1400/1*8DV9EFl-vdijvcTHilHuEw.png)
diff --git a/unit4/unit4.ipynb b/unit4/unit4.ipynb
index 79e53aa..9232ad6 100644
--- a/unit4/unit4.ipynb
+++ b/unit4/unit4.ipynb
@@ -16,6 +16,11 @@
   "id": "2D3NL_e4crQv"
  },
  "source": [
+  "# DEPRECATED NOTEBOOK, THE NEW VERSION OF THIS UNIT IS HERE: https://huggingface.co/deep-rl-course/unit5/introduction",
+  "\n",
+  "**Everything below is deprecated** 👇, the new version of the course is here: https://huggingface.co/deep-rl-course/unit5/introduction",
+  "\n",
+  "\n",
   "# Unit 4: Let's learn about Unity ML-Agents with Hugging Face 🤗\n",
   "\n"
  ]
 }
@@ -561,4 +566,4 @@
  },
  "nbformat": 4,
  "nbformat_minor": 0
-}
\ No newline at end of file
+}
diff --git a/unit5/README.md b/unit5/README.md
index 2b50dbe..757f089 100644
--- a/unit5/README.md
+++ b/unit5/README.md
@@ -1,6 +1,10 @@
-# Unit 5: Policy Gradient with PyTorch
+# DEPRECATED, THE NEW VERSION OF THIS UNIT IS HERE: https://huggingface.co/deep-rl-course/unit4/introduction
+**Everything below is deprecated** 👇, the new version of the course is here: https://huggingface.co/deep-rl-course/unit4/introduction
 
-In this Unit, **we'll study Policy Gradient Methods**.
+
+# Unit 5: Policy Gradient with PyTorch
+
+In this Unit, **we'll study Policy Gradient Methods**.
 And we'll **implement Reinforce (a policy gradient method) from scratch using PyTorch**, before testing its robustness on CartPole-v1, PixelCopter, and Pong.
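A minimal sketch of the Reinforce update that the unit 5 README above describes, with a toy policy network and stand-in episode data rather than the notebook's actual code:

```python
import torch
import torch.nn as nn

# Illustrative sketch of Reinforce (Monte Carlo Policy Gradient): after
# collecting one episode, increase the log-probability of each action taken,
# weighted by the discounted return that followed it. Sizes and data below
# are stand-ins, not the course's actual code.

class Policy(nn.Module):
    def __init__(self, obs_dim: int, n_actions: int, hidden: int = 16):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim, hidden),
            nn.ReLU(),
            nn.Linear(hidden, n_actions),
        )

    def forward(self, obs: torch.Tensor) -> torch.distributions.Categorical:
        return torch.distributions.Categorical(logits=self.net(obs))

gamma = 0.99
policy = Policy(obs_dim=4, n_actions=2)  # CartPole-v1-like dimensions
optimizer = torch.optim.Adam(policy.parameters(), lr=1e-2)

# Stand-in episode; in the notebook these come from stepping the environment.
observations = torch.randn(5, 4)
actions = torch.randint(0, 2, (5,))
rewards = [1.0, 1.0, 1.0, 1.0, 1.0]

# Discounted return G_t for every timestep, computed backwards over the episode.
returns, g = [], 0.0
for r in reversed(rewards):
    g = r + gamma * g
    returns.insert(0, g)
returns = torch.tensor(returns)

# Ascend the gradient of sum_t log pi(a_t|s_t) * G_t by descending its negative.
log_probs = policy(observations).log_prob(actions)
loss = -(log_probs * returns).sum()

optimizer.zero_grad()
loss.backward()
optimizer.step()
```

The sign flip on the last loss line turns gradient ascent on expected return into the gradient descent step that PyTorch optimizers perform.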
diff --git a/unit5/unit5.ipynb b/unit5/unit5.ipynb
index b23e2e5..85de7b5 100644
--- a/unit5/unit5.ipynb
+++ b/unit5/unit5.ipynb
@@ -16,6 +16,11 @@
   "id": "CjRWziAVU2lZ"
  },
  "source": [
+  "# DEPRECATED NOTEBOOK, THE NEW VERSION OF THIS UNIT IS HERE: https://huggingface.co/deep-rl-course/unit4/introduction",
+  "\n",
+  "**Everything below is deprecated** 👇, the new version of the course is here: https://huggingface.co/deep-rl-course/unit4/introduction",
+  "\n",
+  "\n",
   "# Unit 5: Code your first Deep Reinforcement Learning Algorithm with PyTorch: Reinforce. And test its robustness 💪\n",
   "In this notebook, you'll code your first Deep Reinforcement Learning algorithm from scratch: Reinforce (also called Monte Carlo Policy Gradient).\n",
   "\n",
diff --git a/unit7/README.md b/unit7/README.md
index ded4fdc..d09bbdb 100644
--- a/unit7/README.md
+++ b/unit7/README.md
@@ -1,3 +1,7 @@
+# DEPRECATED, THE NEW VERSION OF THIS UNIT IS HERE: https://huggingface.co/deep-rl-course/unit6/introduction
+**Everything below is deprecated** 👇, the new version of the course is here: https://huggingface.co/deep-rl-course/unit6/introduction
+
+
 # Unit 7: Advantage Actor Critic (A2C) using Robotics Simulations with PyBullet 🤖
 
 One of the major industries that use Reinforcement Learning is robotics. Unfortunately, **having access to robot equipment is very expensive**. Fortunately, some simulations exist to train robots:
@@ -32,7 +36,7 @@ Thanks to a leaderboard, you'll be able to compare your results with other class
 The leaderboard 👉 https://huggingface.co/spaces/chrisjay/Deep-Reinforcement-Learning-Leaderboard
 
 ## Additional readings 📚
-- [Making Sense of the Bias / Variance Trade-off in (Deep) Reinforcement Learning](https://blog.mlreview.com/making-sense-of-the-bias-variance-trade-off-in-deep-reinforcement-learning-79cf1e83d565) 
+- [Making Sense of the Bias / Variance Trade-off in (Deep) Reinforcement Learning](https://blog.mlreview.com/making-sense-of-the-bias-variance-trade-off-in-deep-reinforcement-learning-79cf1e83d565)
 - [Bias-variance Tradeoff in Reinforcement Learning](https://www.endtoend.ai/blog/bias-variance-tradeoff-in-reinforcement-learning/)
 - [Foundations of Deep RL Series, L3 Policy Gradients and Advantage Estimation by Pieter Abbeel](https://youtu.be/AKbX1Zvo7r8)
diff --git a/unit7/unit7.ipynb b/unit7/unit7.ipynb
index 31e9f2e..f87fa02 100644
--- a/unit7/unit7.ipynb
+++ b/unit7/unit7.ipynb
@@ -34,6 +34,11 @@
 {
  "cell_type": "markdown",
  "source": [
+  "# DEPRECATED NOTEBOOK, THE NEW VERSION OF THIS UNIT IS HERE: https://huggingface.co/deep-rl-course/unit6/introduction",
+  "\n",
+  "**Everything below is deprecated** 👇, the new version of the course is here: https://huggingface.co/deep-rl-course/unit6/introduction",
+  "\n",
+  "\n",
   "# Unit 7: Advantage Actor Critic (A2C) using Robotics Simulations with PyBullet 🤖\n",
   "In this small notebook, you'll learn to use A2C with PyBullet and train an agent to walk. More precisely, a spider (they say Ant, but come on... it's a spider 😆) 🕸️\n",
   "\n",
@@ -533,4 +538,4 @@
    }
   }
  ]
-}
\ No newline at end of file
+}
diff --git a/unit8/README.md b/unit8/README.md
index 61664da..1280536 100644
--- a/unit8/README.md
+++ b/unit8/README.md
@@ -1,3 +1,7 @@
+# DEPRECATED, THE NEW VERSION OF THIS UNIT IS HERE: https://huggingface.co/deep-rl-course/unit8/introduction
+**Everything below is deprecated** 👇, the new version of the course is here: https://huggingface.co/deep-rl-course/unit8/introduction
+
+
 # Unit 8: Proximal Policy Optimization (PPO) with PyTorch
 
 Today we'll learn about Proximal Policy Optimization (PPO), an architecture that improves our agent's training stability by avoiding policy updates that are too large. To do that, we use a ratio that indicates the difference between our current and old policy and clip this ratio to a specific range $[1 - \epsilon, 1 + \epsilon]$. Doing this ensures that our policy update will not be too large and that training is more stable (see the clipped-objective sketch at the end of this document).
@@ -29,7 +33,7 @@ Thanks to a leaderboard, you'll be able to compare your results with other class
 The leaderboard 👉 https://huggingface.co/spaces/chrisjay/Deep-Reinforcement-Learning-Leaderboard
 
 ## Additional readings 📚
-- [Towards Delivering a Coherent Self-Contained Explanation of Proximal Policy Optimization by Daniel Bick](https://fse.studenttheses.ub.rug.nl/25709/1/mAI_2021_BickD.pdf) 
+- [Towards Delivering a Coherent Self-Contained Explanation of Proximal Policy Optimization by Daniel Bick](https://fse.studenttheses.ub.rug.nl/25709/1/mAI_2021_BickD.pdf)
 - [What is the way to understand Proximal Policy Optimization Algorithm in RL?](https://stackoverflow.com/questions/46422845/what-is-the-way-to-understand-proximal-policy-optimization-algorithm-in-rl)
 - [Foundations of Deep RL Series, L4 TRPO and PPO by Pieter Abbeel](https://youtu.be/KjWF8VIMGiY)
 - [OpenAI PPO Blogpost](https://openai.com/blog/openai-baselines-ppo/)
diff --git a/unit8/unit8.ipynb b/unit8/unit8.ipynb
index 2bbd7e1..5a177dd 100644
--- a/unit8/unit8.ipynb
+++ b/unit8/unit8.ipynb
@@ -16,6 +16,11 @@
   "id": "-cf5-oDPjwf8"
  },
  "source": [
+  "# DEPRECATED NOTEBOOK, THE NEW VERSION OF THIS UNIT IS HERE: https://huggingface.co/deep-rl-course/unit8/introduction",
+  "\n",
+  "**Everything below is deprecated** 👇, the new version of the course is here: https://huggingface.co/deep-rl-course/unit8/introduction",
+  "\n",
+  "\n",
   "# Unit 8: Proximal Policy Optimization (PPO) with PyTorch 🤖\n",
   "\n",
   "In this unit, you'll learn to **code your PPO agent from scratch with PyTorch**.\n",
diff --git a/unit9/README.md b/unit9/README.md
index 051fe94..7770e3c 100644
--- a/unit9/README.md
+++ b/unit9/README.md
@@ -1,8 +1,13 @@
+# DEPRECATED, THE NEW VERSION OF THIS UNIT IS HERE: https://huggingface.co/deep-rl-course/unitbonus3/decision-transformers
+**Everything below is deprecated** 👇, the new version of the course is here: https://huggingface.co/deep-rl-course/unitbonus3/decision-transformers
+
+
+
 # Unit 9: Decision Transformers and offline Reinforcement Learning 🤖
 
 ![cover](assets/img/thumbnail.gif)
 
-In this Unit, you'll learn what is Decision Transformer and Offline Reinforcement Learning. And then, you'll train your first Offline Decision Transformer model from scratch to make a half-cheetah run.
+In this Unit, you'll learn what Decision Transformers and Offline Reinforcement Learning are. Then you'll train your first Offline Decision Transformer model from scratch to make a half-cheetah run.
 This course is **self-paced**: you can start whenever you want.
@@ -18,12 +23,12 @@ Here are the steps for this Unit:
 2️⃣ 👩‍💻 Then dive into the first hands-on.
 
 👩‍💻 The hands-on 👉 [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/drive/1K3UuajwoPY1MzRKNkONNRS3gS5DxZ-qF?usp=sharing)
- 
+
 3️⃣ 📖 Read [Train your first Decision Transformer](https://huggingface.co/blog/train-decision-transformers)
-4️⃣ 👩‍💻 Then dive on the hands-on, where **you'll train your first Offline Decision Transformer model from scratch to make a half-cheetah run**. 
+4️⃣ 👩‍💻 Then dive into the hands-on, where **you'll train your first Offline Decision Transformer model from scratch to make a half-cheetah run**.
 
 👩‍💻 The hands-on 👉 https://github.com/huggingface/blog/blob/main/notebooks/101_train-decision-transformers.ipynb
- 
+
 ## How to make the most of this course
 
 To make the most of the course, my advice is to:
diff --git a/units/en/unit0/discord101.mdx b/units/en/unit0/discord101.mdx
index 9904168..e3bda2c 100644
--- a/units/en/unit0/discord101.mdx
+++ b/units/en/unit0/discord101.mdx
@@ -9,11 +9,11 @@ Discord is a free chat platform. If you've used Slack, **it's quite similar**. T
 Starting in Discord can be a bit intimidating, so let me take you through it.
 
-When you sign-up to our Discord server, you'll need to specify which topics you're interested in by **clicking #role-assignment at the left**. 
+When you sign up to our Discord server, you'll need to specify which topics you're interested in by **clicking #role-assignment on the left**.
 
 Discord
 
-In #role-assignment, you can pick different categories. Make sure to **click "Reinforcement Learning"**. You'll then get to **introduce yourself in the `#introduction-yourself` channel**.
+In #role-assignment, you can pick different categories. Make sure to **click "Reinforcement Learning"**. You'll then get to **introduce yourself in the `#introduce-yourself` channel**.
 
 Discord
diff --git a/units/en/unit1/hands-on.mdx b/units/en/unit1/hands-on.mdx
index c2dc4cd..e36ad53 100644
--- a/units/en/unit1/hands-on.mdx
+++ b/units/en/unit1/hands-on.mdx
@@ -43,13 +43,6 @@ You can either do this hands-on by reading the notebook or following it with the
 In this notebook, you'll train your **first Deep Reinforcement Learning agent**: a Lunar Lander agent that will learn to **land correctly on the Moon 🌕** using [Stable-Baselines3](https://stable-baselines3.readthedocs.io/en/master/), a Deep Reinforcement Learning library. You'll share it with the community and experiment with different configurations.
 
-⬇️ Here is an example of what **you will achieve in just a couple of minutes.** ⬇️
-
-```python
-%%html
-
-```
-
 ### The environment 🎮
 
 - [LunarLander-v2](https://www.gymlibrary.dev/environments/box2d/lunar_lander/)
 
@@ -92,7 +85,7 @@ Before diving into the notebook, you need to:
 
 🔲 📝 **Read Unit 0** that gives you all the **information about the course and helps you onboard** 🤗
 
-🔲 📚 **Develop an understanding of the foundations of Reinforcement learning** (MC, TD, Rewards hypothesis...) by doing Unit 1
+🔲 📚 **Develop an understanding of the foundations of Reinforcement learning** by reading Unit 1
 
 ## A small recap of what Deep Reinforcement Learning is 📚
 
 The RL process
diff --git a/units/en/unit1/introduction.mdx b/units/en/unit1/introduction.mdx
index f8017cd..e72ee9e 100644
--- a/units/en/unit1/introduction.mdx
+++ b/units/en/unit1/introduction.mdx
@@ -22,6 +22,6 @@ It's essential **to master these elements** before diving into implementing Dee
 After this unit, in a bonus unit, you'll be **able to train Huggy the Dog 🐶 to fetch the stick and play with him 🤗**.
 
-Huggy
+
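A minimal sketch of the clipped surrogate objective that the unit 8 README above describes, using stand-in tensors rather than the course's actual implementation:

```python
import torch

# Sketch of PPO's clipped surrogate objective. The tensors are made-up
# stand-ins for a batch of transitions, not values from the course.
eps = 0.2                                          # clip range epsilon
log_prob_new = torch.tensor([-0.9, -1.2, -0.3])    # log pi_theta(a|s)
log_prob_old = torch.tensor([-1.0, -1.0, -0.5])    # log pi_theta_old(a|s)
advantages = torch.tensor([0.5, -0.2, 1.0])

# r_t(theta): probability ratio between the current and old policy.
ratio = torch.exp(log_prob_new - log_prob_old)

# Take the elementwise minimum of the unclipped and clipped terms, so the
# update gains nothing by pushing the ratio outside [1 - eps, 1 + eps].
unclipped = ratio * advantages
clipped = torch.clamp(ratio, 1.0 - eps, 1.0 + eps) * advantages
ppo_loss = -torch.min(unclipped, clipped).mean()   # minimize the negative
```

Because the objective only ever credits the more conservative of the two terms, gradients vanish once the ratio drifts past the clip range, which is what keeps the policy update small and training stable.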
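And a rough sketch of the Stable-Baselines3 workflow behind the unit 1 hands-on above: build the environment, train a PPO agent, evaluate it. The environment id matches the hands-on's LunarLander-v2 link; the hyperparameters and timestep budget are placeholders, not the notebook's values:

```python
import gym  # the hands-on targets gym's LunarLander-v2 (requires box2d)
from stable_baselines3 import PPO
from stable_baselines3.common.evaluation import evaluate_policy

env = gym.make("LunarLander-v2")

model = PPO("MlpPolicy", env, verbose=1)  # MLP policy over the 8-dim state
model.learn(total_timesteps=100_000)      # small budget; tune for real runs

mean_reward, std_reward = evaluate_policy(model, env, n_eval_episodes=10)
print(f"mean_reward={mean_reward:.2f} +/- {std_reward:.2f}")

model.save("ppo-LunarLander-v2")          # checkpoint for sharing later
```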