diff --git a/chapters/en/_toctree.yml b/chapters/en/_toctree.yml new file mode 100644 index 0000000..d5541fc --- /dev/null +++ b/chapters/en/_toctree.yml @@ -0,0 +1,90 @@ +- title: Unit 0. Welcome to the course + sections: + - local: unit0/introduction + title: Welcome to the course đŸ€— + - local: unit0/setup + title: Setup + - local: unit0/discord101 + title: Discord 101 +- title: Unit 1. Introduction to Deep Reinforcement Learning + sections: + - local: unit1/introduction + title: Introduction + - local: unit1/what-is-rl + title: What is Reinforcement Learning? + - local: unit1/rl-framework + title: The Reinforcement Learning Framework + - local: unit1/tasks + title: The types of tasks + - local: unit1/exp-exp-tradeoff + title: The Exploration/Exploitation tradeoff + - local: unit1/two-methods + title: The two main approaches for solving RL problems + - local: unit1/deep-rl + title: The “Deep” in Deep Reinforcement Learning + - local: unit1/summary + title: Summary + - local: unit1/hands-on + title: Hands-on + - local: unit1/quiz + title: Quiz + - local: unit1/conclusion + title: Conclusion +- title: Bonus Unit 1. Introduction to Deep Reinforcement Learning with Huggy + sections: + - local: unitbonus1/introduction + title: Introduction +- title: Unit 2. Introduction to Q-Learning + sections: + - local: unit2/introduction + title: Introduction + - local: unit2/what-is-rl + title: What is RL?
A short recap + - local: unit2/two-types-value-based-methods + title: The two types of value-based methods + - local: unit2/bellman-equation + title: The Bellman Equation: simplify our value estimation + - local: unit2/mc-vs-td + title: Monte Carlo vs Temporal Difference Learning + - local: unit2/summary1 + title: Summary + - local: unit2/quiz1 + title: First Quiz + - local: unit2/q-learning + title: Introducing Q-Learning + - local: unit2/q-learning-example + title: A Q-Learning example + - local: unit2/hands-on + title: Hands-on + - local: unit2/quiz2 + title: Second Quiz + - local: unit2/conclusion + title: Conclusion + - local: unit2/additional-reading + title: Additional Reading +- title: Unit 3. Deep Q-Learning with Atari Games + sections: + - local: unit3/introduction + title: Introduction + - local: unit3/from-q-to-dqn + title: From Q-Learning to Deep Q-Learning + - local: unit3/deep-q-network + title: The Deep Q-Network (DQN) + - local: unit3/deep-q-algorithm + title: The Deep Q Algorithm + - local: unit3/hands-on + title: Hands-on + - local: unit3/quiz + title: Quiz + - local: unit3/conclusion + title: Conclusion + - local: unit3/additional-reading + title: Additional Reading +- title: Bonus Unit 2. Automatic Hyperparameter Tuning with Optuna + sections: + - local: unitbonus2/introduction + title: Introduction + - local: unitbonus2/optuna + title: Optuna + - local: unitbonus2/hands-on + title: Hands-on diff --git a/chapters/en/unit0/discord101.mdx b/chapters/en/unit0/discord101.mdx new file mode 100644 index 0000000..bf93ac1 --- /dev/null +++ b/chapters/en/unit0/discord101.mdx @@ -0,0 +1,33 @@ +# Discord 101 [[discord-101]] + +Hey there! My name is Huggy, the dog 🐕, and I'm looking forward to training with you during this RL Course! +Although I don't know much about fetching sticks (yet), I know one or two things about Discord. So I wrote this guide to help you learn about it! + +Huggy Logo + +Discord is a free chat platform.
If you've used Slack, **it's quite similar**. There is a Hugging Face Community Discord server with 8000 members that you can join with a single click here. So many humans to play with! + +Starting out in Discord can be a bit intimidating, so let me take you through it. + +When you sign up to our Discord server, you need to **introduce yourself in the `introduce-yourself` channel**. Then, sign up for the channel groups that interest you by looking at `role-assignment`. + +## So which channels are interesting to me? [[channels]] + +They are in the reinforcement learning lounge. **Don't forget to sign up for these channels** by clicking on đŸ€– Reinforcement Learning in `role-assignment`. +- `rl-announcements`: where we give the **latest information about the course**. +- `rl-discussions`: where you can **discuss RL and share information**. +- `rl-study-group`: where you can **create and join study groups**. + +The HF Community Server has a thriving community of humans interested in many areas, so you can also learn from them. There are paper discussions, events, and many other things. Here are some other channels in the community. + +Was this useful? There are a couple of tips I can share with you: + +- There are **voice channels** you can use as well, although most people prefer text chat. +- You can **use Markdown formatting**. So if you're writing code, you can use that style. Sadly, this does not work as well for links. +- You can open threads as well! It's a good idea when **it's a long conversation**. + +I hope this is useful! And if you have questions, just ask! + +See you later!
+ +Huggy diff --git a/chapters/en/unit0/introduction.mdx b/chapters/en/unit0/introduction.mdx new file mode 100644 index 0000000..4558e39 --- /dev/null +++ b/chapters/en/unit0/introduction.mdx @@ -0,0 +1,123 @@ +# Welcome to the đŸ€— Deep Reinforcement Learning Course [[introduction]] + +Deep RL Course thumbnail + +Welcome to the most fascinating topic in Artificial Intelligence: Deep Reinforcement Learning. + +This course will **teach you about Deep Reinforcement Learning from beginner to expert**. It’s completely free. + +In this unit you’ll: + +- Learn more about the **course content**. +- **Define the path** you’re going to take (either self-audit or certification process). +- Learn more about the **AI vs. AI challenges** you're going to participate in. +- Learn more **about us**. +- **Create your Hugging Face account** (it’s free). +- **Sign up for our Discord server**, the place where you can exchange with your classmates and us. + +Let’s get started! + +## What to expect? [[expect]] + +In this course, you will: + +- 📖 Study Deep Reinforcement Learning in **theory and practice.** +- đŸ§‘â€đŸ’» Learn to **use famous Deep RL libraries** such as Stable Baselines3, RL Baselines3 Zoo, Sample Factory and CleanRL. +- đŸ€– **Train agents in unique environments** such as SnowballFight, Huggy the Doggo đŸ¶, MineRL (Minecraft ⛏), VizDoom (Doom) and classical ones such as Space Invaders and PyBullet. +- đŸ’Ÿ Publish your **trained agents to the Hub in one line of code**, and download powerful agents from the community. +- 🏆 Participate in challenges where you will **evaluate your agents against other teams and play against the AI agents you'll train.** + +And more! + +At the end of this course, **you’ll get a solid foundation from the basics to the SOTA (state-of-the-art) methods**.
+ +You can find the syllabus on our website 👉 here + +Don’t forget to **sign up to the course** (we are collecting your email so that we can **send you the links when each Unit is published and keep you informed about the challenges and updates**). + +Sign up 👉 here + + +## What does the course look like? [[course-look-like]] +The course is composed of: + +- *A theory part*: where you learn a **concept in theory (article)**. +- *A hands-on part*: a **weekly live hands-on session** on ADD DATE at ADD TIME, where you'll learn to use famous Deep RL libraries such as Stable Baselines3, RL Baselines3 Zoo, and RLlib to train your agents in unique environments such as SnowballFight and Huggy the Doggo, and classical ones such as Space Invaders and PyBullet. +We strongly advise you to participate in the live sessions so that you can ask questions, but if you can't, the sessions are recorded and will be posted. +- *Challenges*, such as AI vs. AI matches and a leaderboard. + + +## Two paths: choose your own adventure [[two-paths]] + +Two paths + +You can choose to follow this course either: + +- *To get a certificate of completion*: you need to complete 80% of the assignments before the end of March 2023. +- *As a simple audit*: you can participate in all challenges and do assignments if you want, but you have no deadlines. + +Whatever path you choose, we advise you **to follow the recommended pace to enjoy the course and challenges with the most classmates.** +You don't need to tell us which path you choose. At the end of March, when we verify the assignments, **if you have completed more than 80% of them, you'll get a certificate.** + + + +## How to get the most out of the course? [[advice]] + +To get the most out of the course, we have some advice: + +1. Join or create study groups in Discord: studying in groups is always easier. To do that, you need to join our Discord server. +2. **Do the quizzes and assignments**: the best way to learn is to practice and test yourself. +3.
**Define a schedule to stay in sync: you can use our recommended pace schedule below or create your own.** + +Course advice + +## What tools do I need? [[tools]] + +You need only 3 things: + +- A computer with an internet connection. +- Google Colab (free version): most of our hands-on sessions will use Google Colab, and the **free version is enough.** +- A Hugging Face Account: to push and load models. If you don’t have an account yet, you can create one here (it’s free). + +Course tools needed + + +## What is the recommended pace? [[recommended-pace]] + +We defined a schedule that you can follow to keep pace with the course. + +Course advice +Course advice + + +Each chapter in this course is designed **to be completed in 1 week, with approximately 3-4 hours of work per week**. However, you can take as much time as you need to complete the course. + + +## Who are we? [[who-are-we]] +About the authors: + +Thomas Simonini is a Developer Advocate at Hugging Face đŸ€— specializing in Deep Reinforcement Learning. He founded the Deep Reinforcement Learning Course in 2018, which became one of the most-used courses in Deep RL. + +ADD OMAR + +## When do the challenges start? [[challenges]] + +In this new version of the course, you have two types of challenges: +- A leaderboard to compare your agent's performance to other classmates'. +- AI vs. AI challenges where you can train your agent and compete against other classmates' agents. + +Challenges + +These AI vs. AI challenges will be announced **later in December**. + + +## I found a bug, or I want to improve the course [[contribute]] + +Contributions are welcome đŸ€— + +- If you *find a bug 🐛 in a notebook*, please open an issue and **describe the problem**. +- If you *want to improve the course*, you can open a Pull Request. + +## I still have questions [[questions]] + +In that case, check our FAQ. If your question isn't answered there, ask it in the #rl-discussions channel of our Discord server.
diff --git a/chapters/en/unit0/setup.mdx b/chapters/en/unit0/setup.mdx new file mode 100644 index 0000000..0ad292c --- /dev/null +++ b/chapters/en/unit0/setup.mdx @@ -0,0 +1,30 @@ +# Setup [[setup]] + +After all this information, it's time to get started. We're going to do two things: + +1. Create your Hugging Face account if you haven't already +2. Sign up to Discord and introduce yourself (don't be shy đŸ€—) + +### Let's create your Hugging Face account + +If you haven't already, create a Hugging Face account here + +### Let's join our Discord server + +You can now sign up for our Discord Server. This is the place where you **can exchange with the community and with us, create and join study groups to grow together, and more** + +đŸ‘‰đŸ» Join our Discord server here. + +When you join, remember to introduce yourself in #introduce-yourself and sign up for the reinforcement learning channels in #role-assignments. + +We have multiple RL-related channels: +- `rl-announcements`: where we give the latest information about the course. +- `rl-discussions`: where you can discuss RL and share information. +- `rl-study-group`: where you can create and join study groups. + +If this is your first time using Discord, we wrote a Discord 101 guide covering best practices. Check the next section. + +Congratulations! **You've just finished the onboarding**. You're now ready to start learning Deep Reinforcement Learning. Have fun! + + +### Keep Learning, stay awesome đŸ€— diff --git a/chapters/en/unit1/conclusion.mdx b/chapters/en/unit1/conclusion.mdx new file mode 100644 index 0000000..f41ef14 --- /dev/null +++ b/chapters/en/unit1/conclusion.mdx @@ -0,0 +1,15 @@ +# Conclusion [[conclusion]] + +Congrats on finishing this chapter! **That was the biggest one**, and there was a lot of information. And congrats on finishing the tutorial. You’ve just trained your first Deep RL agents and shared them on the Hub đŸ„ł. + +It’s **normal if you still feel confused by all these elements**.
It was the same for me and for everyone who has studied RL. + +**Take time to really grasp the material** before continuing. It’s important to master these elements and to have a solid foundation before entering the fun part. + +Naturally, during the course, we’re going to use and explain these terms again, but it’s better to understand them before diving into the next chapters. + +In the next chapter, we’re going to reinforce what we just learned by **training Huggy the Dog to fetch the stick**. + +You will then be able to play with him đŸ€—. + +ADD GIF HUGGY diff --git a/chapters/en/unit1/deep-rl.mdx b/chapters/en/unit1/deep-rl.mdx new file mode 100644 index 0000000..fab6091 --- /dev/null +++ b/chapters/en/unit1/deep-rl.mdx @@ -0,0 +1,21 @@ +# The “Deep” in Deep Reinforcement Learning [[deep-rl]] + + +What we've talked about so far is Reinforcement Learning. But where does the "Deep" come into play? + + +Deep Reinforcement Learning introduces **deep neural networks to solve Reinforcement Learning problems** — hence the name “deep”. + +For instance, in the next article, we’ll work on Q-Learning (classic Reinforcement Learning) and then Deep Q-Learning; both are value-based RL algorithms. + +You’ll see the difference is that in the first approach, **we use a traditional algorithm** to create a Q-table that helps us find what action to take for each state. + +In the second approach, **we will use a Neural Network** (to approximate the Q-value). +
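+To make the contrast concrete, here is a minimal sketch of the two approaches. It is illustrative only: the state/action counts and the tiny linear "network" standing in for a real neural network are assumptions, not the course's actual implementation.

```python
import numpy as np

n_states, n_actions = 16, 4

# Classic Q-Learning: a table with one Q-value per (state, action) pair.
q_table = np.zeros((n_states, n_actions))

def q_table_action(state):
    # Look the state's row up directly in the table.
    return int(np.argmax(q_table[state]))

# Deep Q-Learning: a function approximator maps a state to Q-values.
# A single random linear layer stands in for the neural network here.
rng = np.random.default_rng(0)
weights = rng.normal(size=(n_states, n_actions)) * 0.01

def q_network(state):
    one_hot = np.zeros(n_states)
    one_hot[state] = 1.0
    return one_hot @ weights  # estimated Q-values for every action

def q_network_action(state):
    return int(np.argmax(q_network(state)))

print(q_table.shape)       # (16, 4): one cell per (state, action) pair
print(q_network(3).shape)  # (4,): one estimate per action, from the approximator
```

+The table needs one cell per (state, action) pair, which breaks down when the state space is huge (think raw pixels); the approximator sidesteps this by computing Q-values from the state instead of storing them.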
+Value based RL +
Schema inspired by the Q learning notebook by Udacity +
+
+ +If you are not familiar with Deep Learning, you should definitely watch the fastai Practical Deep Learning for Coders course (free) diff --git a/chapters/en/unit1/exp-exp-tradeoff.mdx b/chapters/en/unit1/exp-exp-tradeoff.mdx new file mode 100644 index 0000000..b616f81 --- /dev/null +++ b/chapters/en/unit1/exp-exp-tradeoff.mdx @@ -0,0 +1,36 @@ +# The Exploration/Exploitation tradeoff [[exp-exp-tradeoff]] + +Finally, before looking at the different methods to solve Reinforcement Learning problems, we must cover one more very important topic: *the exploration/exploitation trade-off.* + +- *Exploration* is exploring the environment by trying random actions in order to **find more information about the environment.** +- *Exploitation* is **exploiting known information to maximize the reward.** + +Remember, the goal of our RL agent is to maximize the expected cumulative reward. However, **we can fall into a common trap**. + +Let’s take an example: + +Exploration + +In this game, our mouse can have an **infinite amount of small cheese** (+1 each). But at the top of the maze, there is a gigantic sum of cheese (+1000). + +However, if we only focus on exploitation, our agent will never reach the gigantic sum of cheese. Instead, it will only exploit **the nearest source of rewards,** even if this source is small. + +But if our agent does a little bit of exploration, it can **discover the big reward** (the pile of big cheese). + +This is what we call the exploration/exploitation trade-off. We need to balance how much we **explore the environment** and how much we **exploit what we know about the environment.** + +Therefore, we must **define a rule that helps to handle this trade-off**. We’ll see different ways to handle it in future chapters. + +If it’s still confusing, **think of a real problem: choosing a restaurant:**
+Exploration +
Source: Berkeley AI Course +
+
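+One common rule for handling this trade-off is epsilon-greedy: act greedily on your current estimates most of the time, but pick at random with a small probability epsilon. The sketch below is illustrative only; the options, value estimates, and epsilon are made up for this example.

```python
import random

random.seed(0)  # for reproducibility

# Made-up value estimates for each option (e.g. restaurants you rate).
q_values = {"usual_restaurant": 8.0, "new_restaurant": 3.0}
epsilon = 0.1  # probability of exploring

def epsilon_greedy(q_values, epsilon):
    if random.random() < epsilon:
        # Explore: try a random option, regardless of current estimates.
        return random.choice(list(q_values))
    # Exploit: pick the option with the highest estimated value.
    return max(q_values, key=q_values.get)

choices = [epsilon_greedy(q_values, epsilon) for _ in range(1000)]
# Mostly exploitation, with occasional exploration of the other option.
print(choices.count("usual_restaurant") > choices.count("new_restaurant"))
```

+Tuning epsilon moves the balance: epsilon = 0 never explores, epsilon = 1 never exploits.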
+ +- *Exploitation*: You go every day to the same restaurant that you know is good and **take the risk of missing another, better restaurant.** +- *Exploration*: Try restaurants you have never been to before, with the risk of a bad experience **but also the possible opportunity of a fantastic experience.** + +To recap: +Exploration Exploitation Tradeoff diff --git a/chapters/en/unit1/hands-on.mdx b/chapters/en/unit1/hands-on.mdx new file mode 100644 index 0000000..624e134 --- /dev/null +++ b/chapters/en/unit1/hands-on.mdx @@ -0,0 +1,13 @@ +# Hands on [[hands-on]] + +Now that you've studied the basics of Reinforcement Learning, you’re ready to train your first two agents and share them with the community through the Hub đŸ”„: + +- A Lunar Lander agent that will learn to land correctly on the Moon 🌕 +- A car that needs to reach the top of the mountain ⛰. + +TODO: Add illustration MountainCar and MoonLanding + + +Thanks to our leaderboard, you'll be able to compare your results with other classmates and exchange best practices to improve your agents' scores. Who will win the challenge for Unit 1 🏆? + +So let's get started! 🚀 diff --git a/chapters/en/unit1/introduction.mdx b/chapters/en/unit1/introduction.mdx new file mode 100644 index 0000000..7f9860e --- /dev/null +++ b/chapters/en/unit1/introduction.mdx @@ -0,0 +1,26 @@ +# Introduction to Deep Reinforcement Learning [[introduction-to-deep-reinforcement-learning]] + + +TODO: ADD IMAGE THUMBNAIL + + +Welcome to the most fascinating topic in Artificial Intelligence: **Deep Reinforcement Learning.** + +Deep RL is a type of Machine Learning where an agent learns **how to behave** in an environment **by performing actions** and **seeing the results.** + +So in this first chapter, **you'll learn the foundations of Deep Reinforcement Learning.** + +Then, you'll **train your first two Deep Reinforcement Learning agents** using Stable-Baselines3, a Deep Reinforcement Learning library: + +1.
A Lunar Lander agent that will learn to **land correctly on the Moon 🌕** +2. A car that needs **to reach the top of the mountain ⛰**. + +TODO: Add illustration MountainCar and MoonLanding + +And finally, you'll **upload them to the Hugging Face Hub đŸ€—, a free, open platform where people can share ML models, datasets, and demos.** + +TODO: ADD model card illustration + +It's essential **to master these elements** before diving into implementing Deep Reinforcement Learning agents. The goal of this chapter is to give you solid foundations. + +So let's get started! 🚀 diff --git a/chapters/en/unit1/quiz.mdx b/chapters/en/unit1/quiz.mdx new file mode 100644 index 0000000..1f55843 --- /dev/null +++ b/chapters/en/unit1/quiz.mdx @@ -0,0 +1,168 @@ +# Quiz [[quiz]] + +The best way to learn and [to avoid the illusion of competence](https://fr.coursera.org/lecture/learning-how-to-learn/illusions-of-competence-BuFzf) **is to test yourself.** This will help you find **where you need to reinforce your knowledge**. + +### Q1: What is Reinforcement Learning? +
+Solution + +Reinforcement learning is a **framework for solving control tasks (also called decision problems)** by building agents that learn from the environment by interacting with it through trial and error and **receiving rewards (positive or negative) as the only feedback**. +
+ + + +### Q2: Define the RL Loop + +Exercise RL Loop + +At every step: +- Our Agent receives ______ from the environment +- Based on that ______, the Agent takes an ______ +- Our Agent will move to the right +- The Environment goes to a ______ +- The Environment gives a ______ to the Agent + + + + +### Q3: What's the difference between a state and an observation? + +