Merge pull request #99 from huggingface/v2.0

Publish Unit 0, Unit 1 and Bonus Unit 1
This commit is contained in:
Thomas Simonini
2022-12-05 19:05:05 +01:00
committed by GitHub
26 changed files with 6131 additions and 1 deletions


@@ -1,4 +1,19 @@
# The Hugging Face Deep Reinforcement Learning Class 🤗
We're launching a **new version (v2.0) of the course starting December 5th.**
- The syllabus 📚: https://simoninithomas.github.io/deep-rl-course
- The course 📚:
- **Sign up here** ➡️➡️➡️ http://eepurl.com/ic5ZUD
<br>
<br>
<br>
<br>
# The documentation below is for v1.0 (deprecated)
We're launching a **new version (v2.0) of the course starting December 5th.**


@@ -0,0 +1,557 @@
{
"cells": [
{
"cell_type": "markdown",
"metadata": {
"id": "2D3NL_e4crQv"
},
"source": [
"# Bonus Unit 1: Let's train Huggy the Dog 🐶 to fetch a stick"
]
},
{
"cell_type": "markdown",
"source": [
"<img src=\"https://huggingface.co/datasets/huggingface-deep-rl-course/course-images/resolve/main/en/unit2/thumbnail.png\" alt=\"Bonus Unit 1 Thumbnail\">\n",
"\n",
"In this notebook, we'll reinforce what we learned in the first Unit by **teaching Huggy the Dog to fetch the stick and then playing with him directly in your browser**.\n",
"\n",
"⬇️ Here is an example of what **you will achieve at the end of the unit.** ⬇️ (launch ▶ to see)"
],
"metadata": {
"id": "FMYrDriDujzX"
}
},
{
"cell_type": "code",
"source": [
"%%html\n",
"<video controls autoplay><source src=\"https://huggingface.co/datasets/huggingface-deep-rl-course/course-images/resolve/main/en/notebooks/unit-bonus1/huggy.mp4\" type=\"video/mp4\"></video>"
],
"metadata": {
"id": "PnVhs1yYNyUF"
},
"execution_count": null,
"outputs": []
},
{
"cell_type": "markdown",
"source": [
"### The environment 🎮\n",
"\n",
"- Huggy the Dog, an environment created by [Thomas Simonini](https://twitter.com/ThomasSimonini) based on [Puppo The Corgi](https://blog.unity.com/technology/puppo-the-corgi-cuteness-overload-with-the-unity-ml-agents-toolkit)\n",
"\n",
"### The library used 📚\n",
"\n",
"- [MLAgents (Hugging Face version)](https://github.com/huggingface/ml-agents)"
],
"metadata": {
"id": "x7oR6R-ZIbeS"
}
},
{
"cell_type": "markdown",
"source": [
"We're constantly trying to improve our tutorials, so **if you find some issues in this notebook**, please [open an issue on the Github Repo](https://github.com/huggingface/deep-rl-class/issues)."
],
"metadata": {
"id": "60yACvZwO0Cy"
}
},
{
"cell_type": "markdown",
"source": [
"## Objectives of this notebook 🏆\n",
"\n",
"At the end of the notebook, you will:\n",
"\n",
"- Understand **the state space, action space and reward function used to train Huggy**.\n",
"- **Train your own Huggy** to fetch the stick.\n",
"- Be able to play **with your trained Huggy directly in your browser**.\n",
"\n",
"\n"
],
"metadata": {
"id": "Oks-ETYdO2Dc"
}
},
{
"cell_type": "markdown",
"source": [
"## This notebook is from Deep Reinforcement Learning Course\n",
"<img src=\"https://huggingface.co/datasets/huggingface-deep-rl-course/course-images/resolve/main/en/notebooks/deep-rl-course-illustration.jpg\" alt=\"Deep RL Course illustration\"/>"
],
"metadata": {
"id": "mUlVrqnBv2o1"
}
},
{
"cell_type": "markdown",
"source": [
"In this free course, you will:\n",
"\n",
"- 📖 Study Deep Reinforcement Learning in **theory and practice**.\n",
"- 🧑‍💻 Learn to **use famous Deep RL libraries** such as Stable Baselines3, RL Baselines3 Zoo, CleanRL and Sample Factory 2.0.\n",
"- 🤖 Train **agents in unique environments**.\n",
"\n",
"And more! Check 📚 the syllabus 👉 https://simoninithomas.github.io/deep-rl-course\n",
"\n",
"Don't forget to **<a href=\"http://eepurl.com/ic5ZUD\">sign up to the course</a>** (we collect your email so we can **send you the links when each Unit is published, and give you information about the challenges and updates).**\n",
"\n",
"\n",
"The best way to keep in touch is to join our discord server to exchange with the community and with us 👉🏻 https://discord.gg/ydHrjt3WP5"
],
"metadata": {
"id": "pAMjaQpHwB_s"
}
},
{
"cell_type": "markdown",
"source": [
"## Prerequisites 🏗️\n",
"\n",
"Before diving into the notebook, you need to:\n",
"\n",
"🔲 📚 **Develop an understanding of the foundations of Reinforcement Learning** (MC, TD, the reward hypothesis...) by doing Unit 1\n",
"\n",
"🔲 📚 **Read the introduction to Huggy** by doing Bonus Unit 1"
],
"metadata": {
"id": "6r7Hl0uywFSO"
}
},
{
"cell_type": "markdown",
"source": [
"## Set the GPU 💪\n",
"- To **accelerate the agent's training, we'll use a GPU**. To do that, go to `Runtime > Change Runtime type`\n",
"\n",
"<img src=\"https://huggingface.co/datasets/huggingface-deep-rl-course/course-images/resolve/main/en/notebooks/gpu-step1.jpg\" alt=\"GPU Step 1\">"
],
"metadata": {
"id": "DssdIjk_8vZE"
}
},
{
"cell_type": "markdown",
"source": [
"- `Hardware Accelerator > GPU`\n",
"\n",
"<img src=\"https://huggingface.co/datasets/huggingface-deep-rl-course/course-images/resolve/main/en/notebooks/gpu-step2.jpg\" alt=\"GPU Step 2\">"
],
"metadata": {
"id": "sTfCXHy68xBv"
}
},
{
"cell_type": "markdown",
"metadata": {
"id": "an3ByrXYQ4iK"
},
"source": [
"## Clone the repository and install the dependencies 🔽\n",
"\n",
"- We need to clone the repository that **contains the experimental version of the library, which allows you to push your trained agent to the Hub.**"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "6WNoL04M7rTa"
},
"outputs": [],
"source": [
"%%capture\n",
"# Clone this specific repository (can take 3min)\n",
"!git clone https://github.com/huggingface/ml-agents/"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "d8wmVcMk7xKo"
},
"outputs": [],
"source": [
"%%capture\n",
"# Go inside the repository and install the package (can take 3min)\n",
"%cd ml-agents\n",
"!pip3 install -e ./ml-agents-envs\n",
"!pip3 install -e ./ml-agents"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "HRY5ufKUKfhI"
},
"source": [
"## Download and move the environment zip file into `./trained-envs-executables/linux/`\n",
"\n",
"- Our environment executable is in a zip file.\n",
"- We need to download it and place it in `./trained-envs-executables/linux/`"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "C9Ls6_6eOKiA"
},
"outputs": [],
"source": [
"!mkdir ./trained-envs-executables\n",
"!mkdir ./trained-envs-executables/linux"
]
},
{
"cell_type": "code",
"source": [
"!wget --load-cookies /tmp/cookies.txt \"https://docs.google.com/uc?export=download&confirm=$(wget --quiet --save-cookies /tmp/cookies.txt --keep-session-cookies --no-check-certificate 'https://docs.google.com/uc?export=download&id=1zv3M95ZJTWHUVOWT6ckq_cm98nft8gdF' -O- | sed -rn 's/.*confirm=([0-9A-Za-z_]+).*/\\1\\n/p')&id=1zv3M95ZJTWHUVOWT6ckq_cm98nft8gdF\" -O ./trained-envs-executables/linux/Huggy.zip && rm -rf /tmp/cookies.txt"
],
"metadata": {
"id": "EB-G-80GsxYN"
},
"execution_count": null,
"outputs": []
},
{
"cell_type": "markdown",
"metadata": {
"id": "jsoZGxr1MIXY"
},
"source": [
"Download the file Huggy.zip from https://drive.google.com/uc?export=download&id=1zv3M95ZJTWHUVOWT6ckq_cm98nft8gdF using `wget`. Check out the full explanation of how to download large files from Google Drive [here](https://bcrf.biochem.wisc.edu/2021/02/05/download-google-drive-files-using-wget/)."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "8FPx0an9IAwO"
},
"outputs": [],
"source": [
"%%capture\n",
"!unzip -d ./trained-envs-executables/linux/ ./trained-envs-executables/linux/Huggy.zip"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "nyumV5XfPKzu"
},
"source": [
"Make sure your file is accessible."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "EdFsLJ11JvQf"
},
"outputs": [],
"source": [
"!chmod -R 755 ./trained-envs-executables/linux/Huggy"
]
},
{
"cell_type": "markdown",
"source": [
"## Let's recap how this environment works\n",
"\n",
"### The State Space: what Huggy \"perceives.\"\n",
"\n",
"Huggy doesn't \"see\" his environment. Instead, we provide him with information about the environment:\n",
"\n",
"- The target (stick) position\n",
"- The relative position between himself and the target\n",
"- The orientation of his legs.\n",
"\n",
"Given all this information, Huggy **can decide which action to take next to fulfill his goal**.\n",
"\n",
"<img src=\"https://huggingface.co/datasets/huggingface-deep-rl-course/course-images/resolve/main/en/notebooks/unit-bonus1/huggy.jpg\" alt=\"Huggy\" width=\"100%\">\n",
"\n",
"\n",
"### The Action Space: what moves Huggy can do\n",
"<img src=\"https://huggingface.co/datasets/huggingface-deep-rl-course/course-images/resolve/main/en/notebooks/unit-bonus1/huggy-action.jpg\" alt=\"Huggy action\" width=\"100%\">\n",
"\n",
"**Joint motors drive Huggy's legs**. This means that to reach the target, Huggy needs to **learn to rotate the joint motors of each of his legs correctly so he can move**.\n",
"\n",
"### The Reward Function\n",
"\n",
"The reward function is designed so that **Huggy will fulfill his goal**: fetch the stick.\n",
"\n",
"Remember that one of the foundations of Reinforcement Learning is the *reward hypothesis*: a goal can be described as the **maximization of the expected cumulative reward**.\n",
"\n",
"Here, our goal is for Huggy to **go towards the stick without spinning too much**. Hence, our reward function must reflect this goal.\n",
"\n",
"Our reward function:\n",
"\n",
"<img src=\"https://huggingface.co/datasets/huggingface-deep-rl-course/course-images/resolve/main/en/notebooks/unit-bonus1/reward.jpg\" alt=\"Huggy reward function\" width=\"100%\">\n",
"\n",
"- *Orientation bonus*: we **reward him for getting close to the target**.\n",
"- *Time penalty*: a fixed penalty applied at every action to **force him to get to the stick as fast as possible**.\n",
"- *Rotation penalty*: we penalize Huggy if **he spins too much and turns too quickly**.\n",
"- *Getting to the target reward*: we reward Huggy for **reaching the target**."
],
"metadata": {
"id": "dYKVj8yUvj55"
}
},
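{
"cell_type": "markdown",
"source": [
"The four reward components above can be sketched in Python. This is an illustrative sketch only: the real reward is computed inside the Unity environment, and the function name and coefficient values here are hypothetical:"
],
"metadata": {}
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Illustrative sketch of Huggy's reward shaping (hypothetical coefficients).\n",
"def huggy_reward(facing_target, spin_speed, reached_target):\n",
"    reward = 0.0\n",
"    reward += 0.01 * facing_target   # orientation bonus: reward getting close to the target\n",
"    reward -= 0.001                  # time penalty: fixed penalty at every action\n",
"    reward -= 0.05 * spin_speed      # rotation penalty: discourage spinning too much\n",
"    if reached_target:\n",
"        reward += 1.0                # getting-to-the-target reward\n",
"    return reward"
]
},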
{
"cell_type": "markdown",
"source": [
"## Check the Huggy config file\n",
"\n",
"- In ML-Agents, you define the **training hyperparameters in config.yaml files.**\n",
"\n",
"- For the scope of this notebook, we're not going to modify the hyperparameters, but if you want to experiment, you can try modifying them; Unity provides very [good documentation explaining each of them here](https://github.com/Unity-Technologies/ml-agents/blob/main/docs/Training-Configuration-File.md)."
],
"metadata": {
"id": "NAuEq32Mwvtz"
}
},
{
"cell_type": "markdown",
"source": [
"- **In case you want to modify the hyperparameters**, in a Google Colab notebook you can open the config.yaml at `/content/ml-agents/config/ppo/Huggy.yaml`\n",
"\n",
"\n",
"We're now ready to train our agent 🔥."
],
"metadata": {
"id": "r9wv5NYGw-05"
}
},
{
"cell_type": "markdown",
"metadata": {
"id": "f9fI555bO12v"
},
"source": [
"## Train our agent\n",
"\n",
"To train our agent, we just need to **launch mlagents-learn and select the executable containing the environment.**\n",
"\n",
"<img src=\"https://huggingface.co/datasets/huggingface-deep-rl-course/course-images/resolve/main/en/notebooks/unit-bonus1/mllearn.png\" alt=\"ml learn function\" width=\"100%\">\n",
"\n",
"With ML Agents, we run a training script. We define four parameters:\n",
"\n",
"1. `mlagents-learn <config>`: the path to the hyperparameter config file.\n",
"2. `--env`: where the environment executable is.\n",
"3. `--run-id`: the name you want to give to your training run.\n",
"4. `--no-graphics`: don't launch the visualization during training.\n",
"\n",
"Train the model and use the `--resume` flag to continue training in case of interruption.\n",
"\n",
"> The first time you use `--resume`, it may fail; try running the block again to bypass the error.\n",
"\n"
]
},
{
"cell_type": "markdown",
"source": [
"The training will take 30 to 45 minutes depending on your machine (don't forget to **set up a GPU**). Go take a ☕, you deserve it 🤗."
],
"metadata": {
"id": "lN32oWF8zPjs"
}
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "bS-Yh1UdHfzy"
},
"outputs": [],
"source": [
"!mlagents-learn ./config/ppo/Huggy.yaml --env=./trained-envs-executables/linux/Huggy/Huggy --run-id=\"Huggy\" --no-graphics"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "5Vue94AzPy1t"
},
"source": [
"## Push the agent to the 🤗 Hub\n",
"\n",
"- Now that we've trained our agent, we're **ready to push it to the Hub so you can play with Huggy in your browser 🔥.**"
]
},
{
"cell_type": "markdown",
"source": [
"To be able to share your model with the community, there are a few more steps to follow:\n",
"\n",
"1⃣ (If it's not already done) create a Hugging Face account ➡ https://huggingface.co/join\n",
"\n",
"2⃣ Sign in, then store your authentication token from the Hugging Face website.\n",
"- Create a new token (https://huggingface.co/settings/tokens) **with the write role**\n",
"\n",
"<img src=\"https://huggingface.co/datasets/huggingface-deep-rl-course/course-images/resolve/main/en/notebooks/create-token.jpg\" alt=\"Create HF Token\">\n",
"\n",
"- Copy the token \n",
"- Run the cell below and paste the token"
],
"metadata": {
"id": "izT6FpgNzZ6R"
}
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "rKt2vsYoK56o"
},
"outputs": [],
"source": [
"from huggingface_hub import notebook_login\n",
"notebook_login()"
]
},
{
"cell_type": "markdown",
"source": [
"If you don't want to use Google Colab or a Jupyter Notebook, use this command instead: `huggingface-cli login`"
],
"metadata": {
"id": "ew59mK19zjtN"
}
},
{
"cell_type": "markdown",
"metadata": {
"id": "Xi0y_VASRzJU"
},
"source": [
"Then, we simply need to run `mlagents-push-to-hf`.\n",
"\n",
"<img src=\"https://huggingface.co/datasets/huggingface-deep-rl-course/course-images/resolve/main/en/notebooks/unit-bonus1/mlpush.png\" alt=\"ml learn function\" width=\"100%\">"
]
},
{
"cell_type": "markdown",
"source": [
"And we define 4 parameters:\n",
"\n",
"1. `--run-id`: the name of the training run.\n",
"2. `--local-dir`: where the agent was saved; it's `results/<run-id>`, so in my case `results/Huggy`.\n",
"3. `--repo-id`: the name of the Hugging Face repo you want to create or update. It's always `<your huggingface username>/<the repo name>`.\n",
"If the repo does not exist, **it will be created automatically**.\n",
"4. `--commit-message`: since HF repos are git repositories, you need to define a commit message."
],
"metadata": {
"id": "KK4fPfnczunT"
}
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "dGEFAIboLVc6"
},
"outputs": [],
"source": [
"!mlagents-push-to-hf --run-id=\"HuggyTraining\" --local-dir=\"./results/Huggy\" --repo-id=\"ThomasSimonini/ppo-Huggy\" --commit-message=\"Huggy\""
]
},
{
"cell_type": "markdown",
"source": [
"If everything worked, you should see this at the end of the process (but with a different URL 😆):\n",
"\n",
"\n",
"\n",
"```\n",
"Your model is pushed to the hub. You can view your model here: https://huggingface.co/ThomasSimonini/ppo-Huggy\n",
"```\n",
"\n",
"It's the link to your model repository. The repository contains a model card that explains how to use the model, your TensorBoard logs, and your config file. **What's awesome is that it's a git repository, which means you can have different commits, update your repository with a new push, open Pull Requests, etc.**\n",
"\n",
"<img src=\"https://huggingface.co/datasets/huggingface-deep-rl-course/course-images/resolve/main/en/notebooks/unit-bonus1/modelcard.png\" alt=\"ml learn function\" width=\"100%\">"
],
"metadata": {
"id": "yborB0850FTM"
}
},
{
"cell_type": "markdown",
"source": [
"But now comes the best: **being able to play with Huggy online 👀.**"
],
"metadata": {
"id": "5Uaon2cg0NrL"
}
},
{
"cell_type": "markdown",
"source": [
"## Play with your Huggy 🐕\n",
"\n",
"This step is the simplest:\n",
"\n",
"- Open the game Huggy in your browser: https://huggingface.co/spaces/ThomasSimonini/Huggy\n",
"\n",
"- Click on \"Play with my Huggy model\"\n",
"\n",
"<img src=\"https://huggingface.co/datasets/huggingface-deep-rl-course/course-images/resolve/main/en/notebooks/unit-bonus1/load-huggy.jpg\" alt=\"load-huggy\" width=\"100%\">"
],
"metadata": {
"id": "VMc4oOsE0QiZ"
}
},
{
"cell_type": "markdown",
"source": [
"1. In step 1, choose your model repository, which is the model id (in my case `ThomasSimonini/ppo-Huggy`).\n",
"\n",
"2. In step 2, **choose what model you want to replay**:\n",
"  - I have multiple ones, since we saved a model every 500,000 timesteps.\n",
"  - Since I want the most recent one, I choose `Huggy.onnx`.\n",
"\n",
"👉 What's nice is **to try different model checkpoints to see the improvement of the agent.**"
],
"metadata": {
"id": "Djs8c5rR0Z8a"
}
},
{
"cell_type": "markdown",
"source": [
"Congrats on finishing this bonus unit!\n",
"\n",
"You can now sit back and enjoy playing with your Huggy 🐶. And don't **forget to spread the love by sharing Huggy with your friends 🤗**. If you share it on social media, **please tag us @huggingface and me @simoninithomas**!\n",
"\n",
"<img src=\"https://huggingface.co/datasets/huggingface-deep-rl-course/course-images/resolve/main/en/notebooks/unit-bonus1/huggy-cover.jpeg\" alt=\"Huggy cover\" width=\"100%\">\n",
"\n",
"\n",
"## Keep Learning, Stay awesome 🤗"
],
"metadata": {
"id": "PI6dPWmh064H"
}
}
],
"metadata": {
"accelerator": "GPU",
"colab": {
"provenance": [],
"private_outputs": true
},
"gpuClass": "standard",
"kernelspec": {
"display_name": "Python 3",
"name": "python3"
},
"language_info": {
"name": "python"
}
},
"nbformat": 4,
"nbformat_minor": 0
}

1144
notebooks/unit1/unit1.ipynb Normal file

File diff suppressed because one or more lines are too long

1743
notebooks/unit2/unit2.ipynb Normal file

File diff suppressed because it is too large Load Diff

794
notebooks/unit3/unit3.ipynb Normal file

File diff suppressed because one or more lines are too long

46
units/en/_toctree.yml Normal file

@@ -0,0 +1,46 @@
- title: Unit 0. Welcome to the course
sections:
- local: unit0/introduction
title: Welcome to the course 🤗
- local: unit0/setup
title: Setup
- local: unit0/discord101
title: Discord 101
- title: Unit 1. Introduction to Deep Reinforcement Learning
sections:
- local: unit1/introduction
title: Introduction
- local: unit1/what-is-rl
title: What is Reinforcement Learning?
- local: unit1/rl-framework
title: The Reinforcement Learning Framework
- local: unit1/tasks
title: The type of tasks
- local: unit1/exp-exp-tradeoff
title: The Exploration/Exploitation trade-off
- local: unit1/two-methods
title: The two main approaches for solving RL problems
- local: unit1/deep-rl
title: The “Deep” in Deep Reinforcement Learning
- local: unit1/summary
title: Summary
- local: unit1/hands-on
title: Hands-on
- local: unit1/quiz
title: Quiz
- local: unit1/conclusion
title: Conclusion
- local: unit1/additional-readings
title: Additional Readings
- title: Bonus Unit 1. Introduction to Deep Reinforcement Learning with Huggy
sections:
- local: unitbonus1/introduction
title: Introduction
- local: unitbonus1/how-huggy-works
title: How Huggy works
- local: unitbonus1/train
title: Train Huggy
- local: unitbonus1/play
title: Play with Huggy
- local: unitbonus1/conclusion
title: Conclusion


@@ -0,0 +1,33 @@
# Discord 101 [[discord-101]]
Hey there! My name is Huggy, the dog 🐕, and I'm looking forward to training with you during this RL Course!
Although I don't know much about fetching sticks (yet), I know one or two things about Discord. So I wrote this guide to help you learn about it!
<img src="https://huggingface.co/datasets/huggingface-deep-rl-course/course-images/resolve/main/en/unit0/huggy-logo.jpg" alt="Huggy Logo"/>
Discord is a free chat platform. If you've used Slack, **it's quite similar**. There is a Hugging Face Community Discord server with 18000 members you can <a href="https://discord.gg/ydHrjt3WP5">join with a single click here</a>. So many humans to play with!
Starting in Discord can be a bit intimidating, so let me take you through it.
When you sign up for our Discord server, you'll need to specify which topics you're interested in by **clicking #role-assignment at the left**. Here, you can pick different categories. Make sure to **click "Reinforcement Learning"**! 🔥 You'll then get to **introduce yourself in the `#introduction-yourself` channel**.
## So which channels are interesting to me? [[channels]]
They are in the reinforcement learning lounge. **Don't forget to sign up for these channels** by clicking on 🤖 Reinforcement Learning in `role-assignment`.
- `rl-announcements`: where we give the **latest information about the course**.
- `rl-discussions`: where you can **discuss RL and share information**.
- `rl-study-group`: where you can **create and join study groups**.
The HF Community Server has a thriving community of human beings interested in many areas, so you can also learn from those. There are paper discussions, events, and many other things.
Was this useful? There are a couple of tips I can share with you:
- There are **voice channels** you can use as well, although most people prefer text chat.
- You can **use markdown style** for text chats. So if you're writing code, you can use that style. Sadly this does not work as well for links.
- You can open threads as well! It's a good idea when **it's a long conversation**.
I hope this is useful! And if you have questions, just ask!
See you later!
Huggy 🐶


@@ -0,0 +1,126 @@
# Welcome to the 🤗 Deep Reinforcement Learning Course [[introduction]]
<img src="https://huggingface.co/datasets/huggingface-deep-rl-course/course-images/resolve/main/en/unit0/thumbnail.jpg" alt="Deep RL Course thumbnail" width="100%"/>
Welcome to the most fascinating topic in Artificial Intelligence: Deep Reinforcement Learning.
This course will **teach you about Deep Reinforcement Learning from beginner to expert**. It's completely free and open-source!
In this introduction unit you'll:
- Learn more about the **course content**.
- **Define the path** you're going to take (either self-audit or certification process).
- Learn more about the **AI vs. AI challenges** you're going to participate in.
- Learn more **about us**.
- **Create your Hugging Face account** (it's free).
- **Sign up for our Discord server**, the place where you can exchange with your classmates and us (the Hugging Face team).
Let's get started!
## What to expect? [[expect]]
In this course, you will:
- 📖 Study Deep Reinforcement Learning in **theory and practice.**
- 🧑‍💻 Learn to **use famous Deep RL libraries** such as [Stable Baselines3](https://stable-baselines3.readthedocs.io/en/master/), [RL Baselines3 Zoo](https://github.com/DLR-RM/rl-baselines3-zoo), [Sample Factory](https://samplefactory.dev/) and [CleanRL](https://github.com/vwxyzjn/cleanrl).
- 🤖 **Train agents in unique environments** such as [SnowballFight](https://huggingface.co/spaces/ThomasSimonini/SnowballFight), [Huggy the Doggo 🐶](https://huggingface.co/spaces/ThomasSimonini/Huggy), [MineRL (Minecraft)](https://minerl.io/), [VizDoom (Doom)](https://vizdoom.cs.put.edu.pl/) and classical ones such as [Space Invaders](https://www.gymlibrary.dev/environments/atari/) and [PyBullet](https://pybullet.org/wordpress/).
- 💾 Share your **trained agents to the Hub with one line of code** and also download powerful agents from the community.
- 🏆 Participate in challenges where you will **evaluate your agents against other teams. You'll also get to play against the agents you'll train.**
And more!
At the end of this course, **you'll get a solid foundation from the basics to the SOTA (state-of-the-art) methods**.
You can find the syllabus on our website 👉 <a href="https://simoninithomas.github.io/deep-rl-course/">here</a>
Don't forget to **<a href="http://eepurl.com/ic5ZUD">sign up to the course</a>** (we collect your email so we can **send you the links when each Unit is published, and give you information about the challenges and updates).**
Sign up 👉 <a href="http://eepurl.com/ic5ZUD">here</a>
## What does the course look like? [[course-look-like]]
The course is composed of:
- *A theory part*: where you learn a **concept in theory (article)**.
- *A hands-on part*: where you'll learn **to use famous Deep RL libraries** to train your agents in unique environments. These hands-ons will be **Google Colab notebooks with companion tutorial videos** if you prefer learning with a video format!
- *Challenges*: you'll get to put your agent to compete against other agents in different challenges. There will also be leaderboards for you to compare the agents' performance.
## Two paths: choose your own adventure [[two-paths]]
<img src="https://huggingface.co/datasets/huggingface-deep-rl-course/course-images/resolve/main/en/unit0/two-paths.jpg" alt="Two paths" width="100%"/>
You can choose to follow this course either:
- *To get a certificate of completion*: you need to complete 80% of the assignments before the end of March 2023.
- *As a simple audit*: you can participate in all challenges and do assignments if you want, but you have no deadlines.
Both paths **are completely free**.
Whatever path you choose, we advise you **to follow the recommended pace to enjoy the course and challenges with your fellow classmates.**
You don't need to tell us which path you choose. At the end of March, when we verify the assignments, **if you got more than 80% of them done, you'll get a certificate.**
## How to get the most out of the course [[advice]]
To get the most out of the course, we have some advice:
1. <a href="https://discord.gg/ydHrjt3WP5">Join or create study groups in Discord </a>: studying in groups is always easier. To do that, you need to join our discord server. If you're new to Discord, no worries! We have some tools that will help you learn about it.
2. **Do the quizzes and assignments**: the best way to learn is to do and test yourself.
3. **Define a schedule to stay in sync**: you can use our recommended pace schedule below or create yours.
<img src="https://huggingface.co/datasets/huggingface-deep-rl-course/course-images/resolve/main/en/unit0/advice.jpg" alt="Course advice" width="100%"/>
## What tools do I need? [[tools]]
You need only 3 things:
- *A computer* with an internet connection.
- *Google Colab (free version)*: most of our hands-on will use Google Colab, the **free version is enough.**
- A *Hugging Face Account*: to push and load models. If you dont have an account yet, you can create one **[here](https://hf.co/join)** (its free).
<img src="https://huggingface.co/datasets/huggingface-deep-rl-course/course-images/resolve/main/en/unit0/tools.jpg" alt="Course tools needed" width="100%"/>
## What is the recommended pace? [[recommended-pace]]
We've defined a schedule that you can follow to keep pace with the course.
<img src="https://huggingface.co/datasets/huggingface-deep-rl-course/course-images/resolve/main/en/unit0/pace1.jpg" alt="Course advice" width="100%"/>
<img src="https://huggingface.co/datasets/huggingface-deep-rl-course/course-images/resolve/main/en/unit0/pace2.jpg" alt="Course advice" width="100%"/>
Each chapter in this course is designed **to be completed in 1 week, with approximately 3-4 hours of work per week**. However, you can take as much time as necessary to complete the course. If you want to dive into a topic more in-depth, we'll provide additional resources to help you achieve that.
## Who are we [[who-are-we]]
About the author:
- <a href="https://twitter.com/ThomasSimonini">Thomas Simonini</a> is a Developer Advocate at Hugging Face 🤗 specializing in Deep Reinforcement Learning. He founded the Deep Reinforcement Learning Course in 2018, which became one of the most used courses in Deep RL.
About the team:
- <a href="https://twitter.com/osanseviero">Omar Sanseviero</a> is a Machine Learning Engineer at Hugging Face, where he works at the intersection of ML, community, and open source. Previously, Omar worked as a Software Engineer at Google on the Assistant and TensorFlow Graphics teams. He is from Peru and likes llamas 🦙.
- <a href="https://twitter.com/RisingSayak"> Sayak Paul</a> is a Developer Advocate Engineer at Hugging Face. He's interested in the area of representation learning (self-supervision, semi-supervision, model robustness). And he loves watching crime and action thrillers 🔪.
## When do the challenges start? [[challenges]]
In this new version of the course, you have two types of challenges:
- A leaderboard to compare your agent's performance to other classmates'.
- AI vs. AI challenges where you can train your agent and compete against other classmates' agents.
<img src="https://huggingface.co/datasets/huggingface-deep-rl-course/course-images/resolve/main/en/unit0/challenges.jpg" alt="Challenges" width="100%"/>
These AI vs. AI challenges will be announced **later in December**.
## I found a bug, or I want to improve the course [[contribute]]
Contributions are welcome 🤗
- If you *found a bug 🐛 in a notebook*, please <a href="https://github.com/huggingface/deep-rl-class/issues">open an issue</a> and **describe the problem**.
- If you *want to improve the course*, you can <a href="https://github.com/huggingface/deep-rl-class/pulls">open a Pull Request.</a>
## I still have questions [[questions]]
In that case, <a href="https://simoninithomas.github.io/deep-rl-course/#faq">check our FAQ</a>. And if the question is not in it, ask your question in our <a href="https://discord.gg/ydHrjt3WP5">discord server #rl-discussions.</a>

30
units/en/unit0/setup.mdx Normal file

@@ -0,0 +1,30 @@
# Setup [[setup]]
After all this information, it's time to get started. We're going to do two things:
1. **Create your Hugging Face account** if it's not already done
2. **Sign up to Discord and introduce yourself** (don't be shy 🤗)
### Let's create your Hugging Face account
(If it's not already done) create a Hugging Face account <a href="https://huggingface.co/join">here</a>
### Let's join our Discord server
You can now sign up for our Discord Server. This is the place where you **can exchange with the community and with us, create and join study groups to grow with each other, and more**
👉🏻 Join our discord server <a href="https://discord.gg/ydHrjt3WP5">here.</a>
When you join, remember to introduce yourself in #introduce-yourself and sign up for the reinforcement learning channels in #role-assignments.
We have multiple RL-related channels:
- `rl-announcements`: where we give the latest information about the course.
- `rl-discussions`: where you can exchange about RL and share information.
- `rl-study-group`: where you can create and join study groups.
If this is your first time using Discord, we wrote a Discord 101 covering best practices. Check the next section.
Congratulations! **You've just finished the onboarding**. You're now ready to start learning Deep Reinforcement Learning. Have fun!
### Keep Learning, stay awesome 🤗


@@ -0,0 +1,13 @@
# Additional Readings [[additional-readings]]
These are **optional readings** if you want to go deeper.
## Deep Reinforcement Learning [[deep-rl]]
- [Reinforcement Learning: An Introduction, Richard Sutton and Andrew G. Barto Chapter 1, 2 and 3](http://incompleteideas.net/book/RLbook2020.pdf)
- [Foundations of Deep RL Series, L1 MDPs, Exact Solution Methods, Max-ent RL by Pieter Abbeel](https://youtu.be/2GwBez0D20A)
- [Spinning Up RL by OpenAI Part 1: Key concepts of RL](https://spinningup.openai.com/en/latest/spinningup/rl_intro.html)
## Gym [[gym]]
- [Getting Started With OpenAI Gym: The Basic Building Blocks](https://blog.paperspace.com/getting-started-with-openai-gym/)


@@ -0,0 +1,16 @@
# Conclusion [[conclusion]]
Congrats on finishing this unit! **That was the biggest one**, and there was a lot of information. And congrats on finishing the tutorial: you've just trained your first Deep RL agent and shared it with the community! 🥳
It's **normal if you still feel confused with some of these elements**. This was the same for me and for all people who studied RL.
**Take time to really grasp the material** before continuing. Its important to master these elements and having a solid foundations before entering the fun part.
Naturally, during the course, were going to use and explain these terms again, but its better to understand them before diving into the next units.
In the next (bonus) unit, were going to reinforce what we just learned by **training Huggy the Dog to fetch the stick**.
You will be able then to play with him 🤗.
<video src="https://huggingface.co/datasets/huggingface-deep-rl-course/course-images/resolve/main/en/unit1/huggy.mp4" alt="Huggy" type="video/mp4">
</video>

# The “Deep” in Reinforcement Learning [[deep-rl]]
<Tip>
What we've talked about so far is Reinforcement Learning. But where does the "Deep" come into play?
</Tip>
Deep Reinforcement Learning introduces **deep neural networks to solve Reinforcement Learning problems** — hence the name “deep”.
For instance, in the next unit, we'll learn about two value-based algorithms: Q-Learning (classic Reinforcement Learning) and then Deep Q-Learning.
You'll see the difference is that in the first approach, **we use a traditional algorithm** to create a Q table that helps us find what action to take for each state.
In the second approach, **we will use a Neural Network** (to approximate the Q value).
<figure>
<img src="https://huggingface.co/datasets/huggingface-deep-rl-course/course-images/resolve/main/en/unit1/deep.jpg" alt="Value based RL"/>
<figcaption>Schema inspired by the Q learning notebook by Udacity
</figcaption>
</figure>
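To make the contrast concrete, here is an illustrative sketch (not the course's actual implementation) of both ideas: a Q-table that stores one value per (state, action) pair, versus a tiny, untrained neural network that approximates the Q-values. The sizes and weights below are made up for the example.

```python
import numpy as np

n_states, n_actions = 16, 4

# Classic Q-Learning: a lookup table with one Q-value per (state, action) pair.
q_table = np.zeros((n_states, n_actions))
q_table[3, 2] = 1.5  # pretend some training updates already happened

def best_action_table(state):
    # Pick the action with the highest Q-value stored for this state.
    return int(np.argmax(q_table[state]))

# Deep Q-Learning: a tiny (untrained) neural network approximating Q(s, ·).
rng = np.random.default_rng(0)
w1 = rng.normal(size=(n_states, 32))
w2 = rng.normal(size=(32, n_actions))

def best_action_network(state):
    one_hot = np.eye(n_states)[state]       # encode the state as network input
    hidden = np.maximum(one_hot @ w1, 0.0)  # ReLU hidden layer
    q_values = hidden @ w2                  # one approximated Q-value per action
    return int(np.argmax(q_values))
```

The table scales with the number of states, which is why the network approximation becomes necessary for large state spaces.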
If you are not familiar with Deep Learning, you should definitely watch [the fastai Practical Deep Learning for Coders course](https://course.fast.ai) (free).

# The Exploration/Exploitation trade-off [[exp-exp-tradeoff]]
Finally, before looking at the different methods to solve Reinforcement Learning problems, we must cover one more very important topic: *the exploration/exploitation trade-off.*
- *Exploration* is exploring the environment by trying random actions in order to **find more information about the environment.**
- *Exploitation* is **exploiting known information to maximize the reward.**
Remember, the goal of our RL agent is to maximize the expected cumulative reward. However, **we can fall into a common trap**.
Let's take an example:
<img src="https://huggingface.co/datasets/huggingface-deep-rl-course/course-images/resolve/main/en/unit1/exp_1.jpg" alt="Exploration" width="100%">
In this game, our mouse can have an **infinite amount of small cheese** (+1 each). But at the top of the maze, there is a gigantic sum of cheese (+1000).
However, if we only focus on exploitation, our agent will never reach the gigantic sum of cheese. Instead, it will only exploit **the nearest source of rewards,** even if this source is small (exploitation).
But if our agent does a little bit of exploration, it can **discover the big reward** (the pile of big cheese).
This is what we call the exploration/exploitation trade-off. We need to balance how much we **explore the environment** and how much we **exploit what we know about the environment.**
Therefore, we must **define a rule that helps to handle this trade-off**. Well see the different ways to handle it in the future units.
If it's still confusing, **think of a real problem: choosing a restaurant:**
<figure>
<img src="https://huggingface.co/datasets/huggingface-deep-rl-course/course-images/resolve/main/en/unit1/exp_2.jpg" alt="Exploration">
<figcaption>Source: <a href="http://rail.eecs.berkeley.edu/deeprlcourse-fa17/f17docs/lecture_13_exploration.pdf">Berkeley AI Course</a>
</figcaption>
</figure>
- *Exploitation*: You go to the same restaurant every day, one that you know is good, and **take the risk of missing another, better restaurant.**
- *Exploration*: You try restaurants you have never been to before, with the risk of having a bad experience **but the probable opportunity of a fantastic experience.**
To recap:
<img src="https://huggingface.co/datasets/huggingface-deep-rl-course/course-images/resolve/main/en/unit1/expexpltradeoff.jpg" alt="Exploration Exploitation Tradeoff" width="100%">
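One classic rule for handling this trade-off, which we'll meet again in later units, is *epsilon-greedy*: explore (random action) with a small probability epsilon, exploit (best known action) otherwise. A minimal sketch, with made-up Q-values:

```python
import random

def epsilon_greedy(q_values, epsilon):
    """With probability epsilon explore (random action);
    otherwise exploit (the action with the highest known value)."""
    if random.random() < epsilon:
        return random.randrange(len(q_values))  # exploration
    return max(range(len(q_values)), key=lambda a: q_values[a])  # exploitation

random.seed(0)
q = [0.1, 0.5, 0.2]  # illustrative value estimates for 3 actions
actions = [epsilon_greedy(q, epsilon=0.1) for _ in range(1000)]
# Mostly action 1 (exploitation), with occasional random choices (exploration).
```

With epsilon = 0.1, the agent still tries every action occasionally, so it has a chance to discover a bigger reward it doesn't know about yet.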

# Introduction to Deep Reinforcement Learning [[introduction-to-deep-reinforcement-learning]]
<img src="https://huggingface.co/datasets/huggingface-deep-rl-course/course-images/resolve/main/en/unit1/thumbnail.jpg" alt="Unit 1 thumbnail" width="100%">
Welcome to the most fascinating topic in Artificial Intelligence: **Deep Reinforcement Learning.**
Deep RL is a type of Machine Learning where an agent learns **how to behave** in an environment **by performing actions** and **seeing the results.**
In this first unit, **you'll learn the foundations of Deep Reinforcement Learning.**
Then, you'll **train your Deep Reinforcement Learning agent, a lunar lander, to land correctly on the Moon** using <a href="https://stable-baselines3.readthedocs.io/en/master/"> Stable-Baselines3 </a>, a Deep Reinforcement Learning library.
<img src="https://huggingface.co/datasets/huggingface-deep-rl-course/course-images/resolve/main/en/unit1/lunarLander.gif" alt="LunarLander">
And finally, you'll **upload this trained agent to the Hugging Face Hub 🤗, a free, open platform where people can share ML models, datasets, and demos.**
It's essential **to master these elements** before diving into implementing Deep Reinforcement Learning agents. The goal of this chapter is to give you solid foundations.
After this unit, in a bonus unit, you'll be **able to train Huggy the Dog 🐶 to fetch the stick and play with him 🤗**.
<video src="https://huggingface.co/datasets/huggingface-deep-rl-course/course-images/resolve/main/en/unit1/huggy.mp4" alt="Huggy" type="video/mp4">
</video>
So let's get started! 🚀

# Quiz [[quiz]]
The best way to learn and [to avoid the illusion of competence](https://www.coursera.org/lecture/learning-how-to-learn/illusions-of-competence-BuFzf) **is to test yourself.** This will help you to find **where you need to reinforce your knowledge**.
### Q1: What is Reinforcement Learning?
<details>
<summary>Solution</summary>
Reinforcement learning is a **framework for solving control tasks (also called decision problems)** by building agents that learn from the environment by interacting with it through trial and error and **receiving rewards (positive or negative) as unique feedback**.
</details>
### Q2: Define the RL Loop
<img src="https://huggingface.co/datasets/huggingface-deep-rl-course/course-images/resolve/main/en/unit1/rl-loop-ex.jpg" alt="Exercise RL Loop"/>
At every step:
- Our Agent receives ______ from the environment
- Based on that ______ the Agent takes an ______
- Our Agent will move to the right
- The Environment goes to a ______
- The Environment gives a ______ to the Agent
<Question
choices={[
{
text: "an action a0, action a0, state s0, state s1, reward r1",
explain: "At every step: Our Agent receives **state s0** from the environment. Based on that **state s0** the Agent takes an **action a0**. Our Agent will move to the right. The Environment goes to a **new state s1**. The Environment gives **a reward r1** to the Agent."
},
{
text: "state s0, state s0, action a0, new state s1, reward r1",
explain: "",
correct: true
},
{
text: "a state s0, state s0, action a0, state s1, action a1",
explain: "At every step: Our Agent receives **state s0** from the environment. Based on that **state s0** the Agent takes an **action a0**. Our Agent will move to the right. The Environment goes to a **new state s1**. The Environment gives **a reward r1** to the Agent."
}
]}
/>
### Q3: What's the difference between a state and an observation?
<Question
choices={[
{
text: "The state is a complete description of the state of the world (there is no hidden information)",
explain: "",
correct: true
},
{
text: "The state is a partial description of the state",
explain: ""
},
{
text: "The observation is a complete description of the state of the world (there is no hidden information)",
explain: ""
},
{
text: "The observation is a partial description of the state",
explain: "",
correct: true
},
{
text: "We receive a state when we play with a chess environment",
explain: "Since we have access to the whole chessboard information.",
correct: true
},
{
text: "We receive an observation when we play with a chess environment",
explain: "We have access to the whole chessboard information, so we receive a state."
},
{
text: "We receive a state when we play with Super Mario Bros",
explain: "We only see a part of the level close to the player, so we receive an observation."
},
{
text: "We receive an observation when we play with Super Mario Bros",
explain: "We only see a part of the level close to the player.",
correct: true
}
]}
/>
### Q4: A task is an instance of a Reinforcement Learning problem. What are the two types of tasks?
<Question
choices={[
{
text: "Episodic",
explain: "In an episodic task, we have a starting point and an ending point (a terminal state). This creates an episode: a list of States, Actions, Rewards, and new States. For instance, think about Super Mario Bros: an episode begins at the launch of a new Mario level and ends when you're killed or you reach the end of the level.",
correct: true
},
{
text: "Recursive",
explain: ""
},
{
text: "Adversarial",
explain: ""
},
{
text: "Continuing",
explain: "Continuing tasks are tasks that continue forever (no terminal state). In this case, the agent must learn how to choose the best actions and simultaneously interact with the environment.",
correct: true
}
]}
/>
### Q5: What is the exploration/exploitation tradeoff?
<details>
<summary>Solution</summary>
In Reinforcement Learning, we need to **balance how much we explore the environment and how much we exploit what we know about the environment**.
- *Exploration* is exploring the environment by **trying random actions in order to find more information about the environment**.
- *Exploitation* is **exploiting known information to maximize the reward**.
<img src="https://huggingface.co/datasets/huggingface-deep-rl-course/course-images/resolve/main/en/unit1/expexpltradeoff.jpg" alt="Exploration Exploitation Tradeoff" width="100%">
</details>
### Q6: What is a policy?
<details>
<summary>Solution</summary>
- The Policy π **is the brain of our Agent**. It's the function that tells us what action to take given the state we are in. So it defines the agent's behavior at a given time.
<img src="https://huggingface.co/datasets/huggingface-deep-rl-course/course-images/resolve/main/en/unit1/policy_1.jpg" alt="Policy">
</details>
### Q7: What are value-based methods?
<details>
<summary>Solution</summary>
- Value-based methods are one of the main approaches to solving RL problems.
- In Value-based methods, instead of training a policy function, **we train a value function that maps a state to the expected value of being at that state**.
</details>
### Q8: What are policy-based methods?
<details>
<summary>Solution</summary>
- In *Policy-Based Methods*, we learn a **policy function directly**.
- This policy function will **map from each state to the best corresponding action at that state**. Or a **probability distribution over the set of possible actions at that state**.
</details>
Congrats on finishing this quiz 🥳! If you missed some elements, take time to read the chapter again to reinforce (😏) your knowledge, but **do not worry**: during the course we'll go over these concepts again, and you'll **reinforce your theoretical knowledge with hands-on practice**.

# The Reinforcement Learning Framework [[the-reinforcement-learning-framework]]
## The RL Process [[the-rl-process]]
<figure>
<img src="https://huggingface.co/datasets/huggingface-deep-rl-course/course-images/resolve/main/en/unit1/RL_process.jpg" alt="The RL process" width="100%">
<figcaption>The RL Process: a loop of state, action, reward and next state</figcaption>
<figcaption>Source: <a href="http://incompleteideas.net/book/RLbook2020.pdf">Reinforcement Learning: An Introduction, Richard Sutton and Andrew G. Barto</a></figcaption>
</figure>
To understand the RL process, let's imagine an agent learning to play a platform game:
<img src="https://huggingface.co/datasets/huggingface-deep-rl-course/course-images/resolve/main/en/unit1/RL_process_game.jpg" alt="The RL process" width="100%">
- Our Agent receives **state \\(S_0\\)** from the **Environment** — we receive the first frame of our game (Environment).
- Based on that **state \\(S_0\\),** the Agent takes **action \\(A_0\\)** — our Agent will move to the right.
- Environment goes to a **new** **state \\(S_1\\)** — new frame.
- The environment gives some **reward \\(R_1\\)** to the Agent — we're not dead *(Positive Reward +1)*.
This RL loop outputs a sequence of **state, action, reward and next state.**
<img src="https://huggingface.co/datasets/huggingface-deep-rl-course/course-images/resolve/main/en/unit1/sars.jpg" alt="State, Action, Reward, Next State" width="100%">
The agent's goal is to _maximize_ its cumulative reward, **called the expected return.**
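The loop just described can be sketched in a few lines of Python. The `ToyEnv` and `RandomAgent` classes below are hypothetical stand-ins (not course code); they only exist to make the loop runnable:

```python
class ToyEnv:
    """A toy environment: the episode ends after 3 steps, +1 reward each."""
    def reset(self):
        self.t = 0
        return 0  # initial state S_0

    def step(self, action):
        self.t += 1
        # Returns the new state S_{t+1}, the reward R_{t+1}, and a done flag.
        return self.t, 1.0, self.t >= 3

class RandomAgent:
    def act(self, state):
        return 0  # always the same action; a real agent would use a policy

def run_episode(env, agent):
    state = env.reset()
    total_reward, done = 0.0, False
    while not done:
        action = agent.act(state)               # agent picks action A_t from S_t
        state, reward, done = env.step(action)  # env returns S_{t+1} and R_{t+1}
        total_reward += reward                  # accumulate the return
    return total_reward
```

Running `run_episode(ToyEnv(), RandomAgent())` plays one full loop of state, action, reward and next state until the episode ends.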
## The reward hypothesis: the central idea of Reinforcement Learning [[reward-hypothesis]]
⇒ Why is the goal of the agent to maximize the expected return?
Because RL is based on the **reward hypothesis**, which is that all goals can be described as the **maximization of the expected return** (expected cumulative reward).
That's why in Reinforcement Learning, **to have the best behavior,** we aim to learn to take actions that **maximize the expected cumulative reward.**
## Markov Property [[markov-property]]
In papers, you'll see that the RL process is called the **Markov Decision Process** (MDP).
We'll talk again about the Markov Property in the following units. But if you need to remember one thing about it today, it's this: the Markov Property implies that our agent needs **only the current state to decide** what action to take and **not the history of all the states and actions** it took before.
## Observations/States Space [[obs-space]]
Observations/States are the **information our agent gets from the environment.** In the case of a video game, it can be a frame (a screenshot). In the case of the trading agent, it can be the value of a certain stock, etc.
There is a differentiation to make between *observation* and *state*, however:
- *State s*: **a complete description of the state of the world** (there is no hidden information), which we get in a fully observed environment.
<figure>
<img src="https://huggingface.co/datasets/huggingface-deep-rl-course/course-images/resolve/main/en/unit1/chess.jpg" alt="Chess">
<figcaption>In a chess game, we receive a state from the environment since we have access to the whole chessboard information.</figcaption>
</figure>
In a chess game, we have access to the whole board information, so we receive a state from the environment. In other words, the environment is fully observed.
- *Observation o*: a **partial description of the state,** which we get in a partially observed environment.
<figure>
<img src="https://huggingface.co/datasets/huggingface-deep-rl-course/course-images/resolve/main/en/unit1/mario.jpg" alt="Mario">
<figcaption>In Super Mario Bros, we only see a part of the level close to the player, so we receive an observation.</figcaption>
</figure>
In Super Mario Bros, we are in a partially observed environment: we receive an observation **since we only see a part of the level** close to the player.
<Tip>
In this course, we use the term "state" to denote both state and observation, but we will make the distinction in implementations.
</Tip>
To recap:
<img src="https://huggingface.co/datasets/huggingface-deep-rl-course/course-images/resolve/main/en/unit1/obs_space_recap.jpg" alt="Obs space recap" width="100%">
## Action Space [[action-space]]
The Action space is the set of **all possible actions in an environment.**
The actions can come from a *discrete* or *continuous space*:
- *Discrete space*: the number of possible actions is **finite**.
<figure>
<img src="https://huggingface.co/datasets/huggingface-deep-rl-course/course-images/resolve/main/en/unit1/mario.jpg" alt="Mario">
<figcaption>Again, in Super Mario Bros, we have only 5 possible actions: 4 directions and jumping</figcaption>
</figure>
In Super Mario Bros, we have a finite set of actions: the 4 directions and jump.
- *Continuous space*: the number of possible actions is **infinite**.
<figure>
<img src="https://huggingface.co/datasets/huggingface-deep-rl-course/course-images/resolve/main/en/unit1/self_driving_car.jpg" alt="Self Driving Car">
<figcaption>A Self-Driving Car agent has an infinite number of possible actions since it can turn left 20°, 21.1°, 21.2°, honk, turn right 20°…
</figcaption>
</figure>
To recap:
<img src="https://huggingface.co/datasets/huggingface-deep-rl-course/course-images/resolve/main/en/unit1/action_space.jpg" alt="Action space recap" width="100%">
Taking this information into consideration is crucial because it will **have importance when choosing the RL algorithm in the future.**
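As a minimal illustration (mirroring, but not using, the `Discrete` and `Box` spaces you'll meet in Gym), sampling from each kind of space might look like this — the action names and the steering range are made up for the example:

```python
import random

# Discrete action space: a finite set of actions (like Mario's moves).
DISCRETE_ACTIONS = ["left", "right", "up", "down", "jump"]

def sample_discrete():
    return random.choice(DISCRETE_ACTIONS)

# Continuous action space: any value in a range (like a steering angle).
STEERING_RANGE = (-25.0, 25.0)  # degrees; an illustrative bound

def sample_continuous():
    low, high = STEERING_RANGE
    return random.uniform(low, high)
```

Discrete spaces can be enumerated and tabulated; continuous spaces cannot, which is part of why the choice of RL algorithm depends on the action space.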
## Rewards and discounting [[rewards]]
The reward is fundamental in RL because its **the only feedback** for the agent. Thanks to it, our agent knows **if the action taken was good or not.**
The cumulative reward at each time step **t** can be written as:
<figure>
<img src="https://huggingface.co/datasets/huggingface-deep-rl-course/course-images/resolve/main/en/unit1/rewards_1.jpg" alt="Rewards">
<figcaption>The cumulative reward equals the sum of all rewards in the sequence.
</figcaption>
</figure>
Which is equivalent to:
<figure>
<img src="https://huggingface.co/datasets/huggingface-deep-rl-course/course-images/resolve/main/en/unit1/rewards_2.jpg" alt="Rewards">
<figcaption>The cumulative reward = \\(r_{t+1}\\) (that is, \\(r_{t+k+1}\\) with \\(k=0\\)) + \\(r_{t+2}\\) (\\(r_{t+k+1}\\) with \\(k=1\\)) + ...
</figcaption>
</figure>
However, in reality, **we can't just add them like that.** The rewards that come sooner (at the beginning of the game) **are more likely to happen** since they are more predictable than the long-term future reward.
Let's say your agent is this tiny mouse that can move one tile each time step, and your opponent is the cat (that can move too). The mouse's goal is **to eat the maximum amount of cheese before being eaten by the cat.**
<img src="https://huggingface.co/datasets/huggingface-deep-rl-course/course-images/resolve/main/en/unit1/rewards_3.jpg" alt="Rewards" width="100%">
As we can see in the diagram, **it's more probable to eat the cheese near us than the cheese close to the cat** (the closer we are to the cat, the more dangerous it is).
Consequently, **the reward near the cat, even if it is bigger (more cheese), will be more discounted** since we're not really sure we'll be able to eat it.
To discount the rewards, we proceed like this:
1. We define a discount rate called gamma. **It must be between 0 and 1.** Most of the time between **0.99 and 0.95**.
- The larger the gamma, the smaller the discount. This means our agent **cares more about the long-term reward.**
- On the other hand, the smaller the gamma, the bigger the discount. This means our **agent cares more about the short term reward (the nearest cheese).**
2. Then, each reward will be discounted by gamma to the exponent of the time step. As the time step increases, the cat gets closer to us, **so the future reward is less and less likely to happen.**
Our discounted expected cumulative reward is:
<img src="https://huggingface.co/datasets/huggingface-deep-rl-course/course-images/resolve/main/en/unit1/rewards_4.jpg" alt="Rewards" width="100%">
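The two steps above can be sketched as a small helper function (an illustrative sketch, not course code):

```python
def discounted_return(rewards, gamma=0.99):
    """Sum of rewards, each discounted by gamma**k for its time step k."""
    return sum(gamma**k * r for k, r in enumerate(rewards))

rewards = [1, 1, 1, 1]
# With gamma = 1.0 (no discount), the return is the plain sum: 4.0.
# With gamma = 0.9: 1 + 0.9 + 0.81 + 0.729 = 3.439 — later rewards count less.
```

Try it with a small gamma (say 0.5) and you'll see the return dominated by the first rewards: that's the "cares more about the short-term reward" behavior described above.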

# Summary [[summary]]
That was a lot of information! Let's summarize:
- Reinforcement Learning is a computational approach to learning from actions. We build an agent that learns from the environment **by interacting with it through trial and error** and receiving rewards (negative or positive) as feedback.
- The goal of any RL agent is to maximize its expected cumulative reward (also called expected return) because RL is based on the **reward hypothesis**, which is that **all goals can be described as the maximization of the expected cumulative reward.**
- The RL process is a loop that outputs a sequence of **state, action, reward and next state.**
- To calculate the expected cumulative reward (expected return), we discount the rewards: the rewards that come sooner (at the beginning of the game) **are more likely to happen since they are more predictable than the long-term future reward.**
- To solve an RL problem, you want to **find an optimal policy**. The policy is the “brain” of your agent, which will tell us **what action to take given a state.** The optimal policy is the one which **gives you the actions that maximize the expected return.**
- There are two ways to find your optimal policy:
1. By training your policy directly: **policy-based methods.**
2. By training a value function that tells us the expected return the agent will get at each state and use this function to define our policy: **value-based methods.**
- Finally, we speak about Deep RL because we introduce **deep neural networks to estimate the action to take (policy-based) or to estimate the value of a state (value-based)**, hence the name “deep”.

# Type of tasks [[tasks]]
A task is an **instance** of a Reinforcement Learning problem. We can have two types of tasks: **episodic** and **continuing**.
## Episodic task [[episodic-task]]
In this case, we have a starting point and an ending point **(a terminal state). This creates an episode**: a list of States, Actions, Rewards, and new States.
For instance, think about Super Mario Bros: an episode begins at the launch of a new Mario level and ends **when you're killed or you reach the end of the level.**
<figure>
<img src="https://huggingface.co/datasets/huggingface-deep-rl-course/course-images/resolve/main/en/unit1/mario.jpg" alt="Mario">
<figcaption>Beginning of a new episode.
</figcaption>
</figure>
## Continuing tasks [[continuing-tasks]]
These are tasks that continue forever (no terminal state). In this case, the agent must **learn how to choose the best actions and simultaneously interact with the environment.**
For instance, an agent that does automated stock trading. For this task, there is no starting point or terminal state. **The agent keeps running until we decide to stop it.**
<img src="https://huggingface.co/datasets/huggingface-deep-rl-course/course-images/resolve/main/en/unit1/stock.jpg" alt="Stock Market" width="100%">
To recap:
<img src="https://huggingface.co/datasets/huggingface-deep-rl-course/course-images/resolve/main/en/unit1/tasks.jpg" alt="Tasks recap" width="100%">

# Two main approaches for solving RL problems [[two-methods]]
<Tip>
Now that we learned the RL framework, how do we solve the RL problem?
</Tip>
In other words, how do we build an RL agent that can **select the actions that maximize its expected cumulative reward?**
## The Policy π: the agents brain [[policy]]
The Policy **π** is the **brain of our Agent**: it's the function that tells us what **action to take given the state we are in.** So it **defines the agent's behavior** at a given time.
<figure>
<img src="https://huggingface.co/datasets/huggingface-deep-rl-course/course-images/resolve/main/en/unit1/policy_1.jpg" alt="Policy">
<figcaption>Think of policy as the brain of our agent, the function that will tell us the action to take given a state
</figcaption>
</figure>
This Policy **is the function we want to learn**. Our goal is to find the optimal policy π\*, the policy that **maximizes the expected return** when the agent acts according to it. We find this π\* **through training.**
There are two approaches to train our agent to find this optimal policy π*:
- **Directly,** by teaching the agent to learn which **action to take** given the current state: **Policy-Based Methods.**
- **Indirectly,** by teaching the agent to learn **which state is more valuable** and then take the action that **leads to the more valuable states**: **Value-Based Methods.**
## Policy-Based Methods [[policy-based]]
In Policy-Based methods, **we learn a policy function directly.**
This function will define a mapping between each state and the best corresponding action. We can also say that it'll define **a probability distribution over the set of possible actions at that state.**
<figure>
<img src="https://huggingface.co/datasets/huggingface-deep-rl-course/course-images/resolve/main/en/unit1/policy_2.jpg" alt="Policy">
<figcaption>As we can see here, the policy (deterministic) <b>directly indicates the action to take for each step.</b>
</figcaption>
</figure>
We have two types of policies:
- *Deterministic*: a policy at a given state **will always return the same action.**
<figure>
<img src="https://huggingface.co/datasets/huggingface-deep-rl-course/course-images/resolve/main/en/unit1/policy_3.jpg" alt="Policy"/>
<figcaption>action = policy(state)
</figcaption>
</figure>
<img src="https://huggingface.co/datasets/huggingface-deep-rl-course/course-images/resolve/main/en/unit1/policy_4.jpg" alt="Policy" width="100%"/>
- *Stochastic*: outputs **a probability distribution over actions.**
<figure>
<img src="https://huggingface.co/datasets/huggingface-deep-rl-course/course-images/resolve/main/en/unit1/policy_5.jpg" alt="Policy"/>
<figcaption>policy(actions | state) = probability distribution over the set of actions given the current state
</figcaption>
</figure>
<figure>
<img src="https://huggingface.co/datasets/huggingface-deep-rl-course/course-images/resolve/main/en/unit1/mario.jpg" alt="Mario"/>
<figcaption>Given an initial state, our stochastic policy will output probability distributions over the possible actions at that state.
</figcaption>
</figure>
If we recap:
<img src="https://huggingface.co/datasets/huggingface-deep-rl-course/course-images/resolve/main/en/unit1/pbm_1.jpg" alt="Pbm recap" width="100%">
<img src="https://huggingface.co/datasets/huggingface-deep-rl-course/course-images/resolve/main/en/unit1/pbm_2.jpg" alt="Pbm recap" width="100%">
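A minimal sketch of the two types of policies — the states, actions, and probabilities below are made up purely for illustration:

```python
import random

def deterministic_policy(state):
    # A deterministic policy always returns the same action for a given state.
    mapping = {"s0": "right", "s1": "jump"}  # illustrative state -> action table
    return mapping[state]

def stochastic_policy(state):
    # A stochastic policy defines a probability distribution over actions
    # for the given state, then samples an action from it.
    probs = {"left": 0.1, "right": 0.7, "jump": 0.2}  # illustrative distribution
    actions, weights = zip(*probs.items())
    return random.choices(actions, weights=weights)[0]
```

Calling `deterministic_policy("s0")` always gives `"right"`, while `stochastic_policy("s0")` usually gives `"right"` but sometimes the other actions.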
## Value-based methods [[value-based]]
In value-based methods, instead of training a policy function, we **train a value function** that maps a state to the expected value **of being at that state.**
The value of a state is the **expected discounted return** the agent can get if it **starts in that state and then acts according to our policy.**
“Act according to our policy” just means that our policy is **“going to the state with the highest value”.**
<img src="https://huggingface.co/datasets/huggingface-deep-rl-course/course-images/resolve/main/en/unit1/value_1.jpg" alt="Value based RL" width="100%">
Here we see that our value function **defines a value for each possible state.**
<figure>
<img src="https://huggingface.co/datasets/huggingface-deep-rl-course/course-images/resolve/main/en/unit1/value_2.jpg" alt="Value based RL"/>
<figcaption>Thanks to our value function, at each step our policy will select the state with the biggest value defined by the value function: -7, then -6, then -5 (and so on) to attain the goal.
</figcaption>
</figure>
If we recap:
<img src="https://huggingface.co/datasets/huggingface-deep-rl-course/course-images/resolve/main/en/unit1/vbm_1.jpg" alt="Vbm recap" width="100%">
<img src="https://huggingface.co/datasets/huggingface-deep-rl-course/course-images/resolve/main/en/unit1/vbm_2.jpg" alt="Vbm recap" width="100%">
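Here is a small sketch of the -7, -6, -5 example above: the policy simply moves to the neighboring state with the highest value. The corridor layout below is an illustrative stand-in for the grid in the figure, not course code:

```python
# Illustrative state values on a 1-D corridor; the goal is at the right end.
values = {"s0": -7, "s1": -6, "s2": -5, "s3": -4, "goal": 0}
neighbors = {"s0": ["s1"], "s1": ["s0", "s2"],
             "s2": ["s1", "s3"], "s3": ["s2", "goal"]}

def greedy_step(state):
    # "Act according to our policy": move to the neighboring state
    # with the highest value.
    return max(neighbors[state], key=lambda s: values[s])

# Following the values -7, -6, -5, ... leads to the goal:
path, state = [], "s0"
while state != "goal":
    state = greedy_step(state)
    path.append(state)
# path is now ["s1", "s2", "s3", "goal"]
```

Notice that we never learned a policy function here: the behavior falls out of the value function alone.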

# What is Reinforcement Learning? [[what-is-reinforcement-learning]]
To understand Reinforcement Learning, let's start with the big picture.
## The big picture [[the-big-picture]]
The idea behind Reinforcement Learning is that an agent (an AI) will learn from the environment by **interacting with it** (through trial and error) and **receiving rewards** (negative or positive) as feedback for performing actions.
Learning from interactions with the environment **comes from our natural experiences.**
For instance, imagine putting your little brother in front of a video game he has never played, giving him a controller, and leaving him alone.
<img src="https://huggingface.co/datasets/huggingface-deep-rl-course/course-images/resolve/main/en/unit1/Illustration_1.jpg" alt="Illustration_1" width="100%">
Your brother will interact with the environment (the video game) by pressing the right button (action). He got a coin, that's a +1 reward. It's positive, he just understood that in this game **he must get the coins.**
<img src="https://huggingface.co/datasets/huggingface-deep-rl-course/course-images/resolve/main/en/unit1/Illustration_2.jpg" alt="Illustration_2" width="100%">
But then, **he presses right again** and he touches an enemy. He just died, so that's a -1 reward.
<img src="https://huggingface.co/datasets/huggingface-deep-rl-course/course-images/resolve/main/en/unit1/Illustration_3.jpg" alt="Illustration_3" width="100%">
By interacting with his environment through trial and error, your little brother understood that **he needed to get coins in this environment but avoid the enemies.**
**Without any supervision**, the child will get better and better at playing the game.
That's how humans and animals learn, **through interaction.** Reinforcement Learning is just a **computational approach to learning from actions.**
### A formal definition [[a-formal-definition]]
Now, let's take a formal definition:
<Tip>
Reinforcement learning is a framework for solving control tasks (also called decision problems) by building agents that learn from the environment by interacting with it through trial and error and receiving rewards (positive or negative) as unique feedback.
</Tip>
But how does Reinforcement Learning work?

# Conclusion [[conclusion]]
Congrats on finishing this bonus unit!
You can now sit back and enjoy playing with your Huggy 🐶. And don't **forget to spread the love by sharing Huggy with your friends 🤗**. And if you share about it on social media, **please tag us @huggingface and me @simoninithomas**.
<img src="https://huggingface.co/datasets/huggingface-deep-rl-course/course-images/resolve/main/en/notebooks/unit-bonus1/huggy-cover.jpeg" alt="Huggy cover" width="100%">
### Keep Learning, Stay Awesome 🤗

# How Huggy works [[how-huggy-works]]
Huggy is a Deep Reinforcement Learning environment made by Hugging Face and based on [Puppo the Corgi, a project by the Unity MLAgents team](https://blog.unity.com/technology/puppo-the-corgi-cuteness-overload-with-the-unity-ml-agents-toolkit).
This environment was created using the [Unity game engine](https://unity.com/) and [MLAgents](https://github.com/Unity-Technologies/ml-agents). ML-Agents is a toolkit for the Unity game engine that allows us to **create environments or use pre-made environments to train our agents**.
<img src="https://huggingface.co/datasets/huggingface-deep-rl-course/course-images/resolve/main/en/notebooks/unit-bonus1/huggy.jpg" alt="Huggy" width="100%">
In this environment, we aim to train Huggy to **fetch the stick we throw. This means he needs to move correctly toward the stick**.
## The State Space: what Huggy perceives [[state-space]]
Huggy doesn't "see" his environment. Instead, we provide him information about the environment:
* The target (stick) position
* The relative position between himself and the target
* The orientation of his legs.
Given all this information, Huggy can **use his policy to determine which action to take next to fulfill his goal**.
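The exact feature layout is internal to the environment, but conceptually Huggy's observation is just a flat vector of numbers. Here is an illustrative sketch (the function name, feature order, and sizes are made up, not the real Huggy code):

```python
import math

def build_observation(stick_pos, huggy_pos, leg_angles):
    """Sketch of an observation vector: target position,
    relative position to the target, and leg orientations."""
    relative = [s - h for s, h in zip(stick_pos, huggy_pos)]
    distance = math.sqrt(sum(d * d for d in relative))
    # The policy only ever sees this flat list of floats.
    return list(stick_pos) + relative + [distance] + list(leg_angles)

obs = build_observation(
    stick_pos=(4.0, 0.0, 2.0),
    huggy_pos=(1.0, 0.0, 2.0),
    leg_angles=(0.1, -0.3, 0.2, 0.0),
)
print(len(obs), obs[:3])
```

The policy is a function from this vector to an action; it never sees pixels.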
## The Action Space: what moves Huggy can perform [[action-space]]
<img src="https://huggingface.co/datasets/huggingface-deep-rl-course/course-images/resolve/main/en/notebooks/unit-bonus1/huggy-action.jpg" alt="Huggy action" width="100%">
**Joint motors drive Huggy's legs**. This means that to reach the target, Huggy needs to **learn to rotate the joint motors of each of his legs correctly so he can move**.
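Concretely, the action is a vector of continuous values, one command per joint motor. A hypothetical sketch (the joint names, the number of joints, and the clamping range are illustrative, not taken from the actual environment):

```python
def clamp(x, lo=-1.0, hi=1.0):
    """Keep a motor command inside a normalized range."""
    return max(lo, min(hi, x))

JOINTS = ["front_left", "front_right", "back_left", "back_right"]

def apply_action(action):
    """Each entry is a continuous command for one leg's joint motor,
    clamped to [-1, 1] before being applied to the physics simulation."""
    assert len(action) == len(JOINTS)
    return {joint: clamp(a) for joint, a in zip(JOINTS, action)}

commands = apply_action([0.5, -1.7, 0.0, 2.3])
print(commands)
```

A continuous action space like this is why Huggy is trained with an algorithm that outputs real-valued actions (PPO) rather than one that picks from a discrete menu.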
## The Reward Function [[reward-function]]
The reward function is designed so that **Huggy will fulfill his goal**: fetch the stick.
Remember that one of the foundations of Reinforcement Learning is the *reward hypothesis*: a goal can be described as the **maximization of the expected cumulative reward**.
Here, our goal is that Huggy **goes towards the stick but without spinning too much**. Hence, our reward function must translate this goal.
Our reward function:
<img src="https://huggingface.co/datasets/huggingface-deep-rl-course/course-images/resolve/main/en/notebooks/unit-bonus1/reward.jpg" alt="Huggy reward function" width="100%">
- *Orientation bonus*: we **reward him for getting close to the target**.
- *Time penalty*: a fixed-time penalty given at every action to **force him to get to the stick as fast as possible**.
- *Rotation penalty*: we penalize Huggy if **he spins too much and turns too quickly**.
- *Getting to the target reward*: we reward Huggy for **reaching the target**.
If you want to see what this reward function looks like mathematically, check [Puppo the Corgi presentation](https://blog.unity.com/technology/puppo-the-corgi-cuteness-overload-with-the-unity-ml-agents-toolkit).
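Under stated assumptions, the four terms above could be combined per step roughly as in this sketch (all coefficients are invented for illustration; the real values live in the Unity project):

```python
def huggy_reward(prev_distance, distance, rotation_speed, reached_target):
    """Illustrative combination of the four reward terms (made-up weights)."""
    reward = 0.0
    reward += 0.1 * (prev_distance - distance)  # orientation bonus: getting closer
    reward -= 0.01                              # time penalty, paid every step
    reward -= 0.05 * abs(rotation_speed)        # rotation penalty: no spinning
    if reached_target:
        reward += 1.0                           # getting-to-the-target reward
    return reward

# Moving toward the stick without spinning yields a positive reward...
print(huggy_reward(prev_distance=3.0, distance=2.5, rotation_speed=0.1, reached_target=False))
# ...while spinning in place yields a negative one.
print(huggy_reward(prev_distance=3.0, distance=3.0, rotation_speed=2.0, reached_target=False))
```

The point is the shape, not the numbers: behavior we want earns positive reward, behavior we don't want costs some.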
## Train Huggy
Huggy aims **to learn to run correctly and as fast as possible toward the goal**. To do that, at every step, given the environment observation, he needs to decide how to rotate each joint motor of his legs to move correctly (without spinning too much) and toward the goal.
The training loop looks like this:
<img src="https://huggingface.co/datasets/huggingface-deep-rl-course/course-images/resolve/main/en/notebooks/unit-bonus1/huggy-loop.jpg" alt="Huggy loop" width="100%">
The training environment looks like this:
<img src="https://huggingface.co/datasets/huggingface-deep-rl-course/course-images/resolve/main/en/notebooks/unit-bonus1/training-env.jpg" alt="Huggy training env" width="100%">
It's a place where a **stick is spawned randomly**. When Huggy reaches it, the stick gets spawned somewhere else.
We built **multiple copies of the environment for the training**. This helps speed up the training by providing more diverse experiences.
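The idea of multiple environment copies can be pictured with this toy sketch (not the actual ML-Agents machinery): each copy steps independently, so every training update can draw on several trajectories at once.

```python
import random

class StickEnv:
    """Toy stand-in for one Huggy arena."""
    def __init__(self, seed):
        self.rng = random.Random(seed)
        self.distance = self.rng.uniform(1.0, 10.0)  # stick spawned randomly

    def step(self):
        self.distance = max(0.0, self.distance - self.rng.uniform(0.0, 1.0))
        if self.distance == 0.0:                          # Huggy reached the stick:
            self.distance = self.rng.uniform(1.0, 10.0)   # respawn it elsewhere
        return self.distance

# Several copies of the environment produce diverse experiences in parallel.
envs = [StickEnv(seed) for seed in range(4)]
batch = [env.step() for env in envs]  # one "experience" per copy, per step
print(batch)
```

Because each copy starts from a different random stick position, the collected experiences are more varied than those from a single arena.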
Now that you have the big picture of the environment, you're ready to train Huggy to fetch the stick.
To do that, we're going to use [MLAgents](https://github.com/Unity-Technologies/ml-agents). Don't worry if you have never used it before. In this unit we'll use Google Colab to train Huggy, and then you'll be able to load your trained Huggy and play with him directly in the browser.
In a future unit, we will study ML-Agents more in-depth and see how it works. For now, we keep things simple by using the provided implementation.


# Introduction [[introduction]]
In this bonus unit, we'll reinforce what we learned in the first unit by teaching Huggy the Dog to fetch the stick and then [play with him directly in your browser](https://huggingface.co/spaces/ThomasSimonini/Huggy) 🐶
<img src="https://huggingface.co/datasets/huggingface-deep-rl-course/course-images/resolve/main/en/unit2/thumbnail.png" alt="Unit bonus 1 thumbnail" width="100%">
So let's get started 🚀


# Play with Huggy [[play]]
Now that you've trained Huggy and pushed him to the Hub, **you can play with him ❤️**
This step is simple:
- Open the game Huggy in your browser: https://huggingface.co/spaces/ThomasSimonini/Huggy
- Click on Play with my Huggy model
<img src="https://huggingface.co/datasets/huggingface-deep-rl-course/course-images/resolve/main/en/notebooks/unit-bonus1/load-huggy.jpg" alt="load-huggy" width="100%">
1. In step 1, choose your model repository, which is the model id (in my case, `ThomasSimonini/ppo-Huggy`).
2. In step 2, **choose what model you want to replay**:
- I have multiple ones, since we saved a model every 500000 timesteps.
- But if I want the most recent one, I choose `Huggy.onnx`.
👉 What's nice is **to try different model steps to see the improvement of the agent.**


# Let's train and play with Huggy 🐶 [[train]]
<CourseFloatingBanner classNames="absolute z-10 right-0 top-0"
notebooks={[
{label: "Google Colab", value: "https://colab.research.google.com/github/huggingface/deep-rl-class/blob/master/notebooks/bonus-unit1/bonus-unit1.ipynb"}
]}
askForHelpUrl="hf.co/join/discord" />
## Let's train Huggy 🐶
**To start training Huggy, click on the Open In Colab button** 👇 :
[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/huggingface/deep-rl-class/blob/master/notebooks/bonus-unit1/bonus-unit1.ipynb)
<img src="https://huggingface.co/datasets/huggingface-deep-rl-course/course-images/resolve/main/en/unit2/thumbnail.png" alt="Bonus Unit 1 Thumbnail">
In this notebook, we'll reinforce what we learned in the first Unit by **teaching Huggy the Dog to fetch the stick and then play with it directly in your browser**
⬇️ Here is an example of what **you will achieve at the end of the unit.** ⬇️ (launch ▶ to see)
```python
%%html
<video controls autoplay><source src="https://huggingface.co/datasets/huggingface-deep-rl-course/course-images/resolve/main/en/notebooks/unit-bonus1/huggy.mp4" type="video/mp4"></video>
```
### The environment 🎮
- Huggy the Dog, an environment created by [Thomas Simonini](https://twitter.com/ThomasSimonini) based on [Puppo The Corgi](https://blog.unity.com/technology/puppo-the-corgi-cuteness-overload-with-the-unity-ml-agents-toolkit)
### The library used 📚
- [MLAgents (Hugging Face version)](https://github.com/huggingface/ml-agents)
We're constantly trying to improve our tutorials, so **if you find some issues in this notebook**, please [open an issue on the Github Repo](https://github.com/huggingface/deep-rl-class/issues).
## Objectives of this notebook 🏆
At the end of the notebook, you will:
- Understand **the state space, action space and reward function used to train Huggy**.
- **Train your own Huggy** to fetch the stick.
- Be able to play **with your trained Huggy directly in your browser**.
## Prerequisites 🏗️
Before diving into the notebook, you need to:
🔲 📚 **Develop an understanding of the foundations of Reinforcement learning** (MC, TD, Rewards hypothesis...) by doing Unit 1
🔲 📚 **Read the introduction to Huggy** by doing Bonus Unit 1
## Set the GPU 💪
- To **accelerate the agent's training, we'll use a GPU**. To do that, go to `Runtime > Change Runtime type`
<img src="https://huggingface.co/datasets/huggingface-deep-rl-course/course-images/resolve/main/en/notebooks/gpu-step1.jpg" alt="GPU Step 1">
- `Hardware Accelerator > GPU`
<img src="https://huggingface.co/datasets/huggingface-deep-rl-course/course-images/resolve/main/en/notebooks/gpu-step2.jpg" alt="GPU Step 2">
## Clone the repository and install the dependencies 🔽
- We need to clone the repository that **contains the experimental version of the library, which allows you to push your trained agent to the Hub.**
```bash
# Clone this specific repository (can take 3min)
git clone https://github.com/huggingface/ml-agents/
```
```bash
# Go inside the repository and install the package (can take 3min)
%cd ml-agents
pip3 install -e ./ml-agents-envs
pip3 install -e ./ml-agents
```
## Download and move the environment zip file to `./trained-envs-executables/linux/`
- Our environment executable is in a zip file.
- We need to download it and place it in `./trained-envs-executables/linux/`.
```bash
# -p creates the parent directory too, and doesn't fail if it already exists
mkdir -p ./trained-envs-executables/linux
```
```bash
wget --load-cookies /tmp/cookies.txt "https://docs.google.com/uc?export=download&confirm=$(wget --quiet --save-cookies /tmp/cookies.txt --keep-session-cookies --no-check-certificate 'https://docs.google.com/uc?export=download&id=1zv3M95ZJTWHUVOWT6ckq_cm98nft8gdF' -O- | sed -rn 's/.*confirm=([0-9A-Za-z_]+).*/\1\n/p')&id=1zv3M95ZJTWHUVOWT6ckq_cm98nft8gdF" -O ./trained-envs-executables/linux/Huggy.zip && rm -rf /tmp/cookies.txt
```
Download the file Huggy.zip from https://drive.google.com/uc?export=download&id=1zv3M95ZJTWHUVOWT6ckq_cm98nft8gdF using `wget`. Check out the full solution to download large files from GDrive [here](https://bcrf.biochem.wisc.edu/2021/02/05/download-google-drive-files-using-wget/)
```bash
%%capture
unzip -d ./trained-envs-executables/linux/ ./trained-envs-executables/linux/Huggy.zip
```
Make sure your file is accessible
```bash
chmod -R 755 ./trained-envs-executables/linux/Huggy
```
## Let's recap how this environment works
### The State Space: what Huggy perceives
Huggy doesn't "see" his environment. Instead, we provide him information about the environment:
- The target (stick) position
- The relative position between himself and the target
- The orientation of his legs.
Given all this information, Huggy **can decide which action to take next to fulfill his goal**.
<img src="https://huggingface.co/datasets/huggingface-deep-rl-course/course-images/resolve/main/en/notebooks/unit-bonus1/huggy.jpg" alt="Huggy" width="100%">
### The Action Space: what moves Huggy can do
<img src="https://huggingface.co/datasets/huggingface-deep-rl-course/course-images/resolve/main/en/notebooks/unit-bonus1/huggy-action.jpg" alt="Huggy action" width="100%">
**Joint motors drive Huggy's legs**. This means that to reach the target, Huggy needs to **learn to rotate the joint motors of each of his legs correctly so he can move**.
### The Reward Function
The reward function is designed so that **Huggy will fulfill his goal**: fetch the stick.
Remember that one of the foundations of Reinforcement Learning is the *reward hypothesis*: a goal can be described as the **maximization of the expected cumulative reward**.
Here, our goal is that Huggy **goes towards the stick but without spinning too much**. Hence, our reward function must translate this goal.
Our reward function:
<img src="https://huggingface.co/datasets/huggingface-deep-rl-course/course-images/resolve/main/en/notebooks/unit-bonus1/reward.jpg" alt="Huggy reward function" width="100%">
- *Orientation bonus*: we **reward him for getting close to the target**.
- *Time penalty*: a fixed-time penalty given at every action to **force him to get to the stick as fast as possible**.
- *Rotation penalty*: we penalize Huggy if **he spins too much and turns too quickly**.
- *Getting to the target reward*: we reward Huggy for **reaching the target**.
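The "expected cumulative reward" from the reward hypothesis is usually computed as a discounted sum over the episode. A quick sketch:

```python
def discounted_return(rewards, gamma=0.99):
    """Cumulative reward G = r_0 + gamma*r_1 + gamma^2*r_2 + ...
    computed backwards for numerical simplicity."""
    g = 0.0
    for r in reversed(rewards):
        g = r + gamma * g
    return g

# A few per-step time penalties followed by the big reward for reaching the stick:
print(discounted_return([-0.01, -0.01, -0.01, 1.0], gamma=0.9))
```

With `gamma` close to 1, far-away rewards still matter; with a small `gamma`, the agent becomes short-sighted.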
## Check the Huggy config file
- In ML-Agents, you define the **training hyperparameters in config.yaml files.**
- For the scope of this notebook, we're not going to modify the hyperparameters, but if you want to experiment, you can try modifying some of them; Unity provides very [good documentation explaining each of them here](https://github.com/Unity-Technologies/ml-agents/blob/main/docs/Training-Configuration-File.md).
- **If you want to modify the hyperparameters** in the Google Colab notebook, you can open the config.yaml here: `/content/ml-agents/config/ppo/Huggy.yaml`
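As a rough orientation, the file follows the ML-Agents trainer configuration format, along these lines (the keys shown are standard ML-Agents options, but the values here are illustrative; check the actual `Huggy.yaml` for the real ones):

```yaml
behaviors:
  Huggy:
    trainer_type: ppo        # the RL algorithm used to train Huggy
    hyperparameters:
      batch_size: 2048       # number of experiences per gradient update
      learning_rate: 3.0e-4  # optimizer step size
    max_steps: 2.0e6         # total environment steps before training stops
    # (the real file defines many more options)
```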
We're now ready to train our agent 🔥.
## Train our agent
To train our agent, we just need to **launch mlagents-learn and select the executable containing the environment.**
<img src="https://huggingface.co/datasets/huggingface-deep-rl-course/course-images/resolve/main/en/notebooks/unit-bonus1/mllearn.png" alt="ml learn function" width="100%">
With ML-Agents, we run a training script. We define four parameters:
1. `mlagents-learn <config>`: the path to the hyperparameter config file.
2. `--env`: where the environment executable is.
3. `--run-id`: the name you want to give to your training run.
4. `--no-graphics`: to not launch the visualization during training.
Train the model and use the `--resume` flag to continue training in case of interruption.
> It will fail the first time you use `--resume`; try running the block again to bypass the error.
The training will take 30 to 45 minutes depending on your machine (don't forget to **set up a GPU**), so go take a ☕ break, you deserve it 🤗.
```bash
mlagents-learn ./config/ppo/Huggy.yaml --env=./trained-envs-executables/linux/Huggy/Huggy --run-id="Huggy" --no-graphics
```
## Push the agent to the 🤗 Hub
- Now that we've trained our agent, we're **ready to push it to the Hub so you can play with Huggy in your browser 🔥.**
To be able to share your model with the community there are three more steps to follow:
1⃣ (If it's not already done) create an account on HF ➡ https://huggingface.co/join
2⃣ Sign in, and then store your authentication token from the Hugging Face website.
- Create a new token (https://huggingface.co/settings/tokens) **with write role**
<img src="https://huggingface.co/datasets/huggingface-deep-rl-course/course-images/resolve/main/en/notebooks/create-token.jpg" alt="Create HF Token">
- Copy the token
- Run the cell below and paste the token
```python
from huggingface_hub import notebook_login
notebook_login()
```
If you don't want to use a Google Colab or a Jupyter Notebook, you need to use this command instead: `huggingface-cli login`
Then, we simply need to run `mlagents-push-to-hf`.
<img src="https://huggingface.co/datasets/huggingface-deep-rl-course/course-images/resolve/main/en/notebooks/unit-bonus1/mlpush.png" alt="ml learn function" width="100%">
And we define 4 parameters:
1. `--run-id`: the name of the training run id.
2. `--local-dir`: where the agent was saved; it's `results/<run-id>`, so in my case `results/Huggy`.
3. `--repo-id`: the name of the Hugging Face repo you want to create or update. It's always `<your huggingface username>/<the repo name>`.
If the repo does not exist, **it will be created automatically**.
4. `--commit-message`: since HF repos are git repositories, you need to define a commit message.
```bash
mlagents-push-to-hf --run-id="HuggyTraining" --local-dir="./results/Huggy" --repo-id="ThomasSimonini/ppo-Huggy" --commit-message="Huggy"
```
If everything worked, you should see this at the end of the process (but with a different URL 😆):
```
Your model is pushed to the hub. You can view your model here: https://huggingface.co/ThomasSimonini/ppo-Huggy
```
It's the link to your model repository. The repository contains a model card that explains how to use the model, your TensorBoard logs, and your config file. **What's awesome is that it's a git repository, which means you can have different commits, update your repository with a new push, open Pull Requests, etc.**
<img src="https://huggingface.co/datasets/huggingface-deep-rl-course/course-images/resolve/main/en/notebooks/unit-bonus1/modelcard.png" alt="ml learn function" width="100%">
But now comes the best: **being able to play with Huggy online 👀.**
## Play with your Huggy 🐕
This step is the simplest:
- Open the game Huggy in your browser: https://huggingface.co/spaces/ThomasSimonini/Huggy
- Click on Play with my Huggy model
<img src="https://huggingface.co/datasets/huggingface-deep-rl-course/course-images/resolve/main/en/notebooks/unit-bonus1/load-huggy.jpg" alt="load-huggy" width="100%">
1. In step 1, choose your model repository, which is the model id (in my case, `ThomasSimonini/ppo-Huggy`).
2. In step 2, **choose what model you want to replay**:
- I have multiple ones, since we saved a model every 500000 timesteps.
- But since I want the most recent one, I choose `Huggy.onnx`.
👉 What's nice is **to try different model steps to see the improvement of the agent.**
Congrats on finishing this bonus unit!
You can now sit back and enjoy playing with your Huggy 🐶. And don't **forget to spread the love by sharing Huggy with your friends 🤗**. And if you post about it on social media, **please tag us @huggingface and me @simoninithomas**
<img src="https://huggingface.co/datasets/huggingface-deep-rl-course/course-images/resolve/main/en/notebooks/unit-bonus1/huggy-cover.jpeg" alt="Huggy cover" width="100%">
## Keep Learning, Stay Awesome 🤗