{
"cells": [
{
"cell_type": "markdown",
"metadata": {
"id": "k7xBVPzoXxOg"
},
"source": [
"# Unit 3: Deep Q-Learning with Atari Games using RL Baselines3 Zoo\n",
"\n",
"In this notebook, **you'll train a Deep Q-Learning agent** playing Space Invaders using [RL Baselines3 Zoo](https://github.com/DLR-RM/rl-baselines3-zoo), a training framework based on [Stable-Baselines3](https://stable-baselines3.readthedocs.io/en/master/) that provides scripts for training and evaluating agents, tuning hyperparameters, plotting results and recording videos.\n",
"\n",
"If you have questions, please post them in the #study-group-unit3 Discord channel: https://discord.gg/aYka4Yhff9\n",
"\n",
"Environments:\n",
"- SpaceInvadersNoFrameskip-v4\n",
"\n",
"RL-Library: [Stable-Baselines3](https://stable-baselines3.readthedocs.io/en/master/)\n",
"\n",
"Here is an example of what **you will achieve**:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "J9S713biXntc"
},
"outputs": [],
"source": [
"%%html\n",
""
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "wciHGjrFYz9m"
},
"source": [
"## Objectives of this notebook\n",
"At the end of the notebook, you will:\n",
"- Understand more deeply **how RL Baselines3 Zoo works**.\n",
"- Be able to **push your trained agent and the code to the Hub** with a nice video replay and an evaluation score.\n",
"\n",
"\n"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "7Hac0z2QZdOm"
},
"source": [
"## This notebook is from the Deep Reinforcement Learning Class\n",
"\n",
"\n",
"\n",
""
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "nw6fJHIAZd-J"
},
"source": [
"In this free course, you will:\n",
"\n",
"- Study Deep Reinforcement Learning in **theory and practice**.\n",
"- Learn to **use famous Deep RL libraries** such as Stable Baselines3, RL Baselines3 Zoo, and RLlib.\n",
"- Train **agents in unique environments**.\n",
"\n",
"And more: check the syllabus at https://github.com/huggingface/deep-rl-class\n",
"\n",
"The best way to keep in touch is to join our Discord server to exchange with the community and with us: https://discord.gg/aYka4Yhff9"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "0vgANIBBZg1p"
},
"source": [
"## Prerequisites\n",
"Before diving into the notebook, you need to:\n",
"\n",
"- [Read the Unit 3 Readme](https://github.com/huggingface/deep-rl-class/blob/main/unit3/README.md), which contains all the information.\n",
"\n",
"- [Read **Deep Q-Learning**](https://huggingface.co/blog/deep-rl-dqn)\n",
"\n",
"- Sign up to [our Discord server](https://discord.gg/aYka4Yhff9) if you haven't already, and **introduce yourself in the #introduce-yourself channel**\n",
"\n",
"- Are you new to Discord? Check our **Discord 101 to get the best practices**: https://github.com/huggingface/deep-rl-class/blob/main/DISCORD.Md\n"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "QR0jZtYreSI5"
},
"source": [
"# Let's train a Deep Q-Learning agent playing Atari's Space Invaders and upload it to the Hub."
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "ESJV9guAee3P"
},
"source": [
"### Step 0: Set the GPU\n",
"- To **speed up the agent's training, we'll use a GPU**. To do that, go to `Runtime > Change Runtime type`\n",
"\n"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "Zg8eWiQ1efcU"
},
"source": [
"- `Hardware Accelerator > GPU`"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "P6XBkQ5qelOV"
},
"source": [
"### Step 0+: Set up a Virtual Display\n",
"\n",
"During the notebook, we'll need to generate a replay video. To do so in Colab, **we need a virtual screen to be able to render the environment** (and thus record the frames). \n",
"\n",
"The following cell installs the virtual screen libraries, then creates and starts a virtual screen."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "jV6wjQ7Be7p5"
},
"outputs": [],
"source": [
"%%capture\n",
"!pip install pyglet==1.5.1 \n",
"!apt install python-opengl\n",
"!apt install ffmpeg\n",
"!apt install xvfb\n",
"!pip3 install pyvirtualdisplay\n",
"\n",
"# Additional dependencies for RL Baselines3 Zoo\n",
"!apt-get install swig cmake freeglut3-dev \n",
"\n",
"# Virtual display\n",
"from pyvirtualdisplay import Display\n",
"\n",
"virtual_display = Display(visible=0, size=(1400, 900))\n",
"virtual_display.start()"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "mYIMvl5X9NAu"
},
"source": [
"### Step 1: Clone RL-Baselines3 Zoo Repo"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "eu5ZDPZ09VNQ"
},
"outputs": [],
"source": [
"!git clone https://github.com/DLR-RM/rl-baselines3-zoo"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "HCIoSbvbfAQh"
},
"source": [
"### Step 2: Install dependencies\n",
"The first step is to install the dependencies RL-Baselines3 Zoo needs (this can take about 5 minutes).\n",
"\n",
"We'll also install:\n",
"- `huggingface_sb3`: Additional code for Stable-Baselines3 to load and upload models from the Hugging Face Hub."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "s2QsFAk29h-D"
},
"outputs": [],
"source": [
"%cd /content/rl-baselines3-zoo/"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "3QaOS7Xj9j1s"
},
"outputs": [],
"source": [
"!pip install -r requirements.txt"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "RLRGKFR39l9s"
},
"outputs": [],
"source": [
"%%capture\n",
"!pip install huggingface_sb3"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "5iPgzluo9z-u"
},
"source": [
"### Step 3: Train our Deep Q-Learning Agent to Play Space Invaders\n",
"\n",
"To train an agent with RL-Baselines3-Zoo, we just need to do two things:\n",
"1. We define the hyperparameters in `rl-baselines3-zoo/hyperparams/dqn.yml`\n",
"\n"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "_VjblFSVDQOj"
},
"source": [
"In the Atari entry of that file, we see that:\n",
"- We use the Atari wrapper, which preprocesses the input (frame reduction, grayscale, stacking 4 frames)\n",
"- We use `CnnPolicy`, since we use convolutional layers to process the frames\n",
"- We train it for 10 million timesteps (`n_timesteps`)\n",
"- The memory (Experience Replay) size is 100000\n",
"\n",
"My advice is to **reduce the training timesteps to 1M**. If you want to train for 10M timesteps, you should run the notebook on your local machine (to avoid the Colab timeout). To do that, just click on `File>Download`."
]
},
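{
"cell_type": "markdown",
"metadata": {},
"source": [
"For reference, the settings listed above would look roughly like this in the `atari` entry of `hyperparams/dqn.yml`. This is only a sketch: the key names (`env_wrapper`, `frame_stack`, `policy`, `n_timesteps`, `buffer_size`) are assumed from the zoo's usual file layout, `n_timesteps` is already lowered to 1M as advised, and the other settings of the real entry are left out. Check the file in your clone before editing it.\n",
"\n",
"```yaml\n",
"# Sketch of the Atari entry (assumed key names, other settings omitted)\n",
"atari:\n",
"  env_wrapper:\n",
"    - stable_baselines3.common.atari_wrappers.AtariWrapper  # frame reduction, grayscale\n",
"  frame_stack: 4            # stack 4 frames\n",
"  policy: 'CnnPolicy'       # convolutional layers process the frames\n",
"  n_timesteps: !!float 1e6  # reduced from 1e7 (10M) to fit a Colab session\n",
"  buffer_size: 100000       # Experience Replay memory size\n",
"```"
]
},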
{
"cell_type": "markdown",
"metadata": {
"id": "5qTkbWrkECOJ"
},
"source": [
"You can check the documentation to understand what each hyperparameter does: https://stable-baselines3.readthedocs.io/en/master/modules/dqn.html?highlight=deep%20q%20learning#parameters"
]
},
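{
"cell_type": "markdown",
"metadata": {},
"source": [
"To make two of the settings from this step more concrete, here is a small, self-contained Python sketch of what stacking 4 frames and a fixed-size Experience Replay memory mean. It only uses the standard library. The class names `FrameStack` and `ReplayMemory` are hypothetical, and this is an illustration of the idea, not how Stable-Baselines3 implements it.\n",
"\n",
"```python\n",
"import random\n",
"from collections import deque\n",
"\n",
"\n",
"class FrameStack:\n",
"    '''Keep the k most recent frames (hypothetical helper, not the SB3 class).'''\n",
"\n",
"    def __init__(self, k=4):\n",
"        self.frames = deque(maxlen=k)\n",
"\n",
"    def reset(self, first_frame):\n",
"        # At the start of an episode, fill the stack with the first frame.\n",
"        self.frames.clear()\n",
"        for _ in range(self.frames.maxlen):\n",
"            self.frames.append(first_frame)\n",
"        return list(self.frames)\n",
"\n",
"    def step(self, new_frame):\n",
"        # Appending to a full deque drops the oldest frame automatically.\n",
"        self.frames.append(new_frame)\n",
"        return list(self.frames)\n",
"\n",
"\n",
"class ReplayMemory:\n",
"    '''Fixed-size memory: the oldest transitions are discarded when full.'''\n",
"\n",
"    def __init__(self, capacity=100000):\n",
"        self.buffer = deque(maxlen=capacity)\n",
"\n",
"    def add(self, obs, action, reward, next_obs, done):\n",
"        self.buffer.append((obs, action, reward, next_obs, done))\n",
"\n",
"    def sample(self, batch_size):\n",
"        # Uniform random minibatch of stored transitions.\n",
"        return random.sample(list(self.buffer), batch_size)\n",
"\n",
"    def __len__(self):\n",
"        return len(self.buffer)\n",
"```\n",
"\n",
"With `capacity=100000`, the memory keeps only the 100000 most recent transitions, which corresponds to the replay size set in the hyperparameters."
]
},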
{
"cell_type": "markdown",
"metadata": {
"id": "Hn8bRTHvERRL"
},
"source": [
"2. We run `train.py` and save the models in the `logs` folder"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "Xr1TVW4xfbz3"
},
"outputs": [],
"source": [
"!python train.py --algo ________ --env SpaceInvadersNoFrameskip-v4 -f _________"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "SeChoX-3SZfP"
},
"source": [
"#### Solution"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "PuocgdokSab9"
},
"outputs": [],
"source": [
"!python train.py --algo dqn --env SpaceInvadersNoFrameskip-v4 -f logs/"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "_dLomIiMKQaf"
},
"source": [
"### Step 4: Let's evaluate our agent\n",
"- RL-Baselines3-Zoo provides `enjoy.py` to evaluate our agent.\n",
"- Let's evaluate it for 5000 timesteps"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "co5um_KeKbBJ"
},
"outputs": [],
"source": [
"!python enjoy.py --algo dqn --env SpaceInvadersNoFrameskip-v4 --no-render --n-timesteps _________ --folder logs/"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "Q24K1tyWSj7t"
},
"source": [
"#### Solution"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "P_uSmwGRSk0z"
},
"outputs": [],
"source": [
"!python enjoy.py --algo dqn --env SpaceInvadersNoFrameskip-v4 --no-render --n-timesteps 5000 --folder logs/"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "liBeTltiHJtr"
},
"source": [
"### Step 5: Publish our trained model on the Hub\n",
"Now that we've seen good results after training, we can publish our trained model on the Hub with one line of code.\n"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "ezbHS1q3HYVV"
},
"source": [
"By using `utils.push_to_hub`, **you evaluate, record a replay, generate a model card of your agent and push it to the Hub**.\n",
"\n",
"This way:\n",
"- You can **showcase your work**\n",
"- You can **visualize your agent playing**\n",
"- You can **share with the community an agent that others can use**\n",
"- You can **access a leaderboard to see how well your agent is performing compared to your classmates**: https://huggingface.co/spaces/chrisjay/Deep-Reinforcement-Learning-Leaderboard"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "XMSeZRBiHk6X"
},
"source": [
"To be able to share your model with the community, there are three more steps to follow:\n",
"\n",
"1. (If it's not already done) Create a Hugging Face account: https://huggingface.co/join\n",
"\n",
"2. Sign in, then store your authentication token from the Hugging Face website.\n",
"- Create a new token (https://huggingface.co/settings/tokens) **with write role**"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "9O6FI0F8HnzE"
},
"source": [
"- Copy the token\n",
"- Run the cell below and paste the token"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "Ppu9yePwHrZX"
},
"outputs": [],
"source": [
"from huggingface_hub import notebook_login # To log to our Hugging Face account to be able to upload models to the Hub.\n",
"notebook_login()\n",
"!git config --global credential.helper store"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "2RVEdunPHs8B"
},
"source": [
"If you don't want to use a Google Colab or a Jupyter Notebook, you need to use this command instead: `huggingface-cli login`"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "dSLwdmvhHvjw"
},
"source": [
"3. We're now ready to push our trained agent to the Hub"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "PW436XnhHw1H"
},
"source": [
"Let's run the `push_to_hub.py` file to upload our trained agent to the Hub.\n",
"\n",
"`--repo-name`: The name of the repo\n",
"\n",
"`-orga`: Your Hugging Face username"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "Ygk2sEktTDEw"
},
"outputs": [],
"source": [
"!python -m utils.push_to_hub --algo dqn --env SpaceInvadersNoFrameskip-v4 --repo-name _____________________ -orga _____________________ -f logs/"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "otgpa0rhS9wR"
},
"source": [
"#### Solution"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "_HQNlAXuEhci"
},
"outputs": [],
"source": [
"!python -m utils.push_to_hub --algo dqn --env SpaceInvadersNoFrameskip-v4 --repo-name dqn-SpaceInvadersNoFrameskip-v4 -orga ThomasSimonini -f logs/"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "ff89kd2HL1_s"
},
"source": [
"Congrats, you've just trained and uploaded your first Deep Q-Learning agent using RL Baselines3 Zoo. The script above should have displayed a link to a model repository such as https://huggingface.co/ThomasSimonini/dqn-SpaceInvadersNoFrameskip-v4. When you go to this link, you can:\n",
"* see a video preview of your agent on the right.\n",
"* click \"Files and versions\" to see all the files in the repository.\n",
"* click \"Use in stable-baselines3\" to get a code snippet that shows how to load the model.\n",
"* read the model card (`README.md` file), which gives a description of the model and the hyperparameters you used.\n",
"\n",
"Under the hood, the Hub uses git-based repositories (don't worry if you don't know what git is), which means you can update the model with new versions as you experiment and improve your agent.\n",
"\n",
"Compare the results of your Space Invaders agent with your classmates' using the leaderboard: https://huggingface.co/spaces/chrisjay/Deep-Reinforcement-Learning-Leaderboard"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "fyRKcCYY-dIo"
},
"source": [
"### Step 6: Load a powerful trained model\n",
"- The Stable-Baselines3 team uploaded **more than 150 trained Deep Reinforcement Learning agents to the Hub**.\n",
"\n",
"You can find them here: https://huggingface.co/sb3\n",
"\n",
"Some examples:\n",
"- Asteroids: https://huggingface.co/sb3/dqn-AsteroidsNoFrameskip-v4\n",
"- Beam Rider: https://huggingface.co/sb3/dqn-BeamRiderNoFrameskip-v4\n",
"- Breakout: https://huggingface.co/sb3/dqn-BreakoutNoFrameskip-v4\n",
"- Road Runner: https://huggingface.co/sb3/dqn-RoadRunnerNoFrameskip-v4\n",
"\n",
"Let's load an agent playing Beam Rider: https://huggingface.co/sb3/dqn-BeamRiderNoFrameskip-v4"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "B-9QVFIROI5Y"
},
"outputs": [],
"source": [
"%%html\n",
""
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "7ZQNY_r6NJtC"
},
"source": [
"1. We download the model using `utils.load_from_hub`, and place it in a new folder that we can call `rl_trained`"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "OdBNZHy0NGTR"
},
"outputs": [],
"source": [
"# Download the model and save it into the rl_trained/ folder\n",
"!python -m utils.load_from_hub --algo dqn --env BeamRiderNoFrameskip-v4 -orga sb3 -f rl_trained/"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "LFt6hmWsNdBo"
},
"source": [
"2. Let's evaluate it for 5000 timesteps"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "aOxs0rNuN0uS"
},
"outputs": [],
"source": [
"!python enjoy.py --algo dqn --env BeamRiderNoFrameskip-v4 -n 5000 -f rl_trained/"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "kxMDuDfPON57"
},
"source": [
"Why not try to train your own **Deep Q-Learning agent playing BeamRiderNoFrameskip-v4?**\n",
"\n",
"If you want to try, check https://huggingface.co/sb3/dqn-BeamRiderNoFrameskip-v4#hyperparameters: **the model card lists the hyperparameters of the trained agent.**"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "xL_ZtUgpOuY6"
},
"source": [
"In the next unit, we'll see how we can **use Optuna to optimize the hyperparameters.**\n"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "-pqaco8W-huW"
},
"source": [
"## Some additional challenges\n",
"The best way to learn **is to try things on your own**!\n",
"\n",
"In the [Leaderboard](https://huggingface.co/spaces/chrisjay/Deep-Reinforcement-Learning-Leaderboard) you will find your agents. Can you get to the top?\n",
"\n",
"Here's a list of environments you can try to train your agent with:\n",
"- BeamRiderNoFrameskip-v4\n",
"- BreakoutNoFrameskip-v4 \n",
"- EnduroNoFrameskip-v4\n",
"- PongNoFrameskip-v4\n",
"\n",
"\n"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "paS-XKo4-kmu"
},
"source": [
"________________________________________________________________________\n",
"Congrats on finishing this chapter!\n",
"\n",
"If you still feel confused by all these elements... it's totally normal! **This was the same for me and for all people who studied RL.**\n",
"\n",
"Take time to really **grasp the material before continuing and try the additional challenges**. It's important to master these elements and to have solid foundations.\n",
"\n"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "5WRx7tO7-mvC"
},
"source": [
"\n",
"\n",
"### This is a course built with you\n",
"\n",
"Finally, we want to improve and update the course iteratively with your feedback. If you have any, please fill out this form: https://forms.gle/3HgA7bEHwAmmLfwh9\n"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "fS3Xerx0fIMV"
},
"source": [
"### Keep Learning, Stay Awesome"
]
}
],
"metadata": {
"colab": {
"collapsed_sections": [],
"name": "Copie de Unit 3: Deep Q-Learning with Space Invaders.ipynb",
"private_outputs": true,
"provenance": [],
"include_colab_link": true
},
"kernelspec": {
"display_name": "Python 3",
"name": "python3"
},
"language_info": {
"name": "python"
}
},
"nbformat": 4,
"nbformat_minor": 0
}