deep-rl-class/notebooks/unit8/unit8_part2.ipynb

{
  "cells": [
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "view-in-github",
        "colab_type": "text"
      },
      "source": [
        "<a href=\"https://colab.research.google.com/github/huggingface/deep-rl-class/blob/main/notebooks/unit8/unit8_part2.ipynb\" target=\"_parent\"><img src=\"https://colab.research.google.com/assets/colab-badge.svg\" alt=\"Open In Colab\"/></a>"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "OVx1gdg9wt9t"
      },
      "source": [
        "# Unit 8 Part 2: Advanced Deep Reinforcement Learning. Using Sample Factory to play Doom from pixels\n",
        "\n",
        "<img src=\"https://huggingface.co/datasets/huggingface-deep-rl-course/course-images/resolve/main/en/unit9/thumbnail2.png\" alt=\"Thumbnail\"/>\n",
        "\n",
        "In this notebook, we will learn how to train a Deep Neural Network to collect objects in a 3D environment based on the game of Doom, a video of the resulting policy is shown below. We train this policy using [Sample Factory](https://www.samplefactory.dev/), an asynchronous implementation of the PPO algorithm.\n",
        "\n",
        "Please note the following points:\n",
        "\n",
        "*   [Sample Factory](https://www.samplefactory.dev/) is an advanced RL framework and **only functions on Linux and Mac** (not Windows).\n",
        "\n",
        "*  The framework performs best on a **GPU machine with many CPU cores**, where it can achieve speeds of 100k interactions per second. The resources available on a standard Colab notebook **limit the performance of this library**. So the speed in this setting **does not reflect the real-world performance**.\n",
        "* Benchmarks for Sample Factory are available in a number of settings, check out the [examples](https://github.com/alex-petrenko/sample-factory/tree/master/sf_examples) if you want to find out more.\n"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "id": "I6_67HfI1CKg"
      },
      "outputs": [],
      "source": [
        "from IPython.display import HTML\n",
        "\n",
        "HTML('''<video width=\"640\" height=\"480\" controls>\n",
        "  <source src=\"https://huggingface.co/edbeeching/doom_health_gathering_supreme_3333/resolve/main/replay.mp4\"\n",
        "  type=\"video/mp4\">Your browser does not support the video tag.</video>'''\n",
        ")"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "DgHRAsYEXdyw"
      },
      "source": [
        "To validate this hands-on for the [certification process](https://huggingface.co/deep-rl-course/en/unit0/introduction#certification-process), you need to push one model:\n",
        "\n",
        "- `doom_health_gathering_supreme` get a result of >= 5.\n",
        "\n",
        "To find your result, go to the [leaderboard](https://huggingface.co/spaces/huggingface-projects/Deep-Reinforcement-Learning-Leaderboard) and find your model, **the result = mean_reward - std of reward**\n",
        "\n",
        "If you don't find your model, **go to the bottom of the page and click on the refresh button**\n",
        "\n",
        "For more information about the certification process, check this section 👉 https://huggingface.co/deep-rl-course/en/unit0/introduction#certification-process"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "PU4FVzaoM6fC"
      },
      "source": [
        "## Set the GPU 💪\n",
        "- To **accelerate the agent's training, we'll use a GPU**. To do that, go to `Runtime > Change Runtime type`\n",
        "\n",
        "<img src=\"https://huggingface.co/datasets/huggingface-deep-rl-course/course-images/resolve/main/en/notebooks/gpu-step1.jpg\" alt=\"GPU Step 1\">"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "KV0NyFdQM9ZG"
      },
      "source": [
        "- `Hardware Accelerator > GPU`\n",
        "\n",
        "<img src=\"https://huggingface.co/datasets/huggingface-deep-rl-course/course-images/resolve/main/en/notebooks/gpu-step2.jpg\" alt=\"GPU Step 2\">"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "-fSy5HzUcMWB"
      },
      "source": [
        "Before starting to train our agent, let's **study the library and environments we're going to use**.\n",
        "\n",
        "## Sample Factory\n",
        "\n",
        "[Sample Factory](https://www.samplefactory.dev/) is one of the **fastest RL libraries focused on very efficient synchronous and asynchronous implementations of policy gradients (PPO)**.\n",
        "\n",
        "Sample Factory is thoroughly **tested, used by many researchers and practitioners**, and is actively maintained. Our implementation is known to **reach SOTA performance in a variety of domains while minimizing RL experiment training time and hardware requirements**.\n",
        "\n",
        "<img src=\"https://huggingface.co/datasets/huggingface-deep-rl-course/course-images/resolve/main/en/unit9/samplefactoryenvs.png\" alt=\"Sample factory\"/>\n",
        "\n",
        "\n",
        "\n",
        "### Key features\n",
        "\n",
        "- Highly optimized algorithm [architecture](https://www.samplefactory.dev/06-architecture/overview/) for maximum learning throughput\n",
        "- [Synchronous and asynchronous](https://www.samplefactory.dev/07-advanced-topics/sync-async/) training regimes\n",
        "- [Serial (single-process) mode](https://www.samplefactory.dev/07-advanced-topics/serial-mode/) for easy debugging\n",
        "- Optimal performance in both CPU-based and [GPU-accelerated environments](https://www.samplefactory.dev/09-environment-integrations/isaacgym/)\n",
        "- Single- & multi-agent training, self-play, supports [training multiple policies](https://www.samplefactory.dev/07-advanced-topics/multi-policy-training/) at once on one or many GPUs\n",
        "- Population-Based Training ([PBT](https://www.samplefactory.dev/07-advanced-topics/pbt/))\n",
        "- Discrete, continuous, hybrid action spaces\n",
        "- Vector-based, image-based, dictionary observation spaces\n",
        "- Automatically creates a model architecture by parsing action/observation space specification. Supports [custom model architectures](https://www.samplefactory.dev/03-customization/custom-models/)\n",
        "- Designed to be imported into other projects, [custom environments](https://www.samplefactory.dev/03-customization/custom-environments/) are first-class citizens\n",
        "- Detailed [WandB and Tensorboard summaries](https://www.samplefactory.dev/05-monitoring/metrics-reference/), [custom metrics](https://www.samplefactory.dev/05-monitoring/custom-metrics/)\n",
        "- [HuggingFace 🤗 integration](https://www.samplefactory.dev/10-huggingface/huggingface/) (upload trained models and metrics to the Hub)\n",
        "- [Multiple](https://www.samplefactory.dev/09-environment-integrations/mujoco/) [example](https://www.samplefactory.dev/09-environment-integrations/atari/) [environment](https://www.samplefactory.dev/09-environment-integrations/vizdoom/) [integrations](https://www.samplefactory.dev/09-environment-integrations/dmlab/) with tuned parameters and trained models\n",
        "\n",
        "All of the above policies are available on the 🤗 hub. Search for the tag [sample-factory](https://huggingface.co/models?library=sample-factory&sort=downloads)\n",
        "\n",
        "### How sample-factory works\n",
        "\n",
        "Sample-factory is one of the **most highly optimized RL implementations available to the community**.\n",
        "\n",
        "It works by **spawning multiple processes that run rollout workers, inference workers and a learner worker**.\n",
        "\n",
        "The *workers* **communicate through shared memory, which lowers the communication cost between processes**.\n",
        "\n",
        "The *rollout workers* interact with the environment and send observations to the *inference workers*.\n",
        "\n",
        "The *inferences workers* query a fixed version of the policy and **send actions back to the rollout worker**.\n",
        "\n",
        "After *k* steps the rollout works send a trajectory of experience to the learner worker, **which it uses to update the agent’s policy network**.\n",
        "\n",
        "<img src=\"https://huggingface.co/datasets/huggingface-deep-rl-course/course-images/resolve/main/en/unit9/samplefactory.png\" alt=\"Sample factory\"/>"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "nB68Eb9UgC94"
      },
      "source": [
        "### Actor Critic models in Sample-factory\n",
        "\n",
        "Actor Critic models in Sample Factory are composed of three components:\n",
        "\n",
        "- **Encoder** - Process input observations (images, vectors) and map them to a vector. This is the part of the model you will most likely want to customize.\n",
        "- **Core** - Intergrate vectors from one or more encoders, can optionally include a single- or multi-layer LSTM/GRU in a memory-based agent.\n",
        "- **Decoder** - Apply additional layers to the output of the model core before computing the policy and value outputs.\n",
        "\n",
        "The library has been designed to automatically support any observation and action spaces. Users can easily add their custom models. You can find out more in the [documentation](https://www.samplefactory.dev/03-customization/custom-models/#actor-critic-models-in-sample-factory)."
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "ez5UhUtYcWXF"
      },
      "source": [
        "## ViZDoom\n",
        "\n",
        "[ViZDoom](https://vizdoom.cs.put.edu.pl/) is an **open-source python interface for the Doom Engine**.\n",
        "\n",
        "The library was created in 2016 by Marek Wydmuch, Michal Kempka  at the Institute of Computing Science, Poznan University of Technology, Poland.\n",
        "\n",
        "The library enables the **training of agents directly from the screen pixels in a number of scenarios**, including team deathmatch, shown in the video below. Because the ViZDoom environment is based on a game the was created in the 90s, it can be run on modern hardware at accelerated speeds, **allowing us to learn complex AI behaviors fairly quickly**.\n",
        "\n",
        "The library includes feature such as:\n",
        "\n",
        "- Multi-platform (Linux, macOS, Windows),\n",
        "- API for Python and C++,\n",
        "- [OpenAI Gym](https://www.gymlibrary.dev/) environment wrappers\n",
        "- Easy-to-create custom scenarios (visual editors, scripting language, and examples available),\n",
        "- Async and sync single-player and multiplayer modes,\n",
        "- Lightweight (few MBs) and fast (up to 7000 fps in sync mode, single-threaded),\n",
        "- Customizable resolution and rendering parameters,\n",
        "- Access to the depth buffer (3D vision),\n",
        "- Automatic labeling of game objects visible in the frame,\n",
        "- Access to the audio buffer\n",
        "- Access to the list of actors/objects and map geometry,\n",
        "- Off-screen rendering and episode recording,\n",
        "- Time scaling in async mode."
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "wAMwza0d5QVj"
      },
      "source": [
        "## We first need to install some dependencies that are required for the ViZDoom environment\n",
        "\n",
        "Now that our Colab runtime is set up, we can start by installing the dependencies required to run ViZDoom on linux.\n",
        "\n",
        "If you are following on your machine on Mac, you will want to follow the installation instructions on the [github page](https://github.com/Farama-Foundation/ViZDoom/blob/master/doc/Quickstart.md#-quickstart-for-macos-and-anaconda3-python-36)."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "id": "RJMxkaldwIVx"
      },
      "outputs": [],
      "source": [
        "%%capture\n",
        "%%bash\n",
        "# Install ViZDoom deps from\n",
        "# https://github.com/mwydmuch/ViZDoom/blob/master/doc/Building.md#-linux\n",
        "\n",
        "apt-get install build-essential zlib1g-dev libsdl2-dev libjpeg-dev \\\n",
        "nasm tar libbz2-dev libgtk2.0-dev cmake git libfluidsynth-dev libgme-dev \\\n",
        "libopenal-dev timidity libwildmidi-dev unzip ffmpeg\n",
        "\n",
        "# Boost libraries\n",
        "apt-get install libboost-all-dev\n",
        "\n",
        "# Lua binding dependencies\n",
        "apt-get install liblua5.1-dev"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "JT4att2c57MW"
      },
      "source": [
        "## Then we can install Sample Factory and ViZDoom\n",
        "- This can take 7min"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "id": "bbqfPZnIsvA6"
      },
      "outputs": [],
      "source": [
        "# install python libraries\n",
        "# thanks toinsson\n",
        "!pip install faster-fifo==1.4.2\n",
        "!pip install vizdoom"
      ]
    },
    {
      "cell_type": "code",
      "source": [
        "!pip install sample-factory==2.1.1"
      ],
      "metadata": {
        "id": "alxUt7Au-O8e"
      },
      "execution_count": null,
      "outputs": []
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "1jizouGpghUZ"
      },
      "source": [
        "## Setting up the Doom Environment in sample-factory"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "id": "bCgZbeiavcDU"
      },
      "outputs": [],
      "source": [
        "import functools\n",
        "\n",
        "from sample_factory.algo.utils.context import global_model_factory\n",
        "from sample_factory.cfg.arguments import parse_full_cfg, parse_sf_args\n",
        "from sample_factory.envs.env_utils import register_env\n",
        "from sample_factory.train import run_rl\n",
        "\n",
        "from sf_examples.vizdoom.doom.doom_model import make_vizdoom_encoder\n",
        "from sf_examples.vizdoom.doom.doom_params import add_doom_env_args, doom_override_defaults\n",
        "from sf_examples.vizdoom.doom.doom_utils import DOOM_ENVS, make_doom_env_from_spec\n",
        "\n",
        "\n",
        "# Registers all the ViZDoom environments\n",
        "def register_vizdoom_envs():\n",
        "    for env_spec in DOOM_ENVS:\n",
        "        make_env_func = functools.partial(make_doom_env_from_spec, env_spec)\n",
        "        register_env(env_spec.name, make_env_func)\n",
        "\n",
        "# Sample Factory allows the registration of a custom Neural Network architecture\n",
        "# See https://github.com/alex-petrenko/sample-factory/blob/master/sf_examples/vizdoom/doom/doom_model.py for more details\n",
        "def register_vizdoom_models():\n",
        "    global_model_factory().register_encoder_factory(make_vizdoom_encoder)\n",
        "\n",
        "\n",
        "def register_vizdoom_components():\n",
        "    register_vizdoom_envs()\n",
        "    register_vizdoom_models()\n",
        "\n",
        "# parse the command line args and create a config\n",
        "def parse_vizdoom_cfg(argv=None, evaluation=False):\n",
        "    parser, _ = parse_sf_args(argv=argv, evaluation=evaluation)\n",
        "    # parameters specific to Doom envs\n",
        "    add_doom_env_args(parser)\n",
        "    # override Doom default values for algo parameters\n",
        "    doom_override_defaults(parser)\n",
        "    # second parsing pass yields the final configuration\n",
        "    final_cfg = parse_full_cfg(parser, argv)\n",
        "    return final_cfg"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "sgRy6wnrgnij"
      },
      "source": [
        "Now that the setup if complete, we can train the agent. We have chosen here to learn a ViZDoom task called `Health Gathering Supreme`.\n",
        "\n",
        "### The scenario: Health Gathering Supreme\n",
        "\n",
        "<img src=\"https://huggingface.co/datasets/huggingface-deep-rl-course/course-images/resolve/main/en/unit9/Health-Gathering-Supreme.png\" alt=\"Health-Gathering-Supreme\"/>\n",
        "\n",
        "\n",
        "\n",
        "The objective of this scenario is to **teach the agent how to survive without knowing what makes him survive**. Agent know only that **life is precious** and death is bad so **it must learn what prolongs his existence and that his health is connected with it**.\n",
        "\n",
        "Map is a rectangle containing walls and with a green, acidic floor which **hurts the player periodically**. Initially there are some medkits spread uniformly over the map. A new medkit falls from the skies every now and then. **Medkits heal some portions of player's health** - to survive agent needs to pick them up. Episode finishes after player's death or on timeout.\n",
        "\n",
        "Further configuration:\n",
        "- Living_reward = 1\n",
        "- 3 available buttons: turn left, turn right, move forward\n",
        "- 1 available game variable: HEALTH\n",
        "- death penalty = 100\n",
        "\n",
        "You can find out more about the scenarios available in ViZDoom [here](https://github.com/Farama-Foundation/ViZDoom/tree/master/scenarios).\n",
        "\n",
        "There are also a number of more complex scenarios that have been create for ViZDoom, such as the ones detailed on [this github page](https://github.com/edbeeching/3d_control_deep_rl).\n",
        "\n"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "siHZZ34DiZEp"
      },
      "source": [
        "## Training the agent\n",
        "- We're going to train the agent for 4000000 steps it will take approximately 20min"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "id": "y_TeicMvyKHP"
      },
      "outputs": [],
      "source": [
        "## Start the training, this should take around 15 minutes\n",
        "register_vizdoom_components()\n",
        "\n",
        "# The scenario we train on today is health gathering\n",
        "# other scenarios include \"doom_basic\", \"doom_two_colors_easy\", \"doom_dm\", \"doom_dwango5\", \"doom_my_way_home\", \"doom_deadly_corridor\", \"doom_defend_the_center\", \"doom_defend_the_line\"\n",
        "env = \"doom_health_gathering_supreme\"\n",
        "cfg = parse_vizdoom_cfg(argv=[f\"--env={env}\", \"--num_workers=8\", \"--num_envs_per_worker=4\", \"--train_for_env_steps=4000000\"])\n",
        "\n",
        "status = run_rl(cfg)"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "5L0nBS9e_jqC"
      },
      "source": [
        "## Let's take a look at the performance of the trained policy and output a video of the agent."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "id": "MGSA4Kg5_i0j"
      },
      "outputs": [],
      "source": [
        "from sample_factory.enjoy import enjoy\n",
        "cfg = parse_vizdoom_cfg(argv=[f\"--env={env}\", \"--num_workers=1\", \"--save_video\", \"--no_render\", \"--max_num_episodes=10\"], evaluation=True)\n",
        "status = enjoy(cfg)"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "Lj5L1x0WLxwB"
      },
      "source": [
        "## Now lets visualize the performance of the agent"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "id": "WsXhBY7JNOdJ"
      },
      "outputs": [],
      "source": [
        "from base64 import b64encode\n",
        "from IPython.display import HTML\n",
        "\n",
        "mp4 = open('/content/train_dir/default_experiment/replay.mp4','rb').read()\n",
        "data_url = \"data:video/mp4;base64,\" + b64encode(mp4).decode()\n",
        "HTML(\"\"\"\n",
        "<video width=640 controls>\n",
        "      <source src=\"%s\" type=\"video/mp4\">\n",
        "</video>\n",
        "\"\"\" % data_url)"
      ]
    },
    {
      "cell_type": "markdown",
      "source": [
        "The agent has learned something, but its performance could be better. We would clearly need to train for longer. But let's upload this model to the Hub."
      ],
      "metadata": {
        "id": "2A4pf_1VwPqR"
      }
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "CSQVWF0kNuy9"
      },
      "source": [
        "## Now lets upload your checkpoint and video to the Hugging Face Hub\n",
        "\n",
        "\n"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "JquRrWytA6eo"
      },
      "source": [
        "To be able to share your model with the community there are three more steps to follow:\n",
        "\n",
        "1️⃣ (If it's not already done) create an account to HF ➡ https://huggingface.co/join\n",
        "\n",
        "2️⃣ Sign in and then, you need to store your authentication token from the Hugging Face website.\n",
        "- Create a new token (https://huggingface.co/settings/tokens) **with write role**\n",
        "\n",
        "<img src=\"https://huggingface.co/datasets/huggingface-deep-rl-course/course-images/resolve/main/en/notebooks/create-token.jpg\" alt=\"Create HF Token\">\n",
        "\n",
        "- Copy the token\n",
        "- Run the cell below and paste the token"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "_tsf2uv0g_4p"
      },
      "source": [
        "If you don't want to use a Google Colab or a Jupyter Notebook, you need to use this command instead: `huggingface-cli login`"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "id": "GoQm_jYSOts0"
      },
      "outputs": [],
      "source": [
        "from huggingface_hub import notebook_login\n",
        "notebook_login()\n",
        "!git config --global credential.helper store"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "id": "sEawW_i0OvJV"
      },
      "outputs": [],
      "source": [
        "from sample_factory.enjoy import enjoy\n",
        "\n",
        "hf_username = \"ThomasSimonini\" # insert your HuggingFace username here\n",
        "\n",
        "cfg = parse_vizdoom_cfg(argv=[f\"--env={env}\", \"--num_workers=1\", \"--save_video\", \"--no_render\", \"--max_num_episodes=10\", \"--max_num_frames=100000\", \"--push_to_hub\", f\"--hf_repository={hf_username}/rl_course_vizdoom_health_gathering_supreme\"], evaluation=True)\n",
        "status = enjoy(cfg)"
      ]
    },
    {
      "cell_type": "markdown",
      "source": [
        "## Let's load another model\n",
        "\n",
        "\n"
      ],
      "metadata": {
        "id": "9PzeXx-qxVvw"
      }
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "mHZAWSgL5F7P"
      },
      "source": [
        "This agent's performance was good, but can do better! Let's download and visualize an agent trained for 10B timesteps from the hub."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "id": "Ud6DwAUl5S-l"
      },
      "outputs": [],
      "source": [
        "#download the agent from the hub\n",
        "!python -m sample_factory.huggingface.load_from_hub -r edbeeching/doom_health_gathering_supreme_2222 -d ./train_dir\n"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "id": "qoUJhL6x6sY5"
      },
      "outputs": [],
      "source": [
        "!ls train_dir/doom_health_gathering_supreme_2222"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "id": "lZskc8LG8qr8"
      },
      "outputs": [],
      "source": [
        "env = \"doom_health_gathering_supreme\"\n",
        "cfg = parse_vizdoom_cfg(argv=[f\"--env={env}\", \"--num_workers=1\", \"--save_video\", \"--no_render\", \"--max_num_episodes=10\", \"--experiment=doom_health_gathering_supreme_2222\", \"--train_dir=train_dir\"], evaluation=True)\n",
        "status = enjoy(cfg)"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "id": "BtzXBoj65Wmq"
      },
      "outputs": [],
      "source": [
        "mp4 = open('/content/train_dir/doom_health_gathering_supreme_2222/replay.mp4','rb').read()\n",
        "data_url = \"data:video/mp4;base64,\" + b64encode(mp4).decode()\n",
        "HTML(\"\"\"\n",
        "<video width=640 controls>\n",
        "      <source src=\"%s\" type=\"video/mp4\">\n",
        "</video>\n",
        "\"\"\" % data_url)"
      ]
    },
    {
      "cell_type": "markdown",
      "source": [
        "## Some additional challenges 🏆: Doom Deathmatch\n",
        "\n",
        "Training an agent to play a Doom deathmatch **takes many hours on a more beefy machine than is available in Colab**.\n",
        "\n",
        "Fortunately, we have have **already trained an agent in this scenario and it is available in the 🤗 Hub!** Let’s download the model and visualize the agent’s performance."
      ],
      "metadata": {
        "id": "ie5YWC3NyKO8"
      }
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "id": "fq3WFeus81iI"
      },
      "outputs": [],
      "source": [
        "# Download the agent from the hub\n",
        "!python -m sample_factory.huggingface.load_from_hub -r edbeeching/doom_deathmatch_bots_2222 -d ./train_dir"
      ]
    },
    {
      "cell_type": "markdown",
      "source": [
        "Given the agent plays for a long time the video generation can take **10 minutes**."
      ],
      "metadata": {
        "id": "7AX_LwxR2FQ0"
      }
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "id": "0hq6XL__85Bv"
      },
      "outputs": [],
      "source": [
        "\n",
        "from sample_factory.enjoy import enjoy\n",
        "register_vizdoom_components()\n",
        "env = \"doom_deathmatch_bots\"\n",
        "cfg = parse_vizdoom_cfg(argv=[f\"--env={env}\", \"--num_workers=1\", \"--save_video\", \"--no_render\", \"--max_num_episodes=1\", \"--experiment=doom_deathmatch_bots_2222\", \"--train_dir=train_dir\"], evaluation=True)\n",
        "status = enjoy(cfg)\n",
        "mp4 = open('/content/train_dir/doom_deathmatch_bots_2222/replay.mp4','rb').read()\n",
        "data_url = \"data:video/mp4;base64,\" + b64encode(mp4).decode()\n",
        "HTML(\"\"\"\n",
        "<video width=640 controls>\n",
        "      <source src=\"%s\" type=\"video/mp4\">\n",
        "</video>\n",
        "\"\"\" % data_url)"
      ]
    },
    {
      "cell_type": "markdown",
      "source": [
        "\n",
        "You **can try to train your agent in this environment** using the code above, but not on colab.\n",
        "**Good luck 🤞**"
      ],
      "metadata": {
        "id": "N6mEC-4zyihx"
      }
    },
    {
      "cell_type": "markdown",
      "source": [
        "If you prefer an easier scenario, **why not try training in another ViZDoom scenario such as `doom_deadly_corridor` or `doom_defend_the_center`.**\n",
        "\n",
        "\n",
        "\n",
        "---\n",
        "\n",
        "\n",
        "This concludes the last unit. But we are not finished yet! 🤗 The following **bonus section include some of the most interesting, advanced and cutting edge work in Deep Reinforcement Learning**.\n",
        "\n",
        "## Keep learning, stay awesome 🤗"
      ],
      "metadata": {
        "id": "YnDAngN6zeeI"
      }
    }
  ],
  "metadata": {
    "accelerator": "GPU",
    "colab": {
      "provenance": [],
      "collapsed_sections": [
        "PU4FVzaoM6fC",
        "nB68Eb9UgC94",
        "ez5UhUtYcWXF",
        "sgRy6wnrgnij"
      ],
      "private_outputs": true,
      "include_colab_link": true
    },
    "gpuClass": "standard",
    "kernelspec": {
      "display_name": "Python 3",
      "name": "python3"
    },
    "language_info": {
      "name": "python"
    }
  },
  "nbformat": 4,
  "nbformat_minor": 0
}