Update Unit3

2026-06-14 22:17:15 +08:00 · 2023-05-30 21:17:47 +02:00
parent 873f20c75b
commit 336e6e6f7e
1 changed files with 120 additions and 93 deletions
--- a/notebooks/unit3/unit3.ipynb
+++ b/notebooks/unit3/unit3.ipynb
@@ -7,7 +7,7 @@
        "colab_type": "text"
      },
      "source": [
-        "<a href=\"https://colab.research.google.com/github/huggingface/deep-rl-class/blob/ThomasSimonini%2FUnit3/notebooks/unit3/unit3.ipynb\" target=\"_parent\"><img src=\"https://colab.research.google.com/assets/colab-badge.svg\" alt=\"Open In Colab\"/></a>"
+        "<a href=\"https://colab.research.google.com/github/huggingface/deep-rl-class/blob/main/notebooks/unit3/unit3.ipynb\" target=\"_parent\"><img src=\"https://colab.research.google.com/assets/colab-badge.svg\" alt=\"Open In Colab\"/></a>"
      ]
    },
    {
@@ -44,7 +44,9 @@
      "source": [
        "### 🎮 Environments: \n",
        "\n",
-        "- SpacesInvadersNoFrameskip-v4 \n",
+        "- [SpaceInvadersNoFrameskip-v4](https://gymnasium.farama.org/environments/atari/space_invaders/)\n",
+        "\n",
+        "You can see the difference between Space Invaders versions here 👉 https://gymnasium.farama.org/environments/atari/space_invaders/#variants\n",
        "\n",
        "### 📚 RL-Library: \n",
        "\n",
@@ -127,6 +129,10 @@
      "source": [
        "# Let's train a Deep Q-Learning agent playing Atari' Space Invaders 👾 and upload it to the Hub.\n",
        "\n",
+        "We strongly recommend that students **use Google Colab for the hands-on exercises instead of running them on their personal computers**.\n",
+        "\n",
+        "By using Google Colab, **you can focus on learning and experimenting without worrying about the technical aspects of setting up your environments**.\n",
+        "\n",
        "To validate this hands-on for the certification process, you need to push your trained model to the Hub and **get a result of >= 200**.\n",
        "\n",
        "To find your result, go to the leaderboard and find your model, **the result = mean_reward - std of reward**\n",
@@ -173,6 +179,81 @@
        "id": "KV0NyFdQM9ZG"
      }
    },
+    {
+      "cell_type": "markdown",
+      "source": [
+        "# Install RL-Baselines3 Zoo and its dependencies 📚\n",
+        "\n",
+        "If you see `ERROR: pip's dependency resolver does not currently take into account all the packages that are installed.`, **this is normal and not a critical error**: there is a version conflict, but the packages we need are installed."
+      ],
+      "metadata": {
+        "id": "wS_cVefO-aYg"
+      }
+    },
+    {
+      "cell_type": "code",
+      "source": [
+        "# For now we install this update of RL-Baselines3 Zoo\n",
+        "!pip install git+https://github.com/DLR-RM/rl-baselines3-zoo@update/hf"
+      ],
+      "metadata": {
+        "id": "hLTwHqIWdnPb"
+      },
+      "execution_count": null,
+      "outputs": []
+    },
+    {
+      "cell_type": "markdown",
+      "source": [
+        "IF AND ONLY IF THE VERSION ABOVE DOES NOT EXIST ANYMORE, UNCOMMENT AND INSTALL THE ONE BELOW."
+      ],
+      "metadata": {
+        "id": "p0xe2sJHdtHy"
+      }
+    },
+    {
+      "cell_type": "code",
+      "source": [
+        "#!pip install rl_zoo3==2.0.0a9"
+      ],
+      "metadata": {
+        "id": "N0d6wy-F-f39"
+      },
+      "execution_count": null,
+      "outputs": []
+    },
+    {
+      "cell_type": "code",
+      "source": [
+        "!apt-get install swig cmake ffmpeg"
+      ],
+      "metadata": {
+        "id": "8_MllY6Om1eI"
+      },
+      "execution_count": null,
+      "outputs": []
+    },
+    {
+      "cell_type": "markdown",
+      "metadata": {
+        "id": "4S9mJiKg6SqC"
+      },
+      "source": [
+        "To use Atari games in Gymnasium, we need to install the `atari` package, plus `accept-rom-license` to download the ROM files (game files)."
+      ]
+    },
+    {
+      "cell_type": "code",
+      "source": [
+        "!pip install gymnasium[atari]\n",
+        "!pip install gymnasium[accept-rom-license]"
+      ],
+      "metadata": {
+        "id": "NsRP-lX1_2fC"
+      },
+      "execution_count": null,
+      "outputs": []
+    },
    {
      "cell_type": "markdown",
      "source": [
@@ -201,29 +282,6 @@
        "!pip3 install pyvirtualdisplay"
      ]
    },
-    {
-      "cell_type": "code",
-      "source": [
-        "# Additional dependencies for RL Baselines3 Zoo\n",
-        "!apt-get install swig cmake freeglut3-dev "
-      ],
-      "metadata": {
-        "id": "fWyKJCy_NJBX"
-      },
-      "execution_count": null,
-      "outputs": []
-    },
-    {
-      "cell_type": "code",
-      "source": [
-        "!pip install pyglet==1.5.1"
-      ],
-      "metadata": {
-        "id": "C5LwHrISW7Q5"
-      },
-      "execution_count": null,
-      "outputs": []
-    },
    {
      "cell_type": "code",
      "source": [
@@ -234,68 +292,11 @@
        "virtual_display.start()"
      ],
      "metadata": {
-        "id": "ww5PQH1gNLI4"
+        "id": "BE5JWP5rQIKf"
      },
      "execution_count": null,
      "outputs": []
    },
-    {
-      "cell_type": "markdown",
-      "metadata": {
-        "id": "mYIMvl5X9NAu"
-      },
-      "source": [
-        "## Clone RL-Baselines3 Zoo Repo 📚\n",
-        "You can now directly install from python package `pip install rl_zoo3` but since we want **the full installation with extra environments and dependencies** we're going to clone `RL-Baselines3-Zoo` repository and install from source."
-      ]
-    },
-    {
-      "cell_type": "code",
-      "execution_count": null,
-      "metadata": {
-        "id": "eu5ZDPZ09VNQ"
-      },
-      "outputs": [],
-      "source": [
-        "!git clone https://github.com/DLR-RM/rl-baselines3-zoo"
-      ]
-    },
-    {
-      "cell_type": "markdown",
-      "metadata": {
-        "id": "HCIoSbvbfAQh"
-      },
-      "source": [
-        "## Install dependencies 🔽\n",
-        "We can now install the dependencies RL-Baselines3 Zoo needs (this can take 5min ⏲)"
-      ]
-    },
-    {
-      "cell_type": "code",
-      "execution_count": null,
-      "metadata": {
-        "id": "s2QsFAk29h-D"
-      },
-      "outputs": [],
-      "source": [
-        "%cd /content/rl-baselines3-zoo/ \n",
-        "!git checkout v1.8.0"
-      ]
-    },
-    {
-      "cell_type": "code",
-      "execution_count": null,
-      "metadata": {
-        "id": "3QaOS7Xj9j1s"
-      },
-      "outputs": [],
-      "source": [
-        "!pip install setuptools==65.5.0\n",
-        "!pip install -r requirements.txt\n",
-        "# Since colab uses Python 3.9 we need to add this installation\n",
-        "!pip install gym[atari,accept-rom-license]==0.21.0"
-      ]
-    },
    {
      "cell_type": "markdown",
      "metadata": {
@@ -305,9 +306,31 @@
        "## Train our Deep Q-Learning Agent to Play Space Invaders 👾\n",
        "\n",
        "To train an agent with RL-Baselines3-Zoo, we just need to do two things:\n",
-        "1. We define the hyperparameters in `/content/rl-baselines3-zoo/hyperparams/dqn.yml`\n",
        "\n",
-        "<img src=\"https://huggingface.co/datasets/huggingface-deep-rl-course/course-images/resolve/main/en/notebooks/unit3/hyperparameters.png\" alt=\"DQN Hyperparameters\">\n"
+        "1. Create a hyperparameter config file called `dqn.yml` that will contain our training hyperparameters.\n",
+        "\n",
+        "This is a template example:\n",
+        "\n",
+        "```\n",
+        "SpaceInvadersNoFrameskip-v4:\n",
+        "  env_wrapper:\n",
+        "    - stable_baselines3.common.atari_wrappers.AtariWrapper\n",
+        "  frame_stack: 4\n",
+        "  policy: 'CnnPolicy'\n",
+        "  n_timesteps: !!float 1e7\n",
+        "  buffer_size: 100000\n",
+        "  learning_rate: !!float 1e-4\n",
+        "  batch_size: 32\n",
+        "  learning_starts: 100000\n",
+        "  target_update_interval: 1000\n",
+        "  train_freq: 4\n",
+        "  gradient_steps: 1\n",
+        "  exploration_fraction: 0.1\n",
+        "  exploration_final_eps: 0.01\n",
+        "  # If True, you need to deactivate handle_timeout_termination\n",
+        "  # in the replay_buffer_kwargs\n",
+        "  optimize_memory_usage: False\n",
+        "```"
      ]
    },
    {
@@ -346,7 +369,9 @@
        "id": "Hn8bRTHvERRL"
      },
      "source": [
-        "2. We run `train.py` and save the models on `logs` folder 📁"
+        "2. We start the training and save the models in the `logs` folder 📁\n",
+        "\n",
+        "- Define the algorithm after `--algo`, the folder where we save the model after `-f`, and the path to the hyperparameter config after `-c`."
      ]
    },
    {
@@ -357,7 +382,7 @@
      },
      "outputs": [],
      "source": [
-        "!python train.py --algo ________ --env SpaceInvadersNoFrameskip-v4  -f _________"
+        "!python -m rl_zoo3.train --algo ________ --env SpaceInvadersNoFrameskip-v4  -f _________  -c _________"
      ]
    },
    {
@@ -377,7 +402,7 @@
      },
      "outputs": [],
      "source": [
-        "!python train.py --algo dqn  --env SpaceInvadersNoFrameskip-v4 -f logs/"
+        "!python -m rl_zoo3.train --algo dqn  --env SpaceInvadersNoFrameskip-v4 -f logs/ -c dqn.yml"
      ]
    },
    {
@@ -399,7 +424,7 @@
      },
      "outputs": [],
      "source": [
-        "!python enjoy.py  --algo dqn  --env SpaceInvadersNoFrameskip-v4  --no-render  --n-timesteps _________  --folder logs/"
+        "!python -m rl_zoo3.enjoy  --algo dqn  --env SpaceInvadersNoFrameskip-v4  --no-render  --n-timesteps _________  --folder logs/"
      ]
    },
    {
@@ -419,7 +444,7 @@
      },
      "outputs": [],
      "source": [
-        "!python enjoy.py  --algo dqn  --env SpaceInvadersNoFrameskip-v4  --no-render  --n-timesteps 5000  --folder logs/"
+        "!python -m rl_zoo3.enjoy  --algo dqn  --env SpaceInvadersNoFrameskip-v4  --no-render  --n-timesteps 5000  --folder logs/"
      ]
    },
    {
@@ -440,7 +465,7 @@
        "id": "ezbHS1q3HYVV"
      },
      "source": [
-        "By using `rl_zoo3.push_to_hub.py` **you evaluate, record a replay, generate a model card of your agent and push it to the hub**.\n",
+        "By using `rl_zoo3.push_to_hub` **you evaluate, record a replay, generate a model card of your agent and push it to the hub**.\n",
        "\n",
        "This way:\n",
        "- You can **showcase our work** 🔥\n",
@@ -518,6 +543,8 @@
        "\n",
        "`-orga`: Your Hugging Face username\n",
        "\n",
+        "`-f`: Where the trained model folder is (in our case `logs`)\n",
+        "\n",
        "<img src=\"https://huggingface.co/datasets/huggingface-deep-rl-course/course-images/resolve/main/en/notebooks/unit3/select-id.png\" alt=\"Select Id\">"
      ]
    },
@@ -649,7 +676,7 @@
      },
      "outputs": [],
      "source": [
-        "!python enjoy.py --algo dqn --env BeamRiderNoFrameskip-v4 -n 5000  -f rl_trained/"
+        "!python -m rl_zoo3.enjoy --algo dqn --env BeamRiderNoFrameskip-v4 -n 5000  -f rl_trained/ --no-render"
      ]
    },
    {
@@ -803,4 +830,4 @@
  },
  "nbformat": 4,
  "nbformat_minor": 0
-}
+}