mirror of
https://github.com/huggingface/deep-rl-class.git
synced 2026-04-08 04:59:36 +08:00
Add unit6 WIP
notebooks/unit6/unit6.ipynb
{
"nbformat": 4,
"nbformat_minor": 0,
"metadata": {
"colab": {
"provenance": [],
"private_outputs": true,
"authorship_tag": "ABX9TyM4Z04oGTU1B2rRuxHfuNly",
"include_colab_link": true
},
"kernelspec": {
"name": "python3",
"display_name": "Python 3"
},
"language_info": {
"name": "python"
},
"accelerator": "GPU",
"gpuClass": "standard"
},
"cells": [
{
"cell_type": "markdown",
"metadata": {
"id": "view-in-github",
"colab_type": "text"
},
"source": [
"<a href=\"https://colab.research.google.com/github/huggingface/deep-rl-class/blob/ThomasSimonini%2FA2C/notebooks/unit6/unit6.ipynb\" target=\"_parent\"><img src=\"https://colab.research.google.com/assets/colab-badge.svg\" alt=\"Open In Colab\"/></a>"
]
},
{
"cell_type": "markdown",
"source": [
"# Unit 6: Advantage Actor Critic (A2C) using Robotics Simulations with PyBullet and Panda-Gym 🤖\n",
"\n",
"TODO: ADD THUMBNAIL\n",
"\n",
"In this small notebook, you'll learn to use A2C with PyBullet and Panda-Gym, two sets of robotics environments.\n",
"\n",
"With [PyBullet](https://github.com/bulletphysics/bullet3), you're going to **train robots to walk and run**:\n",
"- `AntBulletEnv-v0` 🕸️ More precisely a spider (they say Ant but come on... it's a spider 😆) 🕸️\n",
"- `HalfCheetahBulletEnv-v0`\n",
"\n",
"Then, with [Panda-Gym](https://github.com/qgallouedec/panda-gym), you're going **to train a robotic arm** (Franka Emika Panda robot) to perform some tasks:\n",
"- `Reach`: the robot must place its end-effector at a target position.\n",
"- `Slide`: the robot has to slide an object to a target position.\n",
"\n",
"After that, you'll be able to train other robotics environments."
],
"metadata": {
"id": "-PTReiOw-RAN"
}
},
{
"cell_type": "markdown",
"source": [
"TODO: ADD VIDEO OF WHAT IT LOOKS LIKE"
],
"metadata": {
"id": "2VGL_0ncoAJI"
}
},
{
"cell_type": "markdown",
"source": [
"### 🎮 Environments: \n",
"\n",
"- [PyBullet](https://github.com/bulletphysics/bullet3)\n",
"- [Panda-Gym](https://github.com/qgallouedec/panda-gym)\n",
"\n",
"### 📚 RL-Library: \n",
"\n",
"- [Stable-Baselines3](https://stable-baselines3.readthedocs.io/)"
],
"metadata": {
"id": "QInFitfWno1Q"
}
},
{
"cell_type": "markdown",
"source": [
"We're constantly trying to improve our tutorials, so **if you find some issues in this notebook**, please [open an issue on the GitHub Repo](https://github.com/huggingface/deep-rl-class/issues)."
],
"metadata": {
"id": "2CcdX4g3oFlp"
}
},
{
"cell_type": "markdown",
"source": [
"## Objectives of this notebook 🏆\n",
"\n",
"At the end of the notebook, you will:\n",
"\n",
"- Be able to use **PyBullet** and **Panda-Gym**, the environment libraries.\n",
"- Be able to **train robots using A2C**.\n",
"- Understand why **we need to normalize the input**.\n",
"- Be able to **push your trained agent and the code to the Hub** with a nice video replay and an evaluation score 🔥.\n",
"\n",
"\n"
],
"metadata": {
"id": "MoubJX20oKaQ"
}
},
{
"cell_type": "markdown",
"source": [
"## This notebook is from the Deep Reinforcement Learning Course\n",
"<img src=\"https://huggingface.co/datasets/huggingface-deep-rl-course/course-images/resolve/main/en/notebooks/deep-rl-course-illustration.jpg\" alt=\"Deep RL Course illustration\"/>\n",
"\n",
"In this free course, you will:\n",
"\n",
"- 📖 Study Deep Reinforcement Learning in **theory and practice**.\n",
"- 🧑💻 Learn to **use famous Deep RL libraries** such as Stable Baselines3, RL Baselines3 Zoo, CleanRL and Sample Factory 2.0.\n",
"- 🤖 Train **agents in unique environments** \n",
"\n",
"And more! Check 📚 the syllabus 👉 https://simoninithomas.github.io/deep-rl-course\n",
"\n",
"Don’t forget to **<a href=\"http://eepurl.com/ic5ZUD\">sign up to the course</a>** (we are collecting your email to be able to **send you the links when each Unit is published, and to give you information about the challenges and updates**).\n",
"\n",
"\n",
"The best way to keep in touch is to join our discord server to exchange with the community and with us 👉🏻 https://discord.gg/ydHrjt3WP5"
],
"metadata": {
"id": "DoUNkTExoUED"
}
},
{
"cell_type": "markdown",
"source": [
"## Prerequisites 🏗️\n",
"Before diving into the notebook, you need to:\n",
"\n",
"🔲 📚 Study [Actor-Critic methods by reading Unit 6](https://huggingface.co/deep-rl-course/unit6/introduction) 🤗 "
],
"metadata": {
"id": "BTuQAUAPoa5E"
}
},
{
"cell_type": "markdown",
"source": [
"# Let's train our first robots 🤖"
],
"metadata": {
"id": "iajHvVDWoo01"
}
},
{
"cell_type": "markdown",
"source": [
"To validate this hands-on for the [certification process](https://huggingface.co/deep-rl-course/en/unit0/introduction#certification-process), you need to:\n",
"\n",
"TODO ADD CERTIFICATION RECOMMENDATION\n",
"\n",
"To find your result, go to the [leaderboard](https://huggingface.co/spaces/huggingface-projects/Deep-Reinforcement-Learning-Leaderboard) and find your model. **The result = mean_reward - std_reward**\n",
"\n",
"For more information about the certification process, check this section 👉 https://huggingface.co/deep-rl-course/en/unit0/introduction#certification-process"
],
"metadata": {
"id": "zbOENTE2os_D"
}
},
{
"cell_type": "markdown",
"source": [
"## Set the GPU 💪\n",
"- To **accelerate the agent's training, we'll use a GPU**. To do that, go to `Runtime > Change Runtime type`\n",
"\n",
"<img src=\"https://huggingface.co/datasets/huggingface-deep-rl-course/course-images/resolve/main/en/notebooks/gpu-step1.jpg\" alt=\"GPU Step 1\">"
],
"metadata": {
"id": "PU4FVzaoM6fC"
}
},
{
"cell_type": "markdown",
"source": [
"- `Hardware Accelerator > GPU`\n",
"\n",
"<img src=\"https://huggingface.co/datasets/huggingface-deep-rl-course/course-images/resolve/main/en/notebooks/gpu-step2.jpg\" alt=\"GPU Step 2\">"
],
"metadata": {
"id": "KV0NyFdQM9ZG"
}
},
{
"cell_type": "markdown",
"source": [
"## Create a virtual display 🔽\n",
"\n",
"During the notebook, we'll need to generate a replay video. To do so with Colab, **we need a virtual screen to be able to render the environment** (and thus record the frames).\n",
"\n",
"Hence, the following cell will install the libraries and create and run a virtual screen 🖥"
],
"metadata": {
"id": "bTpYcVZVMzUI"
}
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "jV6wjQ7Be7p5"
},
"outputs": [],
"source": [
"%%capture\n",
"!apt install python-opengl\n",
"!apt install ffmpeg\n",
"!apt install xvfb\n",
"!pip3 install pyvirtualdisplay"
]
},
{
"cell_type": "code",
"source": [
"# Additional dependencies for RL Baselines3 Zoo\n",
"!apt-get install swig cmake freeglut3-dev "
],
"metadata": {
"id": "fWyKJCy_NJBX"
},
"execution_count": null,
"outputs": []
},
{
"cell_type": "code",
"source": [
"# Virtual display\n",
"from pyvirtualdisplay import Display\n",
"\n",
"virtual_display = Display(visible=0, size=(1400, 900))\n",
"virtual_display.start()"
],
"metadata": {
"id": "ww5PQH1gNLI4"
},
"execution_count": null,
"outputs": []
},
{
"cell_type": "markdown",
"source": [
"### Install dependencies 🔽\n",
"The first step is to install the dependencies; we’ll install multiple ones:\n",
"\n",
"- `pybullet`: Contains the walking robots environments.\n",
"- `panda-gym`: Contains the robotics arm environments.\n",
"- `stable-baselines3[extra]`: The SB3 deep reinforcement learning library.\n",
"- `huggingface_sb3`: Additional code for Stable-baselines3 to load and upload models from the Hugging Face 🤗 Hub.\n",
"- `huggingface_hub`: Library allowing anyone to work with the Hub repositories.\n",
"\n",
"We're going to install **two versions of gym**:\n",
"- `gym==0.21`: The classical version of gym for PyBullet environments.\n",
"- `gymnasium`: [The new Gym library by Farama Foundation](https://github.com/Farama-Foundation/Gymnasium) for Panda Gym environments."
],
"metadata": {
"id": "e1obkbdJ_KnG"
}
},
{
"cell_type": "code",
"source": [
"!pip install -r https://huggingface.co/spaces/ThomasSimonini/temp-space-requirements/raw/main/requirements/requirements-unit6.txt"
],
"metadata": {
"id": "69jUeXrLryos"
},
"execution_count": null,
"outputs": []
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "2yZRi_0bQGPM"
},
"outputs": [],
"source": [
"# TODO: CHANGE TO THE ONE COMMENTED\n",
"#!pip install -r https://raw.githubusercontent.com/huggingface/deep-rl-class/main/notebooks/unit6/requirements-unit6.txt"
]
},
{
"cell_type": "markdown",
"source": [
"## Import the packages 📦"
],
"metadata": {
"id": "QTep3PQQABLr"
}
},
{
"cell_type": "code",
"source": [
"import gymnasium\n",
"import panda_gym\n",
"\n",
"import gym\n",
"import pybullet_envs\n",
"\n",
"import os\n",
"\n",
"from huggingface_sb3 import load_from_hub, package_to_hub\n",
"\n",
"from stable_baselines3 import A2C\n",
"from stable_baselines3.common.evaluation import evaluate_policy\n",
"from stable_baselines3.common.vec_env import DummyVecEnv, VecNormalize\n",
"from stable_baselines3.common.env_util import make_vec_env\n",
"\n",
"from huggingface_hub import notebook_login"
],
"metadata": {
"id": "HpiB8VdnQ7Bk"
},
"execution_count": null,
"outputs": []
},
{
"cell_type": "markdown",
"source": [
"# Part 1: PyBullet Environments\n"
],
"metadata": {
"id": "KIqf-N-otczo"
}
},
{
"cell_type": "markdown",
"source": [
"## Environment 1: AntBulletEnv-v0 🕸\n",
"\n"
],
"metadata": {
"id": "lfBwIS_oAVXI"
}
},
{
"cell_type": "markdown",
"source": [
"### Create the AntBulletEnv-v0\n",
"#### The environment 🎮\n",
"In this environment, the agent needs to coordinate its different joints correctly in order to walk."
],
"metadata": {
"id": "frVXOrnlBerQ"
}
},
{
"cell_type": "code",
"source": [
"import gym  # As mentioned, we use gym for PyBullet and gymnasium for panda-gym"
],
"metadata": {
"id": "RJ0XJccTt9FX"
},
"execution_count": null,
"outputs": []
},
{
"cell_type": "code",
"source": [
"env_id = \"AntBulletEnv-v0\"\n",
"# Create the env\n",
"env = gym.make(env_id)\n",
"\n",
"# Get the state space and action space\n",
"s_size = env.observation_space.shape[0]\n",
"a_size = env.action_space.shape[0]"
],
"metadata": {
"id": "JpU-JCDQYYax"
},
"execution_count": null,
"outputs": []
},
{
"cell_type": "code",
"source": [
"print(\"_____OBSERVATION SPACE_____ \\n\")\n",
"print(\"The State Space is: \", s_size)\n",
"print(\"Sample observation\", env.observation_space.sample()) # Get a random observation"
],
"metadata": {
"id": "2ZfvcCqEYgrg"
},
"execution_count": null,
"outputs": []
},
{
"cell_type": "code",
"source": [
"print(\"\\n _____ACTION SPACE_____ \\n\")\n",
"print(\"The Action Space is: \", a_size)\n",
"print(\"Action Space Sample\", env.action_space.sample()) # Take a random action"
],
"metadata": {
"id": "Tc89eLTYYkK2"
},
"execution_count": null,
"outputs": []
},
{
"cell_type": "markdown",
"source": [
"### Normalize observation and rewards"
],
"metadata": {
"id": "S5sXcg469ysB"
}
},
{
"cell_type": "markdown",
"source": [
"A good practice in reinforcement learning is to [normalize input features](https://stable-baselines3.readthedocs.io/en/master/guide/rl_tips.html). For that, a wrapper exists and will compute a running average and standard deviation of input features.\n",
"\n",
"We also normalize rewards with this same wrapper by adding `norm_reward = True`\n",
"\n",
"[You should check the documentation to fill this cell](https://stable-baselines3.readthedocs.io/en/master/guide/vec_envs.html#vecnormalize)"
],
"metadata": {
"id": "1ZyX6qf3Zva9"
}
},
{
"cell_type": "code",
"source": [
"env = make_vec_env(env_id, n_envs=4)\n",
"\n",
"# Adding this wrapper to normalize the observation and the reward\n",
"env = # TODO: Add the wrapper"
],
"metadata": {
"id": "1RsDtHHAQ9Ie"
},
"execution_count": null,
"outputs": []
},
{
"cell_type": "markdown",
"source": [
"#### Solution"
],
"metadata": {
"id": "tF42HvI7-gs5"
}
},
{
"cell_type": "code",
"source": [
"env = make_vec_env(env_id, n_envs=4)\n",
"\n",
"env = VecNormalize(env, norm_obs=True, norm_reward=True, clip_obs=10.)"
],
"metadata": {
"id": "2O67mqgC-hol"
},
"execution_count": null,
"outputs": []
},
{
"cell_type": "markdown",
"source": [
"### Create the A2C Model 🤖\n",
"\n",
"In this case, because we have a vector of 28 values as input, we'll use an MLP (multi-layer perceptron) as policy.\n",
"\n",
"To find the best parameters I checked the [official trained agents by Stable-Baselines3 team](https://huggingface.co/sb3)."
],
"metadata": {
"id": "4JmEVU6z1ZA-"
}
},
{
"cell_type": "code",
"source": [
"model = # Create the A2C model and try to find the best parameters"
],
"metadata": {
"id": "vR3T4qFt164I"
},
"execution_count": null,
"outputs": []
},
{
"cell_type": "markdown",
"source": [
"#### Solution"
],
"metadata": {
"id": "nWAuOOLh-oQf"
}
},
{
"cell_type": "code",
"source": [
"model = A2C(policy = \"MlpPolicy\",\n",
" env = env,\n",
" gae_lambda = 0.9,\n",
" gamma = 0.99,\n",
" learning_rate = 0.00096,\n",
" max_grad_norm = 0.5,\n",
" n_steps = 8,\n",
" vf_coef = 0.4,\n",
" ent_coef = 0.0,\n",
" tensorboard_log = \"./tensorboard\",\n",
" policy_kwargs=dict(\n",
" log_std_init=-2, ortho_init=False),\n",
" normalize_advantage=False,\n",
" use_rms_prop= True,\n",
" use_sde= True,\n",
" verbose=1)"
],
"metadata": {
"id": "FKFLY54T-pU1"
},
"execution_count": null,
"outputs": []
},
{
"cell_type": "markdown",
"source": [
"### Train the A2C agent 🏃\n",
"- Let's train our agent for 2,000,000 timesteps. Don't forget to use the GPU on Colab. It will take approximately 25-40 minutes."
],
"metadata": {
"id": "opyK3mpJ1-m9"
}
},
{
"cell_type": "code",
"source": [
"model.learn(2_000_000)"
],
"metadata": {
"id": "4TuGHZD7RF1G"
},
"execution_count": null,
"outputs": []
},
{
"cell_type": "code",
"source": [
"# Save the model and VecNormalize statistics when saving the agent\n",
"model.save(\"a2c-AntBulletEnv-v0\")\n",
"env.save(\"vec_normalize.pkl\")"
],
"metadata": {
"id": "MfYtjj19cKFr"
},
"execution_count": null,
"outputs": []
},
{
"cell_type": "markdown",
"source": [
"### Evaluate the agent 📈\n",
"- Now that our agent is trained, we need to **check its performance**.\n",
"- Stable-Baselines3 provides a method to do that: `evaluate_policy`.\n",
"- In my case, I've got a mean reward of `2371.90 +/- 16.50`"
],
"metadata": {
"id": "01M9GCd32Ig-"
}
},
{
"cell_type": "code",
"source": [
"from stable_baselines3.common.vec_env import DummyVecEnv, VecNormalize\n",
"\n",
"# Load the saved statistics\n",
"eval_env = DummyVecEnv([lambda: gym.make(\"AntBulletEnv-v0\")])\n",
"eval_env = VecNormalize.load(\"vec_normalize.pkl\", eval_env)\n",
"\n",
"# do not update them at test time\n",
"eval_env.training = False\n",
"# reward normalization is not needed at test time\n",
"eval_env.norm_reward = False\n",
"\n",
"# Load the agent\n",
"model = A2C.load(\"a2c-AntBulletEnv-v0\")\n",
"\n",
"mean_reward, std_reward = evaluate_policy(model, eval_env)\n",
"\n",
"print(f\"Mean reward = {mean_reward:.2f} +/- {std_reward:.2f}\")"
],
"metadata": {
"id": "liirTVoDkHq3"
},
"execution_count": null,
"outputs": []
},
{
"cell_type": "markdown",
"source": [
"### Publish your trained model on the Hub 🔥\n",
"Now that we saw we got good results after the training, we can publish our trained model on the hub 🤗 with one line of code.\n",
"\n",
"📚 The libraries documentation 👉 https://github.com/huggingface/huggingface_sb3/tree/main#hugging-face--x-stable-baselines3-v20\n",
"\n",
"Here's an example of a Model Card (with a PyBullet environment):\n",
"\n",
"<img src=\"https://huggingface.co/datasets/huggingface-deep-rl-course/course-images/resolve/main/en/unit8/modelcardpybullet.png\" alt=\"Model Card Pybullet\"/>"
],
"metadata": {
"id": "44L9LVQaavR8"
}
},
{
"cell_type": "markdown",
"source": [
"By using `package_to_hub`, as we already mentioned in the former units, **you evaluate, record a replay, generate a model card of your agent, and push it to the Hub**.\n",
"\n",
"This way:\n",
"- You can **showcase your work** 🔥\n",
"- You can **visualize your agent playing** 👀\n",
"- You can **share with the community an agent that others can use** 💾\n",
"- You can **access a leaderboard 🏆 to see how well your agent is performing compared to your classmates** 👉 https://huggingface.co/spaces/huggingface-projects/Deep-Reinforcement-Learning-Leaderboard\n"
],
"metadata": {
"id": "MkMk99m8bgaQ"
}
},
{
"cell_type": "markdown",
"metadata": {
"id": "JquRrWytA6eo"
},
"source": [
"To be able to share your model with the community there are three more steps to follow:\n",
"\n",
"1️⃣ (If it's not already done) create an account on HF ➡ https://huggingface.co/join\n",
"\n",
"2️⃣ Sign in, then store your authentication token from the Hugging Face website.\n",
"- Create a new token (https://huggingface.co/settings/tokens) **with write role**\n",
"\n",
"<img src=\"https://huggingface.co/datasets/huggingface-deep-rl-course/course-images/resolve/main/en/notebooks/create-token.jpg\" alt=\"Create HF Token\">\n",
"\n",
"- Copy the token \n",
"- Run the cell below and paste the token"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "GZiFBBlzxzxY"
},
"outputs": [],
"source": [
"notebook_login()\n",
"!git config --global credential.helper store"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "_tsf2uv0g_4p"
},
"source": [
"If you don't want to use a Google Colab or a Jupyter Notebook, you need to use this command instead: `huggingface-cli login`"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "FGNh9VsZok0i"
},
"source": [
"3️⃣ We're now ready to push our trained agent to the 🤗 Hub 🔥 using the `package_to_hub()` function."
]
},
{
"cell_type": "code",
"source": [
"package_to_hub(\n",
"    model=model,\n",
"    model_name=f\"a2c-{env_id}\",\n",
"    model_architecture=\"A2C\",\n",
"    env_id=env_id,\n",
"    eval_env=eval_env,\n",
"    repo_id=f\"ThomasSimonini/a2c-{env_id}\", # Change the username\n",
"    commit_message=\"Initial commit\",\n",
")"
],
"metadata": {
"id": "ueuzWVCUTkfS"
},
"execution_count": null,
"outputs": []
},
{
"cell_type": "markdown",
"source": [
"## Environment 2: HalfCheetahBulletEnv-v0\n",
"\n",
"For this environment, you need to follow the same process as for the first one. **Don't hesitate to save this notebook to your Google Drive**, since timeouts can happen. You may also want to **complete this notebook in two sessions**.\n",
"\n",
"To check that you understood the complete process, from environment definition to `package_to_hub`, why not try to do **it yourself first, without looking at the solution?**\n",
"\n",
"1. Define the environment called HalfCheetahBulletEnv-v0\n",
"2. Make a vectorized environment\n",
"3. Add a wrapper to normalize the observations and rewards. [Check the documentation](https://stable-baselines3.readthedocs.io/en/master/guide/vec_envs.html#vecnormalize)\n",
"4. Create the A2C Model\n",
"5. Train it for 2M Timesteps\n",
"6. Save the model and VecNormalize statistics when saving the agent\n",
"7. Evaluate your agent\n",
"8. Publish your trained model on the Hub 🔥 with `package_to_hub`"
],
"metadata": {
"id": "-voECBK3An9j"
}
},
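{
"cell_type": "markdown",
"source": [
"#### Solution (sketch)\n",
"\n",
"A minimal sketch of the eight steps above, mirroring the Ant solution. The hyperparameters are simply reused from the Ant agent rather than tuned for HalfCheetah, and the save file names are illustrative, so treat this as a starting point."
],
"metadata": {}
},
{
"cell_type": "code",
"source": [
"env_id = \"HalfCheetahBulletEnv-v0\"\n",
"\n",
"# 1-3. Create a vectorized environment and normalize observations and rewards\n",
"env = make_vec_env(env_id, n_envs=4)\n",
"env = VecNormalize(env, norm_obs=True, norm_reward=True, clip_obs=10.)\n",
"\n",
"# 4-5. Create the A2C model (same hyperparameters as for Ant) and train it\n",
"model = A2C(policy=\"MlpPolicy\", env=env, gae_lambda=0.9, gamma=0.99,\n",
"            learning_rate=0.00096, max_grad_norm=0.5, n_steps=8,\n",
"            vf_coef=0.4, ent_coef=0.0, use_rms_prop=True, use_sde=True, verbose=1)\n",
"model.learn(2_000_000)\n",
"\n",
"# 6. Save the model and the VecNormalize statistics\n",
"model.save(f\"a2c-{env_id}\")\n",
"env.save(\"vec_normalize-halfcheetah.pkl\")\n",
"\n",
"# 7. Evaluate on a frozen copy of the normalization statistics\n",
"eval_env = DummyVecEnv([lambda: gym.make(env_id)])\n",
"eval_env = VecNormalize.load(\"vec_normalize-halfcheetah.pkl\", eval_env)\n",
"eval_env.training = False\n",
"eval_env.norm_reward = False\n",
"mean_reward, std_reward = evaluate_policy(model, eval_env)\n",
"print(f\"Mean reward = {mean_reward:.2f} +/- {std_reward:.2f}\")\n",
"\n",
"# 8. Push to the Hub (change the username)\n",
"package_to_hub(model=model, model_name=f\"a2c-{env_id}\", model_architecture=\"A2C\",\n",
"               env_id=env_id, eval_env=eval_env,\n",
"               repo_id=f\"ThomasSimonini/a2c-{env_id}\", commit_message=\"Initial commit\")"
],
"metadata": {},
"execution_count": null,
"outputs": []
},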
{
"cell_type": "markdown",
"source": [
"## Take a coffee break ☕\n",
"- You already trained agents in two robotics environments that learned to move, congratulations 🥳!\n",
"- It's **time to take a break**. Don't hesitate to **save this notebook** `File > Save a copy to Drive` to work on this second part later.\n"
],
"metadata": {
"id": "Qk9ykOk9D6Qh"
}
},
{
"cell_type": "markdown",
"source": [
"# Part 2: Robotic Arm Environments with `panda-gym`\n"
],
"metadata": {
"id": "5VWfwAA7EJg7"
}
},
{
"cell_type": "markdown",
"source": [],
"metadata": {
"id": "fW_CdlUsEVP2"
}
},
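{
"cell_type": "markdown",
"source": [
"### First look at the `panda-gym` API (sketch)\n",
"\n",
"Part 2 is still being written. In the meantime, here's a small, version-dependent sketch of how to instantiate a Panda task with `gymnasium`. The environment id used here (`PandaReachDense-v3`) is an assumption that depends on the installed panda-gym version, so check the [panda-gym repository](https://github.com/qgallouedec/panda-gym) for the exact ids your version registers."
],
"metadata": {}
},
{
"cell_type": "code",
"source": [
"import gymnasium\n",
"import panda_gym  # noqa: F401  (importing registers the Panda environments)\n",
"\n",
"# Assumed id: panda-gym 3.x registers ids like \"PandaReachDense-v3\"\n",
"env = gymnasium.make(\"PandaReachDense-v3\")\n",
"\n",
"# Observations are dictionaries (observation / achieved_goal / desired_goal),\n",
"# so an SB3 model for this env would use \"MultiInputPolicy\" instead of \"MlpPolicy\".\n",
"print(env.observation_space)\n",
"print(env.action_space)"
],
"metadata": {},
"execution_count": null,
"outputs": []
},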
{
"cell_type": "markdown",
"source": [
"## Some additional challenges 🏆\n",
"The best way to learn **is to try things on your own**! Why not try `HalfCheetahBulletEnv-v0`?\n",
"\n",
"In the [Leaderboard](https://huggingface.co/spaces/huggingface-projects/Deep-Reinforcement-Learning-Leaderboard) you will find your agents. Can you get to the top?\n",
"\n",
"Here are some ideas to get there:\n",
"* Train for more steps\n",
"* Try different hyperparameters by looking at what your classmates have done 👉 https://huggingface.co/models?other=AntBulletEnv-v0\n",
"* **Push your new trained model** on the Hub 🔥\n"
],
"metadata": {
"id": "G3xy3Nf3c2O1"
}
},
{
"cell_type": "markdown",
"source": [
"See you on Unit 8! 🔥\n",
"## Keep learning, stay awesome 🤗"
],
"metadata": {
"id": "usatLaZ8dM4P"
}
}
]
}