mirror of https://github.com/huggingface/deep-rl-class.git
Commit: "Créé avec Colaboratory" (Created with Colaboratory)
@@ -5,18 +5,7 @@
"colab": {
"provenance": [],
"private_outputs": true,
"collapsed_sections": [
"MoubJX20oKaQ",
"DoUNkTExoUED",
"BTuQAUAPoa5E",
"tF42HvI7-gs5",
"nWAuOOLh-oQf",
"-voECBK3An9j",
"Qk9ykOk9D6Qh",
"G3xy3Nf3c2O1",
"usatLaZ8dM4P"
],
"authorship_tag": "ABX9TyPovbUwEqbQAH1J8OxiHKDm",
"authorship_tag": "ABX9TyNTCZRW9WsSED/roRBW2oQ5",
"include_colab_link": true
},
"kernelspec": {
@@ -47,17 +36,15 @@
"\n",
"<img src=\"https://huggingface.co/datasets/huggingface-deep-rl-course/course-images/resolve/main/en/unit8/thumbnail.png\" alt=\"Thumbnail\"/>\n",
"\n",
"In this small notebook you'll learn to use A2C with PyBullet and Panda-Gym two set of robotics environments. \n",
"In this notebook, you'll learn to use A2C with PyBullet and Panda-Gym, two sets of robotics environments. \n",
"\n",
"With [PyBullet](https://github.com/bulletphysics/bullet3), you're going to **train robots to walk and run**:\n",
"- `AntBulletEnv-v0` 🕸️ More precisely a spider (they say Ant but come on... it's a spider 😆) 🕸️\n",
"- `HalfCheetahBulletEnv-v0`\n",
"With [PyBullet](https://github.com/bulletphysics/bullet3), you're going to **train a robot to move**:\n",
"- `AntBulletEnv-v0` 🕸️ More precisely, a spider (they say Ant but come on... it's a spider 😆) 🕸️\n",
"\n",
"Then, with [Panda-Gym](https://github.com/qgallouedec/panda-gym), you're going **to train a robotic arm** (Franka Emika Panda robot) to perform some tasks:\n",
"Then, with [Panda-Gym](https://github.com/qgallouedec/panda-gym), you're going **to train a robotic arm** (Franka Emika Panda robot) to perform a task:\n",
"- `Reach`: the robot must place its end-effector at a target position.\n",
"- `Slide`: the robot has to slide an object to a target position.\n",
"\n",
"After that, you'll be able to train other robotics environments."
"After that, you'll be able **to train in other robotics environments**.\n"
],
"metadata": {
"id": "-PTReiOw-RAN"
@@ -66,7 +53,7 @@
{
"cell_type": "markdown",
"source": [
"TODO: ADD VIDEO OF WHAT IT LOOKS LIKE"
"<img src=\"https://huggingface.co/datasets/huggingface-deep-rl-course/course-images/resolve/main/en/unit8/environments.gif\" alt=\"Robotics environments\"/>"
],
"metadata": {
"id": "2VGL_0ncoAJI"
@@ -162,12 +149,15 @@
{
"cell_type": "markdown",
"source": [
"To validate this hands-on for the [certification process](https://huggingface.co/deep-rl-course/en/unit0/introduction#certification-process), you need to:\n",
"To validate this hands-on for the [certification process](https://huggingface.co/deep-rl-course/en/unit0/introduction#certification-process), you need to push three models:\n",
"\n",
"TODO ADD CERTIFICATION RECOMMENDATION\n",
"- `AntBulletEnv-v0` get a result of >= 650.\n",
"- `PandaReachDense-v2` get a result of >= -3.5.\n",
"\n",
"To find your result, go to the [leaderboard](https://huggingface.co/spaces/huggingface-projects/Deep-Reinforcement-Learning-Leaderboard) and find your model, **the result = mean_reward - std of reward**\n",
"\n",
"If you don't find your model, **go to the bottom of the page and click on the refresh button**\n",
"\n",
"For more information about the certification process, check this section 👉 https://huggingface.co/deep-rl-course/en/unit0/introduction#certification-process"
],
"metadata": {
@@ -225,18 +215,6 @@
"!pip3 install pyvirtualdisplay"
]
},
{
"cell_type": "code",
"source": [
"# Additional dependencies for RL Baselines3 Zoo\n",
"!apt-get install swig cmake freeglut3-dev "
],
"metadata": {
"id": "fWyKJCy_NJBX"
},
"execution_count": null,
"outputs": []
},
{
"cell_type": "code",
"source": [
@@ -262,24 +240,12 @@
"- `panda-gym`: Contains the robotics arm environments.\n",
"- `stable-baselines3[extra]`: The SB3 deep reinforcement learning library.\n",
"- `huggingface_sb3`: Additional code for Stable-baselines3 to load and upload models from the Hugging Face 🤗 Hub.\n",
"- `huggingface_hub`: Library allowing anyone to work with the Hub repositories.\n",
"- `gym==0.21`: The classical version of gym."
"- `huggingface_hub`: Library allowing anyone to work with the Hub repositories."
],
"metadata": {
"id": "e1obkbdJ_KnG"
}
},
{
"cell_type": "code",
"source": [
"!pip install -r https://huggingface.co/spaces/ThomasSimonini/temp-space-requirements/raw/main/requirements/requirements-unit6.txt"
],
"metadata": {
"id": "69jUeXrLryos"
},
"execution_count": null,
"outputs": []
},
{
"cell_type": "code",
"execution_count": null,
@@ -288,7 +254,7 @@
},
"outputs": [],
"source": [
"TODO: CHANGE TO THE ONE COMMENTED#!pip install -r https://raw.githubusercontent.com/huggingface/deep-rl-class/main/notebooks/unit6/requirements-unit6.txt"
"!pip install -r https://raw.githubusercontent.com/huggingface/deep-rl-class/main/notebooks/unit6/requirements-unit6.txt"
]
},
{
@@ -303,11 +269,9 @@
{
"cell_type": "code",
"source": [
"import gym\n",
"import pybullet_envs\n",
"\n",
"import gymnasium\n",
"import panda_gym\n",
"import gym\n",
"\n",
"import os\n",
"\n",
@@ -326,15 +290,6 @@
"execution_count": null,
"outputs": []
},
{
"cell_type": "markdown",
"source": [
"# Part 1: PyBullet Environments\n"
],
"metadata": {
"id": "KIqf-N-otczo"
}
},
{
"cell_type": "markdown",
"source": [
@@ -350,23 +305,13 @@
"source": [
"### Create the AntBulletEnv-v0\n",
"#### The environment 🎮\n",
"In this environment, the agent needs to use correctly its different joints to walk correctly."
"In this environment, the agent needs to use its different joints correctly in order to walk.\n",
"You can find a detailed explanation of this environment here: https://hackmd.io/@jeffreymo/SJJrSJh5_#PyBullet"
],
"metadata": {
"id": "frVXOrnlBerQ"
}
},
{
"cell_type": "code",
"source": [
"import gym"
],
"metadata": {
"id": "RJ0XJccTt9FX"
},
"execution_count": null,
"outputs": []
},
{
"cell_type": "code",
"source": [
@@ -400,7 +345,9 @@
{
"cell_type": "markdown",
"source": [
"TODO: Add explanation obs space"
"The observation space (from [Jeffrey Y Mo](https://hackmd.io/@jeffreymo/SJJrSJh5_#PyBullet)):\n",
"\n",
"<img src=\"https://huggingface.co/datasets/huggingface-deep-rl-course/course-images/resolve/main/en/unit8/obs_space.png\" alt=\"PyBullet Ant Obs space\"/>\n"
],
"metadata": {
"id": "QzMmsdMJS7jh"
@@ -422,7 +369,9 @@
{
"cell_type": "markdown",
"source": [
"Todo: Add explanation action space"
"The action space (from [Jeffrey Y Mo](https://hackmd.io/@jeffreymo/SJJrSJh5_#PyBullet)):\n",
"\n",
"<img src=\"https://huggingface.co/datasets/huggingface-deep-rl-course/course-images/resolve/main/en/unit8/action_space.png\" alt=\"PyBullet Ant action space\"/>\n"
],
"metadata": {
"id": "3RfsHhzZS9Pw"
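As an aside, the two diagrams referenced above can be cross-checked directly from code. The following sketch is not part of this commit; it assumes `pybullet_envs` is installed (importing it registers the Bullet environment ids with gym) and simply prints the spaces described above.

import gym
import pybullet_envs  # noqa: F401 — importing this registers AntBulletEnv-v0 with gym

env = gym.make("AntBulletEnv-v0")
print("Observation space:", env.observation_space)  # a 28-value vector, per the text above
print("Action space:", env.action_space)            # one continuous torque command per joint
print("Random action:", env.action_space.sample())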
@@ -440,7 +389,9 @@
{
"cell_type": "markdown",
"source": [
"A good practice in reinforcement learning is to [normalize input features](https://stable-baselines3.readthedocs.io/en/master/guide/rl_tips.html). For that, a wrapper exists and will compute a running average and standard deviation of input features.\n",
"A good practice in reinforcement learning is to [normalize input features](https://stable-baselines3.readthedocs.io/en/master/guide/rl_tips.html). \n",
"\n",
"For that, a wrapper exists and will compute a running average and standard deviation of input features.\n",
"\n",
"We also normalize rewards with this same wrapper by adding `norm_reward = True`\n",
"\n",
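A minimal sketch of what that wrapper looks like with Stable-Baselines3 (not the commit's own cell; `n_envs=4` and `clip_obs=10.0` are assumed values):

import pybullet_envs  # noqa: F401 — registers AntBulletEnv-v0 with gym
from stable_baselines3.common.env_util import make_vec_env
from stable_baselines3.common.vec_env import VecNormalize

env = make_vec_env("AntBulletEnv-v0", n_envs=4)
# VecNormalize keeps a running mean/std of observations, and of returns since norm_reward=True
env = VecNormalize(env, norm_obs=True, norm_reward=True, clip_obs=10.0)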
@@ -493,6 +444,8 @@
"\n",
"In this case, because we have a vector of 28 values as input, we'll use an MLP (multi-layer perceptron) as policy.\n",
"\n",
"For more information about the A2C implementation with Stable-Baselines3, check: https://stable-baselines3.readthedocs.io/en/master/modules/a2c.html#notes\n",
"\n",
"To find the best parameters, I checked the [official trained agents by Stable-Baselines3 team](https://huggingface.co/sb3)."
],
"metadata": {
@@ -531,7 +484,6 @@
" n_steps = 8,\n",
" vf_coef = 0.4,\n",
" ent_coef = 0.0,\n",
" tensorboard_log = \"./tensorboard\",\n",
" policy_kwargs=dict(\n",
" log_std_init=-2, ortho_init=False),\n",
" normalize_advantage=False,\n",
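For context, here is one plausible way the keyword arguments shown in this hunk plug into the A2C constructor. This is a sketch, not the notebook's cell: the remaining tuned hyperparameters (learning rate, gamma, gae_lambda, etc.) are deliberately left at their defaults here.

import gym
import pybullet_envs  # noqa: F401 — registers the Bullet envs with gym
from stable_baselines3 import A2C

env = gym.make("AntBulletEnv-v0")
model = A2C(
    policy="MlpPolicy",  # the 28-value observation vector feeds a multi-layer perceptron
    env=env,
    n_steps=8,
    vf_coef=0.4,
    ent_coef=0.0,
    policy_kwargs=dict(log_std_init=-2, ortho_init=False),
    normalize_advantage=False,
    verbose=1,
)
model.learn(total_timesteps=10_000)  # the course itself trains for 2M timesteps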
@@ -717,33 +669,11 @@
"execution_count": null,
"outputs": []
},
{
"cell_type": "markdown",
"source": [
"## Environment 2: HalfCheetahBulletEnv-v0\n",
"\n",
"For this environment, you must follow the same process as the first one. **Don't hesitate to save this notebook to your Google Drive** since timeout can happen. You may also want to **complete this notebook two times**.\n",
"\n",
"To see that you understood the complete process from environment definition to `package_to_hub` why not try to do **it yourself first without the solution?**\n",
"\n",
"1. Define the environment called HalfCheetahBulletEnv-v0\n",
"2. Make a vectorized environment\n",
"3. Add a wrapper to normalize the observations and rewards. [Check the documentation](https://stable-baselines3.readthedocs.io/en/master/guide/vec_envs.html#vecnormalize)\n",
"4. Create the A2C Model\n",
"5. Train it for 2M Timesteps\n",
"6. Save the model and VecNormalize statistics when saving the agent\n",
"7. Evaluate your agent\n",
"8. Publish your trained model on the Hub 🔥 with `package_to_hub`"
],
"metadata": {
"id": "-voECBK3An9j"
}
},
{
{
"cell_type": "markdown",
"source": [
"## Take a coffee break ☕\n",
"- You already trained two robotics environments that learned to move congratutlations 🥳!\n",
"- You already trained your first robot that learned to move, congratulations 🥳!\n",
"- It's **time to take a break**. Don't hesitate to **save this notebook** `File > Save a copy to Drive` to work on this second part later.\n"
],
"metadata": {
@@ -753,16 +683,15 @@
{
"cell_type": "markdown",
"source": [
"# Part 2: Robotic Arm Environments with `panda-gym`\n",
"## Environment 2: PandaReachDense-v2 🦾\n",
"\n",
"The second set of robotics environments we're going to train are a robotic arm that needs to do controls (moving the arm and using the end-effector).\n",
"The agent we're going to train is a robotic arm that needs to be controlled (moving the arm and using the end-effector).\n",
"\n",
"In robotics, the *end-effector* is the device at the end of a robotic arm designed to interact with the environment.\n",
"\n",
"1. In the first environment, `PandaReach`, the robot must place its end-effector at a target position (green ball).\n",
"2. In the second environment, `PandaSlide`, the robot has to slide an object to a target position.\n",
"In `PandaReach`, the robot must place its end-effector at a target position (green ball).\n",
"\n",
"We're going to use the dense version of the environments. It means we'll get a *dense reward function* that **will provide a reward at each timestep** (the closer the agent is to complete the task, the higher the reward). Contrary to a *sparse reward function* where the environment **return a reward if and only if the task is completed**.\n",
"We're going to use the dense version of this environment. It means we'll get a *dense reward function* that **will provide a reward at each timestep** (the closer the agent is to completing the task, the higher the reward), contrary to a *sparse reward function* where the environment **returns a reward if and only if the task is completed**.\n",
"\n",
"Also, we're going to use the *End-effector displacement control*, it means the **action corresponds to the displacement of the end-effector**. We don't control the individual motion of each joint (joint control).\n",
"\n",
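To make the dense vs. sparse distinction concrete, here is a small illustrative snippet. It is a sketch of the idea, not panda-gym's actual reward code, and the 0.05 success threshold is an arbitrary assumption.

import numpy as np

def dense_reward(ee_pos, goal_pos):
    # Less negative as the end-effector gets closer to the goal: feedback at every timestep
    return -float(np.linalg.norm(ee_pos - goal_pos))

def sparse_reward(ee_pos, goal_pos, threshold=0.05):
    # Informative only once the task is solved; otherwise a flat penalty
    return 0.0 if np.linalg.norm(ee_pos - goal_pos) < threshold else -1.0

print(dense_reward(np.array([0.1, 0.0, 0.0]), np.zeros(3)))   # -0.1: progress is visible
print(sparse_reward(np.array([0.1, 0.0, 0.0]), np.zeros(3)))  # -1.0: no sense of progress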
@@ -776,10 +705,24 @@
"id": "5VWfwAA7EJg7"
}
},
{
"cell_type": "markdown",
"source": [
"\n",
"\n",
"In `PandaReachDense-v2` the robotic arm must place its end-effector at a target position (green ball).\n",
"\n"
],
"metadata": {
"id": "oZ7FyDEi7G3T"
}
},
{
"cell_type": "code",
"source": [
"env_id = \"PandaReachDense-v2\"\n",
"import gym\n",
"\n",
"env_id = \"PandaPushDense-v2\"\n",
"\n",
"# Create the env\n",
"env = gym.make(env_id)\n",
@@ -810,11 +753,12 @@
{
"cell_type": "markdown",
"source": [
"The observation space is a dictionary with 3 different element:\n",
"The observation space **is a dictionary with 3 different elements**:\n",
"- `achieved_goal`: (x,y,z) position of the goal.\n",
"- `desired_goal`: (x,y,z) distance between the goal position and the current object position.\n",
"- `observation`: position (x,y,z) and velocity of the end-effector (vx, vy, vz).\n",
"\n"
"\n",
"Since the observation is a dictionary, **we will need to use a MultiInputPolicy instead of MlpPolicy**."
],
"metadata": {
"id": "g_JClfElGFnF"
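A minimal sketch of what the dictionary observation implies in practice (an illustration, not the commit's cell; it assumes `panda_gym` is installed, which registers the Panda envs with gym):

import gym
import panda_gym  # noqa: F401 — registers PandaReachDense-v2 with gym
from stable_baselines3 import A2C

env = gym.make("PandaReachDense-v2")
print(env.observation_space)  # Dict with 'achieved_goal', 'desired_goal' and 'observation'

# A Dict observation space needs MultiInputPolicy (MlpPolicy only accepts flat Box observations)
model = A2C("MultiInputPolicy", env, verbose=1)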
@@ -836,35 +780,103 @@
{
"cell_type": "markdown",
"source": [
"TODO: ADd action space"
"The action space is a vector with 3 values:\n",
"- Control x, y, z movement"
],
"metadata": {
"id": "5MHTHEHZS4yp"
}
},
{
"cell_type": "code",
"cell_type": "markdown",
"source": [
"Now it's your turn:\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"model = A2C(\"MultiInputPolicy\", env)\n",
"model.learn(total_timesteps=100000)"
"1. Define the environment called \"PandaReachDense-v2\"\n",
"2. Make a vectorized environment\n",
"3. Add a wrapper to normalize the observations and rewards. [Check the documentation](https://stable-baselines3.readthedocs.io/en/master/guide/vec_envs.html#vecnormalize)\n",
"4. Create the A2C Model (don't forget verbose=1 to print the training logs).\n",
"5. Train it for 2M Timesteps\n",
"6. Save the model and VecNormalize statistics when saving the agent\n",
"7. Evaluate your agent\n",
"8. Publish your trained model on the Hub 🔥 with `package_to_hub`"
],
"metadata": {
"id": "C-3SfbJr0N7I"
"id": "nIhPoc5t9HjG"
}
},
{
"cell_type": "markdown",
"source": [
"### Solution (fill the todo)"
],
"metadata": {
"id": "sKGbFXZq9ikN"
}
},
{
"cell_type": "code",
"source": [
"# 1 - 2\n",
|
||||
"env_id = \"PandaReachDense-v2\"\n",
|
||||
"env = make_vec_env(env_id, n_envs=4)\n",
|
||||
"\n",
|
||||
"# 3\n",
|
||||
"env = VecNormalize(env, norm_obs=True, norm_reward=False, clip_obs=10.)\n",
|
||||
"\n",
|
||||
"# 4\n",
|
||||
"model = A2C(policy = \"MultiInputPolicy\",\n",
|
||||
" env = env,\n",
|
||||
" verbose=1)\n",
|
||||
"# 5\n",
|
||||
"model.learn(1_000_000)"
|
||||
],
|
||||
"metadata": {
|
||||
"id": "J-cC-Feg9iMm"
|
||||
},
|
||||
"execution_count": null,
|
||||
"outputs": []
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"source": [],
|
||||
"source": [
|
||||
"# 6\n",
|
||||
"model_name = \"a2c-PandaReachDense-v2\"; \n",
|
||||
"model.save(model_name)\n",
|
||||
"env.save(\"vec_normalize.pkl\")\n",
|
||||
"\n",
|
||||
"# 7\n",
|
||||
"from stable_baselines3.common.vec_env import DummyVecEnv, VecNormalize\n",
|
||||
"\n",
|
||||
"# Load the saved statistics\n",
|
||||
"eval_env = DummyVecEnv([lambda: gym.make(\"PandaReachDense-v2\")])\n",
|
||||
"eval_env = VecNormalize.load(\"vec_normalize.pkl\", eval_env)\n",
|
||||
"\n",
|
||||
"# do not update them at test time\n",
|
||||
"eval_env.training = False\n",
|
||||
"# reward normalization is not needed at test time\n",
|
||||
"eval_env.norm_reward = False\n",
|
||||
"\n",
|
||||
"# Load the agent\n",
|
||||
"model = A2C.load(model_name)\n",
|
||||
"\n",
|
||||
"mean_reward, std_reward = evaluate_policy(model, env)\n",
|
||||
"\n",
|
||||
"print(f\"Mean reward = {mean_reward:.2f} +/- {std_reward:.2f}\")\n",
|
||||
"\n",
|
||||
"# 8\n",
|
||||
"package_to_hub(\n",
|
||||
" model=model,\n",
|
||||
" model_name=f\"a2c-{env_id}\",\n",
|
||||
" model_architecture=\"A2C\",\n",
|
||||
" env_id=env_id,\n",
|
||||
" eval_env=eval_env,\n",
|
||||
" repo_id=f\"ThomasSimonini/a2c-{env_id}\", # TODO: Change the username\n",
|
||||
" commit_message=\"Initial commit\",\n",
|
||||
")"
|
||||
],
|
||||
"metadata": {
|
||||
"id": "16pttUsKFyZY"
|
||||
"id": "-UnlKLmpg80p"
|
||||
},
|
||||
"execution_count": null,
|
||||
"outputs": []
|
||||
@@ -873,9 +885,13 @@
"cell_type": "markdown",
"source": [
"## Some additional challenges 🏆\n",
"The best way to learn **is to try things by your own**! Why not trying `HalfCheetahBulletEnv-v0`?\n",
"The best way to learn **is to try things on your own**! Why not try `HalfCheetahBulletEnv-v0` for PyBullet?\n",
"\n",
"In the [Leaderboard](https://huggingface.co/spaces/chrisjay/Deep-Reinforcement-Learning-Leaderboard) you will find your agents. Can you get to the top?\n",
"If you want to try more advanced tasks with panda-gym, you need to check what was done using **TQC or SAC** (more sample-efficient algorithms suited for robotics tasks). In real robotics, you'll use a more sample-efficient algorithm for a simple reason: contrary to a simulation, **if you move your robotic arm too much, you risk breaking it**.\n",
"\n",
"PandaPickAndPlace-v1: https://huggingface.co/sb3/tqc-PandaPickAndPlace-v1\n",
"\n",
"And don't hesitate to check the panda-gym documentation here: https://panda-gym.readthedocs.io/en/latest/usage/train_with_sb3.html\n",
"\n",
"Here are some ideas to achieve that:\n",
"* Train more steps\n",
@@ -889,7 +905,7 @@
{
"cell_type": "markdown",
"source": [
"See you on Unit 8! 🔥\n",
"See you on Unit 7! 🔥\n",
"## Keep learning, stay awesome 🤗"
],
"metadata": {