From be7f8a34f0b4bd4b6a00be602a650cb4b9221e59 Mon Sep 17 00:00:00 2001 From: Thomas Simonini Date: Mon, 2 Jan 2023 12:44:57 +0100 Subject: [PATCH] Update notebook --- notebooks/unit6/unit6.ipynb | 161 ++++++++++++++++++++++++++++++++---- 1 file changed, 145 insertions(+), 16 deletions(-) diff --git a/notebooks/unit6/unit6.ipynb b/notebooks/unit6/unit6.ipynb index 8ecae3c..7358a72 100644 --- a/notebooks/unit6/unit6.ipynb +++ b/notebooks/unit6/unit6.ipynb @@ -5,7 +5,18 @@ "colab": { "provenance": [], "private_outputs": true, - "authorship_tag": "ABX9TyM4Z04oGTU1B2rRuxHfuNly", + "collapsed_sections": [ + "MoubJX20oKaQ", + "DoUNkTExoUED", + "BTuQAUAPoa5E", + "tF42HvI7-gs5", + "nWAuOOLh-oQf", + "-voECBK3An9j", + "Qk9ykOk9D6Qh", + "G3xy3Nf3c2O1", + "usatLaZ8dM4P" + ], + "authorship_tag": "ABX9TyPovbUwEqbQAH1J8OxiHKDm", "include_colab_link": true }, "kernelspec": { @@ -34,7 +45,7 @@ "source": [ "# Unit 6: Advantage Actor Critic (A2C) using Robotics Simulations with PyBullet and Panda-Gym 🤖\n", "\n", - "TODO: ADD THUMBNAIL\n", + "\"Thumbnail\"/\n", "\n", "In this small notebook you'll learn to use A2C with PyBullet and Panda-Gym two set of robotics environments. \n", "\n", @@ -252,10 +263,7 @@ "- `stable-baselines3[extra]`: The SB3 deep reinforcement learning library.\n", "- `huggingface_sb3`: Additional code for Stable-baselines3 to load and upload models from the Hugging Face 🤗 Hub.\n", "- `huggingface_hub`: Library allowing anyone to work with the Hub repositories.\n", - "\n", - "We're going to install **two versions of gym**:\n", - "- `gym==0.21`: The classical version of gym for PyBullet environments.\n", - "- `gymnasium`: [The new Gym library by Farama Foundation](https://github.com/Farama-Foundation/Gymnasium) for Panda Gym environments." + "- `gym==0.21`: The classical version of gym." 
], "metadata": { "id": "e1obkbdJ_KnG" @@ -295,12 +303,12 @@ { "cell_type": "code", "source": [ - "import gymnasium as gymnasium\n", - "import panda_gym\n", - "\n", "import gym\n", "import pybullet_envs\n", "\n", + "import gymnasium\n", + "import panda_gym\n", + "\n", "import os\n", "\n", "from huggingface_sb3 import load_from_hub, package_to_hub\n", @@ -351,7 +359,7 @@ { "cell_type": "code", "source": [ - "import gym # As mentionned we use gym for PyBullet and gymnasium for panda-gym" + "import gym" ], "metadata": { "id": "RJ0XJccTt9FX" @@ -389,6 +397,15 @@ "execution_count": null, "outputs": [] }, + { + "cell_type": "markdown", + "source": [ + "TODO: Add explanation obs space" + ], + "metadata": { + "id": "QzMmsdMJS7jh" + } + }, { "cell_type": "code", "source": [ @@ -402,6 +419,15 @@ "execution_count": null, "outputs": [] }, + { + "cell_type": "markdown", + "source": [ + "Todo: Add explanation action space" + ], + "metadata": { + "id": "3RfsHhzZS9Pw" + } + }, { "cell_type": "markdown", "source": [ @@ -696,11 +722,11 @@ "source": [ "## Environment 2: HalfCheetahBulletEnv-v0\n", "\n", - "For this environment, you need to follow the same process that the first one. **Don't hesitate here to save this notebook to your Google Drive** since timeout can happen. You may also want to **complete this notebook in two times**.\n", + "For this environment, you must follow the same process as the first one. **Don't hesitate to save this notebook to your Google Drive** since timeout can happen. You may also want to **complete this notebook two times**.\n", "\n", - "In order to see that you understood the complete process from environment definition to `package_to_hub` why not trying to do **it yourself first without solution?**\n", + "To see that you understood the complete process from environment definition to `package_to_hub` why not try to do **it yourself first without the solution?**\n", "\n", - "1. Define the enviroment called HalfCheetahBulletEnv-v0\n", + "1. 
Define the environment called HalfCheetahBulletEnv-v0\n", "2. Make a vectorized environment\n", "3. Add a wrapper to normalize the observations and rewards. [Check the documentation](https://stable-baselines3.readthedocs.io/en/master/guide/vec_envs.html#vecnormalize)\n", "4. Create the A2C Model\n", @@ -727,18 +753,121 @@ { "cell_type": "markdown", "source": [ - "# Part 2: Robotic Arm Environments with `panda-gym`\n" + "# Part 2: Robotic Arm Environments with `panda-gym`\n", + "\n", + "In this second set of robotics environments, we're going to train a robotic arm to perform control tasks (moving the arm and using the end-effector).\n", + "\n", + "In robotics, the *end-effector* is the device at the end of a robotic arm designed to interact with the environment.\n", + "\n", + "1. In the first environment, `PandaReach`, the robot must place its end-effector at a target position (green ball).\n", + "2. In the second environment, `PandaSlide`, the robot has to slide an object to a target position.\n", + "\n", + "We're going to use the dense version of the environments. This means we get a *dense reward function* that **provides a reward at each timestep** (the closer the agent is to completing the task, the higher the reward). This contrasts with a *sparse reward function*, where the environment **returns a reward if and only if the task is completed**.\n", + "\n", + "Also, we're going to use *end-effector displacement control*: the **action corresponds to the displacement of the end-effector**. 
We don't control the individual motion of each joint (joint control).\n", + "\n", + "\"Robotics\"/\n", + "\n", + "\n", + "This way, **the training will be easier**.\n", + "\n" ], "metadata": { "id": "5VWfwAA7EJg7" } }, + { + "cell_type": "code", + "source": [ + "env_id = \"PandaReachDense-v2\"\n", + "\n", + "# Create the env\n", + "env = gym.make(env_id)\n", + "\n", + "# Get the observation space and action space\n", + "s_size = env.observation_space\n", + "a_size = env.action_space" + ], + "metadata": { + "id": "zXzAu3HYF1WD" + }, + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "code", + "source": [ + "print(\"_____OBSERVATION SPACE_____ \\n\")\n", + "print(\"The State Space is: \", s_size)\n", + "print(\"Sample observation\", env.observation_space.sample()) # Get a random observation" + ], + "metadata": { + "id": "E-U9dexcF-FB" + }, + "execution_count": null, + "outputs": [] + }, { "cell_type": "markdown", + "source": [ + "The observation space is a dictionary with 3 different elements:\n", + "- `achieved_goal`: the current (x,y,z) position of the end-effector.\n", + "- `desired_goal`: the (x,y,z) position of the target the end-effector must reach.\n", + "- `observation`: the position (x,y,z) and velocity (vx, vy, vz) of the end-effector.\n", + "\n" ], + "metadata": { + "id": "g_JClfElGFnF" + } + }, + { + "cell_type": "code", + "source": [ + "print(\"\\n _____ACTION SPACE_____ \\n\")\n", + "print(\"The Action Space is: \", a_size)\n", + "print(\"Action Space Sample\", env.action_space.sample()) # Take a random action" + ], + "metadata": { + "id": "ib1Kxy4AF-FC" + }, + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "markdown", + "source": [ + "The action space is a vector with 3 continuous values: the displacement of the end-effector along the x, y and z axes." + ], + "metadata": { + "id": "5MHTHEHZS4yp" + } + }, + { + "cell_type": "code", + "source": [ + "# The observation is a dictionary, so we need a MultiInputPolicy\n", + "model = A2C(\"MultiInputPolicy\", env)\n", + "model.learn(total_timesteps=100000)" + ], + "metadata": { + "id": "C-3SfbJr0N7I" + }, 
+ "execution_count": null, + "outputs": [] + }, + { + "cell_type": "code", "source": [], "metadata": { - "id": "fW_CdlUsEVP2" - } + "id": "16pttUsKFyZY" + }, + "execution_count": null, + "outputs": [] }, { "cell_type": "markdown",