From be7f8a34f0b4bd4b6a00be602a650cb4b9221e59 Mon Sep 17 00:00:00 2001 From: Thomas Simonini Date: Mon, 2 Jan 2023 12:44:57 +0100 Subject: [PATCH] Update notebook --- notebooks/unit6/unit6.ipynb | 161 ++++++++++++++++++++++++++++++++---- 1 file changed, 145 insertions(+), 16 deletions(-) diff --git a/notebooks/unit6/unit6.ipynb b/notebooks/unit6/unit6.ipynb index 8ecae3c..7358a72 100644 --- a/notebooks/unit6/unit6.ipynb +++ b/notebooks/unit6/unit6.ipynb @@ -5,7 +5,18 @@ "colab": { "provenance": [], "private_outputs": true, - "authorship_tag": "ABX9TyM4Z04oGTU1B2rRuxHfuNly", + "collapsed_sections": [ + "MoubJX20oKaQ", + "DoUNkTExoUED", + "BTuQAUAPoa5E", + "tF42HvI7-gs5", + "nWAuOOLh-oQf", + "-voECBK3An9j", + "Qk9ykOk9D6Qh", + "G3xy3Nf3c2O1", + "usatLaZ8dM4P" + ], + "authorship_tag": "ABX9TyPovbUwEqbQAH1J8OxiHKDm", "include_colab_link": true }, "kernelspec": { @@ -34,7 +45,7 @@ "source": [ "# Unit 6: Advantage Actor Critic (A2C) using Robotics Simulations with PyBullet and Panda-Gym 🤖\n", "\n", - "TODO: ADD THUMBNAIL\n", + "\"Thumbnail\"/\n", "\n", "In this small notebook you'll learn to use A2C with PyBullet and Panda-Gym two set of robotics environments. \n", "\n", @@ -252,10 +263,7 @@ "- `stable-baselines3[extra]`: The SB3 deep reinforcement learning library.\n", "- `huggingface_sb3`: Additional code for Stable-baselines3 to load and upload models from the Hugging Face 🤗 Hub.\n", "- `huggingface_hub`: Library allowing anyone to work with the Hub repositories.\n", - "\n", - "We're going to install **two versions of gym**:\n", - "- `gym==0.21`: The classical version of gym for PyBullet environments.\n", - "- `gymnasium`: [The new Gym library by Farama Foundation](https://github.com/Farama-Foundation/Gymnasium) for Panda Gym environments." + "- `gym==0.21`: The classical version of gym." 
], "metadata": { "id": "e1obkbdJ_KnG" @@ -295,12 +303,12 @@ { "cell_type": "code", "source": [ - "import gymnasium as gymnasium\n", - "import panda_gym\n", - "\n", "import gym\n", "import pybullet_envs\n", "\n", + "import gymnasium\n", + "import panda_gym\n", + "\n", "import os\n", "\n", "from huggingface_sb3 import load_from_hub, package_to_hub\n", @@ -351,7 +359,7 @@ { "cell_type": "code", "source": [ - "import gym # As mentionned we use gym for PyBullet and gymnasium for panda-gym" + "import gym" ], "metadata": { "id": "RJ0XJccTt9FX" @@ -389,6 +397,15 @@ "execution_count": null, "outputs": [] }, + { + "cell_type": "markdown", + "source": [ + "TODO: Add explanation obs space" + ], + "metadata": { + "id": "QzMmsdMJS7jh" + } + }, { "cell_type": "code", "source": [ @@ -402,6 +419,15 @@ "execution_count": null, "outputs": [] }, + { + "cell_type": "markdown", + "source": [ + "Todo: Add explanation action space" + ], + "metadata": { + "id": "3RfsHhzZS9Pw" + } + }, { "cell_type": "markdown", "source": [ @@ -696,11 +722,11 @@ "source": [ "## Environment 2: HalfCheetahBulletEnv-v0\n", "\n", - "For this environment, you need to follow the same process that the first one. **Don't hesitate here to save this notebook to your Google Drive** since timeout can happen. You may also want to **complete this notebook in two times**.\n", + "For this environment, you must follow the same process as the first one. **Don't hesitate to save this notebook to your Google Drive** since timeout can happen. You may also want to **complete this notebook two times**.\n", "\n", - "In order to see that you understood the complete process from environment definition to `package_to_hub` why not trying to do **it yourself first without solution?**\n", + "To see that you understood the complete process from environment definition to `package_to_hub` why not try to do **it yourself first without the solution?**\n", "\n", - "1. Define the enviroment called HalfCheetahBulletEnv-v0\n", + "1. 
Define the environment called HalfCheetahBulletEnv-v0\n", "2. Make a vectorized environment\n", "3. Add a wrapper to normalize the observations and rewards. [Check the documentation](https://stable-baselines3.readthedocs.io/en/master/guide/vec_envs.html#vecnormalize)\n", "4. Create the A2C Model\n", @@ -727,18 +753,121 @@ { "cell_type": "markdown", "source": [ - "# Part 2: Robotic Arm Environments with `panda-gym`\n" + "# Part 2: Robotic Arm Environments with `panda-gym`\n", + "\n", + "In this second set of robotics environments, we're going to train a robotic arm to perform control tasks (moving the arm and using the end-effector).\n", + "\n", + "In robotics, the *end-effector* is the device at the end of a robotic arm designed to interact with the environment.\n", + "\n", + "1. In the first environment, `PandaReach`, the robot must place its end-effector at a target position (green ball).\n", + "2. In the second environment, `PandaSlide`, the robot has to slide an object to a target position.\n", + "\n", + "We're going to use the dense version of the environments. This means we get a *dense reward function* that **provides a reward at each timestep** (the closer the agent is to completing the task, the higher the reward). This contrasts with a *sparse reward function*, where the environment **returns a reward if and only if the task is completed**.\n", + "\n", + "Also, we're going to use *end-effector displacement control*: the **action corresponds to the displacement of the end-effector**. 
We don't control the individual motion of each joint (joint control).\n", + "\n", + "\"Robotics\"/\n", + "\n", + "\n", + "This way, **the training will be easier**.\n", + "\n" ], "metadata": { "id": "5VWfwAA7EJg7" } }, + { + "cell_type": "code", + "source": [ + "env_id = \"PandaReachDense-v2\"\n", + "\n", + "# Create the env\n", + "env = gym.make(env_id)\n", + "\n", + "# Get the observation space and action space\n", + "s_size = env.observation_space\n", + "a_size = env.action_space" + ], + "metadata": { + "id": "zXzAu3HYF1WD" + }, + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "code", + "source": [ + "print(\"_____OBSERVATION SPACE_____ \\n\")\n", + "print(\"The State Space is: \", s_size)\n", + "print(\"Sample observation\", env.observation_space.sample()) # Get a random observation" + ], + "metadata": { + "id": "E-U9dexcF-FB" + }, + "execution_count": null, + "outputs": [] + }, { "cell_type": "markdown", + "source": [ + "The observation space is a dictionary with 3 different elements:\n", + "- `achieved_goal`: the current (x,y,z) position of the end-effector.\n", + "- `desired_goal`: the (x,y,z) position of the target the end-effector must reach.\n", + "- `observation`: the position (x,y,z) and velocity (vx, vy, vz) of the end-effector.\n", + "\n" ], + "metadata": { + "id": "g_JClfElGFnF" + } + }, + { + "cell_type": "code", + "source": [ + "print(\"\\n _____ACTION SPACE_____ \\n\")\n", + "print(\"The Action Space is: \", a_size)\n", + "print(\"Action Space Sample\", env.action_space.sample()) # Take a random action" + ], + "metadata": { + "id": "ib1Kxy4AF-FC" + }, + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "markdown", + "source": [ + "The action space is a vector with 3 continuous values: the displacement of the end-effector along the x, y and z axes." + ], + "metadata": { + "id": "5MHTHEHZS4yp" + } + }, + { + "cell_type": "code", + "source": [ + "# The observation is a dictionary, so we need a MultiInputPolicy\n", + "model = A2C(\"MultiInputPolicy\", env)\n", + "model.learn(total_timesteps=100000)" + ], + "metadata": { + "id": "C-3SfbJr0N7I" + }, 
+ "execution_count": null, + "outputs": [] + }, + { + "cell_type": "code", "source": [], "metadata": { - "id": "fW_CdlUsEVP2" - } + "id": "16pttUsKFyZY" + }, + "execution_count": null, + "outputs": [] }, { "cell_type": "markdown",