Update notebook

This commit is contained in:
Thomas Simonini
2023-01-02 12:44:57 +01:00
parent f937f8c7db
commit be7f8a34f0


@@ -5,7 +5,18 @@
"colab": {
"provenance": [],
"private_outputs": true,
"authorship_tag": "ABX9TyM4Z04oGTU1B2rRuxHfuNly",
"collapsed_sections": [
"MoubJX20oKaQ",
"DoUNkTExoUED",
"BTuQAUAPoa5E",
"tF42HvI7-gs5",
"nWAuOOLh-oQf",
"-voECBK3An9j",
"Qk9ykOk9D6Qh",
"G3xy3Nf3c2O1",
"usatLaZ8dM4P"
],
"authorship_tag": "ABX9TyPovbUwEqbQAH1J8OxiHKDm",
"include_colab_link": true
},
"kernelspec": {
@@ -34,7 +45,7 @@
"source": [
"# Unit 6: Advantage Actor Critic (A2C) using Robotics Simulations with PyBullet and Panda-Gym 🤖\n",
"\n",
"TODO: ADD THUMBNAIL\n",
"<img src=\"https://huggingface.co/datasets/huggingface-deep-rl-course/course-images/resolve/main/en/unit8/thumbnail.png\" alt=\"Thumbnail\"/>\n",
"\n",
"In this small notebook you'll learn to use A2C with PyBullet and Panda-Gym two set of robotics environments. \n",
"\n",
@@ -252,10 +263,7 @@
"- `stable-baselines3[extra]`: The SB3 deep reinforcement learning library.\n",
"- `huggingface_sb3`: Additional code for Stable-baselines3 to load and upload models from the Hugging Face 🤗 Hub.\n",
"- `huggingface_hub`: Library allowing anyone to work with the Hub repositories.\n",
"\n",
"We're going to install **two versions of gym**:\n",
"- `gym==0.21`: The classical version of gym for PyBullet environments.\n",
"- `gymnasium`: [The new Gym library by Farama Foundation](https://github.com/Farama-Foundation/Gymnasium) for Panda Gym environments."
"- `gym==0.21`: The classical version of gym."
],
"metadata": {
"id": "e1obkbdJ_KnG"
@@ -295,12 +303,12 @@
{
"cell_type": "code",
"source": [
"import gymnasium as gymnasium\n",
"import panda_gym\n",
"\n",
"import gym\n",
"import pybullet_envs\n",
"\n",
"import gymnasium\n",
"import panda_gym\n",
"\n",
"import os\n",
"\n",
"from huggingface_sb3 import load_from_hub, package_to_hub\n",
@@ -351,7 +359,7 @@
{
"cell_type": "code",
"source": [
"import gym # As mentionned we use gym for PyBullet and gymnasium for panda-gym"
"import gym"
],
"metadata": {
"id": "RJ0XJccTt9FX"
@@ -389,6 +397,15 @@
"execution_count": null,
"outputs": []
},
{
"cell_type": "markdown",
"source": [
"TODO: Add explanation obs space"
],
"metadata": {
"id": "QzMmsdMJS7jh"
}
},
{
"cell_type": "code",
"source": [
@@ -402,6 +419,15 @@
"execution_count": null,
"outputs": []
},
{
"cell_type": "markdown",
"source": [
"Todo: Add explanation action space"
],
"metadata": {
"id": "3RfsHhzZS9Pw"
}
},
{
"cell_type": "markdown",
"source": [
@@ -696,11 +722,11 @@
"source": [
"## Environment 2: HalfCheetahBulletEnv-v0\n",
"\n",
"For this environment, you need to follow the same process that the first one. **Don't hesitate here to save this notebook to your Google Drive** since timeout can happen. You may also want to **complete this notebook in two times**.\n",
"For this environment, you must follow the same process as the first one. **Don't hesitate to save this notebook to your Google Drive** since timeout can happen. You may also want to **complete this notebook two times**.\n",
"\n",
"In order to see that you understood the complete process from environment definition to `package_to_hub` why not trying to do **it yourself first without solution?**\n",
"To see that you understood the complete process from environment definition to `package_to_hub` why not try to do **it yourself first without the solution?**\n",
"\n",
"1. Define the enviroment called HalfCheetahBulletEnv-v0\n",
"1. Define the environment called HalfCheetahBulletEnv-v0\n",
"2. Make a vectorized environment\n",
"3. Add a wrapper to normalize the observations and rewards. [Check the documentation](https://stable-baselines3.readthedocs.io/en/master/guide/vec_envs.html#vecnormalize)\n",
"4. Create the A2C Model\n",
@@ -727,18 +753,121 @@
{
"cell_type": "markdown",
"source": [
"# Part 2: Robotic Arm Environments with `panda-gym`\n"
"# Part 2: Robotic Arm Environments with `panda-gym`\n",
"\n",
"The second set of robotics environments we're going to train are a robotic arm that needs to do controls (moving the arm and using the end-effector).\n",
"\n",
"In robotics, the *end-effector* is the device at the end of a robotic arm designed to interact with the environment.\n",
"\n",
"1. In the first environment, `PandaReach`, the robot must place its end-effector at a target position (green ball).\n",
"2. In the second environment, `PandaSlide`, the robot has to slide an object to a target position.\n",
"\n",
"We're going to use the dense version of the environments. It means we'll get a *dense reward function* that **will provide a reward at each timestep** (the closer the agent is to complete the task, the higher the reward). Contrary to a *sparse reward function* where the environment **return a reward if and only if the task is completed**.\n",
"\n",
"Also, we're going to use the *End-effector displacement control*, it means the **action corresponds to the displacement of the end-effector**. We don't control the individual motion of each joint (joint control).\n",
"\n",
"<img src=\"https://huggingface.co/datasets/huggingface-deep-rl-course/course-images/resolve/main/en/unit8/robotics.jpg\" alt=\"Robotics\"/>\n",
"\n",
"\n",
"This way, **the training will be easier**.\n",
"\n"
],
"metadata": {
"id": "5VWfwAA7EJg7"
}
},
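The dense-vs-sparse distinction above can be sketched in a few lines. The distance threshold and reward scale here are illustrative, not panda-gym's exact values:

```python
import math

def distance(a, b):
    # Euclidean distance between two (x, y, z) positions
    return math.sqrt(sum((ai - bi) ** 2 for ai, bi in zip(a, b)))

def dense_reward(achieved_goal, desired_goal):
    # Reward at every timestep: the closer to the goal, the higher (less negative)
    return -distance(achieved_goal, desired_goal)

def sparse_reward(achieved_goal, desired_goal, threshold=0.05):
    # Reward only when the task is completed (end-effector within threshold of the goal)
    return 0.0 if distance(achieved_goal, desired_goal) < threshold else -1.0

ee_pos = (0.1, 0.0, 0.2)    # current end-effector position
target = (0.1, 0.0, 0.22)   # target position (the green ball)
print(dense_reward(ee_pos, target))   # negative, shrinks toward 0 as the arm approaches
print(sparse_reward(ee_pos, target))  # here the arm is within the threshold
```

With the dense reward, every step gives the learning algorithm a gradient of progress; with the sparse one, the agent gets no signal at all until it stumbles on the goal, which is why the dense variants are easier to train.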
{
"cell_type": "code",
"source": [
"env_id = \"PandaReachDense-v2\"\n",
"\n",
"# Create the env\n",
"env = gym.make(env_id)\n",
"\n",
"# Get the state space and action space\n",
"s_size = env.observation_space.shape\n",
"a_size = env.action_space"
],
"metadata": {
"id": "zXzAu3HYF1WD"
},
"execution_count": null,
"outputs": []
},
{
"cell_type": "code",
"source": [
"print(\"_____OBSERVATION SPACE_____ \\n\")\n",
"print(\"The State Space is: \", s_size)\n",
"print(\"Sample observation\", env.observation_space.sample()) # Get a random observation"
],
"metadata": {
"id": "E-U9dexcF-FB"
},
"execution_count": null,
"outputs": []
},
{
"cell_type": "markdown",
"source": [
"The observation space is a dictionary with 3 different element:\n",
"- `achieved_goal`: (x,y,z) position of the goal.\n",
"- `desired_goal`: (x,y,z) distance between the goal position and the current object position.\n",
"- `observation`: position (x,y,z) and velocity of the end-effector (vx, vy, vz).\n",
"\n"
],
"metadata": {
"id": "g_JClfElGFnF"
}
},
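A hand-built observation in the same shape as the dictionary described above, with made-up values, to show how the three pieces fit together:

```python
# Hypothetical PandaReach-style observation (all values invented for illustration)
obs = {
    "achieved_goal": [0.10, 0.00, 0.20],   # current end-effector position (x, y, z)
    "desired_goal": [0.05, 0.10, 0.25],    # target position (x, y, z)
    "observation": [0.10, 0.00, 0.20,      # end-effector position (x, y, z)
                    0.00, 0.00, 0.00],     # end-effector velocity (vx, vy, vz)
}

# The dense reward is driven by the gap between achieved and desired goals
gap = [d - a for a, d in zip(obs["achieved_goal"], obs["desired_goal"])]
print(gap)  # per-axis displacement still needed to reach the target
```

The `achieved_goal` / `desired_goal` split is what goal-conditioned algorithms (e.g. HER) exploit, which is why it is kept separate from the raw `observation` vector.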
{
"cell_type": "code",
"source": [
"print(\"\\n _____ACTION SPACE_____ \\n\")\n",
"print(\"The Action Space is: \", a_size)\n",
"print(\"Action Space Sample\", env.action_space.sample()) # Take a random action"
],
"metadata": {
"id": "ib1Kxy4AF-FC"
},
"execution_count": null,
"outputs": []
},
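Since the action is an end-effector displacement, applying it amounts to adding a small (x, y, z) offset to the current position. A sketch with an illustrative scale and workspace clamp (not panda-gym's real limits):

```python
def apply_displacement(ee_pos, action, scale=0.05, low=-1.0, high=1.0):
    # Each action component is a displacement of the end-effector along one axis.
    # Clamp to workspace bounds; scale and bounds here are invented for illustration.
    return tuple(
        min(max(p + scale * a, low), high)
        for p, a in zip(ee_pos, action)
    )

pos = (0.1, 0.0, 0.2)
action = (1.0, 0.0, -0.5)  # move +x, hold y, move -z (as sampled from a Box(-1, 1, (3,)))
new_pos = apply_displacement(pos, action)
print(new_pos)
```

Controlling three displacement components is a much smaller action space than commanding every joint of the arm, which is why this control mode makes training easier.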
{
"cell_type": "markdown",
"source": [
"TODO: ADd action space"
],
"metadata": {
"id": "5MHTHEHZS4yp"
}
},
{
"cell_type": "code",
"source": [
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"model = A2C(\"MultiInputPolicy\", env)\n",
"model.learn(total_timesteps=100000)"
],
"metadata": {
"id": "C-3SfbJr0N7I"
},
"execution_count": null,
"outputs": []
},
{
"cell_type": "code",
"source": [],
"metadata": {
"id": "fW_CdlUsEVP2"
}
"id": "16pttUsKFyZY"
},
"execution_count": null,
"outputs": []
},
{
"cell_type": "markdown",