diff --git a/notebooks/unit6/unit6.ipynb b/notebooks/unit6/unit6.ipynb index 7358a72..ceee2b1 100644 --- a/notebooks/unit6/unit6.ipynb +++ b/notebooks/unit6/unit6.ipynb @@ -5,18 +5,7 @@ "colab": { "provenance": [], "private_outputs": true, - "collapsed_sections": [ - "MoubJX20oKaQ", - "DoUNkTExoUED", - "BTuQAUAPoa5E", - "tF42HvI7-gs5", - "nWAuOOLh-oQf", - "-voECBK3An9j", - "Qk9ykOk9D6Qh", - "G3xy3Nf3c2O1", - "usatLaZ8dM4P" - ], - "authorship_tag": "ABX9TyPovbUwEqbQAH1J8OxiHKDm", + "authorship_tag": "ABX9TyNTCZRW9WsSED/roRBW2oQ5", "include_colab_link": true }, "kernelspec": { @@ -47,17 +36,15 @@ "\n", "\"Thumbnail\"/\n", "\n", - "In this small notebook you'll learn to use A2C with PyBullet and Panda-Gym two set of robotics environments. \n", + "In this notebook, you'll learn to use A2C with PyBullet and Panda-Gym, two sets of robotics environments. \n", "\n", - "With [PyBullet](https://github.com/bulletphysics/bullet3), you're going to **train robots to walk and run**:\n", - "- `AntBulletEnv-v0` 🕸️ More precisely a spider (they say Ant but come on... it's a spider 😆) 🕸️\n", - "- `HalfCheetahBulletEnv-v0`\n", + "With [PyBullet](https://github.com/bulletphysics/bullet3), you're going to **train a robot to move**:\n", + "- `AntBulletEnv-v0` 🕸️ More precisely, a spider (they say Ant but come on... it's a spider 😆) 🕸️\n", "\n", - "Then, with [Panda-Gym](https://github.com/qgallouedec/panda-gym), you're going **to train a robotic arm** (Franka Emika Panda robot) to perform some tasks:\n", + "Then, with [Panda-Gym](https://github.com/qgallouedec/panda-gym), you're going **to train a robotic arm** (Franka Emika Panda robot) to perform a task:\n", "- `Reach`: the robot must place its end-effector at a target position.\n", - "- `Slide`: the robot has to slide an object to a target position.\n", "\n", - "After that, you'll be able to train other robotics environments."
+ "After that, you'll be able **to train in other robotics environments**.\n" ], "metadata": { "id": "-PTReiOw-RAN" @@ -66,7 +53,7 @@ { "cell_type": "markdown", "source": [ - "TODO: ADD VIDEO OF WHAT IT LOOKS LIKE" + "\"Robotics" ], "metadata": { "id": "2VGL_0ncoAJI" @@ -162,12 +149,15 @@ { "cell_type": "markdown", "source": [ - "To validate this hands-on for the [certification process](https://huggingface.co/deep-rl-course/en/unit0/introduction#certification-process), you need to:\n", + "To validate this hands-on for the [certification process](https://huggingface.co/deep-rl-course/en/unit0/introduction#certification-process), you need to push three models:\n", "\n", - "TODO ADD CERTIFICATION RECOMMENDATION\n", + "- `AntBulletEnv-v0` get a result of >= 650.\n", + "- `PandaReachDense-v2` get a result of >= -3.5.\n", "\n", "To find your result, go to the [leaderboard](https://huggingface.co/spaces/huggingface-projects/Deep-Reinforcement-Learning-Leaderboard) and find your model, **the result = mean_reward - std of reward**\n", "\n", + "If you don't find your model, **go to the bottom of the page and click on the refresh button**\n", + "\n", "For more information about the certification process, check this section 👉 https://huggingface.co/deep-rl-course/en/unit0/introduction#certification-process" ], "metadata": { @@ -225,18 +215,6 @@ "!pip3 install pyvirtualdisplay" ] }, - { - "cell_type": "code", - "source": [ - "# Additional dependencies for RL Baselines3 Zoo\n", - "!apt-get install swig cmake freeglut3-dev " - ], - "metadata": { - "id": "fWyKJCy_NJBX" - }, - "execution_count": null, - "outputs": [] - }, { "cell_type": "code", "source": [ @@ -262,24 +240,12 @@ "- `panda-gym`: Contains the robotics arm environments.\n", "- `stable-baselines3[extra]`: The SB3 deep reinforcement learning library.\n", "- `huggingface_sb3`: Additional code for Stable-baselines3 to load and upload models from the Hugging Face 🤗 Hub.\n", - "- `huggingface_hub`: Library allowing anyone to work with the Hub repositories.\n", - "- `gym==0.21`: The classical version of gym." + "- `huggingface_hub`: Library allowing anyone to work with the Hub repositories." ], "metadata": { "id": "e1obkbdJ_KnG" } }, - { - "cell_type": "code", - "source": [ - "!pip install -r https://huggingface.co/spaces/ThomasSimonini/temp-space-requirements/raw/main/requirements/requirements-unit6.txt" - ], - "metadata": { - "id": "69jUeXrLryos" - }, - "execution_count": null, - "outputs": [] - }, { "cell_type": "code", "execution_count": null, @@ -288,7 +254,7 @@ }, "outputs": [], "source": [ - "TODO: CHANGE TO THE ONE COMMENTED#!pip install -r https://raw.githubusercontent.com/huggingface/deep-rl-class/main/notebooks/unit6/requirements-unit6.txt" + "!pip install -r https://raw.githubusercontent.com/huggingface/deep-rl-class/main/notebooks/unit6/requirements-unit6.txt" ] }, { @@ -303,11 +269,9 @@ { "cell_type": "code", "source": [ - "import gym\n", "import pybullet_envs\n", - "\n", - "import gymnasium\n", "import panda_gym\n", + "import gym\n", "\n", "import os\n", "\n", @@ -326,15 +290,6 @@ "execution_count": null, "outputs": [] }, - { - "cell_type": "markdown", - "source": [ - "# Part 1: PyBullet Environments\n" - ], - "metadata": { - "id": "KIqf-N-otczo" - } - }, { "cell_type": "markdown", "source": [ @@ -350,23 +305,13 @@ "source": [ "### Create the AntBulletEnv-v0\n", "#### The environment 🎮\n", - "In this environment, the agent needs to use correctly its different joints to walk correctly." 
+ "In this environment, the agent needs to use correctly its different joints to walk correctly.\n", + "You can find a detailled explanation of this environment here: https://hackmd.io/@jeffreymo/SJJrSJh5_#PyBullet" ], "metadata": { "id": "frVXOrnlBerQ" } }, - { - "cell_type": "code", - "source": [ - "import gym" - ], - "metadata": { - "id": "RJ0XJccTt9FX" - }, - "execution_count": null, - "outputs": [] - }, { "cell_type": "code", "source": [ @@ -400,7 +345,9 @@ { "cell_type": "markdown", "source": [ - "TODO: Add explanation obs space" + "The observation Space (from [Jeffrey Y Mo](https://hackmd.io/@jeffreymo/SJJrSJh5_#PyBullet)):\n", + "\n", + "\"PyBullet\n" ], "metadata": { "id": "QzMmsdMJS7jh" @@ -422,7 +369,9 @@ { "cell_type": "markdown", "source": [ - "Todo: Add explanation action space" + "The action Space (from [Jeffrey Y Mo](https://hackmd.io/@jeffreymo/SJJrSJh5_#PyBullet)):\n", + "\n", + "\"PyBullet\n" ], "metadata": { "id": "3RfsHhzZS9Pw" @@ -440,7 +389,9 @@ { "cell_type": "markdown", "source": [ - "A good practice in reinforcement learning is to [normalize input features](https://stable-baselines3.readthedocs.io/en/master/guide/rl_tips.html). For that, a wrapper exists and will compute a running average and standard deviation of input features.\n", + "A good practice in reinforcement learning is to [normalize input features](https://stable-baselines3.readthedocs.io/en/master/guide/rl_tips.html). \n", + "\n", + "For that, a wrapper exists and will compute a running average and standard deviation of input features.\n", "\n", "We also normalize rewards with this same wrapper by adding `norm_reward = True`\n", "\n", @@ -493,6 +444,8 @@ "\n", "In this case, because we have a vector of 28 values as input, we'll use an MLP (multi-layer perceptron) as policy.\n", "\n", + "For more information about A2C implementation with StableBaselines3 check: https://stable-baselines3.readthedocs.io/en/master/modules/a2c.html#notes\n", + "\n", "To find the best parameters I checked the [official trained agents by Stable-Baselines3 team](https://huggingface.co/sb3)." ], "metadata": { @@ -531,7 +484,6 @@ " n_steps = 8,\n", " vf_coef = 0.4,\n", " ent_coef = 0.0,\n", - " tensorboard_log = \"./tensorboard\",\n", " policy_kwargs=dict(\n", " log_std_init=-2, ortho_init=False),\n", " normalize_advantage=False,\n", @@ -717,33 +669,11 @@ "execution_count": null, "outputs": [] }, - { - "cell_type": "markdown", - "source": [ - "## Environment 2: HalfCheetahBulletEnv-v0\n", - "\n", - "For this environment, you must follow the same process as the first one. **Don't hesitate to save this notebook to your Google Drive** since timeout can happen. You may also want to **complete this notebook two times**.\n", - "\n", - "To see that you understood the complete process from environment definition to `package_to_hub` why not try to do **it yourself first without the solution?**\n", - "\n", - "1. Define the environment called HalfCheetahBulletEnv-v0\n", - "2. Make a vectorized environment\n", - "3. Add a wrapper to normalize the observations and rewards. [Check the documentation](https://stable-baselines3.readthedocs.io/en/master/guide/vec_envs.html#vecnormalize)\n", - "4. Create the A2C Model\n", - "5. Train it for 2M Timesteps\n", - "6. Save the model and VecNormalize statistics when saving the agent\n", - "7. Evaluate your agent\n", - "8. 
Publish your trained model on the Hub 🔥 with `package_to_hub`" - ], - "metadata": { - "id": "-voECBK3An9j" - } - }, { "cell_type": "markdown", "source": [ "## Take a coffee break ☕\n", - "- You already trained two robotics environments that learned to move congratutlations 🥳!\n", + "- You already trained your first robot that learned to move, congratulations 🥳!\n", "- It's **time to take a break**. Don't hesitate to **save this notebook** `File > Save a copy to Drive` to work on this second part later.\n" ], "metadata": { @@ -753,16 +683,15 @@ { "cell_type": "markdown", "source": [ - "# Part 2: Robotic Arm Environments with `panda-gym`\n", + "## Environment 2: PandaReachDense-v2 🦾\n", "\n", - "The second set of robotics environments we're going to train are a robotic arm that needs to do controls (moving the arm and using the end-effector).\n", + "The agent we're going to train is a robotic arm that needs to perform control tasks (moving the arm and using the end-effector).\n", "\n", "In robotics, the *end-effector* is the device at the end of a robotic arm designed to interact with the environment.\n", "\n", - "1. In the first environment, `PandaReach`, the robot must place its end-effector at a target position (green ball).\n", - "2. In the second environment, `PandaSlide`, the robot has to slide an object to a target position.\n", + "In `PandaReach`, the robot must place its end-effector at a target position (green ball).\n", "\n", - "We're going to use the dense version of the environments. It means we'll get a *dense reward function* that **will provide a reward at each timestep** (the closer the agent is to complete the task, the higher the reward). Contrary to a *sparse reward function* where the environment **return a reward if and only if the task is completed**.\n", + "We're going to use the dense version of this environment. This means we'll get a *dense reward function* that **will provide a reward at each timestep** (the closer the agent is to completing the task, the higher the reward), contrary to a *sparse reward function* where the environment **returns a reward if and only if the task is completed**.\n", "\n", "Also, we're going to use the *End-effector displacement control*, it means the **action corresponds to the displacement of the end-effector**. We don't control the individual motion of each joint (joint control).\n", "\n", ], "metadata": { "id": "5VWfwAA7EJg7" } }, + { + "cell_type": "markdown", + "source": [ + "\n", + "\n", + "In `PandaReachDense-v2` the robotic arm must place its end-effector at a target position (green ball).\n", + "\n" + ], + "metadata": { + "id": "oZ7FyDEi7G3T" + } + }, { "cell_type": "code", "source": [ - "env_id = \"PandaReachDense-v2\"\n", + "import gym\n", + "\n", + "env_id = \"PandaReachDense-v2\"\n", "\n", "# Create the env\n", "env = gym.make(env_id)\n", @@ -810,11 +753,12 @@ { "cell_type": "markdown", "source": [ - "The observation space is a dictionary with 3 different element:\n", + "The observation space **is a dictionary with 3 different elements**:\n", "- `achieved_goal`: (x,y,z) position of the goal.\n", "- `desired_goal`: (x,y,z) distance between the goal position and the current object position.\n", "- `observation`: position (x,y,z) and velocity of the end-effector (vx, vy, vz).\n", - "\n" + "\n", + "Since the observation is a dictionary, **we will need to use a MultiInputPolicy instead of MlpPolicy**.\n",
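+ "\n", + "To make this concrete, here is a minimal sketch (an illustration only, not the graded solution below, and it assumes the packages installed at the beginning of this notebook) of creating an A2C agent on a dictionary observation space:\n", + "\n", + "```python\n", + "import gym\n", + "import panda_gym  # importing panda_gym registers the Panda environments in gym\n", + "from stable_baselines3 import A2C\n", + "\n", + "env = gym.make(\"PandaReachDense-v2\")\n", + "# The observation is a dict (achieved_goal, desired_goal, observation),\n", + "# so we pass \"MultiInputPolicy\" instead of \"MlpPolicy\".\n", + "model = A2C(policy=\"MultiInputPolicy\", env=env, verbose=1)\n", + "model.learn(total_timesteps=10_000)  # short run just to check everything is wired up\n", + "```"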
], "metadata": { "id": "g_JClfElGFnF" @@ -836,35 +780,103 @@ { "cell_type": "markdown", "source": [ - "TODO: ADd action space" + "The action space is a vector with 3 values:\n", + "- Control x, y, z movement" ], "metadata": { "id": "5MHTHEHZS4yp" } }, { - "cell_type": "code", + "cell_type": "markdown", "source": [ + "Now it's your turn:\n", "\n", - "\n", - "\n", - "\n", - "\n", - "\n", - "model = A2C(\"MultiInputPolicy\", env)\n", - "model.learn(total_timesteps=100000)" + "1. Define the environment called \"PandaReachDense-v2\"\n", + "2. Make a vectorized environment\n", + "3. Add a wrapper to normalize the observations and rewards. [Check the documentation](https://stable-baselines3.readthedocs.io/en/master/guide/vec_envs.html#vecnormalize)\n", + "4. Create the A2C Model (don't forget verbose=1 to print the training logs).\n", + "5. Train it for 2M Timesteps\n", + "6. Save the model and VecNormalize statistics when saving the agent\n", + "7. Evaluate your agent\n", + "8. Publish your trained model on the Hub 🔥 with `package_to_hub`" ], "metadata": { - "id": "C-3SfbJr0N7I" + "id": "nIhPoc5t9HjG" + } + }, + { + "cell_type": "markdown", + "source": [ + "### Solution (fill the todo)" + ], + "metadata": { + "id": "sKGbFXZq9ikN" + } + }, + { + "cell_type": "code", + "source": [ + "# 1 - 2\n", + "env_id = \"PandaReachDense-v2\"\n", + "env = make_vec_env(env_id, n_envs=4)\n", + "\n", + "# 3\n", + "env = VecNormalize(env, norm_obs=True, norm_reward=False, clip_obs=10.)\n", + "\n", + "# 4\n", + "model = A2C(policy = \"MultiInputPolicy\",\n", + " env = env,\n", + " verbose=1)\n", + "# 5\n", + "model.learn(1_000_000)" + ], + "metadata": { + "id": "J-cC-Feg9iMm" }, "execution_count": null, "outputs": [] }, { "cell_type": "code", - "source": [], + "source": [ + "# 6\n", + "model_name = \"a2c-PandaReachDense-v2\"; \n", + "model.save(model_name)\n", + "env.save(\"vec_normalize.pkl\")\n", + "\n", + "# 7\n", + "from stable_baselines3.common.vec_env import DummyVecEnv, VecNormalize\n", + "\n", + "# Load the saved statistics\n", + "eval_env = DummyVecEnv([lambda: gym.make(\"PandaReachDense-v2\")])\n", + "eval_env = VecNormalize.load(\"vec_normalize.pkl\", eval_env)\n", + "\n", + "# do not update them at test time\n", + "eval_env.training = False\n", + "# reward normalization is not needed at test time\n", + "eval_env.norm_reward = False\n", + "\n", + "# Load the agent\n", + "model = A2C.load(model_name)\n", + "\n", + "mean_reward, std_reward = evaluate_policy(model, env)\n", + "\n", + "print(f\"Mean reward = {mean_reward:.2f} +/- {std_reward:.2f}\")\n", + "\n", + "# 8\n", + "package_to_hub(\n", + " model=model,\n", + " model_name=f\"a2c-{env_id}\",\n", + " model_architecture=\"A2C\",\n", + " env_id=env_id,\n", + " eval_env=eval_env,\n", + " repo_id=f\"ThomasSimonini/a2c-{env_id}\", # TODO: Change the username\n", + " commit_message=\"Initial commit\",\n", + ")" + ], "metadata": { - "id": "16pttUsKFyZY" + "id": "-UnlKLmpg80p" }, "execution_count": null, "outputs": [] @@ -873,9 +885,13 @@ "cell_type": "markdown", "source": [ "## Some additional challenges 🏆\n", - "The best way to learn **is to try things by your own**! Why not trying `HalfCheetahBulletEnv-v0`?\n", + "The best way to learn **is to try things by your own**! Why not trying `HalfCheetahBulletEnv-v0` for PyBullet?\n", "\n", - "In the [Leaderboard](https://huggingface.co/spaces/chrisjay/Deep-Reinforcement-Learning-Leaderboard) you will find your agents. 
Can you get to the top?\n", + "If you want to try more advanced tasks for panda-gym, you need to check what was done using **TQC or SAC** (more sample-efficient algorithms suited for robotics tasks). In real robotics, you'll use more sample-efficient algorithms for a simple reason: contrary to a simulation, **if you move your robotic arm too much, you risk breaking it**.\n", + "\n", + "PandaPickAndPlace-v1: https://huggingface.co/sb3/tqc-PandaPickAndPlace-v1\n", + "\n", + "And don't hesitate to check the panda-gym documentation here: https://panda-gym.readthedocs.io/en/latest/usage/train_with_sb3.html\n", "\n", "Here are some ideas to achieve so:\n", "* Train more steps\n", @@ -889,7 +905,7 @@ { "cell_type": "markdown", "source": [ - "See you on Unit 8! 🔥\n", + "See you on Unit 7! 🔥\n", "## Keep learning, stay awesome 🤗" ], "metadata": {