diff --git a/notebooks/unit6/unit6.ipynb b/notebooks/unit6/unit6.ipynb new file mode 100644 index 0000000..8ecae3c --- /dev/null +++ b/notebooks/unit6/unit6.ipynb @@ -0,0 +1,771 @@ +{ + "nbformat": 4, + "nbformat_minor": 0, + "metadata": { + "colab": { + "provenance": [], + "private_outputs": true, + "authorship_tag": "ABX9TyM4Z04oGTU1B2rRuxHfuNly", + "include_colab_link": true + }, + "kernelspec": { + "name": "python3", + "display_name": "Python 3" + }, + "language_info": { + "name": "python" + }, + "accelerator": "GPU", + "gpuClass": "standard" + }, + "cells": [ + { + "cell_type": "markdown", + "metadata": { + "id": "view-in-github", + "colab_type": "text" + }, + "source": [ + "\"Open" + ] + }, + { + "cell_type": "markdown", + "source": [ + "# Unit 6: Advantage Actor-Critic (A2C) using Robotics Simulations with PyBullet and Panda-Gym 🤖\n", + "\n", + "TODO: ADD THUMBNAIL\n", + "\n", + "In this notebook, you'll learn to use A2C with PyBullet and Panda-Gym, two sets of robotics environments. \n", + "\n", + "With [PyBullet](https://github.com/bulletphysics/bullet3), you're going to **train robots to walk and run**:\n", + "- `AntBulletEnv-v0` 🕸️ More precisely, a spider (they say Ant, but come on... it's a spider 😆) 🕸️\n", + "- `HalfCheetahBulletEnv-v0`\n", + "\n", + "Then, with [Panda-Gym](https://github.com/qgallouedec/panda-gym), you're going **to train a robotic arm** (a Franka Emika Panda robot) to perform some tasks:\n", + "- `Reach`: the robot must place its end-effector at a target position.\n", + "- `Slide`: the robot has to slide an object to a target position.\n", + "\n", + "After that, you'll be able to train agents in other robotics environments."
+ ], + "metadata": { + "id": "-PTReiOw-RAN" + } + }, + { + "cell_type": "markdown", + "source": [ + "TODO: ADD VIDEO OF WHAT IT LOOKS LIKE" + ], + "metadata": { + "id": "2VGL_0ncoAJI" + } + }, + { + "cell_type": "markdown", + "source": [ + "### 🎮 Environments: \n", + "\n", + "- [PyBullet](https://github.com/bulletphysics/bullet3)\n", + "- [Panda-Gym](https://github.com/qgallouedec/panda-gym)\n", + "\n", + "### 📚 RL-Library: \n", + "\n", + "- [Stable-Baselines3](https://stable-baselines3.readthedocs.io/)" + ], + "metadata": { + "id": "QInFitfWno1Q" + } + }, + { + "cell_type": "markdown", + "source": [ + "We're constantly trying to improve our tutorials, so **if you find some issues in this notebook**, please [open an issue on the GitHub Repo](https://github.com/huggingface/deep-rl-class/issues)." + ], + "metadata": { + "id": "2CcdX4g3oFlp" + } + }, + { + "cell_type": "markdown", + "source": [ + "## Objectives of this notebook 🏆\n", + "\n", + "At the end of the notebook, you will:\n", + "\n", + "- Be able to use **PyBullet** and **Panda-Gym**, the environment libraries.\n", + "- Be able to **train robots using A2C**.\n", + "- Understand why **we need to normalize the input**.\n", + "- Be able to **push your trained agent and the code to the Hub** with a nice video replay and an evaluation score 🔥.\n", + "\n", + "\n" + ], + "metadata": { + "id": "MoubJX20oKaQ" + } + }, + { + "cell_type": "markdown", + "source": [ + "## This notebook is from the Deep Reinforcement Learning Course\n", + "\"Deep\n", + "\n", + "In this free course, you will:\n", + "\n", + "- 📖 Study Deep Reinforcement Learning in **theory and practice**.\n", + "- 🧑‍💻 Learn to **use famous Deep RL libraries** such as Stable Baselines3, RL Baselines3 Zoo, CleanRL and Sample Factory 2.0.\n", + "- 🤖 Train **agents in unique environments**.\n", + "\n", + "And more! Check 📚 the syllabus 👉 https://simoninithomas.github.io/deep-rl-course\n", + "\n", + "Don't forget to **sign up to 
the course** (we are collecting your email to be able to **send you the links when each Unit is published and to give you information about the challenges and updates**).\n", + "\n", + "\n", + "The best way to keep in touch is to join our Discord server to exchange with the community and with us 👉🏻 https://discord.gg/ydHrjt3WP5" + ], + "metadata": { + "id": "DoUNkTExoUED" + } + }, + { + "cell_type": "markdown", + "source": [ + "## Prerequisites 🏗️\n", + "Before diving into the notebook, you need to:\n", + "\n", + "🔲 📚 Study [Actor-Critic methods by reading Unit 6](https://huggingface.co/deep-rl-course/unit6/introduction) 🤗 " + ], + "metadata": { + "id": "BTuQAUAPoa5E" + } + }, + { + "cell_type": "markdown", + "source": [ + "# Let's train our first robots 🤖" + ], + "metadata": { + "id": "iajHvVDWoo01" + } + }, + { + "cell_type": "markdown", + "source": [ + "To validate this hands-on for the [certification process](https://huggingface.co/deep-rl-course/en/unit0/introduction#certification-process), you need to:\n", + "\n", + "TODO ADD CERTIFICATION RECOMMENDATION\n", + "\n", + "To find your result, go to the [leaderboard](https://huggingface.co/spaces/huggingface-projects/Deep-Reinforcement-Learning-Leaderboard) and find your model; **the result = mean_reward - std of reward**.\n", + "\n", + "For more information about the certification process, check this section 👉 https://huggingface.co/deep-rl-course/en/unit0/introduction#certification-process" + ], + "metadata": { + "id": "zbOENTE2os_D" + } + }, + { + "cell_type": "markdown", + "source": [ + "## Set the GPU 💪\n", + "- To **accelerate the agent's training, we'll use a GPU**. 
To do that, go to `Runtime > Change Runtime type`\n", + "\n", + "\"GPU" + ], + "metadata": { + "id": "PU4FVzaoM6fC" + } + }, + { + "cell_type": "markdown", + "source": [ + "- `Hardware Accelerator > GPU`\n", + "\n", + "\"GPU" + ], + "metadata": { + "id": "KV0NyFdQM9ZG" + } + }, + { + "cell_type": "markdown", + "source": [ + "## Create a virtual display 🔽\n", + "\n", + "During the notebook, we'll need to generate a replay video. To do so, with Colab, **we need a virtual screen to be able to render the environment** (and thus record the frames). \n", + "\n", + "Hence, the following cell will install the libraries and create and run a virtual screen 🖥" + ], + "metadata": { + "id": "bTpYcVZVMzUI" + } + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "jV6wjQ7Be7p5" + }, + "outputs": [], + "source": [ + "%%capture\n", + "!apt install python-opengl\n", + "!apt install ffmpeg\n", + "!apt install xvfb\n", + "!pip3 install pyvirtualdisplay" + ] + }, + { + "cell_type": "code", + "source": [ + "# Additional dependencies for RL Baselines3 Zoo\n", + "!apt-get install swig cmake freeglut3-dev " + ], + "metadata": { + "id": "fWyKJCy_NJBX" + }, + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "code", + "source": [ + "# Virtual display\n", + "from pyvirtualdisplay import Display\n", + "\n", + "virtual_display = Display(visible=0, size=(1400, 900))\n", + "virtual_display.start()" + ], + "metadata": { + "id": "ww5PQH1gNLI4" + }, + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "markdown", + "source": [ + "### Install dependencies 🔽\n", + "The first step is to install the dependencies. We'll install multiple ones:\n", + "\n", + "- `pybullet`: Contains the walking robot environments.\n", + "- `panda-gym`: Contains the robotic arm environments.\n", + "- `stable-baselines3[extra]`: The SB3 deep reinforcement learning library.\n", + "- `huggingface_sb3`: Additional code for Stable-Baselines3 to load 
and upload models from the Hugging Face 🤗 Hub.\n", + "- `huggingface_hub`: Library allowing anyone to work with the Hub repositories.\n", + "\n", + "We're going to install **two versions of gym**:\n", + "- `gym==0.21`: The classical version of gym, for the PyBullet environments.\n", + "- `gymnasium`: [The new Gym library by the Farama Foundation](https://github.com/Farama-Foundation/Gymnasium), for the Panda-Gym environments." + ], + "metadata": { + "id": "e1obkbdJ_KnG" + } + }, + { + "cell_type": "code", + "source": [ + "!pip install -r https://huggingface.co/spaces/ThomasSimonini/temp-space-requirements/raw/main/requirements/requirements-unit6.txt" + ], + "metadata": { + "id": "69jUeXrLryos" + }, + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "2yZRi_0bQGPM" + }, + "outputs": [], + "source": [ + "TODO: CHANGE TO THE ONE COMMENTED#!pip install -r https://raw.githubusercontent.com/huggingface/deep-rl-class/main/notebooks/unit6/requirements-unit6.txt" + ] + }, + { + "cell_type": "markdown", + "source": [ + "## Import the packages 📦" + ], + "metadata": { + "id": "QTep3PQQABLr" + } + }, + { + "cell_type": "code", + "source": [ + "import gymnasium\n", + "import panda_gym\n", + "\n", + "import gym\n", + "import pybullet_envs\n", + "\n", + "import os\n", + "\n", + "from huggingface_sb3 import load_from_hub, package_to_hub\n", + "\n", + "from stable_baselines3 import A2C\n", + "from stable_baselines3.common.evaluation import evaluate_policy\n", + "from stable_baselines3.common.vec_env import DummyVecEnv, VecNormalize\n", + "from stable_baselines3.common.env_util import make_vec_env\n", + "\n", + "from huggingface_hub import notebook_login" + ], + "metadata": { + "id": "HpiB8VdnQ7Bk" + }, + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "markdown", + "source": [ + "# Part 1: PyBullet Environments\n" + ], + "metadata": { + "id": "KIqf-N-otczo" + } + }, + { + "cell_type": 
"markdown", + "source": [ + "## Environment 1: AntBulletEnv-v0 🕸\n", + "\n" + ], + "metadata": { + "id": "lfBwIS_oAVXI" + } + }, + { + "cell_type": "markdown", + "source": [ + "### Create the AntBulletEnv-v0\n", + "#### The environment 🎮\n", + "In this environment, the agent needs to use its different joints correctly in order to walk." + ], + "metadata": { + "id": "frVXOrnlBerQ" + } + }, + { + "cell_type": "code", + "source": [ + "import gym # As mentioned, we use gym for PyBullet and gymnasium for panda-gym" + ], + "metadata": { + "id": "RJ0XJccTt9FX" + }, + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "code", + "source": [ + "env_id = \"AntBulletEnv-v0\"\n", + "# Create the env\n", + "env = gym.make(env_id)\n", + "\n", + "# Get the state space size and action space size\n", + "s_size = env.observation_space.shape[0]\n", + "a_size = env.action_space.shape[0]" + ], + "metadata": { + "id": "JpU-JCDQYYax" + }, + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "code", + "source": [ + "print(\"_____OBSERVATION SPACE_____ \\n\")\n", + "print(\"The State Space is: \", s_size)\n", + "print(\"Sample observation\", env.observation_space.sample()) # Get a random observation" + ], + "metadata": { + "id": "2ZfvcCqEYgrg" + }, + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "code", + "source": [ + "print(\"\\n _____ACTION SPACE_____ \\n\")\n", + "print(\"The Action Space is: \", a_size)\n", + "print(\"Action Space Sample\", env.action_space.sample()) # Take a random action" + ], + "metadata": { + "id": "Tc89eLTYYkK2" + }, + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "markdown", + "source": [ + "### Normalize observation and rewards" + ], + "metadata": { + "id": "S5sXcg469ysB" + } + }, + { + "cell_type": "markdown", + "source": [ + "A good practice in reinforcement learning is to [normalize input features](https://stable-baselines3.readthedocs.io/en/master/guide/rl_tips.html). 
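The idea behind this normalization can be sketched in plain Python. The class below is a minimal, hypothetical illustration of the running statistics such a wrapper maintains — it is not SB3's actual `VecNormalize` implementation, and the class name is made up for this sketch:

```python
# A minimal sketch of observation normalization with running statistics.
# RunningNormalizer is a hypothetical helper, not part of Stable-Baselines3.

class RunningNormalizer:
    """Tracks a running mean/variance (Welford's algorithm) and standardizes inputs."""

    def __init__(self, epsilon=1e-8, clip=10.0):
        self.mean = 0.0
        self.m2 = 0.0      # running sum of squared deviations
        self.count = 0
        self.epsilon = epsilon  # avoids division by zero
        self.clip = clip        # analogous to VecNormalize's clip_obs

    def update(self, x):
        # Welford update: incorporate one new observation into the statistics
        self.count += 1
        delta = x - self.mean
        self.mean += delta / self.count
        self.m2 += delta * (x - self.mean)

    def normalize(self, x):
        # Standardize with the current statistics, then clip extreme values
        var = self.m2 / self.count if self.count > 1 else 1.0
        z = (x - self.mean) / ((var + self.epsilon) ** 0.5)
        return max(-self.clip, min(self.clip, z))

norm = RunningNormalizer()
for obs in [2.0, 4.0, 6.0, 8.0]:
    norm.update(obs)
print(norm.mean)            # running mean of the observations seen so far
print(norm.normalize(5.0))  # a value at the mean standardizes to ~0
```

SB3's wrapper does this per observation dimension (and, with `norm_reward=True`, for a discounted return estimate as well), which keeps every input feature on a comparable scale for the network.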
For that, a wrapper exists that will compute a running average and standard deviation of the input features.\n", + "\n", + "We also normalize rewards with this same wrapper by adding `norm_reward = True`.\n", + "\n", + "[You should check the documentation to fill this cell](https://stable-baselines3.readthedocs.io/en/master/guide/vec_envs.html#vecnormalize)" + ], + "metadata": { + "id": "1ZyX6qf3Zva9" + } + }, + { + "cell_type": "code", + "source": [ + "env = make_vec_env(env_id, n_envs=4)\n", + "\n", + "# Add this wrapper to normalize the observation and the reward\n", + "env = # TODO: Add the wrapper" + ], + "metadata": { + "id": "1RsDtHHAQ9Ie" + }, + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "markdown", + "source": [ + "#### Solution" + ], + "metadata": { + "id": "tF42HvI7-gs5" + } + }, + { + "cell_type": "code", + "source": [ + "env = make_vec_env(env_id, n_envs=4)\n", + "\n", + "env = VecNormalize(env, norm_obs=True, norm_reward=True, clip_obs=10.)" + ], + "metadata": { + "id": "2O67mqgC-hol" + }, + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "markdown", + "source": [ + "### Create the A2C Model 🤖\n", + "\n", + "In this case, because we have a vector of 28 values as input, we'll use an MLP (multi-layer perceptron) as the policy.\n", + "\n", + "To find the best parameters, I checked the [official trained agents by the Stable-Baselines3 team](https://huggingface.co/sb3)."
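Two of the hyperparameters you'll pass to A2C, `gamma` and `gae_lambda`, control how advantages are estimated from a rollout. Here is a hedged, pure-Python sketch of Generalized Advantage Estimation (GAE) with the same values used below — illustrative numbers only, episode-termination handling omitted, and not SB3's actual code:

```python
# Sketch of Generalized Advantage Estimation (GAE) over one rollout.
# gamma discounts future value; gae_lambda trades bias against variance.

def compute_gae(rewards, values, last_value, gamma=0.99, gae_lambda=0.9):
    """Compute advantages for a rollout of n_steps transitions (no done flags)."""
    advantages = [0.0] * len(rewards)
    next_value = last_value  # critic's estimate for the state after the rollout
    gae = 0.0
    for t in reversed(range(len(rewards))):
        # TD error: how much better this step went than the critic predicted
        delta = rewards[t] + gamma * next_value - values[t]
        # Exponentially weighted sum of TD errors
        gae = delta + gamma * gae_lambda * gae
        advantages[t] = gae
        next_value = values[t]
    return advantages

# Made-up rewards and critic values for a 3-step rollout
adv = compute_gae(rewards=[1.0, 0.0, 1.0], values=[0.5, 0.4, 0.6], last_value=0.2)
print(adv)
```

With `gae_lambda = 0.9`, each advantage mixes the one-step TD error with longer-horizon information; `gae_lambda = 0` would use only the one-step error, `gae_lambda = 1` the full Monte-Carlo return.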
+ ], + "metadata": { + "id": "4JmEVU6z1ZA-" + } + }, + { + "cell_type": "code", + "source": [ + "model = # Create the A2C model and try to find the best parameters" + ], + "metadata": { + "id": "vR3T4qFt164I" + }, + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "markdown", + "source": [ + "#### Solution" + ], + "metadata": { + "id": "nWAuOOLh-oQf" + } + }, + { + "cell_type": "code", + "source": [ + "model = A2C(policy = \"MlpPolicy\",\n", + " env = env,\n", + " gae_lambda = 0.9,\n", + " gamma = 0.99,\n", + " learning_rate = 0.00096,\n", + " max_grad_norm = 0.5,\n", + " n_steps = 8,\n", + " vf_coef = 0.4,\n", + " ent_coef = 0.0,\n", + " tensorboard_log = \"./tensorboard\",\n", + " policy_kwargs=dict(\n", + " log_std_init=-2, ortho_init=False),\n", + " normalize_advantage=False,\n", + " use_rms_prop= True,\n", + " use_sde= True,\n", + " verbose=1)" + ], + "metadata": { + "id": "FKFLY54T-pU1" + }, + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "markdown", + "source": [ + "### Train the A2C agent 🏃\n", + "- Let's train our agent for 2,000,000 timesteps. Don't forget to use the GPU on Colab. 
It will take approximately 25-40 minutes." + ], + "metadata": { + "id": "opyK3mpJ1-m9" + } + }, + { + "cell_type": "code", + "source": [ + "model.learn(2_000_000)" + ], + "metadata": { + "id": "4TuGHZD7RF1G" + }, + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "code", + "source": [ + "# Save the model and VecNormalize statistics when saving the agent\n", + "model.save(\"a2c-AntBulletEnv-v0\")\n", + "env.save(\"vec_normalize.pkl\")" + ], + "metadata": { + "id": "MfYtjj19cKFr" + }, + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "markdown", + "source": [ + "### Evaluate the agent 📈\n", + "- Now that our agent is trained, we need to **check its performance**.\n", + "- Stable-Baselines3 provides a method to do that: `evaluate_policy`.\n", + "- In my case, I got a mean reward of `2371.90 +/- 16.50`" + ], + "metadata": { + "id": "01M9GCd32Ig-" + } + }, + { + "cell_type": "code", + "source": [ + "from stable_baselines3.common.vec_env import DummyVecEnv, VecNormalize\n", + "\n", + "# Load the saved statistics\n", + "eval_env = DummyVecEnv([lambda: gym.make(\"AntBulletEnv-v0\")])\n", + "eval_env = VecNormalize.load(\"vec_normalize.pkl\", eval_env)\n", + "\n", + "# do not update them at test time\n", + "eval_env.training = False\n", + "# reward normalization is not needed at test time\n", + "eval_env.norm_reward = False\n", + "\n", + "# Load the agent\n", + "model = A2C.load(\"a2c-AntBulletEnv-v0\")\n", + "\n", + "mean_reward, std_reward = evaluate_policy(model, eval_env)\n", + "\n", + "print(f\"Mean reward = {mean_reward:.2f} +/- {std_reward:.2f}\")" + ], + "metadata": { + "id": "liirTVoDkHq3" + }, + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "markdown", + "source": [ + "### Publish your trained model on the Hub 🔥\n", + "Now that we've seen that we got good results after training, we can publish our trained model on the Hub 🤗 with one line of code.\n", + "\n", + "📚 The libraries documentation 👉 
https://github.com/huggingface/huggingface_sb3/tree/main#hugging-face--x-stable-baselines3-v20\n", + "\n", + "Here's an example of a Model Card (with a PyBullet environment):\n", + "\n", + "\"Model" + ], + "metadata": { + "id": "44L9LVQaavR8" + } + }, + { + "cell_type": "markdown", + "source": [ + "By using `package_to_hub`, as we already mentioned in the previous units, **you evaluate, record a replay, generate a model card of your agent, and push it to the Hub**.\n", + "\n", + "This way:\n", + "- You can **showcase your work** 🔥\n", + "- You can **visualize your agent playing** 👀\n", + "- You can **share with the community an agent that others can use** 💾\n", + "- You can **access a leaderboard 🏆 to see how well your agent is performing compared to your classmates** 👉 https://huggingface.co/spaces/huggingface-projects/Deep-Reinforcement-Learning-Leaderboard\n" + ], + "metadata": { + "id": "MkMk99m8bgaQ" + } + }, + { + "cell_type": "markdown", + "metadata": { + "id": "JquRrWytA6eo" + }, + "source": [ + "To be able to share your model with the community, there are three more steps to follow:\n", + "\n", + "1️⃣ (If it's not already done) create an account on HF ➡ https://huggingface.co/join\n", + "\n", + "2️⃣ Sign in, then store your authentication token from the Hugging Face website.\n", + "- Create a new token (https://huggingface.co/settings/tokens) **with write role**\n", + "\n", + "\"Create\n", + "\n", + "- Copy the token \n", + "- Run the cell below and paste the token" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "GZiFBBlzxzxY" + }, + "outputs": [], + "source": [ + "notebook_login()\n", + "!git config --global credential.helper store" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "_tsf2uv0g_4p" + }, + "source": [ + "If you don't want to use a Google Colab or a Jupyter Notebook, you need to use this command instead: `huggingface-cli login`" + ] + }, + { 
"cell_type": "markdown", + "metadata": { + "id": "FGNh9VsZok0i" + }, + "source": [ + "3️⃣ We're now ready to push our trained agent to the 🤗 Hub 🔥 using the `package_to_hub()` function" + ] + }, + { + "cell_type": "code", + "source": [ + "package_to_hub(\n", + " model=model,\n", + " model_name=f\"a2c-{env_id}\",\n", + " model_architecture=\"A2C\",\n", + " env_id=env_id,\n", + " eval_env=eval_env,\n", + " repo_id=f\"ThomasSimonini/a2c-{env_id}\", # Change the username\n", + " commit_message=\"Initial commit\",\n", + ")" + ], + "metadata": { + "id": "ueuzWVCUTkfS" + }, + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "markdown", + "source": [ + "## Environment 2: HalfCheetahBulletEnv-v0\n", + "\n", + "For this environment, you need to follow the same process as for the first one. **Don't hesitate to save this notebook to your Google Drive**, since timeouts can happen. You may also want to **complete this notebook in two sessions**.\n", + "\n", + "To check that you understood the complete process, from environment definition to `package_to_hub`, why not try to do **it yourself first, without looking at the solution?**\n", + "\n", + "1. Define the environment called HalfCheetahBulletEnv-v0\n", + "2. Make a vectorized environment\n", + "3. Add a wrapper to normalize the observations and rewards. [Check the documentation](https://stable-baselines3.readthedocs.io/en/master/guide/vec_envs.html#vecnormalize)\n", + "4. Create the A2C Model\n", + "5. Train it for 2M Timesteps\n", + "6. Save the model and VecNormalize statistics when saving the agent\n", + "7. Evaluate your agent\n", + "8. Publish your trained model on the Hub 🔥 with `package_to_hub`" + ], + "metadata": { + "id": "-voECBK3An9j" + } + }, + { + "cell_type": "markdown", + "source": [ + "## Take a coffee break ☕\n", + "- You already trained two robots that learned to move, congratulations 🥳!\n", + "- It's **time to take a break**. 
Don't hesitate to **save this notebook** `File > Save a copy to Drive` to work on this second part later.\n" + ], + "metadata": { + "id": "Qk9ykOk9D6Qh" + } + }, + { + "cell_type": "markdown", + "source": [ + "# Part 2: Robotic Arm Environments with `panda-gym`\n" + ], + "metadata": { + "id": "5VWfwAA7EJg7" + } + }, + { + "cell_type": "markdown", + "source": [], + "metadata": { + "id": "fW_CdlUsEVP2" + } + }, + { + "cell_type": "markdown", + "source": [ + "## Some additional challenges 🏆\n", + "The best way to learn **is to try things on your own**! Why not try `HalfCheetahBulletEnv-v0`?\n", + "\n", + "In the [Leaderboard](https://huggingface.co/spaces/chrisjay/Deep-Reinforcement-Learning-Leaderboard) you will find your agents. Can you get to the top?\n", + "\n", + "Here are some ideas to do so:\n", + "* Train for more steps\n", + "* Try different hyperparameters by looking at what your classmates have done 👉 https://huggingface.co/models?other=AntBulletEnv-v0\n", + "* **Push your newly trained model** on the Hub 🔥\n" + ], + "metadata": { + "id": "G3xy3Nf3c2O1" + } + }, + { + "cell_type": "markdown", + "source": [ + "See you on Unit 8! 🔥\n", + "## Keep learning, stay awesome 🤗" + ], + "metadata": { + "id": "usatLaZ8dM4P" + } + } + ] +} \ No newline at end of file