Add unit6 WIP

Thomas Simonini
2023-01-01 17:30:34 +01:00
parent 14bd94d574
commit 1680476a04

771
notebooks/unit6/unit6.ipynb Normal file

@@ -0,0 +1,771 @@
{
"nbformat": 4,
"nbformat_minor": 0,
"metadata": {
"colab": {
"provenance": [],
"private_outputs": true,
"authorship_tag": "ABX9TyM4Z04oGTU1B2rRuxHfuNly",
"include_colab_link": true
},
"kernelspec": {
"name": "python3",
"display_name": "Python 3"
},
"language_info": {
"name": "python"
},
"accelerator": "GPU",
"gpuClass": "standard"
},
"cells": [
{
"cell_type": "markdown",
"metadata": {
"id": "view-in-github",
"colab_type": "text"
},
"source": [
"<a href=\"https://colab.research.google.com/github/huggingface/deep-rl-class/blob/ThomasSimonini%2FA2C/notebooks/unit6/unit6.ipynb\" target=\"_parent\"><img src=\"https://colab.research.google.com/assets/colab-badge.svg\" alt=\"Open In Colab\"/></a>"
]
},
{
"cell_type": "markdown",
"source": [
"# Unit 6: Advantage Actor Critic (A2C) using Robotics Simulations with PyBullet and Panda-Gym 🤖\n",
"\n",
"TODO: ADD THUMBNAIL\n",
"\n",
"In this notebook, you'll learn to use A2C with PyBullet and Panda-Gym, two sets of robotics environments. \n",
"\n",
"With [PyBullet](https://github.com/bulletphysics/bullet3), you're going to **train robots to walk and run**:\n",
"- `AntBulletEnv-v0` 🕸️ More precisely a spider (they say Ant but come on... it's a spider 😆) 🕸️\n",
"- `HalfCheetahBulletEnv-v0`\n",
"\n",
"Then, with [Panda-Gym](https://github.com/qgallouedec/panda-gym), you're going **to train a robotic arm** (Franka Emika Panda robot) to perform some tasks:\n",
"- `Reach`: the robot must place its end-effector at a target position.\n",
"- `Slide`: the robot has to slide an object to a target position.\n",
"\n",
"After that, you'll be able to train agents in other robotics environments."
],
"metadata": {
"id": "-PTReiOw-RAN"
}
},
{
"cell_type": "markdown",
"source": [
"TODO: ADD VIDEO OF WHAT IT LOOKS LIKE"
],
"metadata": {
"id": "2VGL_0ncoAJI"
}
},
{
"cell_type": "markdown",
"source": [
"### 🎮 Environments: \n",
"\n",
"- [PyBullet](https://github.com/bulletphysics/bullet3)\n",
"- [Panda-Gym](https://github.com/qgallouedec/panda-gym)\n",
"\n",
"### 📚 RL-Library: \n",
"\n",
"- [Stable-Baselines3](https://stable-baselines3.readthedocs.io/)"
],
"metadata": {
"id": "QInFitfWno1Q"
}
},
{
"cell_type": "markdown",
"source": [
"We're constantly trying to improve our tutorials, so **if you find some issues in this notebook**, please [open an issue on the GitHub Repo](https://github.com/huggingface/deep-rl-class/issues)."
],
"metadata": {
"id": "2CcdX4g3oFlp"
}
},
{
"cell_type": "markdown",
"source": [
"## Objectives of this notebook 🏆\n",
"\n",
"At the end of the notebook, you will:\n",
"\n",
"- Be able to use **PyBullet** and **Panda-Gym**, the environment libraries.\n",
"- Be able to **train robots using A2C**.\n",
"- Understand why **we need to normalize the input**.\n",
"- Be able to **push your trained agent and the code to the Hub** with a nice video replay and an evaluation score 🔥.\n",
"\n",
"\n"
],
"metadata": {
"id": "MoubJX20oKaQ"
}
},
{
"cell_type": "markdown",
"source": [
"## This notebook is from the Deep Reinforcement Learning Course\n",
"<img src=\"https://huggingface.co/datasets/huggingface-deep-rl-course/course-images/resolve/main/en/notebooks/deep-rl-course-illustration.jpg\" alt=\"Deep RL Course illustration\"/>\n",
"\n",
"In this free course, you will:\n",
"\n",
"- 📖 Study Deep Reinforcement Learning in **theory and practice**.\n",
"- 🧑‍💻 Learn to **use famous Deep RL libraries** such as Stable Baselines3, RL Baselines3 Zoo, CleanRL and Sample Factory 2.0.\n",
"- 🤖 Train **agents in unique environments**.\n",
"\n",
"And more! Check 📚 the syllabus 👉 https://simoninithomas.github.io/deep-rl-course\n",
"\n",
"Don't forget to **<a href=\"http://eepurl.com/ic5ZUD\">sign up to the course</a>** (we are collecting your email to be able to **send you the links when each Unit is published and give you information about the challenges and updates**).\n",
"\n",
"\n",
"The best way to keep in touch is to join our Discord server to exchange with the community and with us 👉🏻 https://discord.gg/ydHrjt3WP5"
],
"metadata": {
"id": "DoUNkTExoUED"
}
},
{
"cell_type": "markdown",
"source": [
"## Prerequisites 🏗️\n",
"Before diving into the notebook, you need to:\n",
"\n",
"🔲 📚 Study [Actor-Critic methods by reading Unit 6](https://huggingface.co/deep-rl-course/unit6/introduction) 🤗 "
],
"metadata": {
"id": "BTuQAUAPoa5E"
}
},
{
"cell_type": "markdown",
"source": [
"# Let's train our first robots 🤖"
],
"metadata": {
"id": "iajHvVDWoo01"
}
},
{
"cell_type": "markdown",
"source": [
"To validate this hands-on for the [certification process](https://huggingface.co/deep-rl-course/en/unit0/introduction#certification-process), you need to:\n",
"\n",
"TODO ADD CERTIFICATION RECOMMENDATION\n",
"\n",
"To find your result, go to the [leaderboard](https://huggingface.co/spaces/huggingface-projects/Deep-Reinforcement-Learning-Leaderboard) and find your model. **The result = mean_reward - std_reward.**\n",
"\n",
"For more information about the certification process, check this section 👉 https://huggingface.co/deep-rl-course/en/unit0/introduction#certification-process"
],
"metadata": {
"id": "zbOENTE2os_D"
}
},
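The leaderboard score is easy to compute by hand. Here is a quick sketch with made-up evaluation rewards (the numbers below are hypothetical, just to show the formula):

```python
import numpy as np

# Hypothetical evaluation episode rewards (made-up numbers for illustration)
episode_rewards = np.array([2300.0, 2400.0, 2350.0])

mean_reward = episode_rewards.mean()
std_reward = episode_rewards.std()

# The leaderboard penalizes unstable agents: result = mean_reward - std_reward
result = mean_reward - std_reward
print(f"{mean_reward:.2f} +/- {std_reward:.2f} -> result = {result:.2f}")
```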
{
"cell_type": "markdown",
"source": [
"## Set the GPU 💪\n",
"- To **accelerate the agent's training, we'll use a GPU**. To do that, go to `Runtime > Change Runtime type`\n",
"\n",
"<img src=\"https://huggingface.co/datasets/huggingface-deep-rl-course/course-images/resolve/main/en/notebooks/gpu-step1.jpg\" alt=\"GPU Step 1\">"
],
"metadata": {
"id": "PU4FVzaoM6fC"
}
},
{
"cell_type": "markdown",
"source": [
"- `Hardware Accelerator > GPU`\n",
"\n",
"<img src=\"https://huggingface.co/datasets/huggingface-deep-rl-course/course-images/resolve/main/en/notebooks/gpu-step2.jpg\" alt=\"GPU Step 2\">"
],
"metadata": {
"id": "KV0NyFdQM9ZG"
}
},
{
"cell_type": "markdown",
"source": [
"## Create a virtual display 🔽\n",
"\n",
"During the notebook, we'll need to generate a replay video. To do so, on Colab, **we need a virtual screen to be able to render the environment** (and thus record the frames). \n",
"\n",
"Hence, the following cell will install the libraries and create and run a virtual screen 🖥"
],
"metadata": {
"id": "bTpYcVZVMzUI"
}
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "jV6wjQ7Be7p5"
},
"outputs": [],
"source": [
"%%capture\n",
"!apt install python-opengl\n",
"!apt install ffmpeg\n",
"!apt install xvfb\n",
"!pip3 install pyvirtualdisplay"
]
},
{
"cell_type": "code",
"source": [
"# Additional dependencies for RL Baselines3 Zoo\n",
"!apt-get install swig cmake freeglut3-dev "
],
"metadata": {
"id": "fWyKJCy_NJBX"
},
"execution_count": null,
"outputs": []
},
{
"cell_type": "code",
"source": [
"# Virtual display\n",
"from pyvirtualdisplay import Display\n",
"\n",
"virtual_display = Display(visible=0, size=(1400, 900))\n",
"virtual_display.start()"
],
"metadata": {
"id": "ww5PQH1gNLI4"
},
"execution_count": null,
"outputs": []
},
{
"cell_type": "markdown",
"source": [
"### Install dependencies 🔽\n",
"The first step is to install the dependencies; we'll install several of them:\n",
"\n",
"- `pybullet`: Contains the walking robots environments.\n",
"- `panda-gym`: Contains the robotics arm environments.\n",
"- `stable-baselines3[extra]`: The SB3 deep reinforcement learning library.\n",
"- `huggingface_sb3`: Additional code for Stable-baselines3 to load and upload models from the Hugging Face 🤗 Hub.\n",
"- `huggingface_hub`: Library allowing anyone to work with the Hub repositories.\n",
"\n",
"We're going to install **two versions of gym**:\n",
"- `gym==0.21`: The classical version of gym for PyBullet environments.\n",
"- `gymnasium`: [The new Gym library by Farama Foundation](https://github.com/Farama-Foundation/Gymnasium) for Panda Gym environments."
],
"metadata": {
"id": "e1obkbdJ_KnG"
}
},
{
"cell_type": "code",
"source": [
"!pip install -r https://huggingface.co/spaces/ThomasSimonini/temp-space-requirements/raw/main/requirements/requirements-unit6.txt"
],
"metadata": {
"id": "69jUeXrLryos"
},
"execution_count": null,
"outputs": []
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "2yZRi_0bQGPM"
},
"outputs": [],
"source": [
"# TODO: CHANGE TO THE ONE COMMENTED\n",
"#!pip install -r https://raw.githubusercontent.com/huggingface/deep-rl-class/main/notebooks/unit6/requirements-unit6.txt"
]
},
{
"cell_type": "markdown",
"source": [
"## Import the packages 📦"
],
"metadata": {
"id": "QTep3PQQABLr"
}
},
{
"cell_type": "code",
"source": [
"import gymnasium\n",
"import panda_gym\n",
"\n",
"import gym\n",
"import pybullet_envs\n",
"\n",
"import os\n",
"\n",
"from huggingface_sb3 import load_from_hub, package_to_hub\n",
"\n",
"from stable_baselines3 import A2C\n",
"from stable_baselines3.common.evaluation import evaluate_policy\n",
"from stable_baselines3.common.vec_env import DummyVecEnv, VecNormalize\n",
"from stable_baselines3.common.env_util import make_vec_env\n",
"\n",
"from huggingface_hub import notebook_login"
],
"metadata": {
"id": "HpiB8VdnQ7Bk"
},
"execution_count": null,
"outputs": []
},
{
"cell_type": "markdown",
"source": [
"# Part 1: PyBullet Environments\n"
],
"metadata": {
"id": "KIqf-N-otczo"
}
},
{
"cell_type": "markdown",
"source": [
"## Environment 1: AntBulletEnv-v0 🕸\n",
"\n"
],
"metadata": {
"id": "lfBwIS_oAVXI"
}
},
{
"cell_type": "markdown",
"source": [
"### Create the AntBulletEnv-v0\n",
"#### The environment 🎮\n",
"In this environment, the agent needs to coordinate its different joints correctly in order to walk."
],
"metadata": {
"id": "frVXOrnlBerQ"
}
},
{
"cell_type": "code",
"source": [
"import gym  # As mentioned, we use gym for PyBullet and gymnasium for panda-gym"
],
"metadata": {
"id": "RJ0XJccTt9FX"
},
"execution_count": null,
"outputs": []
},
{
"cell_type": "code",
"source": [
"env_id = \"AntBulletEnv-v0\"\n",
"# Create the env\n",
"env = gym.make(env_id)\n",
"\n",
"# Get the state space and action space\n",
"s_size = env.observation_space.shape[0]\n",
"a_size = env.action_space.shape[0]  # Continuous action space: one value per joint"
],
"metadata": {
"id": "JpU-JCDQYYax"
},
"execution_count": null,
"outputs": []
},
{
"cell_type": "code",
"source": [
"print(\"_____OBSERVATION SPACE_____ \\n\")\n",
"print(\"The State Space is: \", s_size)\n",
"print(\"Sample observation\", env.observation_space.sample()) # Get a random observation"
],
"metadata": {
"id": "2ZfvcCqEYgrg"
},
"execution_count": null,
"outputs": []
},
{
"cell_type": "code",
"source": [
"print(\"\\n _____ACTION SPACE_____ \\n\")\n",
"print(\"The Action Space is: \", a_size)\n",
"print(\"Action Space Sample\", env.action_space.sample()) # Take a random action"
],
"metadata": {
"id": "Tc89eLTYYkK2"
},
"execution_count": null,
"outputs": []
},
{
"cell_type": "markdown",
"source": [
"### Normalize observation and rewards"
],
"metadata": {
"id": "S5sXcg469ysB"
}
},
{
"cell_type": "markdown",
"source": [
"A good practice in reinforcement learning is to [normalize input features](https://stable-baselines3.readthedocs.io/en/master/guide/rl_tips.html). For that, a wrapper exists that computes a running average and standard deviation of the input features.\n",
"\n",
"We also normalize rewards with the same wrapper by adding `norm_reward = True`.\n",
"\n",
"[You should check the documentation to fill this cell](https://stable-baselines3.readthedocs.io/en/master/guide/vec_envs.html#vecnormalize)"
],
"metadata": {
"id": "1ZyX6qf3Zva9"
}
},
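To build intuition for what this wrapper does, here is a minimal, hypothetical sketch of the idea behind observation normalization (a running per-feature mean and variance used to rescale inputs). This is not the Stable-Baselines3 implementation, just the concept:

```python
import numpy as np

class RunningNormalizer:
    """Hypothetical sketch: track a running mean/variance per feature
    and use them to rescale (and clip) incoming observations."""

    def __init__(self, shape, clip=10.0, eps=1e-8):
        self.mean = np.zeros(shape)
        self.var = np.ones(shape)
        self.count = eps
        self.clip = clip
        self.eps = eps

    def update(self, batch):
        # Batched (Welford-style) update of the running statistics
        batch_mean = batch.mean(axis=0)
        batch_var = batch.var(axis=0)
        batch_count = batch.shape[0]

        delta = batch_mean - self.mean
        total = self.count + batch_count
        m2 = (self.var * self.count + batch_var * batch_count
              + delta**2 * self.count * batch_count / total)

        self.mean = self.mean + delta * batch_count / total
        self.var = m2 / total
        self.count = total

    def normalize(self, obs):
        return np.clip((obs - self.mean) / np.sqrt(self.var + self.eps),
                       -self.clip, self.clip)

normalizer = RunningNormalizer(shape=(3,))
normalizer.update(np.array([[1.0, 2.0, 3.0], [3.0, 4.0, 5.0]]))

# Observations equal to the running mean normalize to (approximately) zero
print(normalizer.normalize(np.array([2.0, 3.0, 4.0])))
```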
{
"cell_type": "code",
"source": [
"env = make_vec_env(env_id, n_envs=4)\n",
"\n",
"# Adding this wrapper to normalize the observation and the reward\n",
"env = # TODO: Add the wrapper"
],
"metadata": {
"id": "1RsDtHHAQ9Ie"
},
"execution_count": null,
"outputs": []
},
{
"cell_type": "markdown",
"source": [
"#### Solution"
],
"metadata": {
"id": "tF42HvI7-gs5"
}
},
{
"cell_type": "code",
"source": [
"env = make_vec_env(env_id, n_envs=4)\n",
"\n",
"env = VecNormalize(env, norm_obs=True, norm_reward=True, clip_obs=10.)"
],
"metadata": {
"id": "2O67mqgC-hol"
},
"execution_count": null,
"outputs": []
},
{
"cell_type": "markdown",
"source": [
"### Create the A2C Model 🤖\n",
"\n",
"In this case, because we have a vector of 28 values as input, we'll use an MLP (multi-layer perceptron) as policy.\n",
"\n",
"To find the best parameters, I checked the [official trained agents by the Stable-Baselines3 team](https://huggingface.co/sb3)."
],
"metadata": {
"id": "4JmEVU6z1ZA-"
}
},
{
"cell_type": "code",
"source": [
"model = # Create the A2C model and try to find the best parameters"
],
"metadata": {
"id": "vR3T4qFt164I"
},
"execution_count": null,
"outputs": []
},
{
"cell_type": "markdown",
"source": [
"#### Solution"
],
"metadata": {
"id": "nWAuOOLh-oQf"
}
},
{
"cell_type": "code",
"source": [
"model = A2C(policy = \"MlpPolicy\",\n",
" env = env,\n",
" gae_lambda = 0.9,\n",
" gamma = 0.99,\n",
" learning_rate = 0.00096,\n",
" max_grad_norm = 0.5,\n",
" n_steps = 8,\n",
" vf_coef = 0.4,\n",
" ent_coef = 0.0,\n",
" tensorboard_log = \"./tensorboard\",\n",
" policy_kwargs=dict(\n",
" log_std_init=-2, ortho_init=False),\n",
" normalize_advantage=False,\n",
" use_rms_prop= True,\n",
" use_sde= True,\n",
" verbose=1)"
],
"metadata": {
"id": "FKFLY54T-pU1"
},
"execution_count": null,
"outputs": []
},
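Two of these hyperparameters control how A2C estimates the advantage: `gamma` discounts future rewards, and `gae_lambda` interpolates between one-step TD estimates and full returns. As a small sketch of the simplest case, the one-step TD advantage (the numbers are made up for illustration; this is not SB3 code):

```python
import numpy as np

gamma = 0.99

# Hypothetical rollout (made-up numbers)
rewards = np.array([1.0, 0.5, 2.0])        # r_t
values = np.array([1.2, 0.8, 1.5])         # critic's estimate V(s_t)
next_values = np.array([0.8, 1.5, 0.0])    # V(s_{t+1}); 0.0 at episode end

# One-step TD advantage: A_t = r_t + gamma * V(s_{t+1}) - V(s_t)
advantages = rewards + gamma * next_values - values
print(advantages)  # approximately [0.592, 1.185, 0.5]
```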
{
"cell_type": "markdown",
"source": [
"### Train the A2C agent 🏃\n",
"- Let's train our agent for 2,000,000 timesteps. Don't forget to use the GPU on Colab; training will take approximately 25-40 minutes."
],
"metadata": {
"id": "opyK3mpJ1-m9"
}
},
{
"cell_type": "code",
"source": [
"model.learn(2_000_000)"
],
"metadata": {
"id": "4TuGHZD7RF1G"
},
"execution_count": null,
"outputs": []
},
{
"cell_type": "code",
"source": [
"# Save the model and VecNormalize statistics when saving the agent\n",
"model.save(\"a2c-AntBulletEnv-v0\")\n",
"env.save(\"vec_normalize.pkl\")"
],
"metadata": {
"id": "MfYtjj19cKFr"
},
"execution_count": null,
"outputs": []
},
{
"cell_type": "markdown",
"source": [
"### Evaluate the agent 📈\n",
"- Now that our agent is trained, we need to **check its performance**.\n",
"- Stable-Baselines3 provides a method to do that: `evaluate_policy`.\n",
"- In my case, I got a mean reward of `2371.90 +/- 16.50`."
],
"metadata": {
"id": "01M9GCd32Ig-"
}
},
{
"cell_type": "code",
"source": [
"from stable_baselines3.common.vec_env import DummyVecEnv, VecNormalize\n",
"\n",
"# Load the saved statistics\n",
"eval_env = DummyVecEnv([lambda: gym.make(\"AntBulletEnv-v0\")])\n",
"eval_env = VecNormalize.load(\"vec_normalize.pkl\", eval_env)\n",
"\n",
"# do not update them at test time\n",
"eval_env.training = False\n",
"# reward normalization is not needed at test time\n",
"eval_env.norm_reward = False\n",
"\n",
"# Load the agent\n",
"model = A2C.load(\"a2c-AntBulletEnv-v0\")\n",
"\n",
"mean_reward, std_reward = evaluate_policy(model, eval_env)\n",
"\n",
"print(f\"Mean reward = {mean_reward:.2f} +/- {std_reward:.2f}\")"
],
"metadata": {
"id": "liirTVoDkHq3"
},
"execution_count": null,
"outputs": []
},
{
"cell_type": "markdown",
"source": [
"### Publish your trained model on the Hub 🔥\n",
"Now that we've seen we got good results after training, we can publish our trained model on the Hub 🤗 with one line of code.\n",
"\n",
"📚 The libraries documentation 👉 https://github.com/huggingface/huggingface_sb3/tree/main#hugging-face--x-stable-baselines3-v20\n",
"\n",
"Here's an example of a Model Card (with a PyBullet environment):\n",
"\n",
"<img src=\"https://huggingface.co/datasets/huggingface-deep-rl-course/course-images/resolve/main/en/unit8/modelcardpybullet.png\" alt=\"Model Card Pybullet\"/>"
],
"metadata": {
"id": "44L9LVQaavR8"
}
},
{
"cell_type": "markdown",
"source": [
"By using `package_to_hub`, as we already mentioned in previous units, **you evaluate, record a replay, generate a model card of your agent and push it to the hub**.\n",
"\n",
"This way:\n",
"- You can **showcase your work** 🔥\n",
"- You can **visualize your agent playing** 👀\n",
"- You can **share with the community an agent that others can use** 💾\n",
"- You can **access a leaderboard 🏆 to see how well your agent is performing compared to your classmates** 👉 https://huggingface.co/spaces/huggingface-projects/Deep-Reinforcement-Learning-Leaderboard\n"
],
"metadata": {
"id": "MkMk99m8bgaQ"
}
},
{
"cell_type": "markdown",
"metadata": {
"id": "JquRrWytA6eo"
},
"source": [
"To be able to share your model with the community there are three more steps to follow:\n",
"\n",
"1⃣ (If it's not already done) create an account on HF ➡ https://huggingface.co/join\n",
"\n",
"2⃣ Sign in, then store your authentication token from the Hugging Face website.\n",
"- Create a new token (https://huggingface.co/settings/tokens) **with the write role**\n",
"\n",
"<img src=\"https://huggingface.co/datasets/huggingface-deep-rl-course/course-images/resolve/main/en/notebooks/create-token.jpg\" alt=\"Create HF Token\">\n",
"\n",
"- Copy the token \n",
"- Run the cell below and paste the token"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "GZiFBBlzxzxY"
},
"outputs": [],
"source": [
"notebook_login()\n",
"!git config --global credential.helper store"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "_tsf2uv0g_4p"
},
"source": [
"If you don't want to use a Google Colab or a Jupyter Notebook, you need to use this command instead: `huggingface-cli login`"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "FGNh9VsZok0i"
},
"source": [
"3⃣ We're now ready to push our trained agent to the 🤗 Hub 🔥 using the `package_to_hub()` function"
]
},
{
"cell_type": "code",
"source": [
"package_to_hub(\n",
" model=model,\n",
" model_name=f\"a2c-{env_id}\",\n",
" model_architecture=\"A2C\",\n",
" env_id=env_id,\n",
" eval_env=eval_env,\n",
" repo_id=f\"ThomasSimonini/a2c-{env_id}\", # Change the username\n",
" commit_message=\"Initial commit\",\n",
")"
],
"metadata": {
"id": "ueuzWVCUTkfS"
},
"execution_count": null,
"outputs": []
},
{
"cell_type": "markdown",
"source": [
"## Environment 2: HalfCheetahBulletEnv-v0\n",
"\n",
"For this environment, you need to follow the same process as for the first one. **Don't hesitate to save this notebook to your Google Drive**, since a timeout can happen. You may also want to **complete this notebook in two sessions**.\n",
"\n",
"To check that you've understood the complete process, from environment definition to `package_to_hub`, why not try to do **it yourself first, without the solution?**\n",
"\n",
"1. Define the environment called `HalfCheetahBulletEnv-v0`\n",
"2. Make a vectorized environment\n",
"3. Add a wrapper to normalize the observations and rewards. [Check the documentation](https://stable-baselines3.readthedocs.io/en/master/guide/vec_envs.html#vecnormalize)\n",
"4. Create the A2C Model\n",
"5. Train it for 2M Timesteps\n",
"6. Save the model and VecNormalize statistics when saving the agent\n",
"7. Evaluate your agent\n",
"8. Publish your trained model on the Hub 🔥 with `package_to_hub`"
],
"metadata": {
"id": "-voECBK3An9j"
}
},
{
"cell_type": "markdown",
"source": [
"## Take a coffee break ☕\n",
"- You've already trained agents in two robotics environments and taught them to move, congratulations 🥳!\n",
"- It's **time to take a break**. Don't hesitate to **save this notebook** `File > Save a copy to Drive` to work on this second part later.\n"
],
"metadata": {
"id": "Qk9ykOk9D6Qh"
}
},
{
"cell_type": "markdown",
"source": [
"# Part 2: Robotic Arm Environments with `panda-gym`\n"
],
"metadata": {
"id": "5VWfwAA7EJg7"
}
},
{
"cell_type": "markdown",
"source": [],
"metadata": {
"id": "fW_CdlUsEVP2"
}
},
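In the `Reach` task, the robot must place its end-effector at a target position. In the dense-reward variant, the reward is (roughly) the negative Euclidean distance between the end-effector and the target, so the arm is rewarded for getting closer. Here is a hypothetical sketch of that idea (not the actual panda-gym code):

```python
import numpy as np

def dense_reach_reward(ee_position, target_position):
    # The closer the end-effector is to the target,
    # the higher (less negative) the reward
    return -float(np.linalg.norm(ee_position - target_position))

ee = np.array([0.1, 0.0, 0.2])
target = np.array([0.1, 0.3, 0.2])
print(dense_reach_reward(ee, target))  # -0.3
```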
{
"cell_type": "markdown",
"source": [
"## Some additional challenges 🏆\n",
"The best way to learn **is to try things on your own**! Why not try `HalfCheetahBulletEnv-v0`?\n",
"\n",
"In the [Leaderboard](https://huggingface.co/spaces/huggingface-projects/Deep-Reinforcement-Learning-Leaderboard) you will find your agents. Can you get to the top?\n",
"\n",
"Here are some ideas to get there:\n",
"* Train for more steps\n",
"* Try different hyperparameters by looking at what your classmates have done 👉 https://huggingface.co/models?other=AntBulletEnv-v0\n",
"* **Push your new trained model** on the Hub 🔥\n"
],
"metadata": {
"id": "G3xy3Nf3c2O1"
}
},
{
"cell_type": "markdown",
"source": [
"See you on Unit 8! 🔥\n",
"## Keep learning, stay awesome 🤗"
],
"metadata": {
"id": "usatLaZ8dM4P"
}
}
]
}