Created with Colaboratory

This commit is contained in:
Thomas Simonini
2023-01-17 07:34:49 +01:00
parent 368b54970f
commit d406e5bb08


@@ -5,18 +5,7 @@
"colab": {
"provenance": [],
"private_outputs": true,
"collapsed_sections": [
"MoubJX20oKaQ",
"DoUNkTExoUED",
"BTuQAUAPoa5E",
"tF42HvI7-gs5",
"nWAuOOLh-oQf",
"-voECBK3An9j",
"Qk9ykOk9D6Qh",
"G3xy3Nf3c2O1",
"usatLaZ8dM4P"
],
"authorship_tag": "ABX9TyPovbUwEqbQAH1J8OxiHKDm",
"authorship_tag": "ABX9TyNTCZRW9WsSED/roRBW2oQ5",
"include_colab_link": true
},
"kernelspec": {
@@ -47,17 +36,15 @@
"\n",
"<img src=\"https://huggingface.co/datasets/huggingface-deep-rl-course/course-images/resolve/main/en/unit8/thumbnail.png\" alt=\"Thumbnail\"/>\n",
"\n",
"In this small notebook you'll learn to use A2C with PyBullet and Panda-Gym two set of robotics environments. \n",
"In this notebook, you'll learn to use A2C with PyBullet and Panda-Gym, two set of robotics environments. \n",
"\n",
"With [PyBullet](https://github.com/bulletphysics/bullet3), you're going to **train robots to walk and run**:\n",
"- `AntBulletEnv-v0` 🕸️ More precisely a spider (they say Ant but come on... it's a spider 😆) 🕸️\n",
"- `HalfCheetahBulletEnv-v0`\n",
"With [PyBullet](https://github.com/bulletphysics/bullet3), you're going to **train a robot to move**:\n",
"- `AntBulletEnv-v0` 🕸️ More precisely, a spider (they say Ant but come on... it's a spider 😆) 🕸️\n",
"\n",
"Then, with [Panda-Gym](https://github.com/qgallouedec/panda-gym), you're going **to train a robotic arm** (Franka Emika Panda robot) to perform some tasks:\n",
"Then, with [Panda-Gym](https://github.com/qgallouedec/panda-gym), you're going **to train a robotic arm** (Franka Emika Panda robot) to perform a task:\n",
"- `Reach`: the robot must place its end-effector at a target position.\n",
"- `Slide`: the robot has to slide an object to a target position.\n",
"\n",
"After that, you'll be able to train other robotics environments."
"After that, you'll be able **to train in other robotics environments**.\n"
],
"metadata": {
"id": "-PTReiOw-RAN"
@@ -66,7 +53,7 @@
{
"cell_type": "markdown",
"source": [
"TODO: ADD VIDEO OF WHAT IT LOOKS LIKE"
"<img src=\"https://huggingface.co/datasets/huggingface-deep-rl-course/course-images/resolve/main/en/unit8/environments.gif\" alt=\"Robotics environments\"/>"
],
"metadata": {
"id": "2VGL_0ncoAJI"
@@ -162,12 +149,15 @@
{
"cell_type": "markdown",
"source": [
"To validate this hands-on for the [certification process](https://huggingface.co/deep-rl-course/en/unit0/introduction#certification-process), you need to:\n",
"To validate this hands-on for the [certification process](https://huggingface.co/deep-rl-course/en/unit0/introduction#certification-process), you need to push three models:\n",
"\n",
"TODO ADD CERTIFICATION RECOMMENDATION\n",
"- `AntBulletEnv-v0` get a result of >= 650.\n",
"- `PandaReachDense-v2` get a result of >= -3.5.\n",
"\n",
"To find your result, go to the [leaderboard](https://huggingface.co/spaces/huggingface-projects/Deep-Reinforcement-Learning-Leaderboard) and find your model, **the result = mean_reward - std of reward**\n",
"\n",
"If you don't find your model, **go to the bottom of the page and click on the refresh button**\n",
"\n",
"For more information about the certification process, check this section 👉 https://huggingface.co/deep-rl-course/en/unit0/introduction#certification-process"
],
"metadata": {
@@ -225,18 +215,6 @@
"!pip3 install pyvirtualdisplay"
]
},
{
"cell_type": "code",
"source": [
"# Additional dependencies for RL Baselines3 Zoo\n",
"!apt-get install swig cmake freeglut3-dev "
],
"metadata": {
"id": "fWyKJCy_NJBX"
},
"execution_count": null,
"outputs": []
},
{
"cell_type": "code",
"source": [
@@ -262,24 +240,12 @@
"- `panda-gym`: Contains the robotics arm environments.\n",
"- `stable-baselines3[extra]`: The SB3 deep reinforcement learning library.\n",
"- `huggingface_sb3`: Additional code for Stable-baselines3 to load and upload models from the Hugging Face 🤗 Hub.\n",
"- `huggingface_hub`: Library allowing anyone to work with the Hub repositories.\n",
"- `gym==0.21`: The classical version of gym."
"- `huggingface_hub`: Library allowing anyone to work with the Hub repositories."
],
"metadata": {
"id": "e1obkbdJ_KnG"
}
},
{
"cell_type": "code",
"source": [
"!pip install -r https://huggingface.co/spaces/ThomasSimonini/temp-space-requirements/raw/main/requirements/requirements-unit6.txt"
],
"metadata": {
"id": "69jUeXrLryos"
},
"execution_count": null,
"outputs": []
},
{
"cell_type": "code",
"execution_count": null,
@@ -288,7 +254,7 @@
},
"outputs": [],
"source": [
"TODO: CHANGE TO THE ONE COMMENTED#!pip install -r https://raw.githubusercontent.com/huggingface/deep-rl-class/main/notebooks/unit6/requirements-unit6.txt"
"!pip install -r https://raw.githubusercontent.com/huggingface/deep-rl-class/main/notebooks/unit6/requirements-unit6.txt"
]
},
{
@@ -303,11 +269,9 @@
{
"cell_type": "code",
"source": [
"import gym\n",
"import pybullet_envs\n",
"\n",
"import gymnasium\n",
"import panda_gym\n",
"import gym\n",
"\n",
"import os\n",
"\n",
@@ -326,15 +290,6 @@
"execution_count": null,
"outputs": []
},
{
"cell_type": "markdown",
"source": [
"# Part 1: PyBullet Environments\n"
],
"metadata": {
"id": "KIqf-N-otczo"
}
},
{
"cell_type": "markdown",
"source": [
@@ -350,23 +305,13 @@
"source": [
"### Create the AntBulletEnv-v0\n",
"#### The environment 🎮\n",
"In this environment, the agent needs to use correctly its different joints to walk correctly."
"In this environment, the agent needs to use correctly its different joints to walk correctly.\n",
"You can find a detailled explanation of this environment here: https://hackmd.io/@jeffreymo/SJJrSJh5_#PyBullet"
],
"metadata": {
"id": "frVXOrnlBerQ"
}
},
{
"cell_type": "code",
"source": [
"import gym"
],
"metadata": {
"id": "RJ0XJccTt9FX"
},
"execution_count": null,
"outputs": []
},
{
"cell_type": "code",
"source": [
@@ -400,7 +345,9 @@
{
"cell_type": "markdown",
"source": [
"TODO: Add explanation obs space"
"The observation Space (from [Jeffrey Y Mo](https://hackmd.io/@jeffreymo/SJJrSJh5_#PyBullet)):\n",
"\n",
"<img src=\"https://huggingface.co/datasets/huggingface-deep-rl-course/course-images/resolve/main/en/unit8/obs_space.png\" alt=\"PyBullet Ant Obs space\"/>\n"
],
"metadata": {
"id": "QzMmsdMJS7jh"
@@ -422,7 +369,9 @@
{
"cell_type": "markdown",
"source": [
"Todo: Add explanation action space"
"The action Space (from [Jeffrey Y Mo](https://hackmd.io/@jeffreymo/SJJrSJh5_#PyBullet)):\n",
"\n",
"<img src=\"https://huggingface.co/datasets/huggingface-deep-rl-course/course-images/resolve/main/en/unit8/action_space.png\" alt=\"PyBullet Ant Obs space\"/>\n"
],
"metadata": {
"id": "3RfsHhzZS9Pw"
@@ -440,7 +389,9 @@
{
"cell_type": "markdown",
"source": [
"A good practice in reinforcement learning is to [normalize input features](https://stable-baselines3.readthedocs.io/en/master/guide/rl_tips.html). For that, a wrapper exists and will compute a running average and standard deviation of input features.\n",
"A good practice in reinforcement learning is to [normalize input features](https://stable-baselines3.readthedocs.io/en/master/guide/rl_tips.html). \n",
"\n",
"For that, a wrapper exists and will compute a running average and standard deviation of input features.\n",
"\n",
"We also normalize rewards with this same wrapper by adding `norm_reward = True`\n",
"\n",
@@ -493,6 +444,8 @@
"\n",
"In this case, because we have a vector of 28 values as input, we'll use an MLP (multi-layer perceptron) as policy.\n",
"\n",
"For more information about A2C implementation with StableBaselines3 check: https://stable-baselines3.readthedocs.io/en/master/modules/a2c.html#notes\n",
"\n",
"To find the best parameters I checked the [official trained agents by Stable-Baselines3 team](https://huggingface.co/sb3)."
],
"metadata": {
@@ -531,7 +484,6 @@
" n_steps = 8,\n",
" vf_coef = 0.4,\n",
" ent_coef = 0.0,\n",
" tensorboard_log = \"./tensorboard\",\n",
" policy_kwargs=dict(\n",
" log_std_init=-2, ortho_init=False),\n",
" normalize_advantage=False,\n",
@@ -717,33 +669,11 @@
"execution_count": null,
"outputs": []
},
{
"cell_type": "markdown",
"source": [
"## Environment 2: HalfCheetahBulletEnv-v0\n",
"\n",
"For this environment, you must follow the same process as the first one. **Don't hesitate to save this notebook to your Google Drive** since timeout can happen. You may also want to **complete this notebook two times**.\n",
"\n",
"To see that you understood the complete process from environment definition to `package_to_hub` why not try to do **it yourself first without the solution?**\n",
"\n",
"1. Define the environment called HalfCheetahBulletEnv-v0\n",
"2. Make a vectorized environment\n",
"3. Add a wrapper to normalize the observations and rewards. [Check the documentation](https://stable-baselines3.readthedocs.io/en/master/guide/vec_envs.html#vecnormalize)\n",
"4. Create the A2C Model\n",
"5. Train it for 2M Timesteps\n",
"6. Save the model and VecNormalize statistics when saving the agent\n",
"7. Evaluate your agent\n",
"8. Publish your trained model on the Hub 🔥 with `package_to_hub`"
],
"metadata": {
"id": "-voECBK3An9j"
}
},
{
"cell_type": "markdown",
"source": [
"## Take a coffee break ☕\n",
"- You already trained two robotics environments that learned to move congratutlations 🥳!\n",
"- You already trained your first robot that learned to move congratutlations 🥳!\n",
"- It's **time to take a break**. Don't hesitate to **save this notebook** `File > Save a copy to Drive` to work on this second part later.\n"
],
"metadata": {
@@ -753,16 +683,15 @@
{
"cell_type": "markdown",
"source": [
"# Part 2: Robotic Arm Environments with `panda-gym`\n",
"## Environment 2: PandaReachDense-v2 🦾\n",
"\n",
"The second set of robotics environments we're going to train are a robotic arm that needs to do controls (moving the arm and using the end-effector).\n",
"The agent we're going to train is a robotic arm that needs to do controls (moving the arm and using the end-effector).\n",
"\n",
"In robotics, the *end-effector* is the device at the end of a robotic arm designed to interact with the environment.\n",
"\n",
"1. In the first environment, `PandaReach`, the robot must place its end-effector at a target position (green ball).\n",
"2. In the second environment, `PandaSlide`, the robot has to slide an object to a target position.\n",
"In `PandaReach`, the robot must place its end-effector at a target position (green ball).\n",
"\n",
"We're going to use the dense version of the environments. It means we'll get a *dense reward function* that **will provide a reward at each timestep** (the closer the agent is to complete the task, the higher the reward). Contrary to a *sparse reward function* where the environment **return a reward if and only if the task is completed**.\n",
"We're going to use the dense version of this environment. It means we'll get a *dense reward function* that **will provide a reward at each timestep** (the closer the agent is to completing the task, the higher the reward). Contrary to a *sparse reward function* where the environment **return a reward if and only if the task is completed**.\n",
"\n",
"Also, we're going to use the *End-effector displacement control*, it means the **action corresponds to the displacement of the end-effector**. We don't control the individual motion of each joint (joint control).\n",
"\n",
@@ -776,10 +705,24 @@
"id": "5VWfwAA7EJg7"
}
},
{
"cell_type": "markdown",
"source": [
"\n",
"\n",
"In `PandaReachDense-v2` the robotic arm must place its end-effector at a target position (green ball).\n",
"\n"
],
"metadata": {
"id": "oZ7FyDEi7G3T"
}
},
{
"cell_type": "code",
"source": [
"env_id = \"PandaReachDense-v2\"\n",
"import gym\n",
"\n",
"env_id = \"PandaPushDense-v2\"\n",
"\n",
"# Create the env\n",
"env = gym.make(env_id)\n",
@@ -810,11 +753,12 @@
{
"cell_type": "markdown",
"source": [
"The observation space is a dictionary with 3 different element:\n",
"The observation space **is a dictionary with 3 different element**:\n",
"- `achieved_goal`: (x,y,z) position of the goal.\n",
"- `desired_goal`: (x,y,z) distance between the goal position and the current object position.\n",
"- `observation`: position (x,y,z) and velocity of the end-effector (vx, vy, vz).\n",
"\n"
"\n",
"Given it's a dictionary as observation, **we will need to use a MultiInputPolicy policy instead of MlpPolicy**."
],
"metadata": {
"id": "g_JClfElGFnF"
@@ -836,35 +780,103 @@
{
"cell_type": "markdown",
"source": [
"TODO: ADd action space"
"The action space is a vector with 3 values:\n",
"- Control x, y, z movement"
],
"metadata": {
"id": "5MHTHEHZS4yp"
}
},
{
"cell_type": "code",
"cell_type": "markdown",
"source": [
"Now it's your turn:\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"model = A2C(\"MultiInputPolicy\", env)\n",
"model.learn(total_timesteps=100000)"
"1. Define the environment called \"PandaReachDense-v2\"\n",
"2. Make a vectorized environment\n",
"3. Add a wrapper to normalize the observations and rewards. [Check the documentation](https://stable-baselines3.readthedocs.io/en/master/guide/vec_envs.html#vecnormalize)\n",
"4. Create the A2C Model (don't forget verbose=1 to print the training logs).\n",
"5. Train it for 2M Timesteps\n",
"6. Save the model and VecNormalize statistics when saving the agent\n",
"7. Evaluate your agent\n",
"8. Publish your trained model on the Hub 🔥 with `package_to_hub`"
],
"metadata": {
"id": "C-3SfbJr0N7I"
"id": "nIhPoc5t9HjG"
}
},
{
"cell_type": "markdown",
"source": [
"### Solution (fill the todo)"
],
"metadata": {
"id": "sKGbFXZq9ikN"
}
},
{
"cell_type": "code",
"source": [
"# 1 - 2\n",
"env_id = \"PandaReachDense-v2\"\n",
"env = make_vec_env(env_id, n_envs=4)\n",
"\n",
"# 3\n",
"env = VecNormalize(env, norm_obs=True, norm_reward=False, clip_obs=10.)\n",
"\n",
"# 4\n",
"model = A2C(policy = \"MultiInputPolicy\",\n",
" env = env,\n",
" verbose=1)\n",
"# 5\n",
"model.learn(1_000_000)"
],
"metadata": {
"id": "J-cC-Feg9iMm"
},
"execution_count": null,
"outputs": []
},
{
"cell_type": "code",
"source": [],
"source": [
"# 6\n",
"model_name = \"a2c-PandaReachDense-v2\"; \n",
"model.save(model_name)\n",
"env.save(\"vec_normalize.pkl\")\n",
"\n",
"# 7\n",
"from stable_baselines3.common.vec_env import DummyVecEnv, VecNormalize\n",
"\n",
"# Load the saved statistics\n",
"eval_env = DummyVecEnv([lambda: gym.make(\"PandaReachDense-v2\")])\n",
"eval_env = VecNormalize.load(\"vec_normalize.pkl\", eval_env)\n",
"\n",
"# do not update them at test time\n",
"eval_env.training = False\n",
"# reward normalization is not needed at test time\n",
"eval_env.norm_reward = False\n",
"\n",
"# Load the agent\n",
"model = A2C.load(model_name)\n",
"\n",
"mean_reward, std_reward = evaluate_policy(model, env)\n",
"\n",
"print(f\"Mean reward = {mean_reward:.2f} +/- {std_reward:.2f}\")\n",
"\n",
"# 8\n",
"package_to_hub(\n",
" model=model,\n",
" model_name=f\"a2c-{env_id}\",\n",
" model_architecture=\"A2C\",\n",
" env_id=env_id,\n",
" eval_env=eval_env,\n",
" repo_id=f\"ThomasSimonini/a2c-{env_id}\", # TODO: Change the username\n",
" commit_message=\"Initial commit\",\n",
")"
],
"metadata": {
"id": "16pttUsKFyZY"
"id": "-UnlKLmpg80p"
},
"execution_count": null,
"outputs": []
@@ -873,9 +885,13 @@
"cell_type": "markdown",
"source": [
"## Some additional challenges 🏆\n",
"The best way to learn **is to try things by your own**! Why not trying `HalfCheetahBulletEnv-v0`?\n",
"The best way to learn **is to try things by your own**! Why not trying `HalfCheetahBulletEnv-v0` for PyBullet?\n",
"\n",
"In the [Leaderboard](https://huggingface.co/spaces/chrisjay/Deep-Reinforcement-Learning-Leaderboard) you will find your agents. Can you get to the top?\n",
"If you want to try more advanced tasks for panda-gym you need to check what was done using **TQC or SAC** (a more sample efficient algorithm suited for robotics tasks). In real robotics, you'll use more sample-efficient algorithm for a simple reason: contrary to a simulation **if you move your robotic arm too much you have a risk to break it**.\n",
"\n",
"PandaPickAndPlace-v1: https://huggingface.co/sb3/tqc-PandaPickAndPlace-v1\n",
"\n",
"And don't hesitate to check panda-gym documentation here: https://panda-gym.readthedocs.io/en/latest/usage/train_with_sb3.html\n",
"\n",
"Here are some ideas to achieve so:\n",
"* Train more steps\n",
@@ -889,7 +905,7 @@
{
"cell_type": "markdown",
"source": [
"See you on Unit 8! 🔥\n",
"See you on Unit 7! 🔥\n",
"## Keep learning, stay awesome 🤗"
],
"metadata": {