mirror of https://github.com/huggingface/deep-rl-class.git
Commit: "Créé avec Colaboratory" (Created with Colaboratory)
@@ -5,18 +5,7 @@
"colab": {
"provenance": [],
"private_outputs": true,
"collapsed_sections": [
"MoubJX20oKaQ",
"DoUNkTExoUED",
"BTuQAUAPoa5E",
"tF42HvI7-gs5",
"nWAuOOLh-oQf",
"-voECBK3An9j",
"Qk9ykOk9D6Qh",
"G3xy3Nf3c2O1",
"usatLaZ8dM4P"
],
"authorship_tag": "ABX9TyPovbUwEqbQAH1J8OxiHKDm",
"authorship_tag": "ABX9TyNTCZRW9WsSED/roRBW2oQ5",
"include_colab_link": true
},
"kernelspec": {
@@ -47,17 +36,15 @@
"\n",
"<img src=\"https://huggingface.co/datasets/huggingface-deep-rl-course/course-images/resolve/main/en/unit8/thumbnail.png\" alt=\"Thumbnail\"/>\n",
"\n",
"In this small notebook you'll learn to use A2C with PyBullet and Panda-Gym two set of robotics environments. \n",
"In this notebook, you'll learn to use A2C with PyBullet and Panda-Gym, two sets of robotics environments. \n",
"\n",
"With [PyBullet](https://github.com/bulletphysics/bullet3), you're going to **train robots to walk and run**:\n",
"- `AntBulletEnv-v0` 🕸️ More precisely a spider (they say Ant but come on... it's a spider 😆) 🕸️\n",
"- `HalfCheetahBulletEnv-v0`\n",
"With [PyBullet](https://github.com/bulletphysics/bullet3), you're going to **train a robot to move**:\n",
"- `AntBulletEnv-v0` 🕸️ More precisely, a spider (they say Ant but come on... it's a spider 😆) 🕸️\n",
"\n",
"Then, with [Panda-Gym](https://github.com/qgallouedec/panda-gym), you're going **to train a robotic arm** (Franka Emika Panda robot) to perform some tasks:\n",
"Then, with [Panda-Gym](https://github.com/qgallouedec/panda-gym), you're going **to train a robotic arm** (Franka Emika Panda robot) to perform a task:\n",
"- `Reach`: the robot must place its end-effector at a target position.\n",
"- `Slide`: the robot has to slide an object to a target position.\n",
"\n",
"After that, you'll be able to train other robotics environments."
"After that, you'll be able **to train in other robotics environments**.\n"
],
"metadata": {
"id": "-PTReiOw-RAN"
@@ -66,7 +53,7 @@
{
"cell_type": "markdown",
"source": [
"TODO: ADD VIDEO OF WHAT IT LOOKS LIKE"
"<img src=\"https://huggingface.co/datasets/huggingface-deep-rl-course/course-images/resolve/main/en/unit8/environments.gif\" alt=\"Robotics environments\"/>"
],
"metadata": {
"id": "2VGL_0ncoAJI"
@@ -162,12 +149,15 @@
{
"cell_type": "markdown",
"source": [
"To validate this hands-on for the [certification process](https://huggingface.co/deep-rl-course/en/unit0/introduction#certification-process), you need to:\n",
"To validate this hands-on for the [certification process](https://huggingface.co/deep-rl-course/en/unit0/introduction#certification-process), you need to push three models:\n",
"\n",
"TODO ADD CERTIFICATION RECOMMENDATION\n",
"- `AntBulletEnv-v0` get a result of >= 650.\n",
"- `PandaReachDense-v2` get a result of >= -3.5.\n",
"\n",
"To find your result, go to the [leaderboard](https://huggingface.co/spaces/huggingface-projects/Deep-Reinforcement-Learning-Leaderboard) and find your model, **the result = mean_reward - std of reward**\n",
"\n",
"If you don't find your model, **go to the bottom of the page and click on the refresh button**\n",
"\n",
"For more information about the certification process, check this section 👉 https://huggingface.co/deep-rl-course/en/unit0/introduction#certification-process"
],
"metadata": {
@@ -225,18 +215,6 @@
"!pip3 install pyvirtualdisplay"
]
},
{
"cell_type": "code",
"source": [
"# Additional dependencies for RL Baselines3 Zoo\n",
"!apt-get install swig cmake freeglut3-dev "
],
"metadata": {
"id": "fWyKJCy_NJBX"
},
"execution_count": null,
"outputs": []
},
{
"cell_type": "code",
"source": [
@@ -262,24 +240,12 @@
"- `panda-gym`: Contains the robotics arm environments.\n",
"- `stable-baselines3[extra]`: The SB3 deep reinforcement learning library.\n",
"- `huggingface_sb3`: Additional code for Stable-baselines3 to load and upload models from the Hugging Face 🤗 Hub.\n",
"- `huggingface_hub`: Library allowing anyone to work with the Hub repositories.\n",
"- `gym==0.21`: The classical version of gym."
"- `huggingface_hub`: Library allowing anyone to work with the Hub repositories."
],
"metadata": {
"id": "e1obkbdJ_KnG"
}
},
{
"cell_type": "code",
"source": [
"!pip install -r https://huggingface.co/spaces/ThomasSimonini/temp-space-requirements/raw/main/requirements/requirements-unit6.txt"
],
"metadata": {
"id": "69jUeXrLryos"
},
"execution_count": null,
"outputs": []
},
{
"cell_type": "code",
"execution_count": null,
@@ -288,7 +254,7 @@
},
"outputs": [],
"source": [
"TODO: CHANGE TO THE ONE COMMENTED#!pip install -r https://raw.githubusercontent.com/huggingface/deep-rl-class/main/notebooks/unit6/requirements-unit6.txt"
"!pip install -r https://raw.githubusercontent.com/huggingface/deep-rl-class/main/notebooks/unit6/requirements-unit6.txt"
]
},
{
@@ -303,11 +269,9 @@
{
"cell_type": "code",
"source": [
"import gym\n",
"import pybullet_envs\n",
"\n",
"import gymnasium\n",
"import panda_gym\n",
"import gym\n",
"\n",
"import os\n",
"\n",
@@ -326,15 +290,6 @@
"execution_count": null,
"outputs": []
},
{
"cell_type": "markdown",
"source": [
"# Part 1: PyBullet Environments\n"
],
"metadata": {
"id": "KIqf-N-otczo"
}
},
{
"cell_type": "markdown",
"source": [
@@ -350,23 +305,13 @@
"source": [
"### Create the AntBulletEnv-v0\n",
"#### The environment 🎮\n",
"In this environment, the agent needs to use correctly its different joints to walk correctly."
"In this environment, the agent needs to use its different joints correctly in order to walk.\n",
"You can find a detailed explanation of this environment here: https://hackmd.io/@jeffreymo/SJJrSJh5_#PyBullet"
],
"metadata": {
"id": "frVXOrnlBerQ"
}
},
{
"cell_type": "code",
"source": [
"import gym"
],
"metadata": {
"id": "RJ0XJccTt9FX"
},
"execution_count": null,
"outputs": []
},
{
"cell_type": "code",
"source": [
@@ -400,7 +345,9 @@
{
"cell_type": "markdown",
"source": [
"TODO: Add explanation obs space"
"The observation space (from [Jeffrey Y Mo](https://hackmd.io/@jeffreymo/SJJrSJh5_#PyBullet)):\n",
"\n",
"<img src=\"https://huggingface.co/datasets/huggingface-deep-rl-course/course-images/resolve/main/en/unit8/obs_space.png\" alt=\"PyBullet Ant Obs space\"/>\n"
],
"metadata": {
"id": "QzMmsdMJS7jh"
@@ -422,7 +369,9 @@
{
"cell_type": "markdown",
"source": [
"Todo: Add explanation action space"
"The action space (from [Jeffrey Y Mo](https://hackmd.io/@jeffreymo/SJJrSJh5_#PyBullet)):\n",
"\n",
"<img src=\"https://huggingface.co/datasets/huggingface-deep-rl-course/course-images/resolve/main/en/unit8/action_space.png\" alt=\"PyBullet Ant action space\"/>\n"
],
"metadata": {
"id": "3RfsHhzZS9Pw"
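As an aside, the two diagrams referenced above can be cross-checked directly from code. The following sketch is not part of this commit; it assumes `pybullet_envs` is installed (importing it registers the Bullet environment ids with gym) and simply prints the spaces described above.

import gym
import pybullet_envs  # noqa: F401 — importing this registers AntBulletEnv-v0 with gym

env = gym.make("AntBulletEnv-v0")
print("Observation space:", env.observation_space)  # a 28-value vector, per the text above
print("Action space:", env.action_space)            # one continuous torque command per joint
print("Random action:", env.action_space.sample())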
@@ -440,7 +389,9 @@
{
"cell_type": "markdown",
"source": [
"A good practice in reinforcement learning is to [normalize input features](https://stable-baselines3.readthedocs.io/en/master/guide/rl_tips.html). For that, a wrapper exists and will compute a running average and standard deviation of input features.\n",
"A good practice in reinforcement learning is to [normalize input features](https://stable-baselines3.readthedocs.io/en/master/guide/rl_tips.html). \n",
"\n",
"For that, a wrapper exists and will compute a running average and standard deviation of input features.\n",
"\n",
"We also normalize rewards with this same wrapper by adding `norm_reward = True`\n",
"\n",
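A minimal sketch of what that wrapper looks like with Stable-Baselines3 (not the commit's own cell; `n_envs=4` and `clip_obs=10.0` are assumed values):

import pybullet_envs  # noqa: F401 — registers AntBulletEnv-v0 with gym
from stable_baselines3.common.env_util import make_vec_env
from stable_baselines3.common.vec_env import VecNormalize

env = make_vec_env("AntBulletEnv-v0", n_envs=4)
# VecNormalize keeps a running mean/std of observations, and of returns since norm_reward=True
env = VecNormalize(env, norm_obs=True, norm_reward=True, clip_obs=10.0)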
@@ -493,6 +444,8 @@
"\n",
"In this case, because we have a vector of 28 values as input, we'll use an MLP (multi-layer perceptron) as policy.\n",
"\n",
"For more information about the A2C implementation with Stable-Baselines3, check: https://stable-baselines3.readthedocs.io/en/master/modules/a2c.html#notes\n",
"\n",
"To find the best parameters, I checked the [official trained agents by Stable-Baselines3 team](https://huggingface.co/sb3)."
],
"metadata": {
@@ -531,7 +484,6 @@
" n_steps = 8,\n",
" vf_coef = 0.4,\n",
" ent_coef = 0.0,\n",
" tensorboard_log = \"./tensorboard\",\n",
" policy_kwargs=dict(\n",
" log_std_init=-2, ortho_init=False),\n",
" normalize_advantage=False,\n",
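For context, here is one plausible way the keyword arguments shown in this hunk plug into the A2C constructor. This is a sketch, not the notebook's cell: the remaining tuned hyperparameters (learning rate, gamma, gae_lambda, etc.) are deliberately left at their defaults here.

import gym
import pybullet_envs  # noqa: F401 — registers the Bullet envs with gym
from stable_baselines3 import A2C

env = gym.make("AntBulletEnv-v0")
model = A2C(
    policy="MlpPolicy",  # the 28-value observation vector feeds a multi-layer perceptron
    env=env,
    n_steps=8,
    vf_coef=0.4,
    ent_coef=0.0,
    policy_kwargs=dict(log_std_init=-2, ortho_init=False),
    normalize_advantage=False,
    verbose=1,
)
model.learn(total_timesteps=10_000)  # the course itself trains for 2M timesteps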
@@ -717,33 +669,11 @@
"execution_count": null,
"outputs": []
},
{
"cell_type": "markdown",
"source": [
"## Environment 2: HalfCheetahBulletEnv-v0\n",
"\n",
"For this environment, you must follow the same process as the first one. **Don't hesitate to save this notebook to your Google Drive** since timeout can happen. You may also want to **complete this notebook two times**.\n",
"\n",
"To see that you understood the complete process from environment definition to `package_to_hub` why not try to do **it yourself first without the solution?**\n",
"\n",
"1. Define the environment called HalfCheetahBulletEnv-v0\n",
"2. Make a vectorized environment\n",
"3. Add a wrapper to normalize the observations and rewards. [Check the documentation](https://stable-baselines3.readthedocs.io/en/master/guide/vec_envs.html#vecnormalize)\n",
"4. Create the A2C Model\n",
"5. Train it for 2M Timesteps\n",
"6. Save the model and VecNormalize statistics when saving the agent\n",
"7. Evaluate your agent\n",
"8. Publish your trained model on the Hub 🔥 with `package_to_hub`"
],
"metadata": {
"id": "-voECBK3An9j"
}
},
{
{
"cell_type": "markdown",
"source": [
"## Take a coffee break ☕\n",
"- You already trained two robotics environments that learned to move congratutlations 🥳!\n",
"- You already trained your first robot that learned to move, congratulations 🥳!\n",
"- It's **time to take a break**. Don't hesitate to **save this notebook** `File > Save a copy to Drive` to work on this second part later.\n"
],
"metadata": {
@@ -753,16 +683,15 @@
{
"cell_type": "markdown",
"source": [
"# Part 2: Robotic Arm Environments with `panda-gym`\n",
"## Environment 2: PandaReachDense-v2 🦾\n",
"\n",
"The second set of robotics environments we're going to train are a robotic arm that needs to do controls (moving the arm and using the end-effector).\n",
"The agent we're going to train is a robotic arm that needs to be controlled (moving the arm and using the end-effector).\n",
"\n",
"In robotics, the *end-effector* is the device at the end of a robotic arm designed to interact with the environment.\n",
"\n",
"1. In the first environment, `PandaReach`, the robot must place its end-effector at a target position (green ball).\n",
"2. In the second environment, `PandaSlide`, the robot has to slide an object to a target position.\n",
"In `PandaReach`, the robot must place its end-effector at a target position (green ball).\n",
"\n",
"We're going to use the dense version of the environments. It means we'll get a *dense reward function* that **will provide a reward at each timestep** (the closer the agent is to complete the task, the higher the reward). Contrary to a *sparse reward function* where the environment **return a reward if and only if the task is completed**.\n",
"We're going to use the dense version of this environment. It means we'll get a *dense reward function* that **will provide a reward at each timestep** (the closer the agent is to completing the task, the higher the reward), contrary to a *sparse reward function* where the environment **returns a reward if and only if the task is completed**.\n",
"\n",
"Also, we're going to use the *End-effector displacement control*, it means the **action corresponds to the displacement of the end-effector**. We don't control the individual motion of each joint (joint control).\n",
"\n",
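To make the dense vs. sparse distinction concrete, here is a small illustrative snippet. It is a sketch of the idea, not panda-gym's actual reward code, and the 0.05 success threshold is an arbitrary assumption.

import numpy as np

def dense_reward(ee_pos, goal_pos):
    # Less negative as the end-effector gets closer to the goal: feedback at every timestep
    return -float(np.linalg.norm(ee_pos - goal_pos))

def sparse_reward(ee_pos, goal_pos, threshold=0.05):
    # Informative only once the task is solved; otherwise a flat penalty
    return 0.0 if np.linalg.norm(ee_pos - goal_pos) < threshold else -1.0

print(dense_reward(np.array([0.1, 0.0, 0.0]), np.zeros(3)))   # -0.1: progress is visible
print(sparse_reward(np.array([0.1, 0.0, 0.0]), np.zeros(3)))  # -1.0: no sense of progress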
@@ -776,10 +705,24 @@
"id": "5VWfwAA7EJg7"
}
},
{
"cell_type": "markdown",
"source": [
"\n",
"\n",
"In `PandaReachDense-v2` the robotic arm must place its end-effector at a target position (green ball).\n",
"\n"
],
"metadata": {
"id": "oZ7FyDEi7G3T"
}
},
{
"cell_type": "code",
"source": [
"env_id = \"PandaReachDense-v2\"\n",
"import gym\n",
"\n",
"env_id = \"PandaPushDense-v2\"\n",
"\n",
"# Create the env\n",
"env = gym.make(env_id)\n",
@@ -810,11 +753,12 @@
{
"cell_type": "markdown",
"source": [
"The observation space is a dictionary with 3 different element:\n",
"The observation space **is a dictionary with 3 different elements**:\n",
"- `achieved_goal`: (x,y,z) position of the goal.\n",
"- `desired_goal`: (x,y,z) distance between the goal position and the current object position.\n",
"- `observation`: position (x,y,z) and velocity of the end-effector (vx, vy, vz).\n",
"\n"
"\n",
"Since the observation is a dictionary, **we will need to use a MultiInputPolicy instead of MlpPolicy**."
],
"metadata": {
"id": "g_JClfElGFnF"
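A minimal sketch of what the dictionary observation implies in practice (an illustration, not the commit's cell; it assumes `panda_gym` is installed, which registers the Panda envs with gym):

import gym
import panda_gym  # noqa: F401 — registers PandaReachDense-v2 with gym
from stable_baselines3 import A2C

env = gym.make("PandaReachDense-v2")
print(env.observation_space)  # Dict with 'achieved_goal', 'desired_goal' and 'observation'

# A Dict observation space needs MultiInputPolicy (MlpPolicy only accepts flat Box observations)
model = A2C("MultiInputPolicy", env, verbose=1)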
@@ -836,35 +780,103 @@
{
"cell_type": "markdown",
"source": [
"TODO: ADd action space"
"The action space is a vector with 3 values:\n",
"- Control x, y, z movement"
],
"metadata": {
"id": "5MHTHEHZS4yp"
}
},
{
"cell_type": "code",
"cell_type": "markdown",
"source": [
"Now it's your turn:\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"model = A2C(\"MultiInputPolicy\", env)\n",
"model.learn(total_timesteps=100000)"
"1. Define the environment called \"PandaReachDense-v2\"\n",
"2. Make a vectorized environment\n",
"3. Add a wrapper to normalize the observations and rewards. [Check the documentation](https://stable-baselines3.readthedocs.io/en/master/guide/vec_envs.html#vecnormalize)\n",
"4. Create the A2C Model (don't forget verbose=1 to print the training logs).\n",
"5. Train it for 2M Timesteps\n",
"6. Save the model and VecNormalize statistics when saving the agent\n",
"7. Evaluate your agent\n",
"8. Publish your trained model on the Hub 🔥 with `package_to_hub`"
],
"metadata": {
"id": "C-3SfbJr0N7I"
"id": "nIhPoc5t9HjG"
}
},
{
"cell_type": "markdown",
"source": [
"### Solution (fill the todo)"
],
"metadata": {
"id": "sKGbFXZq9ikN"
}
},
{
"cell_type": "code",
"source": [
"# 1 - 2\n",
|
||||
"env_id = \"PandaReachDense-v2\"\n",
|
||||
"env = make_vec_env(env_id, n_envs=4)\n",
|
||||
"\n",
|
||||
"# 3\n",
|
||||
"env = VecNormalize(env, norm_obs=True, norm_reward=False, clip_obs=10.)\n",
|
||||
"\n",
|
||||
"# 4\n",
|
||||
"model = A2C(policy = \"MultiInputPolicy\",\n",
|
||||
" env = env,\n",
|
||||
" verbose=1)\n",
|
||||
"# 5\n",
|
||||
"model.learn(1_000_000)"
|
||||
],
|
||||
"metadata": {
|
||||
"id": "J-cC-Feg9iMm"
|
||||
},
|
||||
"execution_count": null,
|
||||
"outputs": []
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"source": [],
|
||||
"source": [
|
||||
"# 6\n",
|
||||
"model_name = \"a2c-PandaReachDense-v2\"; \n",
|
||||
"model.save(model_name)\n",
|
||||
"env.save(\"vec_normalize.pkl\")\n",
|
||||
"\n",
|
||||
"# 7\n",
|
||||
"from stable_baselines3.common.vec_env import DummyVecEnv, VecNormalize\n",
|
||||
"\n",
|
||||
"# Load the saved statistics\n",
|
||||
"eval_env = DummyVecEnv([lambda: gym.make(\"PandaReachDense-v2\")])\n",
|
||||
"eval_env = VecNormalize.load(\"vec_normalize.pkl\", eval_env)\n",
|
||||
"\n",
|
||||
"# do not update them at test time\n",
|
||||
"eval_env.training = False\n",
|
||||
"# reward normalization is not needed at test time\n",
|
||||
"eval_env.norm_reward = False\n",
|
||||
"\n",
|
||||
"# Load the agent\n",
|
||||
"model = A2C.load(model_name)\n",
|
||||
"\n",
|
||||
"mean_reward, std_reward = evaluate_policy(model, env)\n",
|
||||
"\n",
|
||||
"print(f\"Mean reward = {mean_reward:.2f} +/- {std_reward:.2f}\")\n",
|
||||
"\n",
|
||||
"# 8\n",
|
||||
"package_to_hub(\n",
|
||||
" model=model,\n",
|
||||
" model_name=f\"a2c-{env_id}\",\n",
|
||||
" model_architecture=\"A2C\",\n",
|
||||
" env_id=env_id,\n",
|
||||
" eval_env=eval_env,\n",
|
||||
" repo_id=f\"ThomasSimonini/a2c-{env_id}\", # TODO: Change the username\n",
|
||||
" commit_message=\"Initial commit\",\n",
|
||||
")"
|
||||
],
|
||||
"metadata": {
|
||||
"id": "16pttUsKFyZY"
|
||||
"id": "-UnlKLmpg80p"
|
||||
},
|
||||
"execution_count": null,
|
||||
"outputs": []
|
||||
@@ -873,9 +885,13 @@
"cell_type": "markdown",
"source": [
"## Some additional challenges 🏆\n",
"The best way to learn **is to try things by your own**! Why not trying `HalfCheetahBulletEnv-v0`?\n",
"The best way to learn **is to try things on your own**! Why not try `HalfCheetahBulletEnv-v0` for PyBullet?\n",
"\n",
"In the [Leaderboard](https://huggingface.co/spaces/chrisjay/Deep-Reinforcement-Learning-Leaderboard) you will find your agents. Can you get to the top?\n",
"If you want to try more advanced tasks with panda-gym, you need to check what was done using **TQC or SAC** (more sample-efficient algorithms suited for robotics tasks). In real robotics, you'll use a more sample-efficient algorithm for a simple reason: contrary to a simulation, **if you move your robotic arm too much, you risk breaking it**.\n",
"\n",
"PandaPickAndPlace-v1: https://huggingface.co/sb3/tqc-PandaPickAndPlace-v1\n",
"\n",
"And don't hesitate to check the panda-gym documentation here: https://panda-gym.readthedocs.io/en/latest/usage/train_with_sb3.html\n",
"\n",
"Here are some ideas to achieve that:\n",
"* Train more steps\n",
@@ -889,7 +905,7 @@
{
"cell_type": "markdown",
"source": [
"See you on Unit 8! 🔥\n",
"See you on Unit 7! 🔥\n",
"## Keep learning, stay awesome 🤗"
],
"metadata": {