mirror of
https://github.com/huggingface/deep-rl-class.git
synced 2026-04-08 04:59:36 +08:00
Update notebook
@@ -5,7 +5,18 @@
"colab": {
"provenance": [],
"private_outputs": true,
"authorship_tag": "ABX9TyM4Z04oGTU1B2rRuxHfuNly",
"collapsed_sections": [
"MoubJX20oKaQ",
"DoUNkTExoUED",
"BTuQAUAPoa5E",
"tF42HvI7-gs5",
"nWAuOOLh-oQf",
"-voECBK3An9j",
"Qk9ykOk9D6Qh",
"G3xy3Nf3c2O1",
"usatLaZ8dM4P"
],
"authorship_tag": "ABX9TyPovbUwEqbQAH1J8OxiHKDm",
"include_colab_link": true
},
"kernelspec": {
@@ -34,7 +45,7 @@
"source": [
"# Unit 6: Advantage Actor Critic (A2C) using Robotics Simulations with PyBullet and Panda-Gym 🤖\n",
"\n",
"TODO: ADD THUMBNAIL\n",
"<img src=\"https://huggingface.co/datasets/huggingface-deep-rl-course/course-images/resolve/main/en/unit8/thumbnail.png\" alt=\"Thumbnail\"/>\n",
"\n",
"In this notebook, you'll learn to use A2C with PyBullet and Panda-Gym, two sets of robotics environments.\n",
"\n",
@@ -252,10 +263,7 @@
"- `stable-baselines3[extra]`: The SB3 deep reinforcement learning library.\n",
"- `huggingface_sb3`: Additional code for Stable-baselines3 to load and upload models from the Hugging Face 🤗 Hub.\n",
"- `huggingface_hub`: Library allowing anyone to work with the Hub repositories.\n",
"\n",
"We're going to install **two versions of gym**:\n",
"- `gym==0.21`: The classical version of gym for PyBullet environments.\n",
"- `gymnasium`: [The new Gym library by Farama Foundation](https://github.com/Farama-Foundation/Gymnasium) for Panda Gym environments."
"- `gym==0.21`: The classical version of gym."
],
"metadata": {
"id": "e1obkbdJ_KnG"
@@ -295,12 +303,12 @@
{
"cell_type": "code",
"source": [
"import gymnasium as gymnasium\n",
"import panda_gym\n",
"\n",
"import gym\n",
"import pybullet_envs\n",
"\n",
"import gymnasium\n",
"import panda_gym\n",
"\n",
"import os\n",
"\n",
"from huggingface_sb3 import load_from_hub, package_to_hub\n",
@@ -351,7 +359,7 @@
{
"cell_type": "code",
"source": [
"import gym # As mentioned, we use gym for PyBullet and gymnasium for panda-gym"
"import gym"
],
"metadata": {
"id": "RJ0XJccTt9FX"
@@ -389,6 +397,15 @@
"execution_count": null,
"outputs": []
},
{
"cell_type": "markdown",
"source": [
"TODO: Add explanation of the observation space"
],
"metadata": {
"id": "QzMmsdMJS7jh"
}
},
{
"cell_type": "code",
"source": [
@@ -402,6 +419,15 @@
"execution_count": null,
"outputs": []
},
{
"cell_type": "markdown",
"source": [
"TODO: Add explanation of the action space"
],
"metadata": {
"id": "3RfsHhzZS9Pw"
}
},
{
"cell_type": "markdown",
"source": [
@@ -696,11 +722,11 @@
"source": [
"## Environment 2: HalfCheetahBulletEnv-v0\n",
"\n",
"For this environment, you need to follow the same process that the first one. **Don't hesitate here to save this notebook to your Google Drive** since timeout can happen. You may also want to **complete this notebook in two times**.\n",
"For this environment, you must follow the same process as the first one. **Don't hesitate to save this notebook to your Google Drive** since timeout can happen. You may also want to **complete this notebook two times**.\n",
"\n",
"In order to see that you understood the complete process from environment definition to `package_to_hub` why not trying to do **it yourself first without solution?**\n",
"To see that you understood the complete process from environment definition to `package_to_hub` why not try to do **it yourself first without the solution?**\n",
"\n",
"1. Define the enviroment called HalfCheetahBulletEnv-v0\n",
"1. Define the environment called HalfCheetahBulletEnv-v0\n",
"2. Make a vectorized environment\n",
"3. Add a wrapper to normalize the observations and rewards. [Check the documentation](https://stable-baselines3.readthedocs.io/en/master/guide/vec_envs.html#vecnormalize)\n",
"4. Create the A2C Model\n",
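Step 3's normalization wrapper keeps running statistics of the observations and rescales each value on the fly. A minimal scalar sketch of that idea, for illustration only (SB3's actual `VecNormalize` works on vectorized batches and also normalizes rewards):

```python
import math

class RunningNormalizer:
    """Toy scalar version of observation normalization:
    track a running mean/variance (Welford's algorithm) and
    normalize each incoming value with the current statistics."""

    def __init__(self, eps=1e-8, clip=10.0):
        self.mean = 0.0
        self.m2 = 0.0      # running sum of squared deviations
        self.count = 0
        self.eps = eps     # avoids division by zero
        self.clip = clip   # keeps normalized values in a bounded range

    def normalize(self, x):
        # Update the running statistics, then normalize x with them
        self.count += 1
        delta = x - self.mean
        self.mean += delta / self.count
        self.m2 += delta * (x - self.mean)
        var = self.m2 / self.count if self.count > 1 else 1.0
        z = (x - self.mean) / math.sqrt(var + self.eps)
        return max(-self.clip, min(self.clip, z))

norm = RunningNormalizer()
print(norm.normalize(0.0))   # 0.0: the first value equals the running mean
print(norm.normalize(10.0))  # ~1.0: one standard deviation above the mean
```

Normalizing this way keeps the scale of inputs stable during training, which matters a lot for continuous-control tasks like HalfCheetah.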
@@ -727,18 +753,121 @@
{
"cell_type": "markdown",
"source": [
"# Part 2: Robotic Arm Environments with `panda-gym`\n"
"# Part 2: Robotic Arm Environments with `panda-gym`\n",
"\n",
"In the second set of robotics environments, we're going to train a robotic arm to perform control tasks (moving the arm and using the end-effector).\n",
"\n",
"In robotics, the *end-effector* is the device at the end of a robotic arm designed to interact with the environment.\n",
"\n",
"1. In the first environment, `PandaReach`, the robot must place its end-effector at a target position (green ball).\n",
"2. In the second environment, `PandaSlide`, the robot has to slide an object to a target position.\n",
"\n",
"We're going to use the dense version of the environments. This means we'll get a *dense reward function* that **provides a reward at each timestep** (the closer the agent is to completing the task, the higher the reward), contrary to a *sparse reward function*, where the environment **returns a reward if and only if the task is completed**.\n",
"\n",
"Also, we're going to use *end-effector displacement control*: the **action corresponds to the displacement of the end-effector**. We don't control the individual motion of each joint (joint control).\n",
"\n",
"<img src=\"https://huggingface.co/datasets/huggingface-deep-rl-course/course-images/resolve/main/en/unit8/robotics.jpg\" alt=\"Robotics\"/>\n",
"\n",
"\n",
"This way, **the training will be easier**.\n",
"\n"
],
"metadata": {
"id": "5VWfwAA7EJg7"
}
},
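To make the dense/sparse distinction concrete, here is a toy sketch in plain Python. This is illustrative only: panda-gym computes its own rewards, and the -1/0 sparse scheme and the distance threshold used here are assumptions, not the library's exact values.

```python
import math

def distance(ee_pos, goal):
    # Euclidean distance between end-effector and goal (Python 3.8+)
    return math.dist(ee_pos, goal)

def dense_reward(ee_pos, goal):
    # Reward at every timestep: the closer to the goal, the higher (less negative)
    return -distance(ee_pos, goal)

def sparse_reward(ee_pos, goal, threshold=0.05):
    # Reward only signals task completion (end-effector close enough)
    return 0.0 if distance(ee_pos, goal) < threshold else -1.0

print(dense_reward((0, 0, 0), (0, 0, 1)))      # -1.0 (1 m away)
print(sparse_reward((0, 0, 0), (0, 0, 1)))     # -1.0 (not done yet)
print(sparse_reward((0, 0, 0.99), (0, 0, 1)))  # 0.0 (within threshold)
```

With the dense variant, every step carries a learning signal, which is why the training is easier than with the sparse variant.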
{
"cell_type": "code",
"source": [
"env_id = \"PandaReachDense-v2\"\n",
"\n",
"# Create the env\n",
"env = gym.make(env_id)\n",
"\n",
"# Get the state space and action space\n",
"s_size = env.observation_space.shape\n",
"a_size = env.action_space"
],
"metadata": {
"id": "zXzAu3HYF1WD"
},
"execution_count": null,
"outputs": []
},
{
"cell_type": "code",
"source": [
"print(\"_____OBSERVATION SPACE_____ \\n\")\n",
"print(\"The State Space is: \", s_size)\n",
"print(\"Sample observation\", env.observation_space.sample()) # Get a random observation"
],
"metadata": {
"id": "E-U9dexcF-FB"
},
"execution_count": null,
"outputs": []
},
{
"cell_type": "markdown",
"source": [
"The observation space is a dictionary with 3 different elements:\n",
"- `achieved_goal`: the current (x,y,z) position of the end-effector.\n",
"- `desired_goal`: the (x,y,z) target position of the goal.\n",
"- `observation`: position (x,y,z) and velocity (vx, vy, vz) of the end-effector.\n",
"\n"
],
"metadata": {
"id": "g_JClfElGFnF"
}
},
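Because the observation is a dictionary rather than a flat array, the policy must handle each key separately (which is why the A2C model later in the notebook uses a `MultiInputPolicy`). A hypothetical sample with this layout (all values invented for illustration) might look like:

```python
import math

# Hypothetical observation mimicking the Dict layout described above
obs = {
    "achieved_goal": (0.0, 0.0, 0.0),  # current end-effector position (x, y, z)
    "desired_goal": (0.1, 0.2, 0.3),   # target position (x, y, z)
    "observation": (0.0,) * 6,         # end-effector position + velocity
}

def goal_distance(obs):
    # Goal-based envs typically judge success by comparing these two keys
    return math.dist(obs["achieved_goal"], obs["desired_goal"])

print(round(goal_distance(obs), 4))
```

The `achieved_goal`/`desired_goal` split is the standard goal-conditioned layout, which also enables techniques like hindsight experience replay.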
{
"cell_type": "code",
"source": [
"print(\"\\n _____ACTION SPACE_____ \\n\")\n",
"print(\"The Action Space is: \", a_size)\n",
"print(\"Action Space Sample\", env.action_space.sample()) # Take a random action"
],
"metadata": {
"id": "ib1Kxy4AF-FC"
},
"execution_count": null,
"outputs": []
},
{
"cell_type": "markdown",
"source": [
"TODO: Add action space"
],
"metadata": {
"id": "5MHTHEHZS4yp"
}
},
{
"cell_type": "code",
"source": [
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"model = A2C(\"MultiInputPolicy\", env)\n",
"model.learn(total_timesteps=100000)"
],
"metadata": {
"id": "C-3SfbJr0N7I"
},
"execution_count": null,
"outputs": []
},
{
"cell_type": "code",
"source": [],
"metadata": {
"id": "fW_CdlUsEVP2"
}
"id": "16pttUsKFyZY"
},
"execution_count": null,
"outputs": []
},
{
"cell_type": "markdown",