Update Observation Space

Thomas Simonini
2023-08-05 15:52:24 +02:00
parent 163f4e9047
commit 5c910ecb27

@@ -7,7 +7,7 @@
"colab_type": "text"
},
"source": [
"<a href=\"https://colab.research.google.com/github/huggingface/deep-rl-class/blob/GymnasiumUpdate%2FUnit1/notebooks/unit1.ipynb\" target=\"_parent\"><img src=\"https://colab.research.google.com/assets/colab-badge.svg\" alt=\"Open In Colab\"/></a>"
"<a href=\"https://colab.research.google.com/github/huggingface/deep-rl-class/blob/main/notebooks/unit1/unit1.ipynb\" target=\"_parent\"><img src=\"https://colab.research.google.com/assets/colab-badge.svg\" alt=\"Open In Colab\"/></a>"
]
},
{
@@ -101,10 +101,10 @@
"\n",
"- 📖 Study Deep Reinforcement Learning in **theory and practice**.\n",
"- 🧑‍💻 Learn to **use famous Deep RL libraries** such as Stable Baselines3, RL Baselines3 Zoo, CleanRL and Sample Factory 2.0.\n",
"- 🤖 Train **agents in unique environments** \n",
"- 🤖 Train **agents in unique environments**\n",
"- 🎓 **Earn a certificate of completion** by completing 80% of the assignments.\n",
"\n",
"And more! \n",
"And more!\n",
"\n",
"Check 📚 the syllabus 👉 https://simoninithomas.github.io/deep-rl-course\n",
"\n",
@@ -248,7 +248,7 @@
{
"cell_type": "markdown",
"source": [
"During the notebook, we'll need to generate a replay video. To do so, with colab, **we need to have a virtual screen to be able to render the environment** (and thus record the frames). \n",
"During the notebook, we'll need to generate a replay video. To do so, with colab, **we need to have a virtual screen to be able to render the environment** (and thus record the frames).\n",
"\n",
"Hence the following cell will install virtual screen libraries and create and run a virtual screen 🖥"
],
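For reference, a minimal sketch of that virtual-screen setup, assuming the common xvfb + pyvirtualdisplay combination (the notebook's actual install cell may differ):

```python
# One common headless-rendering setup: xvfb driven from Python via pyvirtualdisplay.
# System packages are installed separately, e.g. `apt-get install -y xvfb ffmpeg`
# and `pip install pyvirtualdisplay`.
from pyvirtualdisplay import Display

virtual_display = Display(visible=0, size=(1400, 900))  # invisible X display
virtual_display.start()  # rendering calls now have a screen to draw frames on
```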
@@ -428,7 +428,7 @@
" # Do this action in the environment and get\n",
" # next_state, reward, terminated, truncated and info\n",
" observation, reward, terminated, truncated, info = env.step(action)\n",
" \n",
"\n",
" # If the game is terminated (in our case we land, crashed) or truncated (timeout)\n",
" if terminated or truncated:\n",
" # Reset the environment\n",
@@ -453,7 +453,7 @@
"---\n",
"\n",
"\n",
"💡 A good habit when you start to use an environment is to check its documentation \n",
"💡 A good habit when you start to use an environment is to check its documentation\n",
"\n",
"👉 https://gymnasium.farama.org/environments/box2d/lunar_lander/\n",
"\n",
@@ -498,8 +498,8 @@
"- Vertical speed (y)\n",
"- Angle\n",
"- Angular speed\n",
"- If the left leg contact point has touched the land\n",
"- If the right leg contact point has touched the land\n"
"- If the left leg contact point has touched the land (boolean)\n",
"- If the right leg contact point has touched the land (boolean)\n"
]
},
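A quick way to check that 8-dimensional observation yourself, assuming Gymnasium and the LunarLander-v2 environment used throughout this unit:

```python
# Inspect the 8-dimensional observation described above.
import gymnasium as gym

env = gym.make("LunarLander-v2")
print("Observation space:", env.observation_space)            # Box of shape (8,)
print("Sample observation:", env.observation_space.sample())  # one random valid vector
env.close()
```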
{
@@ -521,7 +521,7 @@
"id": "MyxXwkI2Magx"
},
"source": [
"The action space (the set of possible actions the agent can take) is discrete with 4 actions available 🎮: \n",
"The action space (the set of possible actions the agent can take) is discrete with 4 actions available 🎮:\n",
"\n",
"- Action 0: Do nothing,\n",
"- Action 1: Fire left orientation engine,\n",
@@ -648,7 +648,7 @@
"# TODO: Define a PPO MlpPolicy architecture\n",
"# We use a MultiLayerPerceptron (MlpPolicy) because the input is a vector,\n",
"# if we had frames as input we would use CnnPolicy\n",
"model = "
"model ="
]
},
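One possible way to fill in that TODO — a sketch, not the official solution — is to instantiate Stable Baselines3's PPO with an MlpPolicy, since the observations are flat vectors:

```python
# A possible fill-in for the TODO above (hyperparameters left at their defaults).
import gymnasium as gym
from stable_baselines3 import PPO

env = gym.make("LunarLander-v2")
model = PPO(
    policy="MlpPolicy",  # MLP head: the input is an 8-dimensional vector, not image frames
    env=env,
    verbose=1,
)
```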
{
@@ -762,7 +762,7 @@
"eval_env =\n",
"\n",
"# Evaluate the model with 10 evaluation episodes and deterministic=True\n",
"mean_reward, std_reward = \n",
"mean_reward, std_reward =\n",
"\n",
"# Print the results\n",
"\n"
@@ -844,7 +844,7 @@
"\n",
"<img src=\"https://huggingface.co/datasets/huggingface-deep-rl-course/course-images/resolve/main/en/notebooks/create-token.jpg\" alt=\"Create HF Token\">\n",
"\n",
"- Copy the token \n",
"- Copy the token\n",
"- Run the cell below and paste the token"
]
},
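The standard interactive login from huggingface_hub looks like this (a sketch; the course's own cell may differ slightly):

```python
from huggingface_hub import notebook_login

notebook_login()  # opens a prompt where you paste the token you just copied
```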
@@ -913,10 +913,10 @@
"\n",
"## TODO: Define a repo_id\n",
"## repo_id is the id of the model repository from the Hugging Face Hub (repo_id = {organization}/{repo_name}, for instance ThomasSimonini/ppo-LunarLander-v2)\n",
"repo_id = \n",
"repo_id =\n",
"\n",
"# TODO: Define the name of the environment\n",
"env_id = \n",
"env_id =\n",
"\n",
"# Create the evaluation env and set the render_mode=\"rgb_array\"\n",
"eval_env = DummyVecEnv([lambda: Monitor(gym.make(env_id, render_mode=\"rgb_array\"))])\n",
@@ -930,7 +930,7 @@
"\n",
"# This method will save, evaluate, generate a model card and record a replay video of your agent before pushing the repo to the Hub\n",
"package_to_hub(model=model, # Our trained model\n",
" model_name=model_name, # The name of our trained model \n",
" model_name=model_name, # The name of our trained model\n",
" model_architecture=model_architecture, # The model architecture we used: in our case PPO\n",
" env_id=env_id, # Name of the environment\n",
" eval_env=eval_env, # Evaluation Environment\n",
@@ -978,7 +978,7 @@
"\n",
"# PLACE the package_to_hub function you've just filled here\n",
"package_to_hub(model=model, # Our trained model\n",
" model_name=model_name, # The name of our trained model \n",
" model_name=model_name, # The name of our trained model\n",
" model_architecture=model_architecture, # The model architecture we used: in our case PPO\n",
" env_id=env_id, # Name of the environment\n",
" eval_env=eval_env, # Evaluation Environment\n",
@@ -995,7 +995,7 @@
"cell_type": "markdown",
"source": [
"Congrats 🥳 you've just trained and uploaded your first Deep Reinforcement Learning agent. The script above should have displayed a link to a model repository such as https://huggingface.co/osanseviero/test_sb3. When you go to this link, you can:\n",
"* See a video preview of your agent at the right. \n",
"* See a video preview of your agent at the right.\n",
"* Click \"Files and versions\" to see all the files in the repository.\n",
"* Click \"Use in stable-baselines3\" to get a code snippet that shows how to load the model.\n",
"* A model card (`README.md` file) which gives a description of the model\n",
@@ -1017,7 +1017,7 @@
"## Load a saved LunarLander model from the Hub 🤗\n",
"Thanks to [ironbar](https://github.com/ironbar) for the contribution.\n",
"\n",
"Loading a saved model from the Hub is really easy. \n",
"Loading a saved model from the Hub is really easy.\n",
"\n",
"You go to https://huggingface.co/models?library=stable-baselines3 to see the list of all the Stable-baselines3 saved models.\n",
"1. You select one and copy its repo_id\n",
@@ -1115,7 +1115,7 @@
},
"source": [
"## Some additional challenges 🏆\n",
"The best way to learn **is to try things on your own**! As you saw, the current agent is not doing great. As a first suggestion, you can train for more steps. With 1,000,000 steps, we saw some great results! \n",
"The best way to learn **is to try things on your own**! As you saw, the current agent is not doing great. As a first suggestion, you can train for more steps. With 1,000,000 steps, we saw some great results!\n",
"\n",
"In the [Leaderboard](https://huggingface.co/spaces/huggingface-projects/Deep-Reinforcement-Learning-Leaderboard) you will find your agents. Can you get to the top?\n",
"\n",
@@ -1190,4 +1190,4 @@
},
"nbformat": 4,
"nbformat_minor": 0
}
}