Update Observation Space

Thomas Simonini
2023-08-05 15:52:24 +02:00
parent 163f4e9047
commit 5c910ecb27

@@ -7,7 +7,7 @@
"colab_type": "text"
},
"source": [
"<a href=\"https://colab.research.google.com/github/huggingface/deep-rl-class/blob/GymnasiumUpdate%2FUnit1/notebooks/unit1.ipynb\" target=\"_parent\"><img src=\"https://colab.research.google.com/assets/colab-badge.svg\" alt=\"Open In Colab\"/></a>"
"<a href=\"https://colab.research.google.com/github/huggingface/deep-rl-class/blob/main/notebooks/unit1/unit1.ipynb\" target=\"_parent\"><img src=\"https://colab.research.google.com/assets/colab-badge.svg\" alt=\"Open In Colab\"/></a>"
]
},
{
@@ -101,10 +101,10 @@
"\n",
"- 📖 Study Deep Reinforcement Learning in **theory and practice**.\n",
"- 🧑‍💻 Learn to **use famous Deep RL libraries** such as Stable Baselines3, RL Baselines3 Zoo, CleanRL and Sample Factory 2.0.\n",
"- 🤖 Train **agents in unique environments** \n",
"- 🤖 Train **agents in unique environments**\n",
"- 🎓 **Earn a certificate of completion** by completing 80% of the assignments.\n",
"\n",
"And more! \n",
"And more!\n",
"\n",
"Check 📚 the syllabus 👉 https://simoninithomas.github.io/deep-rl-course\n",
"\n",
@@ -248,7 +248,7 @@
{
"cell_type": "markdown",
"source": [
"During the notebook, we'll need to generate a replay video. To do so, with colab, **we need to have a virtual screen to be able to render the environment** (and thus record the frames). \n",
"During the notebook, we'll need to generate a replay video. To do so, with colab, **we need to have a virtual screen to be able to render the environment** (and thus record the frames).\n",
"\n",
"Hence the following cell will install virtual screen libraries and create and run a virtual screen 🖥"
],
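For reference, a minimal sketch of that virtual-screen setup, assuming the common xvfb + pyvirtualdisplay combination (the notebook's actual install cell may differ):

```python
# One common headless-rendering setup: xvfb driven from Python via pyvirtualdisplay.
# System packages are installed separately, e.g. `apt-get install -y xvfb ffmpeg`
# and `pip install pyvirtualdisplay`.
from pyvirtualdisplay import Display

virtual_display = Display(visible=0, size=(1400, 900))  # invisible X display
virtual_display.start()  # rendering calls now have a screen to draw frames on
```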
@@ -428,7 +428,7 @@
" # Do this action in the environment and get\n",
" # next_state, reward, terminated, truncated and info\n",
" observation, reward, terminated, truncated, info = env.step(action)\n",
" \n",
"\n",
" # If the game is terminated (in our case we land, crashed) or truncated (timeout)\n",
" if terminated or truncated:\n",
" # Reset the environment\n",
@@ -453,7 +453,7 @@
"---\n",
"\n",
"\n",
"💡 A good habit when you start to use an environment is to check its documentation \n",
"💡 A good habit when you start to use an environment is to check its documentation\n",
"\n",
"👉 https://gymnasium.farama.org/environments/box2d/lunar_lander/\n",
"\n",
@@ -498,8 +498,8 @@
"- Vertical speed (y)\n",
"- Angle\n",
"- Angular speed\n",
"- If the left leg contact point has touched the land\n",
"- If the right leg contact point has touched the land\n"
"- If the left leg contact point has touched the land (boolean)\n",
"- If the right leg contact point has touched the land (boolean)\n"
]
},
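A quick way to check that 8-dimensional observation yourself, assuming Gymnasium and the LunarLander-v2 environment used throughout this unit:

```python
# Inspect the 8-dimensional observation described above.
import gymnasium as gym

env = gym.make("LunarLander-v2")
print("Observation space:", env.observation_space)            # Box of shape (8,)
print("Sample observation:", env.observation_space.sample())  # one random valid vector
env.close()
```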
{
@@ -521,7 +521,7 @@
"id": "MyxXwkI2Magx"
},
"source": [
"The action space (the set of possible actions the agent can take) is discrete with 4 actions available 🎮: \n",
"The action space (the set of possible actions the agent can take) is discrete with 4 actions available 🎮:\n",
"\n",
"- Action 0: Do nothing,\n",
"- Action 1: Fire left orientation engine,\n",
@@ -648,7 +648,7 @@
"# TODO: Define a PPO MlpPolicy architecture\n",
"# We use a MultiLayerPerceptron (MlpPolicy) because the input is a vector,\n",
"# if we had frames as input we would use CnnPolicy\n",
"model = "
"model ="
]
},
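One possible way to fill in that TODO — a sketch, not the official solution — is to instantiate Stable Baselines3's PPO with an MlpPolicy, since the observations are flat vectors:

```python
# A possible fill-in for the TODO above (hyperparameters left at their defaults).
import gymnasium as gym
from stable_baselines3 import PPO

env = gym.make("LunarLander-v2")
model = PPO(
    policy="MlpPolicy",  # MLP head: the input is an 8-dimensional vector, not image frames
    env=env,
    verbose=1,
)
```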
{
@@ -762,7 +762,7 @@
"eval_env =\n",
"\n",
"# Evaluate the model with 10 evaluation episodes and deterministic=True\n",
"mean_reward, std_reward = \n",
"mean_reward, std_reward =\n",
"\n",
"# Print the results\n",
"\n"
@@ -844,7 +844,7 @@
"\n",
"<img src=\"https://huggingface.co/datasets/huggingface-deep-rl-course/course-images/resolve/main/en/notebooks/create-token.jpg\" alt=\"Create HF Token\">\n",
"\n",
"- Copy the token \n",
"- Copy the token\n",
"- Run the cell below and paste the token"
]
},
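The standard interactive login from huggingface_hub looks like this (a sketch; the course's own cell may differ slightly):

```python
from huggingface_hub import notebook_login

notebook_login()  # opens a prompt where you paste the token you just copied
```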
@@ -913,10 +913,10 @@
"\n",
"## TODO: Define a repo_id\n",
"## repo_id is the id of the model repository from the Hugging Face Hub (repo_id = {organization}/{repo_name}, for instance ThomasSimonini/ppo-LunarLander-v2)\n",
"repo_id = \n",
"repo_id =\n",
"\n",
"# TODO: Define the name of the environment\n",
"env_id = \n",
"env_id =\n",
"\n",
"# Create the evaluation env and set the render_mode=\"rgb_array\"\n",
"eval_env = DummyVecEnv([lambda: Monitor(gym.make(env_id, render_mode=\"rgb_array\"))])\n",
@@ -930,7 +930,7 @@
"\n",
"# This method will save, evaluate, generate a model card and record a replay video of your agent before pushing the repo to the Hub\n",
"package_to_hub(model=model, # Our trained model\n",
" model_name=model_name, # The name of our trained model \n",
" model_name=model_name, # The name of our trained model\n",
" model_architecture=model_architecture, # The model architecture we used: in our case PPO\n",
" env_id=env_id, # Name of the environment\n",
" eval_env=eval_env, # Evaluation Environment\n",
@@ -978,7 +978,7 @@
"\n",
"# PLACE the package_to_hub function you've just filled here\n",
"package_to_hub(model=model, # Our trained model\n",
" model_name=model_name, # The name of our trained model \n",
" model_name=model_name, # The name of our trained model\n",
" model_architecture=model_architecture, # The model architecture we used: in our case PPO\n",
" env_id=env_id, # Name of the environment\n",
" eval_env=eval_env, # Evaluation Environment\n",
@@ -995,7 +995,7 @@
"cell_type": "markdown",
"source": [
"Congrats 🥳 you've just trained and uploaded your first Deep Reinforcement Learning agent. The script above should have displayed a link to a model repository such as https://huggingface.co/osanseviero/test_sb3. When you go to this link, you can:\n",
"* See a video preview of your agent at the right. \n",
"* See a video preview of your agent at the right.\n",
"* Click \"Files and versions\" to see all the files in the repository.\n",
"* Click \"Use in stable-baselines3\" to get a code snippet that shows how to load the model.\n",
"* A model card (`README.md` file) which gives a description of the model\n",
@@ -1017,7 +1017,7 @@
"## Load a saved LunarLander model from the Hub 🤗\n",
"Thanks to [ironbar](https://github.com/ironbar) for the contribution.\n",
"\n",
"Loading a saved model from the Hub is really easy. \n",
"Loading a saved model from the Hub is really easy.\n",
"\n",
"You go to https://huggingface.co/models?library=stable-baselines3 to see the list of all the Stable-baselines3 saved models.\n",
"1. You select one and copy its repo_id\n",
@@ -1115,7 +1115,7 @@
},
"source": [
"## Some additional challenges 🏆\n",
"The best way to learn **is to try things on your own**! As you saw, the current agent is not doing great. As a first suggestion, you can train for more steps. With 1,000,000 steps, we saw some great results! \n",
"The best way to learn **is to try things on your own**! As you saw, the current agent is not doing great. As a first suggestion, you can train for more steps. With 1,000,000 steps, we saw some great results!\n",
"\n",
"In the [Leaderboard](https://huggingface.co/spaces/huggingface-projects/Deep-Reinforcement-Learning-Leaderboard) you will find your agents. Can you get to the top?\n",
"\n",
@@ -1190,4 +1190,4 @@
},
"nbformat": 4,
"nbformat_minor": 0
}
}