From 5c910ecb27189e6ba32de34474f7367c3e2ab670 Mon Sep 17 00:00:00 2001
From: Thomas Simonini
Date: Sat, 5 Aug 2023 15:52:24 +0200
Subject: [PATCH] Update Observation Space

---
 notebooks/unit1/unit1.ipynb | 40 ++++++++++++++++++++--------------------
 1 file changed, 20 insertions(+), 20 deletions(-)

diff --git a/notebooks/unit1/unit1.ipynb b/notebooks/unit1/unit1.ipynb
index 8283dd3..95562ff 100644
--- a/notebooks/unit1/unit1.ipynb
+++ b/notebooks/unit1/unit1.ipynb
@@ -7,7 +7,7 @@
  "colab_type": "text"
  },
  "source": [
- "\"Open"
+ "\"Open"
  ]
  },
  {
@@ -101,10 +101,10 @@
  "\n",
  "- 📖 Study Deep Reinforcement Learning in **theory and practice**.\n",
  "- 🧑‍💻 Learn to **use famous Deep RL libraries** such as Stable Baselines3, RL Baselines3 Zoo, CleanRL and Sample Factory 2.0.\n",
- "- 🤖 Train **agents in unique environments** \n",
+ "- 🤖 Train **agents in unique environments**\n",
  "- 🎓 **Earn a certificate of completion** by completing 80% of the assignments.\n",
  "\n",
- "And more! \n",
+ "And more!\n",
  "\n",
  "Check 📚 the syllabus 👉 https://simoninithomas.github.io/deep-rl-course\n",
  "\n",
@@ -248,7 +248,7 @@
  {
  "cell_type": "markdown",
  "source": [
- "During the notebook, we'll need to generate a replay video. To do so, with colab, **we need to have a virtual screen to be able to render the environment** (and thus record the frames). \n",
+ "During the notebook, we'll need to generate a replay video. To do so, in Colab, **we need a virtual screen to render the environment** (and thus record the frames).\n",
  "\n",
  "Hence the following cell will install virtual screen libraries and create and run a virtual screen 🖥"
  ],
@@ -428,7 +428,7 @@
  "  # Do this action in the environment and get\n",
  "  # next_state, reward, terminated, truncated and info\n",
  "  observation, reward, terminated, truncated, info = env.step(action)\n",
- "  \n",
+ "\n",
  "  # If the game is terminated (in our case we land, crashed) or truncated (timeout)\n",
  "  if terminated or truncated:\n",
  "      # Reset the environment\n",
@@ -453,7 +453,7 @@
  "---\n",
  "\n",
  "\n",
- "💡 A good habit when you start to use an environment is to check its documentation \n",
+ "💡 A good habit when you start using an environment is to check its documentation\n",
  "\n",
  "👉 https://gymnasium.farama.org/environments/box2d/lunar_lander/\n",
  "\n",
@@ -498,8 +498,8 @@
  "- Vertical speed (y)\n",
  "- Angle\n",
  "- Angular speed\n",
- "- If the left leg contact point has touched the land\n",
- "- If the right leg contact point has touched the land\n"
+ "- If the left leg contact point has touched the ground (boolean)\n",
+ "- If the right leg contact point has touched the ground (boolean)\n"
@@ -521,7 +521,7 @@
  "id": "MyxXwkI2Magx"
  },
  "source": [
- "The action space (the set of possible actions the agent can take) is discrete with 4 actions available 🎮: \n",
+ "The action space (the set of possible actions the agent can take) is discrete with 4 actions available 🎮:\n",
  "\n",
  "- Action 0: Do nothing,\n",
  "- Action 1: Fire left orientation engine,\n",
@@ -648,7 +648,7 @@
  "# TODO: Define a PPO MlpPolicy architecture\n",
  "# We use MultiLayerPerceptron (MLPPolicy) because the input is a vector,\n",
  "# if we had frames as input we would use CnnPolicy\n",
- "model = "
+ "model ="
@@ -762,7 +762,7 @@
  "eval_env =\n",
  "\n",
  "# Evaluate the model with 10 evaluation episodes and deterministic=True\n",
- "mean_reward, std_reward = \n",
+ "mean_reward, std_reward =\n",
  "\n",
  "# Print the results\n",
  "\n"
@@ -844,7 +844,7 @@
  "\n",
  "\"Create\n",
  "\n",
- "- Copy the token \n",
+ "- Copy the token\n",
  "- Run the cell below and paste the token"
@@ -913,10 +913,10 @@
  "\n",
  "## TODO: Define a repo_id\n",
  "## repo_id is the id of the model repository from the Hugging Face Hub (repo_id = {organization}/{repo_name} for instance ThomasSimonini/ppo-LunarLander-v2\n",
- "repo_id = \n",
+ "repo_id =\n",
  "\n",
  "# TODO: Define the name of the environment\n",
- "env_id = \n",
+ "env_id =\n",
  "\n",
  "# Create the evaluation env and set the render_mode=\"rgb_array\"\n",
  "eval_env = DummyVecEnv([lambda: Monitor(gym.make(env_id, render_mode=\"rgb_array\"))])\n",
@@ -930,7 +930,7 @@
  "\n",
  "# method save, evaluate, generate a model card and record a replay video of your agent before pushing the repo to the hub\n",
  "package_to_hub(model=model, # Our trained model\n",
- "               model_name=model_name, # The name of our trained model \n",
+ "               model_name=model_name, # The name of our trained model\n",
  "               model_architecture=model_architecture, # The model architecture we used: in our case PPO\n",
  "               env_id=env_id, # Name of the environment\n",
  "               eval_env=eval_env, # Evaluation Environment\n",
@@ -978,7 +978,7 @@
  "\n",
  "# PLACE the package_to_hub function you've just filled here\n",
  "package_to_hub(model=model, # Our trained model\n",
- "               model_name=model_name, # The name of our trained model \n",
+ "               model_name=model_name, # The name of our trained model\n",
  "               model_architecture=model_architecture, # The model architecture we used: in our case PPO\n",
  "               env_id=env_id, # Name of the environment\n",
  "               eval_env=eval_env, # Evaluation Environment\n",
@@ -995,7 +995,7 @@
  "cell_type": "markdown",
  "source": [
  "Congrats 🥳 you've just trained and uploaded your first Deep Reinforcement Learning agent. The script above should have displayed a link to a model repository such as https://huggingface.co/osanseviero/test_sb3. When you go to this link, you can:\n",
- "* See a video preview of your agent at the right. \n",
+ "* See a video preview of your agent on the right.\n",
  "* Click \"Files and versions\" to see all the files in the repository.\n",
  "* Click \"Use in stable-baselines3\" to get a code snippet that shows how to load the model.\n",
  "* A model card (`README.md` file) which gives a description of the model\n",
@@ -1017,7 +1017,7 @@
  "## Load a saved LunarLander model from the Hub 🤗\n",
  "Thanks to [ironbar](https://github.com/ironbar) for the contribution.\n",
  "\n",
- "Loading a saved model from the Hub is really easy. \n",
+ "Loading a saved model from the Hub is really easy.\n",
  "\n",
  "You go to https://huggingface.co/models?library=stable-baselines3 to see the list of all the Stable-baselines3 saved models.\n",
  "1. You select one and copy its repo_id\n",
@@ -1115,7 +1115,7 @@
  },
  "source": [
  "## Some additional challenges 🏆\n",
- "The best way to learn **is to try things by your own**! As you saw, the current agent is not doing great. As a first suggestion, you can train for more steps. With 1,000,000 steps, we saw some great results! \n",
+ "The best way to learn **is to try things on your own**! As you saw, the current agent is not doing great. As a first suggestion, you can train for more steps. With 1,000,000 steps, we saw some great results!\n",
  "\n",
  "In the [Leaderboard](https://huggingface.co/spaces/huggingface-projects/Deep-Reinforcement-Learning-Leaderboard) you will find your agents. Can you get to the top?\n",
  "\n",
@@ -1190,4 +1190,4 @@
  },
  "nbformat": 4,
  "nbformat_minor": 0
-}
+}
\ No newline at end of file
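-- 
Reviewer note (outside the patch, so git am/apply ignore it): the updated bullets document that the last two observation entries are leg-contact booleans. Below is a minimal sketch to sanity-check that layout yourself. It assumes gymnasium with the box2d extra installed (pip install "gymnasium[box2d]"); the variable names are illustrative, not taken from the notebook.

    import gymnasium as gym

    # Build the same environment the notebook uses
    env = gym.make("LunarLander-v2")
    observation, info = env.reset()

    # The observation is an 8-dimensional vector:
    # [x, y, vx, vy, angle, angular velocity, left leg contact, right leg contact]
    print(env.observation_space.shape)  # -> (8,)

    # The last two entries are the leg-contact booleans, encoded as 0.0 / 1.0
    left_contact, right_contact = observation[6], observation[7]
    print(bool(left_contact), bool(right_contact))

    env.close()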