From 89e97f0196dbb647c6f7e31a5a5fd0d0ca3571a4 Mon Sep 17 00:00:00 2001
From: Thomas Simonini
Date: Wed, 4 Jan 2023 14:10:57 +0100
Subject: [PATCH] Update hands-on.mdx

---
 units/en/unit4/hands-on.mdx | 19 +++++--------------
 1 file changed, 5 insertions(+), 14 deletions(-)

diff --git a/units/en/unit4/hands-on.mdx b/units/en/unit4/hands-on.mdx
index 859022e..ff782db 100644
--- a/units/en/unit4/hands-on.mdx
+++ b/units/en/unit4/hands-on.mdx
@@ -67,6 +67,7 @@ To test its robustness, we're going to train it in 2 different simple environmen
 We're constantly trying to improve our tutorials, so **if you find some issues in this notebook**, please [open an issue on the GitHub Repo](https://github.com/huggingface/deep-rl-class/issues).
 
 ## Objectives of this notebook 🏆
+
 At the end of the notebook, you will:
 
 - Be able to **code from scratch a Reinforce algorithm using PyTorch.**
@@ -74,23 +75,13 @@ At the end of the notebook, you will:
 - Be able to **push your trained agent to the Hub** with a nice video replay and an evaluation score 🔥.
 
 ## Prerequisites 🏗️
+
 Before diving into the notebook, you need to:
 
 🔲 📚 [Study Policy Gradients by reading Unit 4](https://huggingface.co/deep-rl-course/unit4/introduction)
 
 # Let's code Reinforce algorithm from scratch 🔥
-
-To validate this hands-on for the certification process, you need to push your trained models to the Hub.
-
-- Get a result of >= 350 for `Cartpole-v1`.
-- Get a result of >= 5 for `PixelCopter`.
-
-To find your result, go to the leaderboard and find your model, **the result = mean_reward - std of reward**. **If you don't see your model on the leaderboard, go at the bottom of the leaderboard page and click on the refresh button**.
-
-For more information about the certification process, check this section 👉 https://huggingface.co/deep-rl-course/en/unit0/introduction#certification-process
-
-
 ## An advice 💡
 
 It's better to run this colab in a copy on your Google Drive, so that **if it timeouts** you still have the saved notebook on your Google Drive and do not need to fill everything from scratch.
@@ -209,7 +200,8 @@ As explained in [Reinforcement Learning Tips and Tricks](https://stable-baseline
 > Validate the implementation by making it run on harder and harder envs (you can compare results against the RL zoo). You usually need to run hyperparameter optimization for that step.
 
-___
+
+
 ### The CartPole-v1 environment
 
 > A pole is attached by an un-actuated joint to a cart, which moves along a frictionless track. The pendulum is placed upright on the cart and the goal is to balance the pole by applying forces in the left and right direction on the cart.
@@ -788,8 +780,6 @@ def push_to_hub(repo_id, model, hyperparameters, eval_env, video_fps=30, local_r
     print(f"Your model is pushed to the hub. You can view your model here: {repo_url}")
 ```
 
-### .
-
 By using `push_to_hub` **you evaluate, record a replay, generate a model card of your agent and push it to the Hub**.
 
 This way:
@@ -982,6 +972,7 @@ push_to_hub(
 ```
 ## Some additional challenges 🏆
+
 The best way to learn **is to try things on your own**! As you saw, the current agent is not doing great. As a first suggestion, you can train for more steps. But also trying to find better parameters.
 
 In the [Leaderboard](https://huggingface.co/spaces/huggingface-projects/Deep-Reinforcement-Learning-Leaderboard) you will find your agents. Can you get to the top?
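
---

The notebook this patch edits walks through coding Reinforce from scratch, and the step learners most often get wrong is computing the discounted return for each timestep of an episode. As a minimal sketch of that step only (the function name and shape here are illustrative, not taken from the notebook):

```python
def discounted_returns(rewards, gamma=0.99):
    """Compute G_t = r_t + gamma * G_{t+1} for every timestep of an episode.

    Iterates over the episode's rewards backwards, so each return
    accumulates all future discounted rewards in a single pass.
    """
    returns = []
    g = 0.0
    for r in reversed(rewards):
        g = r + gamma * g
        returns.append(g)
    returns.reverse()  # restore chronological order: returns[t] == G_t
    return returns


# Example: three rewards of 1.0 with gamma = 0.5
# G_2 = 1.0, G_1 = 1.0 + 0.5 * 1.0 = 1.5, G_0 = 1.0 + 0.5 * 1.5 = 1.75
print(discounted_returns([1.0, 1.0, 1.0], gamma=0.5))  # → [1.75, 1.5, 1.0]
```

In the actual training loop these returns are the weights on each timestep's log-probability in the policy-gradient loss; the notebook's PyTorch version works on the same principle but operates on tensors.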