From bd378d03196ba838661d8b4c54c096176f7e00e3 Mon Sep 17 00:00:00 2001 From: simoninithomas Date: Sat, 25 Feb 2023 15:01:22 +0100 Subject: [PATCH 1/7] Add Leaderboard update --- units/en/unit1/hands-on.mdx | 2 ++ units/en/unit2/hands-on.mdx | 2 ++ units/en/unit3/hands-on.mdx | 2 ++ units/en/unit4/hands-on.mdx | 2 ++ units/en/unit6/hands-on.mdx | 2 ++ 5 files changed, 10 insertions(+) diff --git a/units/en/unit1/hands-on.mdx b/units/en/unit1/hands-on.mdx index e36ad53..07c95b6 100644 --- a/units/en/unit1/hands-on.mdx +++ b/units/en/unit1/hands-on.mdx @@ -22,6 +22,8 @@ To validate this hands-on for the [certification process](https://huggingface.co To find your result, go to the [leaderboard](https://huggingface.co/spaces/huggingface-projects/Deep-Reinforcement-Learning-Leaderboard) and find your model, **the result = mean_reward - std of reward** +**If you don't find your model, go to the bottom of the page and click on the refresh button.** + For more information about the certification process, check this section ๐Ÿ‘‰ https://huggingface.co/deep-rl-course/en/unit0/introduction#certification-process And you can check your progress here ๐Ÿ‘‰ https://huggingface.co/spaces/ThomasSimonini/Check-my-progress-Deep-RL-Course diff --git a/units/en/unit2/hands-on.mdx b/units/en/unit2/hands-on.mdx index 473047b..baddea9 100644 --- a/units/en/unit2/hands-on.mdx +++ b/units/en/unit2/hands-on.mdx @@ -16,6 +16,8 @@ Now that we studied the Q-Learning algorithm, let's implement it from scratch an Thanks to a [leaderboard](https://huggingface.co/spaces/huggingface-projects/Deep-Reinforcement-Learning-Leaderboard), you'll be able to compare your results with other classmates and exchange the best practices to improve your agent's scores. Who will win the challenge for Unit 2? 
+**If you don't find your model, go to the bottom of the page and click on the refresh button.** + To validate this hands-on for the [certification process](https://huggingface.co/deep-rl-course/en/unit0/introduction#certification-process), you need to push your trained Taxi model to the Hub and **get a result of >= 4.5**. To find your result, go to the [leaderboard](https://huggingface.co/spaces/huggingface-projects/Deep-Reinforcement-Learning-Leaderboard) and find your model, **the result = mean_reward - std of reward** diff --git a/units/en/unit3/hands-on.mdx b/units/en/unit3/hands-on.mdx index 118d913..39a436e 100644 --- a/units/en/unit3/hands-on.mdx +++ b/units/en/unit3/hands-on.mdx @@ -22,6 +22,8 @@ To validate this hands-on for the certification process, you need to push your t To find your result, go to the leaderboard and find your model, **the result = mean_reward - std of reward** +**If you don't find your model, go to the bottom of the page and click on the refresh button.** + For more information about the certification process, check this section ๐Ÿ‘‰ https://huggingface.co/deep-rl-course/en/unit0/introduction#certification-process And you can check your progress here ๐Ÿ‘‰ https://huggingface.co/spaces/ThomasSimonini/Check-my-progress-Deep-RL-Course diff --git a/units/en/unit4/hands-on.mdx b/units/en/unit4/hands-on.mdx index e4deb34..1859210 100644 --- a/units/en/unit4/hands-on.mdx +++ b/units/en/unit4/hands-on.mdx @@ -26,6 +26,8 @@ To validate this hands-on for the certification process, you need to push your t To find your result, go to the leaderboard and find your model, **the result = mean_reward - std of reward**. **If you don't see your model on the leaderboard, go at the bottom of the leaderboard page and click on the refresh button**. 
+**If you don't find your model, go to the bottom of the page and click on the refresh button.** + For more information about the certification process, check this section ๐Ÿ‘‰ https://huggingface.co/deep-rl-course/en/unit0/introduction#certification-process And you can check your progress here ๐Ÿ‘‰ https://huggingface.co/spaces/ThomasSimonini/Check-my-progress-Deep-RL-Course diff --git a/units/en/unit6/hands-on.mdx b/units/en/unit6/hands-on.mdx index 37a0d93..794877c 100644 --- a/units/en/unit6/hands-on.mdx +++ b/units/en/unit6/hands-on.mdx @@ -28,6 +28,8 @@ To validate this hands-on for the certification process, you need to push your t To find your result, [go to the leaderboard](https://huggingface.co/spaces/huggingface-projects/Deep-Reinforcement-Learning-Leaderboard) and find your model, **the result = mean_reward - std of reward** +**If you don't find your model, go to the bottom of the page and click on the refresh button.** + For more information about the certification process, check this section ๐Ÿ‘‰ https://huggingface.co/deep-rl-course/en/unit0/introduction#certification-process **To start the hands-on click on Open In Colab button** ๐Ÿ‘‡ : From d0967799b4f6c850dd3df9c2abd00abf13615695 Mon Sep 17 00:00:00 2001 From: simoninithomas Date: Sat, 25 Feb 2023 15:21:21 +0100 Subject: [PATCH 2/7] Update Unit 3 --- notebooks/unit3/unit3.ipynb | 2 +- units/en/unit3/hands-on.mdx | 4 ++-- units/en/unit4/hands-on.mdx | 10 +++++----- 3 files changed, 8 insertions(+), 8 deletions(-) diff --git a/notebooks/unit3/unit3.ipynb b/notebooks/unit3/unit3.ipynb index e776208..5c21dca 100644 --- a/notebooks/unit3/unit3.ipynb +++ b/notebooks/unit3/unit3.ipynb @@ -301,7 +301,7 @@ "## Train our Deep Q-Learning Agent to Play Space Invaders ๐Ÿ‘พ\n", "\n", "To train an agent with RL-Baselines3-Zoo, we just need to do two things:\n", - "1. We define the hyperparameters in `rl-baselines3-zoo/hyperparams/dqn.yml`\n", + "1. 
We define the hyperparameters in `/content/rl-baselines3-zoo/hyperparams/dqn.yml`\n", "\n", "\"DQN\n" ] diff --git a/units/en/unit3/hands-on.mdx b/units/en/unit3/hands-on.mdx index 39a436e..409d410 100644 --- a/units/en/unit3/hands-on.mdx +++ b/units/en/unit3/hands-on.mdx @@ -38,7 +38,7 @@ And you can check your progress here 👉 https://huggingface.co/spaces/ThomasSi Unit 3 Thumbnail -In this notebook, **you'll train a Deep Q-Learning agent** playing Space Invaders using [RL Baselines3 Zoo](https://github.com/DLR-RM/rl-baselines3-zoo), a training framework based on [Stable-Baselines3](https://stable-baselines3.readthedocs.io/en/master/) that provides scripts for training, evaluating agents, tuning hyperparameters, plotting results and recording videos. +In this notebook, **you'll train a Deep Q-Learning agent** playing Space Invaders using [RL Baselines3 Zoo](https://github.com/DLR-RM/rl-baselines3-zoo), a training framework based on [Stable-Baselines3](https://stable-baselines3.readthedocs.io/en/master/) that provides scripts for training, evaluating agents, tuning hyperparameters, plotting results and recording videos. We're using the [RL-Baselines-3 Zoo integration, a vanilla version of Deep Q-Learning](https://stable-baselines3.readthedocs.io/en/master/modules/dqn.html) with no extensions such as Double-DQN, Dueling-DQN, and Prioritized Experience Replay. @@ -133,7 +133,7 @@ pip install -r requirements.txt ## Train our Deep Q-Learning Agent to Play Space Invaders 👾 To train an agent with RL-Baselines3-Zoo, we just need to do two things: -1. We define the hyperparameters in `rl-baselines3-zoo/hyperparams/dqn.yml` +1. 
We define the hyperparameters in `/content/rl-baselines3-zoo/hyperparams/dqn.yml` DQN Hyperparameters diff --git a/units/en/unit4/hands-on.mdx b/units/en/unit4/hands-on.mdx index 1859210..857887e 100644 --- a/units/en/unit4/hands-on.mdx +++ b/units/en/unit4/hands-on.mdx @@ -375,11 +375,11 @@ The second question you may ask is **why do we minimize the loss**? Did you talk - We want to maximize our utility function $J(\theta)$, but in PyTorch and TensorFlow, it's better to **minimize an objective function.** - So let's say we want to reinforce action 3 at a certain timestep. Before training this action P is 0.25. - - So we want to modify $\theta$ such that $\pi_\theta(a_3|s; \theta) > 0.25$ - - Because all P must sum to 1, max $\pi_\theta(a_3|s; \theta)$ will **minimize other action probability.** - - So we should tell PyTorch **to min $1 - \pi_\theta(a_3|s; \theta)$.** - - This loss function approaches 0 as $\pi_\theta(a_3|s; \theta)$ nears 1. - - So we are encouraging the gradient to max $\pi_\theta(a_3|s; \theta)$ + - So we want to modify \\( \theta \\) such that \\(\pi_\theta(a_3|s; \theta) > 0.25 \\) + - Because all P must sum to 1, max \\(\pi_\theta(a_3|s; \theta)\\) will **minimize other action probability.** + - So we should tell PyTorch **to min \\(1 - \pi_\theta(a_3|s; \theta)\\).** + - This loss function approaches 0 as \\(\pi_\theta(a_3|s; \theta)\\) nears 1. 
+ - So we are encouraging the gradient to max \\(\pi_\theta(a_3|s; \theta)\\) ```python From f744071184024e85aed54cd42ca6ae5d963e5d0f Mon Sep 17 00:00:00 2001 From: simoninithomas Date: Sat, 25 Feb 2023 15:23:02 +0100 Subject: [PATCH 3/7] Update Actor Critic --- units/en/unit6/advantage-actor-critic.mdx | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/units/en/unit6/advantage-actor-critic.mdx b/units/en/unit6/advantage-actor-critic.mdx index 8b7863c..64f07fc 100644 --- a/units/en/unit6/advantage-actor-critic.mdx +++ b/units/en/unit6/advantage-actor-critic.mdx @@ -16,7 +16,7 @@ On the other hand, your friend (Critic) will also update their way to provide fe This is the idea behind Actor-Critic. We learn two function approximations: -- *A policy* that **controls how our agent acts**: \\( \pi_{\theta}(s,a) \\) +- *A policy* that **controls how our agent acts**: \\( \pi_{\theta}(s) \\) - *A value function* to assist the policy update by measuring how good the action taken is: \\( \hat{q}_{w}(s,a) \\) @@ -24,7 +24,7 @@ This is the idea behind Actor-Critic. We learn two function approximations: Now that we have seen the Actor Critic's big picture, let's dive deeper to understand how Actor and Critic improve together during the training. 
As we saw, with Actor-Critic methods, there are two function approximations (two neural networks): -- *Actor*, a **policy function** parameterized by theta: \\( \pi_{\theta}(s,a) \\) +- *Actor*, a **policy function** parameterized by theta: \\( \pi_{\theta}(s) \\) - *Critic*, a **value function** parameterized by w: \\( \hat{q}_{w}(s,a) \\) Let's see the training process to understand how Actor and Critic are optimized: From d041fd29ea444d308d14954b44e3a643f12e0ad5 Mon Sep 17 00:00:00 2001 From: Thomas Simonini Date: Sat, 25 Feb 2023 18:16:01 +0100 Subject: [PATCH 4/7] Update hands-on.mdx --- units/en/unit2/hands-on.mdx | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/units/en/unit2/hands-on.mdx b/units/en/unit2/hands-on.mdx index baddea9..4c201a2 100644 --- a/units/en/unit2/hands-on.mdx +++ b/units/en/unit2/hands-on.mdx @@ -261,7 +261,8 @@ print("There are ", action_space, " possible actions") ``` ```python -# Let's create our Qtable of size (state_space, action_space) and initialized each values at 0 using np.zeros +# Let's create our Qtable of size (state_space, action_space) and initialized each values at 0 using np.zeros. np.zeros needs a tuple (a,b) + def initialize_q_table(state_space, action_space): Qtable = return Qtable From 8e7b5c9c144257d5df471d5d8cc274dd0d5158fc Mon Sep 17 00:00:00 2001 From: Thomas Simonini Date: Sat, 25 Feb 2023 18:16:54 +0100 Subject: [PATCH 5/7] Update unit2.ipynb --- notebooks/unit2/unit2.ipynb | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/notebooks/unit2/unit2.ipynb b/notebooks/unit2/unit2.ipynb index 90baea8..13a86f0 100644 --- a/notebooks/unit2/unit2.ipynb +++ b/notebooks/unit2/unit2.ipynb @@ -511,7 +511,7 @@ }, "outputs": [], "source": [ - "# Let's create our Qtable of size (state_space, action_space) and initialized each values at 0 using np.zeros\n", + "# Let's create our Qtable of size (state_space, action_space) and initialized each values at 0 using np.zeros. 
np.zeros needs a tuple (a,b)\n", "def initialize_q_table(state_space, action_space):\n", " Qtable = \n", " return Qtable" From 0b017c9aa8b0f13356a85c1b1343bcf402bce42f Mon Sep 17 00:00:00 2001 From: Thomas Simonini Date: Sat, 25 Feb 2023 18:21:06 +0100 Subject: [PATCH 6/7] Update unit1.ipynb --- notebooks/unit1/unit1.ipynb | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/notebooks/unit1/unit1.ipynb b/notebooks/unit1/unit1.ipynb index ec814b0..0e7b4f1 100644 --- a/notebooks/unit1/unit1.ipynb +++ b/notebooks/unit1/unit1.ipynb @@ -1099,7 +1099,7 @@ "\n", "Take time to really **grasp the material before continuing and try the additional challenges**. Itโ€™s important to master these elements and having a solid foundations.\n", "\n", - "Naturally, during the course, weโ€™re going to use and deeper explain again these terms but **itโ€™s better to have a good understanding of them now before diving into the next chapters.**\n" + "Naturally, during the course, weโ€™re going to dive deeper into these concepts but **itโ€™s better to have a good understanding of them now before diving into the next chapters.**\n\n" ] }, { From 80bb91e764c075b8a99129d323f3c241f102b0aa Mon Sep 17 00:00:00 2001 From: Thomas Simonini Date: Sat, 25 Feb 2023 18:21:36 +0100 Subject: [PATCH 7/7] Update hands-on.mdx --- units/en/unit1/hands-on.mdx | 3 +-- 1 file changed, 1 insertion(+), 2 deletions(-) diff --git a/units/en/unit1/hands-on.mdx b/units/en/unit1/hands-on.mdx index 07c95b6..5c181a0 100644 --- a/units/en/unit1/hands-on.mdx +++ b/units/en/unit1/hands-on.mdx @@ -659,8 +659,7 @@ If youโ€™re still feel confused with all these elements...it's totally normal! * Take time to really **grasp the material before continuing and try the additional challenges**. Itโ€™s important to master these elements and having a solid foundations. 
-Naturally, during the course, weโ€™re going to use and deeper explain again these terms but **itโ€™s better to have a good understanding of them now before diving into the next chapters.** - +Naturally, during the course, weโ€™re going to dive deeper into these concepts but **itโ€™s better to have a good understanding of them now before diving into the next chapters.** Next time, in the bonus unit 1, you'll train Huggy the Dog to fetch the stick.