From d0967799b4f6c850dd3df9c2abd00abf13615695 Mon Sep 17 00:00:00 2001
From: simoninithomas
Date: Sat, 25 Feb 2023 15:21:21 +0100
Subject: [PATCH] Update Unit 3

---
 notebooks/unit3/unit3.ipynb | 2 +-
 units/en/unit3/hands-on.mdx | 4 ++--
 units/en/unit4/hands-on.mdx | 10 +++++-----
 3 files changed, 8 insertions(+), 8 deletions(-)

diff --git a/notebooks/unit3/unit3.ipynb b/notebooks/unit3/unit3.ipynb
index e776208..5c21dca 100644
--- a/notebooks/unit3/unit3.ipynb
+++ b/notebooks/unit3/unit3.ipynb
@@ -301,7 +301,7 @@
     "## Train our Deep Q-Learning Agent to Play Space Invaders 👾\n",
     "\n",
     "To train an agent with RL-Baselines3-Zoo, we just need to do two things:\n",
-    "1. We define the hyperparameters in `rl-baselines3-zoo/hyperparams/dqn.yml`\n",
+    "1. We define the hyperparameters in `/content/rl-baselines3-zoo/hyperparams/dqn.yml`\n",
     "\n",
     "\"DQN\n"
    ]
diff --git a/units/en/unit3/hands-on.mdx b/units/en/unit3/hands-on.mdx
index 39a436e..409d410 100644
--- a/units/en/unit3/hands-on.mdx
+++ b/units/en/unit3/hands-on.mdx
@@ -38,7 +38,7 @@ And you can check your progress here 👉 https://huggingface.co/spaces/ThomasSi

 Unit 3 Thumbnail

-In this notebook, **you'll train a Deep Q-Learning agent** playing Space Invaders using [RL Baselines3 Zoo](https://github.com/DLR-RM/rl-baselines3-zoo), a training framework based on [Stable-Baselines3](https://stable-baselines3.readthedocs.io/en/master/) that provides scripts for training, evaluating agents, tuning hyperparameters, plotting results and recording videos.
+In this notebook, **you'll train a Deep Q-Learning agent** playing Space Invaders using [RL Baselines3 Zoo](https://github.com/DLR-RM/rl-baselines3-zoo), a training framework based on [Stable-Baselines3](https://stable-baselines3.readthedocs.io/en/master/) that provides scripts for training, evaluating agents, tuning hyperparameters, plotting results and recording videos.
 We're using the [RL-Baselines-3 Zoo integration, a vanilla version of Deep Q-Learning](https://stable-baselines3.readthedocs.io/en/master/modules/dqn.html) with no extensions such as Double-DQN, Dueling-DQN, and Prioritized Experience Replay.
@@ -133,7 +133,7 @@ pip install -r requirements.txt
 ## Train our Deep Q-Learning Agent to Play Space Invaders 👾

 To train an agent with RL-Baselines3-Zoo, we just need to do two things:
-1. We define the hyperparameters in `rl-baselines3-zoo/hyperparams/dqn.yml`
+1. We define the hyperparameters in `/content/rl-baselines3-zoo/hyperparams/dqn.yml`

 DQN Hyperparameters
diff --git a/units/en/unit4/hands-on.mdx b/units/en/unit4/hands-on.mdx
index 1859210..857887e 100644
--- a/units/en/unit4/hands-on.mdx
+++ b/units/en/unit4/hands-on.mdx
@@ -375,11 +375,11 @@ The second question you may ask is **why do we minimize the loss**? Did you talk
 - We want to maximize our utility function $J(\theta)$, but in PyTorch and TensorFlow, it's better to **minimize an objective function.**
 - So let's say we want to reinforce action 3 at a certain timestep. Before training this action P is 0.25.
-  - So we want to modify $\theta$ such that $\pi_\theta(a_3|s; \theta) > 0.25$
-  - Because all P must sum to 1, max $\pi_\theta(a_3|s; \theta)$ will **minimize other action probability.**
-  - So we should tell PyTorch **to min $1 - \pi_\theta(a_3|s; \theta)$.**
-  - This loss function approaches 0 as $\pi_\theta(a_3|s; \theta)$ nears 1.
-  - So we are encouraging the gradient to max $\pi_\theta(a_3|s; \theta)$
+  - So we want to modify \\(\theta \\) such that \\(\pi_\theta(a_3|s; \theta) > 0.25 \\)
+  - Because all P must sum to 1, max \\(\pi_\theta(a_3|s; \theta)\\) will **minimize other action probability.**
+  - So we should tell PyTorch **to min \\(1 - \pi_\theta(a_3|s; \theta)\\).**
+  - This loss function approaches 0 as \\(\pi_\theta(a_3|s; \theta)\\) nears 1.
+  - So we are encouraging the gradient to max \\(\pi_\theta(a_3|s; \theta)\\)

 ```python
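Editor's note, outside the patch itself: the unit4 hunk above argues that minimizing \\(1 - \pi_\theta(a_3|s; \theta)\\) drives that action's probability toward 1 and, because probabilities sum to 1, shrinks the others. A minimal stdlib-only sketch of that claim; the 4-action softmax policy, logits, step count, and learning rate are invented for illustration and are not the course's training code:

```python
import math

# Hypothetical 4-action policy parameterized by logits z; softmax gives
# the action probabilities pi(a_i|s). Initially uniform: each P = 0.25.
def softmax(z):
    m = max(z)
    e = [math.exp(v - m) for v in z]
    s = sum(e)
    return [v / s for v in e]

z = [0.0, 0.0, 0.0, 0.0]
lr = 0.5

# Gradient descent on the loss L = 1 - p[3] ("reinforce action 3").
# For softmax, dp[3]/dz_i = p[3] * ((i == 3) - p[i]),
# so dL/dz_i = -p[3] * ((i == 3) - p[i]).
for _ in range(200):
    p = softmax(z)
    grad = [-p[3] * ((1.0 if i == 3 else 0.0) - p[i]) for i in range(4)]
    z = [z[i] - lr * grad[i] for i in range(4)]

p = softmax(z)
# p[3] has grown well past its initial 0.25 toward 1, and the other
# three probabilities have shrunk, since all four still sum to 1.
print(p)
```

As the loss approaches 0, the gradient `p[3] * (1 - p[3])` also vanishes, which is why this toy loss plateaus once action 3 dominates.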