mirror of https://github.com/huggingface/deep-rl-class.git
synced 2026-04-14 02:11:17 +08:00
Update Unit 3
@@ -301,7 +301,7 @@
 "## Train our Deep Q-Learning Agent to Play Space Invaders 👾\n",
 "\n",
 "To train an agent with RL-Baselines3-Zoo, we just need to do two things:\n",
-"1. We define the hyperparameters in `rl-baselines3-zoo/hyperparams/dqn.yml`\n",
+"1. We define the hyperparameters in `/content/rl-baselines3-zoo/hyperparams/dqn.yml`\n",
 "\n",
 "<img src=\"https://huggingface.co/datasets/huggingface-deep-rl-course/course-images/resolve/main/en/notebooks/unit3/hyperparameters.png\" alt=\"DQN Hyperparameters\">\n"
 ]
||||
@@ -38,7 +38,7 @@ And you can check your progress here 👉 https://huggingface.co/spaces/ThomasSi

 <img src="https://huggingface.co/datasets/huggingface-deep-rl-course/course-images/resolve/main/en/unit4/thumbnail.jpg" alt="Unit 3 Thumbnail">

-In this notebook, **you'll train a Deep Q-Learning agent** playing Space Invaders using [RL Baselines3 Zoo](https://github.com/DLR-RM/rl-baselines3-zoo), a training framework based on [Stable-Baselines3](https://stable-baselines3.readthedocs.io/en/master/) that provides scripts for training, evaluating agents, tuning hyperparameters, plotting results and recording videos.
+In this notebook, **you'll train a Deep Q-Learning agent** playing Space Invaders using [RL Baselines3 Zoo](https://github.com/DLR-RM/rl-baselines3-zoo), a training framework based on [Stable-Baselines3](https://stable-baselines3.readthedocs.io/en/master/) that provides scripts for training, evaluating agents, tuning hyperparameters, plotting results and recording videos.

 We're using the [RL-Baselines-3 Zoo integration, a vanilla version of Deep Q-Learning](https://stable-baselines3.readthedocs.io/en/master/modules/dqn.html) with no extensions such as Double-DQN, Dueling-DQN, and Prioritized Experience Replay.
@@ -133,7 +133,7 @@ pip install -r requirements.txt
 ## Train our Deep Q-Learning Agent to Play Space Invaders 👾

 To train an agent with RL-Baselines3-Zoo, we just need to do two things:
-1. We define the hyperparameters in `rl-baselines3-zoo/hyperparams/dqn.yml`
+1. We define the hyperparameters in `/content/rl-baselines3-zoo/hyperparams/dqn.yml`

 <img src="https://huggingface.co/datasets/huggingface-deep-rl-course/course-images/resolve/main/en/notebooks/unit3/hyperparameters.png" alt="DQN Hyperparameters">
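For context on the file the path change above points at: the hyperparameters live under a per-algorithm YAML key in `dqn.yml`. The exact entry shipped with RL-Baselines3-Zoo may differ; as an illustrative sketch (all values here are assumptions for illustration, not the repo's canonical settings), an Atari DQN entry looks something like:

```yaml
# Sketch of an Atari entry in rl-baselines3-zoo/hyperparams/dqn.yml (illustrative values)
atari:
  env_wrapper:
    - stable_baselines3.common.atari_wrappers.AtariWrapper   # grayscale, frame-skip, etc.
  frame_stack: 4                  # stack 4 frames so the agent can perceive motion
  policy: 'CnnPolicy'             # convolutional network for pixel observations
  n_timesteps: !!float 1e7
  buffer_size: 100000             # replay buffer capacity
  learning_rate: !!float 1e-4
  batch_size: 32
  learning_starts: 100000         # fill the buffer before learning begins
  target_update_interval: 1000    # sync the target network every N steps
  train_freq: 4
  gradient_steps: 1
  exploration_fraction: 0.1       # fraction of training spent annealing epsilon
  exploration_final_eps: 0.01
```

With an entry like this in place, the zoo's training script picks it up by environment id, e.g. `python train.py --algo dqn --env SpaceInvadersNoFrameskip-v4 -f logs/`.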
@@ -375,11 +375,11 @@ The second question you may ask is **why do we minimize the loss**? Did you talk

 - We want to maximize our utility function $J(\theta)$, but in PyTorch and TensorFlow, it's better to **minimize an objective function.**
 - So let's say we want to reinforce action 3 at a certain timestep. Before training this action P is 0.25.
-- So we want to modify $\theta$ such that $\pi_\theta(a_3|s; \theta) > 0.25$
-- Because all P must sum to 1, max $\pi_\theta(a_3|s; \theta)$ will **minimize other action probability.**
-- So we should tell PyTorch **to min $1 - \pi_\theta(a_3|s; \theta)$.**
-- This loss function approaches 0 as $\pi_\theta(a_3|s; \theta)$ nears 1.
-- So we are encouraging the gradient to max $\pi_\theta(a_3|s; \theta)$
+- So we want to modify \\(\theta \\) such that \\(\pi_\theta(a_3|s; \theta) > 0.25 \\)
+- Because all P must sum to 1, max \\(\pi_\theta(a_3|s; \theta)\\) will **minimize other action probability.**
+- So we should tell PyTorch **to min \\(1 - \pi_\theta(a_3|s; \theta)\\).**
+- This loss function approaches 0 as \\(\pi_\theta(a_3|s; \theta)\\) nears 1.
+- So we are encouraging the gradient to max \\(\pi_\theta(a_3|s; \theta)\\)
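The argument in the bullet points above can be checked numerically. A minimal sketch in plain NumPy (not the course's training code; the 4-action softmax policy, uniform initial logits, and learning rate are illustrative assumptions): doing gradient descent on the loss 1 - π(a₃|s) pushes action 3's probability above its initial 0.25 while, because the probabilities must sum to 1, every other action's probability shrinks.

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())   # shift for numerical stability
    return e / e.sum()

logits = np.zeros(4)          # uniform policy: every action starts at P = 0.25
a = 3                         # the action we want to reinforce
lr = 1.0

for _ in range(10):
    p = softmax(logits)
    # loss = 1 - p[a]; its gradient w.r.t. the logits is -p[a] * (onehot(a) - p)
    onehot = np.eye(4)[a]
    grad = -p[a] * (onehot - p)
    logits -= lr * grad       # gradient descent step on the loss

p = softmax(logits)
print(p)                      # p[3] has grown past 0.25; the other entries shrank
```

Each step raises the logit of action 3 and lowers the others, so π(a₃|s) increases monotonically toward 1 while the loss 1 - π(a₃|s) approaches 0, which is exactly the minimization the text describes.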