Merge pull request #116 from huggingface/ThomasSimonini/Unit2-updates

Small Updates Unit 2
2026-06-15 06:27:24 +08:00 · 2022-12-15 08:21:17 +01:00
parent e6e6b1f9af 3080ad3fc1
commit 123b695882
3 changed files with 10 additions and 7 deletions
--- a/units/en/unit1/hands-on.mdx
+++ b/units/en/unit1/hands-on.mdx
@@ -1,4 +1,5 @@
-# Hands on [[hands-on]]
+# Train your first Deep Reinforcement Learning Agent 🤖 [[hands-on]]
+



--- a/units/en/unit2/hands-on.mdx
+++ b/units/en/unit2/hands-on.mdx
@@ -1,10 +1,10 @@
 # Hands-on [[hands-on]]

-<CourseFloatingBanner classNames="absolute z-10 right-0 top-0"
-notebooks={[
-  {label: "Google Colab", value: "https://colab.research.google.com/github/huggingface/deep-rl-class/blob/master/notebooks/unit2/unit2.ipynb"}
-  ]}
-askForHelpUrl="http://hf.co/join/discord" />
+      <CourseFloatingBanner classNames="absolute z-10 right-0 top-0"
+      notebooks={[
+        {label: "Google Colab", value: "https://colab.research.google.com/github/huggingface/deep-rl-class/blob/master/notebooks/unit2/unit2.ipynb"}
+        ]}
+        askForHelpUrl="http://hf.co/join/discord" />



@@ -21,6 +21,7 @@ Thanks to a [leaderboard](https://huggingface.co/spaces/huggingface-projects/Dee

 [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/huggingface/deep-rl-class/blob/master/notebooks/unit2/unit2.ipynb)

+
 # Unit 2: Q-Learning with FrozenLake-v1 ⛄ and Taxi-v3 🚕

 <img src="https://huggingface.co/datasets/huggingface-deep-rl-course/course-images/resolve/main/en/unit3/thumbnail.jpg" alt="Unit 2 Thumbnail">
--- a/units/en/unit2/mid-way-quiz.mdx
+++ b/units/en/unit2/mid-way-quiz.mdx
@@ -37,7 +37,8 @@ The best way to learn and [to avoid the illusion of competence](https://www.cour

 **The Bellman equation is a recursive equation** that works like this: instead of starting for each state from the beginning and calculating the return, we can consider the value of any state as:

-\\(Rt+1 + (\gamma * V(St+1)))\\
+Rt+1 + gamma * V(St+1)
+
 The immediate reward + the discounted value of the state that follows

 </details>
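
Reader's note on the mid-way-quiz hunk: the expression it touches, `Rt+1 + gamma * V(St+1)`, can be illustrated with a minimal Python sketch. Everything below is an assumption made for illustration and is not part of the course material: the three-state chain, its rewards, the discount factor, and the names `bellman_backup`, `reward`, and `next_state`. Transitions are assumed deterministic, so no expectation over next states is needed.

```python
GAMMA = 0.5  # assumed discount factor, chosen only for illustration

# Hypothetical deterministic chain: s0 -> s1 -> s2 (terminal).
next_state = {"s0": "s1", "s1": "s2"}
# Reward received when leaving each non-terminal state (made-up numbers).
reward = {"s0": 1.0, "s1": 2.0}


def bellman_backup(values, state, gamma=GAMMA):
    """Return R_{t+1} + gamma * V(S_{t+1}) for a non-terminal state."""
    return reward[state] + gamma * values[next_state[state]]


# The terminal state keeps a value of 0; the others are filled in below.
values = {"s0": 0.0, "s1": 0.0, "s2": 0.0}

# Sweep backwards from the end of the chain, so each state reuses the
# already-computed value of its successor instead of re-summing the return.
for state in ("s1", "s0"):
    values[state] = bellman_backup(values, state)

print(values)
```

With these assumed numbers, the value of `s0` obtained from the one-step backup equals the full discounted return computed from scratch (1 + 0.5 * 2), which is the point of the recursive form.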