mirror of
https://github.com/huggingface/deep-rl-class.git
synced 2026-04-14 02:11:17 +08:00
Merge pull request #116 from huggingface/ThomasSimonini/Unit2-updates
Small Updates Unit 2
@@ -1,4 +1,5 @@
# Hands on [[hands-on]]
# Train your first Deep Reinforcement Learning Agent 🤖 [[hands-on]]
@@ -1,10 +1,10 @@
# Hands-on [[hands-on]]
<CourseFloatingBanner classNames="absolute z-10 right-0 top-0"
notebooks={[
{label: "Google Colab", value: "https://colab.research.google.com/github/huggingface/deep-rl-class/blob/master/notebooks/unit2/unit2.ipynb"}
]}
askForHelpUrl="http://hf.co/join/discord" />
<CourseFloatingBanner classNames="absolute z-10 right-0 top-0"
notebooks={[
{label: "Google Colab", value: "https://colab.research.google.com/github/huggingface/deep-rl-class/blob/master/notebooks/unit2/unit2.ipynb"}
]}
askForHelpUrl="http://hf.co/join/discord" />
@@ -21,6 +21,7 @@ Thanks to a [leaderboard](https://huggingface.co/spaces/huggingface-projects/Dee
[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/huggingface/deep-rl-class/blob/master/notebooks/unit2/unit2.ipynb)
# Unit 2: Q-Learning with FrozenLake-v1 ⛄ and Taxi-v3 🚕
<img src="https://huggingface.co/datasets/huggingface-deep-rl-course/course-images/resolve/main/en/unit3/thumbnail.jpg" alt="Unit 2 Thumbnail">
@@ -37,7 +37,8 @@ The best way to learn and [to avoid the illusion of competence](https://www.cour
**The Bellman equation is a recursive equation** that works like this: instead of starting for each state from the beginning and calculating the return, we can consider the value of any state as:
\\(Rt+1 + (\gamma * V(St+1)))\\
Rt+1 + gamma * V(St+1)
The immediate reward + the discounted value of the state that follows
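
As a rough illustration of the recursion described above, the sketch below compares two ways of computing state values on a made-up, deterministic four-step episode: summing the discounted return from scratch for every state, and reusing the value of the next state as the Bellman equation suggests. The reward list, the `gamma` value, and the function names are all assumptions chosen for this example; they do not come from the course notebook.

```python
# Hypothetical deterministic episode: rewards[t] plays the role of R_{t+1},
# the reward received after leaving state S_t. The numbers are made up.
rewards = [1.0, 0.0, 2.0, 3.0]
gamma = 0.9  # discount factor (assumed value for illustration)


def value_from_scratch(t):
    """Discounted return from S_t, recomputed from the beginning each time."""
    return sum(gamma ** k * r for k, r in enumerate(rewards[t:]))


def bellman_values():
    """Same values via V(S_t) = R_{t+1} + gamma * V(S_{t+1})."""
    values = [0.0] * (len(rewards) + 1)  # extra slot: V(terminal) = 0
    for t in reversed(range(len(rewards))):
        # Immediate reward + discounted value of the state that follows.
        values[t] = rewards[t] + gamma * values[t + 1]
    return values[:-1]  # drop the terminal slot


values = bellman_values()
for t, v in enumerate(values):
    print(f"V(S_{t}) = {v:.3f} (from scratch: {value_from_scratch(t):.3f})")
```

Both approaches should agree on every state; the recursive version just avoids re-summing the whole tail of the episode for each one.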
</details>