# Introduction to Q-Learning [[introduction-q-learning]]
<img src="https://huggingface.co/datasets/huggingface-deep-rl-course/course-images/resolve/main/en/unit3/thumbnail.jpg" alt="Unit 2 thumbnail" width="100%">
In the first unit of this class, we learned about Reinforcement Learning (RL), the RL process, and the different methods to solve an RL problem. We also **trained our first agents and uploaded them to the Hugging Face Hub.**
In this unit, we're going to **dive deeper into one of the Reinforcement Learning methods: value-based methods** and study our first RL algorithm: **Q-Learning.**
We'll also **implement our first RL agent from scratch**, a Q-Learning agent, and train it in two environments:
1. FrozenLake-v1 (non-slippery version): where our agent will need to **go from the starting state (S) to the goal state (G)** by walking only on frozen tiles (F) and avoiding holes (H).
2. An autonomous taxi: where our agent will need **to learn to navigate** a city to **transport its passengers from point A to point B.**
<img src="https://huggingface.co/datasets/huggingface-deep-rl-course/course-images/resolve/main/en/unit3/envs.gif" alt="Environments"/>
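
If you want to peek at these environments right away, here's a minimal sketch using Gymnasium. It assumes the standard `FrozenLake-v1` and `Taxi-v3` registry IDs; the unit's hands-on will walk through the full setup step by step.

```python
import gymnasium as gym

# Deterministic FrozenLake: with is_slippery=False the agent moves exactly
# where it intends, so the challenge is finding a path from S to G
# around the holes.
frozen_lake = gym.make("FrozenLake-v1", map_name="4x4", is_slippery=False)

# Taxi: pick up a passenger and drop them off at the right location.
taxi = gym.make("Taxi-v3")

# Both environments have small, discrete state and action spaces,
# which is what makes a tabular method like Q-Learning a good fit.
print(frozen_lake.observation_space.n, frozen_lake.action_space.n)  # 16 4
print(taxi.observation_space.n, taxi.action_space.n)                # 500 6
```

The `is_slippery=False` flag selects the deterministic, non-slippery variant mentioned above.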
Concretely, we will:
- Learn about **value-based methods**.
- Learn about the **differences between Monte Carlo and Temporal Difference Learning**.
- Study and implement **our first RL algorithm**: Q-Learning.
This unit is **fundamental if you want to be able to work on Deep Q-Learning**: the first Deep RL algorithm to play Atari games, surpassing human-level performance on some of them (Breakout, Space Invaders, etc.).
So let's get started! 🚀