mirror of
https://github.com/huggingface/deep-rl-class.git
synced 2026-03-30 08:40:27 +08:00
# Q-Learning Recap [[q-learning-recap]]
*Q-Learning* **is the RL algorithm that**:
- Trains a *Q-function*, an **action-value function** encoded, in internal memory, by a *Q-table* **containing all the state-action pair values.**
- Given a state and action, our Q-function **will search its Q-table for the corresponding value.**
<img src="https://huggingface.co/datasets/huggingface-deep-rl-course/course-images/resolve/main/en/unit3/Q-function-2.jpg" alt="Q function" width="100%"/>
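To make the lookup concrete, here is a minimal sketch (not from the course; the sizes and values are made up for illustration) of a Q-table stored as a NumPy array, with the Q-function as a simple table lookup:

```python
import numpy as np

n_states, n_actions = 4, 2  # hypothetical small environment
q_table = np.zeros((n_states, n_actions))  # one value per state-action pair

def q_function(state: int, action: int) -> float:
    """Given a state and action, search the Q-table for the corresponding value."""
    return q_table[state, action]

q_table[2, 1] = 0.5  # pretend training has updated this entry
print(q_function(2, 1))  # → 0.5
```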
- When the training is done, **we have an optimal Q-function, or, equivalently, an optimal Q-table.**
- And if we **have an optimal Q-function**, we have an optimal policy, since we **know, for each state, the best action to take.**
<img src="https://huggingface.co/datasets/huggingface-deep-rl-course/course-images/resolve/main/en/unit3/link-value-policy.jpg" alt="Link value policy" width="100%"/>
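The link from optimal Q-function to optimal policy is just a per-state argmax over actions. A minimal sketch (the Q-values below are invented purely for illustration):

```python
import numpy as np

# Hypothetical Q-table for 3 states and 2 actions (values are made up)
q_table = np.array([
    [0.1, 0.9],
    [0.7, 0.2],
    [0.0, 0.0],
])

def greedy_policy(q_table: np.ndarray, state: int) -> int:
    """For each state, pick the action with the highest Q-value."""
    return int(np.argmax(q_table[state]))

print([greedy_policy(q_table, s) for s in range(3)])  # → [1, 0, 0]
```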
But, in the beginning, our **Q-table is useless since it gives arbitrary values for each state-action pair (most of the time, we initialize the Q-table to 0)**. As we explore the environment and update our Q-table, it will give us a better and better approximation.
<img src="https://huggingface.co/datasets/huggingface-deep-rl-course/course-images/resolve/main/en/notebooks/unit2/q-learning.jpeg" alt="q-learning.jpeg" width="100%"/>
This is the Q-Learning pseudocode:
<img src="https://huggingface.co/datasets/huggingface-deep-rl-course/course-images/resolve/main/en/unit3/Q-learning-2.jpg" alt="Q-Learning" width="100%"/>
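The pseudocode above can be sketched in Python. This is an assumption-laden sketch, not the course's reference implementation: it assumes a Gymnasium-style discrete environment (`reset()`/`step()` with the 5-tuple return) and uses epsilon-greedy action selection with the standard TD update:

```python
import numpy as np

def q_learning(env, n_episodes=1000, alpha=0.1, gamma=0.99, epsilon=0.1):
    """Minimal tabular Q-Learning for a Gymnasium-style discrete environment."""
    # Initialize the Q-table to 0 for every state-action pair
    q_table = np.zeros((env.observation_space.n, env.action_space.n))
    for _ in range(n_episodes):
        state, _ = env.reset()
        done = False
        while not done:
            # Epsilon-greedy: explore with probability epsilon, else exploit
            if np.random.random() < epsilon:
                action = env.action_space.sample()
            else:
                action = int(np.argmax(q_table[state]))
            next_state, reward, terminated, truncated, _ = env.step(action)
            done = terminated or truncated
            # TD update: Q(s,a) <- Q(s,a) + alpha * [r + gamma * max_a' Q(s',a') - Q(s,a)]
            # (the bootstrap term is zeroed on terminal states)
            td_target = reward + gamma * np.max(q_table[next_state]) * (not terminated)
            q_table[state, action] += alpha * (td_target - q_table[state, action])
            state = next_state
    return q_table
```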