# Q-Learning Recap [[q-learning-recap]]
*Q-Learning* **is the RL algorithm that**:

- Trains a *Q-function*, an **action-value function** that holds, as internal memory, a *Q-table* **containing all the state-action pair values.**

- Given a state and an action, our Q-function **looks up the corresponding value in its Q-table.**
<img src="https://huggingface.co/datasets/huggingface-deep-rl-course/course-images/resolve/main/en/unit3/Q-function-2.jpg" alt="Q function" width="100%"/>
- When the training is done, **we have an optimal Q-function, and therefore an optimal Q-table.**

- And if we **have an optimal Q-function**, we have an optimal policy, since we **know the best action to take in each state.**
<img src="https://huggingface.co/datasets/huggingface-deep-rl-course/course-images/resolve/main/en/unit3/link-value-policy.jpg" alt="Link value policy" width="100%"/>
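
To make the link between the Q-table and the policy concrete, here is a minimal sketch in plain Python. The table size and the values in it are made up for illustration, and the helper names `q_value` and `greedy_action` are hypothetical, not part of the course code.

```python
# Hypothetical Q-table for an environment with 3 states and 2 actions.
# q_table[state][action] holds the value of taking `action` in `state`.
q_table = [
    [0.1, 0.5],  # state 0
    [0.7, 0.2],  # state 1
    [0.0, 0.9],  # state 2
]


def q_value(q_table, state, action):
    """Look up the value of a state-action pair in the Q-table."""
    return q_table[state][action]


def greedy_action(q_table, state):
    """Return the action with the highest Q-value in `state`."""
    values = q_table[state]
    return max(range(len(values)), key=lambda a: values[a])


# The greedy policy maps every state to its best action.
policy = [greedy_action(q_table, s) for s in range(len(q_table))]
print(q_value(q_table, 1, 0))  # 0.7
print(policy)                  # [1, 0, 1]
```

If the Q-table were optimal, the `policy` list built this way would be an optimal policy: acting well only requires a lookup and an argmax per state.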
But, in the beginning, our **Q-table is useless since it gives arbitrary values for each state-action pair (most of the time we initialize the Q-table to 0)**. But as we explore the environment and update our Q-table, it will give us better and better approximations.
<img src="https://huggingface.co/datasets/huggingface-deep-rl-course/course-images/resolve/main/en/notebooks/unit2/q-learning.jpeg" alt="q-learning.jpeg" width="100%"/>
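
As a rough illustration of how a zero-initialized Q-table starts to improve, the sketch below applies a single standard Q-learning update to one made-up transition. The table size, the learning rate `alpha`, the discount factor `gamma`, and the transition itself are assumptions chosen for the example.

```python
n_states, n_actions = 3, 2
alpha = 0.1   # learning rate (assumed value)
gamma = 0.99  # discount factor (assumed value)

# Most of the time the Q-table starts at 0 for every state-action pair.
q_table = [[0.0] * n_actions for _ in range(n_states)]


def q_update(q_table, state, action, reward, next_state):
    """Apply one Q-learning update for an observed transition."""
    # Target: immediate reward plus discounted value of the best next action.
    td_target = reward + gamma * max(q_table[next_state])
    td_error = td_target - q_table[state][action]
    q_table[state][action] += alpha * td_error


# One made-up transition: in state 0, action 1 gave reward 1 and led to state 2.
q_update(q_table, state=0, action=1, reward=1.0, next_state=2)
print(q_table[0])  # [0.0, 0.1]
```

Only the visited state-action pair changes, which is why the agent has to explore the environment widely before the table becomes a useful approximation.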
This is the Q-Learning pseudocode:
<img src="https://huggingface.co/datasets/huggingface-deep-rl-course/course-images/resolve/main/en/unit3/Q-learning-2.jpg" alt="Q-Learning" width="100%"/>
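
For readers who prefer code to pseudocode, here is a minimal sketch of tabular Q-learning with an epsilon-greedy exploration policy. It is not a transcription of the image above: the toy chain environment, the function names (`step`, `epsilon_greedy`, `train`), the hyperparameter values, and the exponential epsilon decay schedule are all assumptions made for the example.

```python
import math
import random

# Hypothetical toy environment: a chain of 5 states, start in state 0.
# Action 0 moves left, action 1 moves right; reaching state 4 ends the
# episode with reward 1, every other step gives reward 0.
N_STATES, N_ACTIONS, GOAL = 5, 2, 4


def step(state, action):
    """Return (next_state, reward, done) for the toy chain."""
    next_state = max(0, state - 1) if action == 0 else state + 1
    done = next_state == GOAL
    return next_state, (1.0 if done else 0.0), done


def epsilon_greedy(q_table, state, epsilon, rng):
    """Explore with probability epsilon, otherwise act greedily."""
    if rng.random() < epsilon:
        return rng.randrange(N_ACTIONS)
    values = q_table[state]
    return max(range(N_ACTIONS), key=lambda a: values[a])


def train(n_episodes=500, max_steps=50, alpha=0.5, gamma=0.9,
          max_eps=1.0, min_eps=0.05, decay=0.01, seed=0):
    rng = random.Random(seed)
    # Initialize the Q-table to 0 for every state-action pair.
    q_table = [[0.0] * N_ACTIONS for _ in range(N_STATES)]
    for episode in range(n_episodes):
        # Decay exploration over time (schedule and constants are assumptions).
        epsilon = min_eps + (max_eps - min_eps) * math.exp(-decay * episode)
        state = 0
        for _ in range(max_steps):
            action = epsilon_greedy(q_table, state, epsilon, rng)
            next_state, reward, done = step(state, action)
            # Bootstrap from the best next action. The terminal state's row
            # is never updated, so it contributes 0.
            td_target = reward + gamma * max(q_table[next_state])
            q_table[state][action] += alpha * (td_target - q_table[state][action])
            state = next_state
            if done:
                break
    return q_table


q_table = train()
# Greedy policy for the non-terminal states 0..3.
policy = [max(range(N_ACTIONS), key=lambda a: q_table[s][a]) for s in range(GOAL)]
print(policy)  # [1, 1, 1, 1]
```

In this toy setup the learned greedy policy moves right in every non-terminal state, which is the shortest path to the reward. The agent acts with the epsilon-greedy policy but updates toward the greedy value `max(q_table[next_state])`, which is what makes Q-learning off-policy.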