Files
deep-rl-class/units/en/unit2/q-learning-recap.mdx
Dylan Wilson 85b932e409 Typos Unit2
2023-04-18 15:05:47 -05:00

26 lines
1.5 KiB
Plaintext
Raw Permalink Blame History

This file contains invisible Unicode characters
This file contains invisible Unicode characters that are indistinguishable to humans but may be processed differently by a computer. If you think that this is intentional, you can safely ignore this warning. Use the Escape button to reveal them.
# Q-Learning Recap [[q-learning-recap]]
*Q-Learning* **is the RL algorithm that** :
- Trains a *Q-function*, an **action-value function** encoded, in internal memory, by a *Q-table* **containing all the state-action pair values.**
- Given a state and action, our Q-function **will search its Q-table for the corresponding value.**
<img src="https://huggingface.co/datasets/huggingface-deep-rl-course/course-images/resolve/main/en/unit3/Q-function-2.jpg" alt="Q function" width="100%"/>
- When the training is done, **we have an optimal Q-function, or, equivalently, an optimal Q-table.**
- And if we **have an optimal Q-function**, we
have an optimal policy, since we **know, for each state, the best action to take.**
<img src="https://huggingface.co/datasets/huggingface-deep-rl-course/course-images/resolve/main/en/unit3/link-value-policy.jpg" alt="Link value policy" width="100%"/>
But, in the beginning, our **Q-table is useless since it gives arbitrary values for each state-action pair (most of the time we initialize the Q-table to 0 values)**. But, as we explore the environment and update our Q-table it will give us a better and better approximation.
<img src="https://huggingface.co/datasets/huggingface-deep-rl-course/course-images/resolve/main/en/notebooks/unit2/q-learning.jpeg" alt="q-learning.jpeg" width="100%"/>
This is the Q-Learning pseudocode:
<img src="https://huggingface.co/datasets/huggingface-deep-rl-course/course-images/resolve/main/en/unit3/Q-learning-2.jpg" alt="Q-Learning" width="100%"/>