@@ -92,6 +92,8 @@
     title: The Deep Q-Network (DQN)
   - local: unit3/deep-q-algorithm
     title: The Deep Q Algorithm
+  - local: unit3/glossary
+    title: Glossary
   - local: unit3/hands-on
     title: Hands-on
   - local: unit3/quiz

units/en/unit3/glossary.mdx (new file, 39 lines)

@@ -0,0 +1,39 @@
# Glossary

This is a community-created glossary. Contributions are welcome!

- **Tabular Method:** type of problem in which the state and action spaces are small enough for the value function to be represented as arrays or tables.
**Q-learning** is an example of a tabular method, since a table is used to represent the values of the different state-action pairs.
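
A minimal sketch of the tabular update (the space sizes and hyperparameters below are illustrative, not from the course):

```python
import numpy as np

n_states, n_actions = 16, 4          # spaces small enough to enumerate
Q = np.zeros((n_states, n_actions))  # the whole value function is one table

alpha, gamma = 0.1, 0.99             # learning rate and discount factor

def q_learning_update(state, action, reward, next_state):
    """One tabular Q-learning step: move Q[s, a] toward the TD target."""
    td_target = reward + gamma * np.max(Q[next_state])
    Q[state, action] += alpha * (td_target - Q[state, action])

# Example transition: action 2 in state 3 gives reward 1.0 and lands in state 7.
q_learning_update(state=3, action=2, reward=1.0, next_state=7)
```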

- **Deep Q-Learning:** method that trains a neural network to approximate, given a state, the different **Q-values** for each possible action at that state.
It is used to solve problems where the observation space is too big to apply a tabular Q-Learning approach.
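
A minimal PyTorch sketch of such a network, assuming a flat observation vector (the layer sizes are illustrative):

```python
import torch
import torch.nn as nn

class QNetwork(nn.Module):
    """Maps a state to one Q-value per possible action."""
    def __init__(self, obs_dim: int, n_actions: int):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim, 128),
            nn.ReLU(),
            nn.Linear(128, n_actions),  # one output per action
        )

    def forward(self, state: torch.Tensor) -> torch.Tensor:
        return self.net(state)

q_net = QNetwork(obs_dim=8, n_actions=4)
q_values = q_net(torch.randn(1, 8))  # shape (1, 4): one Q-value per action
```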

- **Temporal Limitation:** a difficulty that arises when the environment state is represented by frames. A frame by itself does not provide temporal information.
In order to obtain temporal information, we need to **stack** a number of frames together.
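
A minimal sketch of frame stacking with a fixed-size deque (the 84x84 frame shape and the stack size of 4 are illustrative):

```python
from collections import deque
import numpy as np

STACK_SIZE = 4
frames = deque(maxlen=STACK_SIZE)  # the oldest frame drops out automatically

def push_frame(frame):
    """Add the newest frame and return the stacked observation."""
    if not frames:                      # at episode start, repeat the first frame
        frames.extend([frame] * STACK_SIZE)
    else:
        frames.append(frame)
    return np.stack(frames, axis=0)     # shape: (STACK_SIZE, H, W)

obs = push_frame(np.zeros((84, 84)))    # one 84x84 grayscale frame
print(obs.shape)                        # (4, 84, 84)
```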

- **Phases of Deep Q-Learning:**
  - **Sampling:** actions are performed, and observed experience tuples are stored in a **replay memory**.
  - **Training:** batches of tuples are selected randomly, and the neural network updates its weights using gradient descent.
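
A sketch of how the two phases alternate; `ToyEnv`, `select_action`, and `train_step` below are hypothetical stand-ins for the real environment, exploration policy, and gradient update:

```python
import random

class ToyEnv:
    """Hypothetical 4-state environment, just to make the loop runnable."""
    def reset(self):
        return 0
    def step(self, action):
        next_state = random.randrange(4)
        return next_state, float(next_state == 3), next_state == 3

def select_action(state):
    return random.randrange(2)   # stand-in for an epsilon-greedy policy

def train_step(batch):
    pass                         # stand-in for a gradient-descent update

env, replay_memory, batch_size = ToyEnv(), [], 32
state = env.reset()
for step in range(1000):
    # Sampling: act and store the experience tuple in replay memory.
    action = select_action(state)
    next_state, reward, done = env.step(action)
    replay_memory.append((state, action, reward, next_state, done))
    state = env.reset() if done else next_state

    # Training: update the network from a randomly selected batch of tuples.
    if len(replay_memory) >= batch_size:
        batch = random.sample(replay_memory, batch_size)
        train_step(batch)
```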

- **Solutions to stabilize Deep Q-Learning:**
  - **Experience Replay:** a replay memory is created to save experience samples that can be reused during training.
    This allows the agent to learn from the same experiences multiple times, and it keeps the agent from forgetting previous experiences as it gets new ones.
    **Random sampling** from the replay buffer removes correlation in the observation sequences and prevents action values from oscillating or diverging catastrophically (see the buffer sketch after this list).
  - **Fixed Q-Target:** in order to calculate the **Q-Target**, we need to estimate the discounted optimal **Q-value** of the next state using the Bellman equation. The problem is that the same network weights are used to calculate both the **Q-Target** and the **Q-value**, so every time we modify the **Q-value**, the **Q-Target** moves with it. To avoid this issue, a separate network with fixed parameters is used to estimate the Temporal Difference target. The target network is updated by copying the parameters from our Deep Q-Network every **C steps** (see the sketch after this list).
  - **Double DQN:** method to handle **overestimation** of **Q-Values**. This solution uses two networks to decouple the action selection from the target **Q-Value generation**:
    - **DQN Network** to select the best action to take for the next state (the action with the highest **Q-Value**).
    - **Target Network** to calculate the target **Q-Value** of taking that action at the next state.
    This approach reduces **Q-Values** overestimation, helps to train faster, and gives more stable learning (see the sketch after this list).
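
A minimal sketch of the experience replay idea described above (the capacity and method names are illustrative):

```python
import random
from collections import deque

class ReplayMemory:
    """Fixed-capacity buffer of experience tuples; the oldest are evicted first."""
    def __init__(self, capacity=100_000):
        self.buffer = deque(maxlen=capacity)

    def push(self, state, action, reward, next_state, done):
        self.buffer.append((state, action, reward, next_state, done))

    def sample(self, batch_size):
        # Uniform random sampling breaks the correlation between
        # consecutive observations within an episode.
        return random.sample(self.buffer, batch_size)

    def __len__(self):
        return len(self.buffer)
```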
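
A sketch of the fixed-target mechanism, reusing the hypothetical `QNetwork` from the Deep Q-Learning sketch above (the value of `C` and the hard-copy style are illustrative; some implementations use soft updates instead):

```python
import copy
import torch

q_net = QNetwork(obs_dim=8, n_actions=4)   # online network (from the sketch above)
target_net = copy.deepcopy(q_net)          # fixed-parameter copy
for p in target_net.parameters():
    p.requires_grad_(False)                # never trained directly

C = 1000                                   # sync period in steps (illustrative)

def maybe_sync_target(step):
    """Every C steps, copy the online weights into the frozen target network."""
    if step % C == 0:
        target_net.load_state_dict(q_net.state_dict())

# When computing the TD target, the next-state value comes from target_net:
with torch.no_grad():
    next_q = target_net(torch.randn(1, 8)).max(dim=1).values
```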
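
A sketch of the Double DQN target computation, again assuming the `q_net` / `target_net` pair from the previous sketches:

```python
import torch

gamma = 0.99

def double_dqn_targets(rewards, next_states, dones):
    """TD targets where action selection and evaluation use different networks."""
    with torch.no_grad():
        # 1) The online DQN picks the best next action...
        best_actions = q_net(next_states).argmax(dim=1, keepdim=True)
        # 2) ...and the target network evaluates that action's Q-value.
        next_q = target_net(next_states).gather(1, best_actions).squeeze(1)
        return rewards + gamma * (1.0 - dones) * next_q

targets = double_dqn_targets(
    rewards=torch.zeros(32),
    next_states=torch.randn(32, 8),
    dones=torch.zeros(32),
)
```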

If you want to improve the course, you can [open a Pull Request](https://github.com/huggingface/deep-rl-class/pulls).

This glossary was made possible thanks to:

- [Dario Paez](https://github.com/dario248)