Merge pull request #282 from spolisar/patch-1

Corrected spelling and grammar in Unit 3 Glossary
This commit is contained in:
Thomas Simonini
2023-04-14 09:13:27 +02:00
committed by GitHub


@@ -17,19 +17,19 @@ In order to obtain temporal information, we need to **stack** a number of frames
- **Solutions to stabilize Deep Q-Learning:**
- **Experience Replay:** A replay memory is created to save experience samples that can be reused during training.
This allows the agent to learn from the same experiences multiple times. Also, it helps the agent avoid forgetting previous experiences as it gets new ones.
- **Random sampling** from the replay buffer removes correlation in the observation sequences and prevents action values from oscillating or diverging
catastrophically.
- **Fixed Q-Target:** In order to calculate the **Q-Target** we need to estimate the discounted optimal **Q-value** of the next state by using the Bellman equation. The problem
is that the same network weights are used to calculate the **Q-Target** and the **Q-value**. This means that every time we modify the **Q-value**, the **Q-Target** also moves with it.
To avoid this issue, a separate network with fixed parameters is used for estimating the Temporal Difference Target. The target network is updated by copying parameters from
our Deep Q-Network every **C steps**.
- **Double DQN:** Method to handle **overestimation** of **Q-Values**. This solution uses two networks to decouple the action selection from the target **Value generation**:
- **DQN Network** to select the best action to take for the next state (the action with the highest **Q-Value**)
- **Target Network** to calculate the target **Q-Value** of taking that action at the next state.
This approach reduces **Q-Values** overestimation, helps the agent train faster, and leads to more stable learning.
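The three stabilization ideas above can be sketched in a few lines. This is a minimal illustration, not the course's actual implementation: the class and function names (`ReplayBuffer`, `double_dqn_targets`) are hypothetical, and the Q-values are plain NumPy arrays standing in for network outputs.

```python
import random
from collections import deque

import numpy as np


class ReplayBuffer:
    """Fixed-size memory of (state, action, reward, next_state, done) tuples."""

    def __init__(self, capacity):
        self.memory = deque(maxlen=capacity)  # old experiences are dropped first

    def push(self, transition):
        self.memory.append(transition)

    def sample(self, batch_size):
        # Random sampling breaks the correlation between consecutive observations.
        return random.sample(self.memory, batch_size)


def double_dqn_targets(q_online_next, q_target_next, rewards, dones, gamma=0.99):
    """Double DQN TD targets: action *selection* and *evaluation* are decoupled.

    q_online_next: online-network Q-values for next states, shape (B, n_actions)
    q_target_next: target-network Q-values for next states, shape (B, n_actions)
    """
    # The DQN (online) network selects the best next action...
    best_actions = np.argmax(q_online_next, axis=1)
    # ...and the fixed target network evaluates that action.
    evaluated = q_target_next[np.arange(len(best_actions)), best_actions]
    # Terminal transitions (done == 1) have no bootstrapped future value.
    return rewards + gamma * (1.0 - dones) * evaluated


# Fixed Q-Target: every C steps, copy the online parameters into the target
# network (with real networks this would be e.g. a state_dict copy).
buffer = ReplayBuffer(capacity=10_000)
for i in range(5):
    buffer.push((f"s{i}", 0, 1.0, f"s{i + 1}", False))
batch = buffer.sample(batch_size=3)

q_online_next = np.array([[1.0, 2.0], [3.0, 0.5]])
q_target_next = np.array([[0.2, 0.4], [0.6, 0.1]])
targets = double_dqn_targets(
    q_online_next, q_target_next,
    rewards=np.array([1.0, 0.0]),
    dones=np.array([0.0, 1.0]),
)
```

Note that the online network only *chooses* the action (`argmax`); its possibly overestimated value is never used as the target, which is what tempers the overestimation bias.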
If you want to improve the course, you can [open a Pull Request.](https://github.com/huggingface/deep-rl-class/pulls)