From 3c9dac69a3b742ca457dacb2121f06e44f2ea011 Mon Sep 17 00:00:00 2001
From: Diego Carpintero <6709785+dcarpintero@users.noreply.github.com>
Date: Wed, 5 Jul 2023 16:10:57 +0200
Subject: [PATCH] Add glossary to unit4

---
 units/en/_toctree.yml       |  2 ++
 units/en/unit4/glossary.mdx | 25 +++++++++++++++++++++++++
 2 files changed, 27 insertions(+)
 create mode 100644 units/en/unit4/glossary.mdx

diff --git a/units/en/_toctree.yml b/units/en/_toctree.yml
index 95a1b95..31351f4 100644
--- a/units/en/_toctree.yml
+++ b/units/en/_toctree.yml
@@ -122,6 +122,8 @@
     title: Diving deeper into policy-gradient
   - local: unit4/pg-theorem
     title: (Optional) the Policy Gradient Theorem
+  - local: unit4/glossary
+    title: Glossary
   - local: unit4/hands-on
     title: Hands-on
   - local: unit4/quiz
diff --git a/units/en/unit4/glossary.mdx b/units/en/unit4/glossary.mdx
new file mode 100644
index 0000000..e2ea67f
--- /dev/null
+++ b/units/en/unit4/glossary.mdx
@@ -0,0 +1,25 @@
+# Glossary
+
+This is a community-created glossary. Contributions are welcome!
+
+- **Deep Q-Learning:** A value-based deep reinforcement learning algorithm that uses a deep neural network to approximate Q-values for actions in a given state. The goal of Deep Q-Learning is to find the optimal policy that maximizes the expected cumulative reward by learning the action-values.
+
+- **Value-based methods:** Reinforcement Learning methods that estimate a value function as an intermediate step towards finding an optimal policy.
+
+- **Policy-based methods:** Reinforcement Learning methods that directly learn to approximate the optimal policy without learning a value function. In practice, they output a probability distribution over actions.
+
+  The benefits of using policy-gradient methods over value-based methods include:
+  - simplicity of integration: no need to store action values;
+  - ability to learn a stochastic policy: the agent explores the state space without always taking the same trajectory, and avoids the problem of perceptual aliasing;
+  - effectiveness in high-dimensional and continuous action spaces; and
+  - improved convergence properties.
+
+- **Policy Gradient:** A subset of policy-based methods where the objective is to maximize the performance of a parameterized policy using gradient ascent. The goal of a policy-gradient algorithm is to control the probability distribution of actions by tuning the policy such that good actions (that maximize the return) are sampled more frequently in the future.
+
+- **Monte Carlo Reinforce:** A policy-gradient algorithm that uses an estimated return from an entire episode to update the policy parameters.
+
+If you want to improve the course, you can [open a Pull Request.](https://github.com/huggingface/deep-rl-class/pulls)
+
+This glossary was made possible thanks to:
+
+- [Diego Carpintero](https://github.com/dcarpintero)
\ No newline at end of file
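
As a companion to the glossary entries in this patch (not part of the patch itself), the Monte Carlo Reinforce update — push the policy parameters along `G * ∇ log π(a|s)` — can be sketched in plain Python. The environment here is an invented one-step two-armed bandit, and all function names and hyperparameters are assumptions chosen for illustration, not part of the course code:

```python
import math
import random

def softmax(logits):
    """Convert a list of logits into a probability distribution."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def reinforce(steps=2000, lr=0.1, seed=0):
    """Minimal REINFORCE sketch on a two-armed bandit (illustrative only).

    Action 1 yields reward 1, action 0 yields reward 0, and each episode
    is a single step, so the episode return G is just that reward.
    """
    rng = random.Random(seed)
    logits = [0.0, 0.0]  # policy parameters theta (one logit per action)
    for _ in range(steps):
        probs = softmax(logits)
        # Sample an action from the current stochastic policy pi(a; theta)
        a = 0 if rng.random() < probs[0] else 1
        g = 1.0 if a == 1 else 0.0  # Monte Carlo return of this episode
        # Gradient of log pi(a) w.r.t. each logit i is: 1{i == a} - probs[i]
        for i in range(2):
            grad = (1.0 if i == a else 0.0) - probs[i]
            logits[i] += lr * g * grad  # gradient ascent on J(theta)
    return softmax(logits)

probs = reinforce()
print(probs)
```

After training, the policy should assign most of its probability mass to the rewarded action, illustrating the glossary's point that good actions are sampled more frequently in the future.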