From 3c9dac69a3b742ca457dacb2121f06e44f2ea011 Mon Sep 17 00:00:00 2001
From: Diego Carpintero <6709785+dcarpintero@users.noreply.github.com>
Date: Wed, 5 Jul 2023 16:10:57 +0200
Subject: [PATCH] Add glossary to unit4

---
 units/en/_toctree.yml       |  2 ++
 units/en/unit4/glossary.mdx | 25 +++++++++++++++++++++++++
 2 files changed, 27 insertions(+)
 create mode 100644 units/en/unit4/glossary.mdx

diff --git a/units/en/_toctree.yml b/units/en/_toctree.yml
index 95a1b95..31351f4 100644
--- a/units/en/_toctree.yml
+++ b/units/en/_toctree.yml
@@ -122,6 +122,8 @@
     title: Diving deeper into policy-gradient
   - local: unit4/pg-theorem
     title: (Optional) the Policy Gradient Theorem
+  - local: unit4/glossary
+    title: Glossary
   - local: unit4/hands-on
     title: Hands-on
   - local: unit4/quiz
diff --git a/units/en/unit4/glossary.mdx b/units/en/unit4/glossary.mdx
new file mode 100644
index 0000000..e2ea67f
--- /dev/null
+++ b/units/en/unit4/glossary.mdx
@@ -0,0 +1,25 @@
+# Glossary
+
+This is a community-created glossary. Contributions are welcome!
+
+- **Deep Q-Learning:** A value-based deep reinforcement learning algorithm that uses a deep neural network to approximate Q-values for actions in a given state. The goal of Deep Q-Learning is to find the optimal policy that maximizes the expected cumulative reward by learning the action-values.
+
+- **Value-based methods:** Reinforcement Learning methods that estimate a value function as an intermediate step towards finding an optimal policy.
+
+- **Policy-based methods:** Reinforcement Learning methods that directly learn to approximate the optimal policy without learning a value function. In practice, they output a probability distribution over actions.
+
+  The benefits of using policy-gradient methods over value-based methods include:
+  - simplicity of integration: no need to store action values;
+  - ability to learn a stochastic policy: the agent explores the state space without always taking the same trajectory, and avoids the problem of perceptual aliasing;
+  - effectiveness in high-dimensional and continuous action spaces; and
+  - improved convergence properties.
+
+- **Policy Gradient:** A subset of policy-based methods where the objective is to maximize the performance of a parameterized policy using gradient ascent. The goal of a policy-gradient algorithm is to control the probability distribution of actions by tuning the policy such that good actions (that maximize the return) are sampled more frequently in the future.
+
+- **Monte Carlo Reinforce:** A policy-gradient algorithm that uses an estimated return from an entire episode to update the policy parameters.
+
+If you want to improve the course, you can [open a Pull Request.](https://github.com/huggingface/deep-rl-class/pulls)
+
+This glossary was made possible thanks to:
+
+- [Diego Carpintero](https://github.com/dcarpintero)
\ No newline at end of file
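
As a companion to the glossary entries in this patch (not part of the patch itself), the Monte Carlo Reinforce update — push the policy parameters along `G * ∇ log π(a|s)` — can be sketched in plain Python. The environment here is an invented one-step two-armed bandit, and all function names and hyperparameters are assumptions chosen for illustration, not part of the course code:

```python
import math
import random

def softmax(logits):
    """Convert a list of logits into a probability distribution."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def reinforce(steps=2000, lr=0.1, seed=0):
    """Minimal REINFORCE sketch on a two-armed bandit (illustrative only).

    Action 1 yields reward 1, action 0 yields reward 0, and each episode
    is a single step, so the episode return G is just that reward.
    """
    rng = random.Random(seed)
    logits = [0.0, 0.0]  # policy parameters theta (one logit per action)
    for _ in range(steps):
        probs = softmax(logits)
        # Sample an action from the current stochastic policy pi(a; theta)
        a = 0 if rng.random() < probs[0] else 1
        g = 1.0 if a == 1 else 0.0  # Monte Carlo return of this episode
        # Gradient of log pi(a) w.r.t. each logit i is: 1{i == a} - probs[i]
        for i in range(2):
            grad = (1.0 if i == a else 0.0) - probs[i]
            logits[i] += lr * g * grad  # gradient ascent on J(theta)
    return softmax(logits)

probs = reinforce()
print(probs)
```

After training, the policy should assign most of its probability mass to the rewarded action, illustrating the glossary's point that good actions are sampled more frequently in the future.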