mirror of
https://github.com/huggingface/deep-rl-class.git
synced 2026-03-31 17:21:01 +08:00
# Glossary

This is a community-created glossary. Contributions are welcome!
- **Deep Q-Learning:** A value-based deep reinforcement learning algorithm that uses a deep neural network to approximate Q-values for actions in a given state. The goal of Deep Q-Learning is to find the optimal policy that maximizes the expected cumulative reward by learning the action-values.

- **Value-based methods:** Reinforcement Learning methods that estimate a value function as an intermediate step towards finding an optimal policy.
- **Policy-based methods:** Reinforcement Learning methods that directly learn to approximate the optimal policy without learning a value function. In practice, they output a probability distribution over actions.

The benefits of using policy-gradient methods over value-based methods include:
- simplicity of integration: the policy is estimated directly, with no need to store action values;
- ability to learn a stochastic policy: the agent explores the state space without always taking the same trajectory, and avoids the problem of perceptual aliasing (states that look identical to the agent but require different actions);
- effectiveness in high-dimensional and continuous action spaces; and
- improved convergence properties: action probabilities change smoothly as the policy parameters are updated.
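
As a rough illustration of the stochastic-policy and convergence points in the list above, the sketch below contrasts a greedy rule over stored action values with sampling from a softmax policy. It uses only the Python standard library. The function names (`greedy_action`, `softmax`, `sample_action`) and all numbers are hypothetical, chosen for illustration rather than taken from the course.

```python
import math
import random


def greedy_action(q_values):
    """Value-based control: pick the action with the highest stored value."""
    return max(range(len(q_values)), key=lambda a: q_values[a])


def softmax(preferences):
    """Turn action preferences (logits) into a probability distribution."""
    m = max(preferences)  # subtract the max for numerical stability
    exps = [math.exp(p - m) for p in preferences]
    total = sum(exps)
    return [e / total for e in exps]


def sample_action(probs, rng):
    """Policy-based control: sample an action from the policy's distribution."""
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]


# Hypothetical numbers for a single state with three actions.
q_values = [1.0, 1.1, 0.2]
logits = [1.0, 1.1, 0.2]

rng = random.Random(0)
probs = softmax(logits)

# The greedy rule returns the same action every time; sampling does not.
greedy_picks = {greedy_action(q_values) for _ in range(1000)}
sampled_picks = {sample_action(probs, rng) for _ in range(1000)}
print("greedy picks: ", greedy_picks)
print("sampled picks:", sampled_picks)

# Nudging one value flips the greedy choice outright, while the
# softmax probabilities only shift by a small amount.
nudged = [1.2, 1.1, 0.2]
print("greedy before/after:", greedy_action(q_values), greedy_action(nudged))
print("probs before:", [round(p, 3) for p in probs])
print("probs after: ", [round(p, 3) for p in softmax(nudged)])
```

The second half of the sketch is one way to read the convergence claim: under a greedy rule a small change in an estimated value can switch the chosen action abruptly, whereas a parameterized stochastic policy changes its action probabilities gradually.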
- **Policy Gradient:** A subset of policy-based methods where the objective is to maximize the performance of a parameterized policy using gradient ascent. The goal of a policy-gradient method is to control the probability distribution of actions by tuning the policy such that good actions (that maximize the return) are sampled more frequently in the future.

- **Monte Carlo Reinforce:** A policy-gradient algorithm that uses an estimated return from an entire episode to update the policy parameters.
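
The last two entries can be made concrete with a minimal sketch of a Monte Carlo Reinforce update, written with the Python standard library only. Everything specific here is an assumption made for illustration: the toy stateless task in `run_episode` (action 1 pays 1.0, action 0 pays 0.0), the softmax policy with one parameter per action, and the values of `alpha`, `gamma`, and the episode length. It is not the course's implementation, which trains a neural-network policy.

```python
import math
import random


def softmax(theta):
    """Softmax policy: one preference parameter per action."""
    m = max(theta)
    exps = [math.exp(t - m) for t in theta]
    total = sum(exps)
    return [e / total for e in exps]


def discounted_returns(rewards, gamma):
    """Return G_t = r_t + gamma * r_{t+1} + ... for each step of one episode."""
    returns, g = [], 0.0
    for r in reversed(rewards):
        g = r + gamma * g
        returns.append(g)
    returns.reverse()
    return returns


def run_episode(theta, rng, steps=5):
    """Hypothetical toy task: action 1 pays 1.0, action 0 pays 0.0."""
    probs = softmax(theta)
    actions = [rng.choices([0, 1], weights=probs, k=1)[0] for _ in range(steps)]
    rewards = [float(a) for a in actions]
    return actions, rewards


def reinforce_update(theta, actions, rewards, alpha=0.1, gamma=0.99):
    """One gradient-ascent step using the returns of a complete episode."""
    probs = softmax(theta)
    new_theta = list(theta)
    for a, g in zip(actions, discounted_returns(rewards, gamma)):
        for i in range(len(theta)):
            # d/d theta_i of log softmax(theta)[a] = 1[i == a] - probs[i]
            grad_log_pi = (1.0 if i == a else 0.0) - probs[i]
            new_theta[i] += alpha * g * grad_log_pi
    return new_theta


rng = random.Random(0)
theta = [0.0, 0.0]  # start from a uniform policy
for _ in range(200):
    actions, rewards = run_episode(theta, rng)
    theta = reinforce_update(theta, actions, rewards)

final_probs = softmax(theta)
# Probability mass should have shifted toward the rewarded action (action 1).
print([round(p, 3) for p in final_probs])
```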

If you want to improve the course, you can [open a Pull Request](https://github.com/huggingface/deep-rl-class/pulls).

This glossary was made possible thanks to:
- [Diego Carpintero](https://github.com/dcarpintero)