diff --git a/units/en/unit2/glossary.mdx b/units/en/unit2/glossary.mdx
index dc055d1..d8b13ef 100644
--- a/units/en/unit2/glossary.mdx
+++ b/units/en/unit2/glossary.mdx
@@ -13,9 +13,21 @@ This is a community-created glossary. Contributions are welcomed!
 - **The state-value function.** For each state, the state-value function is the expected return if the agent starts in that state and follows the policy until the end.
 - **The action-value function.** In contrast to the state-value function, the action-value calculates for each state and action pair the expected return if the agent starts in that state and takes an action. Then it follows the policy forever after.
 
+### Epsilon-greedy strategy:
+- A common exploration strategy in reinforcement learning that balances exploration and exploitation.
+- With probability 1-epsilon, it chooses the action with the highest expected reward (exploitation).
+- With probability epsilon, it chooses a random action (exploration).
+- Epsilon is typically decreased over time to shift the focus towards exploitation.
+
+### Greedy strategy:
+- Always chooses the action expected to lead to the highest reward, based on the current knowledge of the environment (exploitation only).
+- Includes no exploration.
+- Can be disadvantageous in environments with uncertainty or unknown optimal actions.
+
 If you want to improve the course, you can [open a Pull Request.](https://github.com/huggingface/deep-rl-class/pulls)
 
 This glossary was made possible thanks to:
 
 - [Ramón Rueda](https://github.com/ramon-rd)
+- [Hasarindu Perera](https://github.com/hasarinduperera/)
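
The two strategies described in the added glossary entries can be sketched in a few lines of Python. This is a minimal illustration, not part of the course code; the function names (`epsilon_greedy_action`, `decayed_epsilon`) and the linear decay schedule are assumptions for the example.

```python
import random


def epsilon_greedy_action(q_values, epsilon):
    """Pick an action index from a list of estimated action values.

    With probability epsilon, explore (random action); otherwise exploit
    (action with the highest estimated value). Setting epsilon=0 recovers
    the pure greedy strategy.
    """
    if random.random() < epsilon:
        return random.randrange(len(q_values))  # explore
    return max(range(len(q_values)), key=lambda a: q_values[a])  # exploit


def decayed_epsilon(step, start=1.0, end=0.05, decay_steps=10_000):
    """One common schedule (linear decay): shift from exploration
    towards exploitation as training progresses."""
    frac = min(step / decay_steps, 1.0)
    return start + frac * (end - start)
```

With `epsilon=0.0` the agent always picks the action with the highest estimated value; with `epsilon=1.0` it always explores.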