Merge pull request #170 from HasarinduPerera/main

Update glossary.mdx [Unit 2]
2026-04-15 10:51:13 +08:00 · 2022-12-31 21:46:20 +01:00
parent b9856e2f54 815ae5ba13
commit bc9bb6c52f
1 changed files with 13 additions and 0 deletions
--- a/units/en/unit2/glossary.mdx
+++ b/units/en/unit2/glossary.mdx
@@ -13,9 +13,22 @@ This is a community-created glossary. Contributions are welcomed!
 - **The state-value function.** For each state, the state-value function is the expected return if the agent starts in that state and follows the policy until the end.
 - **The action-value function.** In contrast to the state-value function, the action-value calculates for each state and action pair the expected return if the agent starts in that state and takes an action. Then it follows the policy forever after.

+### Epsilon-greedy strategy:
+- A common strategy in reinforcement learning for balancing exploration and exploitation.
+- With probability 1 - epsilon, chooses the action with the highest estimated value (exploitation).
+- With probability epsilon, chooses an action uniformly at random (exploration).
+- Epsilon is typically decreased over time to shift focus towards exploitation.
+
+### Greedy strategy:
+- Always chooses the action with the highest estimated value, based on the agent's current knowledge of the environment (exploitation only).
+- Equivalent to an epsilon-greedy strategy with epsilon set to 0.
+- Does not include any exploration.
+- Can be disadvantageous while value estimates are still inaccurate: an action whose value is underestimated may never be selected, so its estimate is never corrected.
+

 If you want to improve the course, you can [open a Pull Request.](https://github.com/huggingface/deep-rl-class/pulls)

 This glossary was made possible thanks to:

 - [Ramón Rueda](https://github.com/ramon-rd)
+- [Hasarindu Perera](https://github.com/hasarinduperera/)
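
The two strategies defined in the glossary entries above can be sketched as follows. This is a minimal illustration, assuming the estimated action values for the current state are held in a plain Python list indexed by action. The function names `greedy_action`, `epsilon_greedy_action`, and `decayed_epsilon`, as well as the exponential decay schedule and its default parameters, are hypothetical and are not taken from the course code.

```python
import math
import random


def greedy_action(q_values):
    """Return the index of the action with the highest estimated value."""
    return max(range(len(q_values)), key=lambda a: q_values[a])


def epsilon_greedy_action(q_values, epsilon, rng=random):
    """Pick a uniformly random action with probability epsilon, else act greedily."""
    if rng.random() < epsilon:
        # Exploration: any action, including the greedy one, may be drawn.
        return rng.randrange(len(q_values))
    # Exploitation: fall back to the greedy choice.
    return greedy_action(q_values)


def decayed_epsilon(step, start=1.0, end=0.05, decay_rate=0.001):
    """Assumed schedule: decay epsilon exponentially from `start` towards `end`."""
    return end + (start - end) * math.exp(-decay_rate * step)


if __name__ == "__main__":
    q_values = [0.1, 0.5, 0.2]  # hypothetical estimates for three actions
    for step in (0, 1000, 5000):
        eps = decayed_epsilon(step)
        action = epsilon_greedy_action(q_values, eps)
        print(f"step={step} epsilon={eps:.3f} action={action}")
```

With `epsilon=0` the epsilon-greedy function reduces to the greedy strategy, and with `epsilon=1` it always explores. Other decay schedules (for example linear decay) would serve the same purpose of shifting from exploration towards exploitation over time.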