From 4ba43cf05fa4122da5de1b7d953f287b896d25b8 Mon Sep 17 00:00:00 2001
From: Thomas Simonini <simonini.thomas.pro@gmail.com>
Date: Thu, 8 Dec 2022 13:47:01 +0100
Subject: [PATCH] Update glossary.mdx

---
 units/en/unit1/glossary.mdx | 11 ++++++++++-
 1 file changed, 10 insertions(+), 1 deletion(-)

diff --git a/units/en/unit1/glossary.mdx b/units/en/unit1/glossary.mdx
index 2871af5..1dc9164 100644
--- a/units/en/unit1/glossary.mdx
+++ b/units/en/unit1/glossary.mdx
@@ -3,41 +3,50 @@
 This is a community-created glossary. Contributions are welcomed!
 
 ### Markov Property
-It implies that the action taken by our agent is conditional solely on the present state and independent of the past states and actions.
+
+It implies that the action taken by our agent is **conditional solely on the present state and independent of the past states and actions**.
 
 ### Observations/State
+
 - **State**:  Complete description of the state of the world.
 - **Observation**: Partial description of the state of the environment/world.
 
 ### Actions
+
 - **Discrete Actions**: Finite number of actions, such as left, right, up, and down.
 - **Continuous Actions**: Infinite number of possible actions; for example, a self-driving car can choose any steering angle or acceleration within a continuous range.
 
 ### Rewards and Discounting
+
 - **Rewards**: Fundamental factor in RL. Tells the agent whether the action taken is good/bad.
 - RL algorithms are focused on maximizing the **cumulative reward**.
 - **Reward Hypothesis**: RL problems can be formulated as a maximisation of (cumulative) return.
 - **Discounting** is performed because rewards that come sooner are more likely to happen, since they are more predictable than long-term future rewards.
 
 ### Tasks
+
 - **Episodic**: Has a starting point and an ending point.
 - **Continuing**: Has a starting point but no ending point (no terminal state).
 
 ### Exploration vs. Exploitation Trade-Off
+
 - **Exploration**: It's all about exploring the environment by trying random actions and receiving feedback/returns/rewards from the environment.
 - **Exploitation**: It's about exploiting what we know about the environment to gain maximum rewards.
 - **Exploration-Exploitation Trade-Off**: It balances how much we want to **explore** the environment and how much we want to **exploit** what we know about the environment.
 
 ### Policy
+
 - **Policy**: It is called the agent's brain. It tells us what action to take, given the state.
 - **Optimal Policy**: Policy that **maximizes** the **expected return** when an agent acts according to it. It is learned through *training*.
 
 ### Policy-based Methods:
+
 - An approach to solving RL problems.
 - In this method, the Policy is learned directly. 
 - The policy maps each state to the best corresponding action at that state, or to a probability distribution over the set of possible actions at that state.
 
 ### Value-based Methods:
+
 - Another approach to solving RL problems.
 - Here, instead of training a policy, we train a **value function** that maps each state to the expected value of being in that state.