diff --git a/units/en/_toctree.yml b/units/en/_toctree.yml
index 2615a89..5483f9c 100644
--- a/units/en/_toctree.yml
+++ b/units/en/_toctree.yml
@@ -58,7 +58,7 @@
     title: Monte Carlo vs Temporal Difference Learning
   - local: unit2/mid-way-recap
     title: Mid-way Recap
-  - local: unit2/quiz1
+  - local: unit2/mid-way-quiz
     title: Mid-way Quiz
   - local: unit2/q-learning
     title: Introducing Q-Learning
@@ -69,7 +69,7 @@
   - local: unit2/hands-on
     title: Hands-on
   - local: unit2/quiz2
-    title: Second Quiz
+    title: Q-Learning Quiz
   - local: unit2/conclusion
     title: Conclusion
   - local: unit2/additional-readings
diff --git a/units/en/unit2/mc-vs-td.mdx b/units/en/unit2/mc-vs-td.mdx
index da47dc5..1d3517f 100644
--- a/units/en/unit2/mc-vs-td.mdx
+++ b/units/en/unit2/mc-vs-td.mdx
@@ -30,6 +30,8 @@ If we take an example:
 
 - We terminate the episode if the cat eats the mouse or if the mouse moves > 10 steps.
 - At the end of the episode, **we have a list of State, Actions, Rewards, and Next States tuples**
+
+For instance [[State tile 3 bottom, Go Left, +1, State tile 2 bottom], [State tile 2 bottom, Go Left, +0, State tile 1 bottom]...]
 
 - **The agent will sum the total rewards \\(G_t\\)** (to see how well it did).
 - It will then **update \\(V(s_t)\\) based on the formula**
diff --git a/units/en/unit2/quiz1.mdx b/units/en/unit2/mid-way-quiz.mdx
similarity index 99%
rename from units/en/unit2/quiz1.mdx
rename to units/en/unit2/mid-way-quiz.mdx
index 80bc321..b1ffe3a 100644
--- a/units/en/unit2/quiz1.mdx
+++ b/units/en/unit2/mid-way-quiz.mdx
@@ -1,4 +1,4 @@
-# Mid-way Quiz [[quiz1]]
+# Mid-way Quiz [[mid-way-quiz]]
 
 The best way to learn and [to avoid the illusion of competence](https://www.coursera.org/lecture/learning-how-to-learn/illusions-of-competence-BuFzf) **is to test yourself.** This will help you to find **where you need to reinforce your knowledge**.
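The Monte Carlo bookkeeping the mc-vs-td.mdx hunk describes (collect an episode of tuples, sum the rewards into \\(G_t\\), then nudge \\(V(s_t)\\) toward it) can be sketched in a few lines of Python. This is a minimal illustration, not course code: the learning rate, the undiscounted return, and updating only the starting state are simplifying assumptions.

```python
# Monte Carlo value update: run a full episode, then move V(s_t)
# toward the observed return G_t. All names and numbers are illustrative.
learning_rate = 0.1

# Episode as (state, action, reward, next_state) tuples, mirroring the
# example in the hunk: the mouse goes left twice, rewards +1 then 0.
episode = [
    ("tile 3 bottom", "Go Left", 1, "tile 2 bottom"),
    ("tile 2 bottom", "Go Left", 0, "tile 1 bottom"),
]

V = {"tile 3 bottom": 0.0, "tile 2 bottom": 0.0, "tile 1 bottom": 0.0}

# G_t: the total reward collected during the episode (no discounting here)
G_t = sum(reward for _, _, reward, _ in episode)

# V(s_t) <- V(s_t) + lr * (G_t - V(s_t)), applied to the starting state
start_state = episode[0][0]
V[start_state] = V[start_state] + learning_rate * (G_t - V[start_state])
print(V[start_state])  # 0.0 + 0.1 * (1 - 0.0) = 0.1
```

Note that the update only happens once the episode terminates, which is the key contrast with the TD approach discussed in the same file.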
diff --git a/units/en/unit2/summary1.mdx b/units/en/unit2/mid-way-recap.mdx
similarity index 97%
rename from units/en/unit2/summary1.mdx
rename to units/en/unit2/mid-way-recap.mdx
index 496c5aa..0bae566 100644
--- a/units/en/unit2/summary1.mdx
+++ b/units/en/unit2/mid-way-recap.mdx
@@ -1,4 +1,4 @@
-# Mid-way Recap [[summary1]]
+# Mid-way Recap [[mid-way-recap]]
 
 Before diving into Q-Learning, let's summarize what we just learned.
diff --git a/units/en/unit2/summary2.mdx b/units/en/unit2/q-learning-recap.mdx
similarity index 97%
rename from units/en/unit2/summary2.mdx
rename to units/en/unit2/q-learning-recap.mdx
index a5653ef..55c66bf 100644
--- a/units/en/unit2/summary2.mdx
+++ b/units/en/unit2/q-learning-recap.mdx
@@ -1,4 +1,4 @@
-# Q-Learning Recap [[summary2]]
+# Q-Learning Recap [[q-learning-recap]]
 
 The *Q-Learning* **is the RL algorithm that** :
diff --git a/units/en/unit2/q-learning.mdx b/units/en/unit2/q-learning.mdx
index 7a52cc4..52e744a 100644
--- a/units/en/unit2/q-learning.mdx
+++ b/units/en/unit2/q-learning.mdx
@@ -17,6 +17,7 @@ Q-Learning is an **off-policy value-based method that uses a TD approach to tra
 
 The **Q comes from "the Quality" (the value) of that action at that state.**
 
 Let's recap the difference between value and reward:
+
 - The *value of a state*, or a *state-action pair* is the expected cumulative reward our agent gets if it starts at this state (or state action pair) and then acts accordingly to its policy.
 - The *reward* is the **feedback I get from the environment** after performing an action at a state.
@@ -42,7 +43,7 @@ Therefore, Q-function contains a Q-table **that has the value of each-state act
 
 If we recap, *Q-Learning* **is the RL algorithm that:**
 
-- Trains a *Q-Function* (an **action-value function**), which internally is a *Q-table that contains all the state-action pair values.**
+- Trains a *Q-Function* (an **action-value function**), which internally is a **Q-table that contains all the state-action pair values.**
 - Given a state and action, our Q-Function **will search into its Q-table the corresponding value.**
 - When the training is done, **we have an optimal Q-function, which means we have optimal Q-Table.**
 - And if we **have an optimal Q-function**, we **have an optimal policy** since we **know for each state what is the best action to take.**
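The recap bullets that the q-learning.mdx hunks touch (a Q-table holding one value per state-action pair, looking up a value for a given state and action, and reading the greedy policy off an optimal table via the off-policy TD update) can be sketched as a toy example. Everything below is illustrative: the states, actions, table values, and hyperparameters are made up and are not taken from the course notebooks.

```python
# Toy Q-Learning pieces matching the recap bullets. All names and numbers
# are illustrative, not from the course code.
lr, gamma = 0.1, 0.9
ACTIONS = ("left", "right")

# The Q-function's internal Q-table: one value per state-action pair.
Q = {
    ("s0", "left"): 0.0, ("s0", "right"): 0.0,
    ("s1", "left"): 0.5, ("s1", "right"): 1.0,
}

def q_value(state, action):
    """Given a state and an action, search the Q-table for the value."""
    return Q[(state, action)]

def greedy_action(state):
    """With an optimal Q-table, the best action is the argmax over actions."""
    return max(ACTIONS, key=lambda a: q_value(state, a))

# One off-policy TD update for an observed transition (s, a, r, s'):
# Q(s,a) <- Q(s,a) + lr * (r + gamma * max_a' Q(s',a') - Q(s,a))
# "Off-policy" because the target uses the greedy action in s',
# regardless of which action the behaviour policy will actually take.
s, a, r, s_next = "s0", "right", 1, "s1"
td_target = r + gamma * q_value(s_next, greedy_action(s_next))
Q[(s, a)] += lr * (td_target - q_value(s, a))

print(q_value("s0", "right"))  # one step toward the target: ~0.19
print(greedy_action("s1"))     # 'right'
```

Unlike the Monte Carlo sketch, this update fires after every single transition, which is the TD aspect the unit emphasizes.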