diff --git a/units/en/_toctree.yml b/units/en/_toctree.yml
index 9994167..a8c93b8 100644
--- a/units/en/_toctree.yml
+++ b/units/en/_toctree.yml
@@ -178,6 +178,22 @@
title: Conclusion
- local: unit7/additional-readings
title: Additional Readings
+- title: Unit 8. Part 1: Proximal Policy Optimization (PPO)
+ sections:
+ - local: unit8/introduction
+ title: Introduction
+ - local: unit8/intuition-behind-ppo
+ title: The intuition behind PPO
+ - local: unit8/clipped-surrogate-objective
+ title: Introducing the Clipped Surrogate Objective Function
+ - local: unit8/visualize
+ title: Visualize the Clipped Surrogate Objective Function
+ - local: unit8/hands-on-cleanrl
+ title: PPO with CleanRL
+ - local: unit8/conclusion
+ title: Conclusion
+ - local: unit8/additional-readings
+ title: Additional Readings
- title: What's next? New Units Publishing Schedule
sections:
- local: communication/publishing-schedule
diff --git a/units/en/unit8/additional-readings.mdx b/units/en/unit8/additional-readings.mdx
new file mode 100644
index 0000000..89196f9
--- /dev/null
+++ b/units/en/unit8/additional-readings.mdx
@@ -0,0 +1,21 @@
+# Additional Readings [[additional-readings]]
+
+These are **optional readings** if you want to go deeper.
+
+## PPO Explained
+
+- [Towards Delivering a Coherent Self-Contained Explanation of Proximal Policy Optimization by Daniel Bick](https://fse.studenttheses.ub.rug.nl/25709/1/mAI_2021_BickD.pdf)
+- [What is the way to understand Proximal Policy Optimization Algorithm in RL?](https://stackoverflow.com/questions/46422845/what-is-the-way-to-understand-proximal-policy-optimization-algorithm-in-rl)
+- [Foundations of Deep RL Series, L4 TRPO and PPO by Pieter Abbeel](https://youtu.be/KjWF8VIMGiY)
+- [OpenAI PPO Blogpost](https://openai.com/blog/openai-baselines-ppo/)
+- [Spinning Up RL PPO](https://spinningup.openai.com/en/latest/algorithms/ppo.html)
+- [Paper Proximal Policy Optimization Algorithms](https://arxiv.org/abs/1707.06347)
+
+## PPO Implementation details
+
+- [The 37 Implementation Details of Proximal Policy Optimization](https://iclr-blog-track.github.io/2022/03/25/ppo-implementation-details/)
+- [Part 1 of 3 — Proximal Policy Optimization Implementation: 11 Core Implementation Details](https://www.youtube.com/watch?v=MEt6rrxH8W4)
+
+## Importance Sampling
+
+- [Importance Sampling Explained](https://youtu.be/C3p2wI4RAi8)
diff --git a/units/en/unit8/clipped-surrogate-objective.mdx b/units/en/unit8/clipped-surrogate-objective.mdx
new file mode 100644
index 0000000..9319b3e
--- /dev/null
+++ b/units/en/unit8/clipped-surrogate-objective.mdx
@@ -0,0 +1,69 @@
+# Introducing the Clipped Surrogate Objective Function
+## Recap: The Policy Objective Function
+
+Let’s remember what the objective function we optimize in Reinforce looks like:
+
+\\( L^{PG}(\theta) = \mathbb{E}_t [ \log \pi_\theta(a_t | s_t) A_t ] \\)
+
+The idea was that by taking a gradient ascent step on this function (equivalent to taking a gradient descent step on the negative of this function), we would **push our agent to take actions that lead to higher rewards and avoid harmful actions.**
+
+However, the problem comes from the step size:
+- If it's too small, **the training process is too slow**
+- If it's too high, **there is too much variability in the training**
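To make the step-size issue concrete, here is a tiny gradient-ascent sketch on a toy one-dimensional objective (purely illustrative — the function and learning rates are made up and are not part of Reinforce itself):

```python
def ascent_step(theta, lr):
    """One gradient-ascent step on f(theta) = -(theta - 3) ** 2, maximized at theta = 3."""
    grad = -2 * (theta - 3)  # derivative of f at theta
    return theta + lr * grad

for lr in (0.01, 0.1, 1.1):
    theta = 0.0
    for _ in range(50):
        theta = ascent_step(theta, lr)
    print(f"lr={lr}: theta={theta:.2f}")
```

With `lr=0.01` the parameter is still far from the optimum after 50 steps (too slow), with `lr=0.1` it converges to 3, and with `lr=1.1` it oscillates and diverges — exactly the two failure modes described above.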
+
+With PPO, the idea is to constrain the policy update with a new objective function called the *Clipped surrogate objective function*, which **keeps the policy change within a small range using a clip.**
+
+This new function **is designed to avoid destructively large weight updates**:
+
+
+
+Let’s study each part to understand how it works.
+
+## The Ratio Function
+
+
+This ratio is calculated this way:
+
+\\( r_t(\theta) = \frac{\pi_\theta(a_t | s_t)}{\pi_{\theta_{old}}(a_t | s_t)} \\)
+
+It’s the probability of taking action \\( a_t \\) at state \\( s_t \\) under the current policy, divided by the same probability under the previous policy.
+
+As we can see, \\( r_t(\theta) \\) denotes the probability ratio between the current and old policy:
+
+- If \\( r_t(\theta) > 1 \\), the **action \\( a_t \\) at state \\( s_t \\) is more likely in the current policy than the old policy.**
+- If \\( r_t(\theta) \\) is between 0 and 1, the **action is less likely for the current policy than for the old one**.
+
+So this probability ratio is an **easy way to estimate the divergence between old and current policy.**
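In code, policies usually expose log-probabilities, so implementations compute this ratio as the exponential of a log-probability difference. A minimal sketch (the probability values here are made up for illustration):

```python
import math

# Hypothetical probabilities of taking action a_t at state s_t
log_prob_current = math.log(0.6)  # current policy: pi_theta(a_t | s_t) = 0.6
log_prob_old = math.log(0.5)      # old policy: pi_theta_old(a_t | s_t) = 0.5

# r_t(theta) = pi_theta / pi_theta_old, computed in log-space for numerical stability
ratio = math.exp(log_prob_current - log_prob_old)
print(ratio)  # ≈ 1.2: the action is more likely under the current policy
```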
+
+## The unclipped part of the Clipped Surrogate Objective function
+
+
+This ratio **can replace the log probability we use in the policy objective function**. This gives us the left (unclipped) part of the new objective function: multiplying the ratio by the advantage.
+
+\\( L^{CPI}(\theta) = \mathbb{E}_t [ r_t(\theta) A_t ] \\)
+
+However, without a constraint, if the action taken is much more probable under the current policy than under the former one, this would lead to an excessive policy update. Consequently, we need to constrain this objective function by penalizing changes that lead to a ratio far away from 1 (in the paper, the ratio can only vary from 0.8 to 1.2).
+
+**By clipping the ratio, we ensure that we do not have too large a policy update, because the current policy can't be too different from the older one.**
+
+To do that, we have two solutions:
+
+- *TRPO (Trust Region Policy Optimization)* uses KL divergence constraints outside the objective function to constrain the policy update. But this method **is complicated to implement and takes more computation time.**
+- *PPO* clips the probability ratio directly in the objective function with its **Clipped surrogate objective function.**
+
+\\( L^{CLIP}(\theta) = \mathbb{E}_t [ \min( r_t(\theta) A_t, \text{clip}( r_t(\theta), 1 - \epsilon, 1 + \epsilon ) A_t ) ] \\)
+
+The clipped part is a version where \\( r_t(\theta) \\) is clipped to the range \\( [1 - \epsilon, 1 + \epsilon] \\).
+
+With the Clipped Surrogate Objective function, we have two probability ratios: one non-clipped and one clipped to the range \\( [1 - \epsilon, 1 + \epsilon] \\). Epsilon is a hyperparameter that defines this clip range (in the paper, \\( \epsilon = 0.2 \\)).
+
+Then, we take the minimum of the clipped and non-clipped objective, **so the final objective is a lower bound (pessimistic bound) of the unclipped objective.**
+
+Taking the minimum of the clipped and non-clipped objective means **we'll select either the clipped or the non-clipped objective based on the ratio and advantage situation**.
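Putting the pieces together, here is a minimal pure-Python sketch of the per-step clipped surrogate objective (real implementations vectorize this over a batch with a tensor library):

```python
def clipped_surrogate(ratio, advantage, epsilon=0.2):
    """min(r_t * A_t, clip(r_t, 1 - eps, 1 + eps) * A_t), to be maximized."""
    unclipped = ratio * advantage
    clipped_ratio = max(1 - epsilon, min(ratio, 1 + epsilon))
    return min(unclipped, clipped_ratio * advantage)

# Positive advantage, ratio above 1 + epsilon: the clipped term wins,
# so the incentive to make the action even more likely is capped.
print(clipped_surrogate(ratio=1.5, advantage=2.0))   # ≈ 2.4 (= 1.2 * 2.0)

# Negative advantage, ratio below 1 - epsilon: the minimum keeps the more
# pessimistic (more negative) clipped value here.
print(clipped_surrogate(ratio=0.5, advantage=-1.0))  # ≈ -0.8 (= 0.8 * -1.0)
```

Because of the outer `min`, the objective is always a lower bound on the unclipped objective, which is exactly the pessimistic-bound behavior described above.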
diff --git a/units/en/unit8/conclusion.mdx b/units/en/unit8/conclusion.mdx
new file mode 100644
index 0000000..7dc56e6
--- /dev/null
+++ b/units/en/unit8/conclusion.mdx
@@ -0,0 +1,9 @@
+# Conclusion [[conclusion]]
+
+That’s all for today. Congrats on finishing this unit and the tutorial!
+
+The best way to learn is to practice and try stuff. **Why not improve the implementation to handle frames as input?**
+
+See you in the second part of this Unit 🔥,
+
+## Keep Learning, Stay awesome 🤗
diff --git a/units/en/unit8/hands-on-cleanrl.mdx b/units/en/unit8/hands-on-cleanrl.mdx
new file mode 100644
index 0000000..d23b907
--- /dev/null
+++ b/units/en/unit8/hands-on-cleanrl.mdx
@@ -0,0 +1,32 @@
+# Hands-on
+
+Now that we’ve studied the theory behind PPO, the best way to understand how it works **is to implement it from scratch.**
+
+Implementing an architecture from scratch helps you understand it deeply, and it’s a good habit. We have already done it for a value-based method with Q-Learning and a policy-based method with Reinforce.
+
+So, to be able to code it, we're going to use two resources:
+- A tutorial made by [Costa Huang](https://github.com/vwxyzjn). Costa is behind [CleanRL](https://github.com/vwxyzjn/cleanrl), a Deep Reinforcement Learning library that provides high-quality single-file implementations with research-friendly features.
+- In addition to the tutorial, to go deeper, you can read the 13 core implementation details: [https://iclr-blog-track.github.io/2022/03/25/ppo-implementation-details/](https://iclr-blog-track.github.io/2022/03/25/ppo-implementation-details/)
+
+Then, to test its robustness, we're going to train it in two different classic environments:
+
+- [Cartpole-v1](https://www.gymlibrary.ml/environments/classic_control/cart_pole/?highlight=cartpole)
+- [LunarLander-v2](https://www.gymlibrary.ml/environments/box2d/lunar_lander/)
+
+
+
+
+
+
+
+
+That was quite complex. Take time to understand these situations by looking at the table and the graph. **You must understand why this makes sense.** If you want to go deeper, the best resource is the article [Towards Delivering a Coherent Self-Contained Explanation of Proximal Policy Optimization by Daniel Bick, especially part 3.4](https://fse.studenttheses.ub.rug.nl/25709/1/mAI_2021_BickD.pdf).