From 81558a6a0e63b4c6c6c4c1fa52991160de680cc5 Mon Sep 17 00:00:00 2001
From: Thomas Simonini
Date: Thu, 4 Aug 2022 21:34:24 +0200
Subject: [PATCH] Update README.md

---
 unit8/README.md | 15 ++++++---------
 1 file changed, 6 insertions(+), 9 deletions(-)

diff --git a/unit8/README.md b/unit8/README.md
index 3c47b34..60ab065 100644
--- a/unit8/README.md
+++ b/unit8/README.md
@@ -1,15 +1,12 @@
-# Unit 8: Proximal Policy Optimization (PPO) using Robotics Simulations with PyBullet 🤖
+# Unit 8: Proximal Policy Optimization (PPO) with PyTorch
 
-One of the major industries that use Reinforcement Learning is robotics. Unfortunately, **having access to robot equipment is very expensive**. Fortunately, some simulations exist to train Robots:
-1. PyBullet
-2. MuJoco
-3. Unity Simulations
+Today we'll learn about Proximal Policy Optimization (PPO), an architecture that improves our agent's training stability by avoiding too large policy updates. To do that, we use a ratio that will indicates the difference between our current and old policy and clip this ratio from a specific range $[1 - \epsilon, 1 + \epsilon]$. Doing this will ensure that our policy update will not be too large and that the training is more stable.
 
-We're going to learn about Advantage Actor Critic (A2C) and how to use PyBullet. And train a spider agent to walk.
+And then, after the theory, we'll code a PPO architecture from scratch using PyTorch and bulletproof our implementation with CartPole-v1 and LunarLander-v2.
 
 🏆 You'll then be able to **compare your agent’s results with other classmates thanks to a leaderboard** 🔥 👉 https://huggingface.co/spaces/chrisjay/Deep-Reinforcement-Learning-Leaderboard
 
-![cover](https://github.com/huggingface/deep-rl-class/blob/main/unit7/assets/img/pybullet-envs.gif?raw=true)
+LunarLander
 
 Let's get started 🥳
 
@@ -23,9 +20,9 @@ Here are the steps for this Unit:
 
 1️⃣ 📖 [Read Proximal Policy Optimization Chapter](https://huggingface.co/blog/deep-rl-ppo).
 
-2️⃣ 👩‍💻 Then dive on the hands-on where you'll train two robots to walk.
+2️⃣ 👩‍💻 Then dive on the hands-on:
 
-The hands-on 👉 [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/huggingface/deep-rl-class/blob/main/unit7/unit7.ipynb)
+The hands-on 👉 [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/huggingface/deep-rl-class/blob/main/unit8/unit8.ipynb)
 
 Thanks to a leaderboard, you'll be able to compare your results with other classmates and exchange the best practices to improve your agent's scores Who will win the challenge for Unit 7 🏆?
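
The clipped ratio described in the new README introduction can be illustrated with a small sketch. This is not code from the patch or from the course notebook: the function name `clipped_surrogate`, its arguments, and the default `eps=0.2` are assumptions chosen for illustration, and plain Python is used so that it runs without dependencies. A PyTorch version would typically apply the same steps to tensors, for example with `torch.clamp` and `torch.min`.

```python
import math


def clipped_surrogate(new_logp, old_logp, advantage, eps=0.2):
    """Per-sample clipped objective (a quantity to maximize).

    Hypothetical helper: the name, arguments, and eps default are
    illustrative assumptions, not taken from the README or notebook.
    """
    # Ratio between the current and old policy probabilities,
    # computed from log-probabilities.
    ratio = math.exp(new_logp - old_logp)
    # Clip the ratio to the range [1 - eps, 1 + eps].
    clipped_ratio = max(1.0 - eps, min(ratio, 1.0 + eps))
    # Keep the more pessimistic of the unclipped and clipped terms,
    # so a large ratio cannot produce a large policy update.
    return min(ratio * advantage, clipped_ratio * advantage)


# The action became 1.5x more likely under the current policy;
# with a positive advantage the objective is capped at (1 + eps) * advantage.
print(clipped_surrogate(math.log(0.6), math.log(0.4), advantage=2.0))
```

Taking the minimum of the two terms is what keeps the update conservative: once the ratio leaves the clip range in the direction that would increase the objective, the result stops growing with it.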