From 4b5599257cab91359e5bfe401a1cb7d2b84da879 Mon Sep 17 00:00:00 2001 From: Thomas Simonini Date: Mon, 13 Feb 2023 07:26:14 +0100 Subject: [PATCH] Apply suggestions from code review Co-authored-by: Omar Sanseviero --- units/en/unitbonus3/decision-transformers.mdx | 2 +- units/en/unitbonus3/envs-to-try.mdx | 6 +++--- units/en/unitbonus3/introduction.mdx | 4 ++-- units/en/unitbonus3/model-based.mdx | 8 ++++---- units/en/unitbonus3/offline-online.mdx | 6 +++--- units/en/unitbonus3/rl-documentation.mdx | 8 ++++---- 6 files changed, 17 insertions(+), 17 deletions(-) diff --git a/units/en/unitbonus3/decision-transformers.mdx b/units/en/unitbonus3/decision-transformers.mdx index 5ec9f96..a7e0d37 100644 --- a/units/en/unitbonus3/decision-transformers.mdx +++ b/units/en/unitbonus3/decision-transformers.mdx @@ -7,7 +7,7 @@ It’s an autoregressive model conditioned on the desired return, past states, a This is a complete shift in the Reinforcement Learning paradigm since we use generative trajectory modeling (modeling the joint distribution of the sequence of states, actions, and rewards) to replace conventional RL algorithms. It means that in Decision Transformers, we don’t maximize the return but rather generate a series of future actions that achieve the desired return. -And, at Hugging Face, we integrated the Decision Transformer, an Offline Reinforcement Learning method, into the 🤗 transformers library and the Hugging Face Hub. +The 🤗 Transformers team integrated the Decision Transformer, an Offline Reinforcement Learning method, into the library as well as the Hugging Face Hub. 
## Learn about Decision Transformers diff --git a/units/en/unitbonus3/envs-to-try.mdx b/units/en/unitbonus3/envs-to-try.mdx index 9168136..da1a607 100644 --- a/units/en/unitbonus3/envs-to-try.mdx +++ b/units/en/unitbonus3/envs-to-try.mdx @@ -7,7 +7,7 @@ We provide here a list of interesting environments you can try to train your age MineRL -MineRL is a python library that provides a Gym interface for interacting with the video game Minecraft, accompanied by datasets of human gameplay. +MineRL is a Python library that provides a Gym interface for interacting with the video game Minecraft, accompanied by datasets of human gameplay. Every year, there are challenges with this library. Check the [website](https://minerl.io/) To start using this environment, check these resources: @@ -19,7 +19,7 @@ To start using this environment, check these resources: Donkey Car Donkey is a Self Driving Car Platform for hobby remote control cars. -This simulator version is built on the Unity game platform. It uses their internal physics and graphics, and connects to a donkey Python process to use our trained model to control the simulated Donkey (car). +This simulator version is built on the Unity game platform. It uses their internal physics and graphics and connects to a donkey Python process to use our trained model to control the simulated Donkey (car). To start using this environment, check these resources: @@ -38,7 +38,7 @@ To start using this environment, check these resources: Alphastar -Starcraft II is a famous *real time strategy game*. This game has been used by DeepMind for their Deep Reinforcement Learning researches with [Alphastar](https://www.deepmind.com/blog/alphastar-mastering-the-real-time-strategy-game-starcraft-ii) +Starcraft II is a famous *real-time strategy game*. 
DeepMind has used this game for their Deep Reinforcement Learning research with [Alphastar](https://www.deepmind.com/blog/alphastar-mastering-the-real-time-strategy-game-starcraft-ii) To start using this environment, check these resources: - [Starcraft gym](http://starcraftgym.com/) diff --git a/units/en/unitbonus3/introduction.mdx b/units/en/unitbonus3/introduction.mdx index 1b2bc19..930c4a1 100644 --- a/units/en/unitbonus3/introduction.mdx +++ b/units/en/unitbonus3/introduction.mdx @@ -3,7 +3,7 @@ Unit bonus 3 thumbnail -Congratulations on finishing this course! **You have now a solid background in Deep Reinforcement Learning**. -But this course was just a beginning for your Deep Reinforcement Learning journey, there are so much subsections to discover. And in this optional unit we **give you some resources to go deeper into multiple concepts and research topics in Reinforcement Learning**. +Congratulations on finishing this course! **You now have a solid background in Deep Reinforcement Learning**. +But this course was just the beginning of your Deep Reinforcement Learning journey; there are so many subfields to discover. In this optional unit, we **give you resources to explore multiple concepts and research topics in Reinforcement Learning**. Sounds fun? Let's get started 🔥, diff --git a/units/en/unitbonus3/model-based.mdx b/units/en/unitbonus3/model-based.mdx index c044736..a76ffe3 100644 --- a/units/en/unitbonus3/model-based.mdx +++ b/units/en/unitbonus3/model-based.mdx @@ -1,15 +1,15 @@ # Model Based Reinforcement Learning (MBRL) -Model-based reinforcement learning only differs from it’s model-free counterpart in the learning of a *dynamics model*, but that has substantial downstream effects on how the decisions are made. +Model-based reinforcement learning only differs from its model-free counterpart in learning a *dynamics model*, but that has substantial downstream effects on how the decisions are made.
-The dynamics models most often model the environment transition dynamics, \\( s_{t+1} = f_\theta (s_t, a_t) \\), but things like inverse dynamics models (mapping from states to actions) or reward models (predicting rewards) can be used in this framework. +The dynamics models usually model the environment transition dynamics, \\( s_{t+1} = f_\theta (s_t, a_t) \\), but things like inverse dynamics models (mapping from states to actions) or reward models (predicting rewards) can be used in this framework. ## Simple definition - There is an agent that repeatedly tries to solve a problem, **accumulating state and action data**. -- With that data, the agent creates a structured learning tool *a dynamics model* to reason about the world. -- With the dynamics model, the agent **decides how to act by predicting into the future**. +- With that data, the agent creates a structured learning tool, *a dynamics model*, to reason about the world. +- With the dynamics model, the agent **decides how to act by predicting the future**. - With those actions, **the agent collects more data, improves said model, and hopefully improves future actions**. ## Academic definition diff --git a/units/en/unitbonus3/offline-online.mdx b/units/en/unitbonus3/offline-online.mdx index a2eff56..c087c38 100644 --- a/units/en/unitbonus3/offline-online.mdx +++ b/units/en/unitbonus3/offline-online.mdx @@ -11,9 +11,9 @@ Deep Reinforcement Learning agents **learn with batches of experience**. The que
A comparison between Reinforcement Learning in an Online and Offline setting, figure taken from this post
-- In *online reinforcement learning*, the agent **gathers data directly**: it collects a batch of experience by **interacting with the environment**. Then, it uses this experience immediately (or via some replay buffer) to learn from it (update its policy). +- In *online reinforcement learning*, which is what we've learned during this course, the agent **gathers data directly**: it collects a batch of experience by **interacting with the environment**. Then, it uses this experience immediately (or via some replay buffer) to learn from it (update its policy). -But this implies that either you **train your agent directly in the real world or have a simulator**. If you don’t have one, you need to build it, which can be very complex (how to reflect the complex reality of the real world in an environment?), expensive, and insecure since if the simulator has flaws, the agent will exploit them if they provide a competitive advantage. +But this implies that either you **train your agent directly in the real world or have a simulator**. If you don’t have one, you need to build it, which can be very complex (how to reflect the complex reality of the real world in an environment?), expensive, and insecure (if the simulator has flaws that may provide a competitive advantage, the agent will exploit them). - On the other hand, in *offline reinforcement learning*, the agent only **uses data collected from other agents or human demonstrations**. It does **not interact with the environment**. @@ -23,7 +23,7 @@ The process is as follows: This method has one drawback: the *counterfactual queries problem*. What do we do if our agent **decides to do something for which we don’t have the data?** For instance, turning right on an intersection but we don’t have this trajectory. 
-There’s already exists some solutions on this topic, but if you want to know more about offline reinforcement learning you can [watch this video](https://www.youtube.com/watch?v=k08N5a0gG0A) +Some solutions to this problem already exist, but if you want to know more about offline reinforcement learning, you can [watch this video](https://www.youtube.com/watch?v=k08N5a0gG0A). ## Further reading diff --git a/units/en/unitbonus3/rl-documentation.mdx b/units/en/unitbonus3/rl-documentation.mdx index 7b6567c..30b7ada 100644 --- a/units/en/unitbonus3/rl-documentation.mdx +++ b/units/en/unitbonus3/rl-documentation.mdx @@ -1,6 +1,6 @@ # Brief introduction to RL documentation -In this advanced topic, we address the question: **how should we monitor and keep track of powerful reinforcement learning agents that we are training in the real-world and +In this advanced topic, we address the question: **how should we monitor and keep track of powerful reinforcement learning agents that we are training in the real world and interfacing with humans?** As machine learning systems have increasingly impacted modern life, **call for documentation of these systems has grown**. @@ -8,10 +8,10 @@ As machine learning systems have increasingly impacted modern life, **call for d Such documentation can cover aspects such as the training data used — where it is stored, when it was collected, who was involved, etc. — or the model optimization framework — the architecture, evaluation metrics, relevant papers, etc. — and more. -Today, model cards and datasheets are becoming increasingly available, in thanks to the Hub, +Today, model cards and datasheets are becoming increasingly available. For example, on the Hub (see documentation [here](https://huggingface.co/docs/hub/model-cards)). -If you click on a [popular model on the hub](https://huggingface.co/models), you can learn about its creation process.
+If you click on a [popular model on the Hub](https://huggingface.co/models), you can learn about its creation process. These model and data specific logs are designed to be completed when the model or dataset are created, leaving them to go un-updated when these models are built into evolving systems in the future. ​ @@ -38,7 +38,7 @@ At a minimum, Reward Reports are an opportunity for RL practitioners to delibera The core piece specific to documentation designed for RL and feedback-driven ML systems is a *change-log*. The change-log updates information from the designer (changed training parameters, data, etc.) along with noticed changes from the user (harmful behavior, unexpected responses, etc.). -The change-log is accompanied by update triggers that encourage monitoring of these effects. +The change-log is accompanied by update triggers that encourage monitoring these effects. ## Contributing
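The simple MBRL loop edited in `model-based.mdx` above (accumulate state and action data, fit a dynamics model \\( s_{t+1} = f_\theta (s_t, a_t) \\), then decide how to act by predicting the future) can be sketched in a few lines. This is a minimal illustrative sketch, not code from the course: the one-dimensional toy environment, its linear dynamics, and the `plan` helper are all assumptions made for the example.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical 1-D toy environment: true dynamics s' = 0.9*s + a,
# reward = -s'^2 (the goal is to drive the state to zero).
def env_step(s, a):
    s_next = 0.9 * s + a
    return s_next, -s_next ** 2

# 1. The agent repeatedly interacts, accumulating state and action data
#    (random actions here).
data = []
s = 1.0
for _ in range(200):
    a = rng.uniform(-1.0, 1.0)
    s_next, _ = env_step(s, a)
    data.append((s, a, s_next))
    s = s_next

# 2. It fits a dynamics model s_{t+1} = f_theta(s_t, a_t); a linear
#    least-squares fit stands in for a learned neural model.
X = np.array([[s_t, a_t] for s_t, a_t, _ in data])
y = np.array([s_next for _, _, s_next in data])
theta, *_ = np.linalg.lstsq(X, y, rcond=None)  # recovers roughly [0.9, 1.0]

# 3. It decides how to act by predicting the future with the model:
#    one-step greedy planning over a grid of candidate actions.
def plan(s):
    candidates = np.linspace(-1.0, 1.0, 101)
    predicted_next = theta[0] * s + theta[1] * candidates
    return candidates[np.argmax(-predicted_next ** 2)]

# 4. Acting with the planner collects more data and, in this toy case,
#    drives the state toward zero.
s = 1.0
for _ in range(10):
    s, _ = env_step(s, plan(s))
```

Real MBRL systems swap the linear fit for a learned (often probabilistic) neural dynamics model and the grid search for planners such as random shooting or CEM, but the collect-data, fit-model, plan-ahead loop is the same.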