From 6ff09a4971fd0f91c8f2e9601e1cb2fc82061381 Mon Sep 17 00:00:00 2001
From: Dylan Wilson
Date: Wed, 19 Apr 2023 12:27:29 -0500
Subject: [PATCH] Typos Bonus3

---
 units/en/unitbonus3/curriculum-learning.mdx   |  8 ++++----
 units/en/unitbonus3/decision-transformers.mdx |  6 +++---
 units/en/unitbonus3/envs-to-try.mdx           |  4 ++--
 units/en/unitbonus3/godotrl.mdx               | 10 +++++-----
 units/en/unitbonus3/introduction.mdx          |  2 +-
 units/en/unitbonus3/language-models.mdx       | 10 +++++-----
 units/en/unitbonus3/model-based.mdx           |  2 +-
 units/en/unitbonus3/rl-documentation.mdx      |  6 +++---
 units/en/unitbonus3/rlhf.mdx                  |  2 +-
 9 files changed, 25 insertions(+), 25 deletions(-)

diff --git a/units/en/unitbonus3/curriculum-learning.mdx b/units/en/unitbonus3/curriculum-learning.mdx
index dbe8e64..fa26427 100644
--- a/units/en/unitbonus3/curriculum-learning.mdx
+++ b/units/en/unitbonus3/curriculum-learning.mdx
@@ -1,6 +1,6 @@
 # (Automatic) Curriculum Learning for RL
 
-While most of the RL methods seen in this course work well in practice, there are some cases where using them alone fails. It is for instance the case where:
+While most of the RL methods seen in this course work well in practice, there are some cases where using them alone fails. This can happen, for instance, when:
 
 - the task to learn is hard and requires an **incremental acquisition of skills** (for instance when one wants to make a bipedal agent learn to go through hard obstacles, it must first learn to stand, then walk, then maybe jump…)
 - there are variations in the environment (that affect the difficulty) and one wants its agent to be **robust** to them
@@ -11,9 +11,9 @@ While most of the RL methods seen in this course work well in practice, there ar
 
 TeachMyAgent
 
-In such cases, it seems needed to propose different tasks to our RL agent and organize them such that it allows the agent to progressively acquire skills. This approach is called **Curriculum Learning** and usually implies a hand-designed curriculum (or set of tasks organized in a specific order). In practice, one can for instance control the generation of the environment, the initial states, or use Self-Play an control the level of opponents proposed to the RL agent.
+In such cases, it seems necessary to propose different tasks to our RL agent and organize them such that the agent progressively acquires skills. This approach is called **Curriculum Learning** and usually implies a hand-designed curriculum (or set of tasks organized in a specific order). In practice, one can, for instance, control the generation of the environment, the initial states, or use Self-Play and control the level of opponents proposed to the RL agent.
 
-As designing such a curriculum is not always trivial, the field of **Automatic Curriculum Learning (ACL) proposes to design approaches that learn to create such and organization of tasks in order to maximize the RL agent’s performances**. Portelas et al. proposed to define ACL as:
+As designing such a curriculum is not always trivial, the field of **Automatic Curriculum Learning (ACL) proposes to design approaches that learn to create such an organization of tasks in order to maximize the RL agent’s performance**. Portelas et al. proposed to define ACL as:
 
 > … a family of mechanisms that automatically adapt the distribution of training data by learning to adjust the selection of learning situations to the capabilities of RL agents.
 >
@@ -36,7 +36,7 @@ Finally, you can play with the robustness of agents trained in the
diff --git a/units/en/unitbonus3/language-models.mdx b/units/en/unitbonus3/language-models.mdx
--- a/units/en/unitbonus3/language-models.mdx
+++ b/units/en/unitbonus3/language-models.mdx
@@ -12,17 +12,17 @@ A natural question recently studied was could such knowledge benefit agents such
 
 ## LMs and RL
 
-There is therefore a potential synergy between LMs which can bring knowledge about the world, and RL which can align and correct these knowledge by interacting with an environment. It is especially interesting from a RL point-of-view as the RL field mostly relies on the **Tabula-rasa** setup where everything is learned from scratch by agent leading to:
+There is therefore a potential synergy between LMs, which can bring knowledge about the world, and RL, which can align and correct this knowledge by interacting with an environment. It is especially interesting from an RL point of view, as the RL field mostly relies on the **Tabula-rasa** setup where everything is learned from scratch by the agent, leading to:
 
 1) Sample inefficiency
 
 2) Unexpected behaviors from humans’ eyes
 
-As a first attempt, the paper [“Grounding Large Language Models with Online Reinforcement Learning”](https://arxiv.org/abs/2302.02662v1) tackled the problem of **adapting or aligning a LM to a textual environment using PPO**. They showed that the knowledge encoded in the LM lead to a fast adaptation to the environment (opening avenue for sample efficiency RL agents) but also that such knowledge allowed the LM to better generalize to new tasks once aligned.
+As a first attempt, the paper [“Grounding Large Language Models with Online Reinforcement Learning”](https://arxiv.org/abs/2302.02662v1) tackled the problem of **adapting or aligning an LM to a textual environment using PPO**. They showed that the knowledge encoded in the LM led to a fast adaptation to the environment (opening avenues for sample-efficient RL agents), but also that such knowledge allowed the LM to better generalize to new tasks once aligned.