Typos Bonus3

This commit is contained in:
Dylan Wilson
2023-04-19 12:27:29 -05:00
parent 4ecf4785b1
commit 6ff09a4971
9 changed files with 25 additions and 25 deletions

View File

@@ -1,6 +1,6 @@
# (Automatic) Curriculum Learning for RL
While most of the RL methods seen in this course work well in practice, there are some cases where using them alone fails. It is for instance the case where:
While most of the RL methods seen in this course work well in practice, there are some cases where using them alone fails. This can happen, for instance, when:
- the task to learn is hard and requires an **incremental acquisition of skills** (for instance when one wants to make a bipedal agent learn to go through hard obstacles, it must first learn to stand, then walk, then maybe jump…)
- there are variations in the environment (that affect the difficulty) and one wants the agent to be **robust** to them
@@ -11,9 +11,9 @@ While most of the RL methods seen in this course work well in practice, there ar
<figcaption> <a href="https://developmentalsystems.org/TeachMyAgent/">TeachMyAgent</a> </figcaption>
</figure>
In such cases, it seems needed to propose different tasks to our RL agent and organize them such that it allows the agent to progressively acquire skills. This approach is called **Curriculum Learning** and usually implies a hand-designed curriculum (or set of tasks organized in a specific order). In practice, one can for instance control the generation of the environment, the initial states, or use Self-Play an control the level of opponents proposed to the RL agent.
In such cases, it seems necessary to propose different tasks to our RL agent and organize them such that the agent progressively acquires skills. This approach is called **Curriculum Learning** and usually implies a hand-designed curriculum (or set of tasks organized in a specific order). In practice, one can, for instance, control the generation of the environment, the initial states, or use Self-Play and control the level of opponents proposed to the RL agent.
As designing such a curriculum is not always trivial, the field of **Automatic Curriculum Learning (ACL) proposes to design approaches that learn to create such and organization of tasks in order to maximize the RL agents' performances**. Portelas et al. proposed to define ACL as:
As designing such a curriculum is not always trivial, the field of **Automatic Curriculum Learning (ACL) proposes to design approaches that learn to create such an organization of tasks in order to maximize the RL agents' performances**. Portelas et al. proposed to define ACL as:
> … a family of mechanisms that automatically adapt the distribution of training data by learning to adjust the selection of learning situations to the capabilities of RL agents.
>
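To make this more concrete, here is a minimal sketch of one such mechanism: sampling tasks in proportion to the agent's recent learning progress. Everything here (the class name, the binned task space, the window size) is illustrative rather than taken from a specific library; methods like ALP-GMM apply the same idea over continuous task spaces.

```python
import random

class LearningProgressCurriculum:
    """Toy ACL sampler: prefer tasks where the agent's performance is changing fastest."""

    def __init__(self, n_bins: int, eps: float = 0.2):
        self.n_bins = n_bins                       # e.g. discretized obstacle heights
        self.eps = eps                             # fraction of uniform sampling kept for exploration
        self.history = [[] for _ in range(n_bins)]

    def sample_task(self) -> int:
        progress = [self._alp(h) for h in self.history]
        if random.random() < self.eps or sum(progress) == 0:
            return random.randrange(self.n_bins)
        # Sample proportionally to absolute learning progress
        r = random.uniform(0, sum(progress))
        for task, p in enumerate(progress):
            r -= p
            if r <= 0:
                return task
        return self.n_bins - 1

    def update(self, task: int, episode_return: float) -> None:
        self.history[task].append(episode_return)

    def _alp(self, returns: list, window: int = 10) -> float:
        # Absolute learning progress: |mean of recent returns - mean of older returns|
        if len(returns) < 2 * window:
            return 0.0
        recent = sum(returns[-window:]) / window
        older = sum(returns[-2 * window:-window]) / window
        return abs(recent - older)
```

A training loop would call `sample_task()` before each episode and `update()` with the episode return afterwards.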
@@ -36,7 +36,7 @@ Finally, you can play with the robustness of agents trained in the <a href="http
## Further reading
For more information, we recommend you check out the following resources:
For more information, we recommend that you check out the following resources:
### Overview of the field

View File

@@ -5,7 +5,7 @@ The Decision Transformer model was introduced by ["Decision Transformer: Reinfor
The main idea is that instead of training a policy using RL methods, such as fitting a value function, which will tell us what action to take to maximize the return (cumulative reward), **we use a sequence modeling algorithm (Transformer) that, given a desired return, past states, and actions, will generate future actions to achieve this desired return**.
It's an autoregressive model conditioned on the desired return, past states, and actions to generate future actions that achieve the desired return.
This is a complete shift in the Reinforcement Learning paradigm since we use generative trajectory modeling (modeling the joint distribution of the sequence of states, actions, and rewards) to replace conventional RL algorithms. It means that in Decision Transformers, we don't maximize the return but rather generate a series of future actions that achieve the desired return.
This is a complete shift in the Reinforcement Learning paradigm since we use generative trajectory modeling (modeling the joint distribution of the sequence of states, actions, and rewards) to replace conventional RL algorithms. This means that in Decision Transformers, we don't maximize the return but rather generate a series of future actions that achieve the desired return.
The 🤗 Transformers team integrated the Decision Transformer, an Offline Reinforcement Learning method, into the library as well as the Hugging Face Hub.
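To make the return conditioning concrete, here is a rough sketch of a forward pass with that integration; the tiny dimensions and random inputs are purely illustrative:

```python
import torch
from transformers import DecisionTransformerConfig, DecisionTransformerModel

# Tiny randomly initialized model: 3-dim states, 2-dim actions
config = DecisionTransformerConfig(state_dim=3, act_dim=2)
model = DecisionTransformerModel(config)

batch, seq_len = 1, 5
states = torch.randn(batch, seq_len, 3)
actions = torch.randn(batch, seq_len, 2)
returns_to_go = torch.randn(batch, seq_len, 1)   # the desired-return conditioning
timesteps = torch.arange(seq_len).reshape(1, seq_len)
attention_mask = torch.ones(batch, seq_len)

with torch.no_grad():
    out = model(
        states=states,
        actions=actions,
        returns_to_go=returns_to_go,
        timesteps=timesteps,
        attention_mask=attention_mask,
    )

# out.action_preds[:, -1] is the action the model proposes for the next step
print(out.action_preds.shape)  # (1, 5, 2)
```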
@@ -15,13 +15,13 @@ To learn more about Decision Transformers, you should read the blogpost we wrote
## Train your first Decision Transformers
Now that you understand how Decision Transformers work thanks to [Introducing Decision Transformers on Hugging Face](https://huggingface.co/blog/decision-transformers). You're ready to learn to train your first Offline Decision Transformer model from scratch to make a half-cheetah run.
Now that you understand how Decision Transformers work thanks to [Introducing Decision Transformers on Hugging Face](https://huggingface.co/blog/decision-transformers), you're ready to learn to train your first Offline Decision Transformer model from scratch to make a half-cheetah run.
Start the tutorial here 👉 https://huggingface.co/blog/train-decision-transformers
## Further reading
For more information, we recommend you check out the following resources:
For more information, we recommend that you check out the following resources:
- [Decision Transformer: Reinforcement Learning via Sequence Modeling](https://arxiv.org/abs/2106.01345)
- [Online Decision Transformer](https://arxiv.org/abs/2202.05607)

View File

@@ -1,6 +1,6 @@
# Interesting Environments to try
We provide here a list of interesting environments you can try to train your agents on:
Here we provide a list of interesting environments you can try to train your agents on:
## MineRL
@@ -8,7 +8,7 @@ We provide here a list of interesting environments you can try to train your age
MineRL is a Python library that provides a Gym interface for interacting with the video game Minecraft, accompanied by datasets of human gameplay.
Every year, there are challenges with this library. Check the [website](https://minerl.io/).
Every year there are challenges with this library. Check the [website](https://minerl.io/).
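As a rough sketch of what that Gym interface looks like (this assumes a working MineRL install, and environment ids change between MineRL releases, so check the documentation for the current ones):

```python
import gym
import minerl  # noqa: F401  (importing registers the MineRL environments with Gym)

env = gym.make("MineRLBasaltFindCave-v0")  # id is release-dependent; see minerl.io
obs = env.reset()                          # obs is a dict; obs["pov"] is the pixel view

done = False
while not done:
    action = env.action_space.noop()       # dict action space with a no-op helper
    action["camera"] = [0, 3]              # e.g. pan the camera slightly each step
    obs, reward, done, info = env.step(action)
env.close()
```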
To start using this environment, check these resources:
- [What is MineRL?](https://www.youtube.com/watch?v=z6PTrGifupU)

View File

@@ -1,6 +1,6 @@
# Godot RL Agents
[Godot RL Agents](https://github.com/edbeeching/godot_rl_agents) is an Open Source package that allows video game creators, AI researchers and hobbyists the opportunity **to learn complex behaviors for their Non Player Characters or agents**.
[Godot RL Agents](https://github.com/edbeeching/godot_rl_agents) is an Open Source package that allows video game creators, AI researchers, and hobbyists the opportunity **to learn complex behaviors for their Non Player Characters or agents**.
The library provides:
@@ -19,7 +19,7 @@ Installation of the library is simple: `pip install godot-rl`
In this section, you will **learn how to create a custom environment in the Godot Game Engine** and then implement an AI controller that learns to play with Deep Reinforcement Learning.
The example game we create today is simple, **but shows off many of the features of the Godot Engine and the Godot RL Agents library**.You can then dive into the examples for more complex environments and behaviors.
The example game we create today is simple, **but shows off many of the features of the Godot Engine and the Godot RL Agents library**. You can then dive into the examples for more complex environments and behaviors.
The environment we will be building today is called Ring Pong: the game of pong, except that the pitch is a ring and the paddle moves around the ring. The **objective is to keep the ball bouncing inside the ring**.
@@ -31,7 +31,7 @@ The [Godot game engine](https://godotengine.org/) is an open source tool for the
Godot Engine is a feature-packed, cross-platform game engine designed to create 2D and 3D games from a unified interface. It provides a comprehensive set of common tools, so users **can focus on making games without having to reinvent the wheel**. Games can be exported in one click to a number of platforms, including the major desktop platforms (Linux, macOS, Windows) as well as mobile (Android, iOS) and web-based (HTML5) platforms.
While we will guide you through the steps to implement your agent, you may wish to learn more about the Godot Game Engine. Their [documentation](https://docs.godotengine.org/en/latest/index.html) is thorough, there are many tutorials on YouTube we would also recommend [GDQuest](https://www.gdquest.com/), [KidsCanCode](https://kidscancode.org/godot_recipes/4.x/) and [Bramwell](https://www.youtube.com/channel/UCczi7Aq_dTKrQPF5ZV5J3gg) as sources of information.
While we will guide you through the steps to implement your agent, you may wish to learn more about the Godot Game Engine. Their [documentation](https://docs.godotengine.org/en/latest/index.html) is thorough, and there are many tutorials on YouTube; we would also recommend [GDQuest](https://www.gdquest.com/), [KidsCanCode](https://kidscancode.org/godot_recipes/4.x/) and [Bramwell](https://www.youtube.com/channel/UCczi7Aq_dTKrQPF5ZV5J3gg) as sources of information.
In order to create games in Godot, **you must first download the editor**. Godot RL Agents supports the latest version of Godot, Godot 4.0.
@@ -125,7 +125,7 @@ func _process(delta):
pass
```
We will now implement the 4 missing methods, delete this code and replace it with the following:
We will now implement the 4 missing methods, delete this code, and replace it with the following:
```python
extends AIController3D
@@ -191,7 +191,7 @@ func _on_area_3d_body_entered(body):
ai_controller.reward += 1.0
```
We now need to synchronize between the game running in Godot and the neural network being trained in Python. Godot RL Agents provides a node that does just that. Open the train.tscn scene, right click on the root node and click “Add child node”. Then, search for “sync” and add a Godot RL Agents Sync node. This node handles the communication between Python and Godot over TCP.
We now need to synchronize between the game running in Godot and the neural network being trained in Python. Godot RL Agents provides a node that does just that. Open the train.tscn scene, right click on the root node, and click “Add child node”. Then, search for “sync” and add a Godot RL Agents Sync node. This node handles the communication between Python and Godot over TCP.
You can run training live in the editor by first launching the Python training with `gdrl`.
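If you would rather drive training from your own script than from the `gdrl` entrypoint, the library also exposes a Stable-Baselines3 wrapper. Roughly (the import path and constructor arguments may differ between godot_rl versions, so treat this as a sketch):

```python
from godot_rl.wrappers.stable_baselines_wrapper import StableBaselinesGodotEnv
from stable_baselines3 import PPO

# With no env_path, the wrapper waits for a game launched from the Godot editor
env = StableBaselinesGodotEnv(env_path=None)
model = PPO("MultiInputPolicy", env, verbose=1)  # dict observations -> MultiInputPolicy
model.learn(total_timesteps=100_000)
env.close()
```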

View File

@@ -8,4 +8,4 @@ But this course was just the beginning of your Deep Reinforcement Learning journ
Unlike the other units, this unit is a collective work of multiple people from Hugging Face. We mention the author for each unit.
Sounds fun? Let's get started 🔥,
Sound fun? Let's get started 🔥,

View File

@@ -1,9 +1,9 @@
# Language models in RL
## LMs encode useful knowledge for agents
**Language models** (LMs) can exhibit impressive abilities when manipulating text such as question-answering or even step-by-step reasoning. Additionally, their training on massive text corpora allowed them to **encode various knowledge including abstract ones about the physical rules of our world** (for instance what is possible to do with an object, what happens when one rotates an object…).
**Language models** (LMs) can exhibit impressive abilities when manipulating text such as question-answering or even step-by-step reasoning. Additionally, their training on massive text corpora allowed them to **encode various types of knowledge including abstract ones about the physical rules of our world** (for instance what is possible to do with an object, what happens when one rotates an object…).
A natural question recently studied was could such knowledge benefit agents such as robots when trying to solve everyday tasks. And while these works showed interesting results, the proposed agents lacked of any learning method. **This limitation prevents these agents from adapting to the environment (e.g. fixing wrong knowledge) or learning new skills.**
A natural question recently studied was whether such knowledge could benefit agents such as robots when trying to solve everyday tasks. And while these works showed interesting results, the proposed agents lacked any learning method. **This limitation prevents these agents from adapting to the environment (e.g. fixing wrong knowledge) or learning new skills.**
<figure>
<img src="https://huggingface.co/datasets/huggingface-deep-rl-course/course-images/resolve/main/en/unit12/language.png" alt="Language">
@@ -12,17 +12,17 @@ A natural question recently studied was could such knowledge benefit agents such
## LMs and RL
There is therefore a potential synergy between LMs which can bring knowledge about the world, and RL which can align and correct these knowledge by interacting with an environment. It is especially interesting from an RL point-of-view as the RL field mostly relies on the **Tabula-rasa** setup where everything is learned from scratch by agent leading to:
There is therefore a potential synergy between LMs which can bring knowledge about the world, and RL which can align and correct this knowledge by interacting with an environment. It is especially interesting from an RL point-of-view as the RL field mostly relies on the **Tabula-rasa** setup where everything is learned from scratch by the agent, leading to:
1) Sample inefficiency
2) Unexpected behaviors in humans' eyes
As a first attempt, the paper [“Grounding Large Language Models with Online Reinforcement Learning”](https://arxiv.org/abs/2302.02662v1) tackled the problem of **adapting or aligning a LM to a textual environment using PPO**. They showed that the knowledge encoded in the LM led to a fast adaptation to the environment (opening avenue for sample efficiency RL agents) but also that such knowledge allowed the LM to better generalize to new tasks once aligned.
As a first attempt, the paper [“Grounding Large Language Models with Online Reinforcement Learning”](https://arxiv.org/abs/2302.02662v1) tackled the problem of **adapting or aligning a LM to a textual environment using PPO**. They showed that the knowledge encoded in the LM led to a fast adaptation to the environment (opening avenues for sample-efficient RL agents) but also that such knowledge allowed the LM to better generalize to new tasks once aligned.
<video src="https://huggingface.co/datasets/huggingface-deep-rl-course/course-images/resolve/main/en/unit12/papier_v4.mp4" type="video/mp4" controls />
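To give a feel for what “aligning an LM with PPO” means in code, here is a heavily simplified sketch using the 🤗 trl library. The paper uses its own setup, and trl's API moves between versions, so everything below (model choice, prompt, hard-coded reward) is illustrative:

```python
import torch
from transformers import AutoTokenizer
from trl import AutoModelForCausalLMWithValueHead, PPOConfig, PPOTrainer

model = AutoModelForCausalLMWithValueHead.from_pretrained("gpt2")
tokenizer = AutoTokenizer.from_pretrained("gpt2")
tokenizer.pad_token = tokenizer.eos_token

ppo_trainer = PPOTrainer(PPOConfig(batch_size=1, mini_batch_size=1), model, tokenizer=tokenizer)

# The LM proposes a textual action for the environment
query = tokenizer.encode("Goal: reach the fridge. You see: a door. Action:", return_tensors="pt")
response = ppo_trainer.generate(query[0], return_prompt=False, max_new_tokens=8)

# In the real loop this reward would come from executing the action in the environment
reward = [torch.tensor(1.0)]
stats = ppo_trainer.step([query[0]], [response[0]], reward)
```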
Another direction studied in [“Guiding Pretraining in Reinforcement Learning with Large Language Models”](https://arxiv.org/abs/2302.06692) was to keep the LM frozen but leverage its knowledge to **guide an RL agent's exploration**. Such method allows the RL agent to be guided towards human-meaningful and plausibly useful behaviors without requiring a human in the loop during training.
Another direction studied in [“Guiding Pretraining in Reinforcement Learning with Large Language Models”](https://arxiv.org/abs/2302.06692) was to keep the LM frozen but leverage its knowledge to **guide an RL agent's exploration**. Such a method allows the RL agent to be guided towards human-meaningful and plausibly useful behaviors without requiring a human in the loop during training.
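One way to picture this guidance: compare a caption of what the agent just did against goals suggested by the frozen LM, and use the similarity as an intrinsic reward. A toy sketch with sentence embeddings (the model name and the whole scoring scheme are illustrative, not the paper's exact method):

```python
from sentence_transformers import SentenceTransformer, util

encoder = SentenceTransformer("all-MiniLM-L6-v2")

def intrinsic_reward(achieved: str, lm_suggestions: list) -> float:
    """Reward behaviors that match what a frozen LM considers plausible next goals."""
    achieved_emb = encoder.encode(achieved, convert_to_tensor=True)
    suggestion_embs = encoder.encode(lm_suggestions, convert_to_tensor=True)
    return util.cos_sim(achieved_emb, suggestion_embs).max().item()

# e.g. goals a frozen LM suggested for a Crafter-like world
suggestions = ["chop a tree", "drink water", "craft a wooden pickaxe"]
print(intrinsic_reward("the agent cut down a tree", suggestions))
```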
<figure>
<img src="https://huggingface.co/datasets/huggingface-deep-rl-course/course-images/resolve/main/en/unit12/language2.png" alt="Language">

View File

@@ -2,7 +2,7 @@
Model-based reinforcement learning only differs from its model-free counterpart in learning a *dynamics model*, but that has substantial downstream effects on how the decisions are made.
The dynamics models usually model the environment transition dynamics, \\( s_{t+1} = f_\theta (s_t, a_t) \\), but things like inverse dynamics models (mapping from states to actions) or reward models (predicting rewards) can be used in this framework.
The dynamics model usually models the environment transition dynamics, \\( s_{t+1} = f_\theta (s_t, a_t) \\), but things like inverse dynamics models (mapping from states to actions) or reward models (predicting rewards) can be used in this framework.
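Concretely, a learned dynamics model \\( f_\theta \\) is often just a small network trained by regression on observed transitions. A minimal PyTorch sketch (architecture and sizes are illustrative):

```python
import torch
import torch.nn as nn

class DynamicsModel(nn.Module):
    """Forward dynamics f_theta: predicts s_{t+1} from (s_t, a_t)."""

    def __init__(self, state_dim: int, action_dim: int, hidden: int = 256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim + action_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, state_dim),
        )

    def forward(self, state, action):
        return self.net(torch.cat([state, action], dim=-1))

# Train by regression on transitions (s, a, s') collected in the environment
model = DynamicsModel(state_dim=17, action_dim=6)
opt = torch.optim.Adam(model.parameters(), lr=3e-4)
s, a, s_next = torch.randn(64, 17), torch.randn(64, 6), torch.randn(64, 17)
loss = nn.functional.mse_loss(model(s, a), s_next)
opt.zero_grad(); loss.backward(); opt.step()
```

An inverse dynamics model would instead take \\( (s_t, s_{t+1}) \\) and regress \\( a_t \\), and a reward model would regress \\( r_t \\), with the same training loop.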
## Simple definition

View File

@@ -3,7 +3,7 @@
In this advanced topic, we address the question: **how should we monitor and keep track of powerful reinforcement learning agents that we are training in the real world and
interfacing with humans?**
As machine learning systems have increasingly impacted modern life, **call for documentation of these systems has grown**.
As machine learning systems have increasingly impacted modern life, the **call for the documentation of these systems has grown**.
Such documentation can cover aspects such as the training data used — where it is stored, when it was collected, who was involved, etc.
— or the model optimization framework — the architecture, evaluation metrics, relevant papers, etc. — and more.
@@ -19,7 +19,7 @@ These model and data specific logs are designed to be completed when the model o
Reinforcement learning systems are fundamentally designed to optimize based on measurements of reward and time.
While the notion of a reward function can be mapped nicely to many well-understood fields of supervised learning (via a loss function),
understanding how machine learning systems evolve over time is limited.
understanding of how machine learning systems evolve over time is limited.
To that end, the authors introduce [*Reward Reports for Reinforcement Learning*](https://www.notion.so/Brief-introduction-to-RL-documentation-b8cbda5a6f5242338e0756e6bef72af4) (the pithy naming is designed to mirror the popular papers *Model Cards for Model Reporting* and *Datasheets for Datasets*).
The goal is to propose a type of documentation focused on the **human factors of reward** and **time-varying feedback systems**.
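As a purely hypothetical illustration of what a time-aware entry in such a report could record (every field name and value below is invented for this sketch):

```python
# Hypothetical change-log entry in a Reward Report for a deployed RL recommender
reward_report_entry = {
    "date": "2023-04-19",
    "reward_definition": "clicks + 0.1 * dwell_time_minutes",
    "change": "added a dwell-time term to discourage clickbait",
    "observed_effects": "click-through rate down 2%, session length up 6%",
    "update_trigger": "re-review if the weekly complaint rate rises above baseline",
}
```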
@@ -42,7 +42,7 @@ The change log is accompanied by update triggers that encourage monitoring these
## Contributing
Some of the most impactful RL-driven systems are multi-stakeholder in nature and behind closed doors of private corporations.
Some of the most impactful RL-driven systems are multi-stakeholder in nature and behind the closed doors of private corporations.
These corporations are largely without regulation, so the burden of documentation falls on the public.
If you are interested in contributing, we are building Reward Reports for popular machine learning systems on a public

View File

@@ -14,7 +14,7 @@ To start learning about RLHF:
1. Read this introduction: [Illustrating Reinforcement Learning from Human Feedback (RLHF)](https://huggingface.co/blog/rlhf).
2. Watch the recorded live session we did a few weeks ago, where Nathan covered the basics of Reinforcement Learning from Human Feedback (RLHF) and how this technology is being used to enable state-of-the-art ML tools like ChatGPT.
Most of the talk is an overview of the interconnected ML models. It covers the basics of Natural Language Processing and RL and how RLHF is used on large language models. We then conclude with the open question in RLHF.
Most of the talk is an overview of the interconnected ML models. It covers the basics of Natural Language Processing and RL and how RLHF is used on large language models. We then conclude with open questions in RLHF.
<Youtube id="2MBJOuVq380" />
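At the heart of the reward-modeling stage covered in the talk is a simple pairwise objective: the reward model should score the human-preferred completion above the rejected one. A minimal sketch (the hard-coded scores stand in for a full LM-based reward model):

```python
import torch
import torch.nn.functional as F

def preference_loss(r_chosen: torch.Tensor, r_rejected: torch.Tensor) -> torch.Tensor:
    """Bradley-Terry loss: maximize P(chosen beats rejected) = sigmoid(r_chosen - r_rejected)."""
    return -F.logsigmoid(r_chosen - r_rejected).mean()

# Scores a reward model assigned to preferred vs. rejected completions
r_chosen = torch.tensor([1.3, 0.2, 0.9])
r_rejected = torch.tensor([0.4, 0.5, -0.1])
print(preference_loss(r_chosen, r_rejected))  # lower when chosen outscores rejected
```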