Add curriculum learning Clement Part

This commit is contained in:
simoninithomas
2023-02-20 14:49:58 +01:00
parent 968a64331e
commit 575910d970
2 changed files with 52 additions and 0 deletions

@@ -208,6 +208,8 @@
title: Decision Transformers and Offline RL
- local: unitbonus3/language-models
title: Language models in RL
- local: unitbonus3/curriculum-learning
title: (Automatic) Curriculum Learning for RL
- local: unitbonus3/envs-to-try
title: Interesting environments to try
- local: unitbonus3/godotrl

@@ -0,0 +1,50 @@
# (Automatic) Curriculum Learning for RL
While most of the RL methods seen in this course work well in practice, there are some cases where using them alone fails. This is the case, for instance, when:
- the task to learn is hard and requires an **incremental acquisition of skills** (for instance, a bipedal agent that has to get past hard obstacles must first learn to stand, then walk, then maybe jump…)
- there are variations in the environment (that affect the difficulty) and one wants the agent to be **robust** to them
<figure>
<img src="https://huggingface.co/datasets/huggingface-deep-rl-course/course-images/resolve/main/en/unit9/bipedal.gif" alt="Bipedal"/>
<img src="https://huggingface.co/datasets/huggingface-deep-rl-course/course-images/resolve/main/en/unit9/movable_creepers.gif" alt="Movable creepers"/>
<figcaption> <a href="https://developmentalsystems.org/TeachMyAgent/">TeachMyAgent</a> </figcaption>
</figure>
In such cases, we need to propose different tasks to our RL agent and organize them so that the agent progressively acquires skills. This approach is called **Curriculum Learning** and usually implies a hand-designed curriculum (a set of tasks organized in a specific order). In practice, one can for instance control the generation of the environment, control the initial states, or use Self-Play and control the level of the opponents proposed to the RL agent.
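
A hand-designed curriculum can be sketched in a few lines. The example below is only an illustration under stated assumptions: the class name `ManualCurriculum`, the task names, and the rule "move to the next task once the success rate over a window of episodes reaches a threshold" are hypothetical choices, not something prescribed by this course or by a specific library.

```python
from collections import deque


class ManualCurriculum:
    """Hand-designed curriculum: tasks are proposed in a fixed order (hypothetical helper)."""

    def __init__(self, tasks, success_threshold=0.8, window=20):
        self.tasks = list(tasks)                # ordered from easiest to hardest
        self.success_threshold = success_threshold
        self.window = window                    # number of recent episodes to average over
        self.index = 0                          # position in the curriculum
        self.recent = deque(maxlen=window)      # outcomes of the latest episodes

    @property
    def current_task(self):
        return self.tasks[self.index]

    def report(self, success):
        """Record one episode outcome and advance once the current task is mastered."""
        self.recent.append(1.0 if success else 0.0)
        window_full = len(self.recent) == self.window
        mastered = window_full and sum(self.recent) / self.window >= self.success_threshold
        if mastered and self.index < len(self.tasks) - 1:
            self.index += 1
            self.recent.clear()                 # start fresh statistics for the new task


# Usage: in a real training loop, `success` would come from the agent's episode.
curriculum = ManualCurriculum(["stand", "walk", "jump"], success_threshold=0.8, window=5)
for _ in range(5):
    curriculum.report(success=True)
print(curriculum.current_task)  # prints "walk"
```

The order of the tasks and the threshold are exactly the parts a human has to design by hand here, which is what makes this kind of curriculum hard to get right.
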
As designing such a curriculum is not always trivial, the field of **Automatic Curriculum Learning (ACL) proposes to design approaches that learn to create such an organization of tasks in order to maximize the RL agent's performance**. Portelas et al. proposed to define ACL as:
> … a family of mechanisms that automatically adapt the distribution of training data by learning to adjust the selection of learning situations to the capabilities of RL agents.
>
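
To make this definition more concrete, here is a minimal sketch of one possible ACL "teacher". It assumes a finite set of tasks and uses a simple heuristic: sample more often the tasks on which the agent's returns changed the most recently (absolute learning progress). The class `LearningProgressTeacher` and its parameters are hypothetical, and this is not the method of any particular paper listed on this page.

```python
import random
from collections import deque


class LearningProgressTeacher:
    """Samples tasks in proportion to recent absolute learning progress (illustrative heuristic)."""

    def __init__(self, n_tasks, window=10, epsilon=0.2, seed=0):
        # For each task, keep the last 2 * window episode returns.
        self.histories = [deque(maxlen=2 * window) for _ in range(n_tasks)]
        self.window = window
        self.epsilon = epsilon          # probability of picking a task uniformly at random
        self.rng = random.Random(seed)

    def learning_progress(self, task):
        """|mean of newer returns - mean of older returns|, or None without enough data."""
        history = list(self.histories[task])
        if len(history) < 2 * self.window:
            return None
        old, new = history[:self.window], history[self.window:]
        return abs(sum(new) / self.window - sum(old) / self.window)

    def sample_task(self):
        n = len(self.histories)
        progress = [self.learning_progress(t) for t in range(n)]
        # First collect enough episodes on every task.
        unexplored = [t for t, lp in enumerate(progress) if lp is None]
        if unexplored:
            return self.rng.choice(unexplored)
        # Keep some uniform exploration, and fall back to it if nothing is changing.
        if self.rng.random() < self.epsilon or sum(progress) == 0:
            return self.rng.randrange(n)
        return self.rng.choices(range(n), weights=progress, k=1)[0]

    def update(self, task, episode_return):
        """Store the return the agent obtained on `task`."""
        self.histories[task].append(episode_return)
```

In a training loop, one would call `sample_task()` before each episode, run the agent on that task, then call `update(task, episode_return)`. Tasks that are already mastered or still far too hard show little change in return, so they end up being sampled less often.
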
As an example, OpenAI used **Domain Randomization** (applying random variations to the environment) to make a robot hand solve Rubik's Cubes.
<figure>
<img src="https://huggingface.co/datasets/huggingface-deep-rl-course/course-images/resolve/main/en/unit9/dr.jpg" alt="Dr"/>
<figcaption> <a href="https://openai.com/blog/solving-rubiks-cube/">OpenAI - Solving Rubik's Cube with a Robot Hand</a></figcaption>
</figure>
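
The basic idea of Domain Randomization can be sketched as follows: at the start of each episode, draw the environment parameters at random from fixed ranges. This is a minimal sketch, not OpenAI's implementation. The parameter names (`friction`, `cube_mass`), their ranges, and the `make_env` call mentioned in the comment are hypothetical.

```python
import random

# Hypothetical parameter ranges (low, high) for a simulated environment.
PARAM_RANGES = {
    "friction": (0.5, 1.5),
    "cube_mass": (0.05, 0.2),
}


def sample_env_params(ranges, rng):
    """Draw one value per parameter, uniformly within its range."""
    return {name: rng.uniform(low, high) for name, (low, high) in ranges.items()}


rng = random.Random(0)
for episode in range(3):
    params = sample_env_params(PARAM_RANGES, rng)
    # A real training loop would build the environment from these values,
    # e.g. env = make_env(**params)  (hypothetical function), then run one episode.
    print(episode, params)
```

Because the agent never sees the exact same environment twice, it has to learn a policy that works across the whole range of variations.
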
Finally, you can test the robustness of agents trained in the <a href="https://huggingface.co/spaces/flowers-team/Interactive_DeepRL_Demo">TeachMyAgent</a> benchmark by controlling environment variations or even drawing the terrain 👇
<figure>
<img src="https://huggingface.co/datasets/huggingface-deep-rl-course/course-images/resolve/main/en/unit9/demo.png" alt="Demo"/>
<figcaption> <a href="https://huggingface.co/spaces/flowers-team/Interactive_DeepRL_Demo">https://huggingface.co/spaces/flowers-team/Interactive_DeepRL_Demo</a></figcaption>
</figure>
## Further reading
For more information, we recommend you check out the following resources:
### Overview of the field
- [Automatic Curriculum Learning For Deep RL: A Short Survey](https://arxiv.org/pdf/2003.04664.pdf)
- [Curriculum for Reinforcement Learning](https://lilianweng.github.io/posts/2020-01-29-curriculum-rl/)
### Recent methods
- [Evolving Curricula with Regret-Based Environment Design](https://arxiv.org/abs/2203.01302)
- [Curriculum Reinforcement Learning via Constrained Optimal Transport](https://proceedings.mlr.press/v162/klink22a.html)
- [Prioritized Level Replay](https://arxiv.org/abs/2010.03934)