From 575910d9705c7ee5ce75ebc5b8d5f032f7c12de7 Mon Sep 17 00:00:00 2001
From: simoninithomas
Date: Mon, 20 Feb 2023 14:49:58 +0100
Subject: [PATCH] Add curriculum learning Clement Part

---
 units/en/_toctree.yml                       |  2 +
 units/en/unitbonus3/curriculum-learning.mdx | 50 +++++++++++++++++++++
 2 files changed, 52 insertions(+)
 create mode 100644 units/en/unitbonus3/curriculum-learning.mdx

diff --git a/units/en/_toctree.yml b/units/en/_toctree.yml
index cae3e8a..786e016 100644
--- a/units/en/_toctree.yml
+++ b/units/en/_toctree.yml
@@ -208,6 +208,8 @@
     title: Decision Transformers and Offline RL
   - local: unitbonus3/language-models
     title: Language models in RL
+  - local: unitbonus3/curriculum-learning
+    title: (Automatic) Curriculum Learning for RL
   - local: unitbonus3/envs-to-try
     title: Interesting environments to try
   - local: unitbonus3/godotrl
diff --git a/units/en/unitbonus3/curriculum-learning.mdx b/units/en/unitbonus3/curriculum-learning.mdx
new file mode 100644
index 0000000..4cc49df
--- /dev/null
+++ b/units/en/unitbonus3/curriculum-learning.mdx
@@ -0,0 +1,50 @@

# (Automatic) Curriculum Learning for RL

While most of the RL methods seen in this course work well in practice, there are cases where using them alone fails. This is, for instance, the case when:

- the task to learn is hard and requires an **incremental acquisition of skills** (for instance, when one wants to make a bipedal agent learn to cross hard obstacles, it must first learn to stand, then walk, then maybe jump…)
- there are variations in the environment (that affect its difficulty) and one wants the agent to be **robust** to them

<figure>
  <!-- images not recovered from the original patch: "Bipedal", "Movable creepers" -->
  <figcaption>TeachMyAgent</figcaption>
</figure>

In such cases, it seems necessary to propose different tasks to our RL agent and to organize them so that the agent progressively acquires skills. This approach is called **Curriculum Learning** and usually implies a hand-designed curriculum (a set of tasks organized in a specific order). In practice, one can for instance control the generation of the environment, the initial states, or use Self-Play and control the level of the opponents proposed to the RL agent.

As designing such a curriculum is not always trivial, the field of **Automatic Curriculum Learning (ACL) proposes to design approaches that learn to create such an organization of tasks in order to maximize the RL agent's performance**. Portelas et al. proposed to define ACL as:

> … a family of mechanisms that automatically adapt the distribution of training data by learning to adjust the selection of learning situations to the capabilities of RL agents.

As an example, OpenAI used **Domain Randomization** (they applied random variations on the environment) to make a robot hand solve Rubik's Cubes.

<figure>
  <!-- image not recovered from the original patch -->
  <figcaption>OpenAI - Solving Rubik's Cube with a Robot Hand</figcaption>
</figure>
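To make these ideas concrete, here is a minimal, purely illustrative sketch of a curriculum "teacher" that picks the next training task either uniformly at random (which corresponds to Domain Randomization) or based on an estimate of the agent's learning progress on each task. This is a toy example, not code from TeachMyAgent or OpenAI; all names (`sample_task`, `task_scores`) are hypothetical.

```python
import random

def sample_task(task_scores, epsilon=0.2):
    """Pick the next training task for the agent.

    task_scores maps each task name to a list of recent episode returns.
    With probability epsilon, sample uniformly (as in Domain Randomization);
    otherwise pick the task with the highest absolute learning progress,
    i.e. the largest change between older and recent returns.
    """
    if random.random() < epsilon:
        return random.choice(list(task_scores))

    def learning_progress(scores):
        if len(scores) < 4:
            return float("inf")  # barely explored task: prioritize it
        half = len(scores) // 2
        older = sum(scores[:half]) / half
        recent = sum(scores[half:]) / (len(scores) - half)
        return abs(recent - older)

    return max(task_scores, key=lambda t: learning_progress(task_scores[t]))
```

With `epsilon=1.0` this degenerates into uniform Domain Randomization; with `epsilon=0.0` it always focuses training on the task where the agent is currently improving the most, which is the intuition behind learning-progress-based ACL methods.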

Finally, you can play with the robustness of agents trained in the TeachMyAgent benchmark by controlling environment variations or even drawing the terrain 👇

<figure>
  <!-- interactive demo embed not recovered from the original patch -->
  <figcaption>https://huggingface.co/spaces/flowers-team/Interactive_DeepRL_Demo</figcaption>
</figure>

## Further reading

For more information, we recommend you check out the following resources:

### Overview of the field

- [Automatic Curriculum Learning For Deep RL: A Short Survey](https://arxiv.org/pdf/2003.04664.pdf)
- [Curriculum for Reinforcement Learning](https://lilianweng.github.io/posts/2020-01-29-curriculum-rl/)

### Recent methods

- [Evolving Curricula with Regret-Based Environment Design](https://arxiv.org/abs/2203.01302)
- [Curriculum Reinforcement Learning via Constrained Optimal Transport](https://proceedings.mlr.press/v162/klink22a.html)
- [Prioritized Level Replay](https://arxiv.org/abs/2010.03934)