# (Automatic) Curriculum Learning for RL

While most of the RL methods seen in this course work well in practice, there are cases where using them alone is not enough. This can happen, for instance, when:

- the task to learn is hard and requires an **incremental acquisition of skills** (for instance, a bipedal agent that must get past difficult obstacles first has to learn to stand, then to walk, then maybe to jump…)
- the environment varies in ways that affect the difficulty, and one wants the agent to be **robust** to these variations

<figure>
<img src="https://huggingface.co/datasets/huggingface-deep-rl-course/course-images/resolve/main/en/unit9/bipedal.gif" alt="Bipedal"/>
<img src="https://huggingface.co/datasets/huggingface-deep-rl-course/course-images/resolve/main/en/unit9/movable_creepers.gif" alt="Movable creepers"/>
<figcaption> <a href="https://developmentalsystems.org/TeachMyAgent/">TeachMyAgent</a> </figcaption>
</figure>

In such cases, it is often necessary to propose different tasks to the RL agent and to organize them so that it progressively acquires skills. This approach is called **Curriculum Learning** and usually relies on a hand-designed curriculum (a set of tasks organized in a specific order). In practice, one can, for instance, control how the environment is generated or which initial states are used, or use Self-Play and control the level of the opponents proposed to the RL agent.
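
A hand-designed curriculum can be as simple as a fixed list of stages plus a rule for moving from one stage to the next. The minimal Python sketch below assumes one such rule (advance once the success rate over the last `window` episodes reaches `threshold`). The class name `HandDesignedCurriculum`, the stage names (which echo the bipedal example), and the default values are illustrative assumptions, not part of any specific method.

```python
from collections import deque


class HandDesignedCurriculum:
    """A fixed, hand-ordered sequence of tasks, easiest first.

    The agent moves on to the next stage once its success rate over the
    last `window` episodes of the current stage reaches `threshold`.
    """

    def __init__(self, stages, threshold=0.8, window=20):
        self.stages = list(stages)
        self.threshold = threshold
        self.window = window
        self.index = 0
        # Outcomes of the most recent episodes: 1.0 = success, 0.0 = failure.
        self.recent = deque(maxlen=window)

    @property
    def current_stage(self):
        return self.stages[self.index]

    def report(self, success):
        """Record one episode outcome and advance if the stage is mastered."""
        self.recent.append(1.0 if success else 0.0)
        mastered = (
            len(self.recent) == self.window
            and sum(self.recent) / self.window >= self.threshold
        )
        if mastered and self.index < len(self.stages) - 1:
            self.index += 1
            self.recent.clear()  # measure the new stage from scratch


# Hypothetical stages echoing the bipedal example; the outcomes are made up.
curriculum = HandDesignedCurriculum(["stand", "walk", "jump"], threshold=0.8, window=5)
for outcome in [True, True, False, True, True, True, True, True, True, True]:
    curriculum.report(outcome)
print(curriculum.current_stage)  # prints: jump
```

Here the stage order, the threshold, and the window size are all chosen by the designer and encode prior knowledge about the task.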

As designing such a curriculum is not always trivial, the field of **Automatic Curriculum Learning (ACL) proposes approaches that learn to organize tasks in order to maximize the RL agent’s performance**. Portelas et al. define ACL as:

> … a family of mechanisms that automatically adapt the distribution of training data by learning to adjust the selection of learning situations to the capabilities of RL agents.
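
ACL methods differ in the signal they use to choose the next task. The Python sketch below illustrates one commonly used signal, absolute learning progress (ALP): tasks on which the agent's return is currently changing the most are sampled more often, while tasks that are already mastered or still out of reach are rarely picked. It is a simplified, hypothetical teacher over a small discrete set of tasks; the class name `LearningProgressTeacher`, the `epsilon` exploration rate, the ALP estimate, and the toy environment are assumptions, not the method of Portelas et al. or of any other specific paper.

```python
import random


class LearningProgressTeacher:
    """Toy ACL teacher over a discrete set of tasks.

    Tasks are sampled in proportion to their absolute learning progress (ALP),
    estimated here as the absolute change between the last two returns
    observed on that task. With probability `epsilon` a task is picked
    uniformly at random instead.
    """

    def __init__(self, n_tasks, epsilon=0.2, seed=None):
        self.n_tasks = n_tasks
        self.epsilon = epsilon
        self.last_return = [None] * n_tasks  # previous return per task
        self.alp = [0.0] * n_tasks           # latest ALP estimate per task
        self.rng = random.Random(seed)

    def sample_task(self):
        total = sum(self.alp)
        # Explore uniformly when no progress has been measured yet, or with
        # probability epsilon.
        if total == 0.0 or self.rng.random() < self.epsilon:
            return self.rng.randrange(self.n_tasks)
        # Otherwise favour tasks whose return is changing the most.
        return self.rng.choices(range(self.n_tasks), weights=self.alp, k=1)[0]

    def update(self, task, episode_return):
        """Record the return obtained on `task` and refresh its ALP."""
        previous = self.last_return[task]
        if previous is not None:
            self.alp[task] = abs(episode_return - previous)
        self.last_return[task] = episode_return


# Toy simulation (hypothetical): task 0 is already mastered, task 2 is out of
# reach, and only practice on task 1 improves the agent.
teacher = LearningProgressTeacher(n_tasks=3, epsilon=0.2, seed=0)
skill = 0.0
counts = [0, 0, 0]
for _ in range(300):
    task = teacher.sample_task()
    counts[task] += 1
    if task == 0:
        episode_return = 1.0
    elif task == 1:
        skill = min(1.0, skill + 0.01)
        episode_return = skill
    else:
        episode_return = 0.0
    teacher.update(task, episode_return)
print(counts)  # task 1 is expected to be sampled more often than tasks 0 and 2
```

Because ALP measures a change in return, a mastered task (constant high return) and an impossible task (constant zero return) both end up with zero progress, so the teacher concentrates on the task the agent is currently learning. The uniform `epsilon` picks keep some exploration so that progress on neglected tasks can still be detected.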
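
As an example, OpenAI used **Domain Randomization** (applying random variations to the environment) to make a robot hand solve Rubik’s Cubes. Their method, Automatic Domain Randomization (ADR), progressively widens the range of these variations as the agent improves.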

<figure>
<img src="https://huggingface.co/datasets/huggingface-deep-rl-course/course-images/resolve/main/en/unit9/dr.jpg" alt="Dr"/>
<figcaption> <a href="https://openai.com/blog/solving-rubiks-cube/">OpenAI - Solving Rubik’s Cube with a Robot Hand</a></figcaption>
</figure>
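
Domain randomization can be sketched as drawing new environment parameters before each episode. In the minimal Python sketch below, the class name `DomainRandomizer`, the parameter names (`friction`, `cube_size_scale`), and their ranges are hypothetical placeholders for whatever the simulator exposes. The optional `widen` method is a deliberately simplified illustration of making the randomization harder over time; OpenAI's ADR is more involved, since it adjusts each range boundary separately based on the agent's measured performance.

```python
import random


class DomainRandomizer:
    """Draws environment parameters uniformly from per-parameter ranges."""

    def __init__(self, ranges, seed=None):
        # ranges: {parameter_name: (low, high)}; the names are hypothetical.
        self.ranges = dict(ranges)
        self.rng = random.Random(seed)

    def sample(self):
        """Return one environment configuration, e.g. before each episode."""
        return {
            name: self.rng.uniform(low, high)
            for name, (low, high) in self.ranges.items()
        }

    def widen(self, factor=0.1):
        """Enlarge every range by `factor` of its width on each side.

        Simplified illustration only: lower bounds are clipped at 0, which
        assumes the parameters are non-negative physical quantities.
        """
        for name, (low, high) in list(self.ranges.items()):
            margin = factor * (high - low)
            self.ranges[name] = (max(0.0, low - margin), high + margin)


randomizer = DomainRandomizer(
    {"friction": (0.5, 1.5), "cube_size_scale": (0.95, 1.05)}, seed=42
)
config = randomizer.sample()  # values to apply to the simulator at reset
print(config)
randomizer.widen(0.1)         # e.g. once the agent copes with current ranges
print(randomizer.ranges)
```

In a training loop, `sample()` would typically be called before each episode and its values applied to the simulator; how those values are passed in depends on the environment and is not shown here.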

Finally, you can explore the robustness of agents trained in the <a href="https://huggingface.co/spaces/flowers-team/Interactive_DeepRL_Demo">TeachMyAgent</a> benchmark by controlling environment variations or even drawing the terrain 👇

<figure>
<img src="https://huggingface.co/datasets/huggingface-deep-rl-course/course-images/resolve/main/en/unit9/demo.png" alt="Demo"/>
<figcaption> <a href="https://huggingface.co/spaces/flowers-team/Interactive_DeepRL_Demo">https://huggingface.co/spaces/flowers-team/Interactive_DeepRL_Demo</a></figcaption>
</figure>

## Further reading

For more information, we recommend that you check out the following resources:

### Overview of the field

- [Automatic Curriculum Learning For Deep RL: A Short Survey](https://arxiv.org/pdf/2003.04664.pdf)
- [Curriculum for Reinforcement Learning](https://lilianweng.github.io/posts/2020-01-29-curriculum-rl/)

### Recent methods

- [Evolving Curricula with Regret-Based Environment Design](https://arxiv.org/abs/2203.01302)
- [Curriculum Reinforcement Learning via Constrained Optimal Transport](https://proceedings.mlr.press/v162/klink22a.html)
- [Prioritized Level Replay](https://arxiv.org/abs/2010.03934)

## Author

This section was written by <a href="https://twitter.com/ClementRomac">Clément Romac</a>.