diff --git a/units/en/unitbonus3/curriculum-learning.mdx b/units/en/unitbonus3/curriculum-learning.mdx
index 4cc49df..dbe8e64 100644
--- a/units/en/unitbonus3/curriculum-learning.mdx
+++ b/units/en/unitbonus3/curriculum-learning.mdx
@@ -48,3 +48,7 @@ For more information, we recommend you check out the following resources:
 - [Evolving Curricula with Regret-Based Environment Design](https://arxiv.org/abs/2203.01302)
 - [Curriculum Reinforcement Learning via Constrained Optimal Transport](https://proceedings.mlr.press/v162/klink22a.html)
 - [Prioritized Level Replay](https://arxiv.org/abs/2010.03934)
+
+## Author
+
+This section was written by Clément Romac
diff --git a/units/en/unitbonus3/decision-transformers.mdx b/units/en/unitbonus3/decision-transformers.mdx
index a7e0d37..737564e 100644
--- a/units/en/unitbonus3/decision-transformers.mdx
+++ b/units/en/unitbonus3/decision-transformers.mdx
@@ -25,3 +25,7 @@ For more information, we recommend you check out the following resources:
 
 - [Decision Transformer: Reinforcement Learning via Sequence Modeling](https://arxiv.org/abs/2106.01345)
 - [Online Decision Transformer](https://arxiv.org/abs/2202.05607)
+
+## Author
+
+This section was written by Edward Beeching
diff --git a/units/en/unitbonus3/envs-to-try.mdx b/units/en/unitbonus3/envs-to-try.mdx
index da1a607..404e038 100644
--- a/units/en/unitbonus3/envs-to-try.mdx
+++ b/units/en/unitbonus3/envs-to-try.mdx
@@ -43,3 +43,7 @@ Starcraft II is a famous *real-time strategy game*. DeepMind has used this game
 To start using this environment, check these resources:
 - [Starcraft gym](http://starcraftgym.com/)
 - [A. I. Learns to Play Starcraft 2 (Reinforcement Learning) tutorial](https://www.youtube.com/watch?v=q59wap1ELQ4)
+
+## Author
+
+This section was written by Thomas Simonini
diff --git a/units/en/unitbonus3/godotrl.mdx b/units/en/unitbonus3/godotrl.mdx
index 07d8e66..8e993a3 100644
--- a/units/en/unitbonus3/godotrl.mdx
+++ b/units/en/unitbonus3/godotrl.mdx
@@ -202,3 +202,7 @@ Try setting this property up to 8 to speed up training. This can be a great bene
 
 ### There’s more!
 We have only scratched the surface of what can be achieved with Godot RL Agents, the library includes custom sensors and cameras to enrich the information available to the agent. Take a look at the [examples](https://github.com/edbeeching/godot_rl_agents_examples) to find out more!
+
+## Author
+
+This section was written by Edward Beeching
diff --git a/units/en/unitbonus3/introduction.mdx b/units/en/unitbonus3/introduction.mdx
index 930c4a1..50b4bd0 100644
--- a/units/en/unitbonus3/introduction.mdx
+++ b/units/en/unitbonus3/introduction.mdx
@@ -6,4 +6,6 @@
 Congratulations on finishing this course! **You now have a solid background in Deep Reinforcement Learning**.
 But this course was just the beginning of your Deep Reinforcement Learning journey, there are so many subsections to discover. In this optional unit, we **give you resources to explore multiple concepts and research topics in Reinforcement Learning**.
 
+Contrary to other units, this unit is a collective work of multiple people from Hugging Face. We mention the author for each unit.
+
 Sounds fun? Let's get started 🔥,
diff --git a/units/en/unitbonus3/language-models.mdx b/units/en/unitbonus3/language-models.mdx
index 3194ec2..8a3daec 100644
--- a/units/en/unitbonus3/language-models.mdx
+++ b/units/en/unitbonus3/language-models.mdx
@@ -39,3 +39,7 @@ For more information we recommend you check out the following resources:
 - [Pre-Trained Language Models for Interactive Decision-Making](https://arxiv.org/abs/2202.01771)
 - [Grounding Large Language Models with Online Reinforcement Learning](https://arxiv.org/abs/2302.02662v1)
 - [Guiding Pretraining in Reinforcement Learning with Large Language Models](https://arxiv.org/abs/2302.06692)
+
+## Author
+
+This section was written by Clément Romac
diff --git a/units/en/unitbonus3/model-based.mdx b/units/en/unitbonus3/model-based.mdx
index a76ffe3..9983a01 100644
--- a/units/en/unitbonus3/model-based.mdx
+++ b/units/en/unitbonus3/model-based.mdx
@@ -26,3 +26,7 @@ For more information on MBRL, we recommend you check out the following resources
 
 - A [blog post on debugging MBRL](https://www.natolambert.com/writing/debugging-mbrl).
 - A [recent review paper on MBRL](https://arxiv.org/abs/2006.16712),
+
+## Author
+
+This section was written by Nathan Lambert
diff --git a/units/en/unitbonus3/offline-online.mdx b/units/en/unitbonus3/offline-online.mdx
index c087c38..be6fa37 100644
--- a/units/en/unitbonus3/offline-online.mdx
+++ b/units/en/unitbonus3/offline-online.mdx
@@ -31,3 +31,7 @@ For more information, we recommend you check out the following resources:
 
 - [Offline Reinforcement Learning, Talk by Sergei Levine](https://www.youtube.com/watch?v=qgZPZREor5I)
 - [Offline Reinforcement Learning: Tutorial, Review, and Perspectives on Open Problems](https://arxiv.org/abs/2005.01643)
+
+## Author
+
+This section was written by Thomas Simonini
diff --git a/units/en/unitbonus3/rl-documentation.mdx b/units/en/unitbonus3/rl-documentation.mdx
index 30b7ada..dc4a661 100644
--- a/units/en/unitbonus3/rl-documentation.mdx
+++ b/units/en/unitbonus3/rl-documentation.mdx
@@ -50,3 +50,7 @@ record on [GitHub](https://github.com/RewardReports/reward-reports).
 ​
 For further reading, you can visit the Reward Reports [paper](https://arxiv.org/abs/2204.10817)
 or look [an example report](https://github.com/RewardReports/reward-reports/tree/main/examples).
+
+## Author
+
+This section was written by Nathan Lambert
diff --git a/units/en/unitbonus3/rlhf.mdx b/units/en/unitbonus3/rlhf.mdx
index b09c76e..7c473d1 100644
--- a/units/en/unitbonus3/rlhf.mdx
+++ b/units/en/unitbonus3/rlhf.mdx
@@ -44,3 +44,7 @@ And here is a snapshot of the growing set of papers that show RLHF's performance
 - [Red Teaming Language Models to Reduce Harms: Methods, Scaling Behaviors, and Lessons Learned](https://arxiv.org/abs/2209.07858) (Ganguli et al. 2022): A detailed documentation of efforts to “discover, measure, and attempt to reduce [language models] potentially harmful outputs.”
 - [Dynamic Planning in Open-Ended Dialogue using Reinforcement Learning](https://arxiv.org/abs/2208.02294) (Cohen at al. 2022): Using RL to enhance the conversational skill of an open-ended dialogue agent.
 - [Is Reinforcement Learning (Not) for Natural Language Processing?: Benchmarks, Baselines, and Building Blocks for Natural Language Policy Optimization](https://arxiv.org/abs/2210.01241) (Ramamurthy and Ammanabrolu et al. 2022): Discusses the design space of open-source tools in RLHF and proposes a new algorithm NLPO (Natural Language Policy Optimization) as an alternative to PPO.
+
+## Author
+
+This section was written by Nathan Lambert