mirror of
https://github.com/huggingface/deep-rl-class.git
synced 2026-04-05 11:38:43 +08:00
Add authors
@@ -48,3 +48,7 @@ For more information, we recommend you check out the following resources:
- [Evolving Curricula with Regret-Based Environment Design](https://arxiv.org/abs/2203.01302)
- [Curriculum Reinforcement Learning via Constrained Optimal Transport](https://proceedings.mlr.press/v162/klink22a.html)
- [Prioritized Level Replay](https://arxiv.org/abs/2010.03934)

## Author

This section was written by <a href="https://twitter.com/ClementRomac">Clément Romac</a>
@@ -25,3 +25,7 @@ For more information, we recommend you check out the following resources:
- [Decision Transformer: Reinforcement Learning via Sequence Modeling](https://arxiv.org/abs/2106.01345)
- [Online Decision Transformer](https://arxiv.org/abs/2202.05607)

## Author

This section was written by <a href="https://twitter.com/edwardbeeching">Edward Beeching</a>
@@ -43,3 +43,7 @@ Starcraft II is a famous *real-time strategy game*. DeepMind has used this game
To start using this environment, check these resources:
- [Starcraft gym](http://starcraftgym.com/)
- [A. I. Learns to Play Starcraft 2 (Reinforcement Learning) tutorial](https://www.youtube.com/watch?v=q59wap1ELQ4)

## Author

This section was written by <a href="https://twitter.com/ThomasSimonini">Thomas Simonini</a>
@@ -202,3 +202,7 @@ Try setting this property up to 8 to speed up training. This can be a great bene
### There’s more!

We have only scratched the surface of what can be achieved with Godot RL Agents; the library includes custom sensors and cameras to enrich the information available to the agent. Take a look at the [examples](https://github.com/edbeeching/godot_rl_agents_examples) to find out more!

## Author

This section was written by <a href="https://twitter.com/edwardbeeching">Edward Beeching</a>
@@ -6,4 +6,6 @@
Congratulations on finishing this course! **You now have a solid background in Deep Reinforcement Learning**.

But this course was just the beginning of your Deep Reinforcement Learning journey; there are so many subsections to discover. In this optional unit, we **give you resources to explore multiple concepts and research topics in Reinforcement Learning**.

Unlike the other units, this unit is a collective work of multiple people from Hugging Face. We mention the author of each section.

Sounds fun? Let's get started 🔥
@@ -39,3 +39,7 @@ For more information we recommend you check out the following resources:
- [Pre-Trained Language Models for Interactive Decision-Making](https://arxiv.org/abs/2202.01771)
- [Grounding Large Language Models with Online Reinforcement Learning](https://arxiv.org/abs/2302.02662v1)
- [Guiding Pretraining in Reinforcement Learning with Large Language Models](https://arxiv.org/abs/2302.06692)

## Author

This section was written by <a href="https://twitter.com/ClementRomac">Clément Romac</a>
@@ -26,3 +26,7 @@ For more information on MBRL, we recommend you check out the following resources
- A [blog post on debugging MBRL](https://www.natolambert.com/writing/debugging-mbrl).
- A [recent review paper on MBRL](https://arxiv.org/abs/2006.16712).

## Author

This section was written by <a href="https://twitter.com/natolambert">Nathan Lambert</a>
@@ -31,3 +31,7 @@ For more information, we recommend you check out the following resources:
- [Offline Reinforcement Learning, Talk by Sergey Levine](https://www.youtube.com/watch?v=qgZPZREor5I)
- [Offline Reinforcement Learning: Tutorial, Review, and Perspectives on Open Problems](https://arxiv.org/abs/2005.01643)

## Author

This section was written by <a href="https://twitter.com/ThomasSimonini">Thomas Simonini</a>
@@ -50,3 +50,7 @@ record on [GitHub](https://github.com/RewardReports/reward-reports).
For further reading, you can visit the Reward Reports [paper](https://arxiv.org/abs/2204.10817) or look at [an example report](https://github.com/RewardReports/reward-reports/tree/main/examples).

## Author

This section was written by <a href="https://twitter.com/natolambert">Nathan Lambert</a>
@@ -44,3 +44,7 @@ And here is a snapshot of the growing set of papers that show RLHF's performance
- [Red Teaming Language Models to Reduce Harms: Methods, Scaling Behaviors, and Lessons Learned](https://arxiv.org/abs/2209.07858) (Ganguli et al. 2022): A detailed documentation of efforts to “discover, measure, and attempt to reduce [language models] potentially harmful outputs.”
- [Dynamic Planning in Open-Ended Dialogue using Reinforcement Learning](https://arxiv.org/abs/2208.02294) (Cohen et al. 2022): Using RL to enhance the conversational skill of an open-ended dialogue agent.
- [Is Reinforcement Learning (Not) for Natural Language Processing?: Benchmarks, Baselines, and Building Blocks for Natural Language Policy Optimization](https://arxiv.org/abs/2210.01241) (Ramamurthy and Ammanabrolu et al. 2022): Discusses the design space of open-source tools in RLHF and proposes a new algorithm, NLPO (Natural Language Policy Optimization), as an alternative to PPO.

## Author

This section was written by <a href="https://twitter.com/natolambert">Nathan Lambert</a>