mirror of
https://github.com/huggingface/deep-rl-class.git
synced 2026-04-05 03:28:05 +08:00
Add Nathan RL Documentation
This commit is contained in:
@@ -191,9 +191,11 @@
   - local: unitbonus3/decision-transformers
     title: Decision Transformers and Offline RL
   - local: unitbonus3/language-models
     title: Language models in RL
   - local: unitbonus3/envs-to-try
     title: Interesting environments to try
+  - local: unitbonus3/rl-documentation
+    title: Brief introduction to RL documentation
 - title: What's next? New Units Publishing Schedule
   sections:
   - local: communication/publishing-schedule
52 units/en/unitbonus3/rl-documentation.mdx Normal file
@@ -0,0 +1,52 @@
# Brief introduction to RL documentation

In this advanced topic, we address the question: **how should we monitor and keep track of powerful reinforcement learning agents that we are training in the real world and interfacing with humans?**

As machine learning systems have increasingly impacted modern life, the **call for documentation of these systems has grown**.

Such documentation can cover aspects such as the training data used (where it is stored, when it was collected, who was involved, etc.) or the model optimization framework (the architecture, evaluation metrics, relevant papers, etc.), and more.

Today, model cards and datasheets are becoming increasingly available, thanks in part to the Hub (see the documentation [here](https://huggingface.co/docs/hub/model-cards)).

If you click on a [popular model on the Hub](https://huggingface.co/models), you can learn about its creation process.

These model- and data-specific logs are designed to be completed when the model or dataset is created, and they often go un-updated as those models are built into evolving systems later on.
## Motivating Reward Reports

Reinforcement learning systems are fundamentally designed to optimize based on measurements of reward and time.
While the notion of a reward function can be mapped nicely onto many well-understood fields of supervised learning (via a loss function), our understanding of how machine learning systems evolve over time is limited.

To that end, the authors introduce [*Reward Reports for Reinforcement Learning*](https://www.notion.so/Brief-introduction-to-RL-documentation-b8cbda5a6f5242338e0756e6bef72af4) (the pithy naming is designed to mirror the popular papers *Model Cards for Model Reporting* and *Datasheets for Datasets*).
The goal is to propose a type of documentation focused on the **human factors of reward** and **time-varying feedback systems**.

Building on the documentation frameworks for [model cards](https://arxiv.org/abs/1810.03993) and [datasheets](https://arxiv.org/abs/1803.09010) proposed by Mitchell et al. and Gebru et al., we argue the need for Reward Reports for AI systems.

**Reward Reports** are living documents for proposed RL deployments that demarcate design choices.

However, many questions remain about the applicability of this framework to different RL applications, roadblocks to system interpretability, and the resonances between deployed supervised machine learning systems and the sequential decision-making utilized in RL.

At a minimum, Reward Reports are an opportunity for RL practitioners to deliberate on these questions and begin the work of deciding how to resolve them in practice.
## Capturing temporal behavior with documentation

The core piece of documentation specific to RL and feedback-driven ML systems is a *change-log*. The change-log records updates from the designer (changed training parameters, data, etc.) along with changes noticed by users (harmful behavior, unexpected responses, etc.).

The change-log is accompanied by update triggers that encourage monitoring of these effects.
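To make the idea concrete, here is a minimal sketch of what a change-log with an update trigger might look like in code. This is purely illustrative: the Reward Reports framework does not prescribe a schema, and the class names, fields, and drift-threshold trigger below are our own assumptions.

```python
from dataclasses import dataclass, field
from datetime import date

# Hypothetical change-log schema; Reward Reports do not prescribe one.
@dataclass
class ChangeLogEntry:
    when: date
    author: str
    designer_changes: list = field(default_factory=list)  # e.g. changed training parameters, data
    observed_changes: list = field(default_factory=list)  # e.g. harmful behavior, unexpected responses

@dataclass
class ChangeLog:
    entries: list = field(default_factory=list)
    # Illustrative update trigger: if a monitored metric drifts past this
    # relative threshold, a new entry (and a review of the report) is due.
    drift_threshold: float = 0.1

    def needs_update(self, baseline: float, current: float) -> bool:
        """Return True when relative metric drift exceeds the trigger threshold."""
        return abs(current - baseline) / abs(baseline) > self.drift_threshold

log = ChangeLog()
log.entries.append(
    ChangeLogEntry(
        when=date(2023, 1, 15),
        author="system designer",
        designer_changes=["lowered learning rate from 3e-4 to 1e-4"],
        observed_changes=["fewer unexpected recommendations reported"],
    )
)
print(log.needs_update(baseline=0.80, current=0.65))  # drift ~0.19 > 0.1, prints True
```

The point of the trigger is that documentation updates are driven by observed system behavior over time, not only by deliberate design changes.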
## Contributing

Some of the most impactful RL-driven systems are multi-stakeholder in nature and operate behind the closed doors of private corporations.
These corporations are largely unregulated, so the burden of documentation falls on the public.

If you are interested in contributing, we are building Reward Reports for popular machine learning systems on a public record on [GitHub](https://github.com/RewardReports/reward-reports).

For further reading, you can visit the Reward Reports [paper](https://arxiv.org/abs/2204.10817) or look at [an example report](https://github.com/RewardReports/reward-reports/tree/main/examples).