From eb55a21d0d0e79d43c3955c384178e2bbed3cf31 Mon Sep 17 00:00:00 2001
From: simoninithomas
Date: Tue, 7 Feb 2023 09:10:58 +0100
Subject: [PATCH] Add Nathan RL Documentation

---
 units/en/_toctree.yml                    |  6 ++-
 units/en/unitbonus3/rl-documentation.mdx | 52 ++++++++++++++++++++++++
 2 files changed, 56 insertions(+), 2 deletions(-)
 create mode 100644 units/en/unitbonus3/rl-documentation.mdx

diff --git a/units/en/_toctree.yml b/units/en/_toctree.yml
index 2a8b88b..0c085fc 100644
--- a/units/en/_toctree.yml
+++ b/units/en/_toctree.yml
@@ -191,9 +191,11 @@
   - local: unitbonus3/decision-transformers
     title: Decision Transformers and Offline RL
   - local: unitbonus3/language-models
-    title: Interesting Environments to try
+    title: Language models in RL
   - local: unitbonus3/envs-to-try
-    title: Language models in RL
+    title: Interesting environments to try
+  - local: unitbonus3/rl-documentation
+    title: Brief introduction to RL documentation
 - title: What's next? New Units Publishing Schedule
   sections:
   - local: communication/publishing-schedule
diff --git a/units/en/unitbonus3/rl-documentation.mdx b/units/en/unitbonus3/rl-documentation.mdx
new file mode 100644
index 0000000..7b6567c
--- /dev/null
+++ b/units/en/unitbonus3/rl-documentation.mdx
@@ -0,0 +1,52 @@
+# Brief introduction to RL documentation
+
+In this advanced topic, we address the question: **how should we monitor and keep track of powerful reinforcement learning agents that we are training in the real world and
+interfacing with humans?**
+
+As machine learning systems have increasingly impacted modern life, **calls for the documentation of these systems have grown**.
+
+Such documentation can cover aspects such as the training data used — where it is stored, when it was collected, who was involved, etc.
+— or the model optimization framework — the architecture, evaluation metrics, relevant papers, etc. — and more.
+
+Today, model cards and datasheets are becoming increasingly available thanks to the Hub
+(see the documentation [here](https://huggingface.co/docs/hub/model-cards)).
+
+If you click on a [popular model on the Hub](https://huggingface.co/models), you can learn about its creation process.
+
+These model- and dataset-specific logs are designed to be completed when the model or dataset is created, and they often go un-updated when these models are built into evolving systems later on.
+
+## Motivating Reward Reports
+
+Reinforcement learning systems are fundamentally designed to optimize based on measurements of reward and time.
+While the notion of a reward function maps nicely onto the loss functions of many well-understood fields of supervised learning,
+our understanding of how machine learning systems evolve over time remains limited.
+
+To that end, the authors introduce [*Reward Reports for Reinforcement Learning*](https://arxiv.org/abs/2204.10817) (the pithy naming is designed to mirror the popular papers *Model Cards for Model Reporting* and *Datasheets for Datasets*).
+The goal is to propose a type of documentation focused on the **human factors of reward** and **time-varying feedback systems**.
+
+Building on the documentation frameworks for [model cards](https://arxiv.org/abs/1810.03993) and [datasheets](https://arxiv.org/abs/1803.09010) proposed by Mitchell et al. and Gebru et al., the authors argue the need for Reward Reports for AI systems.
+
+**Reward Reports** are living documents for proposed RL deployments that demarcate design choices.
+
+However, many questions remain about the applicability of this framework to different RL applications, roadblocks to system interpretability,
+and the resonances between deployed supervised machine learning systems and the sequential decision-making utilized in RL.
+
+At a minimum, Reward Reports are an opportunity for RL practitioners to deliberate on these questions and begin the work of deciding how to resolve them in practice.
+
+## Capturing temporal behavior with documentation
+
+The core piece specific to documentation designed for RL and feedback-driven ML systems is a *change-log*. The change-log records updates
+from the designer (changed training parameters, data, etc.) along with changes noticed by the user (harmful behavior, unexpected responses, etc.).
+
+The change-log is accompanied by update triggers that encourage monitoring of these effects.
+
+## Contributing
+
+Some of the most impactful RL-driven systems are multi-stakeholder in nature and developed behind the closed doors of private corporations.
+These corporations are largely without regulation, so the burden of documentation falls on the public.
+
+If you are interested in contributing, we are building Reward Reports for popular machine learning systems on a public
+record on [GitHub](https://github.com/RewardReports/reward-reports).
+
+For further reading, you can visit the Reward Reports [paper](https://arxiv.org/abs/2204.10817)
+or look at [an example report](https://github.com/RewardReports/reward-reports/tree/main/examples).