From 35c3818ed1d39e2b0020f7628f6b794799084d84 Mon Sep 17 00:00:00 2001
From: Arkady Arkhangorodsky
Date: Tue, 28 Mar 2023 23:19:45 -0400
Subject: [PATCH] Add on and off policy algorithms to glossary

---
 units/en/unit2/glossary.mdx | 5 +++++
 1 file changed, 5 insertions(+)

diff --git a/units/en/unit2/glossary.mdx b/units/en/unit2/glossary.mdx
index a8365e9..c4b5dce 100644
--- a/units/en/unit2/glossary.mdx
+++ b/units/en/unit2/glossary.mdx
@@ -27,6 +27,10 @@ This is a community-created glossary. Contributions are welcomed!
 - Does not include any exploration.
 - Can be disadvantageous in environments with uncertainty or unknown optimal actions.
 
+### Off-policy vs on-policy algorithms
+
+- **Off-policy algorithms:** A different policy is used at training time and inference time
+- **On-policy algorithms:** The same policy is used during training and inference
 
 If you want to improve the course, you can [open a Pull Request.](https://github.com/huggingface/deep-rl-class/pulls)
 
@@ -34,3 +38,4 @@ This glossary was made possible thanks to:
 
 - [Ramón Rueda](https://github.com/ramon-rd)
 - [Hasarindu Perera](https://github.com/hasarinduperera/)
+- [Arkady Arkhangorodsky](https://github.com/arkadyark/)
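
Review note: the two glossary entries added in this patch can be made concrete by comparing the update targets of two tabular algorithms. The sketch below is only an illustration under stated assumptions: a Q-table stored as nested lists, an epsilon-greedy policy used for acting, and hypothetical function names (`epsilon_greedy`, `q_learning_target`, `sarsa_target`) that do not come from the course repository. Q-learning is commonly given as the off-policy example, since it acts with epsilon-greedy but updates toward the greedy action; SARSA is the usual on-policy example, since it updates with the action the acting policy actually chose.

```python
import random


def epsilon_greedy(q_row, epsilon, rng):
    """Acting policy (assumed): random action with probability epsilon, else greedy."""
    if rng.random() < epsilon:
        return rng.randrange(len(q_row))
    return max(range(len(q_row)), key=lambda a: q_row[a])


def q_learning_target(q, reward, next_state, gamma):
    """Off-policy target: bootstrap from the greedy action in next_state,
    regardless of which action the acting policy takes next."""
    return reward + gamma * max(q[next_state])


def sarsa_target(q, reward, next_state, next_action, gamma):
    """On-policy target: bootstrap from the action the acting policy chose."""
    return reward + gamma * q[next_state][next_action]


# Hypothetical two-state, two-action Q-table used only for this illustration.
q = [[0.0, 0.0], [1.0, 3.0]]
rng = random.Random(0)

next_action = epsilon_greedy(q[1], epsilon=0.5, rng=rng)
off_policy_target = q_learning_target(q, reward=1.0, next_state=1, gamma=0.9)
on_policy_target = sarsa_target(
    q, reward=1.0, next_state=1, next_action=next_action, gamma=0.9
)
```

The two targets coincide whenever the acting policy happens to pick the greedy action and differ when it explores, which is the distinction the new glossary entries describe.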