Merge pull request #263 from arkadyark/policy-algorithms-unit-2

Add on and off policy algorithms to glossary
2026-06-10 06:06:49 +08:00 · 2023-04-05 09:50:30 +02:00
parent e5b72d5162 35c3818ed1
commit 8b74360a47
1 changed file with 5 additions and 0 deletions
--- a/units/en/unit2/glossary.mdx
+++ b/units/en/unit2/glossary.mdx
@@ -27,6 +27,10 @@ This is a community-created glossary. Contributions are welcomed!
 - Does not include any exploration.
 - Can be disadvantageous in environments with uncertainty or unknown optimal actions.

+### Off-policy vs on-policy algorithms
+
+- **Off-policy algorithms:** Different policies are used at training time and at inference time.
+- **On-policy algorithms:** The same policy is used at training time and at inference time.

 If you want to improve the course, you can [open a Pull Request.](https://github.com/huggingface/deep-rl-class/pulls)

@@ -34,3 +38,4 @@ This glossary was made possible thanks to:

 - [Ramón Rueda](https://github.com/ramon-rd)
 - [Hasarindu Perera](https://github.com/hasarinduperera/)
+- [Arkady Arkhangorodsky](https://github.com/arkadyark/)
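
The off-policy vs on-policy entries added in this diff can be illustrated by comparing how two tabular methods build their update target. The following is a minimal sketch, not part of the commit: it assumes a Q-table stored as a list of lists, and the function names `sarsa_target` and `q_learning_target`, along with the example numbers, are hypothetical and chosen only for illustration. Sarsa stands in for the on-policy case and Q-Learning for the off-policy case.

```python
# Illustrative sketch only: names and values below are assumptions, not from the glossary.

def sarsa_target(q, reward, next_state, next_action, gamma):
    """On-policy target: bootstrap from the action the acting policy actually chose."""
    return reward + gamma * q[next_state][next_action]


def q_learning_target(q, reward, next_state, gamma):
    """Off-policy target: bootstrap from the greedy action, whatever was actually taken."""
    return reward + gamma * max(q[next_state])


# Toy Q-table: q[state][action]
q = [[0.0, 0.0],
     [1.0, 3.0]]

reward, gamma = 1.0, 0.5
next_state = 1
next_action = 0  # suppose an exploring (e.g. epsilon-greedy) policy picked the non-greedy action

sarsa = sarsa_target(q, reward, next_state, next_action, gamma)
qlearn = q_learning_target(q, reward, next_state, gamma)

print(sarsa)   # 1.5: uses the value of the action that was actually taken
print(qlearn)  # 2.5: uses the value of the greedy action instead
```

The two targets differ only when the acting policy explores: the on-policy target follows the exploratory choice, while the off-policy target evaluates the greedy policy regardless of how the data was collected.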