diff --git a/units/en/unit2/glossary.mdx b/units/en/unit2/glossary.mdx
index a8365e9..c4b5dce 100644
--- a/units/en/unit2/glossary.mdx
+++ b/units/en/unit2/glossary.mdx
@@ -27,6 +27,10 @@ This is a community-created glossary. Contributions are welcomed!
 - Does not include any exploration.
 - Can be disadvantageous in environments with uncertainty or unknown optimal actions.
 
+### Off-policy vs on-policy algorithms
+
+- **Off-policy algorithms:** A different policy is used at training time and inference time
+- **On-policy algorithms:** The same policy is used during training and inference
 
 If you want to improve the course, you can [open a Pull Request.](https://github.com/huggingface/deep-rl-class/pulls)
 
@@ -34,3 +38,4 @@ This glossary was made possible thanks to:
 - [Ramón Rueda](https://github.com/ramon-rd)
 - [Hasarindu Perera](https://github.com/hasarinduperera/)
+- [Arkady Arkhangorodsky](https://github.com/arkadyark/)
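
Note on the new glossary entry: the off-policy/on-policy distinction is easiest to see in the bootstrap target of a tabular update. The sketch below is not part of the patch. It assumes a NumPy Q-table and uses Q-learning and SARSA as the standard off-policy and on-policy examples. The function names and the toy table are hypothetical, chosen only for illustration.

```python
import numpy as np


def epsilon_greedy(q_row, epsilon, rng):
    """Behavior policy: random action with probability epsilon, else greedy."""
    if rng.random() < epsilon:
        return int(rng.integers(len(q_row)))
    return int(np.argmax(q_row))


def q_learning_target(q, reward, next_state, gamma):
    """Off-policy: bootstrap from the greedy action, whatever the agent does next."""
    return reward + gamma * np.max(q[next_state])


def sarsa_target(q, reward, next_state, next_action, gamma):
    """On-policy: bootstrap from the action the behavior policy actually chose."""
    return reward + gamma * q[next_state, next_action]


# Toy Q-table (hypothetical values): 2 states x 2 actions.
q = np.array([[0.0, 0.0],
              [1.0, 3.0]])

# Same transition, but the agent's next action (0) is not the greedy one (1).
off_target = q_learning_target(q, reward=1.0, next_state=1, gamma=0.5)
on_target = sarsa_target(q, reward=1.0, next_state=1, next_action=0, gamma=0.5)
```

The two targets differ only when the action actually taken in the next state is not the greedy one, which is exactly when the acting policy and the updated policy diverge.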