mirror of
https://github.com/huggingface/deep-rl-class.git
synced 2026-02-13 15:27:48 +08:00
Merge pull request #263 from arkadyark/policy-algorithms-unit-2
Add on and off policy algorithms to glossary
@@ -27,6 +27,10 @@ This is a community-created glossary. Contributions are welcomed!
- Does not include any exploration.
- Can be disadvantageous in environments with uncertainty or unknown optimal actions.

### Off-policy vs on-policy algorithms
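- **Off-policy algorithms:** The policy used for acting (inference, collecting experience) differs from the policy being updated (training). Q-Learning is an example: it acts with an epsilon-greedy policy but updates toward the greedy policy.
- **On-policy algorithms:** The same policy is used both for acting (inference) and for updating (training). SARSA is an example.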
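
The distinction shows up most clearly in the bootstrap target of a tabular update. The following is a minimal sketch, not code from the course: the function names (`epsilon_greedy`, `q_learning_target`, `sarsa_target`) and the list-based Q-table row are illustrative assumptions.

```python
import random


def epsilon_greedy(q_row, epsilon, rng):
    """Behavior policy (hypothetical helper): explore with probability epsilon."""
    if rng.random() < epsilon:
        return rng.randrange(len(q_row))
    # Otherwise pick the index of the largest Q-value.
    return max(range(len(q_row)), key=q_row.__getitem__)


def q_learning_target(reward, next_q_row, gamma):
    """Off-policy target: bootstrap from the greedy action in the next state,
    regardless of which action the behavior policy will actually take."""
    return reward + gamma * max(next_q_row)


def sarsa_target(reward, next_q_row, next_action, gamma):
    """On-policy target: bootstrap from the action the acting policy
    actually selected in the next state."""
    return reward + gamma * next_q_row[next_action]


if __name__ == "__main__":
    rng = random.Random(0)
    next_q_row = [1.0, 3.0]  # assumed Q-values for the next state
    next_action = epsilon_greedy(next_q_row, epsilon=0.3, rng=rng)
    print("Q-Learning target:", q_learning_target(1.0, next_q_row, gamma=0.5))
    print("SARSA target:", sarsa_target(1.0, next_q_row, next_action, gamma=0.5))
```

The two targets agree whenever the acting policy happens to pick the greedy action, and differ only on exploratory steps. On those steps SARSA's update reflects the exploration and Q-Learning's does not.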

If you want to improve the course, you can [open a Pull Request](https://github.com/huggingface/deep-rl-class/pulls).
@@ -34,3 +38,4 @@ This glossary was made possible thanks to:
- [Ramón Rueda](https://github.com/ramon-rd)
- [Hasarindu Perera](https://github.com/hasarinduperera/)
- [Arkady Arkhangorodsky](https://github.com/arkadyark/)