Merge pull request #384 from mrvincenzo/updates/AddMcAndTdToGlossary

Add MC and TD to Unit2 glossary
Thomas Simonini
2023-08-17 10:00:27 +02:00
committed by GitHub


@@ -32,6 +32,12 @@ This is a community-created glossary. Contributions are welcomed!
- **Off-policy algorithms:** Different policies are used at training time and at inference time
- **On-policy algorithms:** The same policy is used during training and inference
### Monte Carlo and Temporal Difference learning strategies
- **Monte Carlo (MC):** Learning at the end of the episode. With Monte Carlo, we wait until the episode ends and then update the value function (or policy function) using the return from the complete episode.
- **Temporal Difference (TD):** Learning at each step. With Temporal Difference learning, we update the value function (or policy function) at each step from the observed reward and the current estimate of the next state's value, without requiring a complete episode.
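
To make the difference between the two strategies concrete, here is a minimal sketch of both updates for a tabular state-value function. The function names (`mc_update`, `td0_update`), the two-state episode, and the values of `alpha` and `gamma` are illustrative assumptions, not part of the course code. The Monte Carlo version is the every-visit variant, and the TD version is the one-step TD(0) update.

```python
def mc_update(V, episode, alpha=0.5, gamma=1.0):
    """Monte Carlo: update only after the episode has ended.

    `episode` is a list of (state, reward) pairs, where `reward` is the
    reward received after leaving `state` (hypothetical format).
    """
    G = 0.0
    # Walk backwards so the return G accumulates from the end of the episode.
    for state, reward in reversed(episode):
        G = reward + gamma * G
        V[state] += alpha * (G - V[state])
    return V


def td0_update(V, state, reward, next_state, done, alpha=0.5, gamma=1.0):
    """TD(0): update after a single step, bootstrapping from V[next_state]."""
    target = reward if done else reward + gamma * V[next_state]
    V[state] += alpha * (target - V[state])
    return V


# Toy episode: A -> B -> terminal, with reward 0 and then reward 1.
episode = [("A", 0.0), ("B", 1.0)]

V_mc = mc_update({"A": 0.0, "B": 0.0}, episode)

V_td = {"A": 0.0, "B": 0.0}
td0_update(V_td, "A", 0.0, "B", done=False)   # step 1: uses the estimate V["B"]
td0_update(V_td, "B", 1.0, None, done=True)   # step 2: terminal transition

print(V_mc)  # {'A': 0.5, 'B': 0.5}
print(V_td)  # {'A': 0.0, 'B': 0.5}
```

After one pass, Monte Carlo has already credited state `A` with the final reward because it used the full return, while TD(0) has only updated `B`. The reward reaches `A` on later passes, once `V["B"]` is non-zero.
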
If you want to improve the course, you can [open a Pull Request](https://github.com/huggingface/deep-rl-class/pulls).
This glossary was made possible thanks to: