diff --git a/units/en/unit2/glossary.mdx b/units/en/unit2/glossary.mdx
index 879931e..b44d40f 100644
--- a/units/en/unit2/glossary.mdx
+++ b/units/en/unit2/glossary.mdx
@@ -32,6 +32,12 @@ This is a community-created glossary. Contributions are welcomed!
 - **Off-policy algorithms:** A different policy is used at training time and inference time
 - **On-policy algorithms:** The same policy is used during training and inference
 
+### Monte Carlo and Temporal Difference learning strategies
+
+- **Monte Carlo (MC):** Learning at the end of the episode. With Monte Carlo, we wait until the episode ends and then we update the value function (or policy function) from a complete episode.
+
+- **Temporal Difference (TD):** Learning at each step. With Temporal Difference Learning, we update the value function (or policy function) at each step without requiring a complete episode.
+
 If you want to improve the course, you can [open a Pull Request.](https://github.com/huggingface/deep-rl-class/pulls)
 
 This glossary was made possible thanks to:
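To make the contrast between the two new glossary entries concrete, here is a minimal tabular sketch. It assumes state values are stored in a plain dict, a constant step size `alpha`, and a discount `gamma`, and it uses the standard every-visit Monte Carlo update V(S_t) ← V(S_t) + α[G_t − V(S_t)] and the TD(0) update V(S_t) ← V(S_t) + α[R_{t+1} + γV(S_{t+1}) − V(S_t)]. The function names `monte_carlo_update` and `td0_update` and the `(state, reward, next_state)` transition layout are hypothetical illustrations, not code from the course.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical transition layout: (state, reward received after leaving it, next_state).
# next_state is None when the episode terminates.
Transition = Tuple[str, float, Optional[str]]


def monte_carlo_update(V: Dict[str, float], episode: List[Transition],
                       alpha: float = 0.1, gamma: float = 1.0) -> None:
    """MC: wait until the episode is complete, then move each V(s) toward the full return G_t."""
    G = 0.0
    # Walk the finished episode backwards so G is the discounted return from each step onward.
    for state, reward, _ in reversed(episode):
        G = reward + gamma * G
        V[state] += alpha * (G - V[state])


def td0_update(V: Dict[str, float], transition: Transition,
               alpha: float = 0.1, gamma: float = 1.0) -> None:
    """TD(0): update after a single step, bootstrapping from the current estimate of the next state."""
    state, reward, next_state = transition
    next_value = 0.0 if next_state is None else V[next_state]
    td_target = reward + gamma * next_value
    V[state] += alpha * (td_target - V[state])


# Usage: a two-step episode A -> B -> terminal, with reward 1.0 only on the last step.
episode = [("A", 0.0, "B"), ("B", 1.0, None)]

V_mc = {"A": 0.0, "B": 0.0}
monte_carlo_update(V_mc, episode)  # one update pass, only after the episode has ended

V_td = {"A": 0.0, "B": 0.0}
for step in episode:               # one update per step, while the episode unfolds
    td0_update(V_td, step)

print(V_mc)  # {'A': 0.1, 'B': 0.1}
print(V_td)  # {'A': 0.0, 'B': 0.1}
```

After one episode, MC has already credited state A with the final reward because it uses the complete return, while TD(0) updated A before B's value had changed, so the reward only reaches A on later episodes through bootstrapping.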