Add MC and TD to Unit2 glossary

2026-06-17 07:27:26 +08:00 · 2023-08-11 19:24:20 +03:00
parent f1198dab15
commit 3760815834
1 changed files with 6 additions and 0 deletions
--- a/units/en/unit2/glossary.mdx
+++ b/units/en/unit2/glossary.mdx
@@ -32,6 +32,12 @@ This is a community-created glossary. Contributions are welcomed!
 - **Off-policy algorithms:** A different policy is used at training time and inference time
 - **On-policy algorithms:** The same policy is used during training and inference

+### Monte Carlo and Temporal Difference learning strategies
+
+- **Monte Carlo (MC):** Learning at the end of the episode. With Monte Carlo, we wait until the episode ends and then update the value function (or policy function) using the complete episode.
+
+- **Temporal Difference (TD):** Learning at each step. With Temporal Difference Learning, we update the value function (or policy function) at each step without requiring a complete episode.
+
 If you want to improve the course, you can [open a Pull Request.](https://github.com/huggingface/deep-rl-class/pulls)

 This glossary was made possible thanks to:
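
Reviewer note on the two definitions added in this change: the difference between the strategies can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not part of the commit or the course code. The function names `mc_update` and `td0_update`, the learning rate `alpha`, the discount factor `gamma`, and the dictionary-based value table are all hypothetical choices made for the example, and TD is shown in its one-step TD(0) form.

```python
def mc_update(values, episode, alpha=0.1, gamma=1.0):
    """Monte Carlo sketch: update only once the whole episode is available.

    `episode` is a list of (state, reward) pairs, where `reward` is the
    reward received after leaving `state`. Hypothetical helper.
    """
    values = dict(values)  # work on a copy so the input table is untouched
    g = 0.0  # return accumulated from the end of the episode backwards
    for state, reward in reversed(episode):
        g = reward + gamma * g
        old = values.get(state, 0.0)
        # Move the estimate toward the actual return observed from this state.
        values[state] = old + alpha * (g - old)
    return values


def td0_update(values, state, reward, next_state, done, alpha=0.1, gamma=1.0):
    """TD(0) sketch: update after a single step, without the full episode.

    The target bootstraps from the current estimate of `next_state`.
    Hypothetical helper.
    """
    values = dict(values)
    bootstrap = 0.0 if done else gamma * values.get(next_state, 0.0)
    target = reward + bootstrap
    old = values.get(state, 0.0)
    values[state] = old + alpha * (target - old)
    return values


# One two-step episode: A -> B with reward 0, then B -> terminal with reward 1.
episode = [("A", 0.0), ("B", 1.0)]

# Monte Carlo: a single update pass after the episode has ended.
mc_values = mc_update({}, episode)

# Temporal Difference: one update per step, as the steps happen.
td_values = td0_update({}, "A", 0.0, "B", done=False)
td_values = td0_update(td_values, "B", 1.0, None, done=True)
```

In this sketch, Monte Carlo credits state `A` with the final reward after a single episode because it uses the full return, while TD(0) leaves `A` unchanged on the first pass because its estimate of `B` was still zero when `A` was updated.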