From 37608158345f50ca2ef35c16b8bb044140277036 Mon Sep 17 00:00:00 2001
From: "Katz, Ilia (ik216a)"
Date: Fri, 11 Aug 2023 19:24:20 +0300
Subject: [PATCH 1/2] Add MC and TD to Unit2 glossary

---
 units/en/unit2/glossary.mdx | 6 ++++++
 1 file changed, 6 insertions(+)

diff --git a/units/en/unit2/glossary.mdx b/units/en/unit2/glossary.mdx
index 879931e..5ea27ab 100644
--- a/units/en/unit2/glossary.mdx
+++ b/units/en/unit2/glossary.mdx
@@ -32,6 +32,12 @@ This is a community-created glossary. Contributions are welcomed!
 - **Off-policy algorithms:** A different policy is used at training time and inference time
 - **On-policy algorithms:** The same policy is used during training and inference
 
+### Monte Carlo and Temporal Difference learning strategies
+
+- **Monte Carlo (MC):** Learning at the end of the episode. With Monte Carlo, we wait until the episode ends and then we update the value functin (or policy function) from a complete episode.
+
+- **Temporal Difference (TD):** Learning at each step. With Temporal Difference Learning, we update the value function (or policy function) at each step without requiring a complete episode.
+
 If you want to improve the course, you can [open a Pull Request.](https://github.com/huggingface/deep-rl-class/pulls)
 
 This glossary was made possible thanks to:

From bc9a54adcfbe1528c3df652f1330971be7ad1c39 Mon Sep 17 00:00:00 2001
From: "Katz, Ilia (ik216a)"
Date: Tue, 15 Aug 2023 14:23:45 +0300
Subject: [PATCH 2/2] Fix typo in glossary.mdx

---
 units/en/unit2/glossary.mdx | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/units/en/unit2/glossary.mdx b/units/en/unit2/glossary.mdx
index 5ea27ab..b44d40f 100644
--- a/units/en/unit2/glossary.mdx
+++ b/units/en/unit2/glossary.mdx
@@ -34,7 +34,7 @@ This is a community-created glossary. Contributions are welcomed!
 
 ### Monte Carlo and Temporal Difference learning strategies
 
-- **Monte Carlo (MC):** Learning at the end of the episode. With Monte Carlo, we wait until the episode ends and then we update the value functin (or policy function) from a complete episode.
+- **Monte Carlo (MC):** Learning at the end of the episode. With Monte Carlo, we wait until the episode ends and then we update the value function (or policy function) from a complete episode.
 
 - **Temporal Difference (TD):** Learning at each step. With Temporal Difference Learning, we update the value function (or policy function) at each step without requiring a complete episode.
 
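The MC and TD definitions added by these patches can be sketched as a minimal tabular value-update example. This is not from the patch or the course repo: the episode, states, rewards, `alpha`, and `gamma` are all assumptions chosen for illustration, contrasting "update from the complete episode" (MC) with "update at each step" (TD(0)).

```python
alpha, gamma = 0.1, 1.0  # assumed step size and discount, for illustration only

# One complete episode of a hypothetical 3-state chain: (state, reward) pairs,
# where reward is received on leaving that state; the episode then terminates.
episode = [("s0", 0.0), ("s1", 0.0), ("s2", 1.0)]

# Monte Carlo: wait until the episode ends, then update each visited state
# toward the full return G actually observed from that state onward.
V_mc = {"s0": 0.0, "s1": 0.0, "s2": 0.0}
G = 0.0
for state, reward in reversed(episode):
    G = reward + gamma * G          # return from this state to episode end
    V_mc[state] += alpha * (G - V_mc[state])

# TD(0): update at each step, bootstrapping from the current estimate of the
# next state's value instead of waiting for the complete episode.
V_td = {"s0": 0.0, "s1": 0.0, "s2": 0.0}
for i, (state, reward) in enumerate(episode):
    next_v = V_td[episode[i + 1][0]] if i + 1 < len(episode) else 0.0
    V_td[state] += alpha * (reward + gamma * next_v - V_td[state])
```

After a single episode, MC has propagated the terminal reward back to every visited state, while TD(0), starting from zero estimates, has only moved the value of the state immediately before the reward; that one-step-at-a-time propagation is the trade-off the two glossary entries describe.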