From 7dd332d713b70e4af9201d230b17a25e803c153b Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Ram=C3=B3n=20Rueda?=
Date: Thu, 15 Dec 2022 17:44:42 +0100
Subject: [PATCH 1/3] Create glossary.mdx

---
 units/en/unit2/glossary.mdx | 11 +++++++++++
 1 file changed, 11 insertions(+)
 create mode 100644 units/en/unit2/glossary.mdx

diff --git a/units/en/unit2/glossary.mdx b/units/en/unit2/glossary.mdx
new file mode 100644
index 0000000..fcdb3ee
--- /dev/null
+++ b/units/en/unit2/glossary.mdx
@@ -0,0 +1,11 @@
+# Glossary
+
+### Strategies to find the optimal policy
+
+- **Policy-based methods.** The policy is usually trained with a neural network to select what action to take given a state. In this case, it is the neural network that outputs the action the agent should take, instead of using a value function. Depending on the experience it receives from the environment, the neural network will be re-adjusted to provide better actions.
+- **Value-based methods.** In this case, a value function is trained to output the value of a state or a state-action pair, which will represent our policy. However, this value doesn't define by itself what action the agent should take. Instead, we need to specify the behavior of the agent given the output of the value function. For example, we could decide to adopt a policy that always takes the action leading to the biggest reward (Greedy Policy). In summary, the policy is a Greedy Policy (or whatever decision rule the user chooses) that uses the values of the value function to decide which actions to take.
+
+### Among the value-based methods, we can find two main strategies
+
+- **The state-value function.** For each state, the state-value function is the expected return if the agent starts in that state and follows the policy until the end.
+- **The action-value function.** In contrast to the state-value function, the action-value function calculates, for each state-action pair, the expected return if the agent starts in that state, takes that action, and then follows the policy forever after.

From c275b13ddf16c8800f50cd08f454b5f6a7b2fb84 Mon Sep 17 00:00:00 2001
From: Thomas Simonini
Date: Tue, 20 Dec 2022 13:04:35 +0100
Subject: [PATCH 2/3] Update _toctree.yml

---
 units/en/_toctree.yml | 29 +++++++++++++++++++++++++++++
 1 file changed, 29 insertions(+)

diff --git a/units/en/_toctree.yml b/units/en/_toctree.yml
index 9621222..a46425e 100644
--- a/units/en/_toctree.yml
+++ b/units/en/_toctree.yml
@@ -68,6 +68,8 @@
     title: A Q-Learning example
   - local: unit2/q-learning-recap
     title: Q-Learning Recap
+  - local: unit2/glossary
+    title: Glossary
   - local: unit2/hands-on
     title: Hands-on
   - local: unit2/quiz2
@@ -76,7 +78,34 @@
     title: Conclusion
   - local: unit2/additional-readings
     title: Additional Readings
+- title: Unit 3. Deep Q-Learning with Atari Games
+  sections:
+  - local: unit3/introduction
+    title: Introduction
+  - local: unit3/from-q-to-dqn
+    title: From Q-Learning to Deep Q-Learning
+  - local: unit3/deep-q-network
+    title: The Deep Q-Network (DQN)
+  - local: unit3/deep-q-algorithm
+    title: The Deep Q Algorithm
+  - local: unit3/hands-on
+    title: Hands-on
+  - local: unit3/quiz
+    title: Quiz
+  - local: unit3/conclusion
+    title: Conclusion
+  - local: unit3/additional-readings
+    title: Additional Readings
+- title: Unit Bonus 2. Automatic Hyperparameter Tuning with Optuna
+  sections:
+  - local: unitbonus2/introduction
+    title: Introduction
+  - local: unitbonus2/optuna
+    title: Optuna
+  - local: unitbonus2/hands-on
+    title: Hands-on
 - title: What's next? New Units Publishing Schedule
   sections:
   - local: communication/publishing-schedule
     title: Publishing Schedule
+

From a37804cebf9a1a31d6342621db5fdcaccb33f7e6 Mon Sep 17 00:00:00 2001
From: Thomas Simonini
Date: Tue, 20 Dec 2022 13:06:31 +0100
Subject: [PATCH 3/3] Update glossary.mdx

---
 units/en/unit2/glossary.mdx | 12 +++++++++++-
 1 file changed, 11 insertions(+), 1 deletion(-)

diff --git a/units/en/unit2/glossary.mdx b/units/en/unit2/glossary.mdx
index fcdb3ee..dc055d1 100644
--- a/units/en/unit2/glossary.mdx
+++ b/units/en/unit2/glossary.mdx
@@ -1,4 +1,7 @@
-# Glossary
+# Glossary [[glossary]]
+
+This is a community-created glossary. Contributions are welcome!
+

 ### Strategies to find the optimal policy

@@ -9,3 +12,10 @@

 - **The state-value function.** For each state, the state-value function is the expected return if the agent starts in that state and follows the policy until the end.
 - **The action-value function.** In contrast to the state-value function, the action-value function calculates, for each state-action pair, the expected return if the agent starts in that state, takes that action, and then follows the policy forever after.
+
+
+If you want to improve the course, you can [open a Pull Request](https://github.com/huggingface/deep-rl-class/pulls).
+
+This glossary was made possible thanks to:
+
+- [Ramón Rueda](https://github.com/ramon-rd)
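
To make the **policy-based methods** entry above concrete, here is a minimal sketch of a policy network, assuming PyTorch and a discrete action space; the class name, layer sizes, and state/action dimensions are illustrative and not taken from the course code.

```python
import torch
import torch.nn as nn

class PolicyNetwork(nn.Module):
    """Maps a state directly to a probability distribution over actions."""

    def __init__(self, state_dim: int, n_actions: int):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim, 64),
            nn.ReLU(),
            nn.Linear(64, n_actions),
            nn.Softmax(dim=-1),
        )

    def forward(self, state: torch.Tensor) -> torch.Tensor:
        return self.net(state)

# The network itself is the policy: we sample an action from its output
# without ever consulting a value function.
policy = PolicyNetwork(state_dim=4, n_actions=2)
state = torch.rand(4)
action = torch.multinomial(policy(state), num_samples=1).item()
```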
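Likewise, the Greedy Policy described under **value-based methods** can be sketched over a tabular value function. The Q-table shape below is an assumption for illustration, and the epsilon-greedy variant is just one example of "whatever decision rule the user chooses".

```python
import numpy as np

# Hypothetical Q-table: one row per state, one column per action.
q_table = np.zeros((16, 4))  # e.g. 16 states, 4 actions

def greedy_policy(q_table: np.ndarray, state: int) -> int:
    """Always take the action with the highest estimated value."""
    return int(np.argmax(q_table[state]))

def epsilon_greedy_policy(q_table: np.ndarray, state: int, epsilon: float) -> int:
    """Act greedily most of the time, but explore with probability epsilon."""
    if np.random.random() < epsilon:
        return int(np.random.randint(q_table.shape[1]))  # random action
    return greedy_policy(q_table, state)

action = greedy_policy(q_table, state=0)
```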
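Finally, the two value functions in the glossary's last section have standard definitions, written out here for reference; G_t denotes the discounted return from time step t.

```latex
% State-value function: expected return when starting in state s
% and following policy \pi thereafter.
V_{\pi}(s) = \mathbb{E}_{\pi}\left[ G_t \mid S_t = s \right]

% Action-value function: expected return when starting in state s,
% taking action a, and following policy \pi thereafter.
Q_{\pi}(s, a) = \mathbb{E}_{\pi}\left[ G_t \mid S_t = s, A_t = a \right]

% where the discounted return is
G_t = R_{t+1} + \gamma R_{t+2} + \gamma^2 R_{t+3} + \cdots
```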