mirror of
https://github.com/huggingface/deep-rl-class.git
synced 2026-04-13 18:00:45 +08:00
Update glossary.mdx
@@ -1,4 +1,7 @@
-# Glossary
+# Glossary [[glossary]]
 
+This is a community-created glossary. Contributions are welcomed!
+
 ### Strategies to find the optimal policy
 
@@ -9,3 +12,10 @@
 
+- **The state-value function.** For each state, the state-value function is the expected return if the agent starts in that state and then follows the policy until the end.
+
+- **The action-value function.** In contrast to the state-value function, the action-value function calculates, for each state-action pair, the expected return if the agent starts in that state, takes that action, and then follows the policy until the end.
+
 If you want to improve the course, you can [open a Pull Request.](https://github.com/huggingface/deep-rl-class/pulls)
 
+This glossary was made possible thanks to:
+
+- [Ramón Rueda](https://github.com/ramon-rd)
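The two value functions defined above can be sketched in code. The following is a minimal illustration, not course material: it assumes a tiny, hypothetical deterministic MDP (states `0` and `1`, terminal state `2`, actions `"left"`/`"right"`) and a fixed policy, and computes V(s) and Q(s, a) by direct recursion on the definitions.

```python
GAMMA = 0.9  # discount factor (illustrative choice)

# Hypothetical deterministic MDP:
# transitions[state][action] = (next_state, reward); state 2 is terminal.
transitions = {
    0: {"left": (0, 0.0), "right": (1, 1.0)},
    1: {"left": (0, 0.0), "right": (2, 10.0)},
}
policy = {0: "right", 1: "right"}  # the fixed policy pi


def v_value(state):
    """State-value: expected return starting in `state`, then following pi."""
    if state == 2:  # terminal state contributes no further return
        return 0.0
    return q_value(state, policy[state])


def q_value(state, action):
    """Action-value: expected return after taking `action` in `state`,
    then following pi until the end."""
    next_state, reward = transitions[state][action]
    return reward + GAMMA * v_value(next_state)


print(v_value(0))          # V(0) = 1 + 0.9 * V(1) = 10.0
print(q_value(0, "left"))  # Q(0, left) = 0 + 0.9 * V(0) = 9.0
```

Note how Q differs from V only in the first step: Q fixes the first action by hand, while V takes it from the policy.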