mirror of
https://github.com/huggingface/deep-rl-class.git
synced 2026-04-09 13:50:23 +08:00
Update two-types-value-based-methods.mdx
@@ -36,7 +36,7 @@ Consequently, whatever method you use to solve your problem, **you will have a

 So the difference is:

 - In policy-based, **the optimal policy (denoted π\*) is found by training the policy directly.**

-- In value-based, **finding an optimal value function (denoted Q\* or V\*, we'll study the difference after) in our leads to having an optimal policy.**
+- In value-based, **finding an optimal value function (denoted Q\* or V\*, we'll study the difference after) leads to having an optimal policy.**

 <img src="https://huggingface.co/datasets/huggingface-deep-rl-course/course-images/resolve/main/en/unit3/link-value-policy.jpg" alt="Link between value and policy"/>
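The sentence being corrected above states that an optimal value function leads to an optimal policy. A minimal sketch of that idea, not taken from the course repo: given an optimal action-value function Q\*(s, a), the optimal policy acts greedily with respect to it. The Q-table, state names, and action names below are invented for illustration.

```python
def greedy_policy(q_table, state):
    """Return the action with the highest Q-value in the given state.

    Acting greedily w.r.t. an optimal Q* yields an optimal policy pi*.
    """
    actions = q_table[state]
    return max(actions, key=actions.get)


# Hypothetical 2-state, 2-action Q-table (values are made up).
q_table = {
    "s0": {"left": 0.1, "right": 0.7},
    "s1": {"left": 0.9, "right": 0.4},
}

print(greedy_policy(q_table, "s0"))  # right
print(greedy_policy(q_table, "s1"))  # left
```

This is the link between value and policy that the image above depicts: the value function is learned, and the policy is read off from it rather than trained directly.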