mirror of
https://github.com/huggingface/deep-rl-class.git
synced 2026-06-28 00:36:23 +08:00
38 lines
2.4 KiB
Plaintext
38 lines
2.4 KiB
Plaintext
# The Exploration/Exploitation trade-off [[exp-exp-tradeoff]]
|
||
|
||
Finally, before looking at the different methods to solve Reinforcement Learning problems, we must cover one more very important topic: *the exploration/exploitation trade-off.*
|
||
|
||
- *Exploration* is exploring the environment by trying random actions in order to **find more information about the environment.**
|
||
- *Exploitation* is **exploiting known information to maximize the reward.**
|
||
|
||
Remember, the goal of our RL agent is to maximize the expected cumulative reward. However, **we can fall into a common trap**.
|
||
|
||
Let’s take an example:
|
||
|
||
<img src="https://huggingface.co/datasets/huggingface-deep-rl-course/course-images/resolve/main/en/unit1/exp_1.jpg" alt="Exploration" width="100%">
|
||
|
||
In this game, our mouse can have an **infinite amount of small cheese** (+1 each). But at the top of the maze, there is a gigantic sum of cheese (+1000).
|
||
|
||
However, if we only focus on exploitation, our agent will never reach the gigantic sum of cheese. Instead, it will only exploit **the nearest source of rewards,** even if this source is small (exploitation).
|
||
|
||
But if our agent does a little bit of exploration, it can **discover the big reward** (the pile of big cheese).
|
||
|
||
This is what we call the exploration/exploitation trade-off. We need to balance how much we **explore the environment** and how much we **exploit what we know about the environment.**
|
||
|
||
Therefore, we must **define a rule that helps to handle this trade-off**. We’ll see the different ways to handle it in the future units.
|
||
|
||
If it’s still confusing, **think of a real problem: the choice of picking a restaurant:**
|
||
|
||
|
||
<figure>
|
||
<img src="https://huggingface.co/datasets/huggingface-deep-rl-course/course-images/resolve/main/en/unit1/exp_2.jpg" alt="Exploration">
|
||
<figcaption>Source: <a href="https://inst.eecs.berkeley.edu/~cs188/sp20/assets/lecture/lec15_6up.pdf"> Berkley AI Course</a>
|
||
</figcaption>
|
||
</figure>
|
||
|
||
- *Exploitation*: You go every day to the same one that you know is good and **take the risk to miss another better restaurant.**
|
||
- *Exploration*: Try restaurants you never went to before, with the risk of having a bad experience **but the probable opportunity of a fantastic experience.**
|
||
|
||
To recap:
|
||
<img src="https://huggingface.co/datasets/huggingface-deep-rl-course/course-images/resolve/main/en/unit1/expexpltradeoff.jpg" alt="Exploration Exploitation Tradeoff" width="100%">
|