deep-rl-class/unit2/quiz2.md
Thomas Simonini 610b42e681 Update quiz2.md
2022-06-01 19:16:20 +02:00


Knowledge Check ✔️

The best way to learn and avoid the illusion of competence is to test yourself. This will help you to find where you need to reinforce your knowledge.

📝 Take a piece of paper and try to answer by writing, then check the solutions.

Q1: What is Q-Learning?

Solution

Q-Learning is the algorithm we use to train our Q-Function, an action-value function that determines the value of being at a particular state and taking a specific action at that state.

📖 If you don't remember, check 👉 https://huggingface.co/blog/deep-rl-q-part2#what-is-q-learning

Q2: What is a Q-Table?

Solution

The Q-table is the "internal memory" of our agent, where each cell holds the value of one state-action pair. Think of this Q-table as the memory or cheat sheet of our Q-function.
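As a rough illustration, a Q-table can be stored as a 2D array with one row per state and one column per action. The sizes below (16 states, 4 actions) and the variable names are assumptions chosen for the example, not values from this quiz.

```python
import numpy as np

# Hypothetical sizes for illustration: 16 states and 4 actions.
n_states, n_actions = 16, 4

# One row per state, one column per action; every estimate starts at 0.
q_table = np.zeros((n_states, n_actions))

# Reading a cell gives the current estimate for that state-action pair.
state, action = 3, 2
print(q_table[state, action])  # 0.0 before any training
```

Looking up `q_table[state]` returns the whole row, that is, the current value estimate of every action available in that state.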

📖 If you don't remember, check 👉 https://huggingface.co/blog/deep-rl-q-part2#what-is-q-learning

Q3: Why does having an optimal Q-function Q* give us an optimal policy?

Solution

Because with an optimal Q-function we know, for each state, which action is the best one to take: the action with the highest Q-value. Always choosing that action is an optimal policy.

[Image: link between value and policy]
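Written as a formula, the link between the two is the standard greedy-policy relation. The notation here (π* for the optimal policy, s for a state, a for an action) is the usual convention and is assumed rather than taken from this quiz:

```latex
\pi^*(s) = \arg\max_{a} Q^*(s, a)
```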

📖 If you don't remember, check 👉 https://huggingface.co/blog/deep-rl-q-part2#what-is-q-learning

Q4: Can you explain what the Epsilon-Greedy Strategy is?

Solution

The Epsilon-Greedy Strategy is a **policy that handles the exploration/exploitation trade-off**.

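The idea is that we define epsilon ɛ = 1.0 at the start of training:

  • With probability 1 - ɛ: we do exploitation (our agent selects the action with the highest state-action pair value).
  • With probability ɛ: we do exploration (trying a random action).

As training goes on, ɛ is progressively reduced, so the agent explores less and exploits more.

[Image: Epsilon Greedy]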
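The two cases can be sketched in a few lines of Python. This is a minimal sketch, assuming the Q-table is a NumPy array indexed as `q_table[state, action]`. The function name `epsilon_greedy_action` and the toy values are made up for illustration.

```python
import numpy as np

def epsilon_greedy_action(q_table, state, epsilon, rng):
    """Pick an action for `state` with an epsilon-greedy rule."""
    if rng.random() < epsilon:
        # Exploration: pick any action uniformly at random.
        return int(rng.integers(q_table.shape[1]))
    # Exploitation: pick the action with the highest Q-value in this state.
    return int(np.argmax(q_table[state]))

rng = np.random.default_rng(0)

# Toy Q-table (hypothetical values): 2 states, 3 actions.
q_table = np.array([[0.0, 1.0, 0.5],
                    [2.0, 0.0, 0.1]])

# With epsilon = 0 the agent always exploits.
print(epsilon_greedy_action(q_table, state=0, epsilon=0.0, rng=rng))  # 1
```

With `epsilon=1.0` the same function always explores, which matches the start of training described above.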

📖 If you don't remember, check 👉 https://huggingface.co/blog/deep-rl-q-part2#the-q-learning-algorithm

Q5: How do we update the Q-value of a state-action pair?

[Image: Q Update exercise]

Solution

[Image: Q Update exercise]

📖 If you don't remember, check 👉 https://huggingface.co/blog/deep-rl-q-part2#the-q-learning-algorithm
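Since the solution figure is not reproduced here, the standard Q-Learning update is sketched below as a reminder: Q(S_t, A_t) ← Q(S_t, A_t) + α [R_{t+1} + γ max_a Q(S_{t+1}, a) - Q(S_t, A_t)], where α is the learning rate and γ the discount factor. The function name, the array layout and the toy numbers are assumptions for illustration, not the course's reference implementation.

```python
import numpy as np

def q_learning_update(q_table, state, action, reward, next_state, alpha, gamma):
    """Apply one Q-Learning update in place and return the table."""
    # TD target: immediate reward plus discounted best value of the next state.
    td_target = reward + gamma * np.max(q_table[next_state])
    # TD error: how far the current estimate is from that target.
    td_error = td_target - q_table[state, action]
    # Move the estimate a fraction alpha of the way toward the target.
    q_table[state, action] += alpha * td_error
    return q_table

# Toy example (hypothetical values): 2 states, 2 actions, all zeros.
q_table = np.zeros((2, 2))
q_table = q_learning_update(q_table, state=0, action=1, reward=1.0,
                            next_state=1, alpha=0.1, gamma=0.99)
print(q_table[0, 1])  # 0.1, i.e. 0 + 0.1 * (1.0 + 0.99 * 0 - 0)
```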

Q6: What's the difference between on-policy and off-policy?

Solution

[Image: On/off policy]

📖 If you don't remember, check 👉 https://huggingface.co/blog/deep-rl-q-part2#off-policy-vs-on-policy
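In short, an off-policy method uses a different policy for acting and for updating, while an on-policy method uses the same policy for both. The sketch below contrasts the update targets of Q-Learning (off-policy) and SARSA (on-policy). SARSA is brought in here only as a common on-policy counterpart, and the function names and toy values are assumptions for illustration.

```python
import numpy as np

def q_learning_target(q_table, reward, next_state, gamma):
    # Off-policy: bootstrap from the greedy (best) action in next_state,
    # whatever action the acting policy actually takes there.
    return reward + gamma * np.max(q_table[next_state])

def sarsa_target(q_table, reward, next_state, next_action, gamma):
    # On-policy: bootstrap from the action the acting policy really chose.
    return reward + gamma * q_table[next_state, next_action]

# Toy Q-table (hypothetical values): 2 states, 2 actions.
q_table = np.array([[0.0, 0.0],
                    [1.0, 5.0]])

# Suppose the epsilon-greedy acting policy explored and chose action 0 in state 1.
off_policy = q_learning_target(q_table, reward=0.0, next_state=1, gamma=0.9)
on_policy = sarsa_target(q_table, reward=0.0, next_state=1, next_action=0, gamma=0.9)
print(off_policy, on_policy)
```

The two targets differ whenever the acting policy explores: Q-Learning still updates toward the greedy action's value, while SARSA updates toward the value of the exploratory action it took.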

Congrats on finishing this Quiz 🥳! If you missed some elements, take time to read the chapter again to reinforce (😏) your knowledge.

Keep Learning, Stay Awesome