Cases consistency

This commit is contained in:
Artagon
2022-12-17 22:23:08 +01:00
parent a7d74befb0
commit 96714cdb10
3 changed files with 11 additions and 11 deletions


@@ -25,11 +25,11 @@ The reward function goes like this:
To train our agent to have an optimal policy (so a policy that goes right, right, down), **we will use the Q-Learning algorithm**.
-## Step 1: We initialize the Q-Table [[step1]]
+## Step 1: We initialize the Q-table [[step1]]
<img src="https://huggingface.co/datasets/huggingface-deep-rl-course/course-images/resolve/main/en/unit3/Example-1.jpg" alt="Maze-Example"/>
-So, for now, **our Q-Table is useless**; we need **to train our Q-function using the Q-Learning algorithm.**
+So, for now, **our Q-table is useless**; we need **to train our Q-function using the Q-Learning algorithm.**
Let's do it for 2 training timesteps:
@@ -80,4 +80,4 @@ Because I go to the poison state, **I get \\(R_{t+1} = -10\\), and I die.**
Because we're dead, we start a new episode. But what we see here is that **with two exploration steps, my agent became smarter.**
-As we continue exploring and exploiting the environment and updating Q-values using TD target, **Q-Table will give us better and better approximations. And thus, at the end of the training, we'll get an estimate of the optimal Q-Function.**
+As we continue exploring and exploiting the environment and updating Q-values using the TD target, **the Q-table will give us better and better approximations. And thus, at the end of the training, we'll get an estimate of the optimal Q-function.**
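The TD update described in this file can be sketched in a few lines of Python. This is a minimal illustration, not code from the course: the table shape, learning rate, discount factor, and the state/action indices are all made-up assumptions.

```python
import numpy as np

# Minimal Q-learning sketch. Sizes and hyperparameters are illustrative
# assumptions, not values taken from the course.
n_states, n_actions = 6, 4
Q = np.zeros((n_states, n_actions))  # Step 1: initialize the Q-table to 0

alpha, gamma = 0.1, 0.99  # learning rate and discount factor (assumed)

def td_update(state, action, reward, next_state):
    """One Q-learning step: move Q(s, a) toward the TD target."""
    td_target = reward + gamma * Q[next_state].max()
    Q[state, action] += alpha * (td_target - Q[state, action])

# Example: taking (assumed) action 1 in state 0 lands on the poison
# state and yields R_{t+1} = -10, so Q(0, 1) is pushed downward.
td_update(state=0, action=1, reward=-10, next_state=5)
print(Q[0, 1])  # → -1.0
```

Repeating this update while exploring is what gradually turns the all-zeros table into an approximation of the optimal Q-function.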


@@ -3,20 +3,20 @@
The *Q-Learning* **is the RL algorithm that:**
-- Trains *Q-Function*, an **action-value function** that contains, as internal memory, a *Q-table* **that contains all the state-action pair values.**
+- Trains *Q-function*, an **action-value function** that contains, as internal memory, a *Q-table* **that contains all the state-action pair values.**
-- Given a state and action, our Q-Function **will search into its Q-table the corresponding value.**
+- Given a state and action, our Q-function **will search into its Q-table the corresponding value.**
<img src="https://huggingface.co/datasets/huggingface-deep-rl-course/course-images/resolve/main/en/unit3/Q-function-2.jpg" alt="Q function" width="100%"/>
-- When the training is done,**we have an optimal Q-Function, so an optimal Q-Table.**
+- When the training is done, **we have an optimal Q-function, so an optimal Q-table.**
- And if we **have an optimal Q-function**, we have an optimal policy, since we **know, for each state, what is the best action to take.**
<img src="https://huggingface.co/datasets/huggingface-deep-rl-course/course-images/resolve/main/en/unit3/link-value-policy.jpg" alt="Link value policy" width="100%"/>
-But, in the beginning, our **Q-Table is useless since it gives arbitrary value for each state-action pair (most of the time we initialize the Q-Table to 0 values)**. But, as well explore the environment and update our Q-Table it will give us better and better approximations
+But, in the beginning, our **Q-table is useless since it gives an arbitrary value for each state-action pair (most of the time, we initialize the Q-table to 0 values)**. But, as we explore the environment and update our Q-table, it will give us better and better approximations.
<img src="https://huggingface.co/datasets/huggingface-deep-rl-course/course-images/resolve/main/en/notebooks/unit2/q-learning.jpeg" alt="q-learning.jpeg" width="100%"/>
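The link stated above between an optimal Q-table and an optimal policy can be sketched as a simple greedy read-out. The Q-values below are made-up illustrations, not numbers from the course:

```python
import numpy as np

# A tiny (assumed) trained Q-table: rows are states, columns are actions.
Q = np.array([
    [0.0, 1.0],   # state 0: action 1 has the highest value
    [2.0, 0.5],   # state 1: action 0 has the highest value
])

def greedy_policy(Q, state):
    """For a given state, pick the action with the highest Q-value."""
    return int(np.argmax(Q[state]))

print([greedy_policy(Q, s) for s in range(Q.shape[0])])  # → [1, 0]
```

This is why knowing the optimal Q-function is enough: the optimal policy falls out of it by taking the argmax per state.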


@@ -9,7 +9,7 @@ The best way to learn and [to avoid the illusion of competence](https://www.cour
<Question
choices={[
{
-text: "The algorithm we use to train our Q-Function",
+text: "The algorithm we use to train our Q-function",
explain: "",
correct: true
},
@@ -24,12 +24,12 @@ The best way to learn and [to avoid the illusion of competence](https://www.cour
},
{
text: "A table",
-explain: "Q-Function is not a Q-Table. The Q-Function is the algorithm that will feed the Q-Table."
+explain: "Q-function is not a Q-table. The Q-function is the algorithm that will feed the Q-table."
}
]}
/>
-### Q2: What is a Q-Table?
+### Q2: What is a Q-table?
<Question
choices={[
@@ -43,7 +43,7 @@ The best way to learn and [to avoid the illusion of competence](https://www.cour
correct: true
},
{
-text: "In Q-Table each cell corresponds a state value",
+text: "In a Q-table, each cell corresponds to a state value",
explain: "Each cell corresponds to a state-action pair value, not a state value.",
}
]}