From a4de2131fab6a241c74a47458128ab8e8af2737e Mon Sep 17 00:00:00 2001 From: Thomas Simonini Date: Wed, 7 Dec 2022 09:13:49 +0100 Subject: [PATCH 01/12] Adding Unit 3 --- units/en/_toctree.yml | 26 +++++++ units/en/unit3/additional-readings.mdx | 8 ++ units/en/unit3/conclusion.mdx | 14 ++++ units/en/unit3/deep-q-algorithm.mdx | 102 ++++++++++++++++++++++++ units/en/unit3/deep-q-network.mdx | 39 ++++++++++ units/en/unit3/from-q-to-dqn.mdx | 33 ++++++++ units/en/unit3/hands-on.mdx | 13 ++++ units/en/unit3/introduction.mdx | 19 +++++ units/en/unit3/quiz.mdx | 104 +++++++++++++++++++++++++ units/en/unitbonus2/hands-on.mdx | 3 + units/en/unitbonus2/introduction.mdx | 7 ++ units/en/unitbonus2/optuna.mdx | 12 +++ 12 files changed, 380 insertions(+) create mode 100644 units/en/unit3/additional-readings.mdx create mode 100644 units/en/unit3/conclusion.mdx create mode 100644 units/en/unit3/deep-q-algorithm.mdx create mode 100644 units/en/unit3/deep-q-network.mdx create mode 100644 units/en/unit3/from-q-to-dqn.mdx create mode 100644 units/en/unit3/hands-on.mdx create mode 100644 units/en/unit3/introduction.mdx create mode 100644 units/en/unit3/quiz.mdx create mode 100644 units/en/unitbonus2/hands-on.mdx create mode 100644 units/en/unitbonus2/introduction.mdx create mode 100644 units/en/unitbonus2/optuna.mdx diff --git a/units/en/_toctree.yml b/units/en/_toctree.yml index 45f4925..f2fbe2a 100644 --- a/units/en/_toctree.yml +++ b/units/en/_toctree.yml @@ -44,3 +44,29 @@ title: Play with Huggy - local: unitbonus1/conclusion title: Conclusion +- title: Unit 3. 
Deep Q-Learning with Atari Games + sections: + - local: unit3/introduction + title: Introduction + - local: unit3/from-q-to-dqn + title: From Q-Learning to Deep Q-Learning + - local: unit3/deep-q-network + title: The Deep Q-Network (DQN) + - local: unit3/deep-q-algorithm + title: The Deep Q Algorithm + - local: unit3/hands-on + title: Hands-on + - local: unit3/quiz + title: Quiz + - local: unit3/conclusion + title: Conclusion + - local: unit3/additional-readings + title: Additional Readings +- title: Unit Bonus 2. Automatic Hyperparameter Tuning with Optuna + sections: + - local: unitbonus2/introduction + title: Introduction + - local: unitbonus2/optuna + title: Optuna + - local: unitbonus2/hands-on + title: Hands-on \ No newline at end of file diff --git a/units/en/unit3/additional-readings.mdx b/units/en/unit3/additional-readings.mdx new file mode 100644 index 0000000..9c615fc --- /dev/null +++ b/units/en/unit3/additional-readings.mdx @@ -0,0 +1,8 @@ +# Additional Readings [[additional-readings]] + +These are **optional readings** if you want to go deeper. + +- [Foundations of Deep RL Series, L2 Deep Q-Learning by Pieter Abbeel](https://youtu.be/Psrhxy88zww) +- [Playing Atari with Deep Reinforcement Learning](https://arxiv.org/abs/1312.5602) +- [Double Deep Q-Learning](https://papers.nips.cc/paper/2010/hash/091d584fced301b442654dd8c23b3fc9-Abstract.html) +- [Prioritized Experience Replay](https://arxiv.org/abs/1511.05952) diff --git a/units/en/unit3/conclusion.mdx b/units/en/unit3/conclusion.mdx new file mode 100644 index 0000000..1e3592d --- /dev/null +++ b/units/en/unit3/conclusion.mdx @@ -0,0 +1,14 @@ +# Conclusion [[conclusion]] + +Congrats on finishing this chapter! There was a lot of information. And congrats on finishing the tutorial. You’ve just trained your first Deep Q-Learning agent and shared it on the Hub 🥳. + +Take time to really grasp the material before continuing. 
+ +Don't hesitate to train your agent in other environments (Pong, Seaquest, QBert, Ms Pac Man). The **best way to learn is to try things on your own!** + +Environments + + +In the next unit, **we're going to learn about Optuna**. One of the most critical tasks in Deep Reinforcement Learning is to find a good set of training hyperparameters. And Optuna is a library that helps you to automate the search. + +### Keep Learning, stay awesome 🤗 diff --git a/units/en/unit3/deep-q-algorithm.mdx b/units/en/unit3/deep-q-algorithm.mdx new file mode 100644 index 0000000..d8dd604 --- /dev/null +++ b/units/en/unit3/deep-q-algorithm.mdx @@ -0,0 +1,102 @@ +# The Deep Q-Learning Algorithm [[deep-q-algorithm]] + +We learned that Deep Q-Learning **uses a deep neural network to approximate the different Q-values for each possible action at a state** (value-function estimation). + +The difference is that, during the training phase, instead of updating the Q-value of a state-action pair directly as we have done with Q-Learning: + +Q Loss + +In Deep Q-Learning, we create a **Loss function between our Q-value prediction and the Q-target and use Gradient Descent to update the weights of our Deep Q-Network to approximate our Q-values better**. + +Q-target + +The Deep Q-Learning training algorithm has *two phases*: + +- **Sampling**: we perform actions and **store the observed experiences tuples in a replay memory**. +- **Training**: Select the **small batch of tuple randomly and learn from it using a gradient descent update step**. + +Sampling Training + +But, this is not the only change compared with Q-Learning. Deep Q-Learning training **might suffer from instability**, mainly because of combining a non-linear Q-value function (Neural Network) and bootstrapping (when we update targets with existing estimates and not an actual complete return). + +To help us stabilize the training, we implement three different solutions: +1. 
*Experience Replay*, to make more **efficient use of experiences**. +2. *Fixed Q-Target* **to stabilize the training**. +3. *Double Deep Q-Learning*, to **handle the problem of the overestimation of Q-values**. + + +## Experience Replay to make more efficient use of experiences [[exp-replay]] + +Why do we create a replay memory? + +Experience Replay in Deep Q-Learning has two functions: + +1. **Make more efficient use of the experiences during the training**. +- Experience replay helps us **make more efficient use of the experiences during the training.** Usually, in online reinforcement learning, we interact in the environment, get experiences (state, action, reward, and next state), learn from them (update the neural network) and discard them. +- But with experience replay, we create a replay buffer that saves experience samples **that we can reuse during the training.** + +Experience Replay + +⇒ This allows us to **learn from individual experiences multiple times**. + +2. **Avoid forgetting previous experiences and reduce the correlation between experiences**. +- The problem we get if we give sequential samples of experiences to our neural network is that it tends to forget **the previous experiences as it overwrites new experiences.** For instance, if we are in the first level and then the second, which is different, our agent can forget how to behave and play in the first level. + +The solution is to create a Replay Buffer that stores experience tuples while interacting with the environment and then sample a small batch of tuples. This prevents **the network from only learning about what it has immediately done.** + +Experience replay also has other benefits. 
By randomly sampling the experiences, we remove correlation in the observation sequences and avoid **action values from oscillating or diverging catastrophically.** + +In the Deep Q-Learning pseudocode, we see that we **initialize a replay memory buffer D from capacity N** (N is an hyperparameter that you can define). We then store experiences in the memory and sample a minibatch of experiences to feed the Deep Q-Network during the training phase. + +Experience Replay Pseudocode + +## Fixed Q-Target to stabilize the training [[fixed-q]] + +When we want to calculate the TD error (aka the loss), we calculate the **difference between the TD target (Q-Target) and the current Q-value (estimation of Q)**. + +But we **don’t have any idea of the real TD target**. We need to estimate it. Using the Bellman equation, we saw that the TD target is just the reward of taking that action at that state plus the discounted highest Q value for the next state. + +Q-target + +However, the problem is that we are using the same parameters (weights) for estimating the TD target **and** the Q value. Consequently, there is a significant correlation between the TD target and the parameters we are changing. + +Therefore, it means that at every step of training, **our Q values shift but also the target value shifts.** So, we’re getting closer to our target, but the target is also moving. It’s like chasing a moving target! This led to a significant oscillation in training. + +It’s like if you were a cowboy (the Q estimation) and you want to catch the cow (the Q-target), you must get closer (reduce the error). + +Q-target + +At each time step, you’re trying to approach the cow, which also moves at each time step (because you use the same parameters). + +Q-target +Q-target +This leads to a bizarre path of chasing (a significant oscillation in training). 
+Q-target + +Instead, what we see in the pseudo-code is that we: +- Use a **separate network with a fixed parameter** for estimating the TD Target +- **Copy the parameters from our Deep Q-Network at every C step** to update the target network. + +Fixed Q-target Pseudocode + + + +## Double DQN [[double-dqn]] + +Double DQNs, or Double Learning, were introduced [by Hado van Hasselt](https://papers.nips.cc/paper/3964-double-q-learning). This method **handles the problem of the overestimation of Q-values.** + +To understand this problem, remember how we calculate the TD Target: + +We face a simple problem by calculating the TD target: how are we sure that **the best action for the next state is the action with the highest Q-value?** + +We know that the accuracy of Q values depends on what action we tried **and** what neighboring states we explored. + +Consequently, we don’t have enough information about the best action to take at the beginning of the training. Therefore, taking the maximum Q value (which is noisy) as the best action to take can lead to false positives. If non-optimal actions are regularly **given a higher Q value than the optimal best action, the learning will be complicated.** + +The solution is: when we compute the Q target, we use two networks to decouple the action selection from the target Q value generation. We: +- Use our **DQN network** to select the best action to take for the next state (the action with the highest Q value). +- Use our **Target network** to calculate the target Q value of taking that action at the next state. + +Therefore, Double DQN helps us reduce the overestimation of q values and, as a consequence, helps us train faster and have more stable learning. + +Since these three improvements in Deep Q-Learning, many have been added such as Prioritized Experience Replay, Dueling Deep Q-Learning. They’re out of the scope of this course but if you’re interested, check the links we put in the reading list. 
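To make the three stabilization ideas from `deep-q-algorithm.mdx` concrete, here is a minimal Python sketch (standard library only) of a replay memory and of the two ways to compute the TD target. It is an illustration under assumptions, not the course's or RL-Baselines3-Zoo's implementation: the names `ReplayBuffer`, `dqn_target`, and `double_dqn_target`, the `done` flag for terminal transitions, and the default `gamma=0.99` are hypothetical choices that do not appear in the unit. Q-values are passed in as plain lists with one entry per action.

```python
import random
from collections import deque


class ReplayBuffer:
    """Replay memory D with capacity N, holding experience tuples.

    Hypothetical helper for illustration; each tuple is
    (state, action, reward, next_state, done).
    """

    def __init__(self, capacity):
        # A deque with maxlen drops the oldest experience once N is reached.
        self.memory = deque(maxlen=capacity)

    def store(self, state, action, reward, next_state, done):
        self.memory.append((state, action, reward, next_state, done))

    def sample(self, batch_size):
        # Uniform random sampling reduces the correlation between
        # consecutive experiences in a batch.
        return random.sample(list(self.memory), batch_size)

    def __len__(self):
        return len(self.memory)


def dqn_target(reward, done, target_q_next, gamma=0.99):
    """Fixed Q-target: r + gamma * max_a Q_target(s', a).

    Assumption: a terminal transition (done=True) has no bootstrap term.
    """
    if done:
        return reward
    return reward + gamma * max(target_q_next)


def double_dqn_target(reward, done, online_q_next, target_q_next, gamma=0.99):
    """Double DQN target: the online network selects the action,
    the target network evaluates it."""
    if done:
        return reward
    best_action = max(range(len(online_q_next)), key=lambda a: online_q_next[a])
    return reward + gamma * target_q_next[best_action]
```

With `online_q_next = [3.0, 0.1]` and `target_q_next = [0.5, 2.0]`, the plain target bootstraps from the target network's own maximum (2.0), while the Double DQN target bootstraps from the target network's value for the action the online network prefers (0.5). That is the decoupling of action selection from target Q-value generation described above.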
diff --git a/units/en/unit3/deep-q-network.mdx b/units/en/unit3/deep-q-network.mdx new file mode 100644 index 0000000..2a2c5c5 --- /dev/null +++ b/units/en/unit3/deep-q-network.mdx @@ -0,0 +1,39 @@ +# The Deep Q-Network (DQN) [[deep-q-network]] +This is the architecture of our Deep Q-Learning network: + +Deep Q Network + +As input, we take a **stack of 4 frames** passed through the network as a state and output a **vector of Q-values for each possible action at that state**. Then, like with Q-Learning, we just need to use our epsilon-greedy policy to select which action to take. + +When the Neural Network is initialized, **the Q-value estimation is terrible**. But during training, our Deep Q-Network agent will associate a situation with appropriate action and **learn to play the game well**. + +## Preprocessing the input and temporal limitation [[preprocessing]] + +We mentioned that we preprocess the input. It’s an essential step since we want to **reduce the complexity of our state to reduce the computation time needed for training**. + +So what we do is **reduce the state space to 84x84 and grayscale it** (since the colors in Atari environments don't add important information). +This is an essential saving since we **reduce our three color channels (RGB) to 1**. + +We can also **crop a part of the screen in some games** if it does not contain important information. +Then we stack four frames together. + +Preprocessing + +**Why do we stack four frames together?** +We stack frames together because it helps us **handle the problem of temporal limitation**. Let’s take an example with the game of Pong. When you see this frame: + +Temporal Limitation + +Can you tell me where the ball is going? +No, because one frame is not enough to have a sense of motion! But what if I add three more frames? **Here you can see that the ball is going to the right**. + +Temporal Limitation +That’s why, to capture temporal information, we stack four frames together. 
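The frame stacking described above can be sketched with a fixed-length queue. This is a minimal illustration, not the preprocessing code used by the hands-on: the class name `FrameStack` and the choice to fill the stack with copies of the first frame at the start of an episode are assumptions, and frames are treated as opaque objects (in practice they would be 84x84 grayscale arrays).

```python
from collections import deque


class FrameStack:
    """Keeps the k most recent frames and exposes them as one stacked state.

    Hypothetical helper for illustration only.
    """

    def __init__(self, k=4):
        self.k = k
        self.frames = deque(maxlen=k)

    def reset(self, first_frame):
        # Assumption: at episode start, repeat the first frame k times
        # so the state always contains exactly k frames.
        self.frames.clear()
        for _ in range(self.k):
            self.frames.append(first_frame)
        return self.state()

    def step(self, new_frame):
        # Appending to a full deque drops the oldest frame automatically.
        self.frames.append(new_frame)
        return self.state()

    def state(self):
        # Oldest frame first, newest frame last.
        return list(self.frames)
```

A typical use would call `reset` with the first observation of an episode and `step` with every following observation, feeding the returned list of four frames to the network as the state.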
+ +Then, the stacked frames are processed by three convolutional layers. These layers **allow us to capture and exploit spatial relationships in images**. But also, because frames are stacked together, **you can exploit some spatial properties across those frames**. + +Finally, we have a couple of fully connected layers that output a Q-value for each possible action at that state. + +Deep Q Network + +So, we see that Deep Q-Learning is using a neural network to approximate, given a state, the different Q-values for each possible action at that state. Let’s now study the Deep Q-Learning algorithm. diff --git a/units/en/unit3/from-q-to-dqn.mdx b/units/en/unit3/from-q-to-dqn.mdx new file mode 100644 index 0000000..33d2ba4 --- /dev/null +++ b/units/en/unit3/from-q-to-dqn.mdx @@ -0,0 +1,33 @@ +# From Q-Learning to Deep Q-Learning [[from-q-to-dqn]] + +We learned that **Q-Learning is an algorithm we use to train our Q-Function**, an **action-value function** that determines the value of being at a particular state and taking a specific action at that state. + +
+ Q-function +
Given a state and action, our Q Function outputs a state-action value (also called Q-value)
+
+ +The **Q comes from "the Quality" of that action at that state.** + +Internally, our Q-function has **a Q-table, a table where each cell corresponds to a state-action pair value.** Think of this Q-table as **the memory or cheat sheet of our Q-function.** + +The problem is that Q-Learning is a *tabular method*. It only applies when the state and action spaces **are small enough for value functions to be represented as arrays and tables**. In other words, it is **not scalable**. +Q-Learning worked well with small state space environments like: + +- FrozenLake, we had 16 states. +- Taxi-v3, we had 500 states. + +But think of what we're going to do today: we will train an agent to learn to play Space Invaders a more complex game, using the frames as input. + +As **[Nikita Melkozerov mentioned](https://twitter.com/meln1k), Atari environments** have an observation space with a shape of (210, 160, 3), containing values ranging from 0 to 255 so that gives us 256^(210x160x3) = 256^100800 (for comparison, we have approximately 10^80 atoms in the observable universe). + +Atari State Space + +Therefore, the state space is gigantic; hence creating and updating a Q-table for that environment would not be efficient. In this case, the best idea is to approximate the Q-values instead of a Q-table using a parametrized Q-function \\(Q_{\theta}(s,a)\\) . + +This neural network will approximate, given a state, the different Q-values for each possible action at that state. And that's exactly what Deep Q-Learning does. + +Deep Q Learning + + +Now that we understand Deep Q-Learning, let's dive deeper into the Deep Q-Network. diff --git a/units/en/unit3/hands-on.mdx b/units/en/unit3/hands-on.mdx new file mode 100644 index 0000000..4b73137 --- /dev/null +++ b/units/en/unit3/hands-on.mdx @@ -0,0 +1,13 @@ +# Hands-on [[hands-on]] + +Now that you've studied the theory behind Deep Q-Learning, **you’re ready to train your Deep Q-Learning agent to play Atari Games**. 
We'll start with Space Invaders, but you'll be able to use any Atari game you want 🔥 + +Environments + + +We're using the [RL-Baselines-3 Zoo integration](https://github.com/DLR-RM/rl-baselines3-zoo), a vanilla version of Deep Q-Learning with no extensions such as Double-DQN, Dueling-DQN, and Prioritized Experience Replay. + + +**To start the hands-on click on Open In Colab button** 👇 : + +[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)]() diff --git a/units/en/unit3/introduction.mdx b/units/en/unit3/introduction.mdx new file mode 100644 index 0000000..366286d --- /dev/null +++ b/units/en/unit3/introduction.mdx @@ -0,0 +1,19 @@ +# Deep Q-Learning [[deep-q-learning]] + +Unit 3 thumbnail + + + +In the last unit, we learned our first reinforcement learning algorithm: Q-Learning, **implemented it from scratch**, and trained it in two environments, FrozenLake-v1 ☃️ and Taxi-v3 🚕. + +We got excellent results with this simple algorithm. But these environments were relatively simple because the **state space was discrete and small** (14 different states for FrozenLake-v1 and 500 for Taxi-v3). + +But as we'll see, producing and updating a **Q-table can become ineffective in large state space environments.** + +So in this unit, **we'll study our first Deep Reinforcement Learning agent**: Deep Q-Learning. Instead of using a Q-table, Deep Q-Learning uses a Neural Network that takes a state and approximates Q-values for each action based on that state. + +And **we'll train it to play Space Invaders and other Atari environments using [RL-Zoo](https://github.com/DLR-RM/rl-baselines3-zoo)**, a training framework for RL using Stable-Baselines that provides scripts for training, evaluating agents, tuning hyperparameters, plotting results, and recording videos. + +Environments + +So let’s get started! 
🚀 diff --git a/units/en/unit3/quiz.mdx b/units/en/unit3/quiz.mdx new file mode 100644 index 0000000..cefffa6 --- /dev/null +++ b/units/en/unit3/quiz.mdx @@ -0,0 +1,104 @@ +# Quiz [[quiz]] + +The best way to learn and [to avoid the illusion of competence](https://www.coursera.org/lecture/learning-how-to-learn/illusions-of-competence-BuFzf) **is to test yourself.** This will help you to find **where you need to reinforce your knowledge**. + +### Q1: What are tabular methods? + +
+Solution + +*Tabular methods* are a type of problems in which the state and actions spaces are small enough to approximate value functions to be **represented as arrays and tables**. For instance, **Q-Learning is a tabular method** since we use a table to represent the state,action value pairs. + + +
+ +### Q2: Why we can't use a classical Q-Learning to solve an Atari Game? + + + + +### Q3: Why do we stack four frames together when we use frames as input in Deep Q-Learning? + +
+Solution + +We stack frames together because it helps us **handle the problem of temporal limitation**. Since one frame is not enough to capture temporal information. +For instance, in pong, our agent **will be unable to know the ball direction if it gets only one frame**. + +Temporal limitation +Temporal limitation + + +
+ + +### Q4: What are the two phases of Deep Q-Learning? + + + +### Q5: Why do we create a replay memory in Deep Q-Learning? + +
+ Solution + +**1. Make more efficient use of the experiences during the training** + +Usually, in online reinforcement learning, we interact with the environment, get experiences (state, action, reward, and next state), learn from them (update the neural network) and discard them. +But with experience replay, **we create a replay buffer that saves experience samples that we can reuse during the training**. + +**2. Avoid forgetting previous experiences and reduce the correlation between experiences** + + The problem we get if we give sequential samples of experiences to our neural network is that it **tends to forget the previous experiences as it learns from new ones**. For instance, if we are in the first level and then the second, which is different, our agent can forget how to behave and play in the first level. + + +
+ +### Q6: How do we use Double Deep Q-Learning? + + +
+ Solution + + When we compute the Q target, we use two networks to decouple the action selection from the target Q value generation. We: + + - Use our *DQN network* to **select the best action to take for the next state** (the action with the highest Q value). + + - Use our *Target network* to calculate **the target Q value of taking that action at the next state**. + +
+ + +Congrats on finishing this Quiz 🥳! If you missed some elements, take time to read the chapter again to reinforce (😏) your knowledge. diff --git a/units/en/unitbonus2/hands-on.mdx b/units/en/unitbonus2/hands-on.mdx new file mode 100644 index 0000000..ac942a5 --- /dev/null +++ b/units/en/unitbonus2/hands-on.mdx @@ -0,0 +1,3 @@ +# Hands-on [[hands-on]] + +Now that you've learned to use Optuna, **why not go back to our Deep Q-Learning hands-on and use Optuna to find the best training hyperparameters?** diff --git a/units/en/unitbonus2/introduction.mdx b/units/en/unitbonus2/introduction.mdx new file mode 100644 index 0000000..05c881e --- /dev/null +++ b/units/en/unitbonus2/introduction.mdx @@ -0,0 +1,7 @@ +# Introduction [[introduction]] + +One of the most critical tasks in Deep Reinforcement Learning is to **find a good set of training hyperparameters**. + +Optuna Logo + +[Optuna](https://optuna.org/) is a library that helps you to automate the search. In this Unit, we'll study a **little bit of the theory behind automatic hyperparameter tuning**. We'll first try to optimize the parameters of the DQN studied in the last unit manually. We'll then **learn how to automate the search using Optuna**. diff --git a/units/en/unitbonus2/optuna.mdx b/units/en/unitbonus2/optuna.mdx new file mode 100644 index 0000000..d01d8cc --- /dev/null +++ b/units/en/unitbonus2/optuna.mdx @@ -0,0 +1,12 @@ +# Optuna Tutorial [[optuna]] + +The content below comes from [Antonin Raffin's ICRA 2022 presentations](https://araffin.github.io/tools-for-robotic-rl-icra2022/). He's one of the founders of Stable-Baselines and RL-Baselines3-Zoo. 
+ + +## The theory behind Hyperparameter tuning + + + +## Optuna Tutorial + +The notebook 👉 https://colab.research.google.com/github/araffin/tools-for-robotic-rl-icra2022/blob/main/notebooks/optuna_lab.ipynb From ea4b144a40d8fd037c06df6dedf8c2c2d0177668 Mon Sep 17 00:00:00 2001 From: Thomas Simonini Date: Fri, 16 Dec 2022 08:11:42 +0100 Subject: [PATCH 02/12] Apply suggestions from code review Co-authored-by: Omar Sanseviero --- units/en/unit3/deep-q-algorithm.mdx | 29 +++++++++++++++-------------- units/en/unit3/deep-q-network.mdx | 2 +- units/en/unit3/from-q-to-dqn.mdx | 6 ++++-- units/en/unit3/introduction.mdx | 2 +- units/en/unit3/quiz.mdx | 8 ++++---- 5 files changed, 25 insertions(+), 22 deletions(-) diff --git a/units/en/unit3/deep-q-algorithm.mdx b/units/en/unit3/deep-q-algorithm.mdx index d8dd604..63b8780 100644 --- a/units/en/unit3/deep-q-algorithm.mdx +++ b/units/en/unit3/deep-q-algorithm.mdx @@ -6,24 +6,25 @@ The difference is that, during the training phase, instead of updating the Q-val Q Loss -In Deep Q-Learning, we create a **Loss function between our Q-value prediction and the Q-target and use Gradient Descent to update the weights of our Deep Q-Network to approximate our Q-values better**. +in Deep Q-Learning, we create a **loss function that compares our Q-value prediction and the Q-target and uses Gradient Descent to update the weights of our Deep Q-Network to approximate our Q-values better**. Q-target The Deep Q-Learning training algorithm has *two phases*: -- **Sampling**: we perform actions and **store the observed experiences tuples in a replay memory**. -- **Training**: Select the **small batch of tuple randomly and learn from it using a gradient descent update step**. +- **Sampling**: we perform actions and **store the observed experience tuples in a replay memory**. +- **Training**: Select a **small batch of tuples randomly and learn from this batch using a gradient descent update step**. 
Sampling Training -But, this is not the only change compared with Q-Learning. Deep Q-Learning training **might suffer from instability**, mainly because of combining a non-linear Q-value function (Neural Network) and bootstrapping (when we update targets with existing estimates and not an actual complete return). +This is not the only difference compared with Q-Learning. Deep Q-Learning training **might suffer from instability**, mainly because of combining a non-linear Q-value function (Neural Network) and bootstrapping (when we update targets with existing estimates and not an actual complete return). To help us stabilize the training, we implement three different solutions: -1. *Experience Replay*, to make more **efficient use of experiences**. +1. *Experience Replay* to make more **efficient use of experiences**. 2. *Fixed Q-Target* **to stabilize the training**. 3. *Double Deep Q-Learning*, to **handle the problem of the overestimation of Q-values**. +Let's go through them! ## Experience Replay to make more efficient use of experiences [[exp-replay]] @@ -32,21 +33,21 @@ Why do we create a replay memory? Experience Replay in Deep Q-Learning has two functions: 1. **Make more efficient use of the experiences during the training**. -- Experience replay helps us **make more efficient use of the experiences during the training.** Usually, in online reinforcement learning, we interact in the environment, get experiences (state, action, reward, and next state), learn from them (update the neural network) and discard them. -- But with experience replay, we create a replay buffer that saves experience samples **that we can reuse during the training.** +Usually, in online reinforcement learning, the agent interacts in the environment, gets experiences (state, action, reward, and next state), learns from them (updates the neural network), and discards them. This is not efficient +Experience replay helps **using the experiences of the training more efficiently**. 
We use a replay buffer that saves experience samples **that we can reuse during the training.** Experience Replay -⇒ This allows us to **learn from individual experiences multiple times**. +⇒ This allows the agent to **learn from the same experiences multiple times**. 2. **Avoid forgetting previous experiences and reduce the correlation between experiences**. -- The problem we get if we give sequential samples of experiences to our neural network is that it tends to forget **the previous experiences as it overwrites new experiences.** For instance, if we are in the first level and then the second, which is different, our agent can forget how to behave and play in the first level. +- The problem we get if we give sequential samples of experiences to our neural network is that it tends to forget **the previous experiences as it gets new experiences.** For instance, if the agent is in the first level and then in the second, which is different, it can forget how to behave and play in the first level. -The solution is to create a Replay Buffer that stores experience tuples while interacting with the environment and then sample a small batch of tuples. This prevents **the network from only learning about what it has immediately done.** +The solution is to create a Replay Buffer that stores experience tuples while interacting with the environment and then sample a small batch of tuples. This prevents **the network from only learning about what it has done immediately before.** Experience replay also has other benefits. By randomly sampling the experiences, we remove correlation in the observation sequences and avoid **action values from oscillating or diverging catastrophically.** -In the Deep Q-Learning pseudocode, we see that we **initialize a replay memory buffer D from capacity N** (N is an hyperparameter that you can define). We then store experiences in the memory and sample a minibatch of experiences to feed the Deep Q-Network during the training phase. 
+In the Deep Q-Learning pseudocode, we **initialize a replay memory buffer D from capacity N** (N is a hyperparameter that you can define). We then store experiences in the memory and sample a batch of experiences to feed the Deep Q-Network during the training phase. Experience Replay Pseudocode @@ -60,9 +61,9 @@ But we **don’t have any idea of the real TD target**. We need to estimate it However, the problem is that we are using the same parameters (weights) for estimating the TD target **and** the Q value. Consequently, there is a significant correlation between the TD target and the parameters we are changing. -Therefore, it means that at every step of training, **our Q values shift but also the target value shifts.** So, we’re getting closer to our target, but the target is also moving. It’s like chasing a moving target! This led to a significant oscillation in training. +Therefore, it means that at every step of training, **our Q values shift but also the target value shifts.** We’re getting closer to our target, but the target is also moving. It’s like chasing a moving target! This can lead to a significant oscillation in training. -It’s like if you were a cowboy (the Q estimation) and you want to catch the cow (the Q-target), you must get closer (reduce the error). +It’s like if you were a cowboy (the Q estimation) and you want to catch the cow (the Q-target). Your goal is to get closer (reduce the error). Q-target @@ -74,7 +75,7 @@ This leads to a bizarre path of chasing (a significant oscillation in training). Q-target Instead, what we see in the pseudo-code is that we: -- Use a **separate network with a fixed parameter** for estimating the TD Target +- Use a **separate network with fixed parameters** for estimating the TD Target - **Copy the parameters from our Deep Q-Network at every C step** to update the target network. 
Fixed Q-target Pseudocode diff --git a/units/en/unit3/deep-q-network.mdx b/units/en/unit3/deep-q-network.mdx index 2a2c5c5..cb3d616 100644 --- a/units/en/unit3/deep-q-network.mdx +++ b/units/en/unit3/deep-q-network.mdx @@ -11,7 +11,7 @@ When the Neural Network is initialized, **the Q-value estimation is terrible**. We mentioned that we preprocess the input. It’s an essential step since we want to **reduce the complexity of our state to reduce the computation time needed for training**. -So what we do is **reduce the state space to 84x84 and grayscale it** (since the colors in Atari environments don't add important information). +To achieve this, we **reduce the state space to 84x84 and grayscale it**. We can do this since the colors in Atari environments don't add important information. This is an essential saving since we **reduce our three color channels (RGB) to 1**. We can also **crop a part of the screen in some games** if it does not contain important information. diff --git a/units/en/unit3/from-q-to-dqn.mdx b/units/en/unit3/from-q-to-dqn.mdx index 33d2ba4..13df4d1 100644 --- a/units/en/unit3/from-q-to-dqn.mdx +++ b/units/en/unit3/from-q-to-dqn.mdx @@ -19,11 +19,13 @@ Q-Learning worked well with small state space environments like: But think of what we're going to do today: we will train an agent to learn to play Space Invaders a more complex game, using the frames as input. -As **[Nikita Melkozerov mentioned](https://twitter.com/meln1k), Atari environments** have an observation space with a shape of (210, 160, 3), containing values ranging from 0 to 255 so that gives us 256^(210x160x3) = 256^100800 (for comparison, we have approximately 10^80 atoms in the observable universe). 
+As **[Nikita Melkozerov mentioned](https://twitter.com/meln1k), Atari environments** have an observation space with a shape of (210, 160, 3)*, containing values ranging from 0 to 255 so that gives us 256^(210x160x3) = 256^100800 (for comparison, we have approximately 10^80 atoms in the observable universe). + +* A single frame in Atari is composed of an image of 210x160 pixels. Given the images are in color (RGB), there are 3 channels. This is why the shape is (210, 160, 3). For each pixel, the value can go from 0 to 255. Atari State Space -Therefore, the state space is gigantic; hence creating and updating a Q-table for that environment would not be efficient. In this case, the best idea is to approximate the Q-values instead of a Q-table using a parametrized Q-function \\(Q_{\theta}(s,a)\\) . +Therefore, the state space is gigantic; due to this, creating and updating a Q-table for that environment would not be efficient. In this case, the best idea is to approximate the Q-values instead of a Q-table using a parametrized Q-function \\(Q_{\theta}(s,a)\\) . This neural network will approximate, given a state, the different Q-values for each possible action at that state. And that's exactly what Deep Q-Learning does. diff --git a/units/en/unit3/introduction.mdx b/units/en/unit3/introduction.mdx index 366286d..80118f2 100644 --- a/units/en/unit3/introduction.mdx +++ b/units/en/unit3/introduction.mdx @@ -6,7 +6,7 @@ In the last unit, we learned our first reinforcement learning algorithm: Q-Learning, **implemented it from scratch**, and trained it in two environments, FrozenLake-v1 ☃️ and Taxi-v3 🚕. -We got excellent results with this simple algorithm. But these environments were relatively simple because the **state space was discrete and small** (14 different states for FrozenLake-v1 and 500 for Taxi-v3). 
+We got excellent results with this simple algorithm, but these environments were relatively simple because the **state space was discrete and small** (16 different states for FrozenLake-v1 and 500 for Taxi-v3). But as we'll see, producing and updating a **Q-table can become ineffective in large state space environments.** diff --git a/units/en/unit3/quiz.mdx b/units/en/unit3/quiz.mdx index cefffa6..d042756 100644 --- a/units/en/unit3/quiz.mdx +++ b/units/en/unit3/quiz.mdx @@ -2,17 +2,17 @@ The best way to learn and [to avoid the illusion of competence](https://www.coursera.org/lecture/learning-how-to-learn/illusions-of-competence-BuFzf) **is to test yourself.** This will help you to find **where you need to reinforce your knowledge**. -### Q1: What are tabular methods? +### Q1: We mentioned Q-Learning is a tabular method. What are tabular methods? 
Solution -*Tabular methods* are a type of problems in which the state and actions spaces are small enough to approximate value functions to be **represented as arrays and tables**. For instance, **Q-Learning is a tabular method** since we use a table to represent the state,action value pairs. +*Tabular methods* are methods for problems in which the state and action spaces are small enough for the approximate value functions to be **represented as arrays and tables**. For instance, **Q-Learning is a tabular method** since we use a table to represent the state-action value pairs. 
-### Q2: Why we can't use a classical Q-Learning to solve an Atari Game? +### Q2: Why can't we use a classical Q-Learning to solve an Atari Game? Solution -We stack frames together because it helps us **handle the problem of temporal limitation**. Since one frame is not enough to capture temporal information. +We stack frames together because it helps us **handle the problem of temporal limitation**: one frame is not enough to capture temporal information. For instance, in pong, our agent **will be unable to know the ball direction if it gets only one frame**. Temporal limitation From e7d8a780fbb641a340dd3a9b083bbccf8fd68ef3 Mon Sep 17 00:00:00 2001 From: Thomas Simonini Date: Fri, 16 Dec 2022 08:49:52 +0100 Subject: [PATCH 03/12] Update Unit 3 --- notebooks/unit3/unit3.ipynb | 34 ++++++++++++++++++++++++---------- 1 file changed, 24 insertions(+), 10 deletions(-) diff --git a/notebooks/unit3/unit3.ipynb b/notebooks/unit3/unit3.ipynb index 019653e..60efd9f 100644 --- a/notebooks/unit3/unit3.ipynb +++ b/notebooks/unit3/unit3.ipynb @@ -1,5 +1,15 @@ { "cells": [ + { + "cell_type": "markdown", + "metadata": { + "id": "view-in-github", + "colab_type": "text" + }, + "source": [ + "\"Open" + ] + }, { "cell_type": "markdown", "metadata": { @@ -32,15 +42,12 @@ { "cell_type": "markdown", "source": [ - "TODO: ADD TEXT LIVE INFO\n", + "### 🎮 Environments: \n", "\n", - "TODO: ADD IF YOU HAVE QUESTIONS\n", - "\n", - "\n", - "###🎮 Environments: \n", "- SpacesInvadersNoFrameskip-v4 \n", "\n", - "###📚 RL-Library: \n", + "### 📚 RL-Library: \n", + "\n", "- [RL-Baselines3-Zoo](https://github.com/DLR-RM/rl-baselines3-zoo)" ], "metadata": { @@ -100,7 +107,7 @@ "## Prerequisites 🏗️\n", "Before diving into the notebook, you need to:\n", "\n", - "🔲 📚 **Study Deep Q-Learning by reading Unit 3** 🤗 ADD LINK " + "🔲 📚 **[Study Deep Q-Learning by reading Unit 3](https://huggingface.co/deep-rl-course/unit3/introduction)** 🤗 " ] }, { @@ -118,7 +125,13 @@ "id": "QR0jZtYreSI5" }, "source": [ - "# 
Let's train a Deep Q-Learning agent playing Atari' Space Invaders 👾 and upload it to the Hub." + "# Let's train a Deep Q-Learning agent playing Atari' Space Invaders 👾 and upload it to the Hub.\n", + "\n", + "To validate this hands-on for the certification process, you need to push your trained model to the Hub and **get a result of >= 500**.\n", + "\n", + "To find your result, go to the leaderboard and find your model, **the result = mean_reward - std of reward**\n", + "\n", + "For more information about the certification process, check this section 👉 https://huggingface.co/deep-rl-course/en/unit0/introduction#certification-process" ] }, { @@ -719,7 +732,7 @@ { "cell_type": "markdown", "source": [ - "See you on [Bonus unit 2](https://github.com/huggingface/deep-rl-class/tree/main/unit2#unit-2-introduction-to-q-learning)! 🔥 TODO CHANGE LINK" + "See you on Bonus unit 2! 🔥 " ], "metadata": { "id": "Kc3udPT-RcXc" @@ -738,7 +751,8 @@ "metadata": { "colab": { "private_outputs": true, - "provenance": [] + "provenance": [], + "include_colab_link": true }, "kernelspec": { "display_name": "Python 3 (ipykernel)", From ed065ac128087d00344e40067d504f9f0e8dce5d Mon Sep 17 00:00:00 2001 From: simoninithomas Date: Fri, 16 Dec 2022 09:44:59 +0100 Subject: [PATCH 04/12] Update Unit 3 and 4 --- notebooks/unit3/unit3.ipynb | 34 ++- units/en/unit3/deep-q-algorithm.mdx | 22 +- units/en/unit3/deep-q-network.mdx | 4 +- units/en/unit3/from-q-to-dqn.mdx | 3 +- units/en/unit3/hands-on.mdx | 309 +++++++++++++++++++++++++++- units/en/unit3/introduction.mdx | 2 +- units/en/unit3/quiz.mdx | 2 +- units/en/unitbonus2/optuna.mdx | 5 +- 8 files changed, 345 insertions(+), 36 deletions(-) diff --git a/notebooks/unit3/unit3.ipynb b/notebooks/unit3/unit3.ipynb index 019653e..e0e5281 100644 --- a/notebooks/unit3/unit3.ipynb +++ b/notebooks/unit3/unit3.ipynb @@ -32,15 +32,12 @@ { "cell_type": "markdown", "source": [ - "TODO: ADD TEXT LIVE INFO\n", + "### 🎮 Environments: \n", "\n", - "TODO: ADD IF YOU 
HAVE QUESTIONS\n", - "\n", - "\n", - "###🎮 Environments: \n", "- SpacesInvadersNoFrameskip-v4 \n", "\n", - "###📚 RL-Library: \n", + "### 📚 RL-Library: \n", + "\n", "- [RL-Baselines3-Zoo](https://github.com/DLR-RM/rl-baselines3-zoo)" ], "metadata": { @@ -100,7 +97,7 @@ "## Prerequisites 🏗️\n", "Before diving into the notebook, you need to:\n", "\n", - "🔲 📚 **Study Deep Q-Learning by reading Unit 3** 🤗 ADD LINK " + "🔲 📚 **[Study Deep Q-Learning by reading Unit 3](https://huggingface.co/deep-rl-course/unit3/introduction)** 🤗 " ] }, { @@ -118,7 +115,13 @@ "id": "QR0jZtYreSI5" }, "source": [ - "# Let's train a Deep Q-Learning agent playing Atari' Space Invaders 👾 and upload it to the Hub." + "# Let's train a Deep Q-Learning agent playing Atari' Space Invaders 👾 and upload it to the Hub.\n", + "\n", + "To validate this hands-on for the certification process, you need to push your trained model to the Hub and **get a result of >= 500**.\n", + "\n", + "To find your result, go to the leaderboard and find your model, **the result = mean_reward - std of reward**\n", + "\n", + "For more information about the certification process, check this section 👉 https://huggingface.co/deep-rl-course/en/unit0/introduction#certification-process" ] }, { @@ -442,16 +445,9 @@ "1️⃣ (If it's not already done) create an account to HF ➡ https://huggingface.co/join\n", "\n", "2️⃣ Sign in and then, you need to store your authentication token from the Hugging Face website.\n", - "- Create a new token (https://huggingface.co/settings/tokens) **with write role**" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "9ToyuaYwHmxG" - }, - "source": [ - 
"![image.png](data:image/png;base64,iVBORw0KGgoAAAANSUhEUgAAAagAAAE5CAYAAADFiLQmAAAgAElEQVR4nOydeWBU1dn/P3dmkskeIEBYJcguIQGhEDdcC7K8bqDFFQT11brUV9TaWoWotS3V/mpBRapsKi6Au1VAK7uABGQREGQNayAJWWcmmZnz+2Myk1mTzJJkgOfTxofnLme7957vfe4594723vvvq1GjRmE0xtEQNEC5+xoo5W41lFJ+fF/rtWNNyn58l3VSn998+LQPvrWo0zqrH3ICdVhngn4zaIwMfQvg8Lz9MKoTLI1Z3QhQf/E0FMqP72tDr0AQFWqK9nTzw8kuhNqFQIQbxF8/GdA6XW/f5TpS9fabtH2CQ7NUVav6N2s+GtyhN8D6KIB3AvUKZBNc8JFun2BL28QdTuAaRaZAdXXgvta5l7cfwaPXzO3b+Nk3h2L5UbAw9cA7eX8devNf/aEQ5PHxaJDaAMO5vK4AxMc6U3EGKDXU1T4+AuUqgJffEOs8kpGPkKKHcC4/5w4Ri5DqTDASGQR/fBpdvwJl2JjVjSDBRECBIqKGR0hOgqhQ854+TX/+1EuEG8Srww8qxPHePMjdG6d9Gpeoj6C8Cfp0qeOA+tdLtw7fO4F6BTZSl1Top1AkLye/pWnmDqzhNfZfQGeH7+2HEmGFVNxQq9PEAhGJ7P1n15gVbEAJIqkvzsWB9aRBpYtu6jlejRxhBS1Qvv138GNO50KEFfLl5Na+kUnQTwbN2EFEjX5Fsnph4Fu8QBGWtx+JCKuZFSwIPYlEdg2oXQQIs8RBRlh1bl7/7lHfu0ZdBFXv4fURyMAHxLmBU0AbuEMdAtn8l0BjXLCNnkETKkpwEZJzqzN8zKlJi1MrmI1fQT8Vdvaw3goWhclHJ/Ucr3oUr+4Iybk4gE9t+2oNbCCtqqpaeRQ3iIio3gjJp8N3Et2H0J2wLz9n9d0SCE///CTYaB1C/YrT5HrWFB1+BE/PWsH09H07/EiMQUWgAk3cvtF3+BqzAbw7fOfiwH44EdGZIZh1E3URVLAEfTq59e8evj89rXMHtxyjeNZfo3cATdGhhVTjhhUouDGoho1JhUWUCXDjH85GFgTvEjh7dA8/dBuOYJyZAlLP8QowJhWSJQSB8imej7LLe1BhXW5+9C+yPUSgDBojQ98CODt0Tz9q9avJe5j6iyfvQYWaXQi1C4EIN0idIZSvQjofqfmNwJypBhDQpmmf4Ij6CKrBHXoDrI8CeCdQr0A2wQUf6fYJtrRN3OEErlFkCiTvQTV19s2hWH4ULEw98E4+nIgomjr8oI9PgIjI/xiUvAfV7IRz+Tl3iOwsvUAJRiKD4I9Po+tXoAwbs7oRRN6Dqrs4TX7+1EuEG6TOSQjOxYEVMZwxqcZpn8Yl6iMob4I+Xeo4oP710q3D906gXoGN1CUV+ikUycvJb2mauQNreI39F1Deg6q7wI1+/jRqBRtQgkjqi3NxYD1pUOmim3qOVyNHWPIeVIQJ+3Jya9/IJOgng2bsIKJGvyJZvTDwLZ68B+VPTyKRXQNqFwHCLHGQEVadm9e/e9T3rlEXQdV7eH0EMvABcW4g70EF0b82e48QXgHkPajGLo68B9WIh6cJqOd41aN48h5UlBH25eesvlsC4emfnwQbrUOoX3GaXM+aosOP4Okp70HVXbzoO3yN2QDeHb5zcWA/nIjozBDMuom6CCpYgj6d3Pp3D9+fnta5g1uOUTzrr9E7gKbo0EKqccMKJO9BNaw4jXc4G1kQvEvg7NE9/NBtOIJxZgpIPccrwJhUqO9B6UIpnoeveVstgO9rNW+rBfBdVvPx8evXYQngo9UU2tvHtdzHx913tI7m0Uq1fsOsMynNlbqmhWdxtzUrfHzNn+9r8edTV4a+BdD8+gEsdVv///Ne67nX5k15jBwxkldnzHCVJXSLhx+JP2ea3r53Xr6+r9W8LQH8gG1W/zHwOVZ1HFP8
WgL4/s+xQOdiQ2xtcp5+OBZ3W/NfH7+BFn/Wt4Otwzr7Lm+/nj7Qo2/19QP1vX77aDx9CMMSgkB5h1vO54MHDhzkySefILPvBbROa8XYMWNZsXw5NputZjvlY5W3Vb4+Hlb5+tT4KKqrqykuLqKysrJmufKwrsL69d1q5+0HYZVXKymP5Q1IQdX6NdWs0xKMdbabt6/cMiScDOqxKEdLePselnr98vIyioqLsdltAVPx3ttpzWYzy5YtpaTktKsstZYAfkMOXAP/vLf1SsM7L9+mDFRWt7qqwG1CAL/uNq/Dqlrfz8kW3jnjfQ66klVe2akGZeeTnB+LZ/IuP9Ah97XuD3Q92zeYVFzWu4+q0zr7Om/fq+/09t37Xm+f2uX++m6Csc6aeQwq4dVenvgIlPsLVO5+IAuw5Ouvueaaq3nln/8ENAYOHMjq1asYM+YmFi1c6KOqPn5AVfZVeR/rdpewbds2+mdn8+knn3gsb1hEBD53LX7uarzvduq+K/K8i/K463LzQ7bOu0PcquO6a/T267d13r3Weddba+u8iw50143n3bu37+5VV1WRO3Uqd9x+O6eLT9cU0090UGPxtm74rsfT96peJP580vZXxgDWu07edfVpAz9tGGhNwGNT3zF1s3WfK3WcY2627nPUzzlOrd+A5Ou1uPvOs8S5vPasaZDFnx+or6mzj6q13n1a8BESPn6dllp/3rx5NUWrXY6XXb58ud/lHi3iagOPlvCLwXuBj7rVo4q7du3iT396GhR88umnDB9+LTqdjlMnT/LZ558zavRo111OTYrOhD19Hx31r6ua1xqnrwEmk4ljx4653dk5jmugQUbnjg0bc3IsUPjxg7Aod99ZGefyQH4d1tWezvbwvIOrndMWvHUm7OG7DoCX71m90KxblTx8N6w2KxXOCJnaO8LafZ0JevueZ07NfSWa0uq2RPY9KN/7ac/7bqWpmuJ7+TWWAH5gS63fgAI7d/P2Q7HOvDyz9zo+XidQsNeTz/Xl4fupsNf5Wr/1bEdNqznnaiqk1ZyDmtbA60p5XU8+kRAB/ACbB7e7R4s0DEfFly9fzt13382BAweYMmWKe4PgHHOaO3cOEydOZN++fWR07QohjDlp4PIhlDEot+Nss9n48IMP2Lp1K7979Hdce+0I9Ho9AG3atmXixIkkJSUBsHnzZkaPGsmGDRv47LNPGXjhhQwf9msKCgrQgK1bt3LbbbfSOq0V7dulM+XZZykuLnKp9+H8fP7yl7+QmZmJMTaGHj26M2PGdExmMxaLhalTp/J/j/4OgH/+v//HqFEjyZ06FbPZgqZpbNniSD+tlSP9Z595lqLiYtzvQtzvXvIP5fPiX/5CZt++xMbG0L17N6ZPn47JZPK9C0LDbDYzdcpUfv/kk+zfv59nn3mGdunptEtvyz//+U9sVivOuymlYPl333Ht8GHExhjo3r07M6b/C7PJhKbBG2+8waiRI1i2dKnrbm7ZsqWMGjmC116d4brbO7B/H7ffdit/++tfsFqrfe4ei4oKee21V7n0kouJiTHQLr0tzzz7DMXFRW43ZRqVlRVMnz6dzMy+xMQYuPDCAXzw4YdYbVbQNA4dPMADD9xPenpbYgwGbr55LFu2bsGZ4ZatW7h13DhatWpJeltHHkVFxa67vuMnjvPggw+69r/hhutZsXwFduXo6H/++WfH/i1b0qplSyZOvJttW7fW3CfU3BGisXnTJq77n+tYvGgRGzduZNxvxjFyxAg2b9qEhobZZGbWG29w4YABxBgMXHLxJXz44QfYrDZXKu6XnTMaWbN6Dddd9z888vDDnDx1Eg2NouIipjz7LN26nU+MwcDYsWPYuuXHmvbV2Ld/H7feOo53F7zLj5s3MXbsGAwGPQMG9Gf1mtXODOr9847OAkdY3r6vDT7C8ra1rRSJCAu/Pp4V9z5pvXzNfXm9EVbdEVfjRFiefaL7WebpB46wPGyQEVag8fPQIyxvHzRN48or
r2TKlCnk5uaSm5vrE2HNmzeXiRMnMmfOHLp27epaTjDW1Z7Of/mJoOrDXZlLS0vZtHkT3bv34PLLr3CpoWM7T2symVi2bBkdO3Zi9erVtGuXzq8GDyYlJYU1a9cy/q67SElJ5qGHHqLSZGLWrDcwW8w8/9zzKGDGjOnMmzePMWPGcNOYm/ji88+Z/NhjtE5L44Ybb8L/PYFj2Zo1axg//i5SkpN56KGHMZkqmTXrDSwWM889/wLG2FjcQyqL2cyMGTOYN28uY8aMZcyYm/j88y9q8mvNrbfdWtOcbndsSnHy5EmWLF3Ct99+g06nIysri61bt/L8c8+RlZXNVVddhQIWfvghDzxwP9nZ2Uydmsuh/ENMmTIFBTz00MN069adZcuWccmll3HNr38NwA8/bGTZsmWkp6dTWlZGcnIyv+zdx8KFCxkxciQxhhiPGz7sinfffZepU6YwctQoRo4cxTffLOOvf/kLiYmJPPH4E+j1eioqynnqqaeYOfN1Lr30UsaMGcOqlSs5sH8fNquNX/bsYdKkSezcuZNRo0bRoWNHli5ZSlFREcqu+H7dWu66806SU1IcbWs2MeuNN7BYLDz//PMA/O2vf+O9BQuYcPfdtG3ThlWrVrF6zWpyLsqhpLKSP/3paTZv3sxvH3yQ+Ph4vvj8c3788Uf6ZmaiND3OO2LfI1wbjVSaTDz11O957dVXufTSS/nDH//IsqVLuf222zj44os89thk9Aa9z94njh1n2rS/cepUIX/72zTatG5DwcmT/PaB+1m5ciVjx46lY8eOfP75Fzz88CPMnz+fjIwMqquq2b17N/94+WUqKytJT2/HkCE5rF+/juefe565c+fSvn372tPEiedpUxvYuXzlsTyw7xV5ed0rBxr11PCM4Xzv9AngOwMKr/eg6otYg7VeDeMZGTkX15YgzOTxSp6a5D39oKxyT86n/eu1gZ40BQiR6t3cZ3dnf+XlO2tQExF5+2haTeQEubm5AEyZMgVN05gzxxE5zZ49hwkTJrhFRs7dQ38PiqqqamVx+6uqttZYb9/X7ti5S1144YXq6quvVkeOHlNVXuurqmtsVbVavmKlAtR5XbqoJUuXurYpPl2i7ho/XmVlZaut27a59p8+Y4bq3qOHytu0WVVVW9WRo8fUseMnatK1qjVrv1ft27dX99//gCotK1dV1VY1b958Bah58+a7tmto+t5/zvyqA+TnXO60paVl6t5771PJycnqjVmzlMlkVpaqajVjxgwFqBdeeEFVV1vVofx8dfXVV6uRI0eq/PzDqtpqVSaTWT3+xBPq8ssvVwcPHVJ79+5VF110kbrjjjtUUXGxOnWqUI0ZM0bdcsstKis7W/3000+q2mpVL730kmrfvr1av2GDqrbaVLXV6mFPFRaq/MNHXP6ePXvU4MFD1MiRI1XByVOq2mpVs2b9WwHq/gceUKdLSlS11abMFosyW6rU6ZISNX78eJWcnKzmv/2Oo75Wm6o0mVRVtVWdLilV48ePV1nZ2Wrbtu2OvKutasaMV1WPHj3Ups2b1YkTBWrYsOFqwoQJ6nRJqbJabcpktiiTyayqrTb10087VFZ2tnr66adVVVW1slptqtJkVmazRVmtNp+/EwWO9IYNG65OnChQVptj+dJly1RycrK68847VVFRsbLabOrw4SNq5MiRqkuXLiovL09ZrTa1cqXjPMzNzVUlJaXqgQceUF26dFH//e47R1o2m6tN5s2fr6qtVmW12dTGjRtVjx491Guvv66sNpv6accOlZ2drQYPGaI2bNigrDabKiwqUrfddptKTk5Wa9asUVabTdlsdle67r7Nx6/f2gL4tdYewA/izx7Aj4S1+/r2CFp7AD+4PxXAj5BV7lbV66u6rArgB7CRYsqUKQpQU6ZMUXPmzFGAmjNnToRzcWDweTbppcwqQESklMJms2G1WtHp9S6FxGs71x1djf8///M/XHTRxa5lx44dY/u2beTkDCE1NZVT
p04BkNElg1/27OHAgf1kZmbSunVr7HY7R48eIS8vj7Vr16KAffv2Ul5ejtFo9LybrHnWe9wr/cLCUyhVm/7BmvS9x6TatGmN3Wbn6NGj5OVtZO3atVCTX0V5OXFxxto7EFWbc9eu53PJJZegNxgARWa/LACsVhsKxcGDh9iwYQOTH38CQ4yBU6cKAUW3bt149513OHbsOH369CYzsx/bt2+jqKgIs9nM4cOHue9//5cdO3Zy8FA+HTt1ZsfOnWRlZdG583k+7Q2KlJRUUpJTOHWqkM2bNrFhwwZKS0sxm82cLDhBXFwcmzZtIjk5mVvH3UpCQiIohU6nRwOOHTvGtm3buPjii/n1Ndc47gqVwmCIQQOOHzvKtm3byBmS42jbU6dAg4yMDPbs2cOB/Qc4//zzad++HYsXL6ZVqzTuv/9/6dIlA51BD0qRnJJMm9atef3110lOTub2O+4gvW06ms57NM39fKw9zs56r127lrKyMq759TCSU5JRSpHeLp2rrrqa//znP2zduo2s7GxXipaqKl59dQavv/468+bN47JLL0MphcVsYdPmTVx44UD69u1LUWERCkVSUjKdO3dm988/Y7FYXEFD/+xsevfpAwpSU1Lp1asXCxYscMxeVfjUofHHoAjgexTC08drnRvhjEF5W3/51N7Be/uhWQL4DQ+RArWjf+sck/IYg/L23a3ytrh8j/b36YcDWAL4AWz4+EZSs2fPZvx4R+TkPiYF4b8HFfQjPvcnFmlpabRr146jR49SWHiKtLQ0twycoZz7lyWgTes2GI1Gh68UhYWF7Nmzh02bNjFr1iyf/MwmM5oGW7ZsZeqUZ/nyyy/p378/3bp393gGrLmejeN65q0Bp+pJ32Qy1z7zdHu8u2XLFp/8ahtBq93Y5eO5rLYkHv6hgwcpKytj6pRnmTrlWY+1ycnJWK3VxMcn0KtXT/7971kcOnSI08WniYuLY/CvfkWvXj35YcMGLrigD3t/+YX+AwaQmpLiIbBOm38onylTnuXtt9+mZ69eZPXr51FEs9nEoUMH6Xr++bRunVZTjdrpAM5jM3DQIBKTksBrvWfbvuHbtmYTiYlJTJk6Fbvdzj/+8TL/+MfLjBo1ir/9bRq9evemQ/sO/G3aNKY8+yxPPfUUTz31FHfffTfPP/8C6e3a+elmatveOU5iNps5cfw4AF0zuni0etu2bQE4dOggmtuaN//9b8d4Io6JPja7DYPegMls4tDBg2zalMeggQN96tSrVy+UXYHm9szc3fE83EGjgccEEadfazWUpvz4vtahCm6WAH49CuZ9dntYZ3Lefh3Z+1rNv+/Myc2v6fZcJXD6DbGu2mle12WdFazf+oxBuSZQOMuoGmT9NpDbI7b6rdP19l0uTn328N2qhB/fF42MjAyXd/DgQVff4dEgzhYIdizKbUwq+DEot38nJyfTu08fli5dyo4dO+jZs1f9ERiefosWLejatStZWdk89/xzxMXFe+SXkpzMoUOHmPzY/2E2m1m79nsuHDiQX/bs4bbbbnXLS+F5N+q4k2nZsib97Gyef+454uITcJ+ll5ySUnvEag5N/qGDtfl9v46BAy9kz5493HbrrV6t4H6IvVvH/5KOnToB8MQTT/LY5Mke63Q6HcnJyQAM+tVgkpOT2blzJydPnqJnz1506NiJ7t17sHvPbnbv3sPu3buZOHESMbGxjrtqau/IykpK+f1Tv2fD+vV8+ulnXHvttRQVFzP+rjs5fvwEAEZjHB07dmLz5s2Ul1d4JoCiZYsWdD3/fIoKC6mqshAfF4f7HaRzfXZWFs89/zzxcfEeZ3xyUjKg6NypM2/Nns2UKVOYM3cu/3rlFX73u0eYUzNOk52dzUcff8zOnTt5/fXXeWPmTOx2O6/8618kJia5nTW4jjBoriNujDOS3q4dAKdqIh5nQcrLywFo06Yt7tFMQUEB99xzD3FxccydO5errrqaK668kjhjHB07daJPnz78+803
6dG9h8cFazQaiYuL8z2w/m5R/S3TvJZ79UeBxkqCGpNy873PzbDHoLytqvUdyToL5u2HaD0awvP41ybf8DGpQO3srQeuvpbAHbp/6zsG5Vzu73jUa+sNoTxt7YMU737YmaqjpAHHoLx9PxFR7ZjTbA4ePOgzJlVXRORoz9qABbdc/eEjUO4THdx9fzY2NpaRI0cxZ/ZsXn7pJfr07kPvPn1cj/++WbaUARcOJL1tWzdVrLl7qRGEdu3a0b1HDzZu3Ehx8Wn69evsU+TDh4+wcuVKpk7NZdCvfgVAtdWK3a5cZXRLmYqKCoevQXp6Tfo/bKSo+DT9OnUO0BS48sx3z2/QIEd+1bX5ud9Ced6haR4peTe8BnTu1JGLLrqIH37YgMVspmOnjn7P9K5dM7jwwgv5cfNmSkpKGDx4MCnJyWRlZ7Fs2VKWL/8OgN59etfeubrKpDh+/Bg///wzw4YP58qrrkSn12OzWbFaba670fj4OHr27MmxY8f4esnXZGVnYTDEoLBjtdpo07Yt53ftytq1a9m0aTNXXnklADa7Y0Zierv29OjenR82bqS4uJhO/Tr7XLAoRUlpKS1SU+mS0ZWnfv8UBQUFbFi/nrLSMtq2aYvJbCY5KYnMvpk8l/sc+/ft59ix41jMFpISk3zaFTQsFgtVlqqa00gjOysbgO/++y3Dhw3DGGektKSM779fS/v27RkwoL/HffQ999zDyy//g6PHjrJhww+8+uoMMjMzadO2Db169uLfs2axfft2hgwZ4nExeRWk9t/+rjC/u7lFQB5+/dYZijQ8QnLi6XsXy+N+17m7t98AW3vgNf++MydvP8D15Gvx72veEVLA5BtkPQ93+BGRh+/V4QcV4nhvHuTurvbBs70a3iAac+fWztabMGGCKwWnSE2dOjWoWXruy/2hf+aZZ6fWsb5eOnfujMFg4O235/PBBx+wf/8+ftzyI3/9y4u8+OKLpKakctHFF3P48GHmzZ3L5VdcwWWXDXXtbzQaMRpjefvt+axYsZxYo5GiwiIWLVrEiuXLybnoIioqKlixfDm7d++mRYtU8jZtYsqzz/Djjz/SrVs3brjhRhLi4yktK+OjxYvZu3cvLVqkUlBQQI+ePYiLi3Olb4wzUlRYyKLFjvQvuvgiNE3nFkBplJWWstyVXws2bdrEs2753XjDjSQkJOB+ClRbrSxduoQjR44wZuwY0tJaA4r8/MPMnTuHK664ksuGXkZKSirlFeXMeuMNNm/eTGJiEkeOHmHunLkcOHiA7Oz+oDmimx0/7WDRokUcOHCA3z36f3Tu3Bm73c67777LhvXrycrK4q7xE2rKUnuwFVBdVc2yZUv54YcfSE5O5sD+A0ydOoVvvvmG9PR2jK0pY4f27di4cSOffPwxhw8fobqqipmvz2TZsmVcO/xakpKS+fDDD1m1chWVFeUcOXKEPzz1FDGGGAYOHEis0cjb8+ezYsUK4oyxFBYVsWjhQpavWMFFORexe/fP3Dx2LHv37UNDY9PmTXz4wYd06XIeY8fezMa8jdxy882Ul5dRba1m9apVfPrppwwcNJBRo0dhiPG6h9Jg/Yb1fPTRYuxKYbXaHELbqyfHjh7l/fff50RBAWazmZf+/ncWLFjAvffdx7hbb0Nv0HM4P585c+Zw3fXXc/XVV9OqZStA8dJLL9GmTWuGDMmhQ4f2bNyYxzvvvI3ZbMZmt7N2zVpmzJjOhRcOJDk5mcJThSxevJhOnToxbNhwYgwxAKxatdL1zkjnzucFf0G5lN3Lj4R1UsctaySz959dY1awASUIM9nwI6y6I4boo/Z4OcVp9uzZTJhwN84GueLyy4Fakbr88stdDRXue1BYvGbx1ffnmt3nNuuvtKxcvf322yozM1PV1Ea1atVKvfDCn1VhUbGyuM3imzo1t3aWn3PWn6VKLVm6TA0aNMi1/3lduqi/v/SSKj5doipNJjV9xgzVqlUrBahBgwapjz76WE2cNEkNGzbM
NbuvpLRMPfXUH1xpjBw5Up0oOKksVdV1pu89g6/SZG5Qfu5/JTWz+LKystX2n35yze5b4VZv56y/SpNZvf32O6pnz16u8mRm9lPz5s13zZCrrraq1157TQHqoosuUnv37lXV1VZVcPKkGjlypALU73//lDJbLKq62uqYqedmq6qr1cJFi1TPXo48evbqpea//bb6/VNPecwErLba1Pbt29UNN9zgcexeeullVVpWpixV1erDhQtVZr9+rvWDBg1SS5d945qluXTZMjVo0K9c67t06aJeeulldbqkVB09dlzlPvecqy0BddVVV6sffvhBVVtt6pdf9qrfPvigSk5Odq2/+eab1d69+1R1zcy9aq+ZfN+vW6eys7Nd2y9Y8J6yWm2qoOCkevLJJ11ptWrVSv3pT39SxcWnXfu6z+KzWh0z6U6ePKXGjh2runTpotauXausVpvaf+CAuvPOO115JCcnqwkTJqg9v/yirNbaWXz33XefKisvd83Sy83NVYBauXKlx+y9QH82Hz/QLL/6Z/3VP8uvPtuAWYDNNMsvErP+IjPLr5FnAargZv0pj+W1vl+rAvheM+f8zfrbv3+/grpn6zln93333XcBtwmWiH7N3G63U1JSgt1uJzk5mdjY2KD3Ly0pQeEY3zIYDB53HJUmExaLmaSkZGIMBmfA4xPqlpWVU1VlITkpmVij0TVpw26zU1JaimP2WAoGvd5zR2dONb7JVInZbCE5OQmDIcatJGHcorphtVopKysDIDU1FZ2u/vemg73/rK6qorSsjJTkZGJiY+u8gysvL8fibDfnsavZwa7slJwuAc0xW02n13lk5L4+OSnZM/JRUFVdRVlZGTqdzm9dTSYTlZWVGAwxpKQkuz0G8F8zq9VGWVkpOp2+ZnsdzocXlaZKKitNJCTEkxCfgO8XIfx/GcLf0SwvL8disZCQkEB8vOf4aINoioioSYvj9R5UU0dEzpAlQhFRYyYfndRzvOp4BLn8u++48sorPWbref/swvsAACAASURBVEZCMG/eXMaPHx+x96Dk96DqIezLz1l9twS89TC4M95Pgo3WIdQvwJHsnnxT90NTdPgRPD2dHbq379vh12+de3j7Ea1AE7dv9B2+xmwAzw7fp8f244cz5nRmCGbdyO9B+REMTz/QDm45hqw4jd9DNnoH0BQdWkg1bliBghGIhn6bLyyiTIAb/3A2siB4l8DZo3v4odtwBOPMFJB6jpefWX8NDXB8LCEIlE/xfJS94RGYzy2Bf4XA9xA2XoceLmFfbn70L7I9RKAMGiND3wI4O3RPP2r1q8l7mPqL1/AIrEkVLHCBIyqo4WQXQu1CIMINUmcI5auQzkdqfiMwZ6oBBLRp2ic4oj6CanCH3gDrowDeCdQrkE1wwUe6fYItbRN3OIFrFJkChRMRNYL+NHv7Nn72zaFYfhQsTD3wTj6ciCiaOvygj0+AiChis/Sou318BCqY96DqHYOKWIQUPYRz+Tl3iFiEVGeCkcgg+OPT6PoVKMPGrG4EadwxqAi0aPOePk1//tRLhBukjkkIbj14QEUMZ0yqcdqncYn6CMqboE+XOg6of7106/C9E6hXYCN1SYV+CkXycvJbmmbuwBpeY/8FrJ204OmHM+YU0Qs+SgQiEtn7z64xK9iAEkRSX5yLA+tJg0oX3dRzvBo5wgp+DMqn/w5+zOlciLBCvpzc2jcyCfrJoBk7iKjRr0hWLwx8ixcowvL2IxFhNbOCBaEnkciuAbWLAGGWOMgIq87N69896nvXqIug6j28PgIZ+IA4N3AKaAN3qEMgm/8SaIwLttEzaEJFCS5Ccm51ho85NWlx5D2oRjw8TUA9x6sexas7QnIuDuBT275aAxtI3oOqh7AvP2f13RIIT//8JNhoHUL9itPketYUHX4ET095D6ru4kXf4WvMBvDu8J2LA/vhRERnhmDWTdRFUMES9Onk1r97+P70tM4d3HKM4ll/jd4BNEWHFlKNG1Yg
eQ+qYcVpvMPZyILgXQJnj+7hh27DEYwzU0DqOV4BxqTkPagoOcRhX25+9C+yPUSgDBojQ98CODt0Tz9q9avJe5j6iyfvQYWaXQi1C4EIN0idIZSvQjofqfmNwJypBhDQpmmf4Ij6CKrBHXoDrI8CeCdQr0A2wQUf6fYJtrRN3OEErlFkCiTvQTV19s2hWH4ULEw98E4+nIgomjr8oI9PgIjI/xiUvAfV7IRz+Tl3iOwsvUAJRiKD4I9Po+tXoAwbs7oRRN6Dqrs4TX7+1EuEG6TOSQjOxYEVMZwxqcZpn8Yl6iMob4I+Xeo4oP710q3D906gXoGN1CUV+ikUycvJb2mauQNreI39F1Deg6q7wI1+/jRqBRtQgkjqi3NxYD1pUOmim3qOVyNHWPIeVIQJ+3Jya9/IJOgng2bsIKJGvyJZvTDwLZ68B+VPTyKRXQNqFwHCLHGQEVadm9e/e9T3rlEXQdV7eH0EMvABcW4g70EF0b82e48QXgHkPajGLo68B9WIh6cJqOd41aN48h5UlBH25eesvlsC4emfnwQbrUOoX3GaXM+aosOP4Okp70HVXbzoO3yN2QDeHb5zcWA/nIjozBDMujFYqqqbuwyCIAiC4IOmlEcAJQiCIAhRga65CyAIgiAI/hCBEgRBEKISEShBEAQhKhGBEgRBEKISEShBEAQhKhGBEgRBEKISEShBEAQhKhGBEgRBEKISEShBEAQhKhGBEgRBEKISEShBEAQhKhGBEgRBEKISEShBEAQhKhGBEgRBEKISEShBEAQhKhGBEgRBEKISEShBEAQhKhGBEgRBEKISEShBEAQhKhGBEgRBEKISEShBEAQhKhGBEgRBEKISEShBEAQhKhGBEgRBEKISEShBEAQhKglboPLy8hg9ejTff/+933X33nsvxcXF4WYjCIIgnGMYwk3AbDbz5ZdfkpCQQO/evWnZsqXHuvz8fOx2e7jZCIIgCOcYEXnEd8MNN2CxWPjPf/4TieQEQRAEITICZTKZuPXWW5k7dy5Hjhypc9uKigoWLFjANddcg6ZpXHPNNfz3v/9FKeXaZvr06SxevJg9e/Ywfvx40tPTmThxIvn5+ZSVlTF16lQyMjIYOnQo69at88lj27ZtjB8/npSUFK655ho+//xzieIEQRDOMCI2SeLSSy+lW7duzJ8/H5vN5ncbu93Oa6+9xjfffENubi4HDhwgJyeHhx56iB07dri2O336NH/+85956qmnGDNmDDNnzmTHjh1MnjyZ++67j/bt2/PWW2/Ro0cPnnnmGY4dO+bad8WKFdx5551cfvnl7Nmzhz/+8Y+8+OKLfP7555GqqiAIgtAEhD0G5SQ+Pp577rmHhx56iFGjRpGVleWzjU6nY/LkyWiahqZpADz66KPk5eWRl5dH3759Xdt26NCBV199lXbt2gFQVlbGCy+8wOzZs7n00ksBaNOmDePGjePgwYO0b9+e0tJSXn/9dR588EHuvvtuNE0jPT2dJ554gk8++YRrrrmGxMTESFVZEARBaEQiOs28f//+jB49mvnz52OxWPxnqNOhaRpVVVXs3LmT7777jrKyMg4ePOixXadOnUhNTXX53bp1IyEhgTZt2riWGY1GYmNjXRHbgQMHOHr0KJdddplLAAEuuOACKisrMZvNkayuIAiC0IhEVKAMBgN33HEHGzduZMWKFX63OXnyJE8//TRZWVlMnz6d8vJyD9EJh7KyMlatWkWfPn1cUZqmafTp04eTJ09itVojko8gCILQ+ETsEZ+TLl26cPvtt/Puu+/ym9/8xmNdcXExDz74IBdccAF5eXkkJiZiMplYv359RPLW6/UMGDCABQsW0Lt374ikKQiCIDQPEf+ShKZpjB07FovFwurVqz3GfAoKCiguLmbcuHGNMhbUpUsX0tLSyMvLi3jagiAIQtPSKJ86atmyJRMnTmTGjBlUVFS4lsfFxVFZWcn69eux2+2Ulpby6quvsmDBgojk265dO+6+
+27+8pe/8OWXX2K1WrHb7Wzfvp19+/ZFJA9BEAShaWi0b/FddtlljB8/3mPZeeedx8MPP8xjjz2GXq/n0ksvpVOnTvzud7+LSJ6apnHLLbfw9NNPM3nyZGJiYtDr9fz2t79l9+7dEclDEARBaBo05f6GbBNhMpkwm80kJydjMER8GAwAq9VKWVkZAKmpqeh08l1cQRCEM4lmEShBEARBqA8JKwRBEISoRARKEARBiEpEoARBEISoRARKEARBiEpEoARBEISoRARKEARBiEoi+hKS1Q5VNpDfBhQEQTg30ekgVg+GCIQ/ERGoUjOUWxRV/n+nUBAEQTjHiNVDklEjJS70NMJ6UddUDUWVimoRJkEQBMEPMXpolaARHxP8viELVLkFTlXU7poQC4mxEGfQ0MvIliAIwjmJzQ5mq6KiCiqrape3TtRIMgaXVkiP+EzVteKk10FaAiTEanz11VcsWrSIdevWcezYMeQrSoIgCOcGmqbRvn17cnJyGDt2LCNGjKCySlFY6RCtUxUKvS64SCqkCOpIieOxnl4H6ckaJUUnefh3j7Lkq/8Em5QgCIJwFjJ8xEimv/JPUlu14USZwmZ3PO7rmKo1OI2gBarU7Bh3AmibBBUlpxh13Q3s2bWDlBatuP/BR7j5xtGc37WrfEFcEAThHMFut7Nv/34WfvwFM1/9F6Wni+jR+wK+/OwTElNbU1Du2K5VQsMnTgQtUEdLHLP1EmKgbbLGuNtuZ8lX/6H/oIt47+3ZtGvXLth6CYIgCGcRx48f59Y7J/Ljxu8ZPmIk7y94l4IyRWW1Y3ZfhwZGUUEJlNUOh087Nm+TBCu//ZrbbruNlBatWP/9GhEnQRAEAXCI1JCLLqH0dBELFixg6NXXcrImiurUQmvQe1JBPYNzveekFHEGjUWLFgFw/4OPiDgJgiAILtq1a8f9Dz4CwKJFi4gzaFATDzX0ndmgBMr9CxF6Haxbtw6Am28cHUwygiAIwjmAUxvWrVvn8fpRQ782FNYshmPHjgFwfteu4SQjCIIgnIU4tcGpFcESlkA5h69ktp4gCILgjVMbQn0nVpRFEARBiEpEoARBEISoRARKEARBiEpEoARBEISoRATqDEPTNDSt4d+yEgRBOFMRgRIEQRCiEhEoQRAEISoRgRIEQRCiEhEoQRAEISoRgYog06dP55lnnsFsNjd3UQRBEM54QvrJ91ApLi7mySefJDs7mwceeAC9Xu+z7v7772fgwIFNWp5Dhw4F3Gb06NE8/PDDDUrv9OnTnDx5Un7qXhAEIQI0qUDZ7Xby8/NZtmwZOTk5DBo0yGddU0YfCQkJTJo0yZXn8uXL+frrr3n66adJTk4GID09vcnKIwiCINTSpAIFkJqayogRI3jzzTfp27cv8fHxTV0EF0ajkZycHJd/5MgR1q1bx8UXX0xaWlqzlUsQBEFohjGokpISrrzySvbv38+qVavq3T4/P59HH32UtLQ0cnJymD17NhaLBYAvv/ySSZMmUVhY6Np+69at3HjjjWzdutW1rLi4mHvvvZdvvvkm5HIfOHCAxx57jIyMDHr37s20adMoKyurcx+r1co///lPHnnkEYqLiwEwmUzMnDmTrKwsMjIyeOKJJzh+/Lhrn3379vHoo49y9OhR3nvvPXJycsjIyCA3N5eKioqQyy8IgnCm0SyTJDp27Mhdd93Fq6++SkFBQcDtdu7cyS233EL79u3ZsmULM2fO5KOPPmLmzJkopejcuTN5eXkcOHDAtc/333/P+vXr+f77713L8vPz2bNnD507dw6pvDt37mTcuHEAfPHFF8yaNYtVq1Zx7733cvLkSb/7KKX48MMP+eCDD3jggQdo2bIlFRUVPP7446xcuZJ33nmH1atXYzQaPQSsurqa9evXc99997Fp0yb+/Oc/84c//IE5c+awZMmSkMovCIJwJtJss/hGjhyJ0Wjk008/9TupoKqqirlz53LVVVcxefJkOnXqRP/+/Xn22WdZ
tmwZx48fp0uXLvTo0YNdu3YBjuhk+/btPProo/z444+YTCbAITDnnXceHTp0CLqcFouFt956i169ejF16lQyMzMZOnQo06dP5/Dhw3z22Wd+91u5ciXTpk3jr3/9K3369AEcY1y7d+/m73//O1lZWXTq1InJkyejlGLTpk2ufU0mE+PGjWPatGlcffXVTJgwgeuvv57Vq1cHXX5BEIQzlWYTqJYtW3L//ffzwQcfcPDgQZ/1p06dYsOGDQwfPhyDoXaorGvXriQmJlJaWkpKSgrZ2dnk5eVRVVVFQUEBp06d4pJLLuHUqVMUFBRgs9nYsmULAwYMcE18CIbCwkLy8vIYNmwYKSkpruWdO3dm6NChbNq0yWdix+HDh8nNzeXxxx9n6NChANhsNtasWcMll1ziIZQtWrSgb9++Ho/5nPV0fnPPaDTSunVrKisrgy6/IAjCmUqzvgd1ySWX0L9/f/79739jtVo91pWVlVFcXMzll1/u+kCqpmm0bduWbdu2YbVa0TSNIUOGsG3bNkpKSjh48CDt27cnMzOTtLQ0du/eTWlpKT/99BODBw8OqYzOcmRkZHgs1+v1xMfHU1BQ4BoTAygqKuK5555j165d7Ny5E5vNBjgiwqKiInJzc9HpdK766HQ6cnNz5d0pQRAEL5pVoIxGI3fddRerVq1i+/btHuv0ej0Gg4EVK1aglPL427FjB3379gWgZ8+e2Gw2Dh06RF5eHgMHDiQ1NZUBAwawceNG8vPzsVgsdOrUKaQyOsvhPhEDHGNMNpuNtLQ0YmNjXcsXLlxISkoKX375JWvWrGHlypWA4yvker2eKVOmYLfbfeo0adKkkMonCIJwttLsX5Lo168fY8aM4Z133vGYct6+fXv69evHmjVrXFGIP9q2bUvPnj1Zu3Ytu3btol+/fgBkZmaya9cutmzZQkZGBm3btg2pfM5yrF+/3iPKO336NFu3bqVnz57ExcW5lo8aNYpnnnmG/v37c/vttzNjxgxOnjxJXFwcF154IZs3bw44sUIQBEGopdkFStM0xo4dy8GDB/nkk09cyxMTE5kwYQLz5s1j3rx5mEwmlFIcOHCAH3/80bVdfHw8AwYM4IsvvqCsrMw1vtO1a1fKy8t55513GDBgQMjvWznLsXDhQubNm4fFYqG0tJRXXnmFgoICbrrpJo/fZ+rUqROpqamuehkMBubOnYvNZmP06NFomkZubi7Hjh0DoLS0lNWrV3s8JhQEQRCiQKDAMe38vvvu85nEMHToUF577TVmzpxJQkICOp2O6667ji1btnhEVYMGDWLbtm1kZGTQsmVLAFq3bk2XLl3Ytm2bxxcrQmHo0KHMnDmTN954g7i4OFJTU8nLy+PNN9/0GZtyxzkRZN68eWzcuJH09HSmT59OeXk5HTp0QNM0unbtyqeffkp5eXlYZRQEQTjb0FQQH44rt8CpCgVKkZGmc4mB8x2exsJut1NSUgJAcnKyx6y+psRqtVJWVoZOpyMlJSWsX7YtLy/HYrGQkJAQVHTnzFO+9ycIwpmAu04cKLSDptE6USPJWP++zdPTB4lOVyuGzYnBYIhYOZKSkkhKSopIWoIgCGcjUfGITxAEQRC8EYESBEEQohIRKEEQBCEqOSPGoIRaZHKEIAjnChJBCYIgCFGJCJQgCIIQlYhACYIgCFGJCJQgCIIQlYQlUM6vGtjt9ogURhAEQTh7cGpDqF/dCUug2rdvD8C+/fvDSUYQBEE4C3Fqg1MrgiUogdK5bW21Q05ODgALP/4ipMwFQRCEsxenNuTk5GB1e9Cma6DyBCVQsfraf1uqFWPHjgVg5qv/8vnJckEQBOHc5fjx48x89V8AjB07Fkt17Tuc7lpSF0EJlEFXm3C5RTFixAiGjxhJ6ekibr1zooiUIAiCwPHjx7n1zomUni5i+IiRjBgxgnKLQ6Bi9Q4taQhB/dwGQKkZiiodP1PeNkmjsvQUo667gT27dpDSohX3P/gIN984mvO7
dkXX0DhOEARBOKOx2+3s27+fhR9/wcxX/0Xp6SJ69L6ALz/7hISU1hSUKzRNo1WCRkpc/elBCAIFcKREUW1V6DRFuxQdJcWnePh3j7Lkq/8Em5QgCIJwFjJ8xEimv/JPUlu25nipHbvSiDFodExt+Iy+kATKVA0nyhw/XKjTFGmJOhKNGl999RWLFi1i3bp1HDt2TL4bJwiCcI6gaRrt27cnJyeHsWPHMmLECCosisIKhzihaaQna8THBJFmKAIFnr+uq5QiPgaS4nTExYBBF/ovzQqCIAhnLla7wlwN5WY7puqad6CC+BVdd0IWKHBEUkWVjsd9CqAmKYmbBEEQzk1c4YmmoQExBse4UzCRkyuJcATKSanZMauvyirSJAiCIECsQSPJ2PAJEf6IiEA5sdqhygby5SNBEIRzE50uuKnkdRFRgRIEQRCESCEvKgmCIAhRiQiUIAiCEJWIQAmCIAhRiQiUIAiCEJWIQAmCIAhRiVbYqbvM4hMEQRCiDomgBEEQhKhEBEoQBEGISkSgBEEQhKhEBEoQBEGISgzeC1rl72mOcgiCIAjnMEWde/gsi7oIavr06TzzzDOYzebmLoogCILQjPhEUMGSl5fHH//4R49lHTp04MYbb2T48OEYjcH9QtXp06c5efKk/BqvIAjCOU7YAmU2m9m/fz9TpkyhY8eOAOzcuZMXXniBJUuWMG3aNBITE8MuqCAIgnBuEbZAASQkJDBo0CB69eoFwBVXXEFOTg633347P/74I5dcckkkshEEQRDOIRptDOq8887jvPPO48CBA65lBw4c4LHHHiMjI4PevXszbdo0ysrK6k3LarXy0UcfMXToUNLS0pg4cSJ79shkDkEQhLOZRhOoQ4cOcejQIbp16wY4HvuNGzcOgC+++IJZs2axatUq7r33Xk6ePBkwHavVyssvv8wbb7zBCy+8wI4dOxg4cCD33nuvh/gJgiAIZxcRecRnt9spLS2lsLAQu93O1q1bmTZtGldeeSX9+vXDYrHw1ltv0atXL6ZOnUpKSgrgiLLuuOMOPvvsMyZNmuQ37e3bt/Pxxx8za9YssrKyALjnnnvYvXs33377bcD9BEEQhDObiERQ27ZtY/DgwbRu3Zq2bdsyadIkrr32WtcEicLCQvLy8hg2bJhLnAA6d+7M0KFD2bRpU8Bp5evXr6dfv350797dtcxoNDJw4ECOHj0aieILgiAIUUhEBCo7O5tdu3ahlOK9996jdevWjBw50jV7r6ysjOLiYjIyMjz20+v1xMfHU1BQgMVi8Zt2QUEBb775JomJiWia5vobP348FosFu90eiSoIgiAIUUbEx6BGjhxJ3759mT9/PlarFXAIkcFgoLCw0GNbpRQ2m420tDRiY2P9pmcwGLjnnnuoqKhAKeXx98ILL6DTRd27xoIgCEIEiHjvnpKSwqRJk1i8eDFr164FoH379vTr14/169e7RAscL+Vu3bqVnj17EhcX5ze9wYMHs3nzZvLz8yNdVEEQBCGKaZTwY8iQIYwePZoZM2ZQXFxMYmIiEyZMYOHChcybNw+LxUJpaSmvvPIKBQUF3HTTTWia5trfZrO5viSRk5PDkCFD+MMf/sCePXtQSmEymVizZg2lpaWNUXxBEAQhCmgUgTIajUyaNIndu3fz5ZdfopRi6NChzJw5kzfeeIO4uDhSU1PJy8vjzTff9BibysrK4oMPPuCVV14BIDExkeeff56ePXsycOBAdDod6enpzJo1i+Li4sYoviAIghAF+Pzke2N/zdxqtVJWVoZOpyMlJcUjcgLHuFRpaSl6vZ6kpCSPdSaTicrKSoxGo886QRAE4czF39fMI/IeVDAYDAZatmwZcL2maaSmpvpdFx8fT3x8fGMVTRAEQYgiZAqcIAiCEJWIQAmCIAhRiQiUIAiCEJWIQAmCIAhRiQiUIAiCEJWIQAmCIAhRiQiUIAiCEJWIQAmCIAhRiQiUIAiCEJVE9EsSzg+8CoIgCOc2
3p+xC4WICJRSiqqqaipNFqqqqrHJjwgKgiCck+h1OmJjY0iINxIbGxOWUIUtUEopSssqqKj0/5PtgiAIwrmDzW7HZLZgMltITIgjJTkxZJEKS6C8xSkxMZ6EeCMxhib/Bq0gCIIQBVRbrVSaLFRUmFzaEKpIhawkSikslipXAVq2SCY+zhhqcoIgCMJZQIzBQGqygdgYA8Wny6ioNGOMjcFojA1apEKexaeUotJkARyRk4iTIAiC4CQ+zkhiouPnkSpNlpAm0YUkUEoplFJUV1sBSIgXcRIEQRA8cWpDdbXVpRvBENZ7UM7ZejLmJAiCIHjj1IZQZ3aH9YhPEARBEBpCkz3iEwRBEITGRgRKEARBiEpEoARBEISoRARKEARBiEpEoARBEISoRARKEARBiEpEoARBEISoRARKEARBiEpEoARBEISoRARKEARBiErOKoE6ceIEeXl5WK3W5i6KIAiCECZnlUDNmjWLSZMmsXfvXtey8vJyTCZTM5ZKEARBCIWzSqDuu+8+3nrrLbp16waAzWbjxRdf5KOPPmrmkgmCIEQHhUWnG2XbxuCsEqj09HQGDhyIoeYT71VVVRQVFTVzqQRBEKKDd97/lCl/fqVe4dmz9wAPTc7lnfc/baKS+adZBer111/n6aef9ngEt2TJEu644w7y8/Ndy/bt28edd97J9u3bAZg+fTqLFy9m165d3HTTTQwfPpwTJ07w6aefcu+991JcXMzx48d57LHH+Prrr3n55ZcZPnw4zzzzDGaz4yfqTSYTM2fOJCsri4yMDJ544gmOHz/etA0gCILQhLRqmQpQp0jt2XuAV16bB0CPbl2arGz+aFaB6t69O8uXL6egoABwPJJbuXIly5YtY9euXa7t9u7dy6lTp2jXrh0Ap0+f5t133+Xpp5/mN7/5Dbm5ubRo0YLy8nLy8/Ox2+0kJiZy3XXX0a5dO6677jr+8Ic/MGbMGGJiYqioqODxxx9n5cqVvPPOO6xevRqj0cgjjzxCcXFxs7SFIAhCYzNy+BXk/Ko/4F+kPMUpg5HDr2jyMrrTrALVs2dPNE3j8OHDABQXF3P06FHuv/9+1q9f7/qBq40bNzJgwABatmzp2nfXrl1MmTKF3/zmN+Tk5GA0ev7sfHJyMoMHD6ZFixb06NGDK664gv79+6PX61m+fDm7d+/m73//O1lZWXTq1InJkyejlGLTpk1N1wCCIAhNzB3jrvcrUt7i9Lvfjm+2MjppVoFq27YtmZmZbNiwAYCDBw+i0+m44oor2LFjB6WlpZSVlbFz505ycnLQ6/WufYcOHUqPHj2CztNms7FmzRouueQSOnTo4FreokUL+vbtK4/5BEE46/EWqXU//Bh14gTNLFDx8fEMGDCAHTt2UFlZyU8//URmZib9+vWjurqagwcPcuLECU6ePEmvXr0ikqdz4kRubi46nQ5N09A0DZ1OR25urmuMShAE4WzGXaSckyGiSZwgCmbxDRo0iL1793LkyBG2bt3K4MGDadmyJT169GDnzp3s37+fpKQk1/hTuGiahl6vZ8qUKdjtdpRSHn+TJk2KSD6CIAjRjrtIRZs4ARiauwBdunTBaDSyYcMGCgoK6Nq1K3q9nr59+7JlyxZatmxJdnY2KSkpQaet0+kwGAzYbDbXsri4OC688EI+++wzTp48Sdu2bSNZHUEQhDOKO8Zdz4hhl5PWqkVzF8WHZo+gWrZsyYABA1i8eDGpqamuiRD9+vVj3bp1LF26lCFDhqBpWtBpJyYm0rNnT1atWkVZWRkWiwW73c7o0aPRNI3c3FyOHTsGQGlpKatXr8ZisUS0foIgCNFONIoTRIFA6fV6cnJy+Pjjj8nMzCQ+Ph5wRFZpaWlUVlbSs2fPkNKOjY1lzJgxrFq1ipSUFK699lqOHz9Oeno606dPp7y8nA4dOqBpGl27duXTTz+lvLw8ktUTqyJnswAAEspJREFUBEEQQkQr7NRduS9olb+n3p2UUthsNgpO
OaYndmjXunFKFyGqqqooKysjISHBJYBOysvLsVgsftcJgiAI4XH0+CkA2rZugV6vD/g0rKiz76zsZh+DagpiY2NJS0vzuy4pKYmkpKQmLpEgCIJQH83+iE8QBEEQ/CECJQiCIEQlIlCCIAhCVCICJQiCIEQlIlCCIAhCVCICJQiCIEQlIlCCIAhCVCICJQiCIEQlIlCCIAhCVCICJQiCIEQlYQmUXufYvdpqjUhhBEEQhLMHpzY4tSJYQhYoTdMwGBw/wV5pkp+oEARBEDxxaoPBEPgjsXURVgQVHxcLQEWFCZNZREoQBEFwYDJbqKgwAbVaESwhCZSmaWiaRkyMwZVx8ekySsoq5HGfIAjCOUy11UpJWQXFp8sAhzjFxBhcuhEMIf/chqZp6HQ6EhPiUEphtlRTUWFyKaYgCIJwbhNnjCExIQ6dThfSI76wBQogKTGe2BgDZks1VqsNu1L17C0IgiCcjehq5ifEGWOIjY1Br9c3vUCBp0gZjRqxsTHY7XbA8au7giAIwrmDU4ScgqTT6UIWJ4jAL+o6C6FpGkop9Hq9iJMgCMI5ilOMnGNOoYoTgFbYqbuoiSAIghB1yJckBEEQhKhEBEoQBEGISkSgBEEQhKhEBEoQBEGISjQlU+4EQRCEKEQiKEEQBCEqEYESBEEQohIRKEEQBCEqEYESBEEQohIRKEEQBCEqCftbfO7IhEBBEAQBCOsbfE4iIlBKKaqrrVSaLFRVVWO12SKRrCAIgnCGYdDriY2NISHe6PqhwlAJ+z0opRRl5ZWUyw8VCoIgCG4kJcaTnJTQPD+34S1OiQlxJCTEYdDrXT+/4bCgabj53lbWy3pZL+tl/dmwvrraSqXZQkWFyaUNoYpUyBGUUoqqqmoKi0sBaNkimfg4YyhJCYIgCGcZJrOF4tNlAKS1TCE2NiZokQo5glJKUWmyAI7IKc4YS4haJwiCIJxlxBljSUyIo6LSTKXJEtJ4VEjTzJVSKOWYGAGQEB9Xm7GmAZr44osvvvjnuJ+QEAdAdbXVpRvBENYYlHO2nvNn3pUCDYcFxBdffPHFP4d9g14PEPLM7pDGoJRS2Gw2Ck6dBqB9elpImQuCIAhnN8dOFALQtnUL9DUT6BpKxN6DAg0QK1asWLFi3W3oROhTR1pNOcSKFStWrFh3GzqRiaBQaIqaZ5A1uim++OKLL/4574dDRCIozesfTl/TZL2sl/WyXtafy+vDIUJjUG7/rm9bWS/rZb2sl/Xn5PpgidzPbWhixYoVK1asHxsiERuDQuEpn+KLL7744osfBhGLoDQ0HP/XxBdffPHFF9/lh0pEXtRt27plWIUQBEEQzk4KThUDob2oK2NQYsWKFSu2cW2IRHAMSsNzOp8GqNrnkLJe1sv6Zlufn3+Ml155nXvvvp3MC3pFXflk/Vm+PkQiIlDgeNKotBoLNdbLl/Xn3Pry8nI+/OgLPv1iCTt27aZjh3ZcP2o4E+64hZYtWzR7+YJdb7fZqDCZSEpIQNPpmjR/q92OyWQiISEBvU4Lan+r1cq+A4eoslThul6jsH1l/dm3PhwiI1BK4RJSsWJr7IFDh8l98R+UlpYx/vab6dn9fErLyvnk8694YdorPPvU/5Gaktzs5QzG7vllP/+aOZs/P/skLVqkNmn+v3jlHcz+zrtbVfPv5m5HseeWDZUIRVCApuH5rQun721l/bmw3lxVxez57xEXZ+Tvf/4TrVunudYPGdQfS1U1CfFxUVv+QOvLKysxmczUvkbfdPmXV1ZiMpsD5NuA9P1er9HVvrL+LFwfBhF+D0rhKqHLx8uX9efC+p9372XdD5v5+5+fIS2tJUop13qdXkd8fBzKbX+b1ca3361i7rsL2bN3P9dceRn/O/EOMrp0BhT5+UeZv2Ah9959Oxs2bmbeuws5eaqIMTeM5J7xtxIfH+9K32wx8/FnX7Pgg48oK69kxLArmXTXrbROawlofPvdKvKPHGPU8Kt45fXZbN2+g//3
11w6d2zPuo2beX/RJ3y/Po/u52cw4Y5buPbXV6DXG/hqybe8++HH/LxnHw9P/hMGg4HHHvlfMi/oWW/5vdunylLNug15vL/4M75fv9GR1+23cO2wK9HrdT7t+9WS//Lu+x/x8y/7ePjxZxx5P3wfmX17U1VVxXcr17Lgw4/Zun0nFw0ZyH1330F2Zh80nQ6o+aE45UzN8e/9+w/yyutvcdN1Ixh66UWgFMeOn2T2/Pf45IslZJzXid+MvZ7rRv6a2NhYUIq331tM2zatybygF6//ez5Lvl3BBb178H8P3Uf/fhdEzfkn66Npfejop06dOjWUHZVSVFQ67uYSEuIdzxo1j1nw4p/D/ldL/ktZeQW33PQ/GI3GOre32ezMnv8+Xy9bziMPTOSh+++mrLScWXPeIedXA0hOTqa4qJh33v+ItevzsFptjBt7Az26d+WdBYs577xOdD8/AzQNs8nM3/7xGvsOHOSPjz/CneNuYsfPv/DZl0u4OOdXxBuNbNvxM9+uWM3KNesZPDCb60ddS4/zM/h2xWo++ewrfjP2Bh797SRiY2N56ZWZZF7Qh/M6dUDT6TGZzJSUlnLf3bdzUc4gzu/SmZiYmHrL713/Jd8s5+PPv+bWsdfzyG8nYYyN5aV/zSTzgt507tTBZ3u9pqOyJu97776di3MGkZFxHsaYGGa/8yHvfvAR94y/jUd/ew8VFZX87R+v0q5dOj27dQU0Sk6X8NU333HFZRfTvl06hYVFvDDtX2RlXsD1o4ej1+nYu/8Qj//xOfpl9uG5pyeTM2Qg737wESWlZQ7x0TTWfL+Bz/+zjDXfb2DYNZdz4+hrOXAonx82/sglFw8h3miMivNP/OjxK00OnUhMiEOn09HkvweFkmfaYj1tRWUl7dq2ITY2xnV+VFaaqKquAkDTdCQlJqDX69m9Zy9Lv13BC88+Sa8e3QC4+cbR7D94iDXrNnLzjaNRSmE2Wxg1/GquG/lrNE1jYP9+/LJ3P3mbtnDlZRcRExPDuh82sf/gIf723B9Jb9sGgIn/v71zD46qPOPwc/bsJrubBXLhohKSGsBwDyRyrYIyeEcU7QhWcKR4wRtWZ7y1tnam6h+dVmcKTqutw7TjoCgXEwSttHVq2xmnFJoCDVcdFUII5MJi9pI9OefrH2fPJru5SDY7bXTfZ2bz23d/Z7/vPWfP5t3vXFcu40c//Rl1dYeZO7sKUOzeU8u6nz/HZXNn4nDtoiu47qorE1+gW2++nsNHP+Gfe2uZM3MGYy8uobRkNEP3BZhRMZn8YfZ+oIOHj351/inL56v6cjaNONOXdem7skvfB+oOsXnrdn789GOJeVl953I8HjebtlRz6YypjBhe1DmCUopIOML6VzYwauRwViy7Bd3lIhaLsbV6B3NmVbJ65TJ0XWfUyBE8dO9drH9lA9dffWWinUCenxeefYrhwwsBCATyePyZ5zh+vJ5hk8r/p+uZ6NdD0yWD50FpoqJ96obXNzFrwWJmLVjMynvW8sXxetA0/r2/jvLxYykZMzoxfU5uDlMmlnO6qTmpnTHFF9n/1DWNnBwPBQX5RKLtmEphWhZ7avdTNX0qI+PFCU1j6JAA48eWcaa5OdFOZcVUpky6JCk/l8uF5nJhmib1Daf4+B97aWpu4UxTC+2GQXzCZO1n/o664icsmpZF/clTfLx7L03NzZxpbqE9Fut9OZIcHzh4hJIxo5kcLwxoGpqmMWdmFY2nz1Df0Nitne3v7aK+oZEH77kLn98HQGvwHPsOHOTyebPRdT0xfXHxRfh9PtpC4cT7Lxg1ksDQQCIeEsgjNzeHWF95i2a3pkmG9kFBYnujqCigu3TOBoPEYga58f0Xd61YxneXLeX4iZO8uO7V+A96RXNLK29t3c5bW7eTyv1334lpWYn9VcoZraf0p5QiZhgEg+d4c3M16369oVtbzz/7ZNJ+L6d/p51IJMqmLTW8ubmaaVMmMqtqOoUF+cn9Jv525nE++btc
rqR8I+FID30VgKJbXkmakndzcwsXXXgBXm9u0nLxeNx43B6am1uSpv/obx/z3q4P8XjcNLW2UhQfBbW1hQie+5I7vvdQt3kYW1aK0dGR1H7Xz0HFF0ufeYtmt6ZJxo7i0yBpOCdxdsdTJ5dTveN9TjWeZkjgYjTsfZV+fLS2BnFpnYN3XXdz2y038sMn1uL1enttn5S4u28fgPHQmlU8fN+qpF9vqdOnYsQMfvmr12hrC7Fxw8sUFeSjgNNnmmg83dRj//3N34ljKX0Vxs8Hc/rqz/zrupuzwSAdRgfk5nZOoBSWshKHozvvf+fdP/DCT56i7tARXn9jCz94fC0+nxdd19HdOq+/tp6ZVRXn9Xn3xf97/ZN4cMXpkplNfErZhVIp4j+jJM7yuHz8OErHFLN527tEo9FuvrPeoGDalAn85+ARGhoa+24fuvvxF5WC3BwPkydcQt3BIzS3nO0jv65t2a+HwmGOHvuUJTdcQ2H+sB7bRyl0lxvTtDBN1f/843E40dfVFObnJ89fH98n3aVjmhaWaSX8aVMm8ulnX1B/8lTS9Ec/+Qyf18vI4cOT2n/0wXuYM3MGi69dxIn6Bv78l7+DUowoKqJ8XBl7a/djmmbv+acsj54+z8Gw/kk8iOIBkKF9UFr8J5WoqK1FRQWsWrmcD/70ES+t/w2fn6gnEmmn5exZdu/dR1solJh++tQpVEybxC/Wvcrnx0+gFERj7eytPUBbONSlXbr3R3J8xfx5aJrG+lc3cKapBTRoC4XZU7uPWKyj8y0kt6e73eT5/eze8y9isQ6isXZqduzi7W07ktovLRlNQ+NpDh09hmnZmxXPP39bdd1NXp6f3XtqiRkG0fYYNTs/iPfV+3ItLSm2+z7yCaayiMU6mD5tMpfOqODFda9w/EQDlrKo3V/Hb3+3kcXXLmL06AuSll9x8YVomotRo0awYvmt/H7j23z+RT0+v5elS65n2/b3eKfmfaKxdpSC+oZTHDx8rNfl/ZWvi4oOgIwUKIV9noWoaFedWVXByy89T/3JU1x943Kmz7uKuVfeSPWO91n7wGpKSopRSuH15fLI/av5VukYlt6+mgmV85m3cAmbttQQDJ5LtAd07ydl/SsqLOCZJ79POBThsqtupnz65Sy84Tb++OFfCUXCOPtOEo/4+wJ5fu64/VZqdu5i6uyFXHfzClqDQdbcvTKp/bKyUuZ/ezar7nuUSVUL2Fq987zzdzQQ8HPH8nhfsxZy3dIVtJ7t0lcvyzPR95pHmVS5gK01O/F6c3nkgdWMH1fGTctXMbFyAfc+/ARLbriGZd+5Ken7mbr85s6u4pJxZWx8exuRaJRLK6fx7NOP8cbmd6iYvYgJlfNZ88iTHDpyjA7T7HF5Oyr/B0R704GQkdttFBYMHVASwjcf+xBzA13XCeT56e1ciGh7O9FoOzkeD/74EWYD7dPrzcXbdf9MHxiGQSgcIc/vw+Px9DiNUoq2UBhlWQQCAVyuznnpT/7n01d/+nbmtz/t9YRlKdra2gDIi58KIAjp0tJ6DkjvdhuZKVD5doFSJI/oJJZYYoklzu645Wz6BSqj94NyNjtKLLHEEksscSJOk8xei88OREVFRUVFOzVNMjeC6lo6RUVFRUVFE5oeGToPyv6jFKDi2iUWX3zxxRc/S/0BkJkrSWhdREupnVrKZOKLL7744mednw6Z2QelVDwTp5p2NZ0n4osvvvjiZ6efHhm6Fp/Wy3OVEosvvvjii5+9fv/I2AhK00Ap0FCJtGyNx+KLL7744medPxAycpCEfd6VcxfFeKylxOKLL7744medPxAyuA8KQIHSbJVYYokllljiATCgSx21tH5Jh2nGb92dwVOqBEEQhK89pmnRFgrj1nUKC4b0+1JHaY+gNE3D43HTYZrEDAOfnhPfBmnXT03DjlNVfPHFF1/8rPANwwDsOzz3pzA5pD2CsiyLWMygNWhf9djv8+Jx63ZmdopdNPFG8cUXX3zxs8A3
DINwpB2AgmEBcnI8uFyufhWqtAoUgGVZmKZJONJOKBwFICfHg8fjRnfJ5j5BEIRsxLQsDKODWMwePeX5vfh9uei6jquftSHtAuWMopwi5VRKQRAEQQDw+3KTilN/N/OlXaAguUgZRgcxw9YO00y3SUEQBOFrjFvX8Xjc5HhsTbc4wQALFHQWKcuyEs+d1wVBEITswSlCTkFyuVxpFyfIQIECuxg5DycWBEEQsg+nGGmalnik3VYmCpSDFCZBEAQBGFBhcsjQxWJtMpGQIAiCIEBG76grCIIgCJlDCpQgCIIwKJECJQiCIAxKpEAJgiAIgxIpUIIgCMKgRAqUIAiCMCiRAiUIgiAMSqRACYIgCIMSKVCCIAjCoOS/nOQ7sq1wapUAAAAASUVORK5CYII=)" + "- Create a new token (https://huggingface.co/settings/tokens) **with write role**\n", + "\n", + "\"Create" ] }, { @@ -719,7 +715,7 @@ { "cell_type": "markdown", "source": [ - "See you on [Bonus unit 2](https://github.com/huggingface/deep-rl-class/tree/main/unit2#unit-2-introduction-to-q-learning)! 🔥 TODO CHANGE LINK" + "See you on Bonus unit 2! 🔥 " ], "metadata": { "id": "Kc3udPT-RcXc" diff --git a/units/en/unit3/deep-q-algorithm.mdx b/units/en/unit3/deep-q-algorithm.mdx index 63b8780..a0b15cb 100644 --- a/units/en/unit3/deep-q-algorithm.mdx +++ b/units/en/unit3/deep-q-algorithm.mdx @@ -6,7 +6,7 @@ The difference is that, during the training phase, instead of updating the Q-val Q Loss -in Deep Q-Learning, we create a **loss function that compares our Q-value prediction and the Q-target and uses Gradient Descent to update the weights of our Deep Q-Network to approximate our Q-values better**. +in Deep Q-Learning, we create a **loss function that compares our Q-value prediction and the Q-target and uses gradient descent to update the weights of our Deep Q-Network to approximate our Q-values better**. Q-target @@ -35,7 +35,7 @@ Experience Replay in Deep Q-Learning has two functions: 1. **Make more efficient use of the experiences during the training**. Usually, in online reinforcement learning, the agent interacts in the environment, gets experiences (state, action, reward, and next state), learns from them (updates the neural network), and discards them. This is not efficient -Experience replay helps **using the experiences of the training more efficiently**. 
We use a replay buffer that saves experience samples **that we can reuse during the training.** +Experience replay helps **using the experiences of the training more efficiently**. We use a replay buffer that saves experience samples **that we can reuse during the training.** Experience Replay ⇒ This allows the agent to **learn from the same experiences multiple times**. @@ -59,9 +59,9 @@ But we **don’t have any idea of the real TD target**. We need to estimate it Q-target -However, the problem is that we are using the same parameters (weights) for estimating the TD target **and** the Q value. Consequently, there is a significant correlation between the TD target and the parameters we are changing. +However, the problem is that we are using the same parameters (weights) for estimating the TD target **and** the Q-value. Consequently, there is a significant correlation between the TD target and the parameters we are changing. -Therefore, it means that at every step of training, **our Q values shift but also the target value shifts.** We’re getting closer to our target, but the target is also moving. It’s like chasing a moving target! This can lead to a significant oscillation in training. +Therefore, it means that at every step of training, **our Q-values shift but also the target value shifts.** We’re getting closer to our target, but the target is also moving. It’s like chasing a moving target! This can lead to a significant oscillation in training. It’s like if you were a cowboy (the Q estimation) and you want to catch the cow (the Q-target). Your goal is to get closer (reduce the error). 
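One common remedy for the moving-target problem described above is to compute the TD target from a separate copy of the network that is only synchronized every few updates. The sketch below is a toy illustration in plain Python, not the course's or Stable-Baselines3's implementation: the Q-"networks" are dictionaries, and `td_target`, `SYNC_EVERY`, `LEARNING_RATE`, and the sample transition are hypothetical names and values chosen for this example.

```python
import copy


def td_target(reward, next_q_values, done, gamma=0.99):
    """Return r + gamma * max_a' Q(s', a'), or just r when the episode ended."""
    if done:
        return reward
    return reward + gamma * max(next_q_values)


# Toy stand-ins for networks: state -> list of Q-values, one per action.
online_q = {"s0": [0.0, 0.0], "s1": [1.0, 2.0]}
target_q = copy.deepcopy(online_q)  # frozen copy used only to build targets

SYNC_EVERY = 3       # hypothetical: copy online -> target every 3 updates
LEARNING_RATE = 0.5  # hypothetical step size

for step in range(1, 7):
    # Hypothetical transition (state, action, reward, next_state, done).
    state, action, reward, next_state, done = "s0", 1, 1.0, "s1", False

    # The target is read from the frozen copy, so it stays put
    # while the online estimates are being updated.
    y = td_target(reward, target_q[next_state], done)
    online_q[state][action] += LEARNING_RATE * (y - online_q[state][action])

    # Periodically move the target by copying the online estimates.
    if step % SYNC_EVERY == 0:
        target_q = copy.deepcopy(online_q)
```

Between two synchronizations the cow stands still: the online estimate for `("s0", 1)` moves toward a target that does not change until the next copy.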
@@ -88,16 +88,18 @@ Double DQNs, or Double Learning, were introduced [by Hado van Hasselt](https:// To understand this problem, remember how we calculate the TD Target: +TD target + We face a simple problem by calculating the TD target: how are we sure that **the best action for the next state is the action with the highest Q-value?** -We know that the accuracy of Q values depends on what action we tried **and** what neighboring states we explored. +We know that the accuracy of Q-values depends on what action we tried **and** what neighboring states we explored. -Consequently, we don’t have enough information about the best action to take at the beginning of the training. Therefore, taking the maximum Q value (which is noisy) as the best action to take can lead to false positives. If non-optimal actions are regularly **given a higher Q value than the optimal best action, the learning will be complicated.** +Consequently, we don’t have enough information about the best action to take at the beginning of the training. Therefore, taking the maximum Q-value (which is noisy) as the best action to take can lead to false positives. If non-optimal actions are regularly **given a higher Q-value than the optimal best action, the learning will be complicated.** -The solution is: when we compute the Q target, we use two networks to decouple the action selection from the target Q value generation. We: -- Use our **DQN network** to select the best action to take for the next state (the action with the highest Q value). -- Use our **Target network** to calculate the target Q value of taking that action at the next state. +The solution is: when we compute the Q-target, we use two networks to decouple the action selection from the target Q-value generation. We: +- Use our **DQN network** to select the best action to take for the next state (the action with the highest Q-value). +- Use our **Target network** to calculate the target Q-value of taking that action at the next state. 
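The two steps listed above can be sketched in a few lines of plain Python. This is a minimal illustration under stated assumptions, not the Stable-Baselines3 code: the function names `vanilla_dqn_target` and `double_dqn_target` and the Q-value lists are hypothetical, chosen so that the two targets visibly differ.

```python
def vanilla_dqn_target(reward, target_next_q, done, gamma=0.99):
    """Vanilla target: the target network both selects and evaluates the action."""
    if done:
        return reward
    return reward + gamma * max(target_next_q)


def double_dqn_target(reward, online_next_q, target_next_q, done, gamma=0.99):
    """Double DQN target: the online network selects, the target network evaluates."""
    if done:
        return reward
    # Step 1: the online (DQN) network picks the best next action.
    best_action = max(range(len(online_next_q)), key=lambda a: online_next_q[a])
    # Step 2: the target network provides the Q-value of that action.
    return reward + gamma * target_next_q[best_action]


# Hypothetical Q-values for the next state, one entry per action.
online_next_q = [1.0, 3.0, 2.0]   # online network prefers action 1
target_next_q = [0.5, 1.0, 4.0]   # target network has a noisy high value on action 2

vanilla = vanilla_dqn_target(1.0, target_next_q, done=False)
double = double_dqn_target(1.0, online_next_q, target_next_q, done=False)
```

Because the evaluated action is not necessarily the one with the largest target-network value, the Double DQN target can never exceed the vanilla one for the same inputs, which is how it dampens overestimation.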
-Therefore, Double DQN helps us reduce the overestimation of q values and, as a consequence, helps us train faster and have more stable learning. +Therefore, Double DQN helps us reduce the overestimation of Q-values and, as a consequence, helps us train faster and have more stable learning. Since these three improvements in Deep Q-Learning, many have been added such as Prioritized Experience Replay, Dueling Deep Q-Learning. They’re out of the scope of this course but if you’re interested, check the links we put in the reading list. diff --git a/units/en/unit3/deep-q-network.mdx b/units/en/unit3/deep-q-network.mdx index cb3d616..75c66d3 100644 --- a/units/en/unit3/deep-q-network.mdx +++ b/units/en/unit3/deep-q-network.mdx @@ -9,7 +9,7 @@ When the Neural Network is initialized, **the Q-value estimation is terrible**. ## Preprocessing the input and temporal limitation [[preprocessing]] -We mentioned that we preprocess the input. It’s an essential step since we want to **reduce the complexity of our state to reduce the computation time needed for training**. +We need to **preprocess the input**. It’s an essential step since we want to **reduce the complexity of our state to reduce the computation time needed for training**. To achieve this, we **reduce the state space to 84x84 and grayscale it**. We can do this since the colors in Atari environments don't add important information. This is an essential saving since we **reduce our three color channels (RGB) to 1**. @@ -32,6 +32,8 @@ That’s why, to capture temporal information, we stack four frames together. Then, the stacked frames are processed by three convolutional layers. These layers **allow us to capture and exploit spatial relationships in images**. But also, because frames are stacked together, **you can exploit some spatial properties across those frames**. +If you don't know what are convolutional layers, don't worry. 
You can check the [Lesson 4 of this free Deep Reinforcement Learning Course by Udacity](https://www.udacity.com/course/deep-learning-pytorch--ud188) + Finally, we have a couple of fully connected layers that output a Q-value for each possible action at that state. Deep Q Network diff --git a/units/en/unit3/from-q-to-dqn.mdx b/units/en/unit3/from-q-to-dqn.mdx index 13df4d1..d4f77a8 100644 --- a/units/en/unit3/from-q-to-dqn.mdx +++ b/units/en/unit3/from-q-to-dqn.mdx @@ -4,7 +4,6 @@ We learned that **Q-Learning is an algorithm we use to train our Q-Function**,
Q-function -
Given a state and action, our Q Function outputs a state-action value (also called Q-value)
The **Q comes from "the Quality" of that action at that state.** @@ -19,7 +18,7 @@ Q-Learning worked well with small state space environments like: But think of what we're going to do today: we will train an agent to learn to play Space Invaders a more complex game, using the frames as input. -As **[Nikita Melkozerov mentioned](https://twitter.com/meln1k), Atari environments** have an observation space with a shape of (210, 160, 3)*, containing values ranging from 0 to 255 so that gives us 256^(210x160x3) = 256^100800 (for comparison, we have approximately 10^80 atoms in the observable universe). +As **[Nikita Melkozerov mentioned](https://twitter.com/meln1k), Atari environments** have an observation space with a shape of (210, 160, 3)*, containing values ranging from 0 to 255 so that gives us \\(256^{210x160x3} = 256^{100800}\\) (for comparison, we have approximately \\(10^{80}\\) atoms in the observable universe). * A single frame in Atari is composed of an image of 210x160 pixels. Given the images are in color (RGB), there are 3 channels. This is why the shape is (210, 160, 3). For each pixel, the value can go from 0 to 255. diff --git a/units/en/unit3/hands-on.mdx b/units/en/unit3/hands-on.mdx index 4b73137..26c7ad0 100644 --- a/units/en/unit3/hands-on.mdx +++ b/units/en/unit3/hands-on.mdx @@ -1,5 +1,14 @@ # Hands-on [[hands-on]] + + + + + Now that you've studied the theory behind Deep Q-Learning, **you’re ready to train your Deep Q-Learning agent to play Atari Games**. We'll start with Space Invaders, but you'll be able to use any Atari game you want 🔥 Environments @@ -8,6 +17,304 @@ Now that you've studied the theory behind Deep Q-Learning, **you’re ready to t We're using the [RL-Baselines-3 Zoo integration](https://github.com/DLR-RM/rl-baselines3-zoo), a vanilla version of Deep Q-Learning with no extensions such as Double-DQN, Dueling-DQN, and Prioritized Experience Replay. 
+To validate this hands-on for the certification process, you need to push your trained model to the Hub and **get a result of >= 500**. + +To find your result, go to the leaderboard and find your model, **the result = mean_reward - std of reward** + +For more information about the certification process, check this section 👉 https://huggingface.co/deep-rl-course/en/unit0/introduction#certification-process + **To start the hands-on click on Open In Colab button** 👇 : -[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)]() +[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/huggingface/deep-rl-class/blob/master/notebooks/unit3/unit3.ipynb) + + +# Unit 3: Deep Q-Learning with Atari Games 👾 using RL Baselines3 Zoo + +Unit 3 Thumbnail + +In this notebook, **you'll train a Deep Q-Learning agent** playing Space Invaders using [RL Baselines3 Zoo](https://github.com/DLR-RM/rl-baselines3-zoo), a training framework based on [Stable-Baselines3](https://stable-baselines3.readthedocs.io/en/master/) that provides scripts for training, evaluating agents, tuning hyperparameters, plotting results and recording videos. + +We're using the [RL-Baselines-3 Zoo integration, a vanilla version of Deep Q-Learning](https://stable-baselines3.readthedocs.io/en/master/modules/dqn.html) with no extensions such as Double-DQN, Dueling-DQN, and Prioritized Experience Replay. + +⬇️ Here is an example of what **you will achieve** ⬇️ + +```python +%%html + +``` + +### 🎮 Environments: + +- SpacesInvadersNoFrameskip-v4 + +### 📚 RL-Library: + +- [RL-Baselines3-Zoo](https://github.com/DLR-RM/rl-baselines3-zoo) + +## Objectives 🏆 + +At the end of the notebook, you will: + +- Be able to understand deeper **how RL Baselines3 Zoo works**. +- Be able to **push your trained agent and the code to the Hub** with a nice video replay and an evaluation score 🔥. 
+ + +## Prerequisites 🏗️ +Before diving into the notebook, you need to: + +🔲 📚 **[Study Deep Q-Learning by reading Unit 3](https://huggingface.co/deep-rl-course/unit3/introduction)** 🤗 + +We're constantly trying to improve our tutorials, so **if you find some issues in this notebook**, please [open an issue on the Github Repo](https://github.com/huggingface/deep-rl-class/issues). + +# Let's train a Deep Q-Learning agent playing Atari' Space Invaders 👾 and upload it to the Hub. + +To validate this hands-on for the certification process, you need to push your trained model to the Hub and **get a result of >= 500**. + +To find your result, go to the leaderboard and find your model, **the result = mean_reward - std of reward** + +For more information about the certification process, check this section 👉 https://huggingface.co/deep-rl-course/en/unit0/introduction#certification-process + +## Set the GPU 💪 +- To **accelerate the agent's training, we'll use a GPU**. To do that, go to `Runtime > Change Runtime type` + +GPU Step 1 + +- `Hardware Accelerator > GPU` + +GPU Step 2 + +## Create a virtual display 🔽 + +During the notebook, we'll need to generate a replay video. To do so, with colab, **we need to have a virtual screen to be able to render the environment** (and thus record the frames). 
+ +Hence the following cell will install the libraries and create and run a virtual screen 🖥 + +```bash +apt install python-opengl +apt install ffmpeg +apt install xvfb +pip3 install pyvirtualdisplay +``` + +```bash +apt-get install swig cmake freeglut3-dev +``` + +```bash +pip install pyglet==1.5.1 +``` + +```python +# Virtual display +from pyvirtualdisplay import Display + +virtual_display = Display(visible=0, size=(1400, 900)) +virtual_display.start() +``` + +## Clone RL-Baselines3 Zoo Repo 📚 +You can now directly install from python package `pip install rl_zoo3` but since we want **the full installation with extra environments and dependencies** we're going to clone `RL-Baselines3-Zoo` repository and install from source. + +```bash +git clone https://github.com/DLR-RM/rl-baselines3-zoo +``` + +## Install dependencies 🔽 +We can now install the dependencies RL-Baselines3 Zoo needs (this can take 5min ⏲) + +But we'll also install: +- `huggingface_sb3`: Additional code for Stable-baselines3 to load and upload models from the Hugging Face 🤗 Hub. + +```bash +cd /content/rl-baselines3-zoo/ +``` + +```bash +pip install -r requirements.txt +``` + +```bash +pip install huggingface_sb3 +``` + +## Train our Deep Q-Learning Agent to Play Space Invaders 👾 + +To train an agent with RL-Baselines3-Zoo, we just need to do two things: +1. We define the hyperparameters in `rl-baselines3-zoo/hyperparams/dqn.yml` + +DQN Hyperparameters + + +Here we see that: +- We use the `Atari Wrapper` that preprocess the input (Frame reduction ,grayscale, stack 4 frames) +- We use `CnnPolicy`, since we use Convolutional layers to process the frames +- We train it for 10 million `n_timesteps` +- Memory (Experience Replay) size is 100000, aka the amount of experience steps you saved to train again your agent with. + +💡 My advice is to **reduce the training timesteps to 1M,** which will take about 90 minutes on a P100. `!nvidia-smi` will tell you what GPU you're using. 
At 10 million steps, this will take about 9 hours, which could likely result in Colab timing out. I recommend running this on your local computer (or somewhere else). Just click on: `File>Download`. + +In terms of hyperparameters optimization, my advice is to focus on these 3 hyperparameters: +- `learning_rate` +- `buffer_size (Experience Memory size)` +- `batch_size` + +As a good practice, you need to **check the documentation to understand what each hyperparameters does**: https://stable-baselines3.readthedocs.io/en/master/modules/dqn.html#parameters + + + +2. We run `train.py` and save the models on `logs` folder 📁 + +```bash +python train.py --algo ________ --env SpaceInvadersNoFrameskip-v4 -f _________ +``` + +#### Solution + +```bash +python train.py --algo dqn --env SpaceInvadersNoFrameskip-v4 -f logs/ +``` + +## Let's evaluate our agent 👀 +- RL-Baselines3-Zoo provides `enjoy.py` to evaluate our agent. +- Let's evaluate it for 5000 timesteps 🔥 + +```bash +python enjoy.py --algo dqn --env SpaceInvadersNoFrameskip-v4 --no-render --n-timesteps _________ --folder logs/ +``` + +#### Solution + +```bash +python enjoy.py --algo dqn --env SpaceInvadersNoFrameskip-v4 --no-render --n-timesteps 5000 --folder logs/ +``` + +## Publish our trained model on the Hub 🚀 +Now that we saw we got good results after the training, we can publish our trained model on the hub 🤗 with one line of code. + +Space Invaders model + +By using `rl_zoo3.push_to_hub.py` **you evaluate, record a replay, generate a model card of your agent and push it to the hub**. 
+ +This way: +- You can **showcase your work** 🔥 +- You can **visualize your agent playing** 👀 +- You can **share with the community an agent that others can use** 💾 +- You can **access a leaderboard 🏆 to see how well your agent is performing compared to your classmates** 👉 https://huggingface.co/spaces/chrisjay/Deep-Reinforcement-Learning-Leaderboard + +To be able to share your model with the community there are three more steps to follow: + +1️⃣ (If it's not already done) create an account to HF ➡ https://huggingface.co/join + +2️⃣ Sign in and then, you need to store your authentication token from the Hugging Face website. +- Create a new token (https://huggingface.co/settings/tokens) **with write role** + +Create HF Token + +- Copy the token +- Run the cell below and paste the token + +```python +from huggingface_hub import notebook_login # To log to our Hugging Face account to be able to upload models to the Hub. +notebook_login() +!git config --global credential.helper store +``` + +If you don't want to use a Google Colab or a Jupyter Notebook, you need to use this command instead: `huggingface-cli login` + +3️⃣ We're now ready to push our trained agent to the 🤗 Hub 🔥 + +Let's run push_to_hub.py file to upload our trained agent to the Hub. + +`--repo-name `: The name of the repo + +`-orga`: Your Hugging Face username + +Select Id + +```bash +python -m rl_zoo3.push_to_hub --algo dqn --env SpaceInvadersNoFrameskip-v4 --repo-name _____________________ -orga _____________________ -f logs/ +``` + +#### Solution + +```bash +python -m rl_zoo3.push_to_hub --algo dqn --env SpaceInvadersNoFrameskip-v4 --repo-name dqn-SpaceInvadersNoFrameskip-v4 -orga ThomasSimonini -f logs/ +``` + +Congrats 🥳 you've just trained and uploaded your first Deep Q-Learning agent using RL-Baselines-3 Zoo. The script above should have displayed a link to a model repository such as https://huggingface.co/ThomasSimonini/dqn-SpaceInvadersNoFrameskip-v4. 
When you go to this link, you can: + +- See a **video preview of your agent** on the right. +- Click "Files and versions" to see all the files in the repository. +- Click "Use in stable-baselines3" to get a code snippet that shows how to load the model. +- Read the model card (`README.md` file), which gives a description of the model and the hyperparameters you used. + +Under the hood, the Hub uses git-based repositories (don't worry if you don't know what git is), which means you can update the model with new versions as you experiment and improve your agent. + +**Compare the results of your agents with your classmates** using the [leaderboard](https://huggingface.co/spaces/huggingface-projects/Deep-Reinforcement-Learning-Leaderboard) 🏆 + +## Load a powerful trained model 🔥 + +- The Stable-Baselines3 team uploaded **more than 150 trained Deep Reinforcement Learning agents on the Hub**. + +You can find them here: 👉 https://huggingface.co/sb3 + +Some examples: +- Asteroids: https://huggingface.co/sb3/dqn-AsteroidsNoFrameskip-v4 +- Beam Rider: https://huggingface.co/sb3/dqn-BeamRiderNoFrameskip-v4 +- Breakout: https://huggingface.co/sb3/dqn-BreakoutNoFrameskip-v4 +- Road Runner: https://huggingface.co/sb3/dqn-RoadRunnerNoFrameskip-v4 + +Let's load an agent playing Beam Rider: https://huggingface.co/sb3/dqn-BeamRiderNoFrameskip-v4 + +```python + +``` + +1. We download the model using `rl_zoo3.load_from_hub`, and place it in a new folder that we can call `rl_trained` + +```bash +# Download model and save it into the rl_trained/ folder +python -m rl_zoo3.load_from_hub --algo dqn --env BeamRiderNoFrameskip-v4 -orga sb3 -f rl_trained/ +``` + +2. Let's evaluate it for 5000 timesteps + +```bash +python enjoy.py --algo dqn --env BeamRiderNoFrameskip-v4 -n 5000 -f rl_trained/ +``` + +Why not trying to train your own **Deep Q-Learning Agent playing BeamRiderNoFrameskip-v4? 
🏆.** + +If you want to try, check https://huggingface.co/sb3/dqn-BeamRiderNoFrameskip-v4#hyperparameters **in the model card, you have the hyperparameters of the trained agent.** + +But finding hyperparameters can be a daunting task. Fortunately, we'll see in the next Unit, how we can **use Optuna for optimizing the Hyperparameters 🔥.** + + +## Some additional challenges 🏆 + +The best way to learn **is to try things on your own**! + +In the [Leaderboard](https://huggingface.co/spaces/chrisjay/Deep-Reinforcement-Learning-Leaderboard) you will find your agents. Can you get to the top? + +Here's a list of environments you can try to train your agent with: +- BeamRiderNoFrameskip-v4 +- BreakoutNoFrameskip-v4 +- EnduroNoFrameskip-v4 +- PongNoFrameskip-v4 + +Also, **if you want to learn to implement Deep Q-Learning by yourself**, you definitely should look at the CleanRL implementation: https://github.com/vwxyzjn/cleanrl/blob/master/cleanrl/dqn_atari.py + +Environments + +________________________________________________________________________ +Congrats on finishing this chapter! + +If you still feel confused by all these elements... it's totally normal! **This was the same for me and for all people who studied RL.** + +Take time to really **grasp the material before continuing and try the additional challenges**. It’s important to master these elements and have a solid foundation. + +In the next unit, **we’re going to learn about [Optuna](https://optuna.org/)**. One of the most critical tasks in Deep Reinforcement Learning is to find a good set of training hyperparameters. And Optuna is a library that helps you to automate the search. + +See you on Bonus unit 2! 
🔥 + +### Keep Learning, Stay Awesome 🤗 diff --git a/units/en/unit3/introduction.mdx b/units/en/unit3/introduction.mdx index 80118f2..de75540 100644 --- a/units/en/unit3/introduction.mdx +++ b/units/en/unit3/introduction.mdx @@ -6,7 +6,7 @@ In the last unit, we learned our first reinforcement learning algorithm: Q-Learning, **implemented it from scratch**, and trained it in two environments, FrozenLake-v1 ☃️ and Taxi-v3 🚕. -We got excellent results with this simple algorithm, but these environments were relatively simple because the **state space was discrete and small** (14 different states for FrozenLake-v1 and 500 for Taxi-v3). +We got excellent results with this simple algorithm, but these environments were relatively simple because the **state space was discrete and small** (16 different states for FrozenLake-v1 and 500 for Taxi-v3). For comparison, the state space in Atari games can **contain \\(10^{9}\\) to \\(10^{11}\\) states**. But as we'll see, producing and updating a **Q-table can become ineffective in large state space environments.** diff --git a/units/en/unit3/quiz.mdx b/units/en/unit3/quiz.mdx index d042756..841ee4d 100644 --- a/units/en/unit3/quiz.mdx +++ b/units/en/unit3/quiz.mdx @@ -76,7 +76,7 @@ For instance, in pong, our agent **will be unable to know the ball direction if **1. Make more efficient use of the experiences during the training** -Usually, in online reinforcement learning, we interact in the environment, get experiences (state, action, reward, and next state), learn from them (update the neural network) and discard them. +Usually, in online reinforcement learning, the agent interacts with the environment, gets experiences (state, action, reward, and next state), learns from them (updates the neural network), and discards them. This is not efficient. But with experience replay, **we create a replay buffer that saves experience samples that we can reuse during the training**. **2. 
Avoid forgetting previous experiences and reduce the correlation between experiences** diff --git a/units/en/unitbonus2/optuna.mdx b/units/en/unitbonus2/optuna.mdx index d01d8cc..ec378de 100644 --- a/units/en/unitbonus2/optuna.mdx +++ b/units/en/unitbonus2/optuna.mdx @@ -4,9 +4,12 @@ The content below comes from [Antonin's Raffin ICRA 2022 presentations](https:// ## The theory behind Hyperparameter tuning + ## Optuna Tutorial + -The notebook 👉 https://colab.research.google.com/github/araffin/tools-for-robotic-rl-icra2022/blob/main/notebooks/optuna_lab.ipynb + +The notebook 👉 [here](https://colab.research.google.com/github/araffin/tools-for-robotic-rl-icra2022/blob/main/notebooks/optuna_lab.ipynb) From e442f0832be7d5eb83f1540bcf112f1323bc796e Mon Sep 17 00:00:00 2001 From: simoninithomas Date: Fri, 16 Dec 2022 10:02:52 +0100 Subject: [PATCH 05/12] Update Unit 3 --- notebooks/unit3/unit3.ipynb | 13 +------------ 1 file changed, 1 insertion(+), 12 deletions(-) diff --git a/notebooks/unit3/unit3.ipynb b/notebooks/unit3/unit3.ipynb index 7df2003..e0e5281 100644 --- a/notebooks/unit3/unit3.ipynb +++ b/notebooks/unit3/unit3.ipynb @@ -1,15 +1,5 @@ { "cells": [ - { - "cell_type": "markdown", - "metadata": { - "id": "view-in-github", - "colab_type": "text" - }, - "source": [ - "\"Open" - ] - }, { "cell_type": "markdown", "metadata": { @@ -744,8 +734,7 @@ "metadata": { "colab": { "private_outputs": true, - "provenance": [], - "include_colab_link": true + "provenance": [] }, "kernelspec": { "display_name": "Python 3 (ipykernel)", From 1abf623feb5d1f57f4244d2bebdf17386b50d9c7 Mon Sep 17 00:00:00 2001 From: Thomas Simonini Date: Fri, 16 Dec 2022 10:05:42 +0100 Subject: [PATCH 06/12] Update notebook --- notebooks/unit3/unit3.ipynb | 13 ++++++++++++- 1 file changed, 12 insertions(+), 1 deletion(-) diff --git a/notebooks/unit3/unit3.ipynb b/notebooks/unit3/unit3.ipynb index e0e5281..7df2003 100644 --- a/notebooks/unit3/unit3.ipynb +++ b/notebooks/unit3/unit3.ipynb @@ -1,5 +1,15 @@ 
{ "cells": [ + { + "cell_type": "markdown", + "metadata": { + "id": "view-in-github", + "colab_type": "text" + }, + "source": [ + "\"Open" + ] + }, { "cell_type": "markdown", "metadata": { @@ -734,7 +744,8 @@ "metadata": { "colab": { "private_outputs": true, - "provenance": [] + "provenance": [], + "include_colab_link": true }, "kernelspec": { "display_name": "Python 3 (ipykernel)", From d500baac63d16d4df1b6885907535ee6920a00a0 Mon Sep 17 00:00:00 2001 From: Thomas Simonini Date: Mon, 19 Dec 2022 12:18:26 +0100 Subject: [PATCH 07/12] Apply suggestions from code review Co-authored-by: Omar Sanseviero --- units/en/unit3/hands-on.mdx | 37 ++++++++++++++++++------------------- 1 file changed, 18 insertions(+), 19 deletions(-) diff --git a/units/en/unit3/hands-on.mdx b/units/en/unit3/hands-on.mdx index 26c7ad0..90108ce 100644 --- a/units/en/unit3/hands-on.mdx +++ b/units/en/unit3/hands-on.mdx @@ -4,7 +4,7 @@ @@ -14,7 +14,7 @@ Now that you've studied the theory behind Deep Q-Learning, **you’re ready to t Environments -We're using the [RL-Baselines-3 Zoo integration](https://github.com/DLR-RM/rl-baselines3-zoo), a vanilla version of Deep Q-Learning with no extensions such as Double-DQN, Dueling-DQN, and Prioritized Experience Replay. +We're using the [RL-Baselines-3 Zoo integration](https://github.com/DLR-RM/rl-baselines3-zoo), a vanilla version of Deep Q-Learning with no extensions such as Double-DQN, Dueling-DQN, or Prioritized Experience Replay. To validate this hands-on for the certification process, you need to push your trained model to the Hub and **get a result of >= 500**. @@ -113,7 +113,7 @@ virtual_display.start() ``` ## Clone RL-Baselines3 Zoo Repo 📚 -You can now directly install from python package `pip install rl_zoo3` but since we want **the full installation with extra environments and dependencies** we're going to clone `RL-Baselines3-Zoo` repository and install from source. 
+You could directly install from the Python package (`pip install rl_zoo3`), but since we want **the full installation with extra environments and dependencies**, we're going to clone the `RL-Baselines3-Zoo` repository and install from source. ```bash git clone https://github.com/DLR-RM/rl-baselines3-zoo @@ -146,10 +146,10 @@ To train an agent with RL-Baselines3-Zoo, we just need to do two things: Here we see that: -- We use the `Atari Wrapper` that preprocess the input (Frame reduction ,grayscale, stack 4 frames) -- We use `CnnPolicy`, since we use Convolutional layers to process the frames -- We train it for 10 million `n_timesteps` -- Memory (Experience Replay) size is 100000, aka the amount of experience steps you saved to train again your agent with. +- We use the `Atari Wrapper` that does the pre-processing (frame reduction, grayscale, stack four frames). +- We use `CnnPolicy`, since we use Convolutional layers to process the frames. +- We train the model for 10 million `n_timesteps`. +- Memory (Experience Replay) size is 100000, i.e., the number of experience steps saved so the agent can be trained on them again. 💡 My advice is to **reduce the training timesteps to 1M,** which will take about 90 minutes on a P100. `!nvidia-smi` will tell you what GPU you're using. At 10 million steps, this will take about 9 hours, which could likely result in Colab timing out. I recommend running this on your local computer (or somewhere else). Just click on: `File>Download`. @@ -189,11 +189,11 @@ python enjoy.py --algo dqn --env SpaceInvadersNoFrameskip-v4 --no-render --n ``` ## Publish our trained model on the Hub 🚀 -Now that we saw we got good results after the training, we can publish our trained model on the hub 🤗 with one line of code. +Now that we saw we got good results after the training, we can publish our trained model on the Hub with one line of code. 
Space Invaders model -By using `rl_zoo3.push_to_hub.py` **you evaluate, record a replay, generate a model card of your agent and push it to the hub**. +By using `rl_zoo3.push_to_hub.py`, **you evaluate, record a replay, generate a model card of your agent, and push it to the Hub**. This way: - You can **showcase our work** 🔥 @@ -201,9 +201,9 @@ This way: - You can **share with the community an agent that others can use** 💾 - You can **access a leaderboard 🏆 to see how well your agent is performing compared to your classmates** 👉 https://huggingface.co/spaces/chrisjay/Deep-Reinforcement-Learning-Leaderboard -To be able to share your model with the community there are three more steps to follow: +To be able to share your model with the community, there are three more steps to follow: -1️⃣ (If it's not already done) create an account to HF ➡ https://huggingface.co/join +1️⃣ (If it's not already done) create an account in HF ➡ https://huggingface.co/join 2️⃣ Sign in and then, you need to store your authentication token from the Hugging Face website. - Create a new token (https://huggingface.co/settings/tokens) **with write role** @@ -221,13 +221,12 @@ git config --global credential.helper store If you don't want to use a Google Colab or a Jupyter Notebook, you need to use this command instead: `huggingface-cli login` -3️⃣ We're now ready to push our trained agent to the 🤗 Hub 🔥 +3️⃣ We're now ready to push our trained agent to the Hub 🔥 -Let's run push_to_hub.py file to upload our trained agent to the Hub. +Let's run `push_to_hub.py` file to upload our trained agent to the Hub. 
 There are two important parameters:
-`--repo-name `: The name of the repo
-
-`-orga`: Your Hugging Face username
+* `--repo-name `: The name of the repo
+* `-orga`: Your Hugging Face username
 Select Id
@@ -254,7 +253,7 @@ Under the hood, the Hub uses git-based repositories (don't worry if you don't kn
 ## Load a powerful trained model 🔥
-- The Stable-Baselines3 team uploaded **more than 150 trained Deep Reinforcement Learning agents on the Hub**.
+The Stable-Baselines3 team uploaded **more than 150 trained Deep Reinforcement Learning agents on the Hub**.
 You can download them and use them to see how they perform! You can find them here: 👉 https://huggingface.co/sb3
@@ -285,9 +284,9 @@ python enjoy.py --algo dqn --env BeamRiderNoFrameskip-v4 -n 5000 -f rl_trained/
 Why not trying to train your own **Deep Q-Learning Agent playing BeamRiderNoFrameskip-v4? 🏆.**
-If you want to try, check https://huggingface.co/sb3/dqn-BeamRiderNoFrameskip-v4#hyperparameters **in the model card, you have the hyperparameters of the trained agent.**
+If you want to try, check https://huggingface.co/sb3/dqn-BeamRiderNoFrameskip-v4#hyperparameters. There, **in the model card, you have the hyperparameters of the trained agent.**
-But finding hyperparameters can be a daunting task. Fortunately, we'll see in the next Unit, how we can **use Optuna for optimizing the Hyperparameters 🔥.**
+But finding hyperparameters can be a daunting task. Fortunately, we'll see in the next bonus Unit, how we can **use Optuna for optimizing the Hyperparameters 🔥.**
 ## Some additional challenges 🏆

From 33c02be8003a8b7d2961f8343af0809549cf9837 Mon Sep 17 00:00:00 2001
From: Thomas Simonini
Date: Mon, 19 Dec 2022 12:44:13 +0100
Subject: [PATCH 08/12] Update hands-on.mdx

---
 units/en/unit3/hands-on.mdx | 13 +++----------
 1 file changed, 3 insertions(+), 10 deletions(-)

diff --git a/units/en/unit3/hands-on.mdx b/units/en/unit3/hands-on.mdx
index 90108ce..fffa6a6 100644
--- a/units/en/unit3/hands-on.mdx
+++ b/units/en/unit3/hands-on.mdx
@@ -122,9 +122,6 @@ git clone https://github.com/DLR-RM/rl-baselines3-zoo
 ## Install dependencies 🔽
 We can now install the dependencies RL-Baselines3 Zoo needs (this can take 5min ⏲)
-But we'll also install:
-- `huggingface_sb3`: Additional code for Stable-baselines3 to load and upload models from the Hugging Face 🤗 Hub.
-
 ```bash
 cd /content/rl-baselines3-zoo/
 ```
@@ -133,10 +130,6 @@ cd /content/rl-baselines3-zoo/
 pip install -r requirements.txt
 ```
-```bash
-pip install huggingface_sb3
-```
-
 ## Train our Deep Q-Learning Agent to Play Space Invaders 👾
 To train an agent with RL-Baselines3-Zoo, we just need to do two things:
@@ -175,7 +168,7 @@ python train.py --algo dqn --env SpaceInvadersNoFrameskip-v4 -f logs/
 ```
 ## Let's evaluate our agent 👀
-- RL-Baselines3-Zoo provides `enjoy.py` to evaluate our agent.
+- RL-Baselines3-Zoo provides `enjoy.py`, a python script to evaluate our agent. In most RL libraries, we call the evaluation script `enjoy.py`.
 - Let's evaluate it for 5000 timesteps 🔥
 ```bash
@@ -199,7 +192,7 @@ This way:
 - You can **showcase our work** 🔥
 - You can **visualize your agent playing** 👀
 - You can **share with the community an agent that others can use** 💾
-- You can **access a leaderboard 🏆 to see how well your agent is performing compared to your classmates** 👉 https://huggingface.co/spaces/chrisjay/Deep-Reinforcement-Learning-Leaderboard
+- You can **access a leaderboard 🏆 to see how well your agent is performing compared to your classmates** 👉 https://huggingface.co/spaces/huggingface-projects/Deep-Reinforcement-Learning-Leaderboard
 To be able to share your model with the community, there are three more steps to follow:
@@ -293,7 +286,7 @@ But finding hyperparameters can be a daunting task. Fortunately, we'll see in th
 The best way to learn **is to try things by your own**!
-In the [Leaderboard](https://huggingface.co/spaces/chrisjay/Deep-Reinforcement-Learning-Leaderboard) you will find your agents. Can you get to the top?
+In the [Leaderboard](https://huggingface.co/spaces/huggingface-projects/Deep-Reinforcement-Learning-Leaderboard) you will find your agents. Can you get to the top?
 Here's a list of environments you can try to train your agent with:
 - BeamRiderNoFrameskip-v4

From bcbc168d6ba914583c81a627ff50c24d44dba36b Mon Sep 17 00:00:00 2001
From: Thomas Simonini
Date: Mon, 19 Dec 2022 12:44:34 +0100
Subject: [PATCH 09/12] Update colab

---
 notebooks/unit3/unit3.ipynb | 23 ++++-------------------
 1 file changed, 4 insertions(+), 19 deletions(-)

diff --git a/notebooks/unit3/unit3.ipynb b/notebooks/unit3/unit3.ipynb
index 7df2003..3ae8b17 100644
--- a/notebooks/unit3/unit3.ipynb
+++ b/notebooks/unit3/unit3.ipynb
@@ -251,10 +251,7 @@
 },
 "source": [
 "## Install dependencies 🔽\n",
- "We can now install the dependencies RL-Baselines3 Zoo needs (this can take 5min ⏲)\n",
- "\n",
- "But we'll also install:\n",
- "- `huggingface_sb3`: Additional code for Stable-baselines3 to load and upload models from the Hugging Face 🤗 Hub."
+ "We can now install the dependencies RL-Baselines3 Zoo needs (this can take 5min ⏲)"
 ]
 },
 {
@@ -279,18 +276,6 @@
 "!pip install -r requirements.txt"
 ]
 },
- {
- "cell_type": "code",
- "execution_count": null,
- "metadata": {
- "id": "RLRGKFR39l9s"
- },
- "outputs": [],
- "source": [
- "%%capture\n",
- "!pip install huggingface_sb3"
- ]
- },
 {
 "cell_type": "markdown",
 "metadata": {
@@ -382,7 +367,7 @@
 },
 "source": [
 "## Let's evaluate our agent 👀\n",
- "- RL-Baselines3-Zoo provides `enjoy.py` to evaluate our agent.\n",
+ "- RL-Baselines3-Zoo provides `enjoy.py`, a python script to evaluate our agent. In most RL libraries, we call the evaluation script `enjoy.py`.\n",
 "- Let's evaluate it for 5000 timesteps 🔥"
 ]
 },
@@ -441,7 +426,7 @@
 "- You can **showcase our work** 🔥\n",
 "- You can **visualize your agent playing** 👀\n",
 "- You can **share with the community an agent that others can use** 💾\n",
- "- You can **access a leaderboard 🏆 to see how well your agent is performing compared to your classmates** 👉 https://huggingface.co/spaces/chrisjay/Deep-Reinforcement-Learning-Leaderboard"
+ "- You can **access a leaderboard 🏆 to see how well your agent is performing compared to your classmates** 👉 https://huggingface.co/spaces/huggingface-projects/Deep-Reinforcement-Learning-Leaderboard"
 ]
 },
 {
@@ -676,7 +661,7 @@
 "## Some additional challenges 🏆\n",
 "The best way to learn **is to try things by your own**!\n",
 "\n",
- "In the [Leaderboard](https://huggingface.co/spaces/chrisjay/Deep-Reinforcement-Learning-Leaderboard) you will find your agents. Can you get to the top?\n",
+ "In the [Leaderboard](https://huggingface.co/spaces/huggingface-projects/Deep-Reinforcement-Learning-Leaderboard) you will find your agents. Can you get to the top?\n",
 "\n",
 "Here's a list of environments you can try to train your agent with:\n",
 "- BeamRiderNoFrameskip-v4\n",

From 96b49481da920a4b72c93153bd7c386fe765b513 Mon Sep 17 00:00:00 2001
From: Thomas Simonini
Date: Mon, 19 Dec 2022 13:53:38 +0100
Subject: [PATCH 10/12] Update hands-on.mdx

---
 units/en/unitbonus2/hands-on.mdx | 10 +++++++++-
 1 file changed, 9 insertions(+), 1 deletion(-)

diff --git a/units/en/unitbonus2/hands-on.mdx b/units/en/unitbonus2/hands-on.mdx
index ac942a5..4130e38 100644
--- a/units/en/unitbonus2/hands-on.mdx
+++ b/units/en/unitbonus2/hands-on.mdx
@@ -1,3 +1,11 @@
 # Hands-on [[hands-on]]
-Now that you've learned to use Optuna, **why not going back to our Deep Q-Learning hands-on and implement Optuna to find the best training hyperparameters?**
+Now that you've learned to use Optuna, we give you some ideas to apply what you've learned:
+
+1️⃣ **Beat your LunarLander-v2 agent results**, by using Optuna to find a better set of hyperparameters. You can also try with another environment, such as MountainCar-v0 and CartPole-v1.
+
+2️⃣ **Beat your SpaceInvaders agent results**.
+
+By doing that, you're going to see how Optuna is valuable and powerful in training better agents.
+
+Have fun,

From 5aa9e72390092cba5697204cb50394321a25e1a9 Mon Sep 17 00:00:00 2001
From: Thomas Simonini
Date: Mon, 19 Dec 2022 14:00:37 +0100
Subject: [PATCH 11/12] Add advice about saving the colab

---
 notebooks/unit3/unit3.ipynb | 16 ++++++++++++++++
 1 file changed, 16 insertions(+)

diff --git a/notebooks/unit3/unit3.ipynb b/notebooks/unit3/unit3.ipynb
index 3ae8b17..c72e83f 100644
--- a/notebooks/unit3/unit3.ipynb
+++ b/notebooks/unit3/unit3.ipynb
@@ -134,6 +134,22 @@
 "For more information about the certification process, check this section 👉 https://huggingface.co/deep-rl-course/en/unit0/introduction#certification-process"
 ]
 },
+ {
+ "cell_type": "markdown",
+ "source": [
+ "## A word of advice 💡\n",
+ "It's better to run this colab in a copy on your Google Drive, so that **if it times out** you still have the saved notebook on your Google Drive and do not need to redo everything from scratch.\n",
+ "\n",
+ "To do that you can either do `Ctrl + S` or `File > Save a copy in Google Drive.`\n",
+ "\n",
+ "Also, we're going to **train it for 90 minutes with 1M timesteps**. Typing `!nvidia-smi` will tell you what GPU you're using.\n",
+ "\n",
+ "And if you want to train for longer, such as 10 million steps, this will take about 9 hours, potentially resulting in Colab timing out. In that case, I recommend running this on your local computer (or somewhere else). Just click on: `File>Download`. "
+ ],
+ "metadata": {
+ "id": "Nc8BnyVEc3Ys"
+ }
+ },
 {
 "cell_type": "markdown",
 "source": [

From fd710896cf229b1e2652b720ceae6ef17dff9abc Mon Sep 17 00:00:00 2001
From: Thomas Simonini
Date: Mon, 19 Dec 2022 15:05:38 +0100
Subject: [PATCH 12/12] Update hands-on.mdx

- Add cleanrl link
- Some cleanups
---
 units/en/unit3/hands-on.mdx | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/units/en/unit3/hands-on.mdx b/units/en/unit3/hands-on.mdx
index fffa6a6..e9c07cf 100644
--- a/units/en/unit3/hands-on.mdx
+++ b/units/en/unit3/hands-on.mdx
@@ -16,6 +16,7 @@ Now that you've studied the theory behind Deep Q-Learning, **you’re ready to t
 We're using the [RL-Baselines-3 Zoo integration](https://github.com/DLR-RM/rl-baselines3-zoo), a vanilla version of Deep Q-Learning with no extensions such as Double-DQN, Dueling-DQN, or Prioritized Experience Replay.
+Also, **if you want to learn to implement Deep Q-Learning by yourself after this hands-on**, you definitely should look at the CleanRL implementation: https://github.com/vwxyzjn/cleanrl/blob/master/cleanrl/dqn_atari.py
 To validate this hands-on for the certification process, you need to push your trained model to the Hub and **get a result of >= 500**.
@@ -75,6 +76,7 @@ To find your result, go to the leaderboard and find your model, **the result = m
 For more information about the certification process, check this section 👉 https://huggingface.co/deep-rl-course/en/unit0/introduction#certification-process
 ## Set the GPU 💪
+
 - To **accelerate the agent's training, we'll use a GPU**. To do that, go to `Runtime > Change Runtime type`
 GPU Step 1