# Hands-on [[hands-on]]

<CourseFloatingBanner classNames="absolute z-10 right-0 top-0"
notebooks={[
{label: "Google Colab", value: "https://colab.research.google.com/github/huggingface/deep-rl-class/blob/master/notebooks/unit2/unit2.ipynb"}
]}
askForHelpUrl="http://hf.co/join/discord" />

Now that we studied the Q-Learning algorithm, let's implement it from scratch and train our Q-Learning agent in two environments:

1. [Frozen-Lake-v1 (non-slippery and slippery version)](https://www.gymlibrary.dev/environments/toy_text/frozen_lake/) ☃️: where our agent will need to **go from the starting state (S) to the goal state (G)** by walking only on frozen tiles (F) and avoiding holes (H).
2. [An autonomous taxi](https://www.gymlibrary.dev/environments/toy_text/taxi/) 🚖: where our agent will need **to learn to navigate** a city to **transport its passengers from point A to point B.**

<img src="https://huggingface.co/datasets/huggingface-deep-rl-course/course-images/resolve/main/en/unit3/envs.gif" alt="Environments"/>

Thanks to a [leaderboard](https://huggingface.co/spaces/huggingface-projects/Deep-Reinforcement-Learning-Leaderboard), you'll be able to compare your results with other classmates and exchange best practices to improve your agent's scores. Who will win the challenge for Unit 2?

**If you don't find your model, go to the bottom of the page and click on the refresh button.**

To validate this hands-on for the [certification process](https://huggingface.co/deep-rl-course/en/unit0/introduction#certification-process), you need to push your trained Taxi model to the Hub and **get a result of >= 4.5**.

To find your result, go to the [leaderboard](https://huggingface.co/spaces/huggingface-projects/Deep-Reinforcement-Learning-Leaderboard) and find your model. **The result = mean_reward - std of reward.**

For more information about the certification process, check this section 👉 https://huggingface.co/deep-rl-course/en/unit0/introduction#certification-process

And you can check your progress here 👉 https://huggingface.co/spaces/ThomasSimonini/Check-my-progress-Deep-RL-Course

**To start the hands-on, click on the Open In Colab button** 👇:

<a href="https://colab.research.google.com/github/huggingface/deep-rl-class/blob/main/notebooks/unit2/unit2.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Unit 2: Q-Learning with FrozenLake-v1 ⛄ and Taxi-v3 🚕

<img src="https://huggingface.co/datasets/huggingface-deep-rl-course/course-images/resolve/main/en/unit3/thumbnail.jpg" alt="Unit 2 Thumbnail">

In this notebook, **you'll code your first Reinforcement Learning agent from scratch** to play FrozenLake ❄️ using Q-Learning, share it with the community, and experiment with different configurations.

⬇️ Here is an example of what **you will achieve in just a couple of minutes.** ⬇️
<img src="https://huggingface.co/datasets/huggingface-deep-rl-course/course-images/resolve/main/en/unit3/envs.gif" alt="Environments"/>

### 🎮 Environments:

- [FrozenLake-v1](https://www.gymlibrary.dev/environments/toy_text/frozen_lake/)
- [Taxi-v3](https://www.gymlibrary.dev/environments/toy_text/taxi/)

### 📚 RL-Library:

- Python and NumPy
- [Gym](https://www.gymlibrary.dev/)

At the end of the notebook, you will:

- Be able to use **Gym**, the environment library.
- Be able to code a Q-Learning agent from scratch.
- Be able to **push your trained agent and the code to the Hub** with a nice video replay and an evaluation score 🔥.

## This notebook is from the Deep Reinforcement Learning Course

<img src="https://huggingface.co/datasets/huggingface-deep-rl-course/course-images/resolve/main/en/notebooks/deep-rl-course-illustration.jpg" alt="Deep RL Course illustration"/>

In this free course, you will:

- 📖 Study Deep Reinforcement Learning in **theory and practice**.
- 🧑‍💻 Learn to **use famous Deep RL libraries** such as Stable Baselines3, RL Baselines3 Zoo, CleanRL and Sample Factory 2.0.
- 🤖 Train **agents in unique environments**.

And more! Check 📚 the syllabus 👉 https://simoninithomas.github.io/deep-rl-course

Don't forget to **<a href="http://eepurl.com/ic5ZUD">sign up to the course</a>** (we are collecting your email to be able to **send you the links when each Unit is published, and to give you information about the challenges and updates**).

The best way to keep in touch is to join our discord server to exchange with the community and with us 👉🏻 https://discord.gg/ydHrjt3WP5

## Prerequisites 🏗️

Before diving into the notebook, you need to:

🔲 📚 **Study [Q-Learning by reading Unit 2](https://huggingface.co/deep-rl-course/unit2/introduction)** 🤗

## A small recap of Q-Learning

- *Q-Learning* **is the RL algorithm that**:

- Trains a *Q-Function*, an **action-value function** encoded, in internal memory, by a *Q-table* **that contains all the state-action pair values.**

- Given a state and action, our Q-Function **will search its Q-table for the corresponding value.**

<img src="https://huggingface.co/datasets/huggingface-deep-rl-course/course-images/resolve/main/en/unit3/Q-function-2.jpg" alt="Q function" width="100%"/>

- When the training is done, **we have an optimal Q-Function, and therefore an optimal Q-Table.**

- And if we **have an optimal Q-Function**, we have an optimal policy, since we **know, for each state, the best action to take.**

<img src="https://huggingface.co/datasets/huggingface-deep-rl-course/course-images/resolve/main/en/unit3/link-value-policy.jpg" alt="Link value policy" width="100%"/>

But in the beginning, our **Q-Table is useless since it gives arbitrary values for each state-action pair (most of the time, we initialize the Q-Table to 0)**. But as we explore the environment and update our Q-Table, it will give us better and better approximations.

<img src="https://huggingface.co/datasets/huggingface-deep-rl-course/course-images/resolve/main/en/notebooks/unit2/q-learning.jpeg" alt="q-learning.jpeg" width="100%"/>
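The table lookup described above is just array indexing. A standalone toy sketch (the Q-values here are made up for illustration, assuming NumPy):

```python
import numpy as np

# A toy Q-table for an environment with 3 states and 2 actions (made-up values)
Qtable = np.array([
    [0.1, 0.5],   # state 0
    [0.0, 0.0],   # state 1
    [0.7, 0.2],   # state 2
])

state, action = 2, 0
print(Qtable[state][action])  # the value stored for the state-action pair (2, 0)
```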
# Let's code our first Reinforcement Learning algorithm 🚀

To validate this hands-on for the [certification process](https://huggingface.co/deep-rl-course/en/unit0/introduction#certification-process), you need to push your trained Taxi model to the Hub and **get a result of >= 4.5**.

To find your result, go to the [leaderboard](https://huggingface.co/spaces/huggingface-projects/Deep-Reinforcement-Learning-Leaderboard) and find your model. **The result = mean_reward - std of reward.**

For more information about the certification process, check this section 👉 https://huggingface.co/deep-rl-course/en/unit0/introduction#certification-process
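To make the scoring rule concrete, here is a minimal sketch of how the score is derived from evaluation episodes (the reward numbers below are made up for illustration, assuming NumPy):

```python
import numpy as np

episode_rewards = [7.0, 8.0, 7.5, 8.5, 7.0]  # hypothetical per-episode returns from evaluation
mean_reward = np.mean(episode_rewards)
std_reward = np.std(episode_rewards)

result = mean_reward - std_reward  # the leaderboard ranks by mean_reward - std of reward
print(f"{result:.2f}")
```

Penalizing by the standard deviation means a consistent agent outscores an erratic one with the same mean.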
## Install dependencies and create a virtual display 🔽

In the notebook, we'll need to generate a replay video. To do so, with Colab, **we need a virtual screen to render the environment** (and thus record the frames).

We'll install multiple ones:

The Hugging Face Hub 🤗 works as a central place where anyone can share and explore models and datasets. It has versioning, metrics, visualizations, and other features that will allow you to easily collaborate with others.

You can see all the Deep RL models available here (if they use Q-Learning) 👉 https://huggingface.co/models?other=q-learning

```python
!pip install -r https://raw.githubusercontent.com/huggingface/deep-rl-class/main/notebooks/unit2/requirements-unit2.txt
```

```python
%%capture
!sudo apt-get update
!apt install python-opengl ffmpeg xvfb
!pip3 install pyvirtualdisplay
```

To make sure the newly installed libraries are used, **sometimes it's required to restart the notebook runtime**. The next cell will force the **runtime to crash, so you'll need to connect again and run the code starting from here**. Thanks to this trick, **we will be able to run our virtual screen.**

```python
import os

os.kill(os.getpid(), 9)  # force the runtime to crash so the newly installed libraries are picked up
```

```python
import numpy as np
import gym
import random
import imageio
import os
import tqdm

import pickle5 as pickle
from tqdm.notebook import tqdm
```

We're now ready to code our Q-Learning algorithm 🔥

# Part 1: Frozen Lake ⛄ (non-slippery version)

## Create and understand [FrozenLake environment ⛄](https://www.gymlibrary.dev/environments/toy_text/frozen_lake/)
---

💡 A good habit when you start to use an environment is to check its documentation

👉 https://www.gymlibrary.dev/environments/toy_text/frozen_lake/

```python
print("Observation Space", env.observation_space)
print("Sample observation", env.observation_space.sample())  # Get a random observation
```

We see with `Observation Space Shape Discrete(16)` that the observation is an integer representing the **agent's current position as current_row * nrows + current_col (where both the row and col start at 0)**.

For example, the goal position in the 4x4 map can be calculated as follows: 3 * 4 + 3 = 15. The number of possible observations depends on the size of the map. **For example, the 4x4 map has 16 possible observations.**
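The mapping above is easy to verify in a few lines (`state_index` is a hypothetical helper written here only for illustration, not part of the notebook):

```python
def state_index(row, col, ncols=4):
    # current_row * ncols + current_col, with both starting at 0
    return row * ncols + col

print(state_index(0, 0))   # start state S on the 4x4 map
print(state_index(3, 3))   # goal state G on the 4x4 map -> 15
```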
It's time to initialize our Q-table! To know how many rows (states) and columns (actions) to use, we need to know the action and observation space.

```python
state_space = env.observation_space.n
print("There are ", state_space, " possible states")

action_space = env.action_space.n
print("There are ", action_space, " possible actions")
```

```python
# Let's create our Q-table of size (state_space, action_space) and initialize each value at 0 using np.zeros. np.zeros needs a tuple (a, b)

def initialize_q_table(state_space, action_space):
    Qtable = np.zeros((state_space, action_space))
    return Qtable
```
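As a quick sanity check of the table shape, assuming the `np.zeros` implementation sketched above (FrozenLake 4x4: 16 states, 4 actions):

```python
import numpy as np

def initialize_q_table(state_space, action_space):
    # One row per state, one column per action, every value starting at 0
    return np.zeros((state_space, action_space))

Qtable_frozenlake = initialize_q_table(16, 4)
print(Qtable_frozenlake.shape)  # (16, 4)
print(Qtable_frozenlake.sum())  # 0.0: every state-action value starts at 0
```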
Remember we have two policies since Q-Learning is an **off-policy** algorithm. This means we're using a **different policy for acting and updating the value function**.

- Epsilon-greedy policy (acting policy)
- Greedy-policy (updating policy)

The greedy policy will also be the final policy we'll have when the Q-Learning agent completes training. The greedy policy is used to select an action using the Q-table.

<img src="https://huggingface.co/datasets/huggingface-deep-rl-course/course-images/resolve/main/en/unit3/off-on-4.jpg" alt="Q-Learning" width="100%"/>

```python
def greedy_policy(Qtable, state):
    # Exploitation: take the action with the highest state, action value
    action = np.argmax(Qtable[state][:])

    return action
```
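One detail worth knowing: `np.argmax` returns the *first* index of the maximum, so ties are broken deterministically in favor of lower-numbered actions. A standalone check (toy Q-values; the function is repeated so the snippet runs on its own):

```python
import numpy as np

def greedy_policy(Qtable, state):
    # Exploitation: take the action with the highest state-action value
    return np.argmax(Qtable[state][:])

Qtable = np.array([[0.5, 0.5, 0.1]])  # actions 0 and 1 are tied in state 0
print(greedy_policy(Qtable, 0))       # the first maximal action wins
```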
The idea with epsilon-greedy:

- With *probability 1 - ɛ*: **we do exploitation** (i.e. our agent selects the action with the highest state-action pair value).

- With *probability ɛ*: we do **exploration** (trying a random action).

As the training continues, we progressively **reduce the epsilon value since we will need less and less exploration and more exploitation.**

<img src="https://huggingface.co/datasets/huggingface-deep-rl-course/course-images/resolve/main/en/unit3/Q-learning-4.jpg" alt="Q-Learning" width="100%"/>

```python
def epsilon_greedy_policy(Qtable, state, epsilon):
    # Randomly generate a number between 0 and 1
    random_num = random.uniform(0, 1)
    # if random_num > epsilon --> exploitation
    if random_num > epsilon:
        # Take the action with the highest value given a state
        # np.argmax can be useful here
        action = greedy_policy(Qtable, state)
    # else --> exploration
    else:
        action = env.action_space.sample()  # Take a random action

    return action
```
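To see the exploration/exploitation split in action, we can count how often the exploration branch fires over many draws (a standalone sketch mirroring the `random.uniform` check above; the 10% figure follows from choosing epsilon = 0.1):

```python
import random

random.seed(0)
epsilon = 0.1
n_draws = 100_000

# Count how many draws would land in the exploration branch (random_num <= epsilon)
explorations = sum(1 for _ in range(n_draws) if random.uniform(0, 1) <= epsilon)
print(explorations / n_draws)  # close to 0.1
```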
## Define the hyperparameters ⚙️
The exploration-related hyperparameters are some of the most important ones.

- We need to make sure that our agent **explores enough of the state space** to learn a good value approximation. To do that, we need a progressive decay of the epsilon.
- If you decrease epsilon too fast (too high a decay_rate), **you take the risk that your agent will be stuck**, since your agent didn't explore enough of the state space and hence can't solve the problem.
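The shape of this decay matters more than the exact numbers. Here is a standalone sketch of the exponential schedule used in the training loop (the `decay_rate` value is only an example, not the notebook's setting):

```python
import numpy as np

max_epsilon = 1.0    # exploration probability at the start
min_epsilon = 0.05   # floor: we never stop exploring entirely
decay_rate = 0.0005  # example value, chosen only for illustration

for episode in (0, 1_000, 5_000, 10_000):
    epsilon = min_epsilon + (max_epsilon - min_epsilon) * np.exp(-decay_rate * episode)
    print(episode, round(epsilon, 3))
```

With a larger `decay_rate`, epsilon drops toward `min_epsilon` much sooner, which is exactly the "agent stuck without enough exploration" risk described above.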
For episode in the total of training episodes:

Reduce epsilon (since we need less and less exploration)
Reset the environment

For step in max timesteps:
Choose the action At using epsilon greedy policy
Take the action (a) and observe the outcome state (s') and reward (r)
Update the Q-value Q(s,a) using the Bellman equation: Q(s,a) + lr [R(s,a) + gamma * max Q(s',a') - Q(s,a)]
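A single update step from this pseudocode, with made-up numbers (lr = 0.7 and gamma = 0.95 here are illustrative values, not the notebook's hyperparameters):

```python
lr, gamma = 0.7, 0.95   # illustrative hyperparameter values
q_sa = 0.0              # current Q(s, a)
reward = 1.0            # R(s, a) observed after taking action a
max_q_next = 0.5        # max over a' of Q(s', a')

# Q(s,a) := Q(s,a) + lr * [R(s,a) + gamma * max Q(s',a') - Q(s,a)]
q_sa = q_sa + lr * (reward + gamma * max_q_next - q_sa)
print(q_sa)  # 0 + 0.7 * (1 + 0.95 * 0.5 - 0) = 1.0325
```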

```python
def train(n_training_episodes, min_epsilon, max_epsilon, decay_rate, env, max_steps, Qtable):
    for episode in tqdm(range(n_training_episodes)):
        # Reduce epsilon (because we need less and less exploration)
        epsilon = min_epsilon + (max_epsilon - min_epsilon) * np.exp(-decay_rate * episode)
        # Reset the environment
        state = env.reset()

        # repeat
        for step in range(max_steps):
            # Choose the action At using epsilon greedy policy
            action = epsilon_greedy_policy(Qtable, state, epsilon)

            # Take action At and observe Rt+1 and St+1
            # Take the action (a) and observe the outcome state (s') and reward (r)
            new_state, reward, done, info = env.step(action)

            # Update Q(s,a) := Q(s,a) + lr [R(s,a) + gamma * max Q(s',a') - Q(s,a)]
            # (learning_rate and gamma come from the hyperparameters cell)
            Qtable[state][action] = Qtable[state][action] + learning_rate * (
                reward + gamma * np.max(Qtable[new_state]) - Qtable[state][action]
            )

            # If done, finish the episode
            if done:
                break

            # Our next state is the new state
            state = new_state
    return Qtable
```

````python
def push_to_hub(repo_id, model, env, video_fps=1, local_repo_path="hub"):
    ...  # earlier part of the function omitted in this excerpt
    metadata = {**metadata, **eval}

    model_card = f"""
# **Q-Learning** Agent playing **{env_id}**
This is a trained model of a **Q-Learning** agent playing **{env_id}**.

## Usage

```python
model = load_from_hub(repo_id="{repo_id}", filename="q-learning.pkl")

# Don't forget to check if you need to add additional attributes (is_slippery=False etc)
env = gym.make(model["env_id"])
```
"""

    evaluate_agent(env, model["max_steps"], model["n_eval_episodes"], model["qtable"], model["eval_seed"])
````
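The `q-learning.pkl` file referenced in the model card is simply a pickled Python dictionary. A minimal standalone sketch of writing and reading such a file (the dictionary keys mirror the ones used above, such as `env_id` and `qtable`; the table here is a stand-in, not a trained one):

```python
import pickle

model = {
    "env_id": "FrozenLake-v1",
    "qtable": [[0.0] * 4 for _ in range(16)],  # stand-in for the trained table
}

with open("q-learning.pkl", "wb") as f:
    pickle.dump(model, f)

with open("q-learning.pkl", "rb") as f:
    loaded = pickle.load(f)

print(loaded["env_id"])  # FrozenLake-v1
```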
By using `push_to_hub` **you evaluate, record a replay, generate a model card of your agent, and push it to the Hub**.

This way:
- You can **showcase your work** 🔥
- You can **visualize your agent playing** 👀
- You can **share an agent with the community that others can use** 💾
- You can **access a leaderboard 🏆 to see how well your agent is performing compared to your classmates** 👉 https://huggingface.co/spaces/huggingface-projects/Deep-Reinforcement-Learning-Leaderboard

```python
repo_name = "q-FrozenLake-v1-4x4-noSlippery"
push_to_hub(repo_id=f"{username}/{repo_name}", model=model, env=env)
```

Congrats 🥳 you've just implemented from scratch, trained, and uploaded your first Reinforcement Learning agent.
FrozenLake-v1 no_slippery is a very simple environment, let's try a harder one 🔥.

# Part 2: Taxi-v3 🚖

## Create and understand [Taxi-v3 🚕](https://www.gymlibrary.dev/environments/toy_text/taxi/)
---

💡 A good habit when you start to use an environment is to check its documentation

👉 https://www.gymlibrary.dev/environments/toy_text/taxi/

---

In `Taxi-v3` 🚕, there are four designated locations in the grid world indicated by R(ed), G(reen), Y(ellow), and B(lue).

When the episode starts, **the taxi starts off at a random square** and the passenger is at a random location. The taxi drives to the passenger's location, **picks up the passenger**, drives to the passenger's destination (another one of the four specified locations), and then **drops off the passenger**. Once the passenger is dropped off, the episode ends.
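This description pins down the size of the state space: 25 taxi positions, 5 passenger locations (the four colored squares, or inside the taxi), and 4 destinations give 500 discrete states. A quick back-of-the-envelope check:

```python
taxi_positions = 5 * 5       # the 5x5 grid
passenger_locations = 4 + 1  # R, G, Y, B, or already in the taxi
destinations = 4             # R, G, Y, B

n_states = taxi_positions * passenger_locations * destinations
print(n_states)  # 500 discrete states in Taxi-v3
```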

```python
repo_name = ""
push_to_hub(repo_id=f"{username}/{repo_name}", model=model, env=env)
```

Now that it's on the Hub, you can compare the results of your Taxi-v3 with your classmates using the leaderboard 🏆 👉 https://huggingface.co/spaces/huggingface-projects/Deep-Reinforcement-Learning-Leaderboard

⚠ To see your entry, you need to go to the bottom of the leaderboard page and **click on refresh** ⚠

```python
evaluate_agent(env, model["max_steps"], model["n_eval_episodes"], model["qtable"], model["eval_seed"])
```

## Some additional challenges 🏆
The best way to learn **is to try things on your own**! As you saw, the current agent is not doing great. As a first suggestion, you can train for more steps. With 1,000,000 steps, we saw some great results!

In the [Leaderboard](https://huggingface.co/spaces/huggingface-projects/Deep-Reinforcement-Learning-Leaderboard) you will find your agents. Can you get to the top?

Here are some ideas to climb up the leaderboard:

* Train for more steps
* Try different hyperparameters by looking at what your classmates have done.
* **Push your newly trained model** to the Hub 🔥

Is walking on ice and driving taxis too boring for you? Try to **change the environment**. Why not use the FrozenLake-v1 slippery version? Check how they work [using the gym documentation](https://www.gymlibrary.dev/) and have fun 🎉.

_____________________________________________________________________
Congrats 🥳, you've just implemented, trained, and uploaded your first Reinforcement Learning agent.

Understanding Q-Learning is an **important step to understanding value-based methods.**

In the next Unit, with Deep Q-Learning, we'll see that creating and updating a Q-table was a good strategy. **However, it is not scalable.**

For instance, imagine you create an agent that learns to play Doom.

<img src="https://vizdoom.cs.put.edu.pl/user/pages/01.tutorial/basic.png" alt="Doom"/>

Doom is a large environment with a huge state space (millions of different states). Creating and updating a Q-table for that environment would not be efficient.

That's why we'll study Deep Q-Learning in the next unit, an algorithm **where we use a neural network that approximates, given a state, the different Q-values for each action.**

<img src="https://huggingface.co/datasets/huggingface-deep-rl-course/course-images/resolve/main/en/unit4/atari-envs.gif" alt="Environments"/>

See you in Unit 3! 🔥

## Keep learning, stay awesome 🤗