mirror of
https://github.com/huggingface/deep-rl-class.git
synced 2026-06-15 06:27:24 +08:00
Update with new RL Zoo
@@ -7,7 +7,7 @@
"colab_type": "text"
},
"source": [
"<a href=\"https://colab.research.google.com/github/huggingface/deep-rl-class/blob/main/notebooks/unit3.ipynb\" target=\"_parent\"><img src=\"https://colab.research.google.com/assets/colab-badge.svg\" alt=\"Open In Colab\"/></a>"
"<a href=\"https://colab.research.google.com/github/huggingface/deep-rl-class/blob/main/notebooks/unit3/unit3.ipynb\" target=\"_parent\"><img src=\"https://colab.research.google.com/assets/colab-badge.svg\" alt=\"Open In Colab\"/></a>"
]
},
{
@@ -42,13 +42,13 @@
{
"cell_type": "markdown",
"source": [
"### 🎮 Environments: \n",
"### 🎮 Environments:\n",
"\n",
"- [SpacesInvadersNoFrameskip-v4](https://gymnasium.farama.org/environments/atari/space_invaders/)\n",
"\n",
"You can see the difference between Space Invaders versions here 👉 https://gymnasium.farama.org/environments/atari/space_invaders/#variants\n",
"\n",
"### 📚 RL-Library: \n",
"### 📚 RL-Library:\n",
"\n",
"- [RL-Baselines3-Zoo](https://github.com/DLR-RM/rl-baselines3-zoo)"
],
@@ -90,7 +90,7 @@
"\n",
"- 📖 Study Deep Reinforcement Learning in **theory and practice**.\n",
"- 🧑💻 Learn to **use famous Deep RL libraries** such as Stable Baselines3, RL Baselines3 Zoo, CleanRL and Sample Factory 2.0.\n",
"- 🤖 Train **agents in unique environments** \n",
"- 🤖 Train **agents in unique environments**\n",
"\n",
"And more check 📚 the syllabus 👉 https://simoninithomas.github.io/deep-rl-course\n",
"\n",
@@ -109,7 +109,7 @@
"## Prerequisites 🏗️\n",
"Before diving into the notebook, you need to:\n",
"\n",
"🔲 📚 **[Study Deep Q-Learning by reading Unit 3](https://huggingface.co/deep-rl-course/unit3/introduction)** 🤗 "
"🔲 📚 **[Study Deep Q-Learning by reading Unit 3](https://huggingface.co/deep-rl-course/unit3/introduction)** 🤗"
]
},
{
@@ -150,7 +150,7 @@
"\n",
"Also, we're going to **train it for 90 minutes with 1M timesteps**. By typing `!nvidia-smi` will tell you what GPU you're using.\n",
"\n",
"And if you want to train more such 10 million steps, this will take about 9 hours, potentially resulting in Colab timing out. In that case, I recommend running this on your local computer (or somewhere else). Just click on: `File>Download`. "
"And if you want to train more such 10 million steps, this will take about 9 hours, potentially resulting in Colab timing out. In that case, I recommend running this on your local computer (or somewhere else). Just click on: `File>Download`."
],
"metadata": {
"id": "Nc8BnyVEc3Ys"
@@ -193,31 +193,10 @@
{
"cell_type": "code",
"source": [
"# For now we install this update of RL-Baselines3 Zoo\n",
"!pip install git+https://github.com/DLR-RM/rl-baselines3-zoo@update/hf"
"!pip install git+https://github.com/DLR-RM/rl-baselines3-zoo"
],
"metadata": {
"id": "hLTwHqIWdnPb"
},
"execution_count": null,
"outputs": []
},
{
"cell_type": "markdown",
"source": [
"IF AND ONLY IF THE VERSION ABOVE DOES NOT EXIST ANYMORE. UNCOMMENT AND INSTALL THE ONE BELOW"
],
"metadata": {
"id": "p0xe2sJHdtHy"
}
},
{
"cell_type": "code",
"source": [
"#!pip install rl_zoo3==2.0.0a9"
],
"metadata": {
"id": "N0d6wy-F-f39"
"id": "S1A_E4z3awa_"
},
"execution_count": null,
"outputs": []
@@ -259,7 +238,7 @@
"source": [
"## Create a virtual display 🔽\n",
"\n",
"During the notebook, we'll need to generate a replay video. To do so, with colab, **we need to have a virtual screen to be able to render the environment** (and thus record the frames). \n",
"During the notebook, we'll need to generate a replay video. To do so, with colab, **we need to have a virtual screen to be able to render the environment** (and thus record the frames).\n",
"\n",
"Hence the following cell will install the librairies and create and run a virtual screen 🖥"
],
@@ -341,10 +320,10 @@
"Here we see that:\n",
"- We use the `Atari Wrapper` that preprocess the input (Frame reduction ,grayscale, stack 4 frames)\n",
"- We use `CnnPolicy`, since we use Convolutional layers to process the frames\n",
"- We train it for 10 million `n_timesteps` \n",
"- We train it for 10 million `n_timesteps`\n",
"- Memory (Experience Replay) size is 100000, aka the amount of experience steps you saved to train again your agent with.\n",
"\n",
"💡 My advice is to **reduce the training timesteps to 1M,** which will take about 90 minutes on a P100. `!nvidia-smi` will tell you what GPU you're using. At 10 million steps, this will take about 9 hours, which could likely result in Colab timing out. I recommend running this on your local computer (or somewhere else). Just click on: `File>Download`. "
"💡 My advice is to **reduce the training timesteps to 1M,** which will take about 90 minutes on a P100. `!nvidia-smi` will tell you what GPU you're using. At 10 million steps, this will take about 9 hours, which could likely result in Colab timing out. I recommend running this on your local computer (or somewhere else). Just click on: `File>Download`."
]
},
{
@@ -423,7 +402,7 @@
},
"outputs": [],
"source": [
"!python -m rl_zoo3.enjoy --algo dqn --env SpaceInvadersNoFrameskip-v4 --no-render --n-timesteps _________ --folder logs/ "
"!python -m rl_zoo3.enjoy --algo dqn --env SpaceInvadersNoFrameskip-v4 --no-render --n-timesteps _________ --folder logs/"
]
},
{
@@ -495,7 +474,7 @@
"id": "9O6FI0F8HnzE"
},
"source": [
"- Copy the token \n",
"- Copy the token\n",
"- Run the cell below and past the token"
]
},
@@ -595,7 +574,7 @@
"source": [
"Congrats 🥳 you've just trained and uploaded your first Deep Q-Learning agent using RL-Baselines-3 Zoo. The script above should have displayed a link to a model repository such as https://huggingface.co/ThomasSimonini/dqn-SpaceInvadersNoFrameskip-v4. When you go to this link, you can:\n",
"\n",
"- See a **video preview of your agent** at the right. \n",
"- See a **video preview of your agent** at the right.\n",
"- Click \"Files and versions\" to see all the files in the repository.\n",
"- Click \"Use in stable-baselines3\" to get a code snippet that shows how to load the model.\n",
"- A model card (`README.md` file) which gives a description of the model and the hyperparameters you used.\n",
@@ -711,7 +690,7 @@
"\n",
"Here's a list of environments you can try to train your agent with:\n",
"- BeamRiderNoFrameskip-v4\n",
"- BreakoutNoFrameskip-v4 \n",
"- BreakoutNoFrameskip-v4\n",
"- EnduroNoFrameskip-v4\n",
"- PongNoFrameskip-v4\n",
"\n",
@@ -756,7 +735,7 @@
{
"cell_type": "markdown",
"source": [
"See you on Bonus unit 2! 🔥 "
"See you on Bonus unit 2! 🔥"
],
"metadata": {
"id": "Kc3udPT-RcXc"
@@ -829,4 +808,4 @@
},
"nbformat": 4,
"nbformat_minor": 0
}
}