mirror of
https://github.com/huggingface/deep-rl-class.git
synced 2026-04-14 10:22:37 +08:00
Update Unit3
@@ -7,7 +7,7 @@
|
||||
"colab_type": "text"
|
||||
},
|
||||
"source": [
|
||||
"<a href=\"https://colab.research.google.com/github/huggingface/deep-rl-class/blob/ThomasSimonini%2FUnit3/notebooks/unit3/unit3.ipynb\" target=\"_parent\"><img src=\"https://colab.research.google.com/assets/colab-badge.svg\" alt=\"Open In Colab\"/></a>"
|
||||
"<a href=\"https://colab.research.google.com/github/huggingface/deep-rl-class/blob/main/notebooks/unit3.ipynb\" target=\"_parent\"><img src=\"https://colab.research.google.com/assets/colab-badge.svg\" alt=\"Open In Colab\"/></a>"
|
||||
]
|
||||
},
|
||||
{
|
||||
@@ -44,7 +44,9 @@
|
||||
"source": [
|
||||
"### 🎮 Environments: \n",
|
||||
"\n",
|
||||
"- SpacesInvadersNoFrameskip-v4 \n",
|
||||
"- [SpacesInvadersNoFrameskip-v4](https://gymnasium.farama.org/environments/atari/space_invaders/)\n",
|
||||
"\n",
|
||||
"You can see the difference between Space Invaders versions here 👉 https://gymnasium.farama.org/environments/atari/space_invaders/#variants\n",
|
||||
"\n",
|
||||
"### 📚 RL-Library: \n",
|
||||
"\n",
|
||||
@@ -127,6 +129,10 @@
|
||||
"source": [
|
||||
"# Let's train a Deep Q-Learning agent playing Atari' Space Invaders 👾 and upload it to the Hub.\n",
|
||||
"\n",
|
||||
"We strongly recommend students **to use Google Colab for the hands-on exercises instead of running them on their personal computers**.\n",
|
||||
"\n",
|
||||
"By using Google Colab, **you can focus on learning and experimenting without worrying about the technical aspects of setting up your environments**.\n",
|
||||
"\n",
|
||||
"To validate this hands-on for the certification process, you need to push your trained model to the Hub and **get a result of >= 200**.\n",
|
||||
"\n",
|
||||
"To find your result, go to the leaderboard and find your model, **the result = mean_reward - std of reward**\n",
|
||||
@@ -173,6 +179,81 @@
|
||||
"id": "KV0NyFdQM9ZG"
|
||||
}
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"source": [
|
||||
"# Install RL-Baselines3 Zoo and its dependencies 📚\n",
|
||||
"\n",
|
||||
"If you see `ERROR: pip's dependency resolver does not currently take into account all the packages that are installed.` **this is normal and it's not a critical error** there's a conflict of version. But the packages we need are installed."
|
||||
],
|
||||
"metadata": {
|
||||
"id": "wS_cVefO-aYg"
|
||||
}
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"source": [
|
||||
"# For now we install this update of RL-Baselines3 Zoo\n",
|
||||
"!pip install git+https://github.com/DLR-RM/rl-baselines3-zoo@update/hf"
|
||||
],
|
||||
"metadata": {
|
||||
"id": "hLTwHqIWdnPb"
|
||||
},
|
||||
"execution_count": null,
|
||||
"outputs": []
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"source": [
|
||||
"IF AND ONLY IF THE VERSION ABOVE DOES NOT EXIST ANYMORE. UNCOMMENT AND INSTALL THE ONE BELOW"
|
||||
],
|
||||
"metadata": {
|
||||
"id": "p0xe2sJHdtHy"
|
||||
}
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"source": [
|
||||
"#!pip install rl_zoo3==2.0.0a9"
|
||||
],
|
||||
"metadata": {
|
||||
"id": "N0d6wy-F-f39"
|
||||
},
|
||||
"execution_count": null,
|
||||
"outputs": []
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"source": [
|
||||
"!apt-get install swig cmake ffmpeg"
|
||||
],
|
||||
"metadata": {
|
||||
"id": "8_MllY6Om1eI"
|
||||
},
|
||||
"execution_count": null,
|
||||
"outputs": []
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {
|
||||
"id": "4S9mJiKg6SqC"
|
||||
},
|
||||
"source": [
|
||||
"To be able to use Atari games in Gymnasium we need to install atari package. And accept-rom-license to download the rom files (games files)."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"source": [
|
||||
"!pip install gymnasium[atari]\n",
|
||||
"!pip install gymnasium[accept-rom-license]"
|
||||
],
|
||||
"metadata": {
|
||||
"id": "NsRP-lX1_2fC"
|
||||
},
|
||||
"execution_count": null,
|
||||
"outputs": []
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"source": [
|
||||
@@ -201,29 +282,6 @@
|
||||
"!pip3 install pyvirtualdisplay"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"source": [
|
||||
"# Additional dependencies for RL Baselines3 Zoo\n",
|
||||
"!apt-get install swig cmake freeglut3-dev "
|
||||
],
|
||||
"metadata": {
|
||||
"id": "fWyKJCy_NJBX"
|
||||
},
|
||||
"execution_count": null,
|
||||
"outputs": []
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"source": [
|
||||
"!pip install pyglet==1.5.1"
|
||||
],
|
||||
"metadata": {
|
||||
"id": "C5LwHrISW7Q5"
|
||||
},
|
||||
"execution_count": null,
|
||||
"outputs": []
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"source": [
|
||||
@@ -234,68 +292,11 @@
|
||||
"virtual_display.start()"
|
||||
],
|
||||
"metadata": {
|
||||
"id": "ww5PQH1gNLI4"
|
||||
"id": "BE5JWP5rQIKf"
|
||||
},
|
||||
"execution_count": null,
|
||||
"outputs": []
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {
|
||||
"id": "mYIMvl5X9NAu"
|
||||
},
|
||||
"source": [
|
||||
"## Clone RL-Baselines3 Zoo Repo 📚\n",
|
||||
"You can now directly install from python package `pip install rl_zoo3` but since we want **the full installation with extra environments and dependencies** we're going to clone `RL-Baselines3-Zoo` repository and install from source."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {
|
||||
"id": "eu5ZDPZ09VNQ"
|
||||
},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"!git clone https://github.com/DLR-RM/rl-baselines3-zoo"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {
|
||||
"id": "HCIoSbvbfAQh"
|
||||
},
|
||||
"source": [
|
||||
"## Install dependencies 🔽\n",
|
||||
"We can now install the dependencies RL-Baselines3 Zoo needs (this can take 5min ⏲)"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {
|
||||
"id": "s2QsFAk29h-D"
|
||||
},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"%cd /content/rl-baselines3-zoo/ \n",
|
||||
"!git checkout v1.8.0"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {
|
||||
"id": "3QaOS7Xj9j1s"
|
||||
},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"!pip install setuptools==65.5.0\n",
|
||||
"!pip install -r requirements.txt\n",
|
||||
"# Since colab uses Python 3.9 we need to add this installation\n",
|
||||
"!pip install gym[atari,accept-rom-license]==0.21.0"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {
|
||||
@@ -305,9 +306,31 @@
|
||||
"## Train our Deep Q-Learning Agent to Play Space Invaders 👾\n",
|
||||
"\n",
|
||||
"To train an agent with RL-Baselines3-Zoo, we just need to do two things:\n",
|
||||
"1. We define the hyperparameters in `/content/rl-baselines3-zoo/hyperparams/dqn.yml`\n",
|
||||
"\n",
|
||||
"<img src=\"https://huggingface.co/datasets/huggingface-deep-rl-course/course-images/resolve/main/en/notebooks/unit3/hyperparameters.png\" alt=\"DQN Hyperparameters\">\n"
|
||||
"1. Create a hyperparameter config file that will contain our training hyperparameters called `dqn.yml`.\n",
|
||||
"\n",
|
||||
"This is a template example:\n",
|
||||
"\n",
|
||||
"```\n",
|
||||
"SpaceInvadersNoFrameskip-v4:\n",
|
||||
" env_wrapper:\n",
|
||||
" - stable_baselines3.common.atari_wrappers.AtariWrapper\n",
|
||||
" frame_stack: 4\n",
|
||||
" policy: 'CnnPolicy'\n",
|
||||
" n_timesteps: !!float 1e7\n",
|
||||
" buffer_size: 100000\n",
|
||||
" learning_rate: !!float 1e-4\n",
|
||||
" batch_size: 32\n",
|
||||
" learning_starts: 100000\n",
|
||||
" target_update_interval: 1000\n",
|
||||
" train_freq: 4\n",
|
||||
" gradient_steps: 1\n",
|
||||
" exploration_fraction: 0.1\n",
|
||||
" exploration_final_eps: 0.01\n",
|
||||
" # If True, you need to deactivate handle_timeout_termination\n",
|
||||
" # in the replay_buffer_kwargs\n",
|
||||
" optimize_memory_usage: False\n",
|
||||
"```"
|
||||
]
|
||||
},
|
||||
{
|
||||
@@ -346,7 +369,9 @@
|
||||
"id": "Hn8bRTHvERRL"
|
||||
},
|
||||
"source": [
|
||||
"2. We run `train.py` and save the models on `logs` folder 📁"
|
||||
"2. We start the training and save the models on `logs` folder 📁\n",
|
||||
"\n",
|
||||
"- Define the algorithm after `--algo`, where we save the model after `-f` and where the hyperparameter config is after `-c`."
|
||||
]
|
||||
},
|
||||
{
|
||||
@@ -357,7 +382,7 @@
|
||||
},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"!python train.py --algo ________ --env SpaceInvadersNoFrameskip-v4 -f _________"
|
||||
"!python -m rl_zoo3.train --algo ________ --env SpaceInvadersNoFrameskip-v4 -f _________ -c _________"
|
||||
]
|
||||
},
|
||||
{
|
||||
@@ -377,7 +402,7 @@
|
||||
},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"!python train.py --algo dqn --env SpaceInvadersNoFrameskip-v4 -f logs/"
|
||||
"!python -m rl_zoo3.train --algo dqn --env SpaceInvadersNoFrameskip-v4 -f logs/ -c dqn.yml"
|
||||
]
|
||||
},
|
||||
{
|
||||
@@ -399,7 +424,7 @@
|
||||
},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"!python enjoy.py --algo dqn --env SpaceInvadersNoFrameskip-v4 --no-render --n-timesteps _________ --folder logs/"
|
||||
"!python -m rl_zoo3.enjoy --algo dqn --env SpaceInvadersNoFrameskip-v4 --no-render --n-timesteps _________ --folder logs/ "
|
||||
]
|
||||
},
|
||||
{
|
||||
@@ -419,7 +444,7 @@
|
||||
},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"!python enjoy.py --algo dqn --env SpaceInvadersNoFrameskip-v4 --no-render --n-timesteps 5000 --folder logs/"
|
||||
"!python -m rl_zoo3.enjoy --algo dqn --env SpaceInvadersNoFrameskip-v4 --no-render --n-timesteps 5000 --folder logs/"
|
||||
]
|
||||
},
|
||||
{
|
||||
@@ -440,7 +465,7 @@
|
||||
"id": "ezbHS1q3HYVV"
|
||||
},
|
||||
"source": [
|
||||
"By using `rl_zoo3.push_to_hub.py` **you evaluate, record a replay, generate a model card of your agent and push it to the hub**.\n",
|
||||
"By using `rl_zoo3.push_to_hub` **you evaluate, record a replay, generate a model card of your agent and push it to the hub**.\n",
|
||||
"\n",
|
||||
"This way:\n",
|
||||
"- You can **showcase our work** 🔥\n",
|
||||
@@ -518,6 +543,8 @@
|
||||
"\n",
|
||||
"`-orga`: Your Hugging Face username\n",
|
||||
"\n",
|
||||
"`-f`: Where the trained model folder is (in our case `logs`)\n",
|
||||
"\n",
|
||||
"<img src=\"https://huggingface.co/datasets/huggingface-deep-rl-course/course-images/resolve/main/en/notebooks/unit3/select-id.png\" alt=\"Select Id\">"
|
||||
]
|
||||
},
|
||||
@@ -649,7 +676,7 @@
|
||||
},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"!python enjoy.py --algo dqn --env BeamRiderNoFrameskip-v4 -n 5000 -f rl_trained/"
|
||||
"!python -m rl_zoo3.enjoy --algo dqn --env BeamRiderNoFrameskip-v4 -n 5000 -f rl_trained/ --no-render"
|
||||
]
|
||||
},
|
||||
{
|
||||
@@ -803,4 +830,4 @@
|
||||
},
|
||||
"nbformat": 4,
|
||||
"nbformat_minor": 0
|
||||
}
|
||||
}
|
||||
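The two training steps this commit introduces (create `dqn.yml`, then run `rl_zoo3.train` with `-c`) can be sketched in plain Python as below. This is a minimal illustration, not part of the notebook: the helper names `write_config` and `build_train_command` are hypothetical, the two-space YAML nesting is an assumption (indentation is not preserved in the diff view), and the command is only assembled and printed, not executed.

```python
import tempfile
from pathlib import Path

# Template hyperparameters copied from the notebook cell in the diff.
# Assumption: standard two-space YAML nesting under the environment ID.
DQN_CONFIG = """\
SpaceInvadersNoFrameskip-v4:
  env_wrapper:
    - stable_baselines3.common.atari_wrappers.AtariWrapper
  frame_stack: 4
  policy: 'CnnPolicy'
  n_timesteps: !!float 1e7
  buffer_size: 100000
  learning_rate: !!float 1e-4
  batch_size: 32
  learning_starts: 100000
  target_update_interval: 1000
  train_freq: 4
  gradient_steps: 1
  exploration_fraction: 0.1
  exploration_final_eps: 0.01
  optimize_memory_usage: False
"""


def write_config(path):
    """Write the template config to `path` and return it (hypothetical helper)."""
    config_path = Path(path)
    config_path.write_text(DQN_CONFIG, encoding="utf-8")
    return config_path


def build_train_command(config_path, algo="dqn",
                        env_id="SpaceInvadersNoFrameskip-v4",
                        log_folder="logs/"):
    """Assemble the training command from the notebook as an argument list."""
    return [
        "python", "-m", "rl_zoo3.train",
        "--algo", algo,
        "--env", env_id,
        "-f", log_folder,
        "-c", str(config_path),
    ]


# Demo: write the config into a temporary folder and print the command.
with tempfile.TemporaryDirectory() as tmp:
    cfg = write_config(Path(tmp) / "dqn.yml")
    command = build_train_command(cfg)
    print(" ".join(command))
```

In a Colab cell, the same command is run directly with the `!` prefix, as in the updated notebook line `!python -m rl_zoo3.train --algo dqn --env SpaceInvadersNoFrameskip-v4 -f logs/ -c dqn.yml`.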