From d6f3cc4906c20a95818d9db04f79c02ab7a4c57d Mon Sep 17 00:00:00 2001
From: Thomas Simonini
Date: Tue, 2 Jul 2024 09:49:18 +0200
Subject: [PATCH] Update with new RL Zoo

---
 notebooks/unit3/unit3.ipynb | 55 ++++++++++++-------------------------
 1 file changed, 17 insertions(+), 38 deletions(-)

diff --git a/notebooks/unit3/unit3.ipynb b/notebooks/unit3/unit3.ipynb
index 6b9378d..bcd3410 100644
--- a/notebooks/unit3/unit3.ipynb
+++ b/notebooks/unit3/unit3.ipynb
@@ -7,7 +7,7 @@
  "colab_type": "text"
  },
  "source": [
- "\"Open"
+ "\"Open"
  ]
  },
  {
@@ -42,13 +42,13 @@
  {
  "cell_type": "markdown",
  "source": [
- "### 🎮 Environments: \n",
+ "### 🎮 Environments:\n",
  "\n",
  "- [SpacesInvadersNoFrameskip-v4](https://gymnasium.farama.org/environments/atari/space_invaders/)\n",
  "\n",
  "You can see the difference between Space Invaders versions here 👉 https://gymnasium.farama.org/environments/atari/space_invaders/#variants\n",
  "\n",
- "### 📚 RL-Library: \n",
+ "### 📚 RL-Library:\n",
  "\n",
  "- [RL-Baselines3-Zoo](https://github.com/DLR-RM/rl-baselines3-zoo)"
  ],
@@ -90,7 +90,7 @@
  "\n",
  "- 📖 Study Deep Reinforcement Learning in **theory and practice**.\n",
  "- 🧑‍💻 Learn to **use famous Deep RL libraries** such as Stable Baselines3, RL Baselines3 Zoo, CleanRL and Sample Factory 2.0.\n",
- "- 🤖 Train **agents in unique environments** \n",
+ "- 🤖 Train **agents in unique environments**\n",
  "\n",
  "And more check 📚 the syllabus 👉 https://simoninithomas.github.io/deep-rl-course\n",
  "\n",
@@ -109,7 +109,7 @@
  "## Prerequisites 🏗️\n",
  "Before diving into the notebook, you need to:\n",
  "\n",
- "🔲 📚 **[Study Deep Q-Learning by reading Unit 3](https://huggingface.co/deep-rl-course/unit3/introduction)** 🤗 "
+ "🔲 📚 **[Study Deep Q-Learning by reading Unit 3](https://huggingface.co/deep-rl-course/unit3/introduction)** 🤗"
  ]
  },
  {
@@ -150,7 +150,7 @@
  "\n",
  "Also, we're going to **train it for 90 minutes with 1M timesteps**. By typing `!nvidia-smi` will tell you what GPU you're using.\n",
  "\n",
- "And if you want to train more such 10 million steps, this will take about 9 hours, potentially resulting in Colab timing out. In that case, I recommend running this on your local computer (or somewhere else). Just click on: `File>Download`. "
+ "And if you want to train more such 10 million steps, this will take about 9 hours, potentially resulting in Colab timing out. In that case, I recommend running this on your local computer (or somewhere else). Just click on: `File>Download`."
  ],
  "metadata": {
  "id": "Nc8BnyVEc3Ys"
@@ -193,31 +193,10 @@
  {
  "cell_type": "code",
  "source": [
- "# For now we install this update of RL-Baselines3 Zoo\n",
- "!pip install git+https://github.com/DLR-RM/rl-baselines3-zoo@update/hf"
+ "!pip install git+https://github.com/DLR-RM/rl-baselines3-zoo"
  ],
  "metadata": {
- "id": "hLTwHqIWdnPb"
- },
- "execution_count": null,
- "outputs": []
- },
- {
- "cell_type": "markdown",
- "source": [
- "IF AND ONLY IF THE VERSION ABOVE DOES NOT EXIST ANYMORE. UNCOMMENT AND INSTALL THE ONE BELOW"
- ],
- "metadata": {
- "id": "p0xe2sJHdtHy"
- }
- },
- {
- "cell_type": "code",
- "source": [
- "#!pip install rl_zoo3==2.0.0a9"
- ],
- "metadata": {
- "id": "N0d6wy-F-f39"
+ "id": "S1A_E4z3awa_"
  },
  "execution_count": null,
  "outputs": []
@@ -259,7 +238,7 @@
  "source": [
  "## Create a virtual display 🔽\n",
  "\n",
- "During the notebook, we'll need to generate a replay video. To do so, with colab, **we need to have a virtual screen to be able to render the environment** (and thus record the frames). \n",
+ "During the notebook, we'll need to generate a replay video. To do so, with colab, **we need to have a virtual screen to be able to render the environment** (and thus record the frames).\n",
  "\n",
  "Hence the following cell will install the librairies and create and run a virtual screen 🖥"
  ],
@@ -341,10 +320,10 @@
  "Here we see that:\n",
  "- We use the `Atari Wrapper` that preprocess the input (Frame reduction ,grayscale, stack 4 frames)\n",
  "- We use `CnnPolicy`, since we use Convolutional layers to process the frames\n",
- "- We train it for 10 million `n_timesteps` \n",
+ "- We train it for 10 million `n_timesteps`\n",
  "- Memory (Experience Replay) size is 100000, aka the amount of experience steps you saved to train again your agent with.\n",
  "\n",
- "💡 My advice is to **reduce the training timesteps to 1M,** which will take about 90 minutes on a P100. `!nvidia-smi` will tell you what GPU you're using. At 10 million steps, this will take about 9 hours, which could likely result in Colab timing out. I recommend running this on your local computer (or somewhere else). Just click on: `File>Download`. "
+ "💡 My advice is to **reduce the training timesteps to 1M,** which will take about 90 minutes on a P100. `!nvidia-smi` will tell you what GPU you're using. At 10 million steps, this will take about 9 hours, which could likely result in Colab timing out. I recommend running this on your local computer (or somewhere else). Just click on: `File>Download`."
  ]
  },
  {
@@ -423,7 +402,7 @@
  },
  "outputs": [],
  "source": [
- "!python -m rl_zoo3.enjoy --algo dqn --env SpaceInvadersNoFrameskip-v4 --no-render --n-timesteps _________ --folder logs/ "
+ "!python -m rl_zoo3.enjoy --algo dqn --env SpaceInvadersNoFrameskip-v4 --no-render --n-timesteps _________ --folder logs/"
  ]
  },
  {
@@ -495,7 +474,7 @@
  "id": "9O6FI0F8HnzE"
  },
  "source": [
- "- Copy the token \n",
+ "- Copy the token\n",
  "- Run the cell below and past the token"
  ]
  },
@@ -595,7 +574,7 @@
  "source": [
  "Congrats 🥳 you've just trained and uploaded your first Deep Q-Learning agent using RL-Baselines-3 Zoo. The script above should have displayed a link to a model repository such as https://huggingface.co/ThomasSimonini/dqn-SpaceInvadersNoFrameskip-v4. When you go to this link, you can:\n",
  "\n",
- "- See a **video preview of your agent** at the right. \n",
+ "- See a **video preview of your agent** at the right.\n",
  "- Click \"Files and versions\" to see all the files in the repository.\n",
  "- Click \"Use in stable-baselines3\" to get a code snippet that shows how to load the model.\n",
  "- A model card (`README.md` file) which gives a description of the model and the hyperparameters you used.\n",
@@ -711,7 +690,7 @@
  "\n",
  "Here's a list of environments you can try to train your agent with:\n",
  "- BeamRiderNoFrameskip-v4\n",
- "- BreakoutNoFrameskip-v4 \n",
+ "- BreakoutNoFrameskip-v4\n",
  "- EnduroNoFrameskip-v4\n",
  "- PongNoFrameskip-v4\n",
  "\n",
@@ -756,7 +735,7 @@
  {
  "cell_type": "markdown",
  "source": [
- "See you on Bonus unit 2! 🔥 "
+ "See you on Bonus unit 2! 🔥"
  ],
  "metadata": {
  "id": "Kc3udPT-RcXc"
@@ -829,4 +808,4 @@
  },
  "nbformat": 4,
  "nbformat_minor": 0
-}
+}
\ No newline at end of file