mirror of
https://github.com/huggingface/deep-rl-class.git
synced 2026-04-13 18:00:45 +08:00
Merge branch 'glossary-unit4' of github.com:dcarpintero/deep-rl-class into glossary-unit4
@@ -1,3 +1,4 @@
 stable-baselines3==2.0.0a5
+swig
 gymnasium[box2d]
 huggingface_sb3
@@ -5,7 +5,6 @@
 "colab": {
 "provenance": [],
 "private_outputs": true,
-"authorship_tag": "ABX9TyPDFLK3trc6MCLJLqUUuAbl",
 "include_colab_link": true
 },
 "kernelspec": {
@@ -36,7 +35,7 @@
 "\n",
 "<img src=\"https://huggingface.co/datasets/huggingface-deep-rl-course/course-images/resolve/main/en/unit8/thumbnail.png\" alt=\"Thumbnail\"/>\n",
 "\n",
-"In this notebook, you'll learn to use A2C with PyBullet and Panda-Gym, two set of robotics environments. \n",
+"In this notebook, you'll learn to use A2C with PyBullet and Panda-Gym, two set of robotics environments.\n",
 "\n",
 "With [PyBullet](https://github.com/bulletphysics/bullet3), you're going to **train a robot to move**:\n",
 "- `AntBulletEnv-v0` 🕸️ More precisely, a spider (they say Ant but come on... it's a spider 😆) 🕸️\n",
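Since the notebook in this diff trains with A2C, a minimal numeric sketch of the quantity A2C's actor optimizes (the one-step advantage estimate) may help; this is illustrative code of my own, not from the course materials:

```python
def one_step_advantage(reward, value_s, value_next, gamma=0.99, done=False):
    """One-step advantage estimate used by actor-critic methods such as A2C:
    A(s, a) = r + gamma * V(s') - V(s).
    The critic supplies V(s) and V(s'); `done` drops the bootstrap term
    at episode boundaries."""
    bootstrap = 0.0 if done else gamma * value_next
    return reward + bootstrap - value_s

# Positive advantage -> the action did better than the critic expected,
# so A2C increases its probability.
adv = one_step_advantage(reward=1.0, value_s=0.5, value_next=0.6)  # 1.094
```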
@@ -62,12 +61,12 @@
 {
 "cell_type": "markdown",
 "source": [
-"### 🎮 Environments: \n",
+"### 🎮 Environments:\n",
 "\n",
 "- [PyBullet](https://github.com/bulletphysics/bullet3)\n",
 "- [Panda-Gym](https://github.com/qgallouedec/panda-gym)\n",
 "\n",
-"###📚 RL-Library: \n",
+"###📚 RL-Library:\n",
 "\n",
 "- [Stable-Baselines3](https://stable-baselines3.readthedocs.io/)"
 ],
@@ -112,7 +111,7 @@
 "\n",
 "- 📖 Study Deep Reinforcement Learning in **theory and practice**.\n",
 "- 🧑‍💻 Learn to **use famous Deep RL libraries** such as Stable Baselines3, RL Baselines3 Zoo, CleanRL and Sample Factory 2.0.\n",
-"- 🤖 Train **agents in unique environments** \n",
+"- 🤖 Train **agents in unique environments**\n",
 "\n",
 "And more check 📚 the syllabus 👉 https://simoninithomas.github.io/deep-rl-course\n",
 "\n",
@@ -192,7 +191,7 @@
 "source": [
 "## Create a virtual display 🔽\n",
 "\n",
-"During the notebook, we'll need to generate a replay video. To do so, with colab, **we need to have a virtual screen to be able to render the environment** (and thus record the frames). \n",
+"During the notebook, we'll need to generate a replay video. To do so, with colab, **we need to have a virtual screen to be able to render the environment** (and thus record the frames).\n",
 "\n",
 "Hence the following cell will install the librairies and create and run a virtual screen 🖥"
 ],
@@ -266,7 +265,10 @@
 },
 "outputs": [],
 "source": [
-"!pip install -r https://raw.githubusercontent.com/huggingface/deep-rl-class/main/notebooks/unit6/requirements-unit6.txt"
+"!pip install stable-baselines3[extra]==1.8.0\n",
+"!pip install huggingface_sb3\n",
+"!pip install panda_gym==2.0.0\n",
+"!pip install pyglet==1.5.1"
 ]
 },
 {
@@ -403,7 +405,7 @@
 {
 "cell_type": "markdown",
 "source": [
-"A good practice in reinforcement learning is to [normalize input features](https://stable-baselines3.readthedocs.io/en/master/guide/rl_tips.html). \n",
+"A good practice in reinforcement learning is to [normalize input features](https://stable-baselines3.readthedocs.io/en/master/guide/rl_tips.html).\n",
 "\n",
 "For that purpose, there is a wrapper that will compute a running average and standard deviation of input features.\n",
 "\n",
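The wrapper mentioned in the hunk above (SB3's `VecNormalize`) keeps a running mean and standard deviation of observations. As a rough sketch of that idea in plain NumPy (my own illustration, not SB3's actual implementation, which also normalizes rewards):

```python
import numpy as np

class RunningNormalizer:
    """Running mean/std of observations -- the idea behind SB3's VecNormalize."""

    def __init__(self, shape, eps=1e-8, clip=10.0):
        self.mean = np.zeros(shape)
        self.var = np.ones(shape)
        self.count = eps          # avoids division by zero before the first update
        self.eps = eps
        self.clip = clip

    def update(self, batch):
        """Fold a batch of observations into the running statistics
        (parallel-variance formula of Chan et al.)."""
        batch_mean = batch.mean(axis=0)
        batch_var = batch.var(axis=0)
        n = batch.shape[0]
        delta = batch_mean - self.mean
        total = self.count + n
        self.mean = self.mean + delta * n / total
        m_a = self.var * self.count
        m_b = batch_var * n
        self.var = (m_a + m_b + delta ** 2 * self.count * n / total) / total
        self.count = total

    def normalize(self, obs):
        # Standardize, then clip extreme values like VecNormalize's clip_obs
        return np.clip((obs - self.mean) / np.sqrt(self.var + self.eps),
                       -self.clip, self.clip)

norm = RunningNormalizer(shape=(2,))
norm.update(np.array([[1.0, 2.0], [3.0, 4.0]]))
```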
@@ -630,7 +632,7 @@
 "\n",
 "<img src=\"https://huggingface.co/datasets/huggingface-deep-rl-course/course-images/resolve/main/en/notebooks/create-token.jpg\" alt=\"Create HF Token\">\n",
 "\n",
-"- Copy the token \n",
+"- Copy the token\n",
 "- Run the cell below and paste the token"
 ]
 },
@@ -855,7 +857,7 @@
 "cell_type": "code",
 "source": [
 "# 6\n",
-"model_name = \"a2c-PandaReachDense-v2\"; \n",
+"model_name = \"a2c-PandaReachDense-v2\";\n",
 "model.save(model_name)\n",
 "env.save(\"vec_normalize.pkl\")\n",
 "\n",
@@ -927,4 +929,4 @@
 }
 }
 ]
-}
+}
@@ -41,7 +41,7 @@
 "from IPython.display import HTML\n",
 "\n",
 "HTML('''<video width=\"640\" height=\"480\" controls>\n",
-" <source src=\"https://huggingface.co/edbeeching/doom_health_gathering_supreme_3333/resolve/main/replay.mp4\" \n",
+" <source src=\"https://huggingface.co/edbeeching/doom_health_gathering_supreme_3333/resolve/main/replay.mp4\"\n",
 " type=\"video/mp4\">Your browser does not support the video tag.</video>'''\n",
 ")"
 ]
@@ -124,15 +124,15 @@
 "\n",
 "### How sample-factory works\n",
 "\n",
-"Sample-factory is one of the **most highly optimized RL implementations available to the community**. \n",
+"Sample-factory is one of the **most highly optimized RL implementations available to the community**.\n",
 "\n",
-"It works by **spawning multiple processes that run rollout workers, inference workers and a learner worker**. \n",
+"It works by **spawning multiple processes that run rollout workers, inference workers and a learner worker**.\n",
 "\n",
-"The *workers* **communicate through shared memory, which lowers the communication cost between processes**. \n",
+"The *workers* **communicate through shared memory, which lowers the communication cost between processes**.\n",
 "\n",
-"The *rollout workers* interact with the environment and send observations to the *inference workers*. \n",
+"The *rollout workers* interact with the environment and send observations to the *inference workers*.\n",
 "\n",
-"The *inferences workers* query a fixed version of the policy and **send actions back to the rollout worker**. \n",
+"The *inferences workers* query a fixed version of the policy and **send actions back to the rollout worker**.\n",
 "\n",
 "After *k* steps the rollout works send a trajectory of experience to the learner worker, **which it uses to update the agent’s policy network**.\n",
 "\n",
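The rollout / inference / learner split described in the hunk above can be caricatured with plain Python queues, threads standing in for sample-factory's processes and shared memory; all names and the toy environment here are my own, not sample-factory's API:

```python
import queue
import threading

obs_q, act_q, traj_q = queue.Queue(), queue.Queue(), queue.Queue()
K = 4  # rollout length before a trajectory is shipped to the learner

def rollout_worker(n_steps):
    """Steps a toy environment and ships K-step trajectories to the learner."""
    traj, obs = [], 0
    for _ in range(n_steps):
        obs_q.put(obs)                   # send observation to the inference worker
        action = act_q.get()             # receive the chosen action back
        obs, reward = obs + action, 1.0  # toy transition
        traj.append((obs, action, reward))
        if len(traj) == K:
            traj_q.put(traj)             # after k steps, hand experience to the learner
            traj = []

def inference_worker(n_steps):
    """Queries a fixed version of the policy for each incoming observation."""
    for _ in range(n_steps):
        obs = obs_q.get()
        act_q.put(1)                     # fixed policy: always action 1

n = 8
t1 = threading.Thread(target=rollout_worker, args=(n,))
t2 = threading.Thread(target=inference_worker, args=(n,))
t1.start(); t2.start(); t1.join(); t2.join()

# The "learner" consumes the trajectories and would update the policy network.
trajectories = [traj_q.get() for _ in range(n // K)]
```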
@@ -164,9 +164,9 @@
 "source": [
 "## ViZDoom\n",
 "\n",
-"[ViZDoom](https://vizdoom.cs.put.edu.pl/) is an **open-source python interface for the Doom Engine**. \n",
+"[ViZDoom](https://vizdoom.cs.put.edu.pl/) is an **open-source python interface for the Doom Engine**.\n",
 "\n",
-"The library was created in 2016 by Marek Wydmuch, Michal Kempka at the Institute of Computing Science, Poznan University of Technology, Poland. \n",
+"The library was created in 2016 by Marek Wydmuch, Michal Kempka at the Institute of Computing Science, Poznan University of Technology, Poland.\n",
 "\n",
 "The library enables the **training of agents directly from the screen pixels in a number of scenarios**, including team deathmatch, shown in the video below. Because the ViZDoom environment is based on a game the was created in the 90s, it can be run on modern hardware at accelerated speeds, **allowing us to learn complex AI behaviors fairly quickly**.\n",
 "\n",
@@ -195,7 +195,7 @@
 "source": [
 "## We first need to install some dependencies that are required for the ViZDoom environment\n",
 "\n",
-"Now that our Colab runtime is set up, we can start by installing the dependencies required to run ViZDoom on linux. \n",
+"Now that our Colab runtime is set up, we can start by installing the dependencies required to run ViZDoom on linux.\n",
 "\n",
 "If you are following on your machine on Mac, you will want to follow the installation instructions on the [github page](https://github.com/Farama-Foundation/ViZDoom/blob/master/doc/Quickstart.md#-quickstart-for-macos-and-anaconda3-python-36)."
 ]
@@ -210,7 +210,7 @@
 "source": [
 "%%capture\n",
 "%%bash\n",
-"# Install ViZDoom deps from \n",
+"# Install ViZDoom deps from\n",
 "# https://github.com/mwydmuch/ViZDoom/blob/master/doc/Building.md#-linux\n",
 "\n",
 "apt-get install build-essential zlib1g-dev libsdl2-dev libjpeg-dev \\\n",
@@ -244,11 +244,21 @@
 "source": [
 "# install python libraries\n",
 "# thanks toinsson\n",
 "!pip install sample-factory==2.0.2\n",
 "!pip install faster-fifo==1.4.2\n",
 "!pip install vizdoom"
 ]
 },
+{
+"cell_type": "code",
+"source": [
+"!pip install sample-factory==2.0.2"
+],
+"metadata": {
+"id": "alxUt7Au-O8e"
+},
+"execution_count": null,
+"outputs": []
+},
 {
 "cell_type": "markdown",
 "metadata": {
@@ -330,7 +340,7 @@
 "- 1 available game variable: HEALTH\n",
 "- death penalty = 100\n",
 "\n",
-"You can find out more about the scenarios available in ViZDoom [here](https://github.com/Farama-Foundation/ViZDoom/tree/master/scenarios). \n",
+"You can find out more about the scenarios available in ViZDoom [here](https://github.com/Farama-Foundation/ViZDoom/tree/master/scenarios).\n",
 "\n",
 "There are also a number of more complex scenarios that have been create for ViZDoom, such as the ones detailed on [this github page](https://github.com/edbeeching/3d_control_deep_rl).\n",
 "\n"
@@ -451,7 +461,7 @@
 "\n",
 "<img src=\"https://huggingface.co/datasets/huggingface-deep-rl-course/course-images/resolve/main/en/notebooks/create-token.jpg\" alt=\"Create HF Token\">\n",
 "\n",
-"- Copy the token \n",
+"- Copy the token\n",
 "- Run the cell below and paste the token"
 ]
 },
@@ -571,7 +581,7 @@
 "source": [
 "## Some additional challenges 🏆: Doom Deathmatch\n",
 "\n",
-"Training an agent to play a Doom deathmatch **takes many hours on a more beefy machine than is available in Colab**. \n",
+"Training an agent to play a Doom deathmatch **takes many hours on a more beefy machine than is available in Colab**.\n",
 "\n",
 "Fortunately, we have have **already trained an agent in this scenario and it is available in the 🤗 Hub!** Let’s download the model and visualize the agent’s performance."
 ],
@@ -3,8 +3,8 @@
|
||||
|
||||
The certification process is **completely free**:
|
||||
|
||||
- To get a *certificate of completion*: you need **to pass 80% of the assignments** before the end of July 2023.
|
||||
- To get a *certificate of excellence*: you need **to pass 100% of the assignments** before the end of July 2023.
|
||||
- To get a *certificate of completion*: you need **to pass 80% of the assignments** before the end of September 2023.
|
||||
- To get a *certificate of excellence*: you need **to pass 100% of the assignments** before the end of September 2023.
|
||||
|
||||
<img src="https://huggingface.co/datasets/huggingface-deep-rl-course/course-images/resolve/main/en/unit0/certification.jpg" alt="Course certification" width="100%"/>
|
||||
|
||||
|
||||
@@ -716,13 +716,10 @@ def push_to_hub(repo_id, model, env, video_fps=1, local_repo_path="hub"):
 
 ## Usage
 
 ```python
-
 model = load_from_hub(repo_id="{repo_id}", filename="q-learning.pkl")
-
 # Don't forget to check if you need to add additional attributes (is_slippery=False etc)
 env = gym.make(model["env_id"])
 ```
 """
-
 evaluate_agent(env, model["max_steps"], model["n_eval_episodes"], model["qtable"], model["eval_seed"])
@@ -108,7 +108,10 @@ The first step is to install the dependencies, we’ll install multiple ones:
 - `huggingface_hub`: Library allowing anyone to work with the Hub repositories.
 
 ```bash
-!pip install -r https://raw.githubusercontent.com/huggingface/deep-rl-class/main/notebooks/unit6/requirements-unit6.txt
+!pip install stable-baselines3[extra]==1.8.0
+!pip install huggingface_sb3
+!pip install panda_gym==2.0.0
+!pip install pyglet==1.5.1
 ```
 
 ## Import the packages 📦
@@ -88,7 +88,7 @@ Finally, you need to install git-lfs: https://git-lfs.com/
 
 Now that it’s installed, we need to add the environment training executable. Based on your operating system you need to download one of them, unzip it and place it in a new folder inside `ml-agents` that you call `training-envs-executables`
 
-At the end your executable should be in `mlagents/training-envs-executables/SoccerTwos`
+At the end your executable should be in `ml-agents/training-envs-executables/SoccerTwos`
 
 Windows: Download [this executable](https://drive.google.com/file/d/1sqFxbEdTMubjVktnV4C6ICjp89wLhUcP/view?usp=sharing)
 
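The folder layout described in the hunk above (the path this commit corrects) can be sketched as follows; the `touch` line is a hypothetical placeholder standing in for the real unzipped executable:

```shell
# Create the folder the course expects inside the cloned ml-agents repo
mkdir -p ml-agents/training-envs-executables

# After downloading and unzipping the executable for your OS, you should end
# up with: ml-agents/training-envs-executables/SoccerTwos
touch ml-agents/training-envs-executables/SoccerTwos  # placeholder for the executable
ls ml-agents/training-envs-executables
```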