Mirror of https://github.com/huggingface/deep-rl-class.git
Merge branch 'huggingface:main' into main
@@ -65,7 +65,7 @@ At every step:
 
 - *Episodic task* : we have a **starting point and an ending point (a terminal state)**. This creates an episode: a list of States, Actions, Rewards, and new States. For instance, think about Super Mario Bros: an episode begin at the launch of a new Mario Level and ending when you’re killed or you reached the end of the level.
 
-- *Continous task* : these are tasks that **continue forever (no terminal state)**. In this case, the agent must learn how to choose the best actions and simultaneously interact with the environment.
+- *Continuous task* : these are tasks that **continue forever (no terminal state)**. In this case, the agent must learn how to choose the best actions and simultaneously interact with the environment.
 
 <img src="assets/img/tasks.jpg" alt="Task"/>
 
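As an aside on the episodic/continuing distinction this hunk touches: a minimal sketch of an episodic interaction loop, assuming the classic 4-tuple gym API and the CartPole-v1 environment (neither appears in this diff):

import gym

# Episodic task: interaction is chopped into episodes that end in a terminal state.
env = gym.make("CartPole-v1")
state = env.reset()
done = False
episode_reward = 0.0
while not done:  # the loop exits once a terminal state is reached
    action = env.action_space.sample()  # stand-in for a learned policy
    state, reward, done, info = env.step(action)
    episode_reward += reward
# A continuing task has no terminal state: this loop would never exit,
# which is one reason discounting future rewards becomes essential.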
@@ -269,7 +269,7 @@
 "One additional library we import is huggingface_hub **to be able to upload and download trained models from the hub**.\n",
 "\n",
 "\n",
-"The Hugging Face Hub 🤗 works as a central place where anyone can share and explore models and datasets. It has versioning, metrics, visualizations and other features that will allow you to easilly collaborate with others.\n",
+"The Hugging Face Hub 🤗 works as a central place where anyone can share and explore models and datasets. It has versioning, metrics, visualizations and other features that will allow you to easily collaborate with others.\n",
 "\n",
 "You can see here all the Deep reinforcement Learning models available 👉 https://huggingface.co/models?pipeline_tag=reinforcement-learning&sort=downloads\n",
 "\n"
@@ -1021,7 +1021,7 @@
 "# But Python 3.6, 3.7 use protocol 4\n",
 "# In order to get compatibility we need to:\n",
 "# 1. Install pickle5 (we done it at the beginning of the colab)\n",
-"# 2. Create a custom empty object we pass as paramater to PPO.load()\n",
+"# 2. Create a custom empty object we pass as parameter to PPO.load()\n",
 "custom_objects = {\n",
 " \"learning_rate\": 0.0,\n",
 " \"lr_schedule\": lambda _: 0.0,\n",
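For context, the workaround these comments describe looks roughly like the sketch below. Only the two overrides shown in the hunk are confirmed by this diff; the `clip_range` entry and the checkpoint filename are assumptions:

from stable_baselines3 import PPO

# The schedule entries are pickled callables, which is what breaks across
# Python/pickle-protocol versions; replacing them with dummies is safe when
# the model is only loaded for inference.
custom_objects = {
    "learning_rate": 0.0,
    "lr_schedule": lambda _: 0.0,
    "clip_range": lambda _: 0.0,  # assumption: extra override, not shown in this hunk
}
model = PPO.load("ppo-LunarLander-v2.zip", custom_objects=custom_objects)  # path is a placeholder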
@@ -318,7 +318,7 @@
 "\n",
 "- `sampler=TPESampler()`: This specifies that we want to employ a Bayesian optimization algorithm called Tree-structured Parzen Estimator. Other options are `GridSampler()`, `RandomSampler()`, etc. (The full list can be found <a href=\"https://optuna.readthedocs.io/en/stable/reference/samplers.html\" target=\"_blank\">here</a>.)\n",
 "- `study_name=\"PPO-LunarLander-v2\"`: This is a name we give to the study (optional).\n",
-"- `direction=\"maximize\"`: This is to specify that our objective is to maximize (not mimimize) the score.\n",
+"- `direction=\"maximize\"`: This is to specify that our objective is to maximize (not minimize) the score.\n",
 "\n",
 "Once our study is created, we call the `optimize()` method on it, specifying that we want to conduct `10` trials.\n",
 "\n",
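The study creation these bullets document might look like the following sketch. The toy `objective` is a stand-in; the notebook's real objective trains an agent with the sampled hyperparameters and returns its evaluation score:

import optuna
from optuna.samplers import TPESampler

def objective(trial):
    # Stand-in objective: the real one would train PPO with the sampled
    # hyperparameters and return the mean evaluation reward.
    lr = trial.suggest_float("learning_rate", 1e-5, 1e-2, log=True)
    return -(lr - 1e-3) ** 2

study = optuna.create_study(
    sampler=TPESampler(),             # Tree-structured Parzen Estimator
    study_name="PPO-LunarLander-v2",  # optional label for the study
    direction="maximize",             # higher score is better
)
study.optimize(objective, n_trials=10)
print(study.best_params)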
@@ -191,7 +191,7 @@
 "- `pygame`: Used for the FrozenLake-v1 and Taxi-v3 UI.\n",
 "- `numPy`: Used for handling our Q-table.\n",
 "\n",
-"The Hugging Face Hub 🤗 works as a central place where anyone can share and explore models and datasets. It has versioning, metrics, visualizations and other features that will allow you to easilly collaborate with others.\n",
+"The Hugging Face Hub 🤗 works as a central place where anyone can share and explore models and datasets. It has versioning, metrics, visualizations and other features that will allow you to easily collaborate with others.\n",
 "\n",
 "You can see here all the Deep reinforcement Learning models available 👉 https://huggingface.co/models?other=q-learning\n",
 "\n"
@@ -1232,7 +1232,7 @@
 "source": [
 "3️⃣ We're now ready to push our trained agent to the 🤗 Hub 🔥 using `package_to_hub()` function\n",
 "\n",
-"- Let's create **the model dictionnary that contains the hyperparameters and the Q_table**."
+"- Let's create **the model dictionary that contains the hyperparameters and the Q_table**."
 ]
 },
 {
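The model dictionary this cell refers to presumably bundles the training hyperparameters with the learned table, along the lines of the sketch below. The exact keys are assumptions; the diff only states "hyperparameters and the Q_table":

# All right-hand names (n_training_episodes, learning_rate, ..., Qtable_frozenlake)
# are assumed to be defined earlier in the notebook.
model = {
    "env_id": "FrozenLake-v1",
    "n_training_episodes": n_training_episodes,
    "learning_rate": learning_rate,
    "gamma": gamma,
    "max_epsilon": max_epsilon,
    "min_epsilon": min_epsilon,
    "decay_rate": decay_rate,
    "qtable": Qtable_frozenlake,
}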
@@ -1271,7 +1271,7 @@
 "- `repo_id`: the name of the Hugging Face Hub Repository that will be created/updated `\n",
 "(repo_id = {username}/{repo_name})`\n",
 "💡 **A good name is {username}/q-{env_id}**\n",
-"- `model`: our model dictionnary containing the hyperparameters and the Qtable.\n",
+"- `model`: our model dictionary containing the hyperparameters and the Qtable.\n",
 "- `env`: the environment.\n",
 "- `commit_message`: message of the commit"
 ]
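Putting the documented parameters together, the call presumably looks like this sketch. `package_to_hub` is defined inside the notebook itself, so the signature here is inferred from the bullets above, and the names are placeholders:

username = "your-hf-username"            # placeholder
env_id = "FrozenLake-v1-4x4-no_slippery" # placeholder following the naming tip

package_to_hub(
    repo_id=f"{username}/q-{env_id}",  # the suggested {username}/q-{env_id} pattern
    model=model,                       # the dictionary with hyperparameters and the Q-table
    env=env,                           # the environment the agent was trained on
    commit_message="Upload Q-Learning agent",
)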
@@ -1309,7 +1309,7 @@
 "id": "E2875IGsprzq"
 },
 "source": [
-"Congrats 🥳 you've just implented from scratch, trained and uploaded your first Reinforcement Learning agent. \n",
+"Congrats 🥳 you've just implemented from scratch, trained and uploaded your first Reinforcement Learning agent. \n",
 "FrozenLake-v1 no_slippery is very simple environment, let's try an harder one 🔥."
 ]
 },
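For reference, the "no_slippery" FrozenLake variant mentioned here is the deterministic one; a minimal sketch using the standard gym constructor flags (not shown in this diff):

import gym

# Deterministic ("no slippery") variant: actions always move as intended.
env = gym.make("FrozenLake-v1", map_name="4x4", is_slippery=False)
# The harder, stochastic version the text alludes to just flips the flag:
# env = gym.make("FrozenLake-v1", map_name="4x4", is_slippery=True)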
@@ -20,6 +20,8 @@
 "\n",
 "In this notebook, **you'll train a Deep Q-Learning agent** playing Space Invaders using [RL Baselines3 Zoo](https://github.com/DLR-RM/rl-baselines3-zoo), a training framework based on [Stable-Baselines3](https://stable-baselines3.readthedocs.io/en/master/) that provides scripts for training, evaluating agents, tuning hyperparameters, plotting results and recording videos.\n",
 "\n",
+"We're using the [RL-Baselines-3 Zoo integration, a vanilla version of Deep Q-Learning](https://stable-baselines3.readthedocs.io/en/master/modules/dqn.html) with no extensions such as Double-DQN, Dueling-DQN, and Prioritized Experience Replay.\n",
+"\n",
 "❓ If you have questions, please post them on #study-group-unit3 discord channel 👉 https://discord.gg/aYka4Yhff9\n",
 "\n",
 "🎮 Environments: \n",
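In case it helps to see what "vanilla Deep Q-Learning" means outside the Zoo's scripts, a minimal Stable-Baselines3 equivalent; the hyperparameters are illustrative, not the Zoo's tuned values:

from stable_baselines3 import DQN
from stable_baselines3.common.env_util import make_atari_env
from stable_baselines3.common.vec_env import VecFrameStack

# Standard Atari preprocessing plus 4-frame stacking.
env = make_atari_env("SpaceInvadersNoFrameskip-v4", n_envs=1, seed=0)
env = VecFrameStack(env, n_stack=4)

# Plain DQN: no Double-DQN, Dueling-DQN, or Prioritized Experience Replay.
model = DQN("CnnPolicy", env, buffer_size=100_000, learning_rate=1e-4, verbose=1)
model.learn(total_timesteps=100_000)  # illustrative; real runs need far more steps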
@@ -42,6 +42,7 @@ You can work directly **with the colab notebook, which allows you not to have to
 
 ## Additional readings 📚
+- [Foundations of Deep RL Series, L3 Policy Gradients and Advantage Estimation by Pieter Abbeel](https://youtu.be/AKbX1Zvo7r8)
 - [Policy Gradient Algorithms](https://lilianweng.github.io/posts/2018-04-08-policy-gradient/)
 - [An Intuitive Explanation of Policy Gradient](https://towardsdatascience.com/an-intuitive-explanation-of-policy-gradient-part-1-reinforce-aa4392cbfd3c)
 
 ## How to make the most of this course
@@ -190,7 +190,7 @@
 "\n",
 "- `gym`\n",
 "- `gym-games`: Extra gym environments made with PyGame.\n",
-"- `huggingface_hub`: 🤗 works as a central place where anyone can share and explore models and datasets. It has versioning, metrics, visualizations and other features that will allow you to easilly collaborate with others.\n",
+"- `huggingface_hub`: 🤗 works as a central place where anyone can share and explore models and datasets. It has versioning, metrics, visualizations and other features that will allow you to easily collaborate with others.\n",
 "\n",
 "You can see here all the Reinforce models available 👉 https://huggingface.co/models?other=reinforce\n",
 "\n",