Merge branch 'huggingface:main' into main

This commit is contained in:
Chase Lambert
2022-08-21 12:49:05 -07:00
committed by GitHub
7 changed files with 12 additions and 9 deletions

View File

@@ -65,7 +65,7 @@ At every step:
- *Episodic task* : we have a **starting point and an ending point (a terminal state)**. This creates an episode: a list of States, Actions, Rewards, and new States. For instance, think about Super Mario Bros: an episode begins at the launch of a new Mario level and ends when you're killed or you reach the end of the level.
- *Continous task* : these are tasks that **continue forever (no terminal state)**. In this case, the agent must learn how to choose the best actions and simultaneously interact with the environment.
- *Continuous task* : these are tasks that **continue forever (no terminal state)**. In this case, the agent must learn how to choose the best actions and simultaneously interact with the environment.
<img src="assets/img/tasks.jpg" alt="Task"/>
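To make the episodic case concrete, here is a minimal sketch (assuming the classic Gym API, pre-0.26, and using CartPole-v1 purely as an illustration) of collecting one episode as lists of states, actions, and rewards; in a continuing task the `done` flag would never be set and the loop would run forever:

```python
import gym

# Minimal sketch of one *episode* (classic Gym API, pre-0.26):
# the interaction stops when a terminal state is reached.
env = gym.make("CartPole-v1")

states, actions, rewards = [], [], []
state = env.reset()
done = False
while not done:
    action = env.action_space.sample()                 # random policy, just for illustration
    next_state, reward, done, info = env.step(action)
    states.append(state)
    actions.append(action)
    rewards.append(reward)
    state = next_state

# In a *continuing* task there is no terminal state: `done` would never become True,
# so the agent has to learn while interacting with the environment forever.
```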

View File

@@ -269,7 +269,7 @@
"One additional library we import is huggingface_hub **to be able to upload and download trained models from the hub**.\n",
"\n",
"\n",
"The Hugging Face Hub 🤗 works as a central place where anyone can share and explore models and datasets. It has versioning, metrics, visualizations and other features that will allow you to easilly collaborate with others.\n",
"The Hugging Face Hub 🤗 works as a central place where anyone can share and explore models and datasets. It has versioning, metrics, visualizations and other features that will allow you to easily collaborate with others.\n",
"\n",
"You can see here all the Deep reinforcement Learning models available 👉 https://huggingface.co/models?pipeline_tag=reinforcement-learning&sort=downloads\n",
"\n"
@@ -1021,7 +1021,7 @@
"# But Python 3.6, 3.7 use protocol 4\n",
"# In order to get compatibility we need to:\n",
"# 1. Install pickle5 (we done it at the beginning of the colab)\n",
"# 2. Create a custom empty object we pass as paramater to PPO.load()\n",
"# 2. Create a custom empty object we pass as parameter to PPO.load()\n",
"custom_objects = {\n",
" \"learning_rate\": 0.0,\n",
" \"lr_schedule\": lambda _: 0.0,\n",

View File

@@ -318,7 +318,7 @@
"\n",
"- `sampler=TPESampler()`: This specifies that we want to employ a Bayesian optimization algorithm called Tree-structured Parzen Estimator. Other options are `GridSampler()`, `RandomSampler()`, etc. (The full list can be found <a href=\"https://optuna.readthedocs.io/en/stable/reference/samplers.html\" target=\"_blank\">here</a>.)\n",
"- `study_name=\"PPO-LunarLander-v2\"`: This is a name we give to the study (optional).\n",
"- `direction=\"maximize\"`: This is to specify that our objective is to maximize (not mimimize) the score.\n",
"- `direction=\"maximize\"`: This is to specify that our objective is to maximize (not minimize) the score.\n",
"\n",
"Once our study is created, we call the `optimize()` method on it, specifying that we want to conduct `10` trials.\n",
"\n",

View File

@@ -191,7 +191,7 @@
"- `pygame`: Used for the FrozenLake-v1 and Taxi-v3 UI.\n",
"- `numPy`: Used for handling our Q-table.\n",
"\n",
"The Hugging Face Hub 🤗 works as a central place where anyone can share and explore models and datasets. It has versioning, metrics, visualizations and other features that will allow you to easilly collaborate with others.\n",
"The Hugging Face Hub 🤗 works as a central place where anyone can share and explore models and datasets. It has versioning, metrics, visualizations and other features that will allow you to easily collaborate with others.\n",
"\n",
"You can see here all the Deep reinforcement Learning models available 👉 https://huggingface.co/models?other=q-learning\n",
"\n"
@@ -1232,7 +1232,7 @@
"source": [
"3⃣ We're now ready to push our trained agent to the 🤗 Hub 🔥 using `package_to_hub()` function\n",
"\n",
"- Let's create **the model dictionnary that contains the hyperparameters and the Q_table**."
"- Let's create **the model dictionary that contains the hyperparameters and the Q_table**."
]
},
{
@@ -1271,7 +1271,7 @@
"- `repo_id`: the name of the Hugging Face Hub Repository that will be created/updated `\n",
"(repo_id = {username}/{repo_name})`\n",
"💡 **A good name is {username}/q-{env_id}**\n",
"- `model`: our model dictionnary containing the hyperparameters and the Qtable.\n",
"- `model`: our model dictionary containing the hyperparameters and the Qtable.\n",
"- `env`: the environment.\n",
"- `commit_message`: message of the commit"
]
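Putting the dictionary and the call together, a hedged sketch: `package_to_hub`, `env`, and `Qtable` come from earlier cells of the notebook, the username in `repo_id` is a placeholder, and the exact hyperparameter keys are the ones defined in this notebook rather than the illustrative ones below.

```python
# Illustrative sketch only: the real keys/values are the hyperparameters defined
# earlier in the notebook, and `package_to_hub`, `env` and `Qtable` come from
# previous cells. The username in repo_id is a placeholder.
model = {
    "env_id": "FrozenLake-v1",
    "learning_rate": 0.7,
    "gamma": 0.95,
    "n_training_episodes": 10000,
    "qtable": Qtable,  # the NumPy array learned during training
}

package_to_hub(
    repo_id="username/q-FrozenLake-v1-4x4-noSlippery",  # {username}/q-{env_id}
    model=model,
    env=env,
    commit_message="Upload Q-Learning FrozenLake-v1 agent",
)
```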
@@ -1309,7 +1309,7 @@
"id": "E2875IGsprzq"
},
"source": [
"Congrats 🥳 you've just implented from scratch, trained and uploaded your first Reinforcement Learning agent. \n",
"Congrats 🥳 you've just implemented from scratch, trained and uploaded your first Reinforcement Learning agent. \n",
"FrozenLake-v1 no_slippery is very simple environment, let's try an harder one 🔥."
]
},

View File

@@ -20,6 +20,8 @@
"\n",
"In this notebook, **you'll train a Deep Q-Learning agent** playing Space Invaders using [RL Baselines3 Zoo](https://github.com/DLR-RM/rl-baselines3-zoo), a training framework based on [Stable-Baselines3](https://stable-baselines3.readthedocs.io/en/master/) that provides scripts for training, evaluating agents, tuning hyperparameters, plotting results and recording videos.\n",
"\n",
"We're using the [RL-Baselines-3 Zoo integration, a vanilla version of Deep Q-Learning](https://stable-baselines3.readthedocs.io/en/master/modules/dqn.html) with no extensions such as Double-DQN, Dueling-DQN, and Prioritized Experience Replay.\n",
"\n",
"❓ If you have questions, please post them on #study-group-unit3 discord channel 👉 https://discord.gg/aYka4Yhff9\n",
"\n",
"🎮 Environments: \n",

View File

@@ -42,6 +42,7 @@ You can work directly **with the colab notebook, which allows you not to have to
## Additional readings 📚
- [Foundations of Deep RL Series, L3 Policy Gradients and Advantage Estimation by Pieter Abbeel](https://youtu.be/AKbX1Zvo7r8)
- [Policy Gradient Algorithms](https://lilianweng.github.io/posts/2018-04-08-policy-gradient/)
- [An Intuitive Explanation of Policy Gradient](https://towardsdatascience.com/an-intuitive-explanation-of-policy-gradient-part-1-reinforce-aa4392cbfd3c)
## How to make the most of this course

View File

@@ -190,7 +190,7 @@
"\n",
"- `gym`\n",
"- `gym-games`: Extra gym environments made with PyGame.\n",
"- `huggingface_hub`: 🤗 works as a central place where anyone can share and explore models and datasets. It has versioning, metrics, visualizations and other features that will allow you to easilly collaborate with others.\n",
"- `huggingface_hub`: 🤗 works as a central place where anyone can share and explore models and datasets. It has versioning, metrics, visualizations and other features that will allow you to easily collaborate with others.\n",
"\n",
"You can see here all the Reinforce models available 👉 https://huggingface.co/models?other=reinforce\n",
"\n",