From c98bc5445248eb4547b34aa922059c3df577dae8 Mon Sep 17 00:00:00 2001
From: Andrii Roiko
Date: Sat, 6 Aug 2022 22:51:00 +0300
Subject: [PATCH 1/3] Fix typos

---
 unit1/quiz.md                  | 2 +-
 unit1/unit1.ipynb              | 4 ++--
 unit1/unit1_optuna_guide.ipynb | 2 +-
 unit2/unit2.ipynb              | 8 ++++----
 unit5/unit5.ipynb              | 2 +-
 5 files changed, 9 insertions(+), 9 deletions(-)

diff --git a/unit1/quiz.md b/unit1/quiz.md
index 74c5d37..000ed85 100644
--- a/unit1/quiz.md
+++ b/unit1/quiz.md
@@ -65,7 +65,7 @@ At every step:
 
 - *Episodic task* : we have a **starting point and an ending point (a terminal state)**. This creates an episode: a list of States, Actions, Rewards, and new States. For instance, think about Super Mario Bros: an episode begin at the launch of a new Mario Level and ending when you’re killed or you reached the end of the level.
 
-- *Continous task* : these are tasks that **continue forever (no terminal state)**. In this case, the agent must learn how to choose the best actions and simultaneously interact with the environment.
+- *Continuous task* : these are tasks that **continue forever (no terminal state)**. In this case, the agent must learn how to choose the best actions and simultaneously interact with the environment.
 
 Task
 
diff --git a/unit1/unit1.ipynb b/unit1/unit1.ipynb
index 1e141eb..09d1c54 100644
--- a/unit1/unit1.ipynb
+++ b/unit1/unit1.ipynb
@@ -269,7 +269,7 @@
     "One additional library we import is huggingface_hub **to be able to upload and download trained models from the hub**.\n",
     "\n",
     "\n",
-    "The Hugging Face Hub 🤗 works as a central place where anyone can share and explore models and datasets. It has versioning, metrics, visualizations and other features that will allow you to easilly collaborate with others.\n",
+    "The Hugging Face Hub 🤗 works as a central place where anyone can share and explore models and datasets. It has versioning, metrics, visualizations and other features that will allow you to easily collaborate with others.\n",
     "\n",
     "You can see here all the Deep reinforcement Learning models available 👉 https://huggingface.co/models?pipeline_tag=reinforcement-learning&sort=downloads\n",
     "\n"
@@ -1021,7 +1021,7 @@
     "# But Python 3.6, 3.7 use protocol 4\n",
     "# In order to get compatibility we need to:\n",
     "# 1. Install pickle5 (we done it at the beginning of the colab)\n",
-    "# 2. Create a custom empty object we pass as paramater to PPO.load()\n",
+    "# 2. Create a custom empty object we pass as parameter to PPO.load()\n",
     "custom_objects = {\n",
     "      \"learning_rate\": 0.0,\n",
     "      \"lr_schedule\": lambda _: 0.0,\n",
diff --git a/unit1/unit1_optuna_guide.ipynb b/unit1/unit1_optuna_guide.ipynb
index 419ea7c..209e9d6 100644
--- a/unit1/unit1_optuna_guide.ipynb
+++ b/unit1/unit1_optuna_guide.ipynb
@@ -318,7 +318,7 @@
     "\n",
     "- `sampler=TPESampler()`: This specifies that we want to employ a Bayesian optimization algorithm called Tree-structured Parzen Estimator. Other options are `GridSampler()`, `RandomSampler()`, etc. (The full list can be found here.)\n",
     "- `study_name=\"PPO-LunarLander-v2\"`: This is a name we give to the study (optional).\n",
-    "- `direction=\"maximize\"`: This is to specify that our objective is to maximize (not mimimize) the score.\n",
+    "- `direction=\"maximize\"`: This is to specify that our objective is to maximize (not minimize) the score.\n",
     "\n",
     "Once our study is created, we call the `optimize()` method on it, specifying that we want to conduct `10` trials.\n",
     "\n",
diff --git a/unit2/unit2.ipynb b/unit2/unit2.ipynb
index 497bfee..a5c947e 100644
--- a/unit2/unit2.ipynb
+++ b/unit2/unit2.ipynb
@@ -191,7 +191,7 @@
     "- `pygame`: Used for the FrozenLake-v1 and Taxi-v3 UI.\n",
     "- `numPy`: Used for handling our Q-table.\n",
     "\n",
-    "The Hugging Face Hub 🤗 works as a central place where anyone can share and explore models and datasets. It has versioning, metrics, visualizations and other features that will allow you to easilly collaborate with others.\n",
+    "The Hugging Face Hub 🤗 works as a central place where anyone can share and explore models and datasets. It has versioning, metrics, visualizations and other features that will allow you to easily collaborate with others.\n",
     "\n",
     "You can see here all the Deep reinforcement Learning models available 👉 https://huggingface.co/models?other=q-learning\n",
     "\n"
@@ -1233,7 +1233,7 @@
   "source": [
     "3️⃣ We're now ready to push our trained agent to the 🤗 Hub 🔥 using `package_to_hub()` function\n",
     "\n",
-    "- Let's create **the model dictionnary that contains the hyperparameters and the Q_table**."
+    "- Let's create **the model dictionary that contains the hyperparameters and the Q_table**."
   ]
  },
  {
@@ -1273,7 +1273,7 @@
     "- `repo_id`: the name of the Hugging Face Hub Repository that will be created/updated `\n",
     "(repo_id = {username}/{repo_name})`\n",
     "💡 **A good name is {username}/q-{env_id}**\n",
-    "- `model`: our model dictionnary containing the hyperparameters and the Qtable.\n",
+    "- `model`: our model dictionary containing the hyperparameters and the Qtable.\n",
     "- `env`: the environment.\n",
     "- `commit_message`: message of the commit"
@@ -1311,7 +1311,7 @@
    "id": "E2875IGsprzq"
   },
   "source": [
-    "Congrats 🥳 you've just implented from scratch, trained and uploaded your first Reinforcement Learning agent. \n",
+    "Congrats 🥳 you've just implemented from scratch, trained and uploaded your first Reinforcement Learning agent. \n",
     "FrozenLake-v1 no_slippery is very simple environment, let's try an harder one 🔥."
   ]
  },
diff --git a/unit5/unit5.ipynb b/unit5/unit5.ipynb
index ec2524a..bfed957 100644
--- a/unit5/unit5.ipynb
+++ b/unit5/unit5.ipynb
@@ -190,7 +190,7 @@
     "\n",
     "- `gym`\n",
     "- `gym-games`: Extra gym environments made with PyGame.\n",
-    "- `huggingface_hub`: 🤗 works as a central place where anyone can share and explore models and datasets. It has versioning, metrics, visualizations and other features that will allow you to easilly collaborate with others.\n",
+    "- `huggingface_hub`: 🤗 works as a central place where anyone can share and explore models and datasets. It has versioning, metrics, visualizations and other features that will allow you to easily collaborate with others.\n",
     "\n",
     "You can see here all the Reinforce models available 👉 https://huggingface.co/models?other=reinforce\n",
     "\n",

From ffd19903dc4f9a0336a5dc7adf626c182643fcb0 Mon Sep 17 00:00:00 2001
From: Thomas Simonini
Date: Tue, 9 Aug 2022 08:31:46 +0200
Subject: [PATCH 2/3] Update info about Double DQN

thanks chase for the feedback
---
 unit3/unit3.ipynb | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/unit3/unit3.ipynb b/unit3/unit3.ipynb
index 5999413..d353c5f 100644
--- a/unit3/unit3.ipynb
+++ b/unit3/unit3.ipynb
@@ -20,6 +20,8 @@
     "\n",
     "In this notebook, **you'll train a Deep Q-Learning agent** playing Space Invaders using [RL Baselines3 Zoo](https://github.com/DLR-RM/rl-baselines3-zoo), a training framework based on [Stable-Baselines3](https://stable-baselines3.readthedocs.io/en/master/) that provides scripts for training, evaluating agents, tuning hyperparameters, plotting results and recording videos.\n",
     "\n",
+    "We're using the [RL-Baselines-3 Zoo integration, a vanilla version of Deep Q-Learning](https://stable-baselines3.readthedocs.io/en/master/modules/dqn.html) with no extensions such as Double-DQN, Dueling-DQN, and Prioritized Experience Replay.\n",
+    "\n",
     "❓ If you have questions, please post them on #study-group-unit3 discord channel 👉 https://discord.gg/aYka4Yhff9\n",
     "\n",
     "🎮 Environments: \n",

From 40d3251a68f8a53358c4df38116d0ba0daaf6dfd Mon Sep 17 00:00:00 2001
From: Nikita Melkozerov
Date: Sun, 14 Aug 2022 19:15:20 +0200
Subject: [PATCH 3/3] Update additional resources for PG chapter.

---
 unit5/README.md | 1 +
 1 file changed, 1 insertion(+)

diff --git a/unit5/README.md b/unit5/README.md
index 419b0df..2b50dbe 100644
--- a/unit5/README.md
+++ b/unit5/README.md
@@ -42,6 +42,7 @@ You can work directly **with the colab notebook, which allows you not to have to
 ## Additional readings 📚
 - [Foundations of Deep RL Series, L3 Policy Gradients and Advantage Estimation by Pieter Abbeel](https://youtu.be/AKbX1Zvo7r8)
 - [Policy Gradient Algorithms](https://lilianweng.github.io/posts/2018-04-08-policy-gradient/)
+- [An Intuitive Explanation of Policy Gradient](https://towardsdatascience.com/an-intuitive-explanation-of-policy-gradient-part-1-reinforce-aa4392cbfd3c)
 
 ## How to make the most of this course
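
As a companion to the Optuna hunk in PATCH 1/3 above, here is a minimal sketch of the study setup those notebook lines describe. Only the `create_study()` arguments and the `optimize()` call mirror the guide's prose; the `objective` function below is a hypothetical placeholder (the actual notebook samples PPO hyperparameters, trains the agent on LunarLander-v2, and returns its mean evaluation reward):

```python
import optuna
from optuna.samplers import TPESampler

def objective(trial: optuna.Trial) -> float:
    # Hypothetical placeholder objective: sample one hyperparameter and
    # score it. The real guide samples PPO hyperparameters, trains the
    # agent, and returns its mean evaluation reward on LunarLander-v2.
    lr = trial.suggest_float("learning_rate", 1e-5, 1e-2, log=True)
    return -((lr - 1e-3) ** 2)

study = optuna.create_study(
    sampler=TPESampler(),             # Tree-structured Parzen Estimator (Bayesian optimization)
    study_name="PPO-LunarLander-v2",  # optional name for the study
    direction="maximize",             # maximize (not minimize) the score
)
study.optimize(objective, n_trials=10)  # conduct 10 trials
print(study.best_params)
```

Swapping the placeholder body for a real train-and-evaluate loop is the only change needed to reproduce the setup the guide walks through.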