Merge pull request #28 from sambitmukherjee/main

Added Unit 1 Special Content: Optuna Guide.
Thomas Simonini
2022-06-09 17:49:08 +02:00
committed by GitHub

@@ -0,0 +1,546 @@
{
"nbformat": 4,
"nbformat_minor": 0,
"metadata": {
"colab": {
"name": "Unit 1 Special Content: Optuna Guide.ipynb",
"provenance": [],
"collapsed_sections": []
},
"kernelspec": {
"name": "python3",
"display_name": "Python 3"
},
"language_info": {
"name": "python"
}
},
"cells": [
{
"cell_type": "markdown",
"source": [
"# Unit 1 Special Content: Optuna Guide"
],
"metadata": {
"id": "RpLWg64qubE9"
}
},
{
"cell_type": "markdown",
"source": [
"In this notebook, we shall see how to use Optuna to perform hyperparameter tuning of Unit 1's <a href=\"https://stable-baselines3.readthedocs.io/en/master/modules/ppo.html\" target=\"_blank\">`PPO`</a> model (created using Stable-Baselines3 for the `\"LunarLander-v2\"` Gym environment).\n",
"\n",
"Optuna is an open-source, automatic hyperparameter optimization framework. You can read more about it <a href=\"https://tech.preferred.jp/en/blog/optuna-release/\" target=\"_blank\">here</a>.\n",
"\n",
"**Prerequisite:** Before going through this notebook, you should have completed the <a href=\"https://colab.research.google.com/github/huggingface/deep-rl-class/blob/main/unit1/unit1.ipynb\" target=\"_blank\">Unit 1 hands-on</a>."
],
"metadata": {
"id": "cWpMuD7jrfze"
}
},
{
"cell_type": "markdown",
"source": [
"## Virtual Display Setup"
],
"metadata": {
"id": "InZX9_sRtfbO"
}
},
{
"cell_type": "markdown",
"source": [
"We'll need to generate a replay video. To do so in Colab, we need to have a virtual display to be able to render the environment (and thus record the frames).\n",
"\n",
"The following cell will install virtual display libraries."
],
"metadata": {
"id": "QvvCztQ5tZxq"
}
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "oUfu-vzL4g97"
},
"outputs": [],
"source": [
"!apt install python-opengl\n",
"!apt install ffmpeg\n",
"!apt install xvfb\n",
"!pip install pyvirtualdisplay"
]
},
{
"cell_type": "markdown",
"source": [
"Now, let's create & start a virtual display."
],
"metadata": {
"id": "UVt5deF2qorb"
}
},
{
"cell_type": "code",
"source": [
"from pyvirtualdisplay import Display\n",
"\n",
"virtual_display = Display(visible=0, size=(1400, 900))\n",
"virtual_display.start()"
],
"metadata": {
"id": "6E1inOK_5Jko"
},
"execution_count": null,
"outputs": []
},
{
"cell_type": "markdown",
"source": [
"## Dependencies, Imports & Gym Environments"
],
"metadata": {
"id": "B2sWJAj8ti1y"
}
},
{
"cell_type": "markdown",
"source": [
"Let's install all the other dependencies we'll need."
],
"metadata": {
"id": "_HVFbCW5tnyw"
}
},
{
"cell_type": "code",
"source": [
"!pip install gym[box2d]\n",
"!pip install stable-baselines3[extra]\n",
"!pip install pyglet\n",
"!pip install ale-py==0.7.4 # To overcome an issue with Gym (https://github.com/DLR-RM/stable-baselines3/issues/875)\n",
"!pip install optuna\n",
"!pip install huggingface_sb3"
],
"metadata": {
"id": "liGH8sAg5dk9"
},
"execution_count": null,
"outputs": []
},
{
"cell_type": "markdown",
"source": [
"Next, let's perform all the necessary imports."
],
"metadata": {
"id": "WWcb8nUVt-A9"
}
},
{
"cell_type": "code",
"source": [
"import gym\n",
"\n",
"from stable_baselines3.common.env_util import make_vec_env\n",
"from stable_baselines3.common.monitor import Monitor\n",
"from stable_baselines3 import PPO\n",
"from stable_baselines3.common.evaluation import evaluate_policy\n",
"from stable_baselines3.common.vec_env import DummyVecEnv\n",
"\n",
"import optuna\n",
"from optuna.samplers import TPESampler\n",
"\n",
"from huggingface_hub import notebook_login\n",
"from huggingface_sb3 import package_to_hub"
],
"metadata": {
"id": "DivjESMF5i7M"
},
"execution_count": null,
"outputs": []
},
{
"cell_type": "markdown",
"source": [
"Finally, let's create our Gym environments. The training environment is a vectorized environment:"
],
"metadata": {
"id": "uaXEISAluKU2"
}
},
{
"cell_type": "code",
"source": [
"env = make_vec_env(\"LunarLander-v2\", n_envs=16)\n",
"env"
],
"metadata": {
"id": "WTHPMs_s7YFc"
},
"execution_count": null,
"outputs": []
},
{
"cell_type": "markdown",
"source": [
"And the evaluation environment is a separate environment:"
],
"metadata": {
"id": "NTtoXSFiOR6f"
}
},
{
"cell_type": "code",
"source": [
"eval_env = Monitor(gym.make(\"LunarLander-v2\"))"
],
"metadata": {
"id": "ECDFf_6OOXqO"
},
"execution_count": null,
"outputs": []
},
{
"cell_type": "markdown",
"source": [
"We are now ready to dive into hyperparameter tuning!"
],
"metadata": {
"id": "sjyVRth8uSVm"
}
},
{
"cell_type": "markdown",
"source": [
"## Hyperparameter Tuning"
],
"metadata": {
"id": "kz0QGAOnust1"
}
},
{
"cell_type": "markdown",
"source": [
"First, let's define a `run_training()` function that trains a single model (using a particular combination of hyperparameter values) and returns the trained model together with a score.\n",
"\n",
"The score tells us how good the particular combination of hyperparameters is. (In our case, the score is `mean_reward - std_reward`, which is the metric used by the <a href=\"https://huggingface.co/spaces/chrisjay/Deep-Reinforcement-Learning-Leaderboard\" target=\"_blank\">leaderboard</a>.)\n",
"\n",
"The function takes a special argument, `params`, which is a dictionary. **The keys of this dictionary are the names of the hyperparameters we're tuning**, and **the values are sampled at each trial by Optuna's sampler** (from ranges that we'll specify soon).\n",
"\n",
"For example, in a particular trial, `params` might look like this:\n",
"\n",
"```\n",
"{'n_epochs': 5, 'gamma': 0.9926, 'total_timesteps': 559_621}\n",
"```\n",
"\n",
"And in another trial, `params` might look like this:\n",
"\n",
"```\n",
"{'n_epochs': 3, 'gamma': 0.9974, 'total_timesteps': 1_728_482}\n",
"```"
],
"metadata": {
"id": "02BT2bxVXEHj"
}
},
{
"cell_type": "code",
"source": [
"def run_training(params, verbose=0, save_model=False):\n",
"    model = PPO(\n",
"        policy='MlpPolicy',\n",
"        env=env,\n",
"        n_steps=1024,\n",
"        batch_size=64,\n",
"        n_epochs=params['n_epochs'],  # We're tuning this.\n",
"        gamma=params['gamma'],  # We're tuning this.\n",
"        gae_lambda=0.98,\n",
"        ent_coef=0.01,\n",
"        verbose=verbose\n",
"    )\n",
"    model.learn(total_timesteps=params['total_timesteps'])  # We're tuning this.\n",
"\n",
"    mean_reward, std_reward = evaluate_policy(model, eval_env, n_eval_episodes=50, deterministic=True)\n",
"    score = mean_reward - std_reward\n",
"\n",
"    if save_model:\n",
"        model.save(\"PPO-LunarLander-v2\")\n",
"\n",
"    return model, score"
],
"metadata": {
"id": "kpDGvnBS6t57"
},
"execution_count": null,
"outputs": []
},
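{
"cell_type": "markdown",
"source": [
"To see why the score subtracts the standard deviation, here is a minimal sketch that computes `mean_reward - std_reward` by hand. The reward lists are made-up numbers, `leaderboard_score()` is a hypothetical helper (not part of Stable-Baselines3), and the sketch assumes the population standard deviation, which is NumPy's default.\n",
"\n",
"```python\n",
"from statistics import mean, pstdev\n",
"\n",
"def leaderboard_score(episode_rewards):\n",
"    # Same formula as in run_training(): mean_reward - std_reward.\n",
"    # pstdev is the population standard deviation (an assumption, see above).\n",
"    return mean(episode_rewards) - pstdev(episode_rewards)\n",
"\n",
"# Two hypothetical agents with the same mean reward (200) over four episodes.\n",
"steady = [190, 210, 195, 205]\n",
"erratic = [100, 300, 50, 350]\n",
"\n",
"print(leaderboard_score(steady))\n",
"print(leaderboard_score(erratic))\n",
"```\n",
"\n",
"Both agents average 200, but the erratic one gets a much lower score, so the tuning favors hyperparameters that produce consistent landings."
],
"metadata": {}
},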
{
"cell_type": "markdown",
"source": [
"Next, we define another function - `objective()`. This function has a single parameter `trial`, which is an object of type `optuna.trial.Trial`. Using this `trial` object, we specify the ranges for the different hyperparameters we want to explore:\n",
"\n",
"- For `n_epochs`: We want to explore integer values between `3` and `5`.\n",
"- For `gamma`: We want to explore floating point values between `0.9900` and `0.9999` (drawn from a uniform distribution).\n",
"- For `total_timesteps`: We want to explore integer values between `500_000` and `2_000_000`.\n",
"\n",
"**Note:** If you have more time available, then you can tune other hyperparameters too. Moreover, you can explore wider ranges for each hyperparameter.\n",
"\n",
"The `trial.suggest_int()` and `trial.suggest_uniform()` methods tell Optuna to suggest hyperparameter values within the specified ranges. The suggested combination of values is then used to train a model and return the score. (In Optuna 3.0 and later, `suggest_uniform()` is deprecated in favor of `trial.suggest_float()`, which takes the same name and bounds.)"
],
"metadata": {
"id": "dORGHcVYdSKp"
}
},
{
"cell_type": "code",
"source": [
"def objective(trial):\n",
"    params = {\n",
"        \"n_epochs\": trial.suggest_int(\"n_epochs\", 3, 5),\n",
"        \"gamma\": trial.suggest_uniform(\"gamma\", 0.9900, 0.9999),\n",
"        \"total_timesteps\": trial.suggest_int(\"total_timesteps\", 500_000, 2_000_000)\n",
"    }\n",
"    model, score = run_training(params)\n",
"    return score"
],
"metadata": {
"id": "Wapg9hTI-AGz"
},
"execution_count": null,
"outputs": []
},
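{
"cell_type": "markdown",
"source": [
"To make the sampling step concrete, the sketch below draws one `params` dictionary per trial from the same three ranges using only Python's `random` module. It is a plain uniform random draw, comparable in spirit to a random sampler and not the TPE algorithm; `sample_params()` is a hypothetical helper, not an Optuna function.\n",
"\n",
"```python\n",
"import random\n",
"\n",
"def sample_params(rng):\n",
"    # Stand-in for Optuna's sampler: each value is drawn independently and uniformly.\n",
"    return {\n",
"        'n_epochs': rng.randint(3, 5),\n",
"        'gamma': rng.uniform(0.9900, 0.9999),\n",
"        'total_timesteps': rng.randint(500_000, 2_000_000),\n",
"    }\n",
"\n",
"rng = random.Random(0)  # seeded so that the draws are repeatable\n",
"for _ in range(3):\n",
"    print(sample_params(rng))\n",
"```\n",
"\n",
"`random.randint()` includes both end points, which matches the inclusive bounds of `trial.suggest_int()`."
],
"metadata": {}
},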
{
"cell_type": "markdown",
"source": [
"Finally, we use Optuna's `create_study()` function to create a study, passing in:\n",
"\n",
"- `sampler=TPESampler()`: This specifies that we want to employ a Bayesian optimization algorithm called Tree-structured Parzen Estimator. Other options are `GridSampler()`, `RandomSampler()`, etc. (The full list can be found <a href=\"https://optuna.readthedocs.io/en/stable/reference/samplers.html\" target=\"_blank\">here</a>.)\n",
"- `study_name=\"PPO-LunarLander-v2\"`: This is a name we give to the study (optional).\n",
"- `direction=\"maximize\"`: This is to specify that our objective is to maximize (not minimize) the score.\n",
"\n",
"Once our study is created, we call the `optimize()` method on it, specifying that we want to conduct `10` trials.\n",
"\n",
"**Note:** If you have more time available, then you can conduct more than `10` trials.\n",
"\n",
"**Warning:** The below code cell will take quite a bit of time to run!"
],
"metadata": {
"id": "a2gToNx7gPEG"
}
},
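{
"cell_type": "markdown",
"source": [
"A rough way to budget for the cell below is to count environment steps and gradient updates. The sketch assumes the settings used in `run_training()` (16 environments, `n_steps=1024`, `batch_size=64`) and assumes that training always finishes the rollout it has started; `training_cost()` is a hypothetical helper.\n",
"\n",
"```python\n",
"import math\n",
"\n",
"N_ENVS, N_STEPS, BATCH_SIZE = 16, 1024, 64  # values used in this notebook\n",
"ROLLOUT = N_ENVS * N_STEPS  # transitions collected before each PPO update\n",
"\n",
"def training_cost(total_timesteps, n_epochs):\n",
"    rollouts = math.ceil(total_timesteps / ROLLOUT)  # assumes whole rollouts only\n",
"    minibatches = ROLLOUT // BATCH_SIZE  # minibatches per epoch\n",
"    gradient_steps = rollouts * n_epochs * minibatches\n",
"    return rollouts, gradient_steps\n",
"\n",
"print(ROLLOUT)\n",
"print(training_cost(500_000, 3))    # cheapest trial in the search space\n",
"print(training_cost(2_000_000, 5))  # most expensive trial in the search space\n",
"```\n",
"\n",
"With `total_timesteps` between `500_000` and `2_000_000`, ten trials add up to somewhere between 5 and 20 million environment steps."
],
"metadata": {}
},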
{
"cell_type": "code",
"source": [
"study = optuna.create_study(sampler=TPESampler(), study_name=\"PPO-LunarLander-v2\", direction=\"maximize\")\n",
"study.optimize(objective, n_trials=10)"
],
"metadata": {
"id": "0v9G57g4-gl_"
},
"execution_count": null,
"outputs": []
},
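{
"cell_type": "markdown",
"source": [
"Conceptually, `study.optimize()` repeats a sample-train-score loop and remembers the best result. The sketch below imitates that loop with plain random search and a cheap made-up objective, so it runs in milliseconds. `toy_objective()` and `random_search()` are hypothetical names, and the real `TPESampler` chooses new values based on earlier trials instead of drawing them blindly.\n",
"\n",
"```python\n",
"import random\n",
"\n",
"def toy_objective(params):\n",
"    # Made-up stand-in for run_training(): best at gamma = 0.995 and n_epochs = 4.\n",
"    return -abs(params['gamma'] - 0.995) * 1000 - abs(params['n_epochs'] - 4)\n",
"\n",
"def random_search(objective, n_trials, seed=0):\n",
"    rng = random.Random(seed)\n",
"    best_score, best_params = float('-inf'), None\n",
"    for _ in range(n_trials):\n",
"        params = {'n_epochs': rng.randint(3, 5), 'gamma': rng.uniform(0.9900, 0.9999)}\n",
"        score = objective(params)\n",
"        if score > best_score:  # same role as direction='maximize'\n",
"            best_score, best_params = score, params\n",
"    return best_score, best_params\n",
"\n",
"best_score, best_params = random_search(toy_objective, n_trials=10)\n",
"print(best_score, best_params)\n",
"```\n",
"\n",
"Running more trials can only keep or improve the best score, which is why a larger `n_trials` helps when time allows."
],
"metadata": {}
},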
{
"cell_type": "markdown",
"source": [
"Now that all `10` trials have concluded, let's print out the score and hyperparameters of the best trial."
],
"metadata": {
"id": "9qRTUBAExSLt"
}
},
{
"cell_type": "code",
"source": [
"print(\"Best trial score:\", study.best_trial.value)\n",
"print(\"Best trial hyperparameters:\", study.best_trial.params)"
],
"metadata": {
"id": "N-JFso9Lvurh"
},
"execution_count": null,
"outputs": []
},
{
"cell_type": "markdown",
"source": [
"## Recreating & Saving The Best Model"
],
"metadata": {
"id": "qiSBZmZDux4q"
}
},
{
"cell_type": "markdown",
"source": [
"Let's recreate the best model and save it."
],
"metadata": {
"id": "SuZ4OSayyq36"
}
},
{
"cell_type": "code",
"source": [
"model, score = run_training(study.best_trial.params, verbose=1, save_model=True)"
],
"metadata": {
"id": "8JVUHgljIW1E"
},
"execution_count": null,
"outputs": []
},
{
"cell_type": "markdown",
"source": [
"## Pushing to Hugging Face Hub"
],
"metadata": {
"id": "0eja4YaFu5Vb"
}
},
{
"cell_type": "markdown",
"source": [
"To be able to share your model with the community, there are three more steps to follow:\n",
"\n",
"1. (If not done already) create a Hugging Face account -> https://huggingface.co/join\n",
"\n",
"2. Sign in, then get your authentication token from the Hugging Face website.\n",
"\n",
"- Create a new token (https://huggingface.co/settings/tokens) **with write role**.\n",
"- Copy the token.\n",
"- Run the cell below and paste the token."
],
"metadata": {
"id": "qDjEim2izuFi"
}
},
{
"cell_type": "code",
"source": [
"notebook_login()\n",
"!git config --global credential.helper store"
],
"metadata": {
"id": "msFVH2qhzwpE"
},
"execution_count": null,
"outputs": []
},
{
"cell_type": "markdown",
"source": [
"If you aren't using Google Colab or Jupyter Notebook, you need to use this command instead: `huggingface-cli login`\n",
"\n",
"3. We're now ready to push our trained agent to the Hub using the `package_to_hub()` function.\n",
"\n",
"Let's fill in the arguments of the `package_to_hub` function:\n",
"\n",
"- `model`: our trained model\n",
"\n",
"- `model_name`: the name of the trained model that we defined in `model.save()`\n",
"\n",
"- `model_architecture`: the model architecture we used (in our case `\"PPO\"`)\n",
"\n",
"- `env_id`: the name of the environment (in our case `\"LunarLander-v2\"`)\n",
"\n",
"- `eval_env`: the evaluation environment\n",
"\n",
"- `repo_id`: the name of the Hugging Face Hub repository that will be created/updated `(repo_id=\"{username}/{repo_name}\")` (**Note:** A good `repo_id` is `\"{username}/{model_architecture}-{env_id}\"`.)\n",
"\n",
"- `commit_message`: the commit message"
],
"metadata": {
"id": "lD4ACH160L_5"
}
},
{
"cell_type": "code",
"source": [
"model_name = \"PPO-LunarLander-v2\"\n",
"model_architecture = \"PPO\"\n",
"env_id = \"LunarLander-v2\"\n",
"eval_env = DummyVecEnv([lambda: gym.make(env_id)])\n",
"repo_id = \"Sadhaklal/PPO-LunarLander-v2\"  # Replace with your own \"{username}/{repo_name}\".\n",
"commit_message = \"Upload best PPO LunarLander-v2 agent (tuned with Optuna).\""
],
"metadata": {
"id": "8JkhrjAt0O6a"
},
"execution_count": null,
"outputs": []
},
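{
"cell_type": "markdown",
"source": [
"The naming convention suggested above can be assembled with an f-string. This is only a small sketch, and the `username` value is a placeholder, not a real account.\n",
"\n",
"```python\n",
"username = 'your-username'  # placeholder: replace with your own Hub username\n",
"model_architecture = 'PPO'\n",
"env_id = 'LunarLander-v2'\n",
"\n",
"# Follows the '{username}/{model_architecture}-{env_id}' pattern.\n",
"repo_id = f'{username}/{model_architecture}-{env_id}'\n",
"print(repo_id)\n",
"```"
],
"metadata": {}
},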
{
"cell_type": "markdown",
"source": [
"The following function call will evaluate the agent, record a replay, generate a model card, and push your agent to the Hub."
],
"metadata": {
"id": "3HIT-M2C1l4E"
}
},
{
"cell_type": "code",
"source": [
"package_to_hub(\n",
"    model=model,\n",
"    model_name=model_name,\n",
"    model_architecture=model_architecture,\n",
"    env_id=env_id,\n",
"    eval_env=eval_env,\n",
"    repo_id=repo_id,\n",
"    commit_message=commit_message\n",
")"
],
"metadata": {
"id": "1D0wQ_PU1mTN"
},
"execution_count": null,
"outputs": []
},
{
"cell_type": "markdown",
"source": [
"That's it! You now know how to perform hyperparameter tuning of Stable-Baselines3 models using Optuna.\n",
"\n",
"To get even better results, try tuning the other hyperparameters of your model."
],
"metadata": {
"id": "7jLc9JDY11EW"
}
},
{
"cell_type": "markdown",
"source": [
"## Final Tips"
],
"metadata": {
"id": "Fa4TtHLAtq7-"
}
},
{
"cell_type": "markdown",
"source": [
"1. Read the <a href=\"https://optuna.readthedocs.io/en/stable/index.html\" target=\"_blank\">Optuna documentation</a> to get more familiar with the library and its features.\n",
"2. You may have noticed that hyperparameter tuning is a time-consuming process. However, it can be sped up significantly using parallelization. Check out <a href=\"https://optuna.readthedocs.io/en/stable/tutorial/10_key_features/004_distributed.html\" target=\"_blank\">this guide</a> on how to do so."
],
"metadata": {
"id": "tEgxhSjf4loa"
}
}
]
}