From fea234e5e2f6b4ed468a8153a24ed49d0121b82e Mon Sep 17 00:00:00 2001 From: Sambit Mukherjee Date: Tue, 17 May 2022 16:10:07 +0530 Subject: [PATCH] Added Unit 1 Special Content: Optuna Guide. Wrote a notebook on how to use Optuna for hyperparameter tuning of Unit 1's PPO model for the "LunarLander-v2" Gym environment. --- unit1/unit1_optuna_guide.ipynb | 546 +++++++++++++++++++++++++++++++++ 1 file changed, 546 insertions(+) create mode 100644 unit1/unit1_optuna_guide.ipynb diff --git a/unit1/unit1_optuna_guide.ipynb b/unit1/unit1_optuna_guide.ipynb new file mode 100644 index 0000000..419ea7c --- /dev/null +++ b/unit1/unit1_optuna_guide.ipynb @@ -0,0 +1,546 @@ +{ + "nbformat": 4, + "nbformat_minor": 0, + "metadata": { + "colab": { + "name": "Unit 1 Special Content: Optuna Guide.ipynb", + "provenance": [], + "collapsed_sections": [] + }, + "kernelspec": { + "name": "python3", + "display_name": "Python 3" + }, + "language_info": { + "name": "python" + } + }, + "cells": [ + { + "cell_type": "markdown", + "source": [ + "# Unit 1 Special Content: Optuna Guide" + ], + "metadata": { + "id": "RpLWg64qubE9" + } + }, + { + "cell_type": "markdown", + "source": [ + "In this notebook, we shall see how to use Optuna to perform hyperparameter tuning of Unit 1's `PPO` model (created using Stable-Baselines3 for the `\"LunarLander-v2\"` Gym environment).\n", + "\n", + "Optuna is an open-source, automatic hyperparameter optimization framework. You can read more about it here.\n", + "\n", + "**Prerequisite:** Before going through this notebook, you should have completed the Unit 1 hands-on." + ], + "metadata": { + "id": "cWpMuD7jrfze" + } + }, + { + "cell_type": "markdown", + "source": [ + "## Virtual Display Setup" + ], + "metadata": { + "id": "InZX9_sRtfbO" + } + }, + { + "cell_type": "markdown", + "source": [ + "We'll need to generate a replay video. To do so in Colab, we need to have a virtual display to be able to render the environment (and thus record the frames).\n", + "\n", + "The following cell will install virtual display libraries." + ], + "metadata": { + "id": "QvvCztQ5tZxq" + } + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "oUfu-vzL4g97" + }, + "outputs": [], + "source": [ + "!apt install python-opengl\n", + "!apt install ffmpeg\n", + "!apt install xvfb\n", + "!pip install pyvirtualdisplay" + ] + }, + { + "cell_type": "markdown", + "source": [ + "Now, let's create & start a virtual display." + ], + "metadata": { + "id": "UVt5deF2qorb" + } + }, + { + "cell_type": "code", + "source": [ + "from pyvirtualdisplay import Display\n", + "\n", + "virtual_display = Display(visible=0, size=(1400, 900))\n", + "virtual_display.start()" + ], + "metadata": { + "id": "6E1inOK_5Jko" + }, + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "markdown", + "source": [ + "## Dependencies, Imports & Gym Environments" + ], + "metadata": { + "id": "B2sWJAj8ti1y" + } + }, + { + "cell_type": "markdown", + "source": [ + "Let's install all the other dependencies we'll need." + ], + "metadata": { + "id": "_HVFbCW5tnyw" + } + }, + { + "cell_type": "code", + "source": [ + "!pip install gym[box2d]\n", + "!pip install stable-baselines3[extra]\n", + "!pip install pyglet\n", + "!pip install ale-py==0.7.4 # To overcome an issue with Gym (https://github.com/DLR-RM/stable-baselines3/issues/875)\n", + "!pip install optuna\n", + "!pip install huggingface_sb3" + ], + "metadata": { + "id": "liGH8sAg5dk9" + }, + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "markdown", + "source": [ + "Next, let's perform all the necessary imports." + ], + "metadata": { + "id": "WWcb8nUVt-A9" + } + }, + { + "cell_type": "code", + "source": [ + "import gym\n", + "\n", + "from stable_baselines3.common.env_util import make_vec_env\n", + "from stable_baselines3.common.monitor import Monitor\n", + "from stable_baselines3 import PPO\n", + "from stable_baselines3.common.evaluation import evaluate_policy\n", + "from stable_baselines3.common.vec_env import DummyVecEnv\n", + "\n", + "import optuna\n", + "from optuna.samplers import TPESampler\n", + "\n", + "from huggingface_hub import notebook_login\n", + "from huggingface_sb3 import package_to_hub" + ], + "metadata": { + "id": "DivjESMF5i7M" + }, + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "markdown", + "source": [ + "Finally, let's create our Gym environments. The training environment is a vectorized environment:" + ], + "metadata": { + "id": "uaXEISAluKU2" + } + }, + { + "cell_type": "code", + "source": [ + "env = make_vec_env(\"LunarLander-v2\", n_envs=16)\n", + "env" + ], + "metadata": { + "id": "WTHPMs_s7YFc" + }, + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "markdown", + "source": [ + "And the evaluation environment is a separate environment:" + ], + "metadata": { + "id": "NTtoXSFiOR6f" + } + }, + { + "cell_type": "code", + "source": [ + "eval_env = Monitor(gym.make(\"LunarLander-v2\"))" + ], + "metadata": { + "id": "ECDFf_6OOXqO" + }, + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "markdown", + "source": [ + "We are now ready to dive into hyperparameter tuning!" + ], + "metadata": { + "id": "sjyVRth8uSVm" + } + }, + { + "cell_type": "markdown", + "source": [ + "## Hyperparameter Tuning" + ], + "metadata": { + "id": "kz0QGAOnust1" + } + }, + { + "cell_type": "markdown", + "source": [ + "First, let's define a `run_training()` function that trains a single model (using a particular combination of hyperparameter values), and returns a score. \n", + "\n", + "The score tells us how good the particular combination of hyperparameters is. (In our case, the score is `mean_reward - std_reward`, which is being used in the leaderboard.) \n", + "\n", + "The function takes a very special argument - `params`, which is a dictionary. **The keys of this dictionary are the names of the hyperparameters we're tuning**, and **the values are sampled at each trial by Optuna's sampler** (from ranges that we'll specify soon).\n", + "\n", + "For example, in a particular trial, `params` might look like this:\n", + "\n", + "```\n", + "{'n_epochs': 5, 'gamma': 0.9926, 'total_timesteps': 559_621}\n", + "```\n", + "\n", + "And in another trial, `params` might look like this:\n", + "\n", + "```\n", + "{'n_epochs': 3, 'gamma': 0.9974, 'total_timesteps': 1_728_482}\n", + "```" + ], + "metadata": { + "id": "02BT2bxVXEHj" + } + }, + { + "cell_type": "code", + "source": [ + "def run_training(params, verbose=0, save_model=False):\n", + " model = PPO(\n", + " policy='MlpPolicy', \n", + " env=env, \n", + " n_steps=1024,\n", + " batch_size=64, \n", + " n_epochs=params['n_epochs'], # We're tuning this.\n", + " gamma=params['gamma'], # We're tuning this.\n", + " gae_lambda=0.98, \n", + " ent_coef=0.01, \n", + " verbose=verbose\n", + " )\n", + " model.learn(total_timesteps=params['total_timesteps']) # We're tuning this.\n", + "\n", + " mean_reward, std_reward = evaluate_policy(model, eval_env, n_eval_episodes=50, deterministic=True)\n", + " score = mean_reward - std_reward\n", + "\n", + " if save_model:\n", + " model.save(\"PPO-LunarLander-v2\")\n", + "\n", + " return model, score" + ], + "metadata": { + "id": "kpDGvnBS6t57" + }, + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "markdown", + "source": [ + "Next, we define another function - `objective()`. This function has a single parameter `trial`, which is an object of type `optuna.trial.Trial`. Using this `trial` object, we specify the ranges for the different hyperparameters we want to explore:\n", + "\n", + "- For `n_epochs`: We want to explore integer values between `3` and `5`.\n", + "- For `gamma`: We want to explore floating point values between `0.9900` and `0.9999` (drawn from a uniform distribution).\n", + "- For `total_timesteps`: We want to explore integer values between `500_000` and `2_000_000`.\n", + "\n", + "**Note:** If you have more time available, then you can tune other hyperparameters too. Moreover, you can explore wider ranges for each hyperparameter.\n", + "\n", + "The `trial.suggest_int()` and `trial.suggest_uniform()` methods are used by Optuna to suggest hyperparamter values in the ranges specified. The suggested combination of values are then used to train a model and return the score." + ], + "metadata": { + "id": "dORGHcVYdSKp" + } + }, + { + "cell_type": "code", + "source": [ + "def objective(trial):\n", + " params = {\n", + " \"n_epochs\": trial.suggest_int(\"n_epochs\", 3, 5), \n", + " \"gamma\": trial.suggest_uniform(\"gamma\", 0.9900, 0.9999), \n", + " \"total_timesteps\": trial.suggest_int(\"total_timesteps\", 500_000, 2_000_000)\n", + " }\n", + " model, score = run_training(params)\n", + " return score" + ], + "metadata": { + "id": "Wapg9hTI-AGz" + }, + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "markdown", + "source": [ + "Finally, we use Optuna's `create_study()` function to create a study, passing in:\n", + "\n", + "- `sampler=TPESampler()`: This specifies that we want to employ a Bayesian optimization algorithm called Tree-structured Parzen Estimator. Other options are `GridSampler()`, `RandomSampler()`, etc. (The full list can be found here.)\n", + "- `study_name=\"PPO-LunarLander-v2\"`: This is a name we give to the study (optional).\n", + "- `direction=\"maximize\"`: This is to specify that our objective is to maximize (not mimimize) the score.\n", + "\n", + "Once our study is created, we call the `optimize()` method on it, specifying that we want to conduct `10` trials.\n", + "\n", + "**Note:** If you have more time available, then you can conduct more than `10` trials.\n", + "\n", + "**Warning:** The below code cell will take quite a bit of time to run!" + ], + "metadata": { + "id": "a2gToNx7gPEG" + } + }, + { + "cell_type": "code", + "source": [ + "study = optuna.create_study(sampler=TPESampler(), study_name=\"PPO-LunarLander-v2\", direction=\"maximize\")\n", + "study.optimize(objective, n_trials=10)" + ], + "metadata": { + "id": "0v9G57g4-gl_" + }, + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "markdown", + "source": [ + "Now that all the `10` trials have concluded, let's print out the score and hyperparameters of the best trial." + ], + "metadata": { + "id": "9qRTUBAExSLt" + } + }, + { + "cell_type": "code", + "source": [ + "print(\"Best trial score:\", study.best_trial.values)\n", + "print(\"Best trial hyperparameters:\", study.best_trial.params)" + ], + "metadata": { + "id": "N-JFso9Lvurh" + }, + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "markdown", + "source": [ + "## Recreating & Saving The Best Model" + ], + "metadata": { + "id": "qiSBZmZDux4q" + } + }, + { + "cell_type": "markdown", + "source": [ + "Let's recreate the best model and save it." + ], + "metadata": { + "id": "SuZ4OSayyq36" + } + }, + { + "cell_type": "code", + "source": [ + "model, score = run_training(study.best_trial.params, verbose=1, save_model=True)" + ], + "metadata": { + "id": "8JVUHgljIW1E" + }, + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "markdown", + "source": [ + "## Pushing to Hugging Face Hub" + ], + "metadata": { + "id": "0eja4YaFu5Vb" + } + }, + { + "cell_type": "markdown", + "source": [ + "To be able to share your model with the community, there are three more steps to follow:\n", + "\n", + "1. (If not done already) create a Hugging Face account -> https://huggingface.co/join\n", + "\n", + "2. Sign in and then, get your authentication token from the Hugging Face website.\n", + "\n", + "- Create a new token (https://huggingface.co/settings/tokens) **with write role**.\n", + "- Copy the token.\n", + "- Run the cell below and paste the token." + ], + "metadata": { + "id": "qDjEim2izuFi" + } + }, + { + "cell_type": "code", + "source": [ + "notebook_login()\n", + "!git config --global credential.helper store" + ], + "metadata": { + "id": "msFVH2qhzwpE" + }, + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "markdown", + "source": [ + "If you aren't using Google Colab or Jupyter Notebook, you need to use this command instead: `huggingface-cli login`\n", + "\n", + "3. We're now ready to push our trained agent to the Hub using the `package_to_hub()` function.\n", + "\n", + "Let's fill in the arguments of the `package_to_hub` function:\n", + "\n", + "- `model`: our trained model\n", + "\n", + "- `model_name`: the name of the trained model that we defined in `model.save()`\n", + "\n", + "- `model_architecture`: the model architecture we used (in our case `\"PPO\"`)\n", + "\n", + "- `env_id`: the name of the environment (in our case `\"LunarLander-v2\"`)\n", + "\n", + "- `eval_env`: the evaluation environment\n", + "\n", + "- `repo_id`: the name of the Hugging Face Hub repository that will be created/updated `(repo_id=\"{username}/{repo_name}\")` (**Note:** A good `repo_id` is `\"{username}/{model_architecture}-{env_id}\"`.)\n", + "\n", + "- `commit_message`: the commit message" + ], + "metadata": { + "id": "lD4ACH160L_5" + } + }, + { + "cell_type": "code", + "source": [ + "model_name = \"PPO-LunarLander-v2\"\n", + "model_architecture = \"PPO\"\n", + "env_id = \"LunarLander-v2\"\n", + "eval_env = DummyVecEnv([lambda: gym.make(env_id)])\n", + "repo_id = \"Sadhaklal/PPO-LunarLander-v2\"\n", + "commit_message = \"Upload best PPO LunarLander-v2 agent (tuned with Optuna).\"" + ], + "metadata": { + "id": "8JkhrjAt0O6a" + }, + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "markdown", + "source": [ + "The following function call will evaluate the agent, record a replay, generate a model card, and push your agent to the Hub." + ], + "metadata": { + "id": "3HIT-M2C1l4E" + } + }, + { + "cell_type": "code", + "source": [ + "package_to_hub(\n", + " model=model, \n", + " model_name=model_name, \n", + " model_architecture=model_architecture, \n", + " env_id=env_id, \n", + " eval_env=eval_env, \n", + " repo_id=repo_id, \n", + " commit_message=commit_message\n", + ")" + ], + "metadata": { + "id": "1D0wQ_PU1mTN" + }, + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "markdown", + "source": [ + "That's it! You now know how to perform hyperparameter tuning of Stable-Baselines3 models using Optuna.\n", + "\n", + "To get even better results, try tuning the other hyperparameters of your model." + ], + "metadata": { + "id": "7jLc9JDY11EW" + } + }, + { + "cell_type": "markdown", + "source": [ + "## Final Tips" + ], + "metadata": { + "id": "Fa4TtHLAtq7-" + } + }, + { + "cell_type": "markdown", + "source": [ + "1. Read the Optuna documentation to get more familiar with the library and its features.\n", + "2. You may have noticed that hyperparameter tuning is a time consuming process. However, it can be sped up significantly using parallelization. Check out this guide on how to do so." + ], + "metadata": { + "id": "tEgxhSjf4loa" + } + }, + { + "cell_type": "code", + "source": [ + "" + ], + "metadata": { + "id": "C7CJ7x7n4HuA" + }, + "execution_count": null, + "outputs": [] + } + ] +} \ No newline at end of file