mirror of
https://github.com/huggingface/deep-rl-class.git
synced 2026-02-03 10:24:53 +08:00
Merge pull request #90 from ryanrussell/unit1-readability
docs: unit1 readability fixups
@@ -345,7 +345,7 @@
 "3️⃣ Get an action using our model (in our example we take a random action)\n",
 "\n",
 "4️⃣ Using `env.step(action)`, we perform this action in the environment and get\n",
-"- `obsevation`: The new state (st+1)\n",
+"- `observation`: The new state (st+1)\n",
 "- `reward`: The reward we get after executing the action\n",
 "- `done`: Indicates if the episode terminated\n",
 "- `info`: A dictionary that provides additional information (depends on the environment).\n",
@@ -131,7 +131,7 @@
 }
 ],
 "source": [
-"# Initilaize wandb\n",
+"# Initialize wandb\n",
 "# https://docs.wandb.ai/guides/integrations/other/stable-baselines-3\n",
 "\n",
 "config = {\n",
@@ -182,7 +182,7 @@
 ],
 "source": [
 "env = make_vec_env('LunarLander-v2', n_envs=16)\n",
-"# Use the folling line with caution. The video recorder will try to render the agent on the screen, so that ffmpeg can caputre it. Here, we have 16 envs set. Trying to render 16 envs on screen will\n",
+"# Use the following line with caution. The video recorder will try to render the agent on the screen, so that ffmpeg can capture it. Here, we have 16 envs set. Trying to render 16 envs on screen will\n",
 "# be pretty resource intensive. \n",
 "# env = VecVideoRecorder(env, f\"videos/{run.id}\", record_video_trigger=lambda x: x % 2000 == 0, video_length=200) # Set the video recorder, to record our agent during training\n",
 "\n",
@@ -287,7 +287,7 @@
 "\n",
 "**Note:** If you have more time available, then you can tune other hyperparameters too. Moreover, you can explore wider ranges for each hyperparameter.\n",
 "\n",
-"The `trial.suggest_int()` and `trial.suggest_uniform()` methods are used by Optuna to suggest hyperparamter values in the ranges specified. The suggested combination of values are then used to train a model and return the score."
+"The `trial.suggest_int()` and `trial.suggest_uniform()` methods are used by Optuna to suggest hyperparameter values in the ranges specified. The suggested combination of values are then used to train a model and return the score."
 ],
 "metadata": {
 "id": "dORGHcVYdSKp"
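For readers of the first hunk, the `env.step(action)` contract it describes (returning `observation`, `reward`, `done`, `info`) can be sketched as below. This is only an illustration under stated assumptions: `ToyEnv`, `sample_action`, and `run_episode` are hypothetical names invented here and are not part of the course notebook or of Gym. The four-value return follows the notebook text; newer Gym and Gymnasium releases return five values (`terminated` and `truncated` replace `done`).

```python
import random


class ToyEnv:
    """Hypothetical stand-in for a Gym-style environment (not from the course)."""

    def __init__(self, max_steps=5):
        self.max_steps = max_steps
        self.t = 0

    def reset(self):
        # Start a new episode and return the initial observation.
        self.t = 0
        return 0

    def sample_action(self):
        # Random policy, as in the notebook's "we take a random action".
        return random.choice([0, 1])

    def step(self, action):
        self.t += 1
        observation = self.t                  # the new state (s_{t+1})
        reward = 1.0 if action == 1 else 0.0  # reward for the executed action
        done = self.t >= self.max_steps       # whether the episode terminated
        info = {"step": self.t}               # extra, environment-specific data
        return observation, reward, done, info


def run_episode(env):
    """Run one episode with random actions; return summary values."""
    observation = env.reset()
    total_reward, steps, done = 0.0, 0, False
    info = {}
    while not done:
        action = env.sample_action()
        observation, reward, done, info = env.step(action)
        total_reward += reward
        steps += 1
    return steps, total_reward, observation, info
```

Calling `run_episode(ToyEnv(max_steps=5))` steps the toy environment until `done` is true; the total reward varies from run to run because the actions are random.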