mirror of
https://github.com/huggingface/deep-rl-class.git
synced 2026-03-30 08:40:27 +08:00
Confusing wording in self-play.mdx
@@ -37,7 +37,7 @@ The theory behind self-play is not something new. It was already used by Arthur
 
 Self-Play is integrated into the MLAgents library and is managed by multiple hyperparameters that we're going to study. But the main focus, as explained in the documentation, is the **tradeoff between the skill level and generality of the final policy and the stability of learning**.
 
-Training against a set of slowly changing or unchanging adversaries with low diversity **results in more stable training. But a risk to overfit if the change is too slow.**
+Training against a set of slowly changing or unchanging adversaries with low diversity **results in more stable training. But there is a risk of overfitting if the change is too slow.**
 
 So we need to control:
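For context, the hyperparameters the edited paragraph refers to live in the `self_play` section of an ML-Agents trainer configuration. The sketch below is illustrative only (behavior name and values are placeholders, not taken from this commit); parameters such as `window`, `save_steps`, and `team_change` govern the diversity and rate of change of the adversary pool, which is exactly the stability-vs-overfitting tradeoff the wording fix describes:

```yaml
behaviors:
  MyBehavior:            # hypothetical behavior name
    trainer_type: ppo
    self_play:
      save_steps: 50000  # steps between opponent-policy snapshots
      team_change: 200000  # steps before swapping the learning team
      swap_steps: 2000   # steps between opponent swaps during play
      window: 10         # size of the pool of past snapshots to sample opponents from
      play_against_latest_model_ratio: 0.5  # fraction of games vs. the newest policy
      initial_elo: 1200.0
```

A small `window` with large `save_steps` yields slowly changing, low-diversity adversaries (more stable training, higher overfitting risk); the opposite settings trade stability for generality.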