# Model-Based Reinforcement Learning (MBRL)

Model-based reinforcement learning differs from its model-free counterpart only in that it learns a *dynamics model*, but this has substantial downstream effects on how decisions are made.

The dynamics model usually captures the environment's transition dynamics, \\( s_{t+1} = f_\theta (s_t, a_t) \\), but other learned models can also be used in this framework, such as inverse dynamics models (which predict the action \\( a_t \\) that takes the environment from \\( s_t \\) to \\( s_{t+1} \\)) or reward models (which predict the reward \\( r(s_t, a_t) \\)).
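To make the difference between these model types concrete, here is a minimal sketch on a hypothetical one-dimensional system. The names `LinearForwardModel`, `LinearInverseModel`, and `reward_model`, and the linear parameterization, are illustrative assumptions rather than part of any library. In practice each model would be learned (for example, a neural network), and an inverse model would usually be trained on its own instead of being derived from the forward model as it is here.

```python
from dataclasses import dataclass


@dataclass
class LinearForwardModel:
    """Hypothetical forward dynamics model f_theta(s_t, a_t) -> s_{t+1}."""
    w_s: float
    w_a: float

    def predict(self, s: float, a: float) -> float:
        return self.w_s * s + self.w_a * a


@dataclass
class LinearInverseModel:
    """Hypothetical inverse dynamics model g(s_t, s_{t+1}) -> a_t.

    Obtained here by solving the forward model for the action (requires w_a != 0);
    a real inverse model would typically be learned from data.
    """
    forward: LinearForwardModel

    def predict(self, s: float, s_next: float) -> float:
        return (s_next - self.forward.w_s * s) / self.forward.w_a


def reward_model(s: float, a: float) -> float:
    """Hypothetical reward model r(s_t, a_t): stay near 0 with little control effort."""
    return -(s ** 2) - 0.1 * a ** 2


f = LinearForwardModel(w_s=1.0, w_a=0.5)
g = LinearInverseModel(forward=f)

s, a = 2.0, -1.0
s_next = f.predict(s, a)            # forward: (state, action) -> next state
a_recovered = g.predict(s, s_next)  # inverse: (state, next state) -> action
r = reward_model(s, a)              # reward: (state, action) -> scalar
```

The three models differ only in their inputs and outputs, which is what determines how each one can be used for decision-making.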

## Simple definition

- There is an agent that repeatedly tries to solve a problem, **accumulating state and action data**.
- With that data, the agent creates a structured learning tool, *a dynamics model*, to reason about the world.
- With the dynamics model, the agent **decides how to act by predicting the future**.
- With those actions, **the agent collects more data, improves said model, and hopefully improves future actions**.
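This loop can be sketched in a few lines of Python. It is a minimal illustration under stated assumptions: `env_step` is a hypothetical one-dimensional linear environment that the agent does not know, the dynamics model is an ordinary least-squares linear fit (a simple stand-in for a neural network), and `plan` greedily picks the action whose predicted next state is closest to a goal of 0 (a one-step stand-in for the multi-step planners used in practice). All function names are hypothetical.

```python
import random


def env_step(s: float, a: float, rng: random.Random) -> float:
    """Hypothetical true environment (unknown to the agent): s' = 0.8 s + 0.5 a + noise."""
    return 0.8 * s + 0.5 * a + rng.gauss(0.0, 0.01)


def fit_linear_model(data):
    """Fit s' ~ w_s * s + w_a * a by least squares (2x2 normal equations, Cramer's rule)."""
    s_s = sum(s * s for s, _, _ in data)
    a_a = sum(a * a for _, a, _ in data)
    s_a = sum(s * a for s, a, _ in data)
    s_y = sum(s * sn for s, _, sn in data)
    a_y = sum(a * sn for _, a, sn in data)
    det = s_s * a_a - s_a * s_a
    w_s = (s_y * a_a - a_y * s_a) / det
    w_a = (a_y * s_s - s_y * s_a) / det
    return lambda s, a: w_s * s + w_a * a


def plan(model, s: float, candidates):
    """Greedy one-step planning: choose the action whose predicted next state is nearest 0."""
    return min(candidates, key=lambda a: model(s, a) ** 2)


rng = random.Random(0)
candidates = [i / 10 for i in range(-10, 11)]  # discretized actions in [-1, 1]
data = []      # accumulated (s, a, s') transitions
model = None   # no dynamics model before the first episode

for episode in range(5):
    s = rng.uniform(-2.0, 2.0)
    for _ in range(10):
        # Act randomly until a model exists, then decide by predicting the future.
        a = rng.choice(candidates) if model is None else plan(model, s, candidates)
        s_next = env_step(s, a, rng)
        data.append((s, a, s_next))
        s = s_next
    # Improve the model with all data collected so far.
    model = fit_linear_model(data)
```

The first episode uses random actions only to bootstrap the dataset; every later episode acts through the model and adds its transitions back into the data used for the next fit.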

## Academic definition

Model-based reinforcement learning (MBRL) follows the framework of an agent interacting with an environment, **learning a model of said environment**, and then **leveraging the model for control (making decisions)**.

Specifically, the agent acts in a Markov Decision Process (MDP) governed by a transition function \\( s_{t+1} = f (s_t , a_t) \\) and receives a reward \\( r(s_t, a_t) \\) at each step. With a collected dataset \\( D := \\{ (s_i, a_i, s_{i+1}, r_i) \\} \\), the agent learns a model \\( s_{t+1} = f_\theta (s_t , a_t) \\) trained **to minimize the negative log-likelihood of the observed transitions**.
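As a minimal sketch of this training objective, assume (hypothetically) that the model outputs a Gaussian over the next state, with a linear mean \\( \mu_\theta(s_t, a_t) = w_s s_t + w_a a_t \\) and a single learned standard deviation \\( \sigma \\). The negative log-likelihood of one transition is then \\( \tfrac{1}{2}\log(2\pi) + \log\sigma + \frac{(s_{t+1} - \mu_\theta)^2}{2\sigma^2} \\), and the code below minimizes its average over a synthetic 1-D dataset with plain gradient descent. The data-generating coefficients (0.9, 0.5, noise 0.1), the learning rate, and the function names are illustrative assumptions; real MBRL implementations typically use neural networks trained with an automatic-differentiation library.

```python
import math
import random


def gaussian_nll(residual: float, log_sigma: float) -> float:
    """Negative log-likelihood of one residual under N(0, sigma^2)."""
    return 0.5 * math.log(2 * math.pi) + log_sigma + 0.5 * (residual / math.exp(log_sigma)) ** 2


def fit_gaussian_dynamics(data, lr: float = 0.01, steps: int = 2000):
    """Fit mu = w_s*s + w_a*a and a shared log_sigma by gradient descent on the mean NLL."""
    w_s, w_a, log_sigma = 0.0, 0.0, 0.0
    n = len(data)
    for _ in range(steps):
        inv_var = math.exp(-2.0 * log_sigma)  # 1 / sigma^2
        g_ws = g_wa = g_ls = 0.0
        for s, a, s_next in data:
            r = s_next - (w_s * s + w_a * a)  # residual of the predicted mean
            g_ws += -r * inv_var * s          # d NLL / d w_s
            g_wa += -r * inv_var * a          # d NLL / d w_a
            g_ls += 1.0 - r * r * inv_var     # d NLL / d log_sigma
        w_s -= lr * g_ws / n
        w_a -= lr * g_wa / n
        log_sigma -= lr * g_ls / n
    return w_s, w_a, log_sigma


# Synthetic transitions from hypothetical dynamics s' = 0.9 s + 0.5 a + N(0, 0.1^2).
rng = random.Random(0)
dataset = []
for _ in range(200):
    s, a = rng.uniform(-1.0, 1.0), rng.uniform(-1.0, 1.0)
    dataset.append((s, a, 0.9 * s + 0.5 * a + rng.gauss(0.0, 0.1)))

w_s, w_a, log_sigma = fit_gaussian_dynamics(dataset)
mean_nll = sum(
    gaussian_nll(sn - (w_s * s + w_a * a), log_sigma) for s, a, sn in dataset
) / len(dataset)
```

If \\( \sigma \\) were held fixed, minimizing this NLL would be equivalent to minimizing the mean squared error of the predicted next state; learning \\( \sigma \\) as well lets the model report how uncertain its predictions are.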

We employ sample-based model-predictive control (MPC) with the learned dynamics model, which optimizes the expected reward over a finite, recursively predicted horizon \\( \tau \\): candidate action sequences are sampled from a uniform distribution \\( U(a) \\), each is rolled out by repeatedly feeding the model's predictions back into it, and the sequences are ranked by their predicted cumulative reward (see [paper](https://arxiv.org/pdf/2002.04523), [paper](https://arxiv.org/pdf/2012.09156.pdf), or [paper](https://arxiv.org/pdf/2009.01221.pdf)).
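The planning step can be sketched as random-shooting MPC, one common form of sample-based MPC. This is a minimal illustration under stated assumptions: `learned_model` stands in for a trained \\( f_\theta \\) (here simply \\( s_{t+1} = s_t + a_t \\)), `reward_fn` and its goal of driving a 1-D state to 0 are hypothetical, and the horizon, sample count, and action bounds are arbitrary choices. Following standard MPC practice, only the first action of the best sequence is executed before replanning from the newly observed state.

```python
import random
from typing import Callable, Optional, Tuple


def random_shooting_mpc(
    state: float,
    model: Callable[[float, float], float],
    reward: Callable[[float, float], float],
    horizon: int = 5,
    n_samples: int = 1000,
    a_low: float = -1.0,
    a_high: float = 1.0,
    rng: Optional[random.Random] = None,
) -> Tuple[float, float]:
    """Return (first action of the best sampled sequence, its predicted return)."""
    if rng is None:
        rng = random.Random()
    best_return, best_first_action = float("-inf"), 0.0
    for _ in range(n_samples):
        # Sample a candidate action sequence of length `horizon` from U(a).
        actions = [rng.uniform(a_low, a_high) for _ in range(horizon)]
        s, total = state, 0.0
        for a in actions:
            total += reward(s, a)  # predicted reward r(s_t, a_t)
            s = model(s, a)        # recursive prediction: the model consumes its own output
        if total > best_return:
            best_return, best_first_action = total, actions[0]
    return best_first_action, best_return


def learned_model(s: float, a: float) -> float:
    """Stand-in for a trained dynamics model f_theta (assumed, not learned here)."""
    return s + a


def reward_fn(s: float, a: float) -> float:
    """Hypothetical reward: keep the state near 0 with a small action penalty."""
    return -(s ** 2) - 0.01 * a ** 2


rng = random.Random(0)
first_action, predicted_return = random_shooting_mpc(3.0, learned_model, reward_fn, rng=rng)

# Receding-horizon control: execute one action, observe the next state, replan.
state = 3.0
for _ in range(5):
    action, _ = random_shooting_mpc(state, learned_model, reward_fn, rng=rng)
    # Here the environment is assumed to match the model exactly; a real agent would
    # step the true environment and observe the resulting state instead.
    state = learned_model(state, action)
```

Sampling from \\( U(a) \\) keeps the planner simple and derivative-free, at the cost of needing many samples as the horizon or action dimension grows.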

## Further reading

For more information on MBRL, we recommend the following resources:

- A [blog post on debugging MBRL](https://www.natolambert.com/writing/debugging-mbrl).
- A [review paper on MBRL](https://arxiv.org/abs/2006.16712).

## Author

This section was written by <a href="https://twitter.com/natolambert">Nathan Lambert</a>.