From 3d043967505700fef295f1725c691f90ed8f754e Mon Sep 17 00:00:00 2001
From: Thomas Simonini
Date: Fri, 24 Feb 2023 14:17:33 +0100
Subject: [PATCH] Update language-models.mdx

---
 units/en/unitbonus3/language-models.mdx | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/units/en/unitbonus3/language-models.mdx b/units/en/unitbonus3/language-models.mdx
index db36cf7..0fffc19 100644
--- a/units/en/unitbonus3/language-models.mdx
+++ b/units/en/unitbonus3/language-models.mdx
@@ -20,7 +20,7 @@ There is therefore a potential synergy between LMs which can bring knowledge abo
 As a first attempt, the paper [“Grounding Large Language Models with Online Reinforcement Learning”](https://arxiv.org/abs/2302.02662v1) tackled the problem of **adapting or aligning a LM to a textual environment using PPO**. They showed that the knowledge encoded in the LM lead to a fast adaptation to the environment (opening avenue for sample efficiency RL agents) but also that such knowledge allowed the LM to better generalize to new tasks once aligned.
-