update marl sys

quantumiracle
2022-05-22 12:15:56 -04:00
parent 19b633fbff
commit be349f7eca
6 changed files with 801 additions and 2 deletions


@@ -12,7 +12,7 @@
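* **Complexity from the number of agents**: The most direct change from a single-agent system to a multi-agent system is that the number of agents grows from 1 to more than 1. Consider an $N$-agent system whose agents act independently. The complexity of representing its policy space grows exponentially, i.e., $\tilde{O}(e^N)$. As a simple example, take a discrete single-agent system with a state space of size $S$, an action space of size $A$ and a horizon of $H$ steps. Representing a policy over this discrete space takes $O(HSA)$ entries. Now extend the game directly to $N$ players. In the most general case, all players have symmetric action spaces of size $A$ and share no structural information. The space of joint policy distributions over all players then has size $O(HSA^N)$. This is because the joint action space is the Cartesian product of the individual players' action spaces, $\mathcal{A}=\mathcal{A}_1\times\dots\times\mathcal{A}_N$. This directly increases the search complexity of the algorithm.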
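* **Complexity from game types**: From a game-theoretic perspective, the games that arise in multi-agent systems are complex. The most direct classification divides them into competitive, cooperative and mixed games.

  In competitive games, the most typical model studied is the two-player zero-sum game, such as the rock-paper-scissors game mentioned in the previous section. Nash equilibrium strategies in such games are generally mixed strategies, i.e., no single pure strategy satisfies the equilibrium conditions. Pure-strategy Nash equilibria exist only in a small number of zero-sum games.

  In cooperative games, multiple agents must cooperate to increase the overall reward. Research on this class of problems generally follows the idea of value decomposition: the reward obtained jointly by all agents is assigned to the individual agents as their own rewards. Algorithms of this kind include VDN :cite:`sunehag2017value`, COMA :cite:`foerster2018counterfactual` and QMIX :cite:`rashid2018qmix`.

  In mixed games, some agents cooperate with one another while some agents, or groups of agents, compete. General-sum games that are not purely cooperative are mixed games. A simple example is the Prisoner's Dilemma, whose reward table is shown in :numref:`tab_ch12_ch12_marl_prison`. Each of the two players has two actions: stay silent or betray. The game can be understood as the police interrogating two suspects, where the absolute value of a reward is the number of years that player will be sentenced. The sum of all players' rewards is not constant, so it is a general-sum (non-zero-sum) game. It is also neither purely competitive nor purely cooperative. When one player stays silent and the other betrays, the two do not cooperate effectively: the betrayer receives a reward of 0 and the silent player receives -3. When both stay silent, they follow a cooperative strategy and each receives a reward of -1. This outcome looks better than the others, but it is not a Nash equilibrium of the game. A Nash equilibrium assumes that each player chooses a strategy independently, so the players cannot form a joint strategy distribution. This effectively rules out communication between the players and any potential for cooperation. The Nash equilibrium of the Prisoner's Dilemma is therefore for both players to betray each other.

  Game-theoretic structures like these mean that single-agent reinforcement learning cannot be applied directly to optimize the policy of each agent in a multi-agent system. Single-agent reinforcement learning is generally a search for an extremum. Solving for a Nash equilibrium in a multi-agent system, by contrast, is often a search for a max-min point, i.e., a saddle point, which is a different optimization problem. Expressing these complex relationships requires more general systems, which makes building multi-agent systems challenging. Multi-agent games can also be classified along many other dimensions, such as single-round versus multi-round games and simultaneous versus sequential decision making. Each class of games has its own algorithms. Existing multi-agent systems, however, often target a single type of game or a single algorithm. General-purpose multi-agent reinforcement learning systems, especially distributed ones, are still lacking.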
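The exponential growth in policy representation size described above can be made concrete by counting table entries. A tabular single-agent policy needs one value per (step, state, action), i.e. $H \cdot S \cdot A$. A joint policy over $N$ players with unshared action spaces of size $A$ needs $H \cdot S \cdot A^N$. The following Python sketch is only an illustration. The function names and the values of $H$, $S$ and $A$ are assumptions chosen for the example.

```python
def single_agent_table_size(horizon: int, num_states: int, num_actions: int) -> int:
    """Entries of a tabular policy pi_h(a | s): one value per (step, state, action)."""
    return horizon * num_states * num_actions


def joint_table_size(horizon: int, num_states: int, num_actions: int, num_players: int) -> int:
    """Entries of a joint policy over A_1 x ... x A_N when every |A_i| = num_actions."""
    joint_actions = num_actions ** num_players  # size of the Cartesian product
    return horizon * num_states * joint_actions


if __name__ == "__main__":
    H, S, A = 10, 100, 5  # hypothetical horizon, number of states and number of actions
    print("single agent:", single_agent_table_size(H, S, A))  # 5000
    for n in (1, 2, 4, 8):
        print(f"N={n}:", joint_table_size(H, S, A, n))
```

With these numbers the joint table grows from 5,000 entries for $N=1$ to 390,625,000 for $N=8$. Each individual player's own table stays at 5,000 entries.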
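The equilibrium arguments above can be checked mechanically for small matrix games. A pure strategy profile is a Nash equilibrium when neither player can gain by deviating on their own. The sketch below enumerates pure profiles for the Prisoner's Dilemma and for rock-paper-scissors. Some of the payoffs come from the text: 0 and -3 when one player betrays and the other stays silent, and -1 each for mutual silence. The mutual-betrayal payoff of -2 each is an assumption, because the reward table itself is not reproduced in this excerpt. Any value above -3 yields the same equilibrium.

```python
from itertools import product
from typing import List, Tuple

Matrix = List[List[float]]


def pure_nash_equilibria(payoff1: Matrix, payoff2: Matrix) -> List[Tuple[int, int]]:
    """Return all (row, col) pure profiles where no player gains by deviating alone.

    payoff1[i][j] / payoff2[i][j] are the rewards of player 1 / player 2 when
    player 1 plays action i and player 2 plays action j.
    """
    rows, cols = len(payoff1), len(payoff1[0])
    equilibria = []
    for i, j in product(range(rows), range(cols)):
        best_row = max(payoff1[k][j] for k in range(rows))  # player 1's best reply to j
        best_col = max(payoff2[i][k] for k in range(cols))  # player 2's best reply to i
        if payoff1[i][j] == best_row and payoff2[i][j] == best_col:
            equilibria.append((i, j))
    return equilibria


# Prisoner's Dilemma, actions 0 = stay silent, 1 = betray.
# Mutual betrayal (-2, -2) is an assumed value; the other entries follow the text.
pd_p1 = [[-1, -3],
         [0, -2]]
pd_p2 = [[-1, 0],
         [-3, -2]]

# Rock-paper-scissors (zero-sum): win = 1, loss = -1, tie = 0; actions 0=R, 1=P, 2=S.
rps_p1 = [[0, -1, 1],
          [1, 0, -1],
          [-1, 1, 0]]
rps_p2 = [[-v for v in row] for row in rps_p1]

print(pure_nash_equilibria(pd_p1, pd_p2))    # [(1, 1)]: both players betray
print(pure_nash_equilibria(rps_p1, rps_p2))  # []: no pure-strategy equilibrium
```

Mutual silence `(0, 0)` is rejected because each player can raise their reward from -1 to 0 by betraying. Rock-paper-scissors has no pure profile at all that survives the check, which is why its equilibrium must be mixed.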
:Reward values of the Prisoner's Dilemma
@@ -25,4 +25,12 @@
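* **Algorithmic heterogeneity**: The simple multi-agent algorithms introduced earlier, such as self-play and fictitious self-play, show that a multi-agent algorithm sometimes consists of many rounds of single-agent reinforcement learning. Algorithms also differ across game types.

  For cooperative games, for example, many algorithms are built on credit assignment. Their core question is how to distribute the common reward obtained by multiple agents appropriately among the individual agents. Depending on how they are executed, these algorithms can be further divided into Centralized Training Centralized Execution, Centralized Training Decentralized Execution and Decentralized Training Decentralized Execution. These categories describe how unified the training and execution of the different agents are.

  For competitive games, various methods for approximating Nash equilibria are commonly used, such as the fictitious self-play mentioned earlier, Double Oracle and Mirror Descent. Each single-agent reinforcement learning run that produces one optimal policy (a best response) is treated as an "action". A Nash equilibrium is then approximated on the meta-problem formed by these "actions".

  Existing algorithms for similar problems differ considerably, which makes it hard to build a unified multi-agent reinforcement learning system.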
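* **Combining learning methods**: In works such as AlphaStar :cite:`vinyals2019grandmaster` mentioned earlier, a good policy in a multi-agent system often cannot be obtained with reinforcement learning alone. Other learning methods, such as imitation learning, are also needed. For example, labeled training samples can be built from the game records of top human players and used to pre-train the agents. Because these large-scale games are so complex, this is often an effective way to improve agent performance quickly early in training. For the learning system as a whole, this means combining different learning paradigms, for example switching appropriately between imitation learning and reinforcement learning. A large-scale multi-agent system is therefore not solely a matter of building a reinforcement learning system. It also requires many other learning and coordination mechanisms to work together.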
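The credit-assignment and centralized-training, decentralized-execution ideas mentioned above can be illustrated with the additive decomposition used by VDN. If the team value for a state is modeled as $Q_{tot}(s, a_1, \dots, a_N) = \sum_i Q_i(s, a_i)$, each agent can pick its own greedy action, and the combination is also greedy for the team. The sketch below shows only this decomposition, not VDN's training procedure. The per-agent Q-values are made-up numbers for a single state.

```python
from itertools import product
from typing import Dict, List, Tuple


def team_value(per_agent_q: List[Dict[str, float]], joint_action: Tuple[str, ...]) -> float:
    """VDN-style additive decomposition: Q_tot = sum_i Q_i(a_i) for one fixed state."""
    return sum(q[a] for q, a in zip(per_agent_q, joint_action))


def decentralized_greedy(per_agent_q: List[Dict[str, float]]) -> Tuple[str, ...]:
    """Each agent maximizes its own Q_i independently (decentralized execution)."""
    return tuple(max(q, key=q.get) for q in per_agent_q)


def centralized_greedy(per_agent_q: List[Dict[str, float]]) -> Tuple[str, ...]:
    """Brute-force search over the joint action space A_1 x ... x A_N."""
    joint_actions = product(*(list(q) for q in per_agent_q))
    return max(joint_actions, key=lambda ja: team_value(per_agent_q, ja))


# Hypothetical per-agent Q-values for one state (two agents, two actions each).
q_values = [
    {"left": 1.0, "right": 2.5},
    {"left": 0.5, "right": -1.0},
]

print(decentralized_greedy(q_values))  # ('right', 'left')
print(centralized_greedy(q_values))    # ('right', 'left')
```

The brute-force search evaluates $\prod_i |A_i|$ joint actions. The decentralized version needs only $\sum_i |A_i|$ lookups. The additive form is what guarantees the two agree.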
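For the competitive case, the simplest member of this family of equilibrium approximations is fictitious play on a matrix game. At every iteration, each player best-responds to the empirical frequencies of the opponent's past actions. In the sketch below, an exact best response over the payoff matrix stands in for the single-agent RL run that would produce a best-response policy in a real system. The game (rock-paper-scissors), the iteration count and the tie-breaking rule are illustrative choices, not part of the text.

```python
from typing import List

Matrix = List[List[float]]


def best_response(payoff: Matrix, opponent_counts: List[int]) -> int:
    """Action with the highest expected payoff against the opponent's empirical mix.

    Ties are broken toward the lowest action index.
    """
    scores = [sum(p * c for p, c in zip(row, opponent_counts)) for row in payoff]
    return max(range(len(scores)), key=scores.__getitem__)


def fictitious_play(payoff1: Matrix, iterations: int) -> List[List[float]]:
    """Fictitious play for a two-player zero-sum game given player 1's payoff matrix.

    Returns the empirical action frequencies of player 1 and player 2.
    """
    n1, n2 = len(payoff1), len(payoff1[0])
    # Player 2's payoffs indexed as [own action][opponent action]; zero-sum: -payoff1.
    payoff2 = [[-payoff1[i][j] for i in range(n1)] for j in range(n2)]
    counts1, counts2 = [0] * n1, [0] * n2
    counts1[0] += 1  # arbitrary first moves: both players start with action 0
    counts2[0] += 1
    for _ in range(iterations - 1):
        # Both players respond to the opponent's history before this round (simultaneous).
        a1 = best_response(payoff1, counts2)
        a2 = best_response(payoff2, counts1)
        counts1[a1] += 1
        counts2[a2] += 1
    return [[c / iterations for c in counts1], [c / iterations for c in counts2]]


# Rock-paper-scissors; rows/columns are 0 = rock, 1 = paper, 2 = scissors.
rps = [[0, -1, 1],
       [1, 0, -1],
       [-1, 1, 0]]
freq1, freq2 = fictitious_play(rps, iterations=10_000)
print([round(f, 3) for f in freq1])  # each entry ends up close to 1/3
```

Each player cycles through long runs of single pure actions, yet the empirical averages approach the mixed equilibrium (1/3, 1/3, 1/3). In a PSRO-style system, the rows and columns would be whole policies produced by RL rather than primitive actions.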
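:numref:`ch12/ch12_marl_sys` shows a distributed multi-agent reinforcement learning system. The two agents in the figure extend in the same way to more agents.

Each agent contains multiple actors (Actor), which sample trajectories, and learners (Learner), which update the model. Actors and learners can run in parallel to speed up training; for concrete methods, see the A3C and IMPALA architectures introduced in the chapter on distributed single-agent systems.

Trained models are stored and managed centrally in the model store. Whether each agent's models are stored separately depends on whether the agents are symmetric. The model evaluator can score the models in the store, which prepares the input for the model selector. The model selector chooses models based on the model evaluator, or on a meta-learner (such as the PSRO algorithm :cite:`lanctot2017unified`) together with equilibrium solvers and similar components. It then distributes the selected models to the actors of each agent. We call this process League-based Management.

For the part that interacts with the environment, the distributed system can use an Inference Server to run centralized inference for the models in the parallel processes. The server computes actions (Action) from observations (Observation) and sends them to the environment. The environment can also run in parallel. The inference server sends the collected interaction trajectories to each agent for model training.

This is one example of a distributed multi-agent system. In practice, the design may differ depending on the game type and the algorithm structure.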
![Distributed multi-agent reinforcement learning system](../img/ch12/ch12-marl-sys.png)
:width:`800px`
:label:`ch12/ch12_marl_sys`
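The inference-server path in the figure can be sketched as a batching step. Requests from many parallel environments are grouped by agent, each agent's model runs one batched forward pass, and the served steps are buffered per agent until that agent's learners fetch them. This is a minimal synchronous sketch under stated assumptions. The class, its methods and the threshold "model" are hypothetical, and rewards are omitted from the stored trajectories.

```python
from collections import defaultdict
from typing import Callable, Dict, List, Tuple

Observation = List[float]
# A policy maps a batch of observations to a batch of discrete actions.
Policy = Callable[[List[Observation]], List[int]]


class InferenceServer:
    """Toy centralized inference server; names and interface are hypothetical."""

    def __init__(self, policies: Dict[str, Policy]) -> None:
        self.policies = policies  # one model per agent, e.g. pushed by the model selector
        # Per-agent trajectory buffer of (env_id, observation, action); rewards omitted.
        self.trajectories: Dict[str, List[Tuple[int, Observation, int]]] = defaultdict(list)

    def act(self, requests: List[Tuple[int, str, Observation]]) -> Dict[Tuple[int, str], int]:
        """Answer (env_id, agent_id, observation) requests from parallel environments."""
        grouped: Dict[str, List[Tuple[int, Observation]]] = defaultdict(list)
        for env_id, agent_id, obs in requests:
            grouped[agent_id].append((env_id, obs))
        actions: Dict[Tuple[int, str], int] = {}
        for agent_id, items in grouped.items():
            # One batched forward pass per agent model instead of one call per environment.
            batch_actions = self.policies[agent_id]([obs for _, obs in items])
            for (env_id, obs), action in zip(items, batch_actions):
                actions[(env_id, agent_id)] = action
                self.trajectories[agent_id].append((env_id, obs, action))
        return actions

    def drain(self, agent_id: str) -> List[Tuple[int, Observation, int]]:
        """Hand the collected steps to that agent's learners and clear the buffer."""
        return self.trajectories.pop(agent_id, [])


def threshold_policy(obs_batch: List[Observation]) -> List[int]:
    """Hypothetical stand-in model: action 1 if the first feature is positive, else 0."""
    return [1 if obs[0] > 0 else 0 for obs in obs_batch]


server = InferenceServer({"agent1": threshold_policy, "agent2": threshold_policy})
# Four parallel environments, each asking for an action for both agents.
requests = [(env_id, agent_id, [env_id - 1.5])
            for env_id in range(4) for agent_id in ("agent1", "agent2")]
actions = server.act(requests)
print(actions[(0, "agent1")], actions[(3, "agent2")])  # 0 1
print(len(server.drain("agent1")))  # 4
```

A production server would typically receive requests asynchronously and flush a batch when it is full or a timeout expires. The synchronous `act` call here keeps only the grouping-and-batching idea.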
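League-based management can likewise be sketched as three roles. The model store keeps snapshots, the model evaluator fills an empirical payoff table between them, and the model selector samples the snapshot handed to the actors. All names in the sketch are hypothetical. The `toy_play` callback stands in for running real evaluation matches. The selector takes an optional meta-strategy, which could come from a meta-solver such as the one in PSRO, and falls back to a uniform distribution over stored snapshots. That default is just one possible choice.

```python
import random
from typing import Callable, Dict, List, Optional


class ModelStore:
    """Hypothetical model store: a list of snapshot ids per agent.

    Asymmetric agents get separate lists; symmetric agents could share one.
    """

    def __init__(self) -> None:
        self._snapshots: Dict[str, List[str]] = {}

    def add(self, agent_id: str, snapshot: str) -> None:
        self._snapshots.setdefault(agent_id, []).append(snapshot)

    def population(self, agent_id: str) -> List[str]:
        return list(self._snapshots.get(agent_id, []))


def evaluate(population: List[str], play: Callable[[str, str], float]) -> List[List[float]]:
    """Model evaluator: payoff[i][j] is the score of population[i] against population[j]."""
    return [[play(a, b) for b in population] for a in population]


def select_opponent(population: List[str],
                    meta_strategy: Optional[List[float]] = None,
                    rng: Optional[random.Random] = None) -> str:
    """Model selector: sample a snapshot according to a meta-strategy (uniform by default)."""
    rng = rng or random.Random()
    weights = meta_strategy if meta_strategy is not None else [1.0] * len(population)
    return rng.choices(population, weights=weights, k=1)[0]


def toy_play(a: str, b: str) -> float:
    """Hypothetical match result: the newer snapshot always wins (+1), equal versions draw (0)."""
    va, vb = int(a[1:]), int(b[1:])
    return float((va > vb) - (va < vb))


store = ModelStore()
for version in ("v0", "v1", "v2"):
    store.add("agent1", version)
population = store.population("agent1")
payoff = evaluate(population, toy_play)
print(payoff[2])  # [1.0, 1.0, 0.0]
# A meta-solver would turn `payoff` into a distribution; here it is fixed by hand.
print(select_opponent(population, meta_strategy=[0.0, 0.0, 1.0]))  # v2
```

In a full system, the payoff table would feed an equilibrium solver or meta-learner. The selected snapshots would then be pushed to each agent's actors, closing the loop shown in the figure.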

Binary file modified (not shown): 45 KiB before, 44 KiB after.

Binary file modified (not shown): 40 KiB before, 30 KiB after.

img/ch12/ch12-marl-sys.png (new binary file, not shown): 161 KiB

img/ch12/ch12-marl-sys.svg (new file, 782 lines, 31 KiB)

@@ -0,0 +1,782 @@
<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<!-- Created with Inkscape (http://www.inkscape.org/) -->
<svg
width="466.7132mm"
height="299.44547mm"
viewBox="0 0 466.71319 299.44546"
version="1.1"
id="svg5"
inkscape:version="1.1.1 (1:1.1+202109281949+c3084ef5ed)"
sodipodi:docname="marl_sys.svg"
xmlns:inkscape="http://www.inkscape.org/namespaces/inkscape"
xmlns:sodipodi="http://sodipodi.sourceforge.net/DTD/sodipodi-0.dtd"
xmlns="http://www.w3.org/2000/svg"
xmlns:svg="http://www.w3.org/2000/svg">
<sodipodi:namedview
id="namedview7"
pagecolor="#ffffff"
bordercolor="#666666"
borderopacity="1.0"
inkscape:pageshadow="2"
inkscape:pageopacity="0.0"
inkscape:pagecheckerboard="0"
inkscape:document-units="mm"
showgrid="false"
inkscape:zoom="0.32026164"
inkscape:cx="507.39763"
inkscape:cy="710.35668"
inkscape:window-width="1848"
inkscape:window-height="1136"
inkscape:window-x="72"
inkscape:window-y="27"
inkscape:window-maximized="1"
inkscape:current-layer="layer1"
inkscape:snap-global="false"
fit-margin-top="0"
fit-margin-left="0"
fit-margin-right="0"
fit-margin-bottom="0" />
<defs
id="defs2">
<marker
style="overflow:visible"
id="Arrow1Mend"
refX="0"
refY="0"
orient="auto"
inkscape:stockid="Arrow1Mend"
inkscape:isstock="true">
<path
transform="matrix(-0.4,0,0,-0.4,-4,0)"
style="fill:context-stroke;fill-rule:evenodd;stroke:context-stroke;stroke-width:1pt"
d="M 0,0 5,-5 -12.5,0 5,5 Z"
id="path129232" />
</marker>
<marker
style="overflow:visible"
id="Arrow1Mend-3"
refX="0"
refY="0"
orient="auto"
inkscape:stockid="Arrow1Mend"
inkscape:isstock="true">
<path
transform="matrix(-0.4,0,0,-0.4,-4,0)"
style="fill:context-stroke;fill-rule:evenodd;stroke:context-stroke;stroke-width:1pt"
d="M 0,0 5,-5 -12.5,0 5,5 Z"
id="path129232-3" />
</marker>
<marker
style="overflow:visible"
id="Arrow1Mend-3-8"
refX="0"
refY="0"
orient="auto"
inkscape:stockid="Arrow1Mend"
inkscape:isstock="true">
<path
transform="matrix(-0.4,0,0,-0.4,-4,0)"
style="fill:context-stroke;fill-rule:evenodd;stroke:context-stroke;stroke-width:1pt"
d="M 0,0 5,-5 -12.5,0 5,5 Z"
id="path129232-3-6" />
</marker>
<marker
style="overflow:visible"
id="Arrow1Mend-3-4"
refX="0"
refY="0"
orient="auto"
inkscape:stockid="Arrow1Mend"
inkscape:isstock="true">
<path
transform="matrix(-0.4,0,0,-0.4,-4,0)"
style="fill:context-stroke;fill-rule:evenodd;stroke:context-stroke;stroke-width:1pt"
d="M 0,0 5,-5 -12.5,0 5,5 Z"
id="path129232-3-8" />
</marker>
<marker
style="overflow:visible"
id="Arrow1Mend-3-89"
refX="0"
refY="0"
orient="auto"
inkscape:stockid="Arrow1Mend"
inkscape:isstock="true">
<path
transform="matrix(-0.4,0,0,-0.4,-4,0)"
style="fill:context-stroke;fill-rule:evenodd;stroke:context-stroke;stroke-width:1pt"
d="M 0,0 5,-5 -12.5,0 5,5 Z"
id="path129232-3-7" />
</marker>
<marker
style="overflow:visible"
id="Arrow1Mend-3-6"
refX="0"
refY="0"
orient="auto"
inkscape:stockid="Arrow1Mend"
inkscape:isstock="true">
<path
transform="matrix(-0.4,0,0,-0.4,-4,0)"
style="fill:context-stroke;fill-rule:evenodd;stroke:context-stroke;stroke-width:1pt"
d="M 0,0 5,-5 -12.5,0 5,5 Z"
id="path129232-3-4" />
</marker>
<marker
style="overflow:visible"
id="Arrow1Mend-0"
refX="0"
refY="0"
orient="auto"
inkscape:stockid="Arrow1Mend"
inkscape:isstock="true">
<path
transform="matrix(-0.4,0,0,-0.4,-4,0)"
style="fill:context-stroke;fill-rule:evenodd;stroke:context-stroke;stroke-width:1pt"
d="M 0,0 5,-5 -12.5,0 5,5 Z"
id="path129232-30" />
</marker>
<marker
style="overflow:visible"
id="Arrow1Mend-2"
refX="0"
refY="0"
orient="auto"
inkscape:stockid="Arrow1Mend"
inkscape:isstock="true">
<path
transform="matrix(-0.4,0,0,-0.4,-4,0)"
style="fill:context-stroke;fill-rule:evenodd;stroke:context-stroke;stroke-width:1pt"
d="M 0,0 5,-5 -12.5,0 5,5 Z"
id="path129232-5" />
</marker>
<marker
style="overflow:visible"
id="Arrow1Mend-3-89-0"
refX="0"
refY="0"
orient="auto"
inkscape:stockid="Arrow1Mend"
inkscape:isstock="true">
<path
transform="matrix(-0.4,0,0,-0.4,-4,0)"
style="fill:context-stroke;fill-rule:evenodd;stroke:context-stroke;stroke-width:1pt"
d="M 0,0 5,-5 -12.5,0 5,5 Z"
id="path129232-3-7-5" />
</marker>
<marker
style="overflow:visible"
id="Arrow1Mend-28"
refX="0"
refY="0"
orient="auto"
inkscape:stockid="Arrow1Mend"
inkscape:isstock="true">
<path
transform="matrix(-0.4,0,0,-0.4,-4,0)"
style="fill:context-stroke;fill-rule:evenodd;stroke:context-stroke;stroke-width:1pt"
d="M 0,0 5,-5 -12.5,0 5,5 Z"
id="path129232-9" />
</marker>
<marker
style="overflow:visible"
id="Arrow1Mend-2-6"
refX="0"
refY="0"
orient="auto"
inkscape:stockid="Arrow1Mend"
inkscape:isstock="true">
<path
transform="matrix(-0.4,0,0,-0.4,-4,0)"
style="fill:context-stroke;fill-rule:evenodd;stroke:context-stroke;stroke-width:1pt"
d="M 0,0 5,-5 -12.5,0 5,5 Z"
id="path129232-5-8" />
</marker>
<marker
style="overflow:visible"
id="Arrow1Mend-21"
refX="0"
refY="0"
orient="auto"
inkscape:stockid="Arrow1Mend"
inkscape:isstock="true">
<path
transform="matrix(-0.4,0,0,-0.4,-4,0)"
style="fill:context-stroke;fill-rule:evenodd;stroke:context-stroke;stroke-width:1pt"
d="M 0,0 5,-5 -12.5,0 5,5 Z"
id="path129232-0" />
</marker>
</defs>
<g
inkscape:label="Layer 1"
inkscape:groupmode="layer"
id="layer1"
transform="translate(153.48541,7.5859966)">
<rect
style="fill:#ffffff;fill-opacity:0;stroke:#000000;stroke-width:1.5;stroke-linejoin:round;stroke-miterlimit:4;stroke-dasharray:none"
id="rect2439"
width="319.29016"
height="73.025307"
x="-6.8123627"
y="-6.8359966"
ry="7.4795961" />
<text
xml:space="preserve"
style="font-style:normal;font-weight:normal;font-size:10.5833px;line-height:1.25;font-family:sans-serif;fill:#000000;fill-opacity:1;stroke:none;stroke-width:0.264583"
x="0.35719228"
y="7.3493328"
id="text6769"><tspan
sodipodi:role="line"
id="tspan6767"
style="font-style:normal;font-variant:normal;font-weight:bold;font-stretch:normal;font-family:sans-serif;-inkscape-font-specification:'sans-serif Bold';stroke-width:0.264583"
x="0.35719228"
y="7.3493328">Agent 1</tspan></text>
<rect
style="fill:#ffffff;fill-opacity:0;stroke:#000000;stroke-width:1.5;stroke-linejoin:round;stroke-miterlimit:4;stroke-dasharray:none"
id="rect14994"
width="108.53062"
height="38.626457"
x="1.1066673"
y="13.8151"
ry="6.3606167" />
<rect
style="fill:#ffffff;fill-opacity:0.999939;stroke:#000000;stroke-width:1.5;stroke-linejoin:round;stroke-miterlimit:4;stroke-dasharray:none"
id="rect14994-9"
width="108.53062"
height="38.626457"
x="9.5172548"
y="20.080038"
ry="6.3606167" />
<rect
style="fill:#ffffff;fill-opacity:0.999939;stroke:#000000;stroke-width:0.886039;stroke-linejoin:round"
id="rect14994-9-6"
width="48.133919"
height="23.788153"
x="20.969793"
y="26.469824"
ry="3.9171939" />
<text
xml:space="preserve"
style="font-style:normal;font-weight:normal;font-size:10.5833px;line-height:1.25;font-family:sans-serif;fill:#000000;fill-opacity:1;stroke:none;stroke-width:0.264583"
x="33.587643"
y="42.48008"
id="text18868"><tspan
sodipodi:role="line"
id="tspan18866"
style="stroke-width:0.264583"
x="33.587643"
y="42.48008">Model</tspan></text>
<text
xml:space="preserve"
style="font-style:normal;font-weight:normal;font-size:10.5833px;line-height:1.25;font-family:sans-serif;fill:#000000;fill-opacity:1;stroke:none;stroke-width:0.264583"
x="84.019226"
y="32.250854"
id="text18868-3"><tspan
sodipodi:role="line"
id="tspan18866-8"
style="stroke-width:0.264583"
x="84.019226"
y="32.250854">Actor</tspan></text>
<text
xml:space="preserve"
style="font-style:normal;font-weight:normal;font-size:10.5833px;line-height:1.25;font-family:sans-serif;fill:#000000;fill-opacity:1;stroke:none;stroke-width:0.264583"
x="135.93124"
y="34.698502"
id="text18868-3-1"><tspan
sodipodi:role="line"
id="tspan18866-8-1"
style="stroke-width:0.264583"
x="135.93124"
y="34.698502">Trajectories</tspan></text>
<text
xml:space="preserve"
style="font-style:normal;font-weight:normal;font-size:10.5833px;line-height:1.25;font-family:sans-serif;fill:#000000;fill-opacity:1;stroke:none;stroke-width:0.264583"
x="55.129414"
y="111.61241"
id="text18868-3-1-5"><tspan
sodipodi:role="line"
id="tspan18866-8-1-0"
style="font-style:italic;font-variant:normal;font-weight:normal;font-stretch:normal;font-family:sans-serif;-inkscape-font-specification:'sans-serif Italic';stroke-width:0.264583"
x="55.129414"
y="111.61241">Model distribution</tspan></text>
<text
xml:space="preserve"
style="font-style:normal;font-weight:normal;font-size:10.5833px;line-height:1.25;font-family:sans-serif;fill:#000000;fill-opacity:1;stroke:none;stroke-width:0.264583"
x="243.21516"
y="90.411438"
id="text18868-3-1-5-8"><tspan
sodipodi:role="line"
id="tspan18866-8-1-0-62"
style="font-style:italic;font-variant:normal;font-weight:normal;font-stretch:normal;font-family:sans-serif;-inkscape-font-specification:'sans-serif Italic';stroke-width:0.264583"
x="243.21516"
y="90.411438">Store model updates</tspan></text>
<text
xml:space="preserve"
style="font-style:normal;font-weight:normal;font-size:10.5833px;line-height:1.25;font-family:sans-serif;fill:#000000;fill-opacity:1;stroke:none;stroke-width:0.264583"
x="-26.510138"
y="181.73605"
id="text18868-3-1-5-2"><tspan
sodipodi:role="line"
id="tspan18866-8-1-0-5"
style="font-style:italic;font-variant:normal;font-weight:normal;font-stretch:normal;font-family:sans-serif;-inkscape-font-specification:'sans-serif Italic';stroke-width:0.264583"
x="-26.510138"
y="181.73605">Centralized inference</tspan></text>
<text
xml:space="preserve"
style="font-style:normal;font-weight:normal;font-size:10.5833px;line-height:1.25;font-family:sans-serif;fill:#000000;fill-opacity:1;stroke:none;stroke-width:0.264583"
x="-26.780876"
y="110.9239"
id="text18868-3-1-5-4"><tspan
sodipodi:role="line"
id="tspan18866-8-1-0-6"
style="font-style:italic;font-variant:normal;font-weight:normal;font-stretch:normal;font-family:sans-serif;-inkscape-font-specification:'sans-serif Italic';stroke-width:0.264583"
x="-26.780876"
y="110.9239">Return trajectories</tspan></text>
<text
xml:space="preserve"
style="font-style:normal;font-weight:normal;font-size:10.5833px;line-height:1.25;font-family:sans-serif;fill:#000000;fill-opacity:1;stroke:none;stroke-width:0.264583"
x="132.00789"
y="260.86563"
id="text18868-3-1-0"><tspan
sodipodi:role="line"
id="tspan18866-8-1-8"
style="stroke-width:0.264583"
x="132.00789"
y="260.86563">Trajectories</tspan></text>
<rect
style="fill:#ffffff;fill-opacity:0;stroke:#000000;stroke-width:1.31629;stroke-linejoin:round"
id="rect14994-8-2"
width="72.150307"
height="35.024509"
x="-148.56631"
y="119.91573"
ry="5.7674842" />
<rect
style="fill:#ffffff;fill-opacity:1;stroke:#000000;stroke-width:1.31629;stroke-linejoin:round"
id="rect14994-8"
width="72.150307"
height="35.024509"
x="-141.41985"
y="127.66415"
ry="5.7674842" />
<text
xml:space="preserve"
style="font-style:normal;font-weight:normal;font-size:10.5833px;line-height:1.25;font-family:sans-serif;fill:#000000;fill-opacity:1;stroke:none;stroke-width:0.264583"
x="-117.13503"
y="148.94951"
id="text18868-8"><tspan
sodipodi:role="line"
id="tspan18866-5"
style="stroke-width:0.264583"
x="-117.13503"
y="148.94951">Environment</tspan></text>
<text
xml:space="preserve"
style="font-style:normal;font-weight:normal;font-size:10.5833px;line-height:1.25;font-family:sans-serif;fill:#000000;fill-opacity:1;stroke:none;stroke-width:0.264583"
x="-66.873871"
y="158.91365"
id="text18868-8-4"><tspan
sodipodi:role="line"
id="tspan18866-5-6"
style="stroke-width:0.264583"
x="-66.873871"
y="158.91365">Observation</tspan></text>
<text
xml:space="preserve"
style="font-style:normal;font-weight:normal;font-size:10.5833px;line-height:1.25;font-family:sans-serif;fill:#000000;fill-opacity:1;stroke:none;stroke-width:0.264583"
x="-59.924511"
y="132.58028"
id="text18868-8-4-9"><tspan
sodipodi:role="line"
id="tspan18866-5-6-2"
style="stroke-width:0.264583"
x="-59.924511"
y="132.58028">Action</tspan></text>
<rect
style="fill:#ffffff;fill-opacity:0;stroke:#000000;stroke-width:1.31629;stroke-linejoin:round"
id="rect14994-8-8"
width="72.150307"
height="35.024509"
x="-33.468338"
y="125.49976"
ry="5.7674842" />
<text
xml:space="preserve"
style="font-style:normal;font-weight:normal;font-size:10.5833px;line-height:1.25;font-family:sans-serif;fill:#000000;fill-opacity:1;stroke:none;stroke-width:0.264583"
x="-22.948561"
y="148.04997"
id="text18868-8-1"><tspan
sodipodi:role="line"
id="tspan18866-5-0"
style="stroke-width:0.264583"
x="-22.948561"
y="148.04997">Inference server</tspan></text>
<rect
style="fill:#ffffff;fill-opacity:0;stroke:#000000;stroke-width:1.31629;stroke-linejoin:round"
id="rect14994-8-8-3"
width="72.150307"
height="35.024509"
x="232.87144"
y="127.03558"
ry="5.7674842" />
<text
xml:space="preserve"
style="font-style:normal;font-weight:normal;font-size:10.5833px;line-height:1.25;font-family:sans-serif;fill:#000000;fill-opacity:1;stroke:none;stroke-width:0.264583"
x="243.4552"
y="147.87918"
id="text18868-8-1-0"><tspan
sodipodi:role="line"
id="tspan18866-5-0-4"
style="stroke-width:0.264583"
x="243.4552"
y="147.87918">Model store</tspan></text>
<rect
style="fill:#ffffff;fill-opacity:0;stroke:#000000;stroke-width:1.31629;stroke-linejoin:round"
id="rect14994-8-8-3-4"
width="72.150307"
height="35.024509"
x="144.12167"
y="125.81642"
ry="5.7674842" />
<text
xml:space="preserve"
style="font-style:normal;font-weight:normal;font-size:10.5833px;line-height:1.25;font-family:sans-serif;fill:#000000;fill-opacity:1;stroke:none;stroke-width:0.264583"
x="153.62067"
y="146.87305"
id="text18868-8-1-0-4"><tspan
sodipodi:role="line"
id="tspan18866-5-0-4-4"
style="stroke-width:0.264583"
x="153.62067"
y="146.87305">Model evaluator</tspan></text>
<rect
style="fill:#ffffff;fill-opacity:0;stroke:#000000;stroke-width:1.31629;stroke-linejoin:round"
id="rect14994-8-8-3-4-7"
width="72.150307"
height="35.024509"
x="55.503185"
y="126.00691"
ry="5.7674842" />
<rect
style="fill:#ffffff;fill-opacity:0;stroke:#000000;stroke-width:1.5;stroke-linejoin:round;stroke-miterlimit:4;stroke-dasharray:4.5, 1.5;stroke-dashoffset:0"
id="rect14994-8-8-3-4-7-4"
width="260.65747"
height="135.01796"
x="50.785854"
y="74.764908"
ry="8.7207499" />
<rect
style="fill:#ffffff;fill-opacity:0;stroke:#000000;stroke-width:1.5;stroke-linejoin:round;stroke-miterlimit:4;stroke-dasharray:4.5, 1.5;stroke-dashoffset:0"
id="rect14994-8-8-3-4-7-4-4"
width="195.73186"
height="95.42511"
x="-152.73541"
y="94.206604"
ry="6.163465" />
<text
xml:space="preserve"
style="font-style:normal;font-weight:normal;font-size:10.5833px;line-height:1.25;font-family:sans-serif;fill:#000000;fill-opacity:1;stroke:none;stroke-width:0.264583"
x="65.438759"
y="147.22806"
id="text18868-8-1-0-4-6"><tspan
sodipodi:role="line"
id="tspan18866-5-0-4-4-3"
style="stroke-width:0.264583"
x="65.438759"
y="147.22806">Model selector</tspan></text>
<rect
style="fill:#ffffff;fill-opacity:0;stroke:#000000;stroke-width:1.31629;stroke-linejoin:round"
id="rect14994-8-8-3-4-7-1"
width="72.150307"
height="35.024509"
x="146.03395"
y="77.397316"
ry="5.7674842" />
<text
xml:space="preserve"
style="font-style:normal;font-weight:normal;font-size:10.5833px;line-height:1.25;font-family:sans-serif;fill:#000000;fill-opacity:1;stroke:none;stroke-width:0.264583"
x="159.88176"
y="98.891731"
id="text18868-8-1-0-4-6-7"><tspan
sodipodi:role="line"
id="tspan18866-5-0-4-4-3-5"
style="stroke-width:0.264583"
x="159.88176"
y="98.891731">Meta-learner</tspan></text>
<text
xml:space="preserve"
style="font-style:normal;font-weight:normal;font-size:10.5833px;line-height:1.25;font-family:sans-serif;fill:#000000;fill-opacity:1;stroke:none;stroke-width:0.264583"
x="56.849323"
y="89.429657"
id="text18868-8-1-0-4-6-7-4"><tspan
sodipodi:role="line"
id="tspan18866-5-0-4-4-3-5-7"
style="font-style:normal;font-variant:normal;font-weight:bold;font-stretch:normal;font-family:sans-serif;-inkscape-font-specification:'sans-serif Bold';stroke-width:0.264583"
x="56.849323"
y="89.429657">League-based management</tspan></text>
<text
xml:space="preserve"
style="font-style:normal;font-weight:normal;font-size:10.5833px;line-height:1.25;font-family:sans-serif;fill:#000000;fill-opacity:1;stroke:none;stroke-width:0.264583"
x="-143.45882"
y="108.3296"
id="text18868-8-1-0-4-6-7-4-8"><tspan
sodipodi:role="line"
id="tspan18866-5-0-4-4-3-5-7-1"
style="font-style:normal;font-variant:normal;font-weight:bold;font-stretch:normal;font-family:sans-serif;-inkscape-font-specification:'sans-serif Bold';stroke-width:0.264583"
x="-143.45882"
y="108.3296">Batch inference</tspan></text>
<rect
style="fill:#ffffff;fill-opacity:0;stroke:#000000;stroke-width:1.31629;stroke-linejoin:round"
id="rect14994-8-8-3-4-7-1-9"
width="72.150307"
height="35.024509"
x="144.70303"
y="171.50504"
ry="5.7674842" />
<text
xml:space="preserve"
style="font-style:normal;font-weight:normal;font-size:10.5833px;line-height:1.25;font-family:sans-serif;fill:#000000;fill-opacity:1;stroke:none;stroke-width:0.264583"
x="149.94156"
y="193.65414"
id="text18868-8-1-0-4-6-7-6"><tspan
sodipodi:role="line"
id="tspan18866-5-0-4-4-3-5-2"
style="stroke-width:0.264583"
x="149.94156"
y="193.65414">Equilibrium solvers, etc.</tspan></text>
<rect
style="fill:#ffffff;fill-opacity:0;stroke:#000000;stroke-width:1.5;stroke-linejoin:round;stroke-miterlimit:4;stroke-dasharray:none"
id="rect14994-5"
width="108.53062"
height="38.626457"
x="172.19795"
y="14.38378"
ry="6.3606167" />
<rect
style="fill:#ffffff;fill-opacity:0.999939;stroke:#000000;stroke-width:1.5;stroke-linejoin:round;stroke-miterlimit:4;stroke-dasharray:none"
id="rect14994-9-61"
width="108.53062"
height="38.626457"
x="180.60855"
y="20.648718"
ry="6.3606167" />
<rect
style="fill:#ffffff;fill-opacity:0.999939;stroke:#000000;stroke-width:0.886039;stroke-linejoin:round"
id="rect14994-9-6-1"
width="48.133919"
height="23.788153"
x="192.06108"
y="27.038504"
ry="3.9171939" />
<text
xml:space="preserve"
style="font-style:normal;font-weight:normal;font-size:10.5833px;line-height:1.25;font-family:sans-serif;fill:#000000;fill-opacity:1;stroke:none;stroke-width:0.264583"
x="206.19484"
y="43.035706"
id="text18868-5"><tspan
sodipodi:role="line"
id="tspan18866-9"
style="stroke-width:0.264583"
x="206.19484"
y="43.035706">模型</tspan></text>
<text
xml:space="preserve"
style="font-style:normal;font-weight:normal;font-size:10.5833px;line-height:1.25;font-family:sans-serif;fill:#000000;fill-opacity:1;stroke:none;stroke-width:0.264583"
x="255.11052"
y="32.819534"
id="text18868-3-8"><tspan
sodipodi:role="line"
id="tspan18866-8-4"
style="stroke-width:0.264583"
x="255.11052"
y="32.819534">Learner</tspan></text>
<rect
style="fill:#ffffff;fill-opacity:0;stroke:#000000;stroke-width:1.5;stroke-linejoin:round;stroke-miterlimit:4;stroke-dasharray:none"
id="rect2439-17"
width="319.29016"
height="73.025307"
x="-11.073907"
y="218.08417"
ry="7.4795961" />
<text
xml:space="preserve"
style="font-style:normal;font-weight:normal;font-size:10.5833px;line-height:1.25;font-family:sans-serif;fill:#000000;fill-opacity:1;stroke:none;stroke-width:0.264583"
x="-3.9043519"
y="232.2695"
id="text6769-8"><tspan
sodipodi:role="line"
id="tspan6767-5"
style="font-style:normal;font-variant:normal;font-weight:bold;font-stretch:normal;font-family:sans-serif;-inkscape-font-specification:'sans-serif Bold';stroke-width:0.264583"
x="-3.9043519"
y="232.2695">Agent 2</tspan></text>
<rect
style="fill:#ffffff;fill-opacity:0;stroke:#000000;stroke-width:1.5;stroke-linejoin:round;stroke-miterlimit:4;stroke-dasharray:none"
id="rect14994-7"
width="108.53062"
height="38.626457"
x="-3.1548784"
y="238.73528"
ry="6.3606167" />
<rect
style="fill:#ffffff;fill-opacity:0.999939;stroke:#000000;stroke-width:1.5;stroke-linejoin:round;stroke-miterlimit:4;stroke-dasharray:none"
id="rect14994-9-4"
width="108.53062"
height="38.626457"
x="5.2557054"
y="245.00021"
ry="6.3606167" />
<rect
style="fill:#ffffff;fill-opacity:0.999939;stroke:#000000;stroke-width:0.886039;stroke-linejoin:round"
id="rect14994-9-6-18"
width="48.133919"
height="23.788153"
x="16.708244"
y="251.39"
ry="3.9171939" />
<text
xml:space="preserve"
style="font-style:normal;font-weight:normal;font-size:10.5833px;line-height:1.25;font-family:sans-serif;fill:#000000;fill-opacity:1;stroke:none;stroke-width:0.264583"
x="29.326096"
y="267.40024"
id="text18868-59"><tspan
sodipodi:role="line"
id="tspan18866-7"
style="stroke-width:0.264583"
x="29.326096"
y="267.40024">Model</tspan></text>
<text
xml:space="preserve"
style="font-style:normal;font-weight:normal;font-size:10.5833px;line-height:1.25;font-family:sans-serif;fill:#000000;fill-opacity:1;stroke:none;stroke-width:0.264583"
x="79.75769"
y="257.17102"
id="text18868-3-5"><tspan
sodipodi:role="line"
id="tspan18866-8-3"
style="stroke-width:0.264583"
x="79.75769"
y="257.17102">Actor</tspan></text>
<rect
style="fill:#ffffff;fill-opacity:0;stroke:#000000;stroke-width:1.5;stroke-linejoin:round;stroke-miterlimit:4;stroke-dasharray:none"
id="rect14994-5-8"
width="108.53062"
height="38.626457"
x="167.9364"
y="239.30396"
ry="6.3606167" />
<rect
style="fill:#ffffff;fill-opacity:0.999939;stroke:#000000;stroke-width:1.5;stroke-linejoin:round;stroke-miterlimit:4;stroke-dasharray:none"
id="rect14994-9-61-8"
width="108.53062"
height="38.626457"
x="176.34698"
y="245.56889"
ry="6.3606167" />
<rect
style="fill:#ffffff;fill-opacity:0.999939;stroke:#000000;stroke-width:0.886039;stroke-linejoin:round"
id="rect14994-9-6-1-3"
width="48.133919"
height="23.788153"
x="187.79953"
y="251.95868"
ry="3.9171939" />
<text
xml:space="preserve"
style="font-style:normal;font-weight:normal;font-size:10.5833px;line-height:1.25;font-family:sans-serif;fill:#000000;fill-opacity:1;stroke:none;stroke-width:0.264583"
x="201.93329"
y="267.95587"
id="text18868-5-1"><tspan
sodipodi:role="line"
id="tspan18866-9-8"
style="stroke-width:0.264583"
x="201.93329"
y="267.95587">Model</tspan></text>
<text
xml:space="preserve"
style="font-style:normal;font-weight:normal;font-size:10.5833px;line-height:1.25;font-family:sans-serif;fill:#000000;fill-opacity:1;stroke:none;stroke-width:0.264583"
x="250.84897"
y="257.73969"
id="text18868-3-8-9"><tspan
sodipodi:role="line"
id="tspan18866-8-4-6"
style="stroke-width:0.264583"
x="250.84897"
y="257.73969">Learner</tspan></text>
<path
style="fill:none;stroke:#000000;stroke-width:1;stroke-linecap:butt;stroke-linejoin:miter;stroke-miterlimit:4;stroke-dasharray:none;stroke-opacity:1;marker-end:url(#Arrow1Mend)"
d="m 221.73029,51.680793 41.05117,73.696017"
id="path129221"
sodipodi:nodetypes="cc" />
<path
style="fill:none;stroke:#000000;stroke-width:1;stroke-linecap:butt;stroke-linejoin:miter;stroke-miterlimit:4;stroke-dasharray:none;stroke-opacity:1;marker-end:url(#Arrow1Mend-3)"
d="M 221.60084,238.39922 263.16343,164.2437"
id="path129221-3"
sodipodi:nodetypes="cc" />
<path
style="fill:none;stroke:#000000;stroke-width:1;stroke-linecap:butt;stroke-linejoin:miter;stroke-miterlimit:4;stroke-dasharray:none;stroke-opacity:1;marker-end:url(#Arrow1Mend-3)"
d="m 232.07068,144.81429 -14.1921,0.11901"
id="path129647" />
<path
style="fill:none;stroke:#000000;stroke-width:1;stroke-linecap:butt;stroke-linejoin:miter;stroke-miterlimit:4;stroke-dasharray:none;stroke-opacity:1;marker-end:url(#Arrow1Mend-3-8)"
d="m 143.41514,143.74434 -14.1921,0.11901"
id="path129647-0" />
<path
style="fill:none;stroke:#000000;stroke-width:1;stroke-linecap:butt;stroke-linejoin:miter;stroke-miterlimit:4;stroke-dasharray:none;stroke-opacity:1;marker-end:url(#Arrow1Mend-3-89)"
d="m -33.402256,137.51102 -32.818613,0.10115"
id="path129647-7"
sodipodi:nodetypes="cc" />
<path
style="fill:none;stroke:#000000;stroke-width:1;stroke-linecap:butt;stroke-linejoin:miter;stroke-miterlimit:4;stroke-dasharray:none;stroke-opacity:1;marker-end:url(#Arrow1Mend-3-89-0)"
d="m -68.294265,145.97536 32.818613,0.10115"
id="path129647-7-9"
sodipodi:nodetypes="cc" />
<path
style="fill:none;stroke:#000000;stroke-width:1;stroke-linecap:butt;stroke-linejoin:miter;stroke-miterlimit:4;stroke-dasharray:none;stroke-opacity:1;marker-end:url(#Arrow1Mend-3)"
d="M 145.25257,93.372474 128.91909,131.5397"
id="path129935" />
<path
style="fill:none;stroke:#000000;stroke-width:1;stroke-linecap:butt;stroke-linejoin:miter;stroke-miterlimit:4;stroke-dasharray:none;stroke-opacity:1;marker-end:url(#Arrow1Mend-3-6)"
d="M 145.12679,192.6959 128.79331,154.52868"
id="path129935-3" />
<path
style="fill:none;stroke:#000000;stroke-width:1;stroke-linecap:butt;stroke-linejoin:miter;stroke-miterlimit:4;stroke-dasharray:none;stroke-opacity:1;marker-end:url(#Arrow1Mend)"
d="M 55.519965,130.52022 49.772467,52.460192"
id="path130093"
sodipodi:nodetypes="cc" />
<path
style="fill:none;stroke:#000000;stroke-width:1;stroke-linecap:butt;stroke-linejoin:miter;stroke-miterlimit:4;stroke-dasharray:none;stroke-opacity:1;marker-end:url(#Arrow1Mend-0)"
d="M 54.993175,148.814 49.288436,250.07616"
id="path130093-9"
sodipodi:nodetypes="cc" />
<path
style="fill:none;stroke:#000000;stroke-width:1;stroke-linecap:butt;stroke-linejoin:miter;stroke-miterlimit:4;stroke-dasharray:none;stroke-opacity:1;marker-end:url(#Arrow1Mend)"
d="M 38.762972,59.572454 19.500412,124.86725"
id="path130388"
sodipodi:nodetypes="cc" />
<path
style="fill:none;stroke:#000000;stroke-width:1;stroke-linecap:butt;stroke-linejoin:miter;stroke-miterlimit:4;stroke-dasharray:none;stroke-opacity:1;marker-end:url(#Arrow1Mend-28)"
d="M 13.755602,126.03049 32.884955,60.936682"
id="path130388-3"
sodipodi:nodetypes="cc" />
<path
style="fill:none;stroke:#000000;stroke-width:1;stroke-linecap:butt;stroke-linejoin:miter;stroke-miterlimit:4;stroke-dasharray:none;stroke-opacity:1;marker-end:url(#Arrow1Mend-2)"
d="M 40.851068,237.41642 22.417294,163.06537"
id="path130388-4"
sodipodi:nodetypes="cc" />
<path
style="fill:none;stroke:#000000;stroke-width:1;stroke-linecap:butt;stroke-linejoin:miter;stroke-miterlimit:4;stroke-dasharray:none;stroke-opacity:1;marker-end:url(#Arrow1Mend-2-6)"
d="m 27.302794,161.36631 18.154932,75.43181"
id="path130388-4-0"
sodipodi:nodetypes="cc" />
<path
style="fill:none;stroke:#000000;stroke-width:1;stroke-linecap:butt;stroke-linejoin:miter;stroke-miterlimit:4;stroke-dasharray:none;stroke-opacity:1;marker-end:url(#Arrow1Mend)"
d="m 118.75326,38.552769 60.05676,0.0488"
id="path153359"
sodipodi:nodetypes="cc" />
<path
style="fill:none;stroke:#000000;stroke-width:1;stroke-linecap:butt;stroke-linejoin:miter;stroke-miterlimit:4;stroke-dasharray:none;stroke-opacity:1;marker-end:url(#Arrow1Mend-21)"
d="m 114.4343,264.28683 60.05676,0.0488"
id="path153359-5"
sodipodi:nodetypes="cc" />
</g>
</svg>



@@ -20,6 +20,15 @@
publisher={MIT Press}
}
@article{lanctot2017unified,
title={A unified game-theoretic approach to multiagent reinforcement learning},
author={Lanctot, Marc and Zambaldi, Vinicius and Gruslys, Audrunas and Lazaridou, Angeliki and Tuyls, Karl and P{\'e}rolat, Julien and Silver, David and Graepel, Thore},
journal={Advances in neural information processing systems},
volume={30},
year={2017}
}
@article{mnih2013playing,
title={Playing atari with deep reinforcement learning},
author={Mnih, Volodymyr and Kavukcuoglu, Koray and Silver, David and Graves, Alex and Antonoglou, Ioannis and Wierstra, Daan and Riedmiller, Martin},