diff --git a/README.md b/README.md index df5f56d..d9e08b7 100644 --- a/README.md +++ b/README.md @@ -64,6 +64,8 @@ - **可解释性AI系统:** 随着机器学习在安全攸关(Safety-critical)领域的应用,机器学习系统越来越需要对决策给出充分解释。本书将会讨论可解释AI系统的常用方法和落地实践经验。 +- **机器人学习系统:** 随着机器学习在机器人领域的应用,机器人学习及相关系统设计成为日益重要的研究领域。本书将会讨论机器人学习的常用方法及其相关系统设计。 + 我们在持续拓展本书的内容,如元学习系统,自动并行,深度学习集群调度,绿色AI系统,图学习系统等。我们也非常欢迎社区对于新内容提出建议,贡献章节。 ## 构建指南
diff --git a/chapter_rl_sys/control.md b/chapter_rl_sys/control.md new file mode 100644 index 0000000..060d65a --- /dev/null +++ b/chapter_rl_sys/control.md @@ -0,0 +1,22 @@ +## 控制系统
+虽然控制理论已牢牢植根于基于模型(Model-based)的设计传统,但丰富的数据和机器学习给控制理论带来了新的机遇。控制理论和机器学习的交叉点涵盖了广泛的研究方向,包括但不限于动态系统的学习、在线学习和控制、深度学习的控制理论观点、强化学习以及在各种现实世界系统中的应用。从机器学习的角度来看,未来的主要挑战之一是超越模式识别,解决数据驱动控制和动态过程优化方面的问题。
+理论方面,线性二次控制(Linear-Quadratic Control)是经典的控制方法,最近有将图神经网络应用于分布式线性二次控制的研究。作者称,通过将线性二次问题转换为自监督学习问题,能够找到基于图神经网络(Graph Neural Networks,GNN)的最佳分布式控制器,他们还推导出了所得闭环系统稳定的充分条件。随着基于数据和学习的机器人控制方法不断得到重视,研究人员必须了解何时以及如何在现实世界中最好地利用这些方法。由于安全至关重要,有的研究通过学习不确定的动力学来安全地提高性能,有的研究提出安全或稳健的强化学习方法,还有的研究提出可以正式认证所学控制策略安全性的方法。图:numref:`safe\_learning\_control`展示了安全学习控制(Safe Learning Control)系统的框架图,用数据驱动的方法来学习控制策略,兼顾安全性。Lyapunov函数是评估非线性动力系统稳定性的有效工具,最近有人提出Neural Lyapunov方法来将安全性纳入考虑。
+应用方面,有基于神经网络的自动驾驶汽车模型预测控制,也有研究将最优控制和学习相结合,应用于陌生环境中的视觉导航:该研究将基于模型的控制与基于学习的感知相结合,基于学习的感知模块产生一系列航路点,通过无碰撞路径引导机器人到达目标;基于模型的规划器使用这些航路点来生成平滑且动态可行的轨迹,该轨迹使用反馈控制在物理系统上执行。在模拟的现实世界杂乱环境和实际地面车辆上的实验表明,与纯粹基于几何映射或基于端到端学习的替代方案相比,这种新的系统可以在新环境中更可靠、更高效地到达目标位置。强化学习和模仿学习与控制论有密切联系:LEOC整合了强化学习和经典控制理论的原则方法。有人将基于模型的离线强化学习算法扩展到高维视觉观察空间,并在真实机器人上的基于图像的抽屉关闭任务中表现出色。控制部分通过神经网络优化可以变得更加平滑、节能、安全,如何将神经网络和传统控制理论结合,特别是和运动学算法相结合,将会是一个有趣的方向。
+![安全学习控制系统,数据被用来更新控制策略或安全滤波器](../img/ch13/safe_learning_control.png)
+:width:`800px`
+:label:`safe\_learning\_control`
diff --git a/chapter_rl_sys/index.md b/chapter_rl_sys/index.md new file mode 100644 index 0000000..e26b2e6 --- /dev/null +++ b/chapter_rl_sys/index.md @@ -0,0 +1,20 @@ +# 机器人学习系统
+在本章中,我们介绍机器学习的一个重要分支——机器人学习及其在系统方面的知识。本章的学习目标包括:
+- 掌握机器人学习基本知识。
+- 掌握通用机器人操作系统。
+- 掌握感知系统、规划系统、控制系统。
+```toc +:maxdepth: 2 + +rl_sys_intro +ros +perception +planning +control +summary +```
diff --git a/chapter_rl_sys/perception.md b/chapter_rl_sys/perception.md new file mode 100644 index 0000000..580b358 --- /dev/null +++ b/chapter_rl_sys/perception.md @@ -0,0 +1,23 @@ +## 感知系统
+感知系统不仅可以包括视觉,还可以包含触觉、声音等。在未知环境中,机器人要实现自主移动和导航,必须知道自己在哪里(例如通过相机重定位)以及周围是什么情况(例如通过3D物体检测或语义分割),这些都要依靠感知系统来实现。一提到感知系统,不得不提的就是即时定位与建图(Simultaneous Localization and Mapping,SLAM)系统。SLAM的大致过程包括地标提取、数据关联、状态估计、状态更新以及地标更新等。视觉里程计(Visual Odometry)是SLAM中的重要部分,它估计两个时刻之间机器人的相对运动(Ego-motion)。ORB-SLAM系列是视觉SLAM中有代表性的工作,图 :numref:`orbslam3` 展示了最新的ORB-SLAM3的主要系统组件。香港科技大学开源的基于单目视觉与惯导融合的SLAM技术VINS-Mono也很值得关注。多传感器融合、优化数据关联与回环检测、与前端异构处理器集成、提升鲁棒性和重定位精度都是SLAM技术接下来的发展方向。
+最近,随着机器学习的兴起,基于学习的SLAM框架也被提了出来。TartanVO是第一个基于学习的视觉里程计(VO)模型,该模型可以推广到多个数据集和现实世界场景,并优于传统基于几何的方法。UnDeepVO是一个无监督深度学习方案,能够使用深度神经网络估计单目相机的6-DoF位姿及其视图深度。DROID-SLAM是用于单目、双目和RGB-D相机的深度视觉SLAM,它通过Bundle Adjustment层对相机位姿和像素深度进行反复迭代更新,具有很强的鲁棒性,故障大大减少;尽管只在单目视频上进行了训练,但它在测试时可以利用双目或RGB-D视频提高性能。其中,Bundle Adjustment(BA)与机器学习的结合被广泛研究。CMU提出了基于主动神经SLAM的模块化系统,帮助智能机器人在未知环境中高效探索。
+![ORB-SLAM3主要系统组件](../img/ch13/orbslam3.png)
+:width:`800px`
+:label:`orbslam3`
diff --git a/chapter_rl_sys/planning.md b/chapter_rl_sys/planning.md new file mode 100644 index 0000000..cac0535 --- /dev/null +++ b/chapter_rl_sys/planning.md @@ -0,0 +1,14 @@ +## 规划系统
+规划不仅包含运动路径规划,还包含高级任务规划。其中,运动规划是机器人技术的核心问题之一,应用范围从导航到复杂环境中的操作。它具有悠久的研究历史,相关方法通常需要有概率完备性和最优性的保证。然而,经典运动规划在处理现实世界(高维空间)的机器人问题时,挑战仍然存在。研究人员正在继续开发新算法来克服这些方法的局限,包括优化计算和内存负载、寻找更好的规划表示以及处理维度灾难等。
+相比之下,机器学习的最新进展为机器人专家研究运动规划问题开辟了新视角:经典运动规划器的瓶颈可以以数据驱动的方式解决;基于深度学习的规划器可以避免几何输入的局限性,例如使用视觉或语义输入进行规划等。最近的工作有:基于深度神经网络的四足机器人快速运动规划框架、通过贝叶斯学习进行运动规划、由运动规划器指导的视觉运动策略学习等。ML4KP是一个用于高效运动动力学(Kinodynamic)规划的C++库,该库可以轻松地将机器学习方法集成到规划过程中。在自动驾驶以及行人和车辆轨迹预测方面,也涌现出使用机器学习解决运动规划的工作,比如斯坦福大学提出的Trajectron++。强化学习在规划系统中也有重要应用,最近有一些关于多智能体强化学习、多智能体车流模拟、驾驶行为分析、考虑安全性因素的强化学习的工作,还有拓展到由真人专家在旁监督、出现危险时接管的专家参与的强化学习工作(Online
Imitation Learning、Offline RL),这类方法样本效率极高,据报道是单纯强化学习算法的50倍。为了更好地说明强化学习是如何应用在自动驾驶中的,图 :numref:`rl\_ad`展示了一个基于深度强化学习的自动驾驶POMDP模型。
+![基于深度强化学习的自动驾驶POMDP模型](../img/ch13/rl_ad.png)
+:width:`800px`
+:label:`rl\_ad`
diff --git a/chapter_rl_sys/rl_sys_intro.md b/chapter_rl_sys/rl_sys_intro.md new file mode 100644 index 0000000..54d7405 --- /dev/null +++ b/chapter_rl_sys/rl_sys_intro.md @@ -0,0 +1,30 @@ +## 概述
+机器人学是一个交叉学科,它涉及计算机科学、机械工程、电气工程、生物医学工程、数学等多种学科,并有诸多应用,比如自动驾驶汽车、机械臂、无人机、医疗机器人等。机器人能够自主地完成一种或多种任务,或者辅助人类完成指定任务。通常,人们把机器人系统划分为感知系统、决策(规划)系统和控制系统等组成部分。
+近些年,随着机器学习的兴起,经典机器人技术出现了与机器学习技术结合的趋势,称为机器人学习(Robot Learning)。机器人学习包含了计算机视觉、自然语言处理、语音处理、强化学习和模仿学习等人工智能技术在机器人上的应用,让机器人通过学习,自主地执行各种决策控制任务。
+机器人学习系统(Robot Learning System)是一个较新的概念。作为系统和机器人学习的交叉方向,仿照机器学习系统的概念,我们把机器人学习系统定义为"支持机器人模型训练和部署的系统"。按照涉及的机器人数量,可以划分为单机器人学习系统和多机器人学习系统。多机器人学习系统在协作和沟通中涉及的安全和隐私问题,也会是一个值得研究的方向。最近,机器人学习系统在室内自主移动、道路自动驾驶、机械臂工业操作等行业场景得到充分应用和发展。一些机器人学习基础设施项目也在进行中,如具备从公开可用的互联网资源、计算机模拟和真实机器人试验中学习能力的大规模计算系统RoboBrain。在自动驾驶领域,受联网自动驾驶汽车(CAV)对传统交通运输行业的影响,"车辆计算"(Vehicle Computing)(如图:numref:`vehicle\_computing`)概念引起广泛关注,并激发了如何让计算能力有限的设备利用周围CAV的计算平台来执行复杂计算任务的研究。最近出现了很多自动驾驶系统的模拟器,代表性的比如CARLA,支持安全强化学习、多智能体强化学习(MARL)、真实地图数据导入、泛化性测试等任务的MetaDrive,还有CarSim和TruckSim,它们可以作为各种自动驾驶算法的训练场并对算法效果进行评估。另外,针对自动驾驶的系统开发平台也不断涌现,如ERDOS、D3(Dynamic Deadline-Driven)和强调模块化思想的Pylot,可以让模型训练与部署系统与这些平台对接。
+![车辆计算框架图](../img/ch13/vehicle_computing.png)
+:width:`800px`
+:label:`vehicle\_computing`
+图 :numref:`learning\_decision\_module`是一个典型的按感知、规划、控制进行模块化设计的自动驾驶系统框架图。接下来,我们也将按照这个顺序依次介绍通用框架、感知系统、规划系统和控制系统。
+![通过模仿学习进行自动驾驶框架图。绿线表示自主驾驶系统的模块化流程,橙色实线表示神经判别器的训练,橙色虚线表示规划和控制模块是不可微的,但决策策略可以通过判别器对控制行动的奖励和重参数化技巧进行训练,如蓝色虚线所示。](../img/ch13/idm.png)
+:width:`800px`
+:label:`learning\_decision\_module`
diff --git a/chapter_rl_sys/ros.md b/chapter_rl_sys/ros.md new file mode 100644 index 0000000..ea71c10 --- /dev/null +++ b/chapter_rl_sys/ros.md @@ -0,0 +1,106 @@ +## 通用机器人操作系统
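在进入架构细节之前,先用一个极简的Python草图说明ROS这类框架最核心的、基于主题的发布/订阅通信思想。注意:下面的`TopicBus`等类名和主题名均为本文虚构的示意,并非真实的ROS/rclpy API,只是在"节点通过主题总线解耦通信"这一假设下的最小草图:

```python
# 极简的"主题总线"示意:并非真实的 ROS API,仅用于说明发布/订阅如何解耦节点。
class TopicBus:
    def __init__(self):
        self._subs = {}  # 主题名 -> 回调函数列表

    def subscribe(self, topic, callback):
        # 订阅者登记一个回调,消息到达时被调用
        self._subs.setdefault(topic, []).append(callback)

    def publish(self, topic, msg):
        # 发布者只面向主题发消息,不关心有哪些订阅者
        for cb in self._subs.get(topic, []):
            cb(msg)


bus = TopicBus()
received = []
# "规划"节点订阅 /scan 主题;"激光雷达"节点向该主题发布,二者互不知晓对方
bus.subscribe("/scan", lambda msg: received.append(msg))
bus.publish("/scan", {"ranges": [1.2, 0.8, 2.5]})
print(received)  # [{'ranges': [1.2, 0.8, 2.5]}]
```

发布者与订阅者只约定主题名和消息格式,互不依赖对方的存在——这正是ROS可以把感知、规划、控制拆分为独立节点、再按需组合的原因。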
![ROS/ROS2架构概述](../img/ch13/ROS2_arch.png)
+:width:`800px`
+:label:`ROS2\_arch`
+机器人操作系统(ROS)起源于斯坦福大学人工智能实验室的一个机器人项目。它是一个自由、开源的框架,提供接口和工具来构建先进的机器人。由于机器人领域的快速发展和复杂化,代码复用和模块化的需求日益强烈,而ROS正适用于机器人这种多节点多任务的复杂场景。目前已有一些机器人、无人机甚至无人车开始采用ROS作为开发平台。在机器人学习方面,ROS/ROS2可以与深度学习结合,有开发人员为ROS/ROS2开发了深度学习节点,并支持NVIDIA Jetson和TensorRT。NVIDIA Jetson是NVIDIA为自主机器开发的嵌入式系统,是一个包含CPU、GPU、PMIC、DRAM和闪存的模组化系统,可以提升自主机器软件的运行速率。TensorRT是由NVIDIA发布的用于在其硬件上运行机器学习推理的软件开发包。
+作为一个适用于机器人编程的框架,ROS把原本松散的零部件耦合在了一起,为它们提供了通信架构。虽然叫做"操作系统",ROS更像是一个中间件,给各种基于ROS的应用程序建立起了沟通的桥梁,通过这个中间件,机器人的感知、决策、控制算法可以组织和运行。ROS采用了分布式的设计思想,支持C++、Python等多种编程语言,方便移植。对ROS来讲,最小的进程单元是节点,由节点管理器来管理,参数配置存储在参数服务器中。ROS的通信方式包含主题(Topic)、服务(Service)、参数服务器(Parameter Server)、动作库(ActionLib)这四种。
+ROS提供了很多内置工具,比如三维可视化器rviz,用于可视化机器人、它们工作的环境和传感器数据,它是一个高度可配置的工具,具有许多不同类型的可视化和插件。catkin是ROS的构建系统(基于CMake),Catkin Workspace是创建、修改、编译catkin软件包的目录。roslaunch是可用于在本地和远程启动多个ROS节点以及在ROS参数服务器上设置参数的工具。此外还有机器人仿真工具Gazebo和移动操作与规划框架MoveIt!。ROS为机器人开发者提供了不同编程语言的接口,比如C++语言的ROS接口roscpp、Python语言的ROS接口rospy。ROS还提供了统一机器人描述格式URDF(Unified Robot Description Format),URDF文件使用XML格式描述机器人。ROS也有一些需要提高的地方,比如它的通信实时性能有限,与工业级要求的系统稳定性还有一定差距。
+ROS2项目在ROSCon 2014上被宣布,第一个ROS2发行版Ardent Apalone于2017年发布。ROS2增加了对多机器人系统的支持,提高了多机器人之间通信的网络性能,而且支持微控制器和跨系统平台:不仅可以运行在现有的X86和ARM系统上,还将支持MCU等嵌入式微控制器;不仅能运行在Linux系统之上,还增加了对Windows、MacOS、RTOS等系统的支持。更重要的是,ROS2还加入了对实时控制的支持,可以提高控制的时效性和整体机器人的性能。ROS2的通信系统基于DDS(Data Distribution Service),即数据分发服务,如图:numref:`ROS2\_arch`所示。
+ROS2依赖于使用shell环境组合工作区。"工作区"(Workspace)是一个ROS术语,表示使用ROS2进行开发的系统位置。核心ROS2工作区称为Underlay,随后的工作区称为Overlay。使用ROS2进行开发时,通常会同时有多个工作区处于活动状态。接下来我们详细介绍一下ROS2的核心概念(这一部分我们参考了文献 [^1])。
+### ROS2 Nodes
+ROS Graph是一个由ROS2元素组成的网络,这些元素在同一时间一起处理数据,它包括所有的可执行文件和它们之间的联系。ROS2中的每个节点都应负责单一的模块用途(例如,一个节点用于控制车轮马达,一个节点用于控制激光测距仪等)。每个节点都可以通过主题、服务、动作或参数向其他节点发送和接收数据。一个完整的机器人系统由许多协同工作的节点组成。在ROS 2中,单个可执行文件(C++程序、Python程序等)可以包含一个或多个节点,如图:numref:`ros2\_graph`。
+![一个完整的机器人系统由许多协同工作的节点组成。在ROS 2中,单个可执行文件(C++
程序、Python程序等)可以包含一个或多个节点](../img/ch13/ros2_graph.png)
+:width:`800px`
+:label:`ros2\_graph`
+节点之间的互相发现是通过ROS2底层的中间件实现的,过程总结如下:
+- 当一个节点启动后,它会向其他拥有相同ROS域名(ROS domain,可以通过设置ROS\_DOMAIN\_ID环境变量来设置)的节点进行广播,说明它已经上线。其他节点在收到广播后返回自己的相关信息,这样节点间的连接就可以建立,之后就可以通信了。
+- 节点会定时广播它的信息,这样即使它错过了最初的发现过程,也可以和新上线的节点进行连接。
+- 节点在下线前也会向其他节点广播自己即将下线。
+### ROS2 Topics
+ROS2将复杂系统分解为许多模块化节点。主题(Topic)是ROS Graph的重要元素,它充当节点交换消息的总线。一个节点可以向任意数量的主题发布数据,同时订阅任意数量的主题,如图:numref:`ros2\_topics`所示。主题是数据在节点之间以及系统不同部分之间流动的主要方式之一。
+rqt是ROS的一个软件框架,以插件的形式实现了各种GUI工具。所有现有的GUI工具都可以在rqt中作为可停靠窗口运行;这些工具仍然可以以传统的独立方式运行,但rqt可以更轻松地同时管理屏幕上的各种窗口。
+![一个节点可以向任意数量的主题发布数据,同时订阅任意数量的主题](../img/ch13/ros2_topics.png)
+:width:`800px`
+:label:`ros2\_topics`
+### ROS2 Services
+服务(Service)是ROS Graph中节点的另一种通信方式。服务基于调用-响应模型,而不是主题的发布者-订阅者模型。与主题那种由节点发布信息、允许订阅者获得持续更新的单向通信模式不同,服务仅在客户端专门调用时才提供数据:客户端向提供服务的节点发出请求,服务处理请求并生成响应。
+![ROS2服务](../img/ch13/ros2_services.png)
+:width:`800px`
+:label:`ros2\_services`
+### ROS2 Parameters
+参数(Parameter)是节点的配置值,可以将其视为节点的设置。节点可以将参数存储为整数、浮点数、布尔值、字符串和列表。在ROS2中,每个节点都维护自己的参数。
+### ROS2 Actions
+动作(Action)是ROS2中适用于长时间运行任务的一种通信类型,由三个部分组成:目标、反馈和结果。动作建立在主题和服务之上,其功能类似于服务,不同之处在于动作可以被取消,并且会提供持续的反馈,而不是像服务那样只返回单一响应。动作使用客户端-服务器模型,类似于发布者-订阅者模型:"动作客户端"节点将目标发送到"动作服务器"节点,后者确认目标并返回反馈流和结果。机器人系统可能会使用动作进行导航:动作目标可以告诉机器人前往某个位置,机器人在导航到该位置的途中可以发送更新(即反馈),并在到达目的地后发送最终结果消息。
+![ROS2动作](../img/ch13/ros2_actions.png)
+:width:`800px`
+:label:`ros2\_actions`
+[^1]: https://docs.ros.org/en/foxy/Tutorials/Understanding-ROS2-Nodes.html
\ No newline at end of file
diff --git a/chapter_rl_sys/summary.md b/chapter_rl_sys/summary.md new file mode 100644 index 0000000..ba7bb28 --- /dev/null +++ b/chapter_rl_sys/summary.md @@ -0,0 +1,3 @@ +## 小结
+
在这一章,我们简单介绍了机器人学习系统的基本概念,包括通用机器人操作系统、感知系统、规划系统和控制系统等,希望给读者建立对机器人学习问题的基本认识。当前,机器人学习是一个快速发展的人工智能分支,许多实际问题都有可能通过机器人学习算法的进一步发展得到解决。另一方面,机器人学习问题设置的特殊性也使得相应系统与相关硬件的耦合程度更高、更复杂:如何更好地平衡各种传感器负载?如何在计算资源有限的情况下最大化计算效率(实时性)?这些问题都需要对计算机系统的设计和使用有更好的理解。
diff --git a/img/ch13/ROS2_arch.png b/img/ch13/ROS2_arch.png new file mode 100644 index 0000000..1e5a8ef Binary files /dev/null and b/img/ch13/ROS2_arch.png differ
diff --git a/img/ch13/idm.png b/img/ch13/idm.png new file mode 100644 index 0000000..75cd9db Binary files /dev/null and b/img/ch13/idm.png differ
diff --git a/img/ch13/orbslam3.png b/img/ch13/orbslam3.png new file mode 100644 index 0000000..44c2863 Binary files /dev/null and b/img/ch13/orbslam3.png differ
diff --git a/img/ch13/rl_ad.png b/img/ch13/rl_ad.png new file mode 100644 index 0000000..f186382 Binary files /dev/null and b/img/ch13/rl_ad.png differ
diff --git a/img/ch13/ros2_actions.png b/img/ch13/ros2_actions.png new file mode 100644 index 0000000..0fcffae Binary files /dev/null and b/img/ch13/ros2_actions.png differ
diff --git a/img/ch13/ros2_graph.png b/img/ch13/ros2_graph.png new file mode 100644 index 0000000..b934e3c Binary files /dev/null and b/img/ch13/ros2_graph.png differ
diff --git a/img/ch13/ros2_services.png b/img/ch13/ros2_services.png new file mode 100644 index 0000000..2369bd4 Binary files /dev/null and b/img/ch13/ros2_services.png differ
diff --git a/img/ch13/ros2_topics.png b/img/ch13/ros2_topics.png new file mode 100644 index 0000000..075d094 Binary files /dev/null and b/img/ch13/ros2_topics.png differ
diff --git a/img/ch13/safe_learning_control.png b/img/ch13/safe_learning_control.png new file mode 100644 index 0000000..5b67f5f Binary files /dev/null and b/img/ch13/safe_learning_control.png differ
diff --git a/img/ch13/vehicle_computing.png b/img/ch13/vehicle_computing.png new file mode 100644 index 0000000..d600e54 Binary files /dev/null and b/img/ch13/vehicle_computing.png differ
diff --git a/index.md b/index.md index fcc54be..c876304 100644 ---
a/index.md +++ b/index.md @@ -24,6 +24,7 @@ chapter_recommender_system/index chapter_federated_learning/index chapter_reinforcement_learning/index chapter_explainable_AI/index +chapter_rl_sys/index ``` @@ -32,4 +33,4 @@ chapter_explainable_AI/index appendix_machine_learning_introduction/index chapter_references/index -``` \ No newline at end of file +``` diff --git a/info/editors.md b/info/editors.md index 5131e2c..e07d5b5 100644 --- a/info/editors.md +++ b/info/editors.md @@ -36,7 +36,9 @@ 第13章 - 可解释性AI系统:[@HaoyangLee](https://github.com/HaoyangLee) -附录:机器学习介绍:[@zsdonghao](https://github.com/zsdonghao) +第14章 - 机器人学习系统:[@HaoyangLee](https://github.com/HaoyangLee) + +附录:机器学习介绍:[@Jack](https://github.com/Jiankai-Sun) ## 加入我们 diff --git a/mlsys.bib b/mlsys.bib index 5193c97..3e910be 100644 --- a/mlsys.bib +++ b/mlsys.bib @@ -872,3 +872,550 @@ series = {PPoPP '21} pages={94--103}, year={2007}, } + +@inproceedings{quigley2009ros, + title={ROS: an open-source Robot Operating System}, + author={Quigley, Morgan and Conley, Ken and Gerkey, Brian and Faust, Josh and Foote, Tully and Leibs, Jeremy and Wheeler, Rob and Ng, Andrew Y and others}, + booktitle={ICRA workshop on open source software}, + volume={3}, + number={3.2}, + pages={5}, + year={2009}, + organization={Kobe, Japan} +} + +@inproceedings{maruyama2016exploring, + title={Exploring the performance of ROS2}, + author={Maruyama, Yuya and Kato, Shinpei and Azumi, Takuya}, + booktitle={Proceedings of the 13th ACM SIGBED International Conference on Embedded Software (EMSOFT)}, + pages={1--10}, + year={2016} +} + +@inproceedings{ding2019camnet, + title={CamNet: Coarse-to-fine retrieval for camera re-localization}, + author={Ding, Mingyu and Wang, Zhe and Sun, Jiankai and Shi, Jianping and Luo, Ping}, + booktitle={Proceedings of the IEEE/CVF International Conference on Computer Vision}, + pages={2871--2880}, + year={2019} +} + +@inproceedings{yi2020segvoxelnet, + title={Segvoxelnet: Exploring semantic context and 
depth-aware features for 3d vehicle detection from point cloud}, + author={Yi, Hongwei and Shi, Shaoshuai and Ding, Mingyu and Sun, Jiankai and Xu, Kui and Zhou, Hui and Wang, Zhe and Li, Sheng and Wang, Guoping}, + booktitle={2020 IEEE International Conference on Robotics and Automation (ICRA)}, + pages={2274--2280}, + year={2020}, + organization={IEEE} +} + +@ARTICLE{9712373, author={Sun, Jiankai and Huang, De-An and Lu, Bo and Liu, Yun-Hui and Zhou, Bolei and Garg, Animesh}, journal={IEEE Robotics and Automation Letters}, title={PlaTe: Visually-Grounded Planning With Transformers in Procedural Tasks}, year={2022}, volume={7}, number={2}, pages={4924-4930}, doi={10.1109/LRA.2022.3150855}} + +@inproceedings{li2018undeepvo, + title={Undeepvo: Monocular visual odometry through unsupervised deep learning}, + author={Li, Ruihao and Wang, Sen and Long, Zhiqiang and Gu, Dongbing}, + booktitle={2018 IEEE international conference on robotics and automation (ICRA)}, + pages={7286--7291}, + year={2018}, + organization={IEEE} +} + +@inproceedings{quintero2021motion, + title={Motion planning via bayesian learning in the dark}, + author={Quintero-Pena, Carlos and Chamzas, Constantinos and Unhelkar, Vaibhav and Kavraki, Lydia E}, + booktitle={ICRA: Workshop on Machine Learning for Motion Planning}, + year={2021} +} + +@MISC{ML4KP, +author = {Edgar Granados and Aravind Sivaramakrishnan and Troy McMahon and Zakary Littlefield and Kostas E. 
Bekris}, +title = {Machine Learning for Kinodynamic Planning (ML4KP)}, +howpublished = {\url{https://github.com/PRX-Kinodynamic/ML4KP}}, +year = {2021--2021} +} + +@article{kadubandimotion, + title={Motion Planner Guided Visuomotor Policy Learning}, + author={Kadubandi, Venkata Pradeep and Salhotra, Gautam and Sukhatme, Gaurav S and Englert, Peter} +} + +@article{jangdeep, + title={Deep Neural Network-based Fast Motion Planning Framework for Quadrupedal Robot}, + author={Jang, Jinhyeok and Shin, Heechan and Yoon, Minsung and Hong, Seungwoo and Park, Hae-Won and Yoon, Sung-Eui} +} + + +@article{aradi2020survey, + title={Survey of deep reinforcement learning for motion planning of autonomous vehicles}, + author={Aradi, Szil{\'a}rd}, + journal={IEEE Transactions on Intelligent Transportation Systems}, + year={2020}, + publisher={IEEE} +} + +@article{vianna2021neural, + title={Neural Network Based Model Predictive Control for an Autonomous Vehicle}, + author={Vianna, Maria Luiza Costa and Goubault, Eric and Putot, Sylvie}, + journal={arXiv preprint arXiv:2107.14573}, + year={2021} +} + +@article{tartanvo2020corl, + title = {TartanVO: A Generalizable Learning-based VO}, + author = {Wang, Wenshan and Hu, Yaoyu and Scherer, Sebastian}, + booktitle = {Conference on Robot Learning (CoRL)}, + year = {2020} +} + +@article{qiu2021egocentric, + title={Egocentric Human Trajectory Forecasting with a Wearable Camera and Multi-Modal Fusion}, + author={Qiu, Jianing and Chen, Lipeng and Gu, Xiao and Lo, Frank P-W and Tsai, Ya-Yen and Sun, Jiankai and Liu, Jiaqi and Lo, Benny}, + journal={arXiv preprint arXiv:2111.00993}, + year={2021} +} + +@InProceedings{pmlr-v155-huang21a, + title = {Learning a Decision Module by Imitating Driver’s Control Behaviors}, + author = {Huang, Junning and Xie, Sirui and Sun, Jiankai and Ma, Qiurui and Liu, Chunxiao and Lin, Dahua and Zhou, Bolei}, + booktitle = {Proceedings of the 2020 Conference on Robot Learning}, + pages = {1--10}, + year = {2021}, + 
editor = {Kober, Jens and Ramos, Fabio and Tomlin, Claire}, + volume = {155}, + series = {Proceedings of Machine Learning Research}, + month = {16--18 Nov}, + publisher = {PMLR}, + pdf = {https://proceedings.mlr.press/v155/huang21a/huang21a.pdf}, + url = {https://proceedings.mlr.press/v155/huang21a.html}, + abstract = {Autonomous driving systems have a pipeline of perception, decision, planning, and control. The decision module processes information from the perception module and directs the execution of downstream planning and control modules. On the other hand, the recent success of deep learning suggests that this pipeline could be replaced by end-to-end neural control policies, however, safety cannot be well guaranteed for the data-driven neural networks. In this work, we propose a hybrid framework to learn neural decisions in the classical modular pipeline through end-to-end imitation learning. This hybrid framework can preserve the merits of the classical pipeline such as the strict enforcement of physical and logical constraints while learning complex driving decisions from data. To circumvent the ambiguous annotation of human driving decisions, our method learns high-level driving decisions by imitating low-level control behaviors. We show in the simulation experiments that our modular driving agent can generalize its driving decision and control to various complex scenarios where the rule-based programs fail. It can also generate smoother and safer driving trajectories than end-to-end neural policies. 
Demo and code are available at https://decisionforce.github.io/modulardecision/.} +} + + +@InProceedings{pmlr-v155-sun21a, + title = {Neuro-Symbolic Program Search for Autonomous Driving Decision Module Design}, + author = {Sun, Jiankai and Sun, Hao and Han, Tian and Zhou, Bolei}, + booktitle = {Proceedings of the 2020 Conference on Robot Learning}, + pages = {21--30}, + year = {2021}, + editor = {Kober, Jens and Ramos, Fabio and Tomlin, Claire}, + volume = {155}, + series = {Proceedings of Machine Learning Research}, + month = {16--18 Nov}, + publisher = {PMLR}, + pdf = {https://proceedings.mlr.press/v155/sun21a/sun21a.pdf}, + url = {https://proceedings.mlr.press/v155/sun21a.html}, + abstract = {As a promising topic in cognitive robotics, neuro-symbolic modeling integrates symbolic reasoning and neural representation altogether. However, previous neuro-symbolic models usually wire their structures and the connections manually, making the underlying parameters sub-optimal. In this work, we propose the Neuro-Symbolic Program Search (NSPS) to improve the autonomous driving system design. NSPS is a novel automated search method that synthesizes the Neuro-Symbolic Programs. It can produce robust and expressive Neuro-Symbolic Programs and automatically tune the hyper-parameters. We validate NSPS in the CARLA driving simulation environment. The resulting Neuro-Symbolic Decision Programs successfully handle multiple traffic scenarios. 
Compared with previous neural-network-based driving and rule-based methods, our neuro-symbolic driving pipeline achieves more stable and safer behaviors in complex driving scenarios while maintaining an interpretable symbolic decision-making process.} +} + +@ARTICLE{9491826, author={Lu, Sidi and Shi, Weisong}, journal={IEEE Internet Computing}, title={The Emergence of Vehicle Computing}, year={2021}, volume={25}, number={3}, pages={18-22}, doi={10.1109/MIC.2021.3066076}} + +@article{benekohal1988carsim, + title={CARSIM: Car-following model for simulation of traffic in normal and stop-and-go conditions}, + author={Benekohal, Rahim F and Treiterer, Joseph}, + journal={Transportation research record}, + volume={1194}, + pages={99--111}, + year={1988}, + publisher={SAGE Publishing} +} + +@book{buehler2009darpa, + title={The DARPA urban challenge: autonomous vehicles in city traffic}, + author={Buehler, Martin and Iagnemma, Karl and Singh, Sanjiv}, + volume={56}, + year={2009}, + publisher={springer} +} + + +@InProceedings{pmlr-v100-bansal20a, + title = {Combining Optimal Control and Learning for Visual Navigation in Novel Environments}, + author = {Bansal, Somil and Tolani, Varun and Gupta, Saurabh and Malik, Jitendra and Tomlin, Claire}, + booktitle = {Proceedings of the Conference on Robot Learning}, + pages = {420--429}, + year = {2020}, + editor = {Kaelbling, Leslie Pack and Kragic, Danica and Sugiura, Komei}, + volume = {100}, + series = {Proceedings of Machine Learning Research}, + month = {30 Oct--01 Nov}, + publisher = {PMLR}, + pdf = {http://proceedings.mlr.press/v100/bansal20a/bansal20a.pdf}, + url = {https://proceedings.mlr.press/v100/bansal20a.html}, + abstract = {Model-based control is a popular paradigm for robot navigation because it can leverage a known dynamics model to efficiently plan robust robot trajectories. 
However, it is challenging to use model-based methods in settings where the environment is a priori unknown and can only be observed partially through onboard sensors on the robot. In this work, we address this short-coming by coupling model-based control with learning-based perception. The learning-based perception module produces a series of waypoints that guide the robot to the goal via a collision-free path. These waypoints are used by a model-based planner to generate a smooth and dynamically feasible trajectory that is executed on the physical system using feedback control. Our experiments in simulated real-world cluttered environments and on an actual ground vehicle demonstrate that the proposed approach can reach goal locations more reliably and efficiently in novel environments as compared to purely geometric mapping-based or end-to-end learning-based alternatives. Our approach does not rely on detailed explicit 3D maps of the environment, works well with low frame rates, and generalizes well from simulation to the real world. 
Videos describing our approach and experiments are available on the project website4.} +} + +@article{levine2018learning, + title={Learning hand-eye coordination for robotic grasping with deep learning and large-scale data collection}, + author={Levine, Sergey and Pastor, Peter and Krizhevsky, Alex and Ibarz, Julian and Quillen, Deirdre}, + journal={The International journal of robotics research}, + volume={37}, + number={4-5}, + pages={421--436}, + year={2018}, + publisher={SAGE Publications Sage UK: London, England} +} + +@incollection{peters2016robot, + title={Robot learning}, + author={Peters, Jan and Lee, Daniel D and Kober, Jens and Nguyen-Tuong, Duy and Bagnell, J Andrew and Schaal, Stefan}, + booktitle={Springer Handbook of Robotics}, + pages={357--398}, + year={2016}, + publisher={Springer} +} + +@article{saxena2014robobrain, + title={Robobrain: Large-scale knowledge engine for robots}, + author={Saxena, Ashutosh and Jain, Ashesh and Sener, Ozan and Jami, Aditya and Misra, Dipendra K and Koppula, Hema S}, + journal={arXiv preprint arXiv:1412.0691}, + year={2014} +} + +@inproceedings{zhu2017target, + title={Target-driven visual navigation in indoor scenes using deep reinforcement learning}, + author={Zhu, Yuke and Mottaghi, Roozbeh and Kolve, Eric and Lim, Joseph J and Gupta, Abhinav and Fei-Fei, Li and Farhadi, Ali}, + booktitle={2017 IEEE international conference on robotics and automation (ICRA)}, + pages={3357--3364}, + year={2017}, + organization={IEEE} +} + +@ARTICLE{9123682, author={Pan, Bowen and Sun, Jiankai and Leung, Ho Yin Tiga and Andonian, Alex and Zhou, Bolei}, journal={IEEE Robotics and Automation Letters}, title={Cross-View Semantic Segmentation for Sensing Surroundings}, year={2020}, volume={5}, number={3}, pages={4867-4873}, doi={10.1109/LRA.2020.3004325}} + +@article{tang2018ba, + title={Ba-net: Dense bundle adjustment network}, + author={Tang, Chengzhou and Tan, Ping}, + journal={arXiv preprint arXiv:1806.04807}, + year={2018} +} + 
+@inproceedings{tanaka2021learning, + title={Learning To Bundle-Adjust: A Graph Network Approach to Faster Optimization of Bundle Adjustment for Vehicular SLAM}, + author={Tanaka, Tetsuya and Sasagawa, Yukihiro and Okatani, Takayuki}, + booktitle={Proceedings of the IEEE/CVF International Conference on Computer Vision}, + pages={6250--6259}, + year={2021} +} + +@inproceedings{tobin2017domain, + title={Domain randomization for transferring deep neural networks from simulation to the real world}, + author={Tobin, Josh and Fong, Rachel and Ray, Alex and Schneider, Jonas and Zaremba, Wojciech and Abbeel, Pieter}, + booktitle={2017 IEEE/RSJ international conference on intelligent robots and systems (IROS)}, + pages={23--30}, + year={2017}, + organization={IEEE} +} + +@inproceedings{finn2017deep, + title={Deep visual foresight for planning robot motion}, + author={Finn, Chelsea and Levine, Sergey}, + booktitle={2017 IEEE International Conference on Robotics and Automation (ICRA)}, + pages={2786--2793}, + year={2017}, + organization={IEEE} +} + +@article{duan2017one, + title={One-shot imitation learning}, + author={Duan, Yan and Andrychowicz, Marcin and Stadie, Bradly and Jonathan Ho, OpenAI and Schneider, Jonas and Sutskever, Ilya and Abbeel, Pieter and Zaremba, Wojciech}, + journal={Advances in neural information processing systems}, + volume={30}, + year={2017} +} + +@book{koubaa2017robot, + title={Robot Operating System (ROS).}, + author={Koub{\^a}a, Anis and others}, + volume={1}, + year={2017}, + publisher={Springer} +} + +@article{coleman2014reducing, + title={Reducing the barrier to entry of complex robotic software: a moveit! 
case study}, + author={Coleman, David and Sucan, Ioan and Chitta, Sachin and Correll, Nikolaus}, + journal={arXiv preprint arXiv:1404.3785}, + year={2014} +} + +@inproceedings{salzmann2020trajectron++, + title={Trajectron++: Dynamically-feasible trajectory forecasting with heterogeneous data}, + author={Salzmann, Tim and Ivanovic, Boris and Chakravarty, Punarjay and Pavone, Marco}, + booktitle={European Conference on Computer Vision}, + pages={683--700}, + year={2020}, + organization={Springer} +} + +@inproceedings{gog2021pylot, + title={Pylot: A modular platform for exploring latency-accuracy tradeoffs in autonomous vehicles}, + author={Gog, Ionel and Kalra, Sukrit and Schafhalter, Peter and Wright, Matthew A and Gonzalez, Joseph E and Stoica, Ion}, + booktitle={2021 IEEE International Conference on Robotics and Automation (ICRA)}, + pages={8806--8813}, + year={2021}, + organization={IEEE} +} + +@inproceedings{Dosovitskiy17, + title = { {CARLA}: {An} Open Urban Driving Simulator}, + author = {Alexey Dosovitskiy and German Ros and Felipe Codevilla and Antonio Lopez and Vladlen Koltun}, + booktitle = {Proceedings of the 1st Annual Conference on Robot Learning}, + pages = {1--16}, + year = {2017} +} + +@inproceedings{10.1145/3492321.3519576, +author = {Gog, Ionel and Kalra, Sukrit and Schafhalter, Peter and Gonzalez, Joseph E. and Stoica, Ion}, +title = {D3: A Dynamic Deadline-Driven Approach for Building Autonomous Vehicles}, +year = {2022}, +isbn = {9781450391627}, +publisher = {Association for Computing Machinery}, +address = {New York, NY, USA}, +url = {https://doi.org/10.1145/3492321.3519576}, +doi = {10.1145/3492321.3519576}, +abstract = {Autonomous vehicles (AVs) must drive across a variety of challenging environments that impose continuously-varying deadlines and runtime-accuracy tradeoffs on their software pipelines. 
A deadline-driven execution of such AV pipelines requires a new class of systems that enable the computation to maximize accuracy under dynamically-varying deadlines. Designing these systems presents interesting challenges that arise from combining ease-of-development of AV pipelines with deadline specification and enforcement mechanisms.Our work addresses these challenges through D3 (Dynamic Deadline-Driven), a novel execution model that centralizes the deadline management, and allows applications to adjust their computation by modeling missed deadlines as exceptions. Further, we design and implement ERDOS, an open-source realization of D3 for AV pipelines that exposes finegrained execution events to applications, and provides mechanisms to speculatively execute computation and enforce deadlines between an arbitrary set of events. Finally, we address the crucial lack of AV benchmarks through our state-of-the-art open-source AV pipeline, Pylot, that works seamlessly across simulators and real AVs. 
We evaluate the efficacy of D3 and ERDOS by driving Pylot across challenging driving scenarios spanning 50km, and observe a 68% reduction in collisions as compared to prior execution models.}, +booktitle = {Proceedings of the Seventeenth European Conference on Computer Systems}, +pages = {453–471}, +numpages = {19}, +location = {Rennes, France}, +series = {EuroSys '22} +} + +@article{li2021metadrive, + author = {Li, Quanyi and Peng, Zhenghao and Xue, Zhenghai and Zhang, Qihang and Zhou, Bolei}, + journal = {ArXiv preprint}, + title = {Metadrive: Composing diverse driving scenarios for generalizable reinforcement learning}, + url = {https://arxiv.org/abs/2109.12674}, + volume = {abs/2109.12674}, + year = {2021} +} + +@article{peng2021learning, + author = {Peng, Zhenghao and Li, Quanyi and Hui, Ka Ming and Liu, Chunxiao and Zhou, Bolei}, + journal = {Advances in Neural Information Processing Systems}, + title = {Learning to Simulate Self-Driven Particles System with Coordinated Policy Optimization}, + volume = {34}, + year = {2021} +} + + +@inproceedings{peng2021safe, + author = {Peng, Zhenghao and Li, Quanyi and Liu, Chunxiao and Zhou, Bolei}, + booktitle = {5th Annual Conference on Robot Learning}, + title = {Safe Driving via Expert Guided Policy Optimization}, + year = {2021} +} + +@ARTICLE{8421746, author={Qin, Tong and Li, Peiliang and Shen, Shaojie}, journal={IEEE Transactions on Robotics}, title={VINS-Mono: A Robust and Versatile Monocular Visual-Inertial State Estimator}, year={2018}, volume={34}, number={4}, pages={1004-1020}, doi={10.1109/TRO.2018.2853729}} + +@article{campos2021orb, + title={Orb-slam3: An accurate open-source library for visual, visual--inertial, and multimap slam}, + author={Campos, Carlos and Elvira, Richard and Rodr{\'\i}guez, Juan J G{\'o}mez and Montiel, Jos{\'e} MM and Tard{\'o}s, Juan D}, + journal={IEEE Transactions on Robotics}, + volume={37}, + number={6}, + pages={1874--1890}, + year={2021}, + publisher={IEEE} +} + 
+@inproceedings{li2021efficient,
+  author = {Li, Quanyi and Peng, Zhenghao and Zhou, Bolei},
+  title = {Efficient Learning of Safe Driving Policy via Human-AI Copilot Optimization},
+  booktitle = {International Conference on Learning Representations},
+  year = {2021}
+}
+
+@article{chaplot2020learning,
+  title = {Learning to explore using active neural {SLAM}},
+  author = {Chaplot, Devendra Singh and Gandhi, Dhiraj and Gupta, Saurabh and Gupta, Abhinav and Salakhutdinov, Ruslan},
+  journal = {arXiv preprint arXiv:2004.05155},
+  year = {2020}
+}
+
+@article{teed2021droid,
+  title = {{DROID-SLAM}: Deep visual {SLAM} for monocular, stereo, and {RGB-D} cameras},
+  author = {Teed, Zachary and Deng, Jia},
+  journal = {Advances in Neural Information Processing Systems},
+  volume = {34},
+  year = {2021}
+}
+
+@article{brunke2021safe,
+  title = {Safe learning in robotics: From learning-based control to safe reinforcement learning},
+  author = {Brunke, Lukas and Greeff, Melissa and Hall, Adam W. and Yuan, Zhaocong and Zhou, Siqi and Panerati, Jacopo and Schoellig, Angela P.},
+  journal = {Annual Review of Control, Robotics, and Autonomous Systems},
+  volume = {5},
+  year = {2021},
+  publisher = {Annual Reviews}
+}
+
+@inproceedings{pmlr-v144-gama21a,
+  title = {Graph Neural Networks for Distributed Linear-Quadratic Control},
+  author = {Gama, Fernando and Sojoudi, Somayeh},
+  booktitle = {Proceedings of the 3rd Conference on Learning for Dynamics and Control},
+  pages = {111--124},
+  year = {2021},
+  editor = {Jadbabaie, Ali and Lygeros, John and Pappas, George J. and Parrilo, Pablo A. and Recht, Benjamin and Tomlin, Claire J. and Zeilinger, Melanie N.},
+  volume = {144},
+  series = {Proceedings of Machine Learning Research},
+  month = {07 -- 08 June},
+  publisher = {PMLR},
+  pdf = {http://proceedings.mlr.press/v144/gama21a/gama21a.pdf},
+  url = {https://proceedings.mlr.press/v144/gama21a.html},
+  abstract = {The linear-quadratic controller is one of the fundamental problems in control theory. The optimal solution is a linear controller that requires access to the state of the entire system at any given time. When considering a network system, this renders the optimal controller a centralized one. The interconnected nature of a network system often demands a distributed controller, where different components of the system are controlled based only on local information. Unlike the classical centralized case, obtaining the optimal distributed controller is usually an intractable problem. Thus, we adopt a graph neural network (GNN) as a parametrization of distributed controllers. GNNs are naturally local and have distributed architectures, making them well suited for learning nonlinear distributed controllers. By casting the linear-quadratic problem as a self-supervised learning problem, we are able to find the best GNN-based distributed controller. We also derive sufficient conditions for the resulting closed-loop system to be stable. We run extensive simulations to study the performance of GNN-based distributed controllers and showcase that they are a computationally efficient parametrization with scalability and transferability capabilities.}
+}
+
+@inproceedings{pmlr-v144-mehrjou21a,
+  title = {Neural Lyapunov Redesign},
+  author = {Mehrjou, Arash and Ghavamzadeh, Mohammad and Sch\"olkopf, Bernhard},
+  booktitle = {Proceedings of the 3rd Conference on Learning for Dynamics and Control},
+  pages = {459--470},
+  year = {2021},
+  editor = {Jadbabaie, Ali and Lygeros, John and Pappas, George J. and Parrilo, Pablo A. and Recht, Benjamin and Tomlin, Claire J. and Zeilinger, Melanie N.},
+  volume = {144},
+  series = {Proceedings of Machine Learning Research},
+  month = {07 -- 08 June},
+  publisher = {PMLR},
+  pdf = {http://proceedings.mlr.press/v144/mehrjou21a/mehrjou21a.pdf},
+  url = {https://proceedings.mlr.press/v144/mehrjou21a.html},
+  abstract = {Learning controllers merely based on a performance metric has been proven effective in many physical and non-physical tasks in both control theory and reinforcement learning. However, in practice, the controller must guarantee some notion of safety to ensure that it does not harm either the agent or the environment. Stability is a crucial notion of safety, whose violation can certainly cause unsafe behaviors. Lyapunov functions are effective tools to assess stability in nonlinear dynamical systems. In this paper, we combine an improving Lyapunov function with automatic controller synthesis in an iterative fashion to obtain control policies with large safe regions. We propose a two-player collaborative algorithm that alternates between estimating a Lyapunov function and deriving a controller that gradually enlarges the stability region of the closed-loop system. We provide theoretical results on the class of systems that can be treated with the proposed algorithm and empirically evaluate the effectiveness of our method using an exemplary dynamical system.}
+}
+
+@inproceedings{pmlr-v144-zhang21b,
+  title = {{LEOC}: A Principled Method in Integrating Reinforcement Learning and Classical Control Theory},
+  author = {Zhang, Naifu and Capel, Nicholas},
+  booktitle = {Proceedings of the 3rd Conference on Learning for Dynamics and Control},
+  pages = {689--701},
+  year = {2021},
+  editor = {Jadbabaie, Ali and Lygeros, John and Pappas, George J. and Parrilo, Pablo A. and Recht, Benjamin and Tomlin, Claire J. and Zeilinger, Melanie N.},
+  volume = {144},
+  series = {Proceedings of Machine Learning Research},
+  month = {07 -- 08 June},
+  publisher = {PMLR},
+  pdf = {http://proceedings.mlr.press/v144/zhang21b/zhang21b.pdf},
+  url = {https://proceedings.mlr.press/v144/zhang21b.html},
+  abstract = {There have been attempts in reinforcement learning to exploit a priori knowledge about the structure of the system. This paper proposes a hybrid reinforcement learning controller which dynamically interpolates a model-based linear controller and an arbitrary differentiable policy. The linear controller is designed based on local linearised model knowledge, and stabilises the system in a neighbourhood about an operating point. The coefficients of interpolation between the two controllers are determined by a scaled distance function measuring the distance between the current state and the operating point. The overall hybrid controller is proven to maintain the stability guarantee around the neighborhood of the operating point and still possess the universal function approximation property of the arbitrary non-linear policy. Learning has been done on both model-based (PILCO) and model-free (DDPG) frameworks. Simulation experiments performed in OpenAI gym demonstrate stability and robustness of the proposed hybrid controller. This paper thus introduces a principled method allowing for the direct importing of control methodology into reinforcement learning.}
+}
+
+@inproceedings{pmlr-v144-rafailov21a,
+  title = {Offline Reinforcement Learning from Images with Latent Space Models},
+  author = {Rafailov, Rafael and Yu, Tianhe and Rajeswaran, Aravind and Finn, Chelsea},
+  booktitle = {Proceedings of the 3rd Conference on Learning for Dynamics and Control},
+  pages = {1154--1168},
+  year = {2021},
+  editor = {Jadbabaie, Ali and Lygeros, John and Pappas, George J. and Parrilo, Pablo A. and Recht, Benjamin and Tomlin, Claire J. and Zeilinger, Melanie N.},
+  volume = {144},
+  series = {Proceedings of Machine Learning Research},
+  month = {07 -- 08 June},
+  publisher = {PMLR},
+  pdf = {http://proceedings.mlr.press/v144/rafailov21a/rafailov21a.pdf},
+  url = {https://proceedings.mlr.press/v144/rafailov21a.html},
+  abstract = {Offline reinforcement learning (RL) refers to the task of learning policies from a static dataset of environment interactions. Offline RL enables extensive utilization and re-use of historical datasets, while also alleviating safety concerns associated with online exploration, thereby expanding the real-world applicability of RL. Most prior work in offline RL has focused on tasks with compact state representations. However, the ability to learn directly from rich observation spaces like images is critical for real-world applications like robotics. In this work, we build on recent advances in model-based algorithms for offline RL, and extend them to high-dimensional visual observation spaces. Model-based offline RL algorithms have achieved state of the art results in state based tasks and are minimax optimal. However, they rely crucially on the ability to quantify uncertainty in the model predictions. This is particularly challenging with image observations. To overcome this challenge, we propose to learn a latent-state dynamics model, and represent the uncertainty in the latent space. Our approach is both tractable in practice and corresponds to maximizing a lower bound of the ELBO in the unknown POMDP. Through experiments on a range of challenging image-based locomotion and robotic manipulation tasks, we find that our algorithm significantly outperforms previous offline model-free RL methods as well as state-of-the-art online visual model-based RL methods. Moreover, we also find that our approach excels on an image-based drawer closing task on a real robot using a pre-existing dataset. All results including videos can be found online at \url{https://sites.google.com/view/lompo/}.}
+}
+
+@inproceedings{chen2020transferable,
+  title = {Transferable active grasping and real embodied dataset},
+  author = {Chen, Xiangyu and Ye, Zelin and Sun, Jiankai and Fan, Yuda and Hu, Fang and Wang, Chenxi and Lu, Cewu},
+  booktitle = {2020 IEEE International Conference on Robotics and Automation (ICRA)},
+  pages = {3611--3618},
+  year = {2020},
+  organization = {IEEE}
+}
+
+@article{sun2021adversarial,
+  title = {Adversarial inverse reinforcement learning with self-attention dynamics model},
+  author = {Sun, Jiankai and Yu, Lantao and Dong, Pinqian and Lu, Bo and Zhou, Bolei},
+  journal = {IEEE Robotics and Automation Letters},
+  volume = {6},
+  number = {2},
+  pages = {1880--1886},
+  year = {2021},
+  publisher = {IEEE}
+}
+
+@article{huang2018navigationnet,
+  title = {{NavigationNet}: A large-scale interactive indoor navigation dataset},
+  author = {Huang, He and Shen, Yujing and Sun, Jiankai and Lu, Cewu},
+  journal = {arXiv preprint arXiv:1808.08374},
+  year = {2018}
+}
+
+@inproceedings{xu2019depth,
+  title = {Depth completion from sparse {LiDAR} data with depth-normal constraints},
+  author = {Xu, Yan and Zhu, Xinge and Shi, Jianping and Zhang, Guofeng and Bao, Hujun and Li, Hongsheng},
+  booktitle = {Proceedings of the IEEE/CVF International Conference on Computer Vision},
+  pages = {2811--2820},
+  year = {2019}
+}
+
+@inproceedings{zhu2020ssn,
+  title = {{SSN}: Shape signature networks for multi-class object detection from point clouds},
+  author = {Zhu, Xinge and Ma, Yuexin and Wang, Tai and Xu, Yan and Shi, Jianping and Lin, Dahua},
+  booktitle = {European Conference on Computer Vision},
+  pages = {581--597},
+  year = {2020},
+  organization = {Springer}
+}
+
+@inproceedings{huang2019prior,
+  title = {Prior guided dropout for robust visual localization in dynamic environments},
+  author = {Huang, Zhaoyang and Xu, Yan and Shi, Jianping and Zhou, Xiaowei and Bao, Hujun and Zhang, Guofeng},
+  booktitle = {Proceedings of the IEEE/CVF International Conference on Computer Vision},
+  pages = {2791--2800},
+  year = {2019}
+}
+
+@article{xu2020selfvoxelo,
+  title = {{SelfVoxeLO}: Self-supervised {LiDAR} odometry with voxel-based deep neural networks},
+  author = {Xu, Yan and Huang, Zhaoyang and Lin, Kwan-Yee and Zhu, Xinge and Shi, Jianping and Bao, Hujun and Zhang, Guofeng and Li, Hongsheng},
+  journal = {arXiv preprint arXiv:2010.09343},
+  year = {2020}
+}
+
+@article{huang2021life,
+  title = {{LIFE}: Lighting Invariant Flow Estimation},
+  author = {Huang, Zhaoyang and Pan, Xiaokun and Xu, Runsen and Xu, Yan and Zhang, Guofeng and Li, Hongsheng and others},
+  journal = {arXiv preprint arXiv:2104.03097},
+  year = {2021}
+}
+
+@inproceedings{huang2021vs,
+  title = {{VS-Net}: Voting with Segmentation for Visual Localization},
+  author = {Huang, Zhaoyang and Zhou, Han and Li, Yijin and Yang, Bangbang and Xu, Yan and Zhou, Xiaowei and Bao, Hujun and Zhang, Guofeng and Li, Hongsheng},
+  booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},
+  pages = {6101--6111},
+  year = {2021}
+}
+
+@article{yang2021pdnet,
+  title = {{PDNet}: Towards Better One-stage Object Detection with Prediction Decoupling},
+  author = {Yang, Li and Xu, Yan and Wang, Shaoru and Yuan, Chunfeng and Zhang, Ziqi and Li, Bing and Hu, Weiming},
+  journal = {arXiv preprint arXiv:2104.13876},
+  year = {2021}
+}
+
+@article{xu2022robust,
+  title = {Robust Self-supervised {LiDAR} Odometry via Representative Structure Discovery and 3D Inherent Error Modeling},
+  author = {Xu, Yan and Lin, Junyi and Shi, Jianping and Zhang, Guofeng and Wang, Xiaogang and Li, Hongsheng},
+  journal = {IEEE Robotics and Automation Letters},
+  year = {2022},
+  publisher = {IEEE}
+}
+
+@article{xu2022rnnpose,
+  title = {{RNNPose}: Recurrent 6-DoF Object Pose Refinement with Robust Correspondence Field Estimation and Pose Optimization},
+  author = {Xu, Yan and Lin, Junyi and Zhang, Guofeng and Wang, Xiaogang and Li, Hongsheng},
+  journal = {arXiv preprint arXiv:2203.12870},
+  year = {2022}
+}
+
+@inproceedings{Sun2022SelfSupervisedTA,
+  title = {Self-Supervised Traffic Advisors: Distributed, Multi-view Traffic Prediction for Smart Cities},
+  author = {Sun, Jiankai and Kousik, Shreyas and Fridovich-Keil, David and Schwager, Mac},
+  year = {2022}
+}