[Paper Recommendations] Six Recent Reinforcement Learning Papers: Sublinear Regret, Machine Reading Comprehension, Accelerated Reinforcement Learning, Adversarial Reward Learning, Human-Robot Interaction

Author: WZEARW · Published 2018-06-05 · Column: Zhuanzhi
[Overview] The Zhuanzhi content team has compiled six recent papers on reinforcement learning, introduced below for your reference.

1. Multiagent Soft Q-Learning



Authors: Ermo Wei, Drew Wicke, David Freelan, Sean Luke

Affiliation: George Mason University

Abstract: Policy gradient methods are often applied to reinforcement learning in continuous multiagent games. These methods perform local search in the joint-action space, and as we show, they are susceptible to a game-theoretic pathology known as relative overgeneralization. To resolve this issue, we propose Multiagent Soft Q-learning, which can be seen as the analogue of applying Q-learning to continuous controls. We compare our method to MADDPG, a state-of-the-art approach, and show that our method achieves better coordination in multiagent cooperative tasks, converging to better local optima in the joint action space.

Published: arXiv, April 26, 2018

Link: http://www.zhuanzhi.ai/document/8f0ac64488ab8fa83554b4da6cc2f69d
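
The abstract frames Multiagent Soft Q-learning as a continuous-action, multiagent analogue of soft Q-learning. As background, the single-agent, discrete-action soft Bellman backup that this family of methods builds on can be sketched in a few lines; the function name, state/action sizes, temperature, and random toy MDP below are illustrative assumptions, not the paper's setup.

```python
import numpy as np

def soft_bellman_backup(Q, rewards, transitions, gamma=0.9, alpha=0.5):
    """One soft Bellman backup over a tabular Q-table.

    Q           : array (S, A), current Q-value estimates
    rewards     : array (S, A), expected immediate rewards
    transitions : array (S, A, S), transition probabilities
    alpha       : entropy temperature; alpha -> 0 recovers the hard max
    """
    # Soft state value: V(s) = alpha * log sum_a exp(Q(s, a) / alpha)
    V = alpha * np.log(np.exp(Q / alpha).sum(axis=1))
    # Soft Q update: Q(s, a) = r(s, a) + gamma * E_{s'}[V(s')]
    return rewards + gamma * transitions @ V

# Toy illustration with random dynamics (assumed sizes, not from the paper).
rng = np.random.default_rng(0)
S, A = 4, 3
P = rng.dirichlet(np.ones(S), size=(S, A))    # shape (S, A, S)
R = rng.uniform(size=(S, A))
Q = np.zeros((S, A))
for _ in range(200):                           # iterate toward the soft fixed point
    Q = soft_bellman_backup(Q, R, P)
print(np.round(Q, 3))
```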

2. Variance Reduction Methods for Sublinear Reinforcement Learning



Authors: Sham Kakade, Mengdi Wang, Lin F. Yang

Affiliations: Princeton University, University of Washington

Abstract: This work considers the problem of provably optimal reinforcement learning for episodic finite horizon MDPs, i.e. how an agent learns to maximize his/her long-term reward in an uncertain environment. The main contribution is in providing a novel algorithm --- Variance-reduced Upper Confidence Q-learning (vUCQ) --- which enjoys a regret bound of $\widetilde{O}(\sqrt{HSAT} + H^5SA)$, where $T$ is the number of time steps the agent acts in the MDP, $S$ is the number of states, $A$ is the number of actions, and $H$ is the (episodic) horizon time. This is the first regret bound that is both sub-linear in the model size and asymptotically optimal. The algorithm is sub-linear in that the time to achieve $\epsilon$-average regret for any constant $\epsilon$ is $O(SA)$, which is a number of samples that is far less than that required to learn any non-trivial estimate of the transition model (the transition model is specified by $O(S^2A)$ parameters). The importance of sub-linear algorithms is largely the motivation for algorithms such as $Q$-learning and other "model free" approaches. The vUCQ algorithm also enjoys minimax optimal regret in the long run, matching the $\Omega(\sqrt{HSAT})$ lower bound. Variance-reduced Upper Confidence Q-learning (vUCQ) is a successive refinement method in which the algorithm reduces the variance in $Q$-value estimates and couples this estimation scheme with an upper confidence based algorithm. Technically, the coupling of both of these techniques is what leads to the algorithm enjoying both the sub-linear regret property and the asymptotically optimal regret.

Published: arXiv, April 26, 2018

Link: http://www.zhuanzhi.ai/document/298d70f33245af2313394e0f6de96a73
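
vUCQ couples variance-reduced Q-value estimation with upper-confidence exploration. The sketch below is not the authors' algorithm; it is a minimal version of the plain upper-confidence Q-learning template (optimistic initialization plus a count-based bonus and a decaying step size) that such methods refine. All constants, helper names, and the toy environment are assumptions made for illustration.

```python
import numpy as np

def ucb_q_episode(Q, counts, env_reset, env_step, H, c=1.0):
    """Run one episode of generic upper-confidence Q-learning on a finite-horizon MDP.

    Q, counts : arrays of shape (H, S, A) -- per-step Q estimates and visit counts
    env_reset : () -> initial state index
    env_step  : (state, action) -> (next_state, reward), reward in [0, 1]
    c         : scale of the exploration bonus (illustrative, not tuned)
    """
    s = env_reset()
    for h in range(H):
        a = int(np.argmax(Q[h, s]))                 # act greedily w.r.t. the optimistic Q
        s_next, r = env_step(s, a)
        counts[h, s, a] += 1
        n = counts[h, s, a]
        lr = (H + 1) / (H + n)                      # step size decaying with the visit count
        bonus = c * np.sqrt(H**3 / n)               # optimism bonus shrinking with visits
        future = Q[h + 1, s_next].max() if h + 1 < H else 0.0
        target = r + bonus + future
        Q[h, s, a] = (1 - lr) * Q[h, s, a] + lr * min(target, float(H))  # keep estimates bounded
        s = s_next
    return Q, counts

# Toy 2-state, 2-action chain used only to exercise the function (not from the paper).
S, A, H = 2, 2, 3
def env_reset():
    return 0
def env_step(s, a):
    s_next = (s + a) % S
    return s_next, float(s_next == 1)

Q = np.full((H, S, A), float(H))                    # optimistic initialization
counts = np.zeros((H, S, A))
for episode in range(50):
    Q, counts = ucb_q_episode(Q, counts, env_reset, env_step, H)
print(np.round(Q[0], 2))
```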

3. Reinforced Mnemonic Reader for Machine Reading Comprehension



Authors: Minghao Hu, Yuxing Peng, Zhen Huang, Xipeng Qiu, Furu Wei, Ming Zhou

Affiliations: Fudan University, National University of Defense Technology

Abstract: In this paper, we introduce the Reinforced Mnemonic Reader for machine reading comprehension tasks, which enhances previous attentive readers in two aspects. First, a reattention mechanism is proposed to refine current attentions by directly accessing past attentions that are temporally memorized in a multi-round alignment architecture, so as to avoid the problems of attention redundancy and attention deficiency. Second, a new optimization approach, called dynamic-critical reinforcement learning, is introduced to extend the standard supervised method. It always encourages the model to predict a more acceptable answer, so as to address the convergence suppression problem that occurs in traditional reinforcement learning algorithms. Extensive experiments on the Stanford Question Answering Dataset (SQuAD) show that our model achieves state-of-the-art results. Meanwhile, our model outperforms previous systems by over 6% in terms of both Exact Match and F1 metrics on two adversarial SQuAD datasets.

Published: arXiv, April 25, 2018

Link: http://www.zhuanzhi.ai/document/37c93c1bdb68c3559d4f5f1740093d7d
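
The dynamic-critical optimization described above extends supervised span prediction with a reinforcement-style objective. The sketch below illustrates the generic self-critical policy-gradient loss such approaches build on, using word-overlap F1 as the reward; the helper names, the toy spans, and the exact choice of baseline are illustrative assumptions rather than the paper's precise formulation.

```python
import numpy as np

def f1_reward(pred_tokens, gold_tokens):
    """Word-overlap F1 between a predicted answer span and the gold span."""
    common = sum(min(pred_tokens.count(t), gold_tokens.count(t)) for t in set(pred_tokens))
    if common == 0:
        return 0.0
    precision = common / len(pred_tokens)
    recall = common / len(gold_tokens)
    return 2 * precision * recall / (precision + recall)

def self_critical_loss(logprob_sampled, sampled_answer, greedy_answer, gold_answer):
    """REINFORCE loss with the greedy decode as baseline.

    The advantage is positive only when the sampled span scores a higher F1 than
    the greedy baseline, so gradient descent on this loss raises the probability
    of answers that beat the current decoding strategy.
    """
    advantage = f1_reward(sampled_answer, gold_answer) - f1_reward(greedy_answer, gold_answer)
    return -advantage * logprob_sampled

# Toy numbers only, to show how the advantage is formed.
loss = self_critical_loss(
    logprob_sampled=np.log(0.2),                      # log-prob of the sampled span under the model
    sampled_answer=["reinforced", "mnemonic", "reader"],
    greedy_answer=["mnemonic", "reader"],
    gold_answer=["reinforced", "mnemonic", "reader"],
)
print(round(float(loss), 3))
```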

4. Accelerated Reinforcement Learning



Author: K. Lakshmanan

Abstract: Policy gradient methods are widely used in reinforcement learning algorithms to search for better policies in the parameterized policy space. They perform gradient search in the policy space and are known to converge very slowly. Nesterov developed an accelerated gradient search algorithm for convex optimization problems, which has recently been extended to non-convex and stochastic optimization. We use Nesterov's acceleration for policy gradient search in the well-known actor-critic algorithm and show convergence using the ODE method. We tested this algorithm on a scheduling problem in which an incoming job is scheduled into one of four queues based on the queue lengths. Experimental results show that the algorithm using Nesterov's acceleration performs significantly better than the algorithm without acceleration. To the best of our knowledge, this is the first time Nesterov's acceleration has been used with an actor-critic algorithm.

Published: arXiv, April 25, 2018

Link: http://www.zhuanzhi.ai/document/23e73fe759219ab8d58317acce28dc5f
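
Nesterov's acceleration replaces the plain gradient step on the actor parameters with a look-ahead momentum step. The sketch below shows that update rule in isolation on a generic parameter vector; the step sizes and the quadratic test objective are illustrative, not the paper's scheduling problem or its actor-critic estimator.

```python
import numpy as np

def nesterov_step(theta, velocity, grad_fn, lr=0.01, momentum=0.9):
    """One Nesterov accelerated gradient step on a parameter vector.

    grad_fn : gradient of the objective at a point; in an actor-critic this
              would be the estimated policy gradient for the actor parameters.
    """
    lookahead = theta + momentum * velocity    # evaluate the gradient ahead of the current point
    grad = grad_fn(lookahead)
    velocity = momentum * velocity - lr * grad
    return theta + velocity, velocity

# Illustrative objective: a simple quadratic with its minimum at 3 (stand-in for the RL objective).
grad_fn = lambda x: 2.0 * (x - 3.0)
theta, velocity = np.zeros(4), np.zeros(4)
for _ in range(300):
    theta, velocity = nesterov_step(theta, velocity, grad_fn)
print(np.round(theta, 3))   # approaches the minimizer [3, 3, 3, 3]
```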

5. No Metrics Are Perfect: Adversarial Reward Learning for Visual Storytelling



Authors: Xin Wang, Wenhu Chen, Yuan-Fang Wang, William Yang Wang

Affiliation: University of California

Abstract: Though impressive results have been achieved in visual captioning, the task of generating abstract stories from photo streams is still a little-tapped problem. Different from captions, stories have more expressive language styles and contain many imaginary concepts that do not appear in the images, which poses challenges to behavioral cloning algorithms. Furthermore, due to the limitations of automatic metrics in evaluating story quality, reinforcement learning methods with hand-crafted rewards also face difficulties in gaining an overall performance boost. Therefore, we propose an Adversarial REward Learning (AREL) framework to learn an implicit reward function from human demonstrations, and then optimize policy search with the learned reward function. Though automatic evaluation indicates a slight performance boost over state-of-the-art (SOTA) methods in cloning expert behaviors, human evaluation shows that our approach achieves significant improvement in generating more human-like stories than SOTA systems.

Published: arXiv, April 25, 2018

Link: http://www.zhuanzhi.ai/document/5dc12a9a8438755a167e2aa4a12f3fff
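
AREL alternates between fitting an implicit reward model on human demonstrations and improving the story-generation policy with that learned reward. Below is a schematic of that alternation with toy linear models standing in for the paper's networks; every dimension, learning rate, and helper here is an assumption for illustration, not the authors' architecture or training procedure.

```python
import numpy as np

rng = np.random.default_rng(0)
D, K = 8, 5                              # feature dimension and candidate-pool size (toy choices)
candidates = rng.normal(size=(K, D))     # feature vectors of candidate stories (assumed given)
human_idx = 0                            # pretend candidate 0 is the human-written story

w_reward = np.zeros(D)                   # linear stand-in for the learned reward model
theta = np.zeros(K)                      # policy logits over the candidate pool

def softmax(x):
    z = np.exp(x - x.max())
    return z / z.sum()

for step in range(500):
    probs = softmax(theta)
    sampled = int(rng.choice(K, p=probs))
    # Reward-model step: push the score of the human demonstration above the policy sample.
    w_reward += 0.05 * (candidates[human_idx] - candidates[sampled])
    # Policy step: REINFORCE using the learned reward as the return.
    reward = float(candidates[sampled] @ w_reward)
    grad_logprob = -probs
    grad_logprob[sampled] += 1.0
    theta += 0.05 * reward * grad_logprob

print("learned reward of the human story:", round(float(candidates[human_idx] @ w_reward), 2))
print("policy distribution over candidates:", np.round(softmax(theta), 2))
```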

6. Neural Network Based Reinforcement Learning for Audio-Visual Gaze Control in Human-Robot Interaction



Authors: Stéphane Lathuilière, Benoit Massé, Pablo Mesejo, Radu Horaud

Abstract: This paper introduces a novel neural network-based reinforcement learning approach for robot gaze control. Our approach enables a robot to learn and to adapt its gaze control strategy for human-robot interaction without the use of external sensors or human supervision. The robot learns to focus its attention on groups of people from its own audio-visual experiences, independently of the number of people, their positions, and their physical appearances. In particular, we use a recurrent neural network architecture in combination with Q-learning to find an optimal action-selection policy; we pre-train the network using a simulated environment that mimics realistic scenarios involving speaking and silent participants, thus avoiding the need for tedious sessions of a robot interacting with people. Our experimental evaluation suggests that the proposed method is robust against parameter estimation, i.e. the parameter values yielded by the method do not have a decisive impact on the performance. The best results are obtained when audio and visual information are used jointly. Experiments with the Nao robot indicate that our framework is a step forward towards the autonomous learning of socially acceptable gaze behavior.

Published: arXiv, April 23, 2018

Link: http://www.zhuanzhi.ai/document/1a62223dcd16d5d4af934daaba9c11b6
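
The gaze controller combines a recurrent network with Q-learning: the recurrent state summarizes the audio-visual observation history, and gaze actions are chosen greedily (with occasional exploration) from the predicted Q-values. The sketch below shows that control loop with a minimal hand-written recurrent cell; the observation dimension, action set, weight shapes, and random observations are illustrative assumptions, not the Nao setup in the paper.

```python
import numpy as np

rng = np.random.default_rng(0)
OBS_DIM, HIDDEN, N_ACTIONS = 6, 16, 4      # toy sizes (assumed, not from the paper)

# Random, untrained weights of a minimal recurrent Q-network.
W_in = rng.normal(scale=0.1, size=(HIDDEN, OBS_DIM))
W_rec = rng.normal(scale=0.1, size=(HIDDEN, HIDDEN))
W_out = rng.normal(scale=0.1, size=(N_ACTIONS, HIDDEN))

def recurrent_q_step(obs, hidden):
    """Fold one audio-visual observation into the recurrent state and output Q-values."""
    hidden = np.tanh(W_in @ obs + W_rec @ hidden)
    q_values = W_out @ hidden
    return q_values, hidden

def select_action(q_values, epsilon=0.1):
    """Epsilon-greedy gaze action, e.g. {look left, look right, look up, stay}."""
    if rng.random() < epsilon:
        return int(rng.integers(len(q_values)))
    return int(np.argmax(q_values))

# One simulated interaction episode with random observations standing in for audio-visual features.
hidden = np.zeros(HIDDEN)
for t in range(5):
    obs = rng.normal(size=OBS_DIM)
    q, hidden = recurrent_q_step(obs, hidden)
    print(f"t={t} action={select_action(q)}")
```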

-END-

This article is shared from the Zhuanzhi WeChat public account; originally published on 2018-04-28.
